
The "stock guru" says buying gold never loses money? Then let's use Python to predict the gold price trend.

Editor: Python

Read gold ETF data

This article uses machine learning to predict the price of gold, one of the most important precious metals. We will create a linear regression model that takes information from past gold ETF (GLD) prices and returns a prediction of the next day's gold ETF price. GLD is the largest ETF that invests directly in physical gold.

The first step is to import all the necessary libraries.

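# LinearRegression is scikit-learn's class for linear regression
from sklearn.linear_model import LinearRegression
# pandas and numpy are used for data manipulation
import pandas as pd
import numpy as np
# matplotlib is used for plotting (with a seaborn-style theme)
import matplotlib.pyplot as plt
# Jupyter/IPython magic: render plots inline (remove this line in a plain .py script)
%matplotlib inline
# Note: on matplotlib 3.6 and later this style is named 'seaborn-v0_8-darkgrid'
plt.style.use('seaborn-darkgrid')
# yfinance is used to retrieve data from Yahoo Finance
import yfinance as yf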

Next, we read the past 12 years of daily gold ETF price data and store it in Df. We drop the irrelevant columns and remove NaN values with the dropna() function. Then we plot the gold ETF closing price.

Df = yf.download('GLD', '2008-01-01', '2020-6-22', auto_adjust=True)
Df = Df[['Close']]
Df = Df.dropna()
Df.Close.plot(figsize=(10, 7),color='r')
plt.ylabel("Gold ETF Prices")
plt.title("Gold ETF Price Series")
plt.show()

Define explanatory variables

An explanatory variable is a variable that is manipulated to determine the next day's gold ETF price. Simply put, explanatory variables are the features we want to use to predict the gold ETF price.

The explanatory variables in this strategy are the 3-day and 9-day moving averages of the closing price. We remove NaN values with the dropna() function and store the feature variables in X.

However, you can add more variables to X that you think are useful for predicting the gold ETF price. These could be technical indicators, the prices of other ETFs such as the gold miners ETF (GDX) or the oil ETF (USO), or US economic data.

Define the dependent variable

Similarly, the dependent variable depends on the values of the explanatory variables. Simply put, it is the gold ETF price we are trying to predict. We store the next day's gold ETF price in y.

Df['S_3'] = Df['Close'].rolling(window=3).mean()
Df['S_9'] = Df['Close'].rolling(window=9).mean()
Df['next_day_price'] = Df['Close'].shift(-1)
Df = Df.dropna()
X = Df[['S_3', 'S_9']]
y = Df['next_day_price']
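
To make the feature construction easier to follow, here is a minimal sketch on a hypothetical toy price series (the values are invented, not real GLD data). It shows where the NaN values come from and which rows dropna() removes.

```python
import pandas as pd

# Hypothetical toy closing prices (not real GLD data)
toy = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]})

# 3-day moving average: the first two rows have no full window, so they are NaN
toy["S_3"] = toy["Close"].rolling(window=3).mean()

# Next day's close: shift(-1) pulls tomorrow's value onto today's row,
# leaving NaN on the last row
toy["next_day_price"] = toy["Close"].shift(-1)

# Dropping NaN rows removes the first two rows and the last one
toy_clean = toy.dropna()
print(toy_clean)
```

Only the rows with both a complete moving-average window and a known next-day price survive, which is why X and y end up aligned row by row.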

Split the data into training and test data sets

In this step, we split the predictors and the output data into training and test data. The training data is used to create the linear regression model by pairing the inputs with the expected outputs.

The test data is used to estimate how well the model has been trained.

• The first 80% of the data is used for training, and the remaining data for testing

• X_train and y_train are the training data set

• X_test and y_test are the test data set

t = .8
t = int(t*len(Df))
X_train = X[:t]
y_train = y[:t]
X_test = X[t:]
y_test = y[t:]
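
The split above is chronological: the data is cut at one point in time and never shuffled, so the model is tested only on prices that come after everything it was trained on. A minimal sketch of the same idea as a reusable helper follows; the function name chronological_split and its train_fraction parameter are hypothetical and not part of the original code.

```python
def chronological_split(values, train_fraction=0.8):
    """Split an ordered sequence into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(train_fraction * len(values))
    return values[:cut], values[cut:]


# Toy stand-in for an ordered price series
prices = list(range(10))
train, test = chronological_split(prices)
print(train)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(test)   # [8, 9]
```

Because it only uses slicing, the same helper would also work on a pandas DataFrame or Series such as X and y.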

Create a linear regression model

We will now create a linear regression model. But what is linear regression?

If we try to capture the mathematical relationship between the variables "x" and "y" by fitting a line through a scatter plot so that it "best" explains the observed values of "y" in terms of the observed values of "x", then the equation relating x and y is called a linear regression.

To break it down further, regression explains the variation in a dependent variable in terms of independent variables. The dependent variable "y" is the variable you want to predict. The independent variables "x" are the explanatory variables you use to predict it. The following regression equation describes this relationship:

Y = m1 * X1 + m2 * X2 + c
Gold ETF price = m1 * 3 days moving average + m2 * 9 days moving average + c

Then we use the fit method to fit the independent and dependent variables (x and y) and generate the regression coefficients and the constant.

linear = LinearRegression().fit(X_train, y_train)
print("Linear Regression model")
print("Gold ETF Price (y) = %.2f * 3 Days Moving Average (x1) \
+ %.2f * 9 Days Moving Average (x2) \
+ %.2f (constant)" % (linear.coef_[0], linear.coef_[1], linear.intercept_))

Output:

Linear Regression model
Gold ETF Price (y) = 1.20 * 3 Days Moving Average (x1) + -0.21 * 9 Days Moving Average (x2) + 0.43 (constant)
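
To illustrate where coefficients like these come from, here is a rough sketch that solves the same kind of equation by ordinary least squares with NumPy. The data is synthetic and the "true" coefficients are invented for the illustration; it is not a reproduction of scikit-learn's internals or of the GLD result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic features standing in for the 3-day and 9-day moving averages
x1 = rng.uniform(100, 200, size=200)
x2 = rng.uniform(100, 200, size=200)

# Hypothetical "true" relationship, chosen only for this illustration
y = 1.2 * x1 - 0.2 * x2 + 0.5

# Design matrix with a column of ones for the constant term c
A = np.column_stack([x1, x2, np.ones_like(x1)])

# Ordinary least squares: minimise ||A @ params - y||^2
params, *_ = np.linalg.lstsq(A, y, rcond=None)
m1, m2, c = params
print(m1, m2, c)
```

Since the synthetic y contains no noise, the fitted m1, m2 and c match the invented values up to floating-point error; with real prices the fit only approximates the relationship.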

Predict the gold ETF price

Now it's time to check whether the model works on the test data set. We use the linear model created from the training data set to predict the gold ETF price. The predict method returns the gold ETF price (y) for the given explanatory variables X.

predicted_price = linear.predict(X_test)
predicted_price = pd.DataFrame(
    predicted_price, index=y_test.index, columns=['price'])
predicted_price.plot(figsize=(10, 7))
y_test.plot()
plt.legend(['predicted_price', 'actual_price'])
plt.ylabel("Gold ETF Price")
plt.show()

The plot shows the predicted and actual gold ETF prices.

Now let's calculate the goodness of fit using the score() function.

r2_score = linear.score(X[t:], y[t:])*100
float("{0:.2f}".format(r2_score))

Output:

99.21

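As can be seen, the model's R-squared is 99.21%. On the data that a linear model with an intercept was fitted to, R-squared lies between 0 and 100%; on held-out test data, as here, it can also be negative when the model predicts worse than the mean of the test values. A score close to 100% indicates that the model explains the gold ETF price well.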
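
For reference, the score() method of LinearRegression returns the coefficient of determination, 1 - SS_res / SS_tot. A minimal sketch of that formula on invented numbers follows; the helper name r_squared is hypothetical.

```python
import numpy as np


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)      # unexplained variation
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
print(r_squared(actual, actual))                # perfect predictions give 1.0
print(r_squared(actual, [2.5, 2.5, 2.5, 2.5]))  # predicting the mean gives 0.0
print(r_squared(actual, [4.0, 3.0, 2.0, 1.0]))  # worse than the mean gives -3.0
```

The last call shows that predictions worse than simply guessing the mean push the score below zero.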

Plot cumulative returns

Let's calculate the cumulative returns of this strategy to analyze its performance.

The cumulative returns are calculated in the following steps:

• Generate the daily percentage change of the gold price

• Create a buy signal, represented by "1", when the predicted price for the next day is higher than the predicted price for the current day; otherwise no position is taken

• Calculate the strategy returns by multiplying the daily percentage change by the trading signal

• Finally, plot the cumulative returns

gold = pd.DataFrame()
gold['price'] = Df[t:]['Close']
gold['predicted_price_next_day'] = predicted_price
gold['actual_price_next_day'] = y_test
gold['gold_returns'] = gold['price'].pct_change().shift(-1)
gold['signal'] = np.where(gold.predicted_price_next_day.shift(1) < gold.predicted_price_next_day,1,0)
gold['strategy_returns'] = gold.signal * gold['gold_returns']
((gold['strategy_returns']+1).cumprod()).plot(figsize=(10,7),color='g')
plt.ylabel('Cumulative Returns')
plt.show()

The output is a plot of the strategy's cumulative returns.
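
The calculation can be traced by hand on a small example. The sketch below applies the same steps to a hypothetical four-day series; the prices and predictions are invented, and the column names are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: today's price and the model's next-day prediction
toy = pd.DataFrame({
    "price":     [100.0, 110.0, 99.0, 108.9],
    "predicted": [105.0, 104.0, 106.0, 107.0],
})

# Return earned from today's close to tomorrow's close
toy["next_day_return"] = toy["price"].pct_change().shift(-1)

# Signal is 1 when the prediction is higher than the previous day's prediction
toy["signal"] = np.where(toy["predicted"].shift(1) < toy["predicted"], 1, 0)

# The strategy earns the next-day return only on days with a signal
toy["strategy_returns"] = toy["signal"] * toy["next_day_return"]

# Compound the daily returns into a cumulative growth factor
toy["cumulative"] = (toy["strategy_returns"] + 1).cumprod()
print(toy)
```

On the first two days the signal is 0, so the strategy stays flat and avoids the 10% drop on day two; on the third day it is invested and gains roughly 10%. The last row has no next-day return, so its cumulative value is NaN.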

We will also calculate the Sharpe ratio:

sharpe = gold['strategy_returns'].mean()/gold['strategy_returns'].std()*(252**0.5)
'Sharpe Ratio %.2f' % (sharpe)

Output:

'Sharpe Ratio 1.06'
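
The formula above divides the mean daily return by its standard deviation and scales the result by the square root of 252, the approximate number of trading days in a year. It implicitly assumes a risk-free rate of zero. A minimal NumPy sketch of the same calculation follows; the function name annualized_sharpe and the sample returns are hypothetical.

```python
import numpy as np


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean over sample standard deviation of daily returns, annualized.

    Assumes a risk-free rate of zero, as in the formula used in the article.
    """
    r = np.asarray(daily_returns, dtype=float)
    r = r[~np.isnan(r)]  # skip missing values, as pandas mean() and std() do
    # ddof=1 gives the sample standard deviation, matching pandas' default
    return r.mean() / r.std(ddof=1) * np.sqrt(periods_per_year)


# Invented daily returns, including one missing value
sample_returns = [0.01, -0.005, 0.002, 0.007, np.nan]
sharpe_value = annualized_sharpe(sample_returns)
print("Sharpe Ratio %.2f" % sharpe_value)
```

A handful of invented returns produces an unrealistically high ratio; the figure only becomes meaningful over a long return history such as the test period used above.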

Predict daily prices

You can use the following code to predict the gold price and get a trading signal telling us whether to buy GLD or hold no position:

import datetime as dt
current_date = dt.datetime.now()
data = yf.download('GLD', '2008-06-01', current_date, auto_adjust=True)
data['S_3'] = data['Close'].rolling(window=3).mean()
data['S_9'] = data['Close'].rolling(window=9).mean()
data = data.dropna()
data['predicted_gold_price'] = linear.predict(data[['S_3', 'S_9']])
data['signal'] = np.where(data.predicted_gold_price.shift(1) < data.predicted_gold_price,"Buy","No Position")
data.tail(1)[['signal','predicted_gold_price']].T

The output is a small table showing the latest signal and predicted_gold_price.