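This post walks through a complete time series forecasting workflow with an ARIMA model: loading the data, cleaning it, testing for stationarity, choosing the model order, fitting the ARIMA model, forecasting, and evaluating the error.
The dataset used here is mock_kaggle.csv, which is available among the resources I have uploaded.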

Code

To install the pmdarima library, open cmd as administrator and run pip install pmdarima.

import pandas as pd
import numpy as np
import math
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.stattools import durbin_watson  # Durbin-Watson test
import matplotlib as mpl
import pmdarima as pm

mpl.rcParams['font.sans-serif'] = ['SimHei']   # font that can render Chinese characters
mpl.rcParams['axes.unicode_minus'] = False     # render the minus sign correctly

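Loading the data

# Read the CSV and parse the datetime column as timestamps
data=pd.read_csv('mock_kaggle.csv',encoding ='gbk',parse_dates=['datetime'])
Date=pd.to_datetime(data.datetime)
data['date'] = Date.map(lambda x: x.strftime('%Y-%m-%d'))
datanew=data.set_index(Date)
# Target series: the '股票' column, indexed by the formatted date strings
series = pd.Series(datanew['股票'].values, index=datanew['date'])
# Reshape to a 2-D column vector, the layout the scaler expects
values = series.values
values = values.reshape((len(values), 1))
values.shape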

(937, 1)
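
The loading step can also be written so that the parsed timestamps become the index directly. The sketch below assumes a tiny in-memory CSV with hypothetical columns `datetime` and `value`; these are stand-ins for illustration, not the real contents of mock_kaggle.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file; the column names are assumptions
csv_text = """datetime,value
2014-01-01,10
2014-01-02,12
2014-01-03,9
2014-01-04,11
"""

# parse_dates converts the column to timestamps; index_col makes it the index
frame = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"], index_col="datetime")
series = frame["value"]

# Column vector of shape (n, 1), the layout scikit-learn scalers expect
values = series.to_numpy().reshape(-1, 1)
print(series.index.dtype, values.shape)
```

With a DatetimeIndex in place, slicing by date range and plotting against time work without any string formatting step.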

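Splitting the dataset and normalizing

# shuffle=False keeps the rows in chronological order; train_test_split shuffles
# by default, which would scramble the time series
train,test=train_test_split(values,test_size=0.03,shuffle=False)
# Fit the scaler on the training set only, then apply it to both sets
scaler = MinMaxScaler(feature_range=(0, 1)).fit(train)
train_data=pd.DataFrame(scaler.transform(train))
test_data=pd.DataFrame(scaler.transform(test))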

train_data.shape,test_data.shape

((908, 1), (29, 1))
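
For a time series, the test set should hold the most recent observations, and the scaler should learn its minimum and maximum from the training part only, so that nothing from the future leaks into training. Below is a minimal NumPy sketch of both ideas on made-up numbers. The helper names `chronological_split`, `fit_min_max`, `scale` and `unscale` are illustrative and not part of scikit-learn.

```python
import numpy as np

def chronological_split(values, test_fraction):
    """Hold out the last test_fraction of the rows (assumed > 0) as the test set."""
    n_test = int(np.ceil(len(values) * test_fraction))
    return values[:-n_test], values[-n_test:]

def fit_min_max(train):
    """Learn the scaling range from the training data only."""
    return train.min(), train.max()

def scale(x, lo, hi):
    """Map [lo, hi] onto [0, 1]."""
    return (x - lo) / (hi - lo)

def unscale(x_scaled, lo, hi):
    """Invert scale(), returning values on the original scale."""
    return x_scaled * (hi - lo) + lo

values = np.array([10.0, 14.0, 12.0, 18.0, 20.0, 16.0, 22.0, 30.0]).reshape(-1, 1)
train, test = chronological_split(values, test_fraction=0.25)

lo, hi = fit_min_max(train)
train_scaled = scale(train, lo, hi)
test_scaled = scale(test, lo, hi)   # may fall outside [0, 1]

print(train_scaled.ravel())
print(test_scaled.ravel())
```

Test values outside the training range scale to numbers above 1 or below 0. That is expected and does not affect the inverse transform.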

normalized_train_series=pd.Series(train_data[0].values,index=datanew['datetime'][:len(train_data)])
normalized_train_series

datetime
2014-01-01    0.208910
2014-01-02    0.292889
2014-01-03    0.014665
2014-01-04    0.196458
2014-01-05    0.887106
                ...
2016-06-28    0.155091
2016-06-29    0.112341
2016-06-30    0.286248
2016-07-01    0.316132
2016-07-02    0.188019
Length: 908, dtype: float64

plt.figure(figsize=(15,6))
normalized_train_series.plot.line()
plt.show()

Stationarity tests

# ACF/PACF plots
f = plt.figure(facecolor='white')
ax1 = f.add_subplot(211)
plot_acf(normalized_train_series, lags=84, ax=ax1)
ax2 = f.add_subplot(212)
plot_pacf(normalized_train_series, lags=84, ax=ax2)
plt.subplots_adjust(left=None, bottom=None, right=None, top=None,wspace=None, hspace=0.4)
plt.show()
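
What plot_acf draws is the sample autocorrelation at each lag. As a rough sketch of that calculation (a plain NumPy version written for illustration, not statsmodels' implementation), the lag-k autocorrelation is the covariance between the series and its copy shifted by k steps, divided by the variance:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation r_0..r_max_lag of a 1-D series."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    acf = [1.0]  # lag 0 is always 1
    for k in range(1, max_lag + 1):
        acf.append(np.sum(centered[k:] * centered[:-k]) / denom)
    return np.array(acf)

# A strictly alternating series is strongly negatively correlated at lag 1
alternating = np.array([1.0, -1.0] * 50)
r = sample_acf(alternating, max_lag=3)

# Approximate 95% band for a white-noise series of the same length
band = 1.96 / np.sqrt(len(alternating))
print(r, band)
```

Spikes well outside the band point to autocorrelation that white noise would be unlikely to produce.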

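# ADF (Augmented Dickey-Fuller) test
# 1. If the test statistic (first value) is below the 1% critical value, the null hypothesis
#    of a unit root is rejected at the 1% level and the series is treated as stationary
# 2. The second value is the p-value of the test statistic; the closer it is to 0,
#    the stronger the evidence against a unit root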
print(adfuller(normalized_train_series))

(-29.227779748490356, 0.0, 0, 907, {'1%': -3.4375803238413085, '5%': -2.8647318597670877, '10%': -2.568469555703587}, -459.0867155373378)

Building the ARIMA model

# auto_arima selects the model order automatically; m=12 sets the seasonal period
model = pm.auto_arima(normalized_train_series,seasonal=True, m=12)

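Forecasting

# Forecast the next 10 points; predict returns only the point forecasts
# (pass return_conf_int=True to also get the confidence intervals)
forecast_data=model.predict(10)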

forecast_data.shape

(10,)

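Converting back to the original scale

# np.asarray handles both a NumPy array and the pandas Series that some pmdarima versions return
real_predict=scaler.inverse_transform(np.asarray(forecast_data).reshape(-1,1))
real_y=test[:10]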

Error evaluation

from sklearn.metrics import mean_squared_error  # mean squared error
# RMSE on the original scale
round(math.sqrt(mean_squared_error(real_y,real_predict)),1)

956.5

from sklearn.metrics import r2_score
round(r2_score(real_y,real_predict),4)

-0.0263

# MAPE (mean absolute percentage error)
per_real_loss=(real_y-real_predict)/real_y
avg_per_real_loss=sum(abs(per_real_loss))/len(per_real_loss)
print(avg_per_real_loss)

[0.37255715]
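
The three metrics above can be written out directly, which makes the negative R² easier to interpret. R² compares the model's squared error with that of always predicting the mean of the actual values, so a value below 0 means the forecast did worse than that constant baseline. Below is a NumPy sketch on made-up numbers; the function names are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the data."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """1 minus the ratio of model squared error to mean-baseline squared error."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; y_true must be non-zero."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])
mean_baseline = np.full_like(y_true, y_true.mean())

print(rmse(y_true, y_pred), r2(y_true, y_pred), mape(y_true, y_pred))
print(r2(y_true, mean_baseline))   # baseline of always predicting the mean
```

Seen this way, the R² of -0.0263 reported above says the 10-step forecast was slightly worse than predicting the mean of those 10 actual values.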

plt.figure(figsize=(15,6))
plt.plot(per_real_loss,label='relative error')
plt.legend()
plt.show()

plt.figure(figsize=(15,6))
bwith = 0.75 # border line width
ax = plt.gca() # get the current axes
ax.spines['bottom'].set_linewidth(bwith)
ax.spines['left'].set_linewidth(bwith)
ax.spines['top'].set_linewidth(bwith)
ax.spines['right'].set_linewidth(bwith)
plt.plot(real_predict,label='real_predict',linewidth=0.75)
plt.plot(real_y,label='real_y',linewidth=0.75)
plt.plot(real_y*(1+0.15),label='+15% bound',linestyle='--',color='green',linewidth=0.5)
# plt.plot(real_y*(1+0.1),label='+10% bound',linestyle='--')
# plt.plot(real_y*(1-0.1),label='-10% bound',linestyle='--')
plt.plot(real_y*(1-0.15),label='-15% bound',linestyle='--',color='green',linewidth=0.5)
plt.fill_between(range(0,10),real_y.reshape(1,-1)[0]*(1+0.15),real_y.reshape(1,-1)[0]*(1-0.15),color='gray',alpha=0.2)
plt.legend()
plt.show()
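
The shaded ±15% band in the plot can also be summarized as a single number: the share of forecasts that land within the tolerance of the actual value. Below is a sketch on made-up numbers; the helper `share_within_band` is an illustrative name.

```python
import numpy as np

def share_within_band(y_true, y_pred, tolerance=0.15):
    """Fraction of forecasts within +/- tolerance (relative) of the actual values."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    relative_error = np.abs(y_pred - y_true) / np.abs(y_true)
    return float(np.mean(relative_error <= tolerance))

y_true = np.array([100.0, 200.0, 400.0, 500.0])
y_pred = np.array([110.0, 150.0, 400.0, 590.0])

print(share_within_band(y_true, y_pred))        # tolerance of 15%
print(share_within_band(y_true, y_pred, 0.30))  # tolerance of 30%
```

Passing real_y and real_predict from the code above would give the share of the 10 forecasts that fall inside the shaded region.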

