Comparison of Stock Market Prediction Performance of ARIMA and

RNN-LSTM Model – A Case Study on Indian Stock Exchange

J.P. Senthil Kumar (1,a), R. Sundar (2,b), and A. Ravi (3,c)

1,2 Department of Management Studies, GITAM School of Business, Bengaluru, 561203, India
3 University College of Commerce & Management, Mahatma Gandhi University, Nalgonda, 508003, India

a) Corresponding author: sjayapra@[Link]
b) srangasa@[Link]
c) aravi13371@[Link]

Abstract.

Forecasting stock market trends helps investors, government regulators, policymakers, and other stakeholders make informed decisions. Predicting stock market movements is crucial and has long been a focal point for researchers and financial practitioners. This paper compares the stock market prediction performance of the autoregressive integrated moving average (ARIMA) model and the recurrent neural network – long short-term memory (RNN-LSTM) model. The time-series data of monthly closing values of the SENSEX and NIFTY FIFTY indices from January 2000 to December 2020, covering 252 monthly observations, were taken as training data for this study. Both index series from January 2021 to December 2021 were taken as test data. The prediction performance of different RNN-LSTM architectures was analyzed, and the best architectures were identified. We then compared and evaluated the predictive capabilities of the two models (ARIMA and RNN-LSTM). The comparative results showed that the ARIMA (4,1,5) model is more efficient in forecasting the direction of change in the stock market indices in terms of root mean square error, mean absolute deviation, and mean absolute percent error. The results confirmed that ARIMA outperformed the RNN-LSTM model in forecasting the future time series in the short run.
Keywords: Stock market prediction, ARIMA, RNN-LSTM, SENSEX, NIFTY-FIFTY

INTRODUCTION
Stock market movements are hard to predict and are influenced by many variables, such as macroeconomic factors, market conditions, socio-economic events, and investors' sentiments and preferences. Stock markets are characterized by high return, high risk, and flexible operations, which motivate investors to invest. However, investors often face a dilemma concerning the movements of the stock market, and the fluctuations and inconsistency of the market are their biggest concern. Therefore, forecasting stock market movements remains a focal point and a complex research topic.
Predicting stock market movements is vital for investment decision-making, risk management, and portfolio management. Practitioners and academicians have made great efforts to understand price dynamics (1), and various forecasting techniques have been developed (2–7). Still, successful stock market prediction remains one of the challenging tasks in financial market research (8) because of chaotic nonlinearity and volatility (5,9–11).
This study compares two time-series forecasting models, namely the autoregressive integrated moving average (ARIMA) and the recurrent neural network – long short-term memory (RNN-LSTM), to find whether these models can accurately predict stock market movements. The main contribution of the study is identifying the best architecture to predict the BSE and NSE indices based on forecasting performance.
This paper is organized as follows: the next section surveys the literature on RNN-based time series prediction. Section 3 (Methods) explains both models (ARIMA and RNN-LSTM). Section 4 (Results and Discussion) presents the results and discusses the performance of both models. Section 5 (Conclusion) summarizes the key points of this study.
REVIEW OF LITERATURE
Time series prediction using deep learning algorithms has become a trend of this era, and many researchers have applied deep learning models to time series data in recent years. Conventional time-series prediction requires the data to be stationary before the models can be applied; deep learning models impose no such preliminary requirement on the time-series data used to forecast the future.
The recurrent neural network (RNN) is a deep learning model that has been used in applications such as language modeling (12), image generation (13), handwritten letter recognition, and speech recognition (14). It uses interconnection structures to carry information from previous instances into the learning model. In recent years, RNN has frequently been used for time-series forecasting. Due to the computational complexities of RNN, advanced versions such as the LSTM (15) and the gated recurrent unit (GRU) have been introduced to overcome these issues.
Recent applications of LSTM models in time series forecasting are listed in table 1.

TABLE 1. List of applications of LSTM in recent years (2018 – 2022)

Literature Applications
(16) Traffic flow forecasting
(17) Emotion recognition
(18) Electricity energy consumption forecasting
(19) Rainfall distribution forecasting
(5,20) Stock market prediction
(21) Forecasting bitcoin returns
(22) Financial data forecasting
(6,23) Stock price prediction
(24) Volatility forecasting
(25) Petroleum production prediction

LSTM has an advantage over the conventional RNN: it can remember data patterns for a longer time, and this property makes the LSTM very popular. Based on the literature, we have used the RNN-LSTM model for the current stock market prediction.
METHODS
Time series model: A univariate time series model predicts a single variable using its past values and possibly current and past values of an error term (called innovation or noise):

y_t = f(y_{t-1}, y_{t-2}, …, e_t, e_{t-1}, …)

where y_t is the time series value at time t, f(·) represents the time series function, and e_t is the error term associated with time t.
Autoregressive integrated moving average (ARIMA) models are not based on any theory; they are empirical and use the statistical relationship between a variable and its past values or innovations (26).
• AR: Autoregression refers to the number of past values included, usually denoted AR(p).
• I: Integration refers to the number of times the series is differenced to achieve stationarity, usually denoted I(d).
• MA: Moving average refers to the number of past error terms included to make a forecast, usually denoted MA(q).
Here p is the number of past values, d the number of differences, and q the number of past error terms.
• ARIMA Model: Also referred to as the Box-Jenkins model, it is a set of procedures for identifying and estimating time series models within the class of autoregressive integrated moving average (ARIMA) models. ARIMA models are regression models that use lagged values of the dependent variable and/or the random disturbance term as explanatory variables, and they rely heavily on the autocorrelation pattern in the data.
Three basic ARIMA models for a stationary time series y_t:
1. Autoregressive model of order p (AR(p)):

y_t = δ + φ_1 y_{t-1} + φ_2 y_{t-2} + … + φ_p y_{t-p} + ε_t

i.e., y_t depends on its p previous values, where φ_i is the coefficient of y_{t-i}, i = 1, 2, …, p.

2. Moving average model of order q (MA(q)):

y_t = δ + ε_t − θ_1 ε_{t-1} − … − θ_q ε_{t-q}

i.e., y_t depends on q previous random error terms, where θ_i is the coefficient of ε_{t-i}, i = 1, 2, …, q.

3. Autoregressive moving average model of orders p and q (ARMA(p, q)):

y_t = δ + φ_1 y_{t-1} + φ_2 y_{t-2} + … + φ_p y_{t-p} + ε_t − θ_1 ε_{t-1} − θ_2 ε_{t-2} − … − θ_q ε_{t-q}

i.e., y_t depends on its p previous values and q previous random error terms, where φ_i and θ_i are the coefficients of y_{t-i} and ε_{t-i}, respectively.

In the ARIMA model, the random disturbance term ε_t is typically assumed to be white noise: identically and independently distributed with mean 0 and common variance σ² across all observations.

Figure 1: Box-Jenkins's decision tree (26)


RNN-LSTM MODEL
A recurrent neural network (RNN) is a deep learning model primarily used to forecast time-series data. The conventional RNN model faces three issues: time complexity, high computation cost, and the vanishing gradient. To eradicate these problems, an improved version of RNN, the long short-term memory (LSTM) model, was introduced. It remembers selective data patterns for a longer time and forgets the rest; its architecture determines which data are retained and which are overlooked. A single LSTM block architecture is shown in figure 2.

Figure 2: Single LSTM block architecture

h(t), h(t−1) and C(t), C(t−1) represent the hidden states and cell states of time periods t and t−1, respectively, and X(t) denotes the input data. The LSTM block uses four gates to regulate which memory is stored and which is deleted: the forget gate (f_t), input gate (i_t), modulation gate (g_t), and output gate (o_t). The expressions for these gates are:

f_t = σ[w_f · (X_t, h_{t-1}) + b_f]
i_t = σ[w_i · (X_t, h_{t-1}) + b_i]
g_t = tanh[w_g · (X_t, h_{t-1}) + b_g]
o_t = σ[w_o · (X_t, h_{t-1}) + b_o]

The forget gate uses the sigmoid function (denoted σ(·)) to choose how much data (both hidden-state information and input data) is to be forgotten. Next, the input gate uses another sigmoid function to decide how much of the input and previous-state information should be carried forward. Similarly, the input modulation gate uses a hyperbolic tangent function for its action. Note that the outputs of the sigmoid and tanh functions range from 0 to 1 and from −1 to 1, respectively (27,28). In figure 2, multiplication of two values is represented as '×' and summation as '+'.
The unique feature of the LSTM is the cell state, which helps retain the memory of patterns for longer times. The cell state at time t is:

C_t = f_t ∗ C_{t-1} + i_t ∗ g_t
Figure 3: Proposed RNN-LSTM architecture

Similarly, the hidden-state information is an additional input from the previous cell, and the hidden state at time t is determined using

h_t = o_t ∗ tanh(C_t)

This study developed different RNN-LSTM architectures by varying the number of hidden units in the hidden layer. The pseudo-architecture of the RNN-LSTM used in this study is shown in figure 3. To identify the best architecture for forecasting the time series data, the number of hidden units (LSTM blocks) was varied from 10 to 200, and the best architecture was selected based on performance.
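The gate and state equations above can be sketched as a single forward step in plain NumPy. This is a minimal illustration, not the trained model from the study: the weights and the toy input sequence are random placeholders.

```python
# Minimal NumPy sketch of one LSTM block step implementing the four gate
# equations above. Weights are random placeholders, not trained values.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One forward step; W and b hold per-gate weights over [x_t, h_prev]."""
    z = np.concatenate([x_t, h_prev])
    f_t = sigmoid(W["f"] @ z + b["f"])   # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])   # input gate
    g_t = np.tanh(W["g"] @ z + b["g"])   # modulation gate
    o_t = sigmoid(W["o"] @ z + b["o"])   # output gate
    c_t = f_t * c_prev + i_t * g_t       # cell state update: C_t
    h_t = o_t * np.tanh(c_t)             # hidden state: h_t
    return h_t, c_t

rng = np.random.default_rng(1)
n_in, n_hid = 1, 4                       # 1 input feature, 4 hidden units
W = {k: rng.normal(scale=0.1, size=(n_hid, n_in + n_hid)) for k in "figo"}
b = {k: np.zeros(n_hid) for k in "figo"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in [0.2, -0.1, 0.4]:               # tiny toy sequence
    h, c = lstm_step(np.array([x]), h, c, W, b)
print(h.shape)
```

Because h_t = o_t ∗ tanh(C_t) with o_t in (0, 1), every hidden-state component stays strictly inside (−1, 1).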
RESULTS & DISCUSSION
The ARIMA model analyzes the stochastic characteristics of time series data (29). Here, the Box-Jenkins methodology allows y_t to be explained by past or lagged values of y and a stochastic error term (ε_t). To select the best-fitted ARIMA model among the several experiments performed, statistical tools such as root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC) are applied. The first step in analyzing a univariate time series model is to ensure there is no upward or downward trend in the series, and no seasonality, either of which leads to non-stationarity. Following the Box-Jenkins procedure to select the best-fitted ARIMA model, we first check whether the NSE & BSE monthly index time series are stationary. Table 2 shows the correlogram and partial correlogram of the NSE & BSE monthly indexes and their first differences.
It is evident from table 2 that the NSE & BSE monthly index time series are not stationary in the original raw data, since the autocorrelation function (ACF) of the NSE and BSE indexes declines gradually up to 35 lags and the values lie outside the 95% confidence limits. It can also be seen in the partial autocorrelation function (PACF) that both indexes drop after the first lag, and these values are statistically insignificant. Before we proceed with the Box-Jenkins methodology, the raw data must be converted to stationary data. Such a conversion can be seen in the ACF and PACF of the NSE and BSE monthly index series at the first difference, shown in table 2. Since there is no trend remaining in the data, the first-differenced time series is stationary.
TABLE 2. Correlogram and partial correlogram
Lag | BSE monthly index | NSE monthly index | BSE 1st difference monthly index | NSE 1st difference monthly index
    | ACF      PACF     | ACF      PACF     | ACF      PACF                    | ACF      PACF
1 0.978 0.978 0.979 0.979 0.019 0.019 0.014 0.014
2 0.960 0.057 0.961 0.056 -0.014 -0.014 -0.028 -0.028
3 0.945 0.086 0.947 0.088 0.020 0.021 0.003 0.004
4 0.931 0.025 0.934 0.025 -0.016 -0.017 -0.016 -0.017
5 0.916 -0.009 0.920 -0.005 -0.068 -0.067 -0.064 -0.063
6 0.903 0.035 0.907 0.036 0.010 0.012 0.004 0.005
7 0.893 0.045 0.897 0.045 0.075 0.073 0.084 0.081
8 0.883 0.040 0.888 0.035 -0.058 -0.058 -0.051 -0.054
9 0.872 -0.007 0.878 -0.002 -0.086 -0.085 -0.073 -0.071
10 0.867 0.109 0.872 0.107 0.005 0.000 0.004 -0.001
11 0.852 -0.179 0.859 -0.175 -0.077 -0.073 -0.088 -0.089
12 0.837 -0.045 0.844 -0.041 -0.135 -0.125 -0.151 -0.143
13 0.822 -0.013 0.830 -0.009 0.028 0.019 0.016 0.005
14 0.808 -0.026 0.817 -0.022 -0.045 -0.063 -0.034 -0.060
15 0.794 0.017 0.803 0.013 0.024 0.037 0.020 0.025
16 0.781 0.009 0.791 0.008 -0.014 -0.024 -0.007 -0.017
17 0.769 0.015 0.780 0.017 0.016 -0.007 0.010 -0.015
18 0.756 -0.035 0.768 -0.033 -0.005 0.000 -0.014 -0.007
19 0.742 -0.038 0.754 -0.043 0.016 0.022 0.013 0.023
20 0.726 -0.070 0.739 -0.071 -0.025 -0.056 -0.029 -0.062
21 0.712 0.019 0.725 0.016 0.001 -0.011 0.020 0.007
22 0.697 0.011 0.711 0.005 -0.022 -0.036 -0.027 -0.047
23 0.685 0.062 0.699 0.061 -0.051 -0.076 -0.057 -0.092
24 0.674 0.012 0.688 0.016 0.121 0.118 0.119 0.107
25 0.661 -0.040 0.675 -0.046 0.044 0.031 0.054 0.040
26 0.647 -0.035 0.662 -0.035 -0.127 -0.150 -0.115 -0.138
27 0.636 0.037 0.651 0.032 -0.053 -0.042 -0.052 -0.040
28 0.624 -0.029 0.638 -0.033 0.000 -0.013 0.012 -0.006
29 0.609 -0.045 0.624 -0.051 -0.027 -0.012 -0.030 -0.029
30 0.595 0.038 0.610 0.038 -0.003 0.001 -0.007 0.003
31 0.583 0.020 0.598 0.019 0.040 -0.008 0.029 -0.023
32 0.571 -0.005 0.585 -0.006 0.009 -0.017 0.005 -0.028
33 0.558 -0.029 0.572 -0.030 0.001 0.050 0.008 0.057
34 0.547 0.014 0.561 0.005 -0.015 -0.049 -0.014 -0.053
35 0.535 -0.033 0.548 -0.033 0.039 0.016 0.037 0.014
36 0.521 -0.035 0.534 -0.038 0.014 0.053 0.011 0.056

Figure 4 shows that both the BSE & NSE index series are increasing and appear exponential; the original data are not stationary. To convert the raw data to stationary form, we applied the first difference, with the results shown in figure 5. At the first difference, the monthly series D(BSE) and D(NSE) indicate that both are stationary and correlated.
Figure 4: BSE & NSE raw data (series BSE, NSE; x-axis 2000–2020)
Figure 5: BSE & NSE at first difference (series D(BSE), D(NSE); x-axis 2000–2020)

Augmented Dickey-Fuller unit root test (ADF)

As a robustness check, the ADF test examines whether a time series is non-stationary and possesses a unit root; unit roots are the cause of non-stationarity. A series is said to be stationary when a change in time does not cause a shift in the shape of its distribution, meaning the mean and autocovariance of the series do not depend on time. The results of the ADF test are exhibited in table 3.
TABLE 3. Augmented Dickey-Fuller unit root test on BSE & NSE original data.
ADF test statistic (computed value): BSE 0.786616; NSE 0.598741
Test critical values: 1% level −3.456302; 5% level −2.872857; 10% level −2.572875

Since the computed unit root test values are, in absolute terms, less than the critical values at the 5% significance level, the BSE index series is not stationary. The same test run on the NSE index series also finds it non-stationary.
TABLE 4. Augmented Dickey-Fuller unit root test on BSE & NSE at first difference.
ADF test statistic (computed value): BSE −15.19288; NSE −15.30763
Test critical values: 1% level −3.456302; 5% level −2.872857; 10% level −2.572875
Applying the same test at the first difference shows that the data are stationary, as exhibited in table 4.

The ADF test performed at the first difference shows that both series are stationary. The computed value for D(BSE) is −15.19288, which is greater in absolute value than the critical values at all significance levels. It can be seen that p < 0.05, and the differenced NSE and BSE series are non-white-noise series.
Forecasting using the ARIMA (p, d, q) model
The current research is based on monthly closing index values of the BSE and NSE from Jan. 2000 to Dec. 2020 (252 observations) as training data, with the next twelve months, Jan. 2021 to Dec. 2021 (12 observations), used to evaluate the predicted series. The best-fit ARIMA model results are shown in table 5.
TABLE 5. Parameter estimate of the ARIMA (4, 1, 5) model for the DBSE Index
Variable Coefficient Std. Error t-Statistic Prob.

C 164.2258 41.18313 3.987697 0.0001


AR (1) 0.842982 0.350418 2.405644 0.0169
AR (2) -0.794944 0.180851 -4.395570 0.0000
AR (3) 1.143292 0.204940 5.578675 0.0000
AR (4) -0.356395 0.338419 -1.053117 0.2934
MA (1) -0.863862 0.348716 -2.477265 0.0139
MA (2) 0.771684 0.195002 3.957310 0.0001
MA (3) -1.207331 0.176413 -6.843771 0.0000
MA (4) 0.262314 0.353720 0.741585 0.4591
MA (5) -0.050498 0.087787 -0.575236 0.5657

R-squared 16.10% Akaike info criterion 16.96905
Adjusted R-squared 12.95% Schwarz criterion 17.11113
F-statistic 5.055677 Hannan-Quinn criterion 17.02625
Prob(F-statistic) 0.000003 Durbin-Watson stat 2.095641

TABLE 6. Parameter estimate of the ARIMA (3, 1, 3) model for the DNSE

Variable Coefficient Std. Error t-Statistic Prob.

C 44.90377 12.72977 3.527462 0.0005


AR(1) 0.032524 0.029920 1.087043 0.2781
AR(2) 0.000457 0.028812 0.015857 0.9874
AR(3) 0.769235 0.030547 25.18242 0.0000
MA(1) 0.162738 0.036782 4.424379 0.0000
MA(2) -0.066718 0.037267 -1.790265 0.0747
MA(3) -1.094346 0.037345 -29.30400 0.0000

R-squared 20.04% Akaike info criterion 14.49670
Adjusted R-squared 18.05% Schwarz criterion 14.59587
F-statistic 10.06865 Hannan-Quinn criterion 14.53662
Prob(F-statistic) 0.000000 Durbin-Watson stat 2.182488

Figure 6: Forecast BSE (BSEF ± 2 S.E.)
Forecast: BSEF; Actual: BSE; Forecast sample: 2000M01 2021M12; Adjusted sample: 2000M06 2021M12; Included observations: 247
Root Mean Squared Error 8059.710; Mean Absolute Error 7440.142; Mean Abs. Percent Error 64.72390; Theil Inequality Coefficient 0.160764; Bias Proportion 0.849839; Variance Proportion 0.011067; Covariance Proportion 0.139094


It is evident from table 5 that the t-statistics for AR(1), AR(2), AR(3), MA(1), MA(2), and MA(3) are highly significant. The R² and adjusted R² are not high, owing to the absence of information from long-run relationships among variables. The Durbin-Watson statistic strongly suggests no positive or negative first-order serial correlation. The AIC, SIC, and HQ criteria are relatively small and the F-statistic is significant, showing that the ARIMA (4, 1, 5) model fits the data and is suitable for accurate forecasting.

Figure 7: Forecast NSE (NSEF ± 2 S.E.)
Forecast: NSEF; Actual: NSE; Forecast sample: 2000M01 2021M12; Adjusted sample: 2000M05 2021M12; Included observations: 248
Root Mean Squared Error 1629.747; Mean Absolute Error 1346.128; Mean Abs. Percent Error 29.41122; Theil Inequality Coefficient 0.113630; Bias Proportion 0.635341; Variance Proportion 0.091434; Covariance Proportion 0.273224

Table 6 shows that the ARIMA (3,1,3) model is the best-fit model for predicting the NSE index. The R² value of 20.04% and adjusted R² of 18.05% are not high, due to the loss of information from long-run relationships among the variables. AR(3), MA(1), and MA(3) are highly significant. The AIC, SIC, and HQ criterion coefficients are smaller for the ARIMA (3,1,3) model, which shows that this model is the best fit for predicting the NSE index. For the BSE forecast (BSEF) in figure 6, the forecast-sample RMSE = 8059.710, MAE = 7440.142, and MAPE = 64.72390 were the smallest, indicating the best fit.
Figure 7 shows the NSE series forecast for Jan. 2000 to Dec. 2021, with RMSE = 1629.747, MAE = 1346.128, and MAPE = 29.41122, again the smallest and best fit. Figures 8 and 9 below show the forecast line for the twelve test observations. The forecast errors for the BSE and NSE indexes are shown in table 7.
Figure 8: BSE Forecast (BSEF, 2000–2021)
Figure 9: NSE Forecast (NSEF, 2000–2021)


TABLE 7. BSE & NSE forecast error

Period | BSE actual | NSE actual | BSE predicted values | NSE predicted values | BSE forecast error | NSE forecast error
Jan-21 46285.8 13634.6 48074.8 14078.8 -1789.02 -444.24
Feb-21 49100 14529.2 48245 14128.4 854.99 400.71
Mar-21 49509.2 14690.7 48415.2 14178 1093.95 512.66
Apr-21 48782.4 14631.1 48585.4 14227.7 196.95 403.45
May-21 51937.4 15582.8 48755.6 14277.3 3181.82 1305.55
Jun-21 52482.7 15721.5 48925.8 14326.9 3556.88 1394.65
Jul-21 52586.8 15763.1 49096 14376.5 3490.8 1386.59
Aug-21 57552.4 17132.2 49266.3 14426.1 8286.14 2706.14
Sep-21 59126.4 17618.2 49436.5 14475.7 9689.9 3142.48
Oct-21 59306.9 17671.7 49606.7 14525.3 9700.26 3146.38
Nov-21 57064.9 16983.2 49776.9 14574.9 7287.99 2408.33
Dec-21 58253.8 17354.1 49947.1 14624.5 8306.73 2729.57

RNN-LSTM results

The number of hidden units in the RNN-LSTM was fixed at each of the values below, and the monthly values of the BSE and NSE indices from Jan-21 to Dec-21 were forecasted. The actual values of the BSE and NSE indices were compared with the forecasted values, using root mean square error (RMSE) and mean absolute deviation (MAD) as performance metrics to compare the different RNN-LSTM architectures. The performance of the different RNN-LSTMs is given in tables 8 and 9, and the performance of each architecture on RMSE and MAD is shown in figures 10 and 11. The best number of hidden units was found to be 20 for predicting the BSE index and 15 for the NSE index.
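The two comparison metrics can be sketched directly. This is a minimal NumPy illustration; the toy actual/predicted arrays are placeholders, not the index values from tables 8 and 9.

```python
# Sketch of the RMSE and MAD metrics used to rank the architectures.
# The toy arrays below are illustrative placeholders.
import numpy as np

def rmse(actual, predicted):
    """Root mean square error of the forecast."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mad(actual, predicted):
    """Mean absolute deviation of the forecast errors."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.mean(np.abs(actual - predicted)))

actual = [100.0, 110.0, 120.0]
predicted = [98.0, 113.0, 119.0]
print(rmse(actual, predicted), mad(actual, predicted))
```

Applying these two functions to each column of predicted values against the actual column reproduces the RMSE and MAD rows of tables 8 and 9, and the architecture with the smallest values is selected.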
TABLE 8. Finding the best architecture of RNN-LSTM for forecasting the BSE index

Period | BSE actual values | BSE predicted values, by number of hidden units in RNN-LSTM with a single hidden layer:
       |                   | 10 | 15 | 20 | 25 | 30 | 40 | 50 | 100 | 200
Jan-21 46285.77 41949.98 42185.87 45203.18 44475.52 44166.69 44089.27 44289 42435.25 46657.23
Feb-21 49099.99 41741.06 42181.38 44963.18 43994.9 43249.75 42779.95 43173.04 40287.77 46772.05
Mar-21 49509.15 41613.3 42201.8 44732.57 43436.39 42455.09 41640.21 42195.11 39022.08 46665.63
Apr-21 48782.36 41511.88 42213.23 44522.08 42839.81 41848.82 40821.54 41426.27 38356.38 45857.88
May-21 51937.44 41426.41 42217.39 44337.99 42247.27 41440.5 40374.38 40878.79 37950.52 44465.88
Jun-21 52482.71 41353 42216.47 44181.72 41691.88 41204.39 40264.78 40530.94 37659.04 42875.16
Jul-21 52586.84 41289.44 42212.14 44051.83 41197.9 41099.2 40406.51 40346.4 37500.17 41388.19
Aug-21 57552.39 41234.22 42205.51 43945.64 40781.82 41081.72 40695.86 40285.8 37529.82 40129.65
Sep-21 59126.36 41186.22 42197.39 43860 40453.1 41114.74 41038.59 40312.47 37760.11 39123.89
Oct-21 59306.93 41144.51 42188.31 43791.77 40214.8 41170.42 41364.88 40394.94 38150.13 38385.74
Nov-21 57064.87 41108.33 42178.67 43738.05 40064.28 41230.55 41633.27 40507.76 38633.45 37952.63
Dec-21 58253.82 41076.97 42168.77 43696.27 39994.18 41284.99 41827.45 40631.55 39144.82 37867.31
RMSE 12967.27 12125.14 10437.14 13097.29 12810.16 13046.69 13404.39 15716.35 13595.25
MAD 12112.78 11301.81 9247.03 11716.4 11720.15 12087.66 12251.38 14796.59 11215.86
Figure 10: Finding the best architecture of RNN-LSTM for the BSE forecast (RMSE and MAD vs. number of hidden units)
Figure 11: Finding the best architecture of RNN-LSTM for the NSE forecast (RMSE and MAD vs. number of hidden units)

Based on the results from both models (ARIMA results in figures 6 and 7, RNN-LSTM results in tables 8 and 9), we conclude that the ARIMA model outperformed the RNN-LSTM model in terms of RMSE and MAD values. It is inferred that the RNN-LSTM model overfitted the training data, which in turn reduced its forecasting performance. The forecast values of the BSE and NSE indices are shown in red in figures 12 and 13.
TABLE 9. Finding the best architecture of RNN-LSTM for forecasting the NSE index

Period | NSE actual values | NSE predicted values, by number of hidden units in RNN-LSTM with a single hidden layer:
       |                   | 10 | 15 | 20 | 25 | 30 | 40 | 50 | 100 | 200
Jan-21 13634.60 12048.46 12386.35 12363.36 12746.29 12986.89 12653.38 12840.67 13266.21 17310.92
Feb-21 14529.15 11978.04 12302.79 12221.25 12413.68 12660.50 12311.35 12247.45 12555.92 18582.84
Mar-21 14690.70 11936.71 12233.78 12114.91 12102.47 12331.75 11977.51 11796.03 11933.76 19628.70
Apr-21 14631.10 11899.50 12174.03 12028.79 11829.66 12039.00 11691.99 11489.88 11567.10 20484.05
May-21 15582.80 11867.74 12122.80 11960.32 11611.56 11807.43 11479.80 11302.62 11421.87 21158.04
Jun-21 15721.50 11840.33 12079.47 11907.64 11456.83 11648.49 11350.35 11207.98 11387.07 21596.08
Jul-21 15763.05 11816.72 12043.20 11868.56 11366.30 11561.85 11299.78 11183.52 11382.81 21717.10
Aug-21 17132.20 11796.39 12013.08 11840.67 11334.24 11538.15 11314.33 11209.44 11385.25 21482.13
Sep-21 17618.15 11778.88 11988.17 11821.59 11350.05 11562.29 11374.40 11267.63 11400.44 20920.95
Oct-21 17671.65 11763.80 11967.65 11809.22 11400.12 11616.98 11458.86 11341.92 11436.04 20106.54
Nov-21 16983.20 11750.80 11950.78 11801.75 11469.91 11685.77 11548.75 11418.94 11490.27 19121.36
Dec-21 17354.05 11739.58 11936.91 11797.72 11545.93 11755.26 11629.69 11488.89 11554.13 18040.71
RMSE 4328.97 4110.66 4251.20 4559.50 4365.81 4577.81 4695.58 4570.62 4382.12
MAD 4091.27 3842.76 3981.36 4223.76 4009.82 4268.49 4376.43 4210.94 4069.77
Figure 12: BSE Forecast using RNN-LSTM Figure 13: NSE Forecast using RNN-LSTM

CONCLUSION
This study compared the forecasting performance of two models (ARIMA and RNN-LSTM) on stock market movements. Two stock market indices, the BSE and NSE indices from Jan 2000 to Dec 2020, were taken as training data. Both models were fitted to the training data, and the following year's data (Jan 2021 to Dec 2021) were used as test data. Forecasting performance on the test data was measured using RMSE and MAD. In this experiment, the number of hidden units in the RNN-LSTM was varied from 10 to 200 to find the best architecture. Finally, the performance of the two models was compared, and the ARIMA-based forecasting model was found to outperform the deep learning model (RNN-LSTM). The initial weights of the RNN-LSTM model were generated randomly in this study; in future work, they could instead be optimized through an appropriate meta-heuristic optimization method before forecasting, and the result compared with the ARIMA model's performance.
REFERENCES
1. Fama EF. Random Walks in Stock Market Prices. Financ Anal J. 1995 Jan;51(1):75–80.
2. Abu-Mostafa YS, Atiya AF. Introduction to financial forecasting. Appl Intell. 1996;6(3):205–13.
3. Gundu V, Simon SP, Çuhadar M, Goel H, Singh NP. Dynamic prediction of Indian stock market: an artificial
neural network approach. J Ambient Intell Humaniz Comput. 2022;12(12):101–19.
4. Gandhmal DP, Kumar K. Systematic analysis and review of stock market prediction techniques. Comput Sci
Rev. 2019;34:100190.
5. Moghar A, Hamiche M. Stock Market Prediction Using LSTM Recurrent Neural Network. Procedia Comput
Sci. 2020;170:1168–73.
6. Ji X, Wang J, Yan Z. A stock price prediction method based on deep learning technology. Int J Crowd Sci.
2021;5(1):55–72.
7. Araújo RDA, Oliveira ALI, Meira S. A hybrid model for high-frequency stock market forecasting. Expert
Syst Appl. 2015;42(8):4081–96.
8. Armano G, Marchesi M, Murru A. A hybrid genetic-neural architecture for stock indexes forecasting. Inf Sci
(Ny). 2005;170(1):3–33.
9. Christy Jackson J, Prassanna J, Abdul Quadir M, Sivakumar V. Stock market analysis and prediction using
time series analysis. Mater Today Proc. 2021;(xxxx).
10. Zhao Y, Chen Z. Forecasting stock price movement: new evidence from a novel hybrid deep learning model.
J Asian Bus Econ Stud. 2021;
11. Kumar D, Sarangi PK, Verma R. A systematic review of stock market prediction using machine learning and
statistical techniques. Mater Today Proc. 2022;49:3187–91.
12. Mikolov T. Recurrent neural network based language model. Eleventh Annual Conference of the International Speech Communication Association [Internet]. 2010;(September):1–24.
Available from: [Link]
13. Gregor K, Danihelka I, Graves A, Rezende DJ, Wierstra D. DRAW: A recurrent neural network for image
generation. In: 32nd International Conference on Machine Learning, ICML 2015. 2015.
14. Bengio Y, Goodfellow IJ, Courville A. Sequence Modeling: Recurrent and Recursive Nets. In: Deep Learning. 2015. p. 324–65.
15. Hochreiter S, Schmidhuber J. Long Short-Term Memory. Neural Comput [Internet]. 1997;9(8):1735–80. Available from: [Link]
16. Fang W, Zhuo W, Yan J, Song Y, Jiang D, Zhou T. Attention meets long short-term memory: A deep learning
network for traffic flow forecasting. Phys A Stat Mech its Appl [Internet]. 2022;587:126485. Available from:
[Link]
17. Shahin I, Hindawi N, Nassif AB, Alhudhaif A, Polat K. Novel dual-channel long short-term memory
compressed capsule networks for emotion recognition. Expert Syst Appl [Internet]. 2022;188(April
2021):116080. Available from: [Link]
18. Bilgili M, Arslan N, Sekertekin A, Yasar A. Application of Long Short-Term Memory (LSTM) Neural
Network Based on Deep Learning for Electricity Energy Consumption Forecasting. Turkish J Electr Eng
Comput Sci. 2021;140–57.
19. Chen C, Zhang Q, Kashani MH, Jun C, Bateni SM, Band SS, et al. Forecast of rainfall distribution based on
fixed sliding window long short-term memory. Eng Appl Comput Fluid Mech [Internet]. 2022;16(1):248–61.
Available from: [Link]
20. Chhajer P, Shah M, Kshirsagar A. The applications of artificial neural networks, support vector machines, and
long–short term memory for stock market prediction. Decis Anal J [Internet]. 2022;2(November
2021):100015. Available from: [Link]
21. Parvini N, Abdollahi M, Seifollahi S, Ahmadian D. Forecasting Bitcoin returns with long short-term memory
networks and wavelet decomposition: A comparison of several market determinants. Appl Soft Comput
[Internet]. 2022;108707. Available from: [Link]
22. Huang Y, Gao Y, Gan Y, Ye M. A new financial data forecasting model using genetic algorithm and long
short-term memory network. Neurocomputing [Internet]. 2021;425:207–18. Available from:
[Link]
23. Zhang Y, Chu G, Shen D. The role of investor attention in predicting stock prices: The long short-term memory
networks perspective. Financ Res Lett [Internet]. 2021;38:101484. Available from:
[Link]
24. Yu M, Song J. Volatility forecasting: Global economic policy uncertainty and regime switching. Phys A Stat
Mech its Appl [Internet]. 2018;511:316–23. Available from: [Link]
25. Sagheer A, Kotb M. Time series forecasting of petroleum production using deep LSTM recurrent networks.
Neurocomputing [Internet]. 2019;323:203–13. Available from: [Link]
26. Gujarati D. Damodar N. Gujarati - Basic Econometrics (2004, McGraw-Hill). 2004.
27. Gundu V, Simon SP. PSO–LSTM for short term forecast of heterogeneous time series electricity price signals.
J Ambient Intell Humaniz Comput [Internet]. 2021;12(2):2375–85. Available from:
[Link]
28. Gundu V, Simon S pulikottil, Sundareswaran K, Panugothu S rao nayak. Gated recurrent unit based demand
response for preventing voltage collapse in a distribution system. Turkish J Electr Eng Comput Sci.
2020;28(6):3319–34.
29. Al-Shiab M. The predictability of Amman Stock Exchange performance: A univariate autoregressive
integrated moving average (ARIMA) model. Eur J Econ Financ Adm Sci. 2006;22(6):124–39.
