
Available online at www.sciencedirect.com

ScienceDirect
Procedia Computer Science 252 (2025) 893–904

4th International Conference on Evolutionary Computing and Mobile Sustainable Networks

Stock market time series forecasting using comparative machine learning algorithms
Samraj Gupta^a, Sanchal Nachappa^b, Nirmala Paramanandham^a,*
^a School of Electronics Engineering, Vellore Institute of Technology, Chennai 600127, India
^b Department of Chemical Engineering, Manipal Institute of Technology, Manipal 576104, India

Abstract

Stock market trend prediction and assessment have long been leading research topics because of the market’s extreme
volatility and chaotic nature. The stock market offers an abstract representation of ownership over enterprises and
organizations, more commonly known as “stock.” This kind of assessment is generally accepted as the foundation of market
conduct, and a business that is not provided with as many shares as needed is exposed to financial failure. Foreseeing the
performance of a single business on the stock market is precarious, as stock values are subject to constant change.
Nonetheless, viewing the stock market’s behavior in retrospect and distributing investor expectations regarding future
stock market values is a beneficial approach. Different models have been proposed in the last few years, and a large body
of work compares their efficiency, accuracy, or robustness to identify which model works best for different applications.
This paper presents a detailed comparative study of different machine learning algorithms for stock market time series
prediction. It covers baselines built on widely used algorithms such as Linear Regression and Support Vector Machines
through to state-of-the-art methods including Long Short-Term Memory networks, Convolutional Neural Networks and
Transformer-based architectures.
© 2025 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the 4th International Conference on Evolutionary Computing and
Mobile Sustainable Networks
Keywords: Machine Learning; Stock Market; Time Series Data

1. Introduction

The stock market, being one of the key elements in the global financial system, is quite a complex and chaotic place
with thousands of transactions multiple times a day. The ability to predict stock prices and market trends has been a
focus of investors, financial analysts, and academics for many years since it influences investment decisions as well
as economic analysis. In the past, fundamental and technical analysis has been used; both of which are built upon
___________
* Corresponding author. Tel.: +91 77084 57482
E-mail address: [email protected]

1877-0509 © 2025 The Authors. Published by Elsevier B.V.


10.1016/j.procs.2025.01.050
historical data along with financial indicators. These traditional methods, however, typically struggle to account for
the complex, non-linear nature of stock market behavior, which has pushed researchers toward more sophisticated
approaches. Stock market prediction is one of the most popular applications of machine learning in finance, and it has
become more significant in the last few years. With the rise of artificial intelligence, technology has emerged that
allows computers to learn from data and make decisions or predictions. Compared with traditional methodologies, the
ability of machine learning to process and analyze large amounts of data and reveal predictive patterns represents a
big step forward. This shift has led to an increase in research on applying different machine learning techniques to
stock prediction, market trends and trading signals.
There are many ways to predict the stock market, but the machine learning techniques used for stock prediction fall into
three main categories: Neural Networks (NN), Support Vector Machines (SVM) and Random Forests. Linear regression, one of
the earliest and most straightforward machine learning techniques, establishes a baseline by modeling the relationship
between stock prices and various explanatory variables. While it offers a foundational understanding, its reliance on
linear relationships limits its effectiveness in capturing the complex dynamics of financial markets. To address these
limitations, more advanced techniques such as decision trees and ensemble methods like Random Forests have been
developed [2]. These methods improve upon linear models by accommodating non-linear interactions among features, thereby
enhancing predictive accuracy. Support Vector Machines provide another valuable approach by transforming data into
higher-dimensional spaces to find the optimal separation between different classes. This capability makes SVMs
particularly useful for distinguishing between market conditions or forecasting price changes based on a variety of
input features. Despite their strengths, SVMs can be computationally demanding and require precise tuning to achieve the
best performance [7]. The emergence of neural networks has further advanced the field of stock market prediction.
Feedforward Neural Networks (FNNs) and more complex architectures like Recurrent Neural Networks (RNNs) and Long
Short-Term Memory (LSTM) networks have demonstrated reasonable success in representing temporal dependencies and
non-linear patterns embedded in financial data [1].
These models excel with time-series data that have long-term dependencies and intricate patterns, making them
well-suited for tasks such as stock price prediction. However, their effectiveness often depends on the availability of
large datasets and substantial computational resources. Reinforcement Learning (RL) introduces a novel paradigm for
optimizing trading strategies by interacting with the market environment [12]. The reinforcement learning model learns
through rewards (or punishments) to determine the best course of action at any given time to maximize cumulative
returns. RL shows potential in algorithmic trading for developing intelligent trading strategies, but it also presents
challenges such as high computational complexity and long training times [11]. The difficulty of forecasting the stock
market, primarily due to its volatile and chaotic nature, remains even with improvements in machine learning techniques.
Therefore, short-term changes in market conditions resulting from economic events, geopolitical developments, or
sentiment should factor more heavily into the predictions made by these models. This paper investigates several machine
learning models for stock market prediction under multiple scenarios. The goal is to provide valuable insights into how
machine learning techniques can be used for stock market forecasting by comparing models like linear regression,
decision trees, Random Forests, SVMs, and neural networks (including RNNs), along with deep reinforcement learning [8].
The results aim to enhance understanding and support the development of more reliable prediction methods as part of the
new era in financial analysis.
The proposed study is moderately complex: it combines various advanced machine learning algorithms on time-series data,
which requires a thorough understanding of both financial data and machine learning. Each model requires different
preprocessing, tuning, and validation, so using algorithms like SVM, LSTM, Generalized Autoregressive Conditional
Heteroskedasticity (GARCH), Random Forest, Adaptive Boost (AdaBoost), and eXtreme Gradient Boosting (XGBoost) increases
complexity. Both continuous and directional predictions must also be handled carefully when implementing evaluation
measures like R-squared, Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and accuracy. Using techniques like
temporal cross-validation, ensembling, and hyperparameter tuning to ensure model performance and dependability adds
even more complexity.
2. Literature Survey

2.1. Long Short-Term Memory (LSTM)

Previous work has focused on how well LSTM can handle the heterogeneity of financial time series, that is, the noise,
volatility and non-linearity associated with stock prices. An LSTM model was trained on historical stock price data, and
the accuracy of its predictions was assessed against real-time movements in stock prices. The results provide evidence
that LSTM models are well-suited to predicting stock price movements: they manage sequential dependencies efficiently
while capturing the long-term trends present in the data. The LSTM model performed much better than traditional models,
generating stock forecasts that were more accurate and reliable. However, the authors also noted the essential
unpredictability of stock markets and suggested that, while LSTM can increase prediction accuracy, it cannot completely
eliminate the uncertainty involved in financial forecasting [1].

2.2. Artificial Neural Networks (ANN) and Random Forest

The stock market is modelled using Random Forest, a machine learning ensemble method that constructs multiple decision
trees and outputs the most common prediction of these trees. Recent work in machine learning frames stock market
prediction as not just an academic challenge but a frontier problem, owing to its non-linearity and dynamism, aspects
that modern techniques express better. While a variety of algorithms such as SVM and ANN have already been investigated,
ensemble methods like Random Forest are not so commonly used in electronic system design. Published studies show high
accuracy on long-term predictions with datasets featuring AAPL and MSFT files, as performed in "Reasoning for earnings
predictions based on GraphDB technologies." They demonstrate that significant results can be obtained by applying a
Random Forest classifier to the task at hand, achieving an accuracy ranging between 85% and 95%. The model has been
validated using accuracy, precision, recall, specificity, and ROC curves, and it highlights the benefits of non-linear
models relative to linear approaches when forecasting stock market direction [2].

2.3. Generalized Autoregressive Conditional Heteroskedasticity (GARCH)

An enormous literature has developed around the various GARCH-type models used to predict volatility for corporate
stocks, country stock indices, and commodity prices. Within these markets, various GARCH-type models (e.g., linear vs.
nonlinear and symmetric vs. asymmetric models) are evaluated. Using artificial neural networks for volatility
prediction is a relatively new practice, as GARCH models are the usual approach for predicting such quantities.
Artificial Neural Networks have shown strong results in prediction, classification, and anomaly detection [3].

2.4. eXtreme Gradient Boosting (XGBoost)

eXtreme Gradient Boosting (XGBoost), a more advanced version of Gradient Boosted Decision Trees (GBDT), employs the
same principles but adds second-order derivatives of the loss function for better accuracy, regularization to avoid
overfitting, and block storage for faster processing. With its efficiency, flexibility, and space optimization, XGBoost
has become popular in areas like economics, data mining, and recommendation systems [4]. High-frequency trading data
are used to predict stock prices with the XGBoost algorithm, a popular machine learning model in the Big Data Analytics
era. The high-precision prediction of daily stock prices using XGBoost is, to a certain extent, evidence that economic
management can achieve its goals through advanced algorithms, provided overfitting and underfitting are avoided when
setting the model parameters. This leads to the conclusion that timely economic analysis based on complex algorithms
will become more common because of the deep market insights and strategies it enables [5].
2.5. Adaptive Boost (AdaBoost)

To this end, the AdaBoost algorithm was adopted to improve multi-factor stock selection models. In AdaBoost, multiple
weak learners are trained sequentially and assembled into a stronger classifier. A novel multi-factor stock selection
model designed for AdaBoost is proposed, using Decision Trees as the base classifiers. The binary classification stock
selection model was constructed from a multi-factor analysis, with Decision Trees as the base learners and AdaBoost for
boosting. More general insights regarding the loss sensitivity of financial investors are used in this framework to
enhance stock-picking performance, through a prism from which all regularized AUC boosting-tree models benefit. Future
efforts should aim to improve the accuracy of this model by including dominant trading characteristics (e.g., industry
properties) and detecting more practical financial market features [6]. Predictive analysis using AdaBoost aimed to
enhance predictive accuracy in stock market forecasting by leveraging decision trees as base learners in the AdaBoost
framework. The model’s performance was compared with traditional machine learning models to evaluate its robustness and
reliability in financial market predictions. Results indicated that the AdaBoost ensemble significantly outperformed
standard models in terms of predictive accuracy [9].

2.6. Support Vector Machine (SVM)

SVMs have been widely studied in the machine learning community for many years. Yang et al. (2002) used SVM to account
for the volatility of financial data by varying the prediction margins. The research also tested how using an
asymmetrical margin on the downside could reduce downside risk. Their methodology was shown to produce the most accurate
predictions for daily Hang Seng Index closing prices. The use of the SVM model has shown a significant benefit to both
investors and regulators in emerging markets like the Indian stock market. Subsequent research could extend the model to
include macroeconomic variables, beyond stock prices, that are known to affect them significantly, such as exchange
rates, interest rates, or the Consumer Price Index [7].

3. Proposed Work

The essence of this project is to compare different machine learning techniques for predicting stock market trends.
It mainly aims to identify models that provide accurate stock predictions. Both conventional and state-of-the-art
techniques are evaluated in terms of their strengths and weaknesses. Various machine learning algorithms are selected
for comparison. These include established methods like Ordinary Least Squares and SVM, which are popular tools for
financial forecasting but may not always capture complex patterns.
For a machine learning model to be accurate and successful, stock market time series data must be preprocessed.
Preprocessing for this project entails a number of crucial steps, since the TCS stock data obtained from the nselib
library (see Section 5.1) must be transformed before models like SVM, LSTM, GARCH, Random Forest, AdaBoost, and XGBoost
can work with it.
After the TCS stock data is obtained, a cleaning process is performed, which includes checking past trading volumes and
prices for abnormalities, outliers, and missing values. Missing values can cause machine learning models to perform
poorly, so they must be handled. It is also necessary to maintain time series continuity without adding bias when
filling in missing data with interpolation, forward-fill, or backward-fill techniques. Winsorization or transformation
methods should be considered for extreme outliers, especially if they result from data errors.
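The cleaning steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the quantile rule, and the prices are all made up for the example. Forward-fill is shown because it only uses past observations; backward-fill and interpolation also draw on later values, which matters when the goal is forecasting.

```python
from typing import List, Optional


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value (None) with the last observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if nothing has been observed yet
    return filled


def winsorize(values: List[float], lower: float, upper: float) -> List[float]:
    """Clip values to the [lower, upper] quantile range.

    Quantiles are picked by truncating the rank (an assumption of this
    sketch); library implementations may interpolate instead.
    """
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower * (n - 1))]
    hi = ordered[int(upper * (n - 1))]
    return [min(max(v, lo), hi) for v in values]


# Hypothetical closing prices with gaps and one obvious data error (500.0).
closes = [100.0, None, 102.0, None, None, 500.0, 104.0]
filled = forward_fill(closes)
clipped = winsorize(filled, 0.0, 0.9)
print(filled)
print(clipped)
```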
Feature engineering can be performed to improve the data by extracting useful features such as volatility metrics,
rolling standard deviations, and moving averages. For some models, such as Random Forest or XGBoost, which
benefit from a varied feature set, technical indicators like Bollinger Bands, Moving Average Convergence Divergence
(MACD), and Relative Strength Index (RSI) can be utilized. These characteristics enable models to understand
underlying patterns in the data by capturing trends, momentum, and volatility patterns in the stock price.
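As a rough sketch of the feature engineering described above, the snippet below computes a moving average, a rolling standard deviation, and Bollinger Bands in plain Python. The helper names, the window lengths, and the choice of population standard deviation are assumptions for illustration; the paper does not state which settings were used.

```python
from statistics import mean, pstdev
from typing import Callable, List, Optional, Sequence, Tuple


def rolling(values: Sequence[float], window: int,
            fn: Callable[[Sequence[float]], float]) -> List[Optional[float]]:
    """Apply fn to each trailing window; None until the window is full."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(fn(values[i + 1 - window:i + 1]))
    return out


def bollinger_bands(prices: Sequence[float], window: int = 20,
                    k: float = 2.0) -> List[Optional[Tuple[float, float, float]]]:
    """Return (lower, middle, upper) bands: moving average +/- k rolling std."""
    sma = rolling(prices, window, mean)
    sd = rolling(prices, window, pstdev)
    bands: List[Optional[Tuple[float, float, float]]] = []
    for m, s in zip(sma, sd):
        bands.append(None if m is None else (m - k * s, m, m + k * s))
    return bands


prices = [10.0, 12.0, 14.0, 12.0, 10.0]  # made-up prices
sma3 = rolling(prices, 3, mean)
bands3 = bollinger_bands(prices, window=3, k=2.0)
print(sma3)
print(bands3[2])
```

Every feature at time t uses only prices up to and including t, so the features can be fed to a forecasting model without leaking future information.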
Scaling and normalization can be applied to enhance the performance of many machine learning algorithms, particularly
those like SVM and LSTM that are sensitive to feature magnitudes. To make the features comparable and to prevent any
single feature from dominating the analysis, normalisation to a predetermined range using Min-Max scaling, or
standardisation using Standard scaling, is recommended. For LSTM models specifically, it is advised that the features be
scaled to values between 0 and 1.
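A minimal sketch of Min-Max scaling is shown below. One design choice is assumed here that the text does not spell out: the minimum and maximum are fitted on the training portion only and then reused for later data, so that no information from the test period leaks into training. Function names and prices are hypothetical.

```python
from typing import List, Tuple


def fit_min_max(train: List[float]) -> Tuple[float, float]:
    """Learn the scaling range from the training data only."""
    return min(train), max(train)


def min_max_scale(values: List[float], lo: float, hi: float) -> List[float]:
    """Map values so that lo -> 0 and hi -> 1."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - lo) / span for v in values]


train = [3200.0, 3300.0, 3400.0]  # made-up training prices
test = [3350.0, 3500.0]           # later prices, unseen when fitting
lo, hi = fit_min_max(train)
train_scaled = min_max_scale(train, lo, hi)
test_scaled = min_max_scale(test, lo, hi)
print(train_scaled, test_scaled)
```

Note that later prices can fall outside [0, 1] (3500.0 maps to 1.5 here) when the stock trades beyond the range seen in training.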


Advanced models such as Long Short-Term Memory (LSTM) networks, which are specifically designed to handle sequential
data and temporal trends, are also considered in the study. Convolutional Neural Networks (CNNs), which can extract
features from time series data, are examined as well. Transformer-based models, known for their attention mechanisms,
are also investigated, as they have been shown to outperform other existing models in many tasks. Additionally, the
performance of the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) modelling approach is assessed.
Understanding how forecasting accuracy improves when volatility shifts over time are incorporated is crucial for
evaluating the ability of the GARCH model to predict stock market volatility. The XGBoost model is also used: it
improves prediction accuracy through gradient boosting, forming an ensemble of decision trees that is recognized for
its ability to capture complex, non-linear relationships in stock price data.
The AdaBoost model, which combines multiple weak learners into a strong ensemble and focuses on hard-to-predict
instances, is also incorporated. To undertake a robust analysis, a comprehensive dataset containing historical stock
market prices across various sectors and time periods is utilized. Each model is trained and tested on this dataset
using performance metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and R-squared values.
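For reference, the three metrics named above can be computed as in the sketch below. The definitions are the standard ones; the function names and the sample values are invented for the example.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error: penalises large errors more than MAE."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Share of variance explained, relative to always predicting the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [100.0, 102.0, 101.0, 105.0]  # made-up actual prices
y_pred = [101.0, 101.0, 102.0, 103.0]  # made-up forecasts
print(mae(y_true, y_pred), rmse(y_true, y_pred), r_squared(y_true, y_pred))
```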
Accuracy, precision, and overall effectiveness are among the insights that can be obtained from these measures.
Moreover, this study looks at how robust these models are under market conditions such as periods of high volatility
or sudden market shifts. Comparing the findings offers guidance on which machine learning methods work best for
forecasting stock markets, enabling researchers and practitioners to select appropriate models for financial
prediction.
A comparative analysis of machine learning models can be conducted to evaluate the performance of various approaches,
including linear regression, decision trees, Random Forests, Support Vector Machines (SVMs), neural networks, and
reinforcement learning, in predicting stock market trends and prices. The strengths and limitations of each ML model
should be analyzed by examining their accuracy, computational efficiency, and adaptability to different market
conditions, with the aim of identifying the most effective techniques for various prediction scenarios.

4. Design and Methodology

4.1. Model Outline and Framework

This section outlines the suggested method for predicting the stock market price for the following day. An overview
of the proposed work is illustrated in Figure 1. This analysis uses two types of datasets: the source dataset and the
target dataset. The source dataset, which is substantial and consists of an exchange-traded fund (ETF), is used to pre-
train the model. The target dataset is then employed for new tasks. During the training process, the source data is pre-
processed and fed into the pre-trained model.

4.2. Model Selection

Each model (LSTM, ANN, Random Forest, GARCH, XGBoost, SVM, and AdaBoost) was selected for its distinct strengths in
stock prediction, allowing a comprehensive analysis of TCS stock data from multiple angles. LSTM is well-suited for
predicting stock prices based on historical patterns because it can learn sequential dependencies in time series.
GARCH, on the other hand, focuses on volatility modeling, helping to understand fluctuations in TCS stock prices over
time. Financial datasets are often complex and high-dimensional and frequently exhibit non-linear relationships
between variables; ANN and XGBoost are effective at capturing these relationships. Random Forest and AdaBoost increase
interpretability by providing insights into which features impact stock movement the most, and they are very robust
for feature selection, while SVM works well for binary trend classification, such as predicting directional changes in
stock prices. This multi-strategy analysis combines different models to leverage the strengths of each, resulting in a
holistic predictive analysis of TCS by capturing price trends (LSTM), analyzing volatility (GARCH), and extracting
underlying market behaviors (contextual signals).
Fig. 1. Analysis Architecture - Proposed approach for comparative analysis of the suggested models. Methods are differentiated between
traditional, advanced and ensemble models.

5. Data Source and Extraction

5.1. Data collection and scope

For this research, Tata Consultancy Services (TCS) stock data serves as the primary dataset. The collection process
focuses on acquiring high-quality financial data to accurately capture and analyze TCS’s stock performance over an
extended period. To achieve this, the nselib library is utilised, which is recognized for its structured and reliable
data offerings, marking a methodological improvement over previous studies that employed nsepy.
Data Source: The data was sourced from the nselib library, specifically from the capital_market module, targeting
the price_volume and deliverable_position_data table. These tables provide a rich set of financial indicators for TCS
stock, such as open price, last price, low price, high price, volume traded and traded quantity.
Temporal Scope: The dataset spans roughly five and a half years, from January 1, 2019, to June 30, 2024. This
temporal range provides a comprehensive view of TCS’s market behaviour over a significant duration, capturing both
market expansions and contractions. By covering a full economic cycle, the dataset provides a holistic view of the
stock’s long-term behavior, making it well suited to understanding recurring patterns and anomalies.
5.2. Extracted Variables

The dataset includes key variables that are essential for a thorough analysis of stock behavior. These variables are
as follows:
Table 1. Extracted Variables from Dataset
Variable                  Description
Date                      The date of the trading session
Previous close price      The closing price of the stock on the previous trading day
Open Price                The price at which the stock first traded upon the opening of the market on a given day
High Price                The highest price at which the stock traded during the day
Low Price                 The lowest price at which the stock traded during the day
Last Price                The final price at which the stock was traded during the day
Average Price             The average of the high and low prices for the day
Total traded quantity     The total volume of shares traded during the day
Total number of trades    The total number of transactions that occurred during the trading day

Fig. 2. Sample of extracted dataset

5.3. Comparative Analysis with nsepy

The nsepy library, while previously used in analogous research, was noted for its less structured data format and
encountered issues during import, such as inconsistencies and incomplete records. In contrast, nselib provides a more
organised and reliable dataset, thus enhancing the integrity and robustness of our analysis. This dataset provides high
data fidelity, ensuring minimal discrepancies and a solid foundation for predictive analysis.

5.4. Data Cleaning and Pre-Processing

The dataset used includes structured tables with the following fields: Date, Open, High, Low, Close, Volume, and
OpenInt. Initially, the fields were converted from object to float, and dates were converted from string to datetime.
Subsequently, rows with missing values and empty or incomplete rows were removed. Further cleaning was performed on the
dataset before it was used for prediction. Dependable stock forecasts require model consistency and resistance to
market swings. Thorough data preprocessing is therefore applied to manage missing values and outliers and to minimise
noise in the model. Cross-validation, in particular a time-series split, is used to check that the model performs well
across time periods and generalises beyond the training data. Regularisation techniques, such as dropout for LSTM or L2
for SVM, enhance stability and help avoid overfitting.
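The time-series split mentioned above differs from ordinary k-fold cross-validation in that every test fold lies strictly after its training data. The sketch below shows one expanding-window variant; the function name and the fold-size rule are assumptions of this example, and scikit-learn's `TimeSeriesSplit` provides a library implementation of the same idea with slightly different fold sizes.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(n_samples: int,
                            n_splits: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) with the test fold always later in time."""
    fold = n_samples // (n_splits + 1)
    if fold == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        train_end = fold * k
        # The last fold absorbs any leftover samples at the end of the series.
        test_end = n_samples if k == n_splits else train_end + fold
        yield list(range(train_end)), list(range(train_end, test_end))


splits = list(expanding_window_splits(n_samples=10, n_splits=3))
for train_idx, test_idx in splits:
    print(train_idx, test_idx)
```

Shuffling the rows before splitting would defeat the purpose, since the model would then be trained on observations that come after the ones it is tested on.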
Fig. 3. Closed Price Trend of the cleaned Dataset

5.5. Algorithms used

GARCH (Generalised Autoregressive Conditional Heteroskedasticity): Stock market prediction requires modeling and
forecasting volatility in financial markets, which is what GARCH models are designed for. Volatility is a measure of
the degree of variation in stock prices over time, and accurate volatility forecasts have the potential to greatly
enhance trading strategies and risk management. The GARCH model builds on the ARCH (Auto-Regressive Conditional
Heteroskedasticity) model introduced by Engle (1982), extending it to cater for more general forms of volatility
clustering. In a GARCH model, the conditional variance of a time series is modeled as a function of past squared
returns and past conditional variances. Specifically, in a GARCH(p,q) model, current volatility is assumed to be
influenced by p lags of past squared returns and q lags of past conditional variances. Estimating these parameters
allows the GARCH model to capture shifts between periods of high and low market activity. GARCH is often used together
with other models for predicting stock market fluctuations, helping traders make more informed decisions or adjust
the risk of their portfolios accordingly.
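Written out in the convention of the paragraph above (p lags of squared returns, q lags of conditional variance), the GARCH(p,q) recursion takes the following form. The symbols are introduced here for illustration and do not appear in the paper: \varepsilon_t is the return shock at time t, \sigma_t^2 its conditional variance, and \omega, \alpha_i, \beta_j are the parameters to be estimated. Some references swap the roles of p and q.

```latex
\sigma_t^2 \;=\; \omega \;+\; \sum_{i=1}^{p} \alpha_i \,\varepsilon_{t-i}^2
           \;+\; \sum_{j=1}^{q} \beta_j \,\sigma_{t-j}^2,
\qquad \omega > 0,\quad \alpha_i \ge 0,\quad \beta_j \ge 0 .

% Special case GARCH(1,1):
\sigma_t^2 \;=\; \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2,
\qquad \text{with long-run variance } \frac{\omega}{1-\alpha-\beta}
\text{ when } \alpha + \beta < 1 .
```

A large shock \varepsilon_{t-1} raises \sigma_t^2, and the \beta term carries that elevated variance forward, which is how the model reproduces volatility clustering.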
LSTM (Long Short-Term Memory): LSTMs are a special kind of RNN that is good at capturing dependencies over time and
complex patterns in ordered data like stock prices. Traditional RNNs suffer from the long-term dependency problem,
caused by vanishing or exploding gradients, which LSTM addresses with its unique architecture. In LSTMs, memory cells
retain information for long periods of time while gating mechanisms regulate the flow of information into, out of and
within the cell [10]. How much past information to keep, and how much to update with new information, is controlled by
three gates: the input, forget and output gates. When applied to stock market prediction, these mechanisms allow LSTMs
to identify complex patterns in price history, trading volumes and other features such as technical indicators. The
network uses them to produce predictions for the coming days from sequences of earlier stock prices and trends.
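To make the role of the three gates concrete, here is a sketch of a single LSTM time step with scalar input and scalar state, using the standard LSTM update equations. Real networks use weight matrices and vectors; the scalar form, the parameter layout, and all names are simplifications assumed for this example.

```python
import math
from typing import Dict, Tuple

GateParams = Dict[str, Tuple[float, float, float]]  # gate -> (w_x, w_h, bias)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float,
              params: GateParams) -> Tuple[float, float]:
    """One time step of a scalar LSTM cell; returns (hidden state, cell state)."""
    def pre_activation(gate: str) -> float:
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # how much old memory to keep
    i = sigmoid(pre_activation("input"))        # how much new information to admit
    o = sigmoid(pre_activation("output"))       # how much memory to expose
    g = math.tanh(pre_activation("candidate"))  # proposed new memory content
    c = f * c_prev + i * g                      # updated cell (memory) state
    h = o * math.tanh(c)                        # updated hidden state / output
    return h, c


# With all-zero parameters every gate sits at 0.5 and the candidate is 0,
# so the cell simply halves its previous memory.
zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("forget", "input", "output", "candidate")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
print(h, c)
```

The additive update `c = f * c_prev + i * g` is what lets information persist across many steps: when the forget gate stays close to 1, the memory is carried forward almost unchanged.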
XGBoost (Extreme Gradient Boosting): XGBoost combines the predictions of many decision trees to give more accurate
predictions. As a gradient boosting framework, it builds models iteratively, with each successive model correcting the
errors of the previous one. XGBoost improves on this approach by introducing regularisation techniques, which reduce
overfitting and improve generalization. For stock market prediction, the XGBoost model captures the complex
relationship between stock prices and a set of features, including technical indicators, macroeconomic variables, and
past prices. It iteratively fits weak learners (decision trees) to the residuals of the combined prediction from the
previous trees in order to optimize a loss function. This iterative process of correction allows XGBoost to pick up
non-linear trends and feature interactions, which makes it very effective for stock price prediction and for
identifying key predictors.
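The residual-fitting loop described above can be illustrated with plain gradient boosting on squared error, using one-split trees (stumps) on a single feature. This is deliberately not XGBoost itself: it omits the second-order terms, regularisation, and system optimisations that distinguish XGBoost. All names and data are made up.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]        # (threshold, left value, right value)
Model = Tuple[float, float, List[Stump]]  # (base prediction, learning rate, stumps)


def fit_stump(x: List[float], residuals: List[float]) -> Stump:
    """Find the split on x that best fits the residuals (needs >= 2 distinct x)."""
    best_sse, best = None, None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best_sse is None or sse < best_sse:
            best_sse, best = sse, (threshold, lv, rv)
    return best


def predict_stump(stump: Stump, xi: float) -> float:
    threshold, lv, rv = stump
    return lv if xi <= threshold else rv


def fit_boosting(x: List[float], y: List[float],
                 n_rounds: int = 20, learning_rate: float = 0.5) -> Model:
    base = sum(y) / len(y)  # start from the mean prediction
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]  # what is still unexplained
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for pi, xi in zip(preds, x)]
    return base, learning_rate, stumps


def predict_boosting(model: Model, xi: float) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)


x_demo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_demo = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]  # a step pattern
model = fit_boosting(x_demo, y_demo)
print(predict_boosting(model, 1.0), predict_boosting(model, 6.0))
```

The learning rate shrinks each correction, so the ensemble approaches the target gradually instead of in one step; that shrinkage is one of the simplest guards against overfitting in boosting.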
AdaBoost (Adaptive Boosting): Another ensemble-based learning technique is AdaBoost, which also tends to be
blind to weak learners and improves them when combined into a strong predictive model. The key idea in this
algorithm is adaptive adjustment of weights for miss-classified samples, so that subsequent models are more oriented
toward these difficult cases. AdaBoost uses simple base learners, such as decision stumps or one-level decision trees,
and combines them into the final model. In the case of base learners, training occurs in a sequential manner. The
model focuses on those samples misclassified by the previous learners. AdaBoost can be used in stock market
Samraj Gupta et al. / Procedia Computer Science 252 (2025) 893–904 901

prediction to enhance the accuracy of the forecast by combining the output of a few base learners. Because boosting is
adaptive, learning from its errors and iteratively refining its predictions, it handles noisy or complex patterns in
datasets efficiently.
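The reweighting step can be illustrated with a short sketch of discrete AdaBoost on up/down labels (+1 and -1). The feature values, labels and function names below are hypothetical and chosen only to show how misclassified samples gain weight; this is not the setup used in the study.

```python
import math

def best_stump(x, y, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for threshold in sorted(set(x)):
        for polarity in (1, -1):
            preds = [polarity if xi > threshold else -polarity for xi in x]
            error = sum(w for w, p, yi in zip(weights, preds, y) if p != yi)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(x, y, n_rounds=5):
    n = len(x)
    weights = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        error, threshold, polarity = best_stump(x, y, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        preds = [polarity if xi > threshold else -polarity for xi in x]
        # Misclassified samples (yi * p == -1) are scaled up, correct ones down.
        weights = [w * math.exp(-alpha * yi * p)
                   for w, yi, p in zip(weights, y, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble, weights

def predict(ensemble, xi):
    score = sum(alpha * (polarity if xi > threshold else -polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: one feature and the direction of the next move.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [-1, -1, 1, 1, 1, -1, -1, -1]
ensemble, final_weights = adaboost(x, y)
first_round, weights_after_one = adaboost(x, y, n_rounds=1)
```

After a single round the two samples that the first stump gets wrong carry half of the total weight, which forces the next stump to attend to them.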
Random Forest: Random Forest is an ensemble method that constructs many
decision trees and aggregates their predictions to enhance accuracy and robustness. Each tree in the forest
is trained on a random subset of the data, using a random subset of features for splitting at each node. This
randomness reduces overfitting and helps the model generalize to unseen data. In stock market
prediction, Random Forest can handle complex feature interactions and provide robust forecasts from an ensemble of individual tree
predictions. It can handle high-dimensional data with little overfitting, which makes it well suited to stock
price prediction, where many factors play a role in moving the markets.
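The two sources of randomness described above, bootstrap sampling of rows and random selection of candidate features, can be sketched as follows. This is a simplified illustration that uses one-split trees and hypothetical data; a real Random Forest grows deeper trees and redraws the feature subset at every node.

```python
import random

def fit_stump(rows, targets, feature):
    """Best single split on one feature; returns (sse, feature, threshold, left, right)."""
    mean_all = sum(targets) / len(targets)
    sse_all = sum((t - mean_all) ** 2 for t in targets)
    best = (sse_all, feature, None, mean_all, mean_all)   # fallback: no split
    for threshold in sorted(set(row[feature] for row in rows))[:-1]:
        left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if sse < best[0]:
            best = (sse, feature, threshold, left_mean, right_mean)
    return best

def predict_stump(stump, row):
    _, feature, threshold, left_value, right_value = stump
    if threshold is None or row[feature] <= threshold:
        return left_value
    return right_value

def fit_forest(rows, targets, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n_rows, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(n_rows) for _ in range(n_rows)]   # bootstrap sample
        sample_rows = [rows[i] for i in picks]
        sample_targets = [targets[i] for i in picks]
        features = rng.sample(range(n_features), max_features)   # random feature subset
        candidates = [fit_stump(sample_rows, sample_targets, f) for f in features]
        forest.append(min(candidates, key=lambda s: s[0]))
    return forest

def predict_forest(forest, row):
    """Aggregate by averaging the predictions of all trees."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)

# Hypothetical toy data: two features per day and the price to predict.
rows = [[1.0, 30.0], [2.0, 45.0], [3.0, 50.0], [4.0, 70.0], [5.0, 65.0], [6.0, 80.0]]
targets = [100.0, 102.0, 101.0, 106.0, 105.0, 110.0]
forest = fit_forest(rows, targets)
predictions = [predict_forest(forest, row) for row in rows]
```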
Support Vector Machines (SVM): In stock market prediction, Support Vector Regression predicts stock
prices by mapping input features into a higher-dimensional space in which a linear regression can be applied.
SVMs try to maximize the margin between different classes, or to bound the error in regression, to obtain better
generalization. For stock prices, this translates into searching for a hyperplane that best fits the historical data
while allowing for flexibility, controlled by a parameter epsilon, to accommodate deviations. Various
kernel functions, such as linear, polynomial, or radial basis functions, can be used to capture different types of
relationship among features. Choosing a proper kernel function and setting the hyperparameters is therefore
very important for good performance, and these choices are often tuned using grid search or
cross-validation.
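Two of the ingredients named above, the radial basis function kernel and the epsilon-controlled tolerance, can be written out directly. The sketch below is illustrative only: the `gamma` and `epsilon` values and the sample numbers are hypothetical, and it shows the loss and kernel rather than a full SVR solver.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """Similarity of two feature vectors; equals 1.0 for identical inputs."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)

def epsilon_insensitive_loss(actual, predicted, epsilon=1.0):
    """Errors smaller than epsilon cost nothing; larger errors cost the excess."""
    return max(0.0, abs(actual - predicted) - epsilon)

# Hypothetical values: a prediction inside and one outside the epsilon tube.
inside = epsilon_insensitive_loss(100.0, 100.5, epsilon=1.0)
outside = epsilon_insensitive_loss(100.0, 103.0, epsilon=1.0)
same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0])
far_points = rbf_kernel([1.0, 2.0], [4.0, 6.0])
```

A larger epsilon widens the band of deviations that are ignored, which is why it is usually tuned together with the kernel parameters.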

6. Results and Discussions

In the comparison of models for stock market prediction, a substantial dataset covering 2019–2024 is used. Specifically,
70% of the data is allocated for training, and the remaining 30% is reserved for validation. Accuracy is a key metric
for evaluating stock market forecasts. To measure deviations between actual and predicted values, the models are
evaluated using Root Mean Squared Error (RMSE), R Squared, Mean Absolute Error (MAE) and Accuracy. Accuracy
here is calculated as:

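Accuracy = (TP + TN) / (TP + TN + FP + FN) (1)

where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives.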

R-squared (R2): R-squared is the proportion of the variance in the dependent variable that the model can
account for, and is used to assess the model's fit. Values nearer 1 indicate better performance, that is, a model
that explains more of the variability in stock prices.
Root Mean Squared Error (RMSE): By calculating the standard deviation of residuals, RMSE gives information
about how big prediction errors are. The predictions are more in line with the actual stock values when the RMSE is
lower. In stock forecasting, RMSE is helpful for penalising greater errors, which is frequently crucial.
Mean Absolute Error (MAE): MAE is the average absolute difference between actual and predicted values.
It is useful for understanding general accuracy without imposing a significant penalty for big deviations because it is
less sensitive to outliers than RMSE.
Accuracy: Accuracy for classification-based predictions is defined as the proportion of accurate predictions.
Accuracy in stock prediction, however, is more helpful in models that forecast directional patterns (such as up or
down) than precise values.
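The four measures can be computed as in the following sketch. The price values are hypothetical, and the directional accuracy shown here is one possible way of turning price forecasts into up/down classifications (comparing each predicted price with the previous actual price); the text does not state how the classification labels were derived, so that part is an assumption.

```python
import math

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def directional_accuracy(actual, predicted):
    """Share of steps where the predicted move from the previous actual price
    has the same sign as the actual move (assumed definition)."""
    hits = 0
    for t in range(1, len(actual)):
        actual_up = actual[t] > actual[t - 1]
        predicted_up = predicted[t] > actual[t - 1]
        hits += actual_up == predicted_up
    return hits / (len(actual) - 1)

# Hypothetical closing prices and forecasts.
actual = [100.0, 102.0, 101.0, 105.0, 107.0]
predicted = [99.0, 101.0, 103.0, 104.0, 108.0]

scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "R2": r_squared(actual, predicted),
    "Accuracy": directional_accuracy(actual, predicted),
}
```

For any set of residuals MAE is at most RMSE, which gives a quick sanity check on reported scores.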

Fig. 4. (a) XGBoost (Actual vs Predicted); (b) AdaBoost (Actual vs Predicted)

Fig. 5. (a) LSTM (Actual vs Predicted); (b) SVM (Actual vs Predicted)

Fig. 6. (a) Random Forest (Test Predicted); (b) GARCH (Volatility Prediction)

Fig. 4 compares XGBoost and AdaBoost. The AdaBoost predictions are largely saturated at the maxima and minima
over the period, as the model is unable to adapt when the price moves from a high level to a lower one. XGBoost
overcomes this problem, although its accuracy degrades when the predicted change is large. As observed in Fig. 5,
the LSTM model performs significantly better than SVM owing to its ability to remember and forget data more
efficiently and to learn from the entire history of the data before predicting the next price. Fig. 6 depicts the
Random Forest test predictions and the GARCH volatility predictions; compared with the actual prices, GARCH performs
better than the former, as GARCH captures the conditional heteroscedasticity of time series data. The scores of all
models are summarised in Table 2 for measures such as RMSE and R squared. These measures can be used to determine
which model is best suited to the time series data at hand. Fig. 7 shows the accuracy of the models.
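Conditional heteroscedasticity means that the variance of today's return depends on recent shocks and on recent variance. A GARCH(1,1) recursion is the most common way to express this and is sketched below; the model order and the parameter values `omega`, `alpha` and `beta` are assumptions for illustration, as are the returns, and in practice the parameters are estimated from data.

```python
def garch_variance(returns, omega, alpha, beta):
    """Conditional variance path for GARCH(1,1):
    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1].
    Requires alpha + beta < 1 so that the long-run variance exists."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1")
    sigma2 = [omega / (1 - alpha - beta)]        # start at the long-run variance
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r ** 2 + beta * sigma2[-1])
    return sigma2

# Hypothetical daily returns (in percent) with a shock on the third day.
returns = [0.1, -0.2, 3.0, -2.5, 0.3, 0.1]
sigma2 = garch_variance(returns, omega=0.1, alpha=0.2, beta=0.7)
```

The estimated variance jumps on the day after the large return and decays only gradually, which is the volatility clustering that GARCH is designed to model.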

Table 2. Model Comparison Summary


Model           R Squared    RMSE      MAE
XGBoost         0.89211      101.7     74.5
AdaBoost        0.79154      63.23     101.95
GARCH           0.88135      23.28     23.87
LSTM            0.89274      10.15     17.54
Random Forest   0.71893      203.12    308.12
SVM             0.75134      150.41    116.13

Fig. 7. Model Comparison on the basis of Accuracy

7. Conclusion

Based on the scores accumulated for all models, it can be concluded that both LSTM (Long Short-Term Memory)
and GARCH (Generalized Autoregressive Conditional Heteroskedasticity) are highly suitable for time series
forecasting on volatile datasets. Since the dataset is large, LSTM is able to capture
long-term dependencies and patterns in the time series. It is well suited to sequential data and capable of learning
complex patterns over time. GARCH, likewise, was developed specifically for time series data with changing volatility
and is therefore good at modeling the volatility clustering of financial data. As observed in Fig. 6, the GARCH
prediction is made on volatility in order to identify deviations from the actual value more precisely. In terms of
accuracy, LSTM was 20.83% more accurate than traditional models such as Random Forest. Regression
models such as SVM were 12.24% less accurate than GARCH and 14.36% less accurate than LSTM.
In conclusion, among the models compared, LSTM shows the highest accuracy,
followed by GARCH, which is less accurate by 2.42%. The reviewed models retained the trend of the data more explicitly
over the prediction horizon than other methods.

8. Limitations and future work

The proposed design has practical limitations, such as overfitting, a non-temporal focus and performance, that suggest
areas for future improvement. Integrating additional technical indicators, such as the Relative Strength Index and
Moving Average Convergence Divergence, could capture broader market trends and help to reduce overfitting. Testing
different structures for the deep learning model, such as repositioning dropout layers or
changing the number of layers and neurons in the Long Short-Term Memory models, could improve performance.
Hyperparameters such as the number of estimators need to be tuned using a grid or randomised search for ensemble
algorithms like Random Forest and Extreme Gradient Boosting, which could lead to further accuracy improvements.
Ensemble techniques such as stacking several models can capture more variability in
the data at the cost of additional complexity. Temporal cross-validation is another useful tool for confirming
robustness over different time periods, although it is computationally expensive. Moreover, prediction accuracy can be
further enhanced by considering covariates such as macroeconomic indicators, which will, however, increase model
complexity. These are some possibilities for future work that could contribute to improving model adaptivity and accuracy.
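Temporal cross-validation differs from ordinary k-fold validation in that every test window lies strictly after its training window. A minimal sketch of an expanding-window (walk-forward) scheme is given below; the function name, fold count and minimum training size are hypothetical choices for illustration.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) pairs in which training data
    always precedes test data and the training window grows each fold."""
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        test_end = train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))

# Hypothetical example: 10 observations, 3 folds, at least 4 training points.
splits = list(walk_forward_splits(n_samples=10, n_folds=3, min_train=4))
```

Each model has to be refitted once per fold, which is the source of the extra computational cost noted above.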

References

[1] Dinesh, S., Rahul, S., Sandeep, O., Relangi, K. and Rama Raju, A.M.S., 2021. Stock price prediction using LSTM. Mukt Shabd Journal, 10(6),
pp.436-442.
[2] Khaidem, L., Saha, S. and Dey, S.R., 2016. Predicting the direction of stock market prices using random forest. arXiv preprint arXiv:1605.00003.
[3] Nybo, C., 2021. Sector Volatility Prediction Performance Using GARCH Models and Artificial Neural Networks. arXiv preprint
arXiv:2110.09489.
[4] Chen, T. and Guestrin, C., 2016, August. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (pp. 785-794).
[5] Zhang, Y., 2022, December. Stock Price Prediction Method Based on XGboost Algorithm. In 2022 International Conference on Bigdata
Blockchain and Economy Management (ICBBEM 2022) (pp. 595-603). Atlantis Press.
[6] Chen, Y., Li, X. and Sun, W., 2020, October. Research on Stock Selection Strategy Based on AdaBoost Algorithm. In Proceedings of the 4th
International Conference on Computer Science and Application Engineering (pp. 1-5).
[7] Tripathy, N., 2019. Stock price prediction using support vector machine approach. In International Academic Conference on Management &
Economics (pp. 44-59).
[8] Kumar, D., Sarangi, P.K. and Verma, R., 2022. A systematic review of stock market prediction using machine learning and statistical techniques.
Materials Today: Proceedings, 49, pp.3187-3191.
[9] Ampomah, E.K., Qin, Z., Nyame, G. and Botchey, F.E., 2021. Stock market decision support modeling with tree-based AdaBoost ensemble
machine learning models. Informatica, 44(4).
[10] Moghar, A. and Hamiche, M., 2020. Stock market prediction using LSTM recurrent neural network. Procedia Computer Science, 170,
pp.1168-1173.
[11] Franses, P.H. and Van Dijk, D., 1996. Forecasting stock market volatility using (non‐linear) Garch models. Journal of forecasting, 15(3),
pp.229-235.
[12] Lin, Z., 2018. Modelling and forecasting the stock market volatility of SSE Composite Index using GARCH models. Future Generation
Computer Systems, 79, pp.960-972.
