Volatility Forecasting Report

The document presents a project on forecasting realized volatility in the derivatives market using Long Short-Term Memory (LSTM) models, focusing on assets like BTC-USD. It covers the importance of volatility forecasting for risk management and trading strategies, the theoretical background of LSTM networks, and the methodology for data collection and preprocessing. The project aims to enhance financial decision-making by accurately predicting market volatility through advanced machine learning techniques.

Uploaded by

Hoàng Tùng Lê

HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY

INTRODUCTION TO AI

Topic: REALIZED VOLATILITY FORECASTING IN THE DERIVATIVES MARKET WITH LSTM MODEL

GROUP 7
Instructor: Assoc. Prof. Le Thanh Huong

Program: Data Science and Artificial Intelligence

Members: Nguyen Quang Huy 20235505

Do Dang Vu 20235578

Nguyen The Quan 20235548

Nguyen Xuan Khai 20235508

Le Hoang Tung 20235572


TABLE OF CONTENTS

CHAPTER 1. OVERVIEW

1.1 Motivation

1.2 Derivatives and Volatility Forecasting

1.2.1 Introduction to Long and Short Positions

1.2.2 What are Realized Volatility (RV) and Implied Volatility (IV)?

1.2.3 How Short/Long Positions Are Affected by Volatility in the Options and Futures Markets

1.3 Time Series Analysis

1.3.1 Definition

1.3.2 Examples

1.4 Background theory (RNN, LSTM, bi-LSTM)

1.4.1 Recurrent Neural Network

1.4.2 Long short-term Memory

1.4.3 Bidirectional Long short-term Memory

CHAPTER 2. DATA COLLECTION AND PREPROCESSING

2.1 Data Collection

2.2 Data preprocessing

2.2.1 Feature engineering

2.2.2 Create sequential data for LSTM model

2.2.3 Train - Validation - Test Splits

CHAPTER 3. VOLATILITY FORECASTING USING LSTM MODELS

3.1 Univariate 1-Layered LSTM Model

3.2 Univariate 2-Layered Bidirectional LSTM Model

3.3 Multivariate 2-Layered Bidirectional LSTM Model

CHAPTER 4. RESULTS AND EVALUATION

4.1 Baseline Models

4.1.1 Mean Baseline

4.1.2 Random Walk Naive Forecasting

4.2 Comparison

4.3 Evaluation on test data

CHAPTER 5. CONCLUSIONS

5.1 Summary

5.2 Suggestion for Future Works
CHAPTER 1. OVERVIEW

1.1 Motivation
The financial market, with assets exhibiting high volatility such as oil, gold, and
emerging decentralized financial instruments like Bitcoin, presents both challenges
and opportunities. The volatility of these assets plays a crucial role, especially in
derivative markets, where it directly influences the pricing of options, futures, and
other contracts. Therefore, accurate volatility forecasting is essential for informed
decision-making.

Volatility forecasting is integral to the derivatives market, aiding in risk management, portfolio optimization, and the refinement of trading strategies. Traditional approaches, such as GARCH models or treating implied volatility as a forecast, can struggle to capture the complex, non-linear relationships present in financial data. This gap in predictive capability motivates more flexible approaches.

With the rise of deep learning, Long Short-Term Memory (LSTM) networks
have proven effective in time series analysis. By leveraging LSTM models, we
can explore the interactions among various market factors, aiming to create more
precise and reliable volatility forecasts. The goal of this project is to use LSTM to
predict the next seven days’ average realized volatility of BTC-USD. By analyzing
market behavior patterns, we intend to provide insights that can enhance decision-
making in the derivatives market. Through this effort, we seek not only to improve
financial forecasting but also to showcase the practical impact of machine learning
in quantitative finance.
1.2 Derivatives and Volatility Forecasting
Derivatives are advanced financial instruments whose value is based on an underlying
asset, such as stocks, commodities, currencies, or indices. These instruments—options,
futures, and swaps—are vital tools in modern financial markets, serving multiple
purposes: hedging against potential losses, leveraging speculative opportunities,
and managing complex portfolios. However, the profitability of trading derivatives
is influenced not only by market trends but also by the accurate assessment and
forecasting of market volatility, which is crucial for pricing and strategy development.

In this project, we aim to forecast the volatility of BTC-USD and AAPL using LSTM. To do so, it is essential to understand the foundational concepts behind volatility projections and how an accurate forecast can be used to outperform the market:
1.2.1 Introduction to Long and Short Positions
Long Position: We buy an asset with the expectation that its price will increase
in the future.
• Definition: Buying an asset to profit from a price increase.
• Example: Buy at $50, sell at $70 in the future; profit = $20.
• Risk: Limited to the amount invested (if the asset’s price drops to $0).
Short Position: We sell an asset with the expectation that its price will decrease
in the future.
• Definition: Selling a borrowed asset to profit from a price decrease.
• Example: Sell at $100, buy back at $70 in the future; profit = $30.
• Risk: Potential losses are unlimited (if the asset’s price rises).
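The two payoff profiles reduce to simple arithmetic. The sketch below reuses the prices from the examples above; position_pnl is a hypothetical helper, and it deliberately ignores fees, borrowing costs and margin requirements.

```python
def position_pnl(side: str, entry_price: float, exit_price: float, quantity: float = 1.0) -> float:
    """Profit/loss of a plain long or short position (no fees, borrow costs or margin)."""
    if side == "long":
        # a long gains when the price rises: buy at entry, sell at exit
        return (exit_price - entry_price) * quantity
    if side == "short":
        # a short gains when the price falls: sell the borrowed asset at entry, buy back at exit
        return (entry_price - exit_price) * quantity
    raise ValueError(f"unknown side: {side!r}")


print(position_pnl("long", 50, 70))     # 20.0
print(position_pnl("short", 100, 70))   # 30.0
print(position_pnl("long", 50, 0))      # -50.0: a long can lose at most what was invested
print(position_pnl("short", 100, 250))  # -150.0: a short keeps losing as the price keeps rising
```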
1.2.2 What are Realized Volatility (RV) and Implied Volatility (IV)?
Volatility measures the magnitude of price movements that a financial instrument
experiences over a given period. More dramatic price swings indicate higher
volatility, and smaller fluctuations suggest lower volatility.
Realized Volatility (RV): The actual historical volatility of an asset over a specific period, based on observed price movements.
• Definition: A statistic computed from price changes that have already occurred, typically the standard deviation of log returns over a window (see Section 2.2.1).
Implied Volatility (IV): The market’s expectation of future volatility, often
derived from options pricing models like Black-Scholes, though it may not always
be accurate.
• Definition: The projected future volatility, inferred from market data such as
option prices.
• Impact: Implied volatility heavily influences option premiums.
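Implied volatility is the value of σ that makes a pricing model reproduce the observed option premium. As an illustration only (it is not part of the project's pipeline), the sketch below assumes a European call priced with Black-Scholes, no dividends, and a made-up contract (spot 100, strike 100, 30 days to expiry, zero interest rate). The names bs_call_price and implied_vol are hypothetical, and bisection is just one simple way to invert the formula, relying on the call price increasing monotonically with σ.

```python
import math


def norm_cdf(x: float) -> float:
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_price(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """Black-Scholes price of a European call without dividends."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def implied_vol(price: float, S: float, K: float, T: float, r: float,
                lo: float = 1e-4, hi: float = 5.0, tol: float = 1e-8) -> float:
    """Invert Black-Scholes by bisection on sigma."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call_price(S, K, T, r, mid) > price:
            hi = mid  # model premium too high, so the volatility guess is too high
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


T = 30 / 365
premium_low = bs_call_price(S=100, K=100, T=T, r=0.0, sigma=0.15)
premium_high = bs_call_price(S=100, K=100, T=T, r=0.0, sigma=0.25)
print(premium_low < premium_high)  # True: higher implied volatility, higher premium
print(round(implied_vol(premium_high, S=100, K=100, T=T, r=0.0), 4))  # 0.25
```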
1.2.3 How Short/Long Positions Are Affected by Volatility in the Options
and Futures Markets
As discussed, IV reflects the market’s view of the future and determines the
option premium, significantly influencing option prices. On the other hand, RV
reflects what will actually happen, and we aim to predict it accurately.
• High IV: Option prices are higher for derivative contracts, reflecting higher anticipated volatility.
• Low IV: Option prices are lower, as lower volatility suggests less risk.
Our goal is to predict volatility more accurately than the market’s IV, thereby
outperforming the market. There are several potential scenarios where this approach
could be profitable:
Case 1: RV > IV (Market Underpricing Volatility)
When the market underestimates future volatility, options are priced too cheaply relative to the price movements that will actually occur.
• Scenario: Our model forecasts RV at 25%, while IV is only 15%.
• Strategy: Take a long position in options (a long volatility position), since the options are cheap relative to the volatility we expect to be realized, and exit once the expected profit has been captured. Avoid short option positions, as option prices are expected to rise once the higher volatility materializes, leading to losses.
Case 2: RV < IV (Market Overpricing Volatility)
When the market overestimates future volatility, options are overpriced and are expected to lose value over time.
• Scenario: IV is 25%, but our model predicts RV at only 15%.
• Strategy: Take a short position in options to capitalize on their overpricing. Time the buyback carefully, ideally once IV has fallen closer to RV.
According to Sinclair (2020), several trading strategies revolve around identifying volatility mismatches. For a position taken on the correct side of the mismatch, the profit is approximately:

$$\text{P/L} \approx \text{Vega} \times \left| \sigma_{\text{implied}} - \sigma_{\text{realized}} \right|$$

Where:
• Vega: The sensitivity of an option's price to changes in the volatility of the underlying asset.
• $\sigma_{\text{implied}}$ and $\sigma_{\text{realized}}$: Implied and realized volatilities (IV and RV), respectively.
If traders can predict future volatility (RV) more accurately than the market (IV), they may be able to build a strategy with a positive expected profit.
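The two cases and the approximation above can be combined into a toy decision rule. This is a sketch under simplifying assumptions (constant vega, volatilities quoted in percentage points, vega quoted per point, no hedging or transaction costs); vol_trade is a hypothetical helper, not part of the project code.

```python
def vol_trade(forecast_rv: float, market_iv: float, vega: float) -> tuple[str, float]:
    """Choose an option position from a volatility forecast and estimate its P/L.

    Uses P/L ~ Vega * |sigma_implied - sigma_realized|, which only holds when the
    position is on the correct side of the mismatch (i.e. the forecast is right).
    """
    if forecast_rv > market_iv:
        side = "long options"   # Case 1: market underprices volatility
    elif forecast_rv < market_iv:
        side = "short options"  # Case 2: market overprices volatility
    else:
        side = "no trade"
    expected_pnl = vega * abs(market_iv - forecast_rv)
    return side, expected_pnl


print(vol_trade(forecast_rv=25, market_iv=15, vega=0.25))  # ('long options', 2.5)
print(vol_trade(forecast_rv=15, market_iv=25, vega=0.25))  # ('short options', 2.5)
```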


1.3 Time Series Analysis


1.3.1 Definition

Time series analysis refers to the method of analyzing a sequence of data points
collected over a period of time. The data points are recorded at consistent intervals,
ensuring that the data collection is systematic rather than random. However, time
series analysis is more than just collecting data; it focuses on understanding how
variables evolve over time.
What distinguishes time series data from other types of data is the critical role of
time. Time serves as a key variable that reflects how data changes over time and
reveals underlying trends. This sequence of data points provides valuable insights,
showing dependencies and patterns that emerge in relation to time.
Typically, time series analysis requires a large set of data points to improve reliability and reduce the effects of noise. A large data set helps ensure that identified trends or patterns are not merely artifacts of outliers, and that seasonal fluctuations are accounted for. Additionally, time series analysis is often employed in forecasting, where historical data is used to predict future outcomes.

1.3.2 Examples
Time series analysis is especially relevant for non-stationary data, i.e. data whose statistical properties (such as mean or variance) change over time. Various industries, such as finance, retail, and economics, rely heavily on time series analysis due to the dynamic nature of data such as currency fluctuations and sales figures. Below are some key examples of time series analysis in practice:
• Weather data (e.g., rainfall, temperature)
• Health monitoring (e.g., heart rate monitoring via EKG, brain activity via
EEG)
• Economic indicators (e.g., quarterly sales, stock prices, interest rates)
• Automated stock trading algorithms
• Industry forecasts and trend analysis
• Predicting long-term climate change
Each of these examples involves a sequence of data points collected over time,
allowing for detailed analysis and accurate predictions. Time series analysis aids
in understanding past behaviors and forecasting future events, making it essential
in many modern fields.
1.4 Background theory (RNN, LSTM, bi-LSTM)
1.4.1 Recurrent Neural Network

Recurrent Neural Networks (RNNs) are a class of neural networks designed to model sequential data, where the current output depends not only on the current input but also on previous inputs. This is achieved by introducing a hidden state that is updated at each time step based on the previous hidden state and the current input.
An RNN processes sequential data in a recursive manner. At each time step $t$, the hidden state $h_t$ is computed as:

$$h_t = f(W x_t + U h_{t-1} + b)$$

Where:
• $h_t \in \mathbb{R}^n$ is the hidden state at time $t$,
• $x_t \in \mathbb{R}^m$ is the input at time $t$,
• $W \in \mathbb{R}^{n \times m}$ is the weight matrix for the input-to-hidden connection,
• $U \in \mathbb{R}^{n \times n}$ is the weight matrix for the hidden-to-hidden connection,
• $h_{t-1} \in \mathbb{R}^n$ is the previous hidden state,
• $b \in \mathbb{R}^n$ is the bias vector, and
• $f$ is a non-linear activation function (typically tanh or ReLU).
The output $y_t$ at time step $t$ is given by:

$$y_t = g(V h_t + c)$$

Where:
• $y_t \in \mathbb{R}^p$ is the output at time $t$,
• $V \in \mathbb{R}^{p \times n}$ is the weight matrix for the hidden-to-output connection,
• $c \in \mathbb{R}^p$ is the bias vector for the output layer,
• $g$ is the activation function applied to the output (often softmax for classification tasks).
The network updates its weights based on the error calculated at each output
using backpropagation through time (BPTT), but RNNs are prone to the vanishing
gradient problem, where gradients become very small as they are propagated backward
through time, limiting the ability to learn long-term dependencies.
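The recurrence above can be written directly in NumPy. This is a minimal sketch of the forward pass only, assuming f = tanh and g = identity (a linear output, as would suit a regression target such as volatility); the function name rnn_forward, the sizes and the random weights are illustrative choices, not taken from the project.

```python
import numpy as np


def rnn_forward(X, W, U, b, V, c):
    """Run a vanilla RNN over a sequence X of shape (T, m).

    Returns hidden states H of shape (T, n) and outputs Y of shape (T, p).
    """
    n = U.shape[0]
    h = np.zeros(n)                        # h_0 = 0
    H, Y = [], []
    for x_t in X:
        h = np.tanh(W @ x_t + U @ h + b)   # h_t = f(W x_t + U h_{t-1} + b)
        H.append(h)
        Y.append(V @ h + c)                # y_t = g(V h_t + c) with g = identity
    return np.array(H), np.array(Y)


rng = np.random.default_rng(0)
m, n, p, T = 1, 4, 1, 30                   # 1 input feature, 4 hidden units, 30 time steps
W = rng.normal(scale=0.5, size=(n, m))
U = rng.normal(scale=0.5, size=(n, n))
b = np.zeros(n)
V = rng.normal(scale=0.5, size=(p, n))
c = np.zeros(p)
X = rng.normal(size=(T, m))

H, Y = rnn_forward(X, W, U, b, V, c)
print(H.shape, Y.shape)  # (30, 4) (30, 1)
```

Because the same U is applied at every step, gradients flowing back through many steps are repeatedly multiplied by it (and by tanh derivatives), which is the source of the vanishing gradient problem described above.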


1.4.2 Long short-term Memory

Long Short-Term Memory (LSTM) networks are a type of RNN designed to mitigate the vanishing gradient problem. LSTMs are particularly effective at learning long-range dependencies in sequential data. The key innovation in LSTMs is the use of a cell state, along with multiple gates (forget, input, and output gates), which control the flow of information into, out of, and within the cell state.
An LSTM unit consists of the following components:
• Cell State ($C_t$): The cell state carries information throughout the sequence. It is updated at each time step using the forget and input gates, allowing the network to retain long-term memory.
• Forget Gate ($f_t$): The forget gate determines which portion of the previous cell state should be carried forward to the next time step. It is computed as:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$

where $\sigma$ is the sigmoid activation function, which outputs values between 0 and 1.
• Input Gate ($i_t$): The input gate controls how much new information should be added to the cell state. It is computed as:

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$

• Candidate Cell State ($\tilde{C}_t$): The candidate cell state is a new proposed value that could be added to the cell state. It is computed as:

$$\tilde{C}_t = \tanh(W_C x_t + U_C h_{t-1} + b_C)$$

• Output Gate ($o_t$): The output gate determines how much of the current cell state is exposed as the next hidden state. It is computed as:

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$

The cell state $C_t$ is updated using the following formula, where $\odot$ denotes element-wise multiplication:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

Where:
• $f_t$ is the forget gate,
• $C_{t-1}$ is the previous cell state,
• $i_t$ is the input gate,
• $\tilde{C}_t$ is the candidate cell state.
The hidden state $h_t$ is updated as:

$$h_t = o_t \odot \tanh(C_t)$$

The use of these gates allows LSTMs to selectively retain or discard information. Because the cell state is updated additively rather than through repeated multiplication by the same recurrent matrix, gradients can flow across many time steps, which mitigates the vanishing gradient problem and enables the network to learn long-term dependencies in sequential data. LSTMs are widely used in tasks such as language modeling, machine translation, speech recognition, and time series forecasting.
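The gate equations translate almost line for line into code. The sketch below implements one LSTM time step in NumPy and loops it over a 30-step sequence; storing each gate's W, U and b in dictionaries is our own presentation choice, and the weights are random for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM step following the gate equations above (products are element-wise)."""
    W, U, b = params["W"], params["U"], params["b"]
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # input gate
    C_tilde = np.tanh(W["C"] @ x_t + U["C"] @ h_prev + b["C"])  # candidate cell state
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # output gate
    C_t = f_t * C_prev + i_t * C_tilde                         # cell state update
    h_t = o_t * np.tanh(C_t)                                   # new hidden state
    return h_t, C_t


def init_params(m, n, rng):
    gates = ["f", "i", "C", "o"]
    return {
        "W": {g: rng.normal(scale=0.3, size=(n, m)) for g in gates},
        "U": {g: rng.normal(scale=0.3, size=(n, n)) for g in gates},
        "b": {g: np.zeros(n) for g in gates},
    }


rng = np.random.default_rng(42)
m, n, T = 1, 20, 30                  # 1 feature, 20 units (as in LSTM(20)), 30-day window
params = init_params(m, n, rng)
X = rng.normal(size=(T, m))

h, C = np.zeros(n), np.zeros(n)
for t in range(T):
    h, C = lstm_step(X[t], h, C, params)
print(h.shape, C.shape)  # (20,) (20,)
```

Setting a large positive forget-gate bias and a large negative input-gate bias makes $C_t \approx C_{t-1}$, which is the "memory" behaviour the additive update allows.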


1.4.3 Bidirectional Long short-term Memory

Bidirectional Long Short-Term Memory (BiLSTM) networks extend LSTMs by processing the input sequence in both forward and backward directions. This bidirectional processing allows the model to capture contextual information from both earlier and later elements of the sequence, which is useful in tasks like sequence labeling, sentiment analysis, and machine translation, where context from both directions is critical.
A BiLSTM consists of two LSTM layers:
• Forward LSTM: Processes the input sequence from left to right (i.e., from the first to the last element).
• Backward LSTM: Processes the input sequence from right to left (i.e., from the last to the first element).
Let the forward hidden state at time step $t$ be $h_t^{\text{forward}}$ and the backward hidden state at time step $t$ be $h_t^{\text{backward}}$. Formally, the forward LSTM at time step $t$ computes:

$$h_t^{\text{forward}} = \text{LSTM}_{\text{forward}}(x_t, h_{t-1}^{\text{forward}})$$

The backward LSTM at time step $t$ computes:

$$h_t^{\text{backward}} = \text{LSTM}_{\text{backward}}(x_t, h_{t+1}^{\text{backward}})$$

The final output of the BiLSTM at each time step is the concatenation of these two hidden states:

$$h_t = [h_t^{\text{forward}}, h_t^{\text{backward}}]$$

This bidirectional approach allows the BiLSTM to take into account both past and future context when making predictions, which is particularly useful in tasks such as machine translation, where both the preceding and following words are crucial for understanding the meaning of the current word. In our volatility models, the "future" context only refers to later days inside the 30-day input window; all of these days precede the forecast date, so the bidirectional layers do not see any information from the forecast horizon.

Advantages of BiLSTM:
• It captures contextual dependencies from both the past and the future, making
it more effective for tasks where full sequence context is important.
• BiLSTM networks have been shown to outperform unidirectional LSTM networks
in sequence labeling tasks (e.g., named entity recognition, part-of-speech tagging).
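The wiring of a bidirectional layer (run forward, run over the reversed sequence, flip the backward states back into time order, concatenate) is independent of the cell type. To keep the sketch short we use a plain tanh RNN cell as a stand-in for each LSTM direction; this simplification, the helper names and the random weights are our own assumptions.

```python
import numpy as np


def run_rnn(X, W, U):
    """Plain tanh RNN over X of shape (T, m); returns hidden states of shape (T, n)."""
    h = np.zeros(U.shape[0])
    states = []
    for x_t in X:
        h = np.tanh(W @ x_t + U @ h)
        states.append(h)
    return np.array(states)


def bidirectional(X, fwd, bwd):
    """Concatenate forward states with time-aligned backward states: shape (T, 2n)."""
    h_forward = run_rnn(X, *fwd)
    # process the reversed sequence, then flip the states back so that row t
    # of h_backward summarises x_t, x_{t+1}, ..., x_T
    h_backward = run_rnn(X[::-1], *bwd)[::-1]
    return np.concatenate([h_forward, h_backward], axis=1)


rng = np.random.default_rng(7)
m, n, T = 1, 3, 6
fwd = (rng.normal(size=(n, m)), rng.normal(scale=0.5, size=(n, n)))
bwd = (rng.normal(size=(n, m)), rng.normal(scale=0.5, size=(n, n)))
X = rng.normal(size=(T, m))

H = bidirectional(X, fwd, bwd)
print(H.shape)  # (6, 6)

# changing only the last input leaves the forward half at t = 0 untouched,
# but the backward half at t = 0 sees it
X2 = X.copy()
X2[-1] += 1.0
H2 = bidirectional(X2, fwd, bwd)
print(np.array_equal(H[0, :n], H2[0, :n]), np.array_equal(H[0, n:], H2[0, n:]))  # True False
```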

CHAPTER 2. DATA COLLECTION AND PREPROCESSING

2.1 Data Collection


In our project, we use BTC-USD prices to predict realized volatility. We chose Bitcoin because, owing to its decentralized nature and its lack of strict regulation compared with conventional assets, it has been known to be more volatile than regulated stocks and commodities. This opens the door to derivative contracts such as options and futures, giving investors new opportunities to capitalize on.
The historical dataset of Bitcoin Open/High/Low/Close prices was obtained using the Yahoo Finance API (yfinance):

```python
import yfinance as yf

tckr = 'BTC-USD'
btc = yf.Ticker(tckr)
df = btc.history(period='max')  # full daily OHLCV history
```


2.2 Data preprocessing


2.2.1 Feature engineering
• Remove unused features: Dividends and Stock Splits. These two columns carry no information, since every row is 0.0.

```python
df = df.drop(['Dividends', 'Stock Splits'], axis=1)
```

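• Create new features: Current Volatility, Future Volatility
– Calculate log returns using the formula:

$$r_{t,t+i} = \log\left(\frac{P_{t+i}}{P_t}\right)$$

where $P_t$ is the closing price at time $t$; the code below uses $i = 1$, giving daily log returns.

```python
import numpy as np
df['log_returns'] = np.log(df.Close / df.Close.shift(1))
```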

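– Calculate daily volatility using the formula:

$$\sigma_{\text{daily}} = \sqrt{\frac{1}{\text{interval} - 1} \sum_{t} r_{t-1,t}^2}$$

where $r_{t-1,t}$ is the log return from day $t-1$ to day $t$, and the sum runs over the days in the window.

```python
import numpy as np

def realized_volatility_daily(series_log_return):
    # root of the sum of squared log returns divided by (n - 1); returns are not demeaned
    n = len(series_log_return)
    return np.sqrt(np.sum(series_log_return ** 2) / (n - 1))
```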

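– In our project, we use an interval window of 30 days (roughly one month of trading for cryptocurrencies, which trade every day). The goal is to forecast volatility 7 days ahead: vol_future at day t is the same 30-day realized volatility computed on a window that ends 7 days later, so it equals vol_current at day t + 7 wherever both are defined.

```python
INTERVAL_WINDOW = 30
n_future = 7

# 30-day realized volatility over the window ending at day t
df['vol_current'] = df.log_returns.rolling(window=INTERVAL_WINDOW).apply(realized_volatility_daily)

# the same 30-day window, shifted so that it ends at day t + n_future (the target)
df['vol_future'] = df.log_returns.shift(-n_future).rolling(window=INTERVAL_WINDOW).apply(realized_volatility_daily)
```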
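The relationship between the two columns can be checked on synthetic data. The sketch below uses a made-up geometric random-walk price series (an assumption; it is not BTC data) and passes raw=True to rolling apply so that the function receives NumPy arrays. It confirms that vol_future equals vol_current shifted back by 7 rows on every day where both are defined; the first few rows differ only because vol_future also needs a full 30-row rolling warm-up.

```python
import numpy as np
import pandas as pd

INTERVAL_WINDOW = 30
n_future = 7


def realized_volatility_daily(series_log_return):
    n = len(series_log_return)
    return np.sqrt(np.sum(series_log_return ** 2) / (n - 1))


# synthetic daily closes: geometric random walk with ~3% daily volatility
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=200, freq="D")
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.03, size=len(idx))))
df = pd.DataFrame({"Close": close}, index=idx)

df["log_returns"] = np.log(df.Close / df.Close.shift(1))
df["vol_current"] = (df.log_returns.rolling(window=INTERVAL_WINDOW)
                     .apply(realized_volatility_daily, raw=True))
df["vol_future"] = (df.log_returns.shift(-n_future).rolling(window=INTERVAL_WINDOW)
                    .apply(realized_volatility_daily, raw=True))

shifted = df.vol_current.shift(-n_future)
both_defined = df.vol_future.notna() & shifted.notna()
print(np.allclose(df.vol_future[both_defined], shifted[both_defined]))  # True
```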


• For the multivariate model, we need some new features. The Open/High/Low/Close prices are usually very similar and highly correlated with each other, so instead of keeping all of them in the dataset, we add 2 more features:
– High-Low Spread: the logarithm of the difference between the intraday High and Low prices, taken as a fraction of the Close price
– Open-Close Spread: the difference between the Close and Open prices, taken as a fraction of the Open price

```python
import numpy as np
# column names match the feature list used for the multivariate models
df['HL_spread'] = np.log((df.High - df.Low) / df.Close)
df['CO_spread'] = (df.Close - df.Open) / df.Open
```

2.2.2 Create sequential data for LSTM model


LSTM is a sequential model, so we cannot take advantage of it if we use only the single current volatility value to forecast future volatility. Instead, we use a sliding lookback window to extract uniform input arrays and target outputs: the (lookback window − 1) past data points plus the current data point are used to predict the future volatility at the current time step.


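```python
import numpy as np

def windowed_dataset(x_series, y_series, lookback_window):
    """Each input holds `lookback_window` consecutive rows ending at day i;
    the target is y at that same day i."""
    dataX, dataY = [], []
    for i in range(lookback_window - 1, len(x_series)):
        start_idx = x_series.index[i - lookback_window + 1]
        end_idx = x_series.index[i]
        x = x_series.loc[start_idx:end_idx].values  # label slicing includes both ends
        y = y_series.loc[end_idx]
        dataX.append(x)
        dataY.append(y)
    X, Y = np.array(dataX), np.array(dataY)
    # univariate input gives (samples, lookback); add a feature axis to match
    # the (n_past, 1) input shape of the LSTM models
    if X.ndim == 2:
        X = X[..., np.newaxis]
    return X, Y
```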
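The same windows can be produced without a Python loop. The sketch below is an alternative we suggest, not the project's code: windowed_dataset_fast is a hypothetical name, it relies on NumPy's sliding_window_view (NumPy 1.20 or later) over positional arrays, and it returns a read-only view. The toy data checks that each window ends on the day whose target it is paired with.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def windowed_dataset_fast(x_series, y_series, lookback_window):
    """Vectorised sliding windows: X has shape (samples, lookback, features)."""
    x = np.asarray(x_series, dtype=float)
    if x.ndim == 1:
        x = x[:, np.newaxis]                     # univariate -> one feature column
    # windows along the time axis: shape (samples, features, lookback)
    windows = sliding_window_view(x, window_shape=lookback_window, axis=0)
    X = np.transpose(windows, (0, 2, 1))         # -> (samples, lookback, features)
    # the target of each window is y on the window's last day
    Y = np.asarray(y_series, dtype=float)[lookback_window - 1:]
    return X, Y


idx = pd.date_range("2024-01-01", periods=40, freq="D")
x = pd.Series(np.arange(40, dtype=float), index=idx)
y = x * 10
X, Y = windowed_dataset_fast(x, y, 30)
print(X.shape, Y.shape)   # (11, 30, 1) (11,)
print(X[0, -1, 0], Y[0])  # 29.0 290.0
```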

2.2.3 Train - Validation - Test Splits


There are a total of 3692 usable data points in this dataset, covering a period of almost 10 years from October 2014 until the time of writing (start of December 2024). Since cryptocurrencies are not traded on a regulated exchange, the Bitcoin market is open 24/7, so 1 year covers 365 trading days instead of the roughly 252 trading days of stocks and commodities. We split the dataset chronologically into 3 parts as follows:
• The most recent 90 usable data points for final model testing (approx. 2.4%)
• 2 full years (730 days) for validation and model tuning during training (approx. 19.8%)
• The remaining data points for training (approx. 77.8%)

```python
test_size = 30 * 3   # 90 days
val_size = 365 * 2   # 730 days

split_point_1 = len(df) - (val_size + test_size)
split_point_2 = len(df) - test_size

# contiguous, chronological splits: train -> validation -> test
train_idx = df.index[:split_point_1]
val_idx = df.index[split_point_1:split_point_2]
test_idx = df.index[split_point_2:]
```
CHAPTER 3. VOLATILITY FORECASTING USING LSTM MODELS

To construct and train our deep learning models, we use the TensorFlow and Keras libraries.
We use 2 main metrics for evaluating model performance, RMSPE (Root Mean Squared Percentage Error) and RMSE (Root Mean Squared Error), with RMSPE prioritized.

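$$\text{RMSPE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_{\text{true},i} - y_{\text{pred},i}}{y_{\text{true},i}}\right)^2}$$

```python
import numpy as np

def RMSPE(y_true, y_pred):
    # errors are scaled by the true value, so the metric is unit-free
    return np.sqrt(np.mean(np.square((y_true - y_pred) / y_true)))
```

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{\text{true},i} - y_{\text{pred},i}\right)^2}$$

```python
def RMSE(y_true, y_pred):
    # same units as the volatility itself
    return np.sqrt(np.mean(np.square(y_true - y_pred)))
```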
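One reason to prioritize RMSPE is that it is scale-free: volatility levels differ a lot between calm and turbulent periods, and RMSE is dominated by errors made in high-volatility regimes. A small numeric check with made-up values (the two metrics are redefined so the snippet runs on its own):

```python
import numpy as np


def RMSPE(y_true, y_pred):
    return np.sqrt(np.mean(np.square((y_true - y_pred) / y_true)))


def RMSE(y_true, y_pred):
    return np.sqrt(np.mean(np.square(y_true - y_pred)))


y_true = np.array([0.02, 0.04])
y_pred = np.array([0.03, 0.03])

print(round(RMSPE(y_true, y_pred), 4), round(RMSE(y_true, y_pred), 4))  # 0.3953 0.01
# multiplying everything by 10 leaves RMSPE unchanged but scales RMSE by 10
print(round(RMSPE(10 * y_true, 10 * y_pred), 4), round(RMSE(10 * y_true, 10 * y_pred), 4))  # 0.3953 0.1
```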

For univariate models, the input is the current volatility (vol_current) and the target is the future volatility (vol_future):

```python
# Input
x_train = df.vol_current[train_idx]
x_val = df.vol_current[val_idx]
x_test = df.vol_current[test_idx]
# Target
y_train = df.vol_future[train_idx]
y_val = df.vol_future[val_idx]
y_test = df.vol_future[test_idx]
```

For multivariate models, we select 4 features (HL_spread, CO_spread, Volume, vol_current) to build the input matrix; the target is still the future volatility (vol_future):

```python
features = ['HL_spread', 'CO_spread', 'Volume', 'vol_current']
input_df = df[features]
X_train = input_df.loc[train_idx]
X_val = input_df.loc[val_idx]
X_test = input_df.loc[test_idx]
```

3.1 Univariate 1-Layered LSTM Model


The model architecture consists of a single LSTM layer with 20 units to capture temporal dependencies in the sequential data, followed by a Dense layer with 1 unit to output the predicted value. The model is compiled using Mean Squared Error (MSE) as the loss function and RMSPE (Root Mean Squared Percentage Error) as the evaluation metric. ModelCheckpoint and EarlyStopping callbacks are used to save the best model and prevent overfitting: ModelCheckpoint saves the model with the best validation RMSPE, while EarlyStopping halts training if validation RMSPE does not improve for 30 consecutive epochs and restores the best weights. The model is trained with a batch size of 64 and a maximum of 200 epochs, with data shuffling enabled.
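```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

def rmspe(y_true, y_pred):
    # Keras-compatible RMSPE built from TensorFlow ops (usable as a training metric)
    y_true = tf.reshape(tf.cast(y_true, tf.float32), [-1])
    y_pred = tf.reshape(tf.cast(y_pred, tf.float32), [-1])
    return tf.sqrt(tf.reduce_mean(tf.square((y_true - y_pred) / y_true)))

n_past = 30
batch_size = 64

# 30-day windows of vol_current paired with vol_future
mat_X_train, mat_y_train = windowed_dataset(x_train, y_train, n_past)
mat_X_val, mat_y_val = windowed_dataset(x_val, y_val, n_past)

lstm_1 = Sequential([
    Input(shape=(n_past, 1)),
    LSTM(20),
    Dense(1),
])

lstm_1.compile(loss='mse', optimizer='adam', metrics=[rmspe])

# save the model with the lowest validation RMSPE
checkpoint_cb = ModelCheckpoint('lstm_1.keras', save_best_only=True,
                                monitor='val_rmspe', mode='min')
# stop after 30 epochs without improvement and roll back to the best weights
early_stopping_cb = EarlyStopping(patience=30, restore_best_weights=True,
                                  monitor='val_rmspe', mode='min')

lstm_1_res = lstm_1.fit(mat_X_train, mat_y_train,
                        callbacks=[checkpoint_cb, early_stopping_cb],
                        validation_data=(mat_X_val, mat_y_val), shuffle=True,
                        verbose=0, batch_size=batch_size, epochs=200)
```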

3.2 Univariate 2-Layered Bidirectional LSTM Model


The model architecture consists of two Bidirectional LSTM layers. The first Bidirectional LSTM layer has 32 units and is set to return sequences, so it passes one output vector per time step to the next layer, which helps capture temporal dependencies across multiple time steps. The second Bidirectional LSTM layer has 16 units and processes the output from the first layer. A Dense layer with 1 unit is added at the end to output the predicted value. The model is compiled using Mean Squared Error (MSE) as the loss function and RMSPE (Root Mean Squared Percentage Error) as the evaluation metric. ModelCheckpoint and EarlyStopping callbacks are used to save the best model and prevent overfitting: ModelCheckpoint saves only the model with the best validation RMSPE, while EarlyStopping halts training if validation RMSPE does not improve for 30 consecutive epochs and restores the best model weights. Training is performed with a batch size of 64 and a maximum of 200 epochs, with data shuffling enabled.


```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Bidirectional, Dense
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

n_past = 30
batch_size = 64

mat_X_train, mat_y_train = windowed_dataset(x_train, y_train, n_past)
mat_X_val, mat_y_val = windowed_dataset(x_val, y_val, n_past)

lstm_2 = Sequential([
    Input(shape=(n_past, 1)),
    # return_sequences=True passes one output per time step to the next layer
    Bidirectional(LSTM(32, return_sequences=True)),
    Bidirectional(LSTM(16)),
    Dense(1),
])

# rmspe: the Keras-compatible RMSPE metric used for all models
lstm_2.compile(loss='mse', optimizer='adam', metrics=[rmspe])

checkpoint_cb = ModelCheckpoint('lstm_2.keras', save_best_only=True,
                                monitor='val_rmspe', mode='min')
early_stopping_cb = EarlyStopping(patience=30, restore_best_weights=True,
                                  monitor='val_rmspe', mode='min')

lstm_2_res = lstm_2.fit(mat_X_train, mat_y_train,
                        callbacks=[checkpoint_cb, early_stopping_cb],
                        validation_data=(mat_X_val, mat_y_val), shuffle=True,
                        verbose=0, batch_size=batch_size, epochs=200)
```


3.3 Multivariate 2-Layered Bidirectional LSTM Model


This model architecture is more complex and consists of two Bidirectional LSTM layers: the first with 32 units and return_sequences=True, followed by a second with 16 units. Setting return_sequences=True in the first layer makes it pass its full output sequence (one vector per time step) to the second LSTM layer, allowing the model to capture deeper temporal relationships. Dropout layers with a rate of 0.1 are applied after each LSTM layer and after the Dense output layer to regularize the model and reduce overfitting. Additionally, BatchNormalization is applied right after the input layer to normalize the features, which helps speed up training and improve model stability. The model is compiled with the Adam optimizer and MSE loss, with RMSPE as the evaluation metric. ModelCheckpoint and EarlyStopping callbacks monitor the validation RMSPE and stop training early if there is no improvement for 30 consecutive epochs, keeping training efficient and limiting overfitting.
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, LSTM, Bidirectional, Dense,
                                     Dropout, BatchNormalization)
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

n_past = 30
batch_size = 64
n_dims = input_df.shape[1]  # 4 input features

mat_X_train, mat_y_train = windowed_dataset(X_train, y_train, n_past)
mat_X_val, mat_y_val = windowed_dataset(X_val, y_val, n_past)

lstm_3 = Sequential([
    Input(shape=(n_past, n_dims)),
    BatchNormalization(),

    Bidirectional(LSTM(32, return_sequences=True)),
    Dropout(0.1),

    Bidirectional(LSTM(16)),
    Dropout(0.1),

    Dense(1),
    # dropout on the output only acts during training (it zeroes about 10% of
    # the predictions and rescales the rest); it is inactive at inference time
    Dropout(0.1),
])

lstm_3.compile(loss='mse', optimizer='adam', metrics=[rmspe])

# this model's own callbacks, so lstm_3 is checkpointed to its own file
checkpoint = ModelCheckpoint('lstm_3.keras', save_best_only=True,
                             monitor='val_rmspe', mode='min')
# restore the best weights, as for the other models
early_stop = EarlyStopping(patience=30, restore_best_weights=True,
                           monitor='val_rmspe', mode='min')

lstm_3_res = lstm_3.fit(mat_X_train, mat_y_train,
                        callbacks=[checkpoint, early_stop],
                        validation_data=(mat_X_val, mat_y_val), shuffle=True,
                        verbose=0, batch_size=batch_size, epochs=200)
```
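To put the three architectures in perspective, their sizes can be worked out by hand. A Keras LSTM layer with units units and input_dim input features has 4 × units × (input_dim + units + 1) parameters (four gates, each with input weights, recurrent weights and a bias); a Bidirectional wrapper doubles this and outputs 2 × units features. The sketch below applies these formulas; the counts are our own arithmetic, not figures taken from the project's model summaries, and the BatchNormalization count includes its 2 non-trainable statistics per feature.

```python
def lstm_params(input_dim: int, units: int) -> int:
    # four gates (forget, input, candidate, output), each with a units x input_dim
    # input matrix, a units x units recurrent matrix and `units` biases
    return 4 * units * (input_dim + units + 1)


def bilstm_params(input_dim: int, units: int) -> int:
    # forward and backward LSTMs have separate weights
    return 2 * lstm_params(input_dim, units)


def dense_params(input_dim: int, units: int = 1) -> int:
    return units * (input_dim + 1)


size_lstm_1 = lstm_params(1, 20) + dense_params(20)
size_lstm_2 = bilstm_params(1, 32) + bilstm_params(2 * 32, 16) + dense_params(2 * 16)

n_dims = 4                # HL_spread, CO_spread, Volume, vol_current
batch_norm = 4 * n_dims   # gamma, beta (trainable) + moving mean, moving variance
size_lstm_3 = (batch_norm + bilstm_params(n_dims, 32)
               + bilstm_params(2 * 32, 16) + dense_params(2 * 16))

print(size_lstm_1, size_lstm_2, size_lstm_3)  # 1781 19105 19889
```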

CHAPTER 4. RESULTS AND EVALUATION

4.1 Baseline Models


4.1.1 Mean Baseline
One of the essential characteristics of volatility is that it mean-reverts over the long term. Therefore, our first baseline is a very simple model that outputs the average current realized volatility of the whole training set as its prediction for every time step.

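```python
import numpy as np
import pandas as pd

# predict the training-set mean of vol_current for every validation day
mean_train_vol = x_train.mean()
baseline_preds = np.ones(len(val_idx)) * mean_train_vol
baseline_preds = pd.Series(baseline_preds, index=x_val.index)
```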

4.1.2 Random Walk Naive Forecasting


A commonly known fact about volatility is that it tends to be autocorrelated and to cluster in the short term. This property can be used to implement a naive model that simply "predicts" future volatility using whatever the volatility was at the immediately preceding time step.
In this case, we use the average daily volatility over the most recent INTERVAL_WINDOW as the prediction for the next 7 days, which amounts to using vol_current at time step t as the prediction for vol_future at time step t.
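The report only lists code for the mean baseline. A minimal sketch of the random-walk forecast, assuming x_val and y_val are the validation vol_current and vol_future series defined in Chapter 3, is simply to reuse the inputs as predictions. The toy numbers below are made up purely to show the mechanics and are not project results; naive_forecast is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def RMSPE(y_true, y_pred):
    return np.sqrt(np.mean(np.square((y_true - y_pred) / y_true)))


def naive_forecast(vol_current: pd.Series) -> pd.Series:
    # random walk: predict vol_future[t] with vol_current[t]
    return vol_current.copy()


# hypothetical validation data (not project results)
idx = pd.date_range("2024-01-01", periods=4, freq="D")
x_val = pd.Series([0.02, 0.03, 0.04, 0.05], index=idx)   # vol_current
y_val = pd.Series([0.025, 0.03, 0.05, 0.04], index=idx)  # vol_future

naive_preds = naive_forecast(x_val)
mean_preds = pd.Series(np.ones(len(idx)) * 0.035, index=idx)  # mean-baseline style constant

print(RMSPE(y_val, naive_preds), RMSPE(y_val, mean_preds))
```

On these toy values the random walk beats the constant forecast because the series moves away from its mean; on real data the ranking has to be checked with the validation set.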


4.2 Comparison

We compare models using two metrics, RMSPE and RMSE, with RMSPE prioritized. The multivariate 2-layer bidirectional LSTM (32/16 units) achieves the lowest root mean squared percentage error (RMSPE), so we use this model for evaluation on the test data.


4.3 Evaluation on test data

RMSPE on Test Set: 0.19246198929377595

CHAPTER 5. CONCLUSIONS

5.1 Summary
In terms of performance on the test set, our final LSTM model has an RMSPE of 0.19246, which we consider a reasonable result for forecasting realized volatility. Traders do not need perfectly accurate forecasts to have a positive expectation when participating in the markets; they just need forecasts that are both correct and more accurate than the general consensus. A multivariate LSTM could therefore potentially give investors an edge in forecasting accuracy.
However, since financial time series data are constantly evolving, no model can be expected to forecast with a high level of accuracy indefinitely. The average lifetime of a model is often said to be between 6 months and 5 years, and there is a phenomenon in quant trading called alpha decay, which is the loss in predictive power of an alpha model over time. In addition, according to Sinclair (2020), researchers have found that the publication of a new "edge" or anomaly in the markets reduces its returns by up to 58%.
These models therefore require constant tweaking and tuning based on the most recent information available, to make sure they stay up to date and evolve with the markets.
5.2 Suggestion for Future Works
For the current model, we could fine-tune hyperparameters, try other LSTM architectures, or use different kinds of RNN (e.g. GRU) to get better results.
We could also implement a hybrid approach combining deep learning (LSTM, GRU, CNN, ...) with GARCH (Generalized AutoRegressive Conditional Heteroskedasticity), one of the most widely used volatility forecasting models, to combine the strengths of traditional statistical methods and state-of-the-art models.
Realized volatility for other assets could also be investigated with this model, using further fine-tuned hyperparameters.

