Volatility Forecasting Report
INTRODUCTION TO AI
GROUP 7
Instructor: Assoc. Prof. Le Thanh Huong
Do Dang Vu 20235578
CHAPTER 1. OVERVIEW
1.1 Motivation
The financial market, with assets exhibiting high volatility such as oil, gold, and
emerging decentralized financial instruments like Bitcoin, presents both challenges
and opportunities. The volatility of these assets plays a crucial role, especially in
derivative markets, where it directly influences the pricing of options, futures, and
other contracts. Therefore, accurate volatility forecasting is essential for informed
decision-making.
With the rise of deep learning, Long Short-Term Memory (LSTM) networks
have proven effective in time series analysis. By leveraging LSTM models, we
can explore the interactions among various market factors, aiming to create more
precise and reliable volatility forecasts. The goal of this project is to use LSTM to
predict the next seven days’ average realized volatility of BTC-USD. By analyzing
market behavior patterns, we intend to provide insights that can enhance decision-
making in the derivatives market. Through this effort, we seek not only to improve
financial forecasting but also to showcase the practical impact of machine learning
in quantitative finance.
1.2 Derivatives and Volatility Forecasting
Derivatives are advanced financial instruments whose value is based on an underlying
asset, such as stocks, commodities, currencies, or indices. These instruments—options,
futures, and swaps—are vital tools in modern financial markets, serving multiple
purposes: hedging against potential losses, leveraging speculative opportunities,
and managing complex portfolios. However, the profitability of trading derivatives
is influenced not only by market trends but also by the accurate assessment and
forecasting of market volatility, which is crucial for pricing and strategy development.
In this project, we aim to forecast the volatility of BTC-USD and AAPL using LSTM models.
1.2.2 What are Realized Volatility (RV) and Implied Volatility (IV)?
Realized volatility (RV) measures how much an asset's price has actually fluctuated over a past window, while implied volatility (IV) is the market's expectation of future volatility embedded in option prices:
• High IV: Options prices are higher, reflecting greater anticipated volatility.
• Low IV: Options prices are lower, as lower volatility suggests less risk.
Our goal is to predict volatility more accurately than the market’s IV, thereby
outperforming the market. There are several potential scenarios where this approach
could be profitable:
Case 1: RV > IV (Market Underpricing Volatility)
When the market underestimates future volatility, options are priced too cheaply
relative to actual price movements.
• Scenario: Our model forecasts RV at 25%, while IV is only 15%.
• Strategy: Enter a Long options position, expecting option prices to rise as the market reprices volatility upward, and exit when maximum profit is achieved. Avoid Short positions, as option prices will increase, leading to losses.
Case 2: RV < IV (Market Overpricing Volatility)
When the market overestimates future volatility, options are overpriced and will
decrease in value over time.
• Scenario: IV is 25%, but our model predicts RV at only 15%.
• Strategy: Enter a Short position to capitalize on the overpricing of options.
Time the buyback carefully when IV falls closer to RV.
According to Sinclair (2020), several trading strategies revolve around identifying volatility mismatches. For a delta-hedged option position, the expected profit is approximately proportional to the gap between realized and implied volatility:
P&L ≈ Vega · (σ_realized − σ_implied)
Where:
• Vega: The sensitivity of an option's price to changes in the volatility of the underlying asset.
• σ_implied and σ_realized: Implied and realized volatilities (IV and RV), respectively.
For example, if realized volatility comes in 10 percentage points above implied, a delta-hedged long option earns roughly Vega × 10 in premium terms.
If traders can predict future volatility (RV) more accurately than the market (IV),
they can develop a strategy that yields significant profits.
1.3 Time Series Analysis
1.3.1 Definition
Time series analysis refers to the method of analyzing a sequence of data points
collected over a period of time. The data points are recorded at consistent intervals,
ensuring that the data collection is systematic rather than random. However, time
series analysis is more than just collecting data; it focuses on understanding how
variables evolve over time.
What distinguishes time series data from other types of data is the critical role of
time. Time serves as a key variable that reflects how data changes over time and
reveals underlying trends. This sequence of data points provides valuable insights,
showing dependencies and patterns that emerge in relation to time.
Typically, time series analysis requires a large set of data points to guarantee
reliability and minimize the effects of noise. A robust data set helps ensure that
trends or patterns identified are not merely outliers, and it accounts for seasonal
fluctuations. Additionally, time series analysis is often employed in forecasting,
where historical data is used to predict future outcomes.
1.3.2 Examples
Time series analysis is most useful for non-stationary data—data that exhibits
fluctuations or is influenced by time. Various industries, such as finance, retail, and
economics, rely heavily on time series analysis due to the dynamic nature of data
such as currency fluctuations and sales data. Below are some key examples of time
series analysis in practice:
• Weather data (e.g., rainfall, temperature)
• Health monitoring (e.g., heart rate monitoring via EKG, brain activity via
EEG)
• Economic indicators (e.g., quarterly sales, stock prices, interest rates)
• Automated stock trading algorithms
• Industry forecasts and trend analysis
• Predicting long-term climate change
Each of these examples involves a sequence of data points collected over time,
allowing for detailed analysis and accurate predictions. Time series analysis aids
in understanding past behaviors and forecasting future events, making it essential
in many modern fields.
1.4 Background Theory (RNN, LSTM, BiLSTM)
1.4.1 Recurrent Neural Network
A Recurrent Neural Network (RNN) processes sequential data by maintaining a hidden state that is updated at each time step:
ht = f (W xt + U ht−1 + b)
Where:
• ht ∈ Rn is the hidden state at time t,
• xt ∈ Rm is the input at time t,
• W ∈ Rn×m is the weight matrix for the input to hidden state connection,
• U ∈ Rn×n is the weight matrix for the hidden state to hidden state connection,
• ht−1 ∈ Rn is the previous hidden state,
• b ∈ Rn is the bias vector, and
• f is a non-linear activation function (typically tanh or ReLU).
The output yt at time step t is given by:
yt = g(V ht + c)
Where:
• yt ∈ Rp is the output at time t,
• V ∈ Rp×n is the weight matrix for the hidden state to output connection,
• c ∈ Rp is the bias vector for the output layer,
• g is the activation function applied to the output (often softmax for classification
tasks).
The network updates its weights based on the error calculated at each output
using backpropagation through time (BPTT), but RNNs are prone to the vanishing
gradient problem, where gradients become very small as they are propagated backward
through time, limiting the ability to learn long-term dependencies.
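As an illustration, here is a minimal NumPy sketch of this forward pass (the toy dimensions and the identity output activation are our own choices, not the report's):

import numpy as np

def rnn_forward(xs, W, U, V, b, c):
    # vanilla RNN: h_t = tanh(W x_t + U h_{t-1} + b), y_t = V h_t + c
    h = np.zeros(U.shape[0])
    outputs = []
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)
        outputs.append(V @ h + c)
    return outputs, h

# toy usage: m = 3 inputs, n = 4 hidden units, p = 1 output
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)
V, c = rng.normal(size=(1, 4)), np.zeros(1)
ys, h_T = rnn_forward([rng.normal(size=3) for _ in range(5)], W, U, V, b, c)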
1.4.2 Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM) networks are a type of RNN designed to overcome
the vanishing gradient problem. LSTMs are specifically effective in learning long-
range dependencies in sequential data. The key innovation in LSTMs is the use
of a cell state, along with multiple gates (forget, input, and output gates), which
control the flow of information into, out of, and within the cell state.
An LSTM unit consists of the following components:
• Cell State (Ct ): The cell state carries information throughout the sequence.
It is updated at each time step using the forget and input gates, allowing the
network to retain long-term memory.
• Forget Gate (ft ): The forget gate determines which portion of the previous
cell state should be carried forward to the next time step. It is computed as:
ft = σ(Wf xt + Uf ht−1 + bf )
where σ is the sigmoid activation function that outputs values between 0 and
1.
• Input Gate (it ): The input gate controls how much new information should
be added to the cell state. It is computed as:
it = σ(Wi xt + Ui ht−1 + bi )
• Candidate Cell State (C̃t ): The candidate cell state is a new proposed value that could be added to the cell state. It is computed as:
C̃t = tanh(Wc xt + Uc ht−1 + bc )
• Output Gate (ot ): The output gate determines what the next hidden state will be. It is computed as:
ot = σ(Wo xt + Uo ht−1 + bo )
The cell state is then updated by combining the previous cell state and the candidate:
Ct = ft · Ct−1 + it · C̃t
Where:
• ft is the forget gate,
• Ct−1 is the previous cell state,
• it is the input gate,
• C̃t is the candidate cell state.
The hidden state ht is updated as:
ht = ot · tanh(Ct )
The use of these gates allows LSTMs to selectively retain or discard information,
overcoming the vanishing gradient problem and enabling the network to learn
long-term dependencies in sequential data. LSTMs are widely used in tasks such
as language modeling, machine translation, speech recognition, and time series
forecasting.
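The full cell update can be summarized in a short NumPy sketch (a didactic single step; the parameter names packed into the dict p are our own convention):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    # gates, following the equations above
    f_t = sigmoid(p['Wf'] @ x_t + p['Uf'] @ h_prev + p['bf'])        # forget gate
    i_t = sigmoid(p['Wi'] @ x_t + p['Ui'] @ h_prev + p['bi'])        # input gate
    c_tilde = np.tanh(p['Wc'] @ x_t + p['Uc'] @ h_prev + p['bc'])    # candidate cell state
    o_t = sigmoid(p['Wo'] @ x_t + p['Uo'] @ h_prev + p['bo'])        # output gate
    c_t = f_t * c_prev + i_t * c_tilde   # cell state update
    h_t = o_t * np.tanh(c_t)             # new hidden state
    return h_t, c_t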
1.4.3 Bidirectional LSTM (BiLSTM)
A Bidirectional LSTM runs two LSTMs over the input sequence, one in the forward direction and one in the backward direction, and concatenates their hidden states at each time step:
ht = [ht^forward , ht^backward]
ht^forward = LSTM_forward(xt , ht−1^forward)
ht^backward = LSTM_backward(xt , ht+1^backward)
This bidirectional approach allows the BiLSTM to take into account both past
and future context when making predictions, which is particularly useful in tasks
such as machine translation, where both the preceding and following words are
crucial for understanding the meaning of the current word.
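In Keras, this concatenation is exactly what the Bidirectional wrapper does; a quick sanity check (the shapes here are illustrative only):

import tensorflow as tf
from tensorflow.keras.layers import Bidirectional, LSTM

# With the default merge_mode='concat', 16 units per direction yield a
# 32-dimensional hidden state at every time step.
bilstm = Bidirectional(LSTM(16, return_sequences=True))
x = tf.random.normal((8, 30, 1))   # (batch, time steps, features)
print(bilstm(x).shape)             # (8, 30, 32)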
Advantages of BiLSTM:
• It captures contextual dependencies from both the past and the future, making
it more effective for tasks where full sequence context is important.
• BiLSTM networks have been shown to outperform unidirectional LSTM networks
in sequence labeling tasks (e.g., named entity recognition, part-of-speech tagging).
CHAPTER 2. DATA COLLECTION AND PREPROCESSING
We collect the full daily BTC-USD price history from Yahoo Finance via the yfinance library:

import yfinance as yf

tckr = 'BTC-USD'
btc = yf.Ticker(tckr)
df = btc.history(period='max')   # full daily OHLCV history
Daily log returns over a horizon of i days are computed as:
rt,t+i = log(Pt+i / Pt)
Realized volatility over a window is then the (zero-mean) sample standard deviation of the log returns:

import numpy as np

def realized_volatility_daily(series_log_return):
    # square root of the average squared log return (Bessel-corrected)
    n = len(series_log_return)
    return np.sqrt(np.sum(series_log_return ** 2) / (n - 1))
INTERVAL_WINDOW = 30
n_future = 7

# daily log returns (assumed to be derived from the close price; the line
# defining df.log_returns is our reconstruction)
df['log_returns'] = np.log(df['Close'] / df['Close'].shift(1))

# realized volatility over the trailing 30-day window
df['vol_current'] = df.log_returns.rolling(window=INTERVAL_WINDOW).apply(realized_volatility_daily)

# realized volatility over the window shifted 7 days ahead (the forecast target)
df['vol_future'] = df.log_returns.shift(-n_future).rolling(window=INTERVAL_WINDOW).apply(realized_volatility_daily)
# split_point_1 and split_point_2 are chronological cut-off positions defined earlier
train_idx = df.index[:split_point_1]
val_idx = df.index[split_point_1:split_point_2]
test_idx = df.index[split_point_2:]
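A typical way to define these cut-offs is a proportional chronological split (the exact proportions here are our assumption, not the report's):

split_point_1 = int(len(df) * 0.70)   # assumed: first 70% for training
split_point_2 = int(len(df) * 0.85)   # assumed: next 15% validation, last 15% test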
CHAPTER 3. VOLATILITY FORECASTING USING LSTM MODELS
To construct and train our deep learning models, we will use the TensorFlow and Keras libraries.
We will use two main metrics to evaluate model performance: RMSPE (Root Mean Squared Percentage Error) and RMSE (Root Mean Squared Error), with RMSPE prioritized.
RMSPE = sqrt( (1/n) · Σi=1..n ( (ytrue,i − ypred,i) / ytrue,i )² )
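The rmspe metric passed to Keras in the listings below is not shown in the surviving code; a minimal TensorFlow implementation consistent with the formula above (our sketch, not necessarily the report's exact code) is:

import tensorflow as tf

def rmspe(y_true, y_pred):
    # Root Mean Squared Percentage Error, usable as a Keras metric
    pct_error = (y_true - y_pred) / y_true
    return tf.sqrt(tf.reduce_mean(tf.square(pct_error)))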
For univariate models, the input is the current volatility and the target is the future volatility:
# Input
x_train = df.vol_current[train_idx]
x_val = df.vol_current[val_idx]
x_test = df.vol_current[test_idx]
# Target
y_train = df.vol_future[train_idx]
y_val = df.vol_future[val_idx]
y_test = df.vol_future[test_idx]
import tensorflow as tf
from tensorflow.keras.layers import InputLayer, LSTM, Dense
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

n_past = 30   # assumed, matching the later listings (the original line is missing)
# X_train is assumed to be built analogously: X_train = input_df.loc[train_idx]
X_val = input_df.loc[val_idx]
X_test = input_df.loc[test_idx]

lstm_1 = tf.keras.models.Sequential([
    InputLayer(input_shape=(n_past, 1)),
    LSTM(20),
    Dense(1)
])

lstm_1.compile(loss='mse',
               optimizer='adam',
               metrics=[rmspe])

checkpoint_cb = ModelCheckpoint('lstm_1.keras',
                                save_best_only=True,
                                monitor='val_rmspe')

early_stopping_cb = EarlyStopping(patience=30,
                                  restore_best_weights=True,
                                  monitor='val_rmspe',
                                  mode='min')
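The windowed arrays (mat_X_train, mat_y_train, and their validation counterparts) consumed by fit are built from the series above; a plausible sliding-window construction (make_windows is a hypothetical helper, not the report's own code) is:

import numpy as np

def make_windows(inputs, targets, n_past):
    # stack sliding windows of the last n_past observations
    X, y = [], []
    for i in range(n_past, len(inputs) + 1):
        X.append(inputs[i - n_past:i])
        y.append(targets[i - 1])
    return np.asarray(X), np.asarray(y)

# e.g. mat_X_train, mat_y_train = make_windows(x_train.values, y_train.values, n_past)
# for a univariate series, add a trailing feature axis: mat_X_train = mat_X_train[..., np.newaxis]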
n_past = 30
batch_size = 64

from tensorflow.keras.layers import Bidirectional

lstm_2 = tf.keras.models.Sequential([
    InputLayer(input_shape=(n_past, 1)),

    Bidirectional(LSTM(32, return_sequences=True)),

    Bidirectional(LSTM(16)),

    Dense(1)
])

lstm_2.compile(loss='mse',
               optimizer='adam',
               metrics=[rmspe])

checkpoint_cb = ModelCheckpoint('lstm_2.keras',
                                save_best_only=True,
                                monitor='val_rmspe')

early_stopping_cb = EarlyStopping(patience=30,
                                  restore_best_weights=True,
                                  monitor='val_rmspe',
                                  mode='min')

lstm_2_res = lstm_2.fit(mat_X_train, mat_y_train,
                        callbacks=[checkpoint_cb, early_stopping_cb],
                        validation_data=(mat_X_val, mat_y_val), shuffle=True,
                        verbose=0, batch_size=batch_size, epochs=200)
n_past = 30   # assumed, matching the earlier models (the original line is missing)
batch_size = 64
n_dims = input_df.shape[1]

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization, Dropout

lstm_3 = Sequential([
    InputLayer(input_shape=(n_past, n_dims)),
    BatchNormalization(),

    Bidirectional(LSTM(32, return_sequences=True)),
    Dropout(0.1),

    Bidirectional(LSTM(16)),
    Dropout(0.1),

    Dense(1),
    Dropout(0.1),
])

lstm_3.compile(
    loss='mse',
    optimizer='adam',
    metrics=[rmspe]
)
CHAPTER 4. RESULTS AND EVALUATION
4.1 Baseline
As a naive baseline, we predict the mean training volatility for every point in the validation set:

import pandas as pd

mean_train_vol = x_train.mean()
baseline_preds = np.ones(len(val_idx)) * mean_train_vol
baseline_preds = pd.Series(baseline_preds, index=x_val.index)
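Scoring this baseline with the two metrics is then a pair of NumPy one-liners (mirroring the formulas above):

baseline_rmse = np.sqrt(np.mean((y_val - baseline_preds) ** 2))
baseline_rmspe = np.sqrt(np.mean(((y_val - baseline_preds) / y_val) ** 2))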
4.2 Comparison
We use two metrics for comparing models, RMSPE and RMSE, with RMSPE prioritized.
The multivariate bidirectional LSTM with 2 layers (32/16 units) has the lowest root mean squared percentage error (RMSPE), so we use this model for the final evaluation on the test data.
CHAPTER 5. CONCLUSIONS
5.1 Summary
In terms of performance on the test set, our final LSTM model achieves an RMSPE of 0.19246, which is quite good for forecasting realized volatility. Traders do not need perfectly accurate forecasts to have a positive expectation when participating in the markets; they just need forecasts that are both correct and more correct than the general consensus. A multivariate LSTM could therefore give investors an advantage in terms of forecasting accuracy.
However, since financial time series data are constantly evolving, no model can forecast with a high level of accuracy forever. The average lifetime of a model is between 6 months and 5 years, and quantitative trading suffers from a phenomenon called alpha decay: the loss of an alpha model's predictive power over time. In addition, according to Sinclair (2020), researchers have found that the publication of a new "edge" or anomaly in the markets lessens its returns by up to 58%.
These models therefore require constant tweaking and tuning based on the most
recent information available to make sure they stay up-to-date and learn to evolve
with the markets.
5.2 Suggestions for Future Work
For the current model, we could fine-tune hyperparameters, try other LSTM architectures, or use different kinds of RNN (e.g., GRU) to obtain better results.
We could also implement a hybrid approach that combines deep learning (LSTM, GRU, CNN, ...) with GARCH (Generalized AutoRegressive Conditional Heteroskedasticity), the most popular volatility forecasting model, to leverage the power of both traditional statistical methods and state-of-the-art models.
Finally, the realized volatility of other assets could be investigated with this model after further hyperparameter fine-tuning.
Bibliography
Sinclair, E. (2020). Positional Option Trading. Hoboken, NJ: John Wiley & Sons.