Computations in Option Pricing Engines 2020
Digital WPI
2020-05-01
Pavee Phongsopa
Worcester Polytechnic Institute
Repository Citation
Wotton, N. B., Phongsopa, P., & Mendonca Filho, V. T. (2020). Computations in Option Pricing Engines.
Retrieved from https://digitalcommons.wpi.edu/mqp-all/7359
This Unrestricted is brought to you for free and open access by the Major Qualifying Projects at Digital WPI. It has
been accepted for inclusion in Major Qualifying Projects (All Years) by an authorized administrator of Digital WPI.
For more information, please contact [email protected].
Computations in Option Pricing
Engines
Vital Mendonca Filho, Pavee Phongsopa, Nicholas Wotton
Advisors: Yanhua Li, Qinshuo Song, Gu Wang
Submitted to
Worcester Polytechnic Institute
in fulfillment of the requirements for the
Degree of Bachelor of Science in Mathematical Sciences
Disclaimer
This report represents work of WPI undergraduate students submitted to the faculty as
part of a degree requirement. WPI routinely publishes these reports on its web site without
editorial or peer review. For more information about the projects program at WPI, see
http://www.wpi.edu/Academics/Projects.
Abstract
As computers increase their power, machine learning gains an important role in
various industries. We consider how to apply this method of analysis and pattern iden-
tification to complement extant financial models, specifically option pricing methods.
We first prove the discussed model is arbitrage-free to confirm it will yield appropriate
results. Next, we apply a neural network algorithm and study its ability to approxi-
mate option prices from existing models. The results show great potential for applying
machine learning where traditional methods fail. As an example, we study the im-
plied volatility surface of highly liquid stocks using real data, which is computationally
intensive, to justify the practical impact of the methods proposed.
Contents
1 Introduction 1
6 Conclusion 22
7 Appendix 23
7.1 Derivation of Vega . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
7.2 Python Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
7.2.1 Approximating a Linear AND Non-Linear Function Using a Neural
Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
7.2.2 Approximating the CRR Model Using a Neural Network . . . . . . . 27
7.2.3 Layers vs Neurons . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
7.2.4 Approximating the BSM using a Neural Network . . . . . . . . . . . 31
7.2.5 Setup-Optimizer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
7.3 True vs Prediction Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
7.4 Loss Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
7.5 Implied Volatility Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . 42
List of Figures
1 Payoffs of European Call and Put Options . . . . . . . . . . . . . . . . . . . 3
2 Binomial Price Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3 A Basic Neural Network Diagram . . . . . . . . . . . . . . . . . . . . . . . . 8
4 The Neural Network in Python for Linear Functions . . . . . . . . . . . . . 9
5 Actual and Predicted Values of f (x) = x + 2 . . . . . . . . . . . . . . . . . 10
6 Approximation with insufficient neurons . . . . . . . . . . . . . . . . . . . . 10
7 Approximation with 100 and 1000 neurons . . . . . . . . . . . . . . . . . . . 11
8 Loss Functions of Approximating 4th-Degree Polynomial with Different Neu-
ral Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
9 Approximated Graph vs True Graph with Different Activation Functions . 13
10 Illustration of the BSM Neural Network . . . . . . . . . . . . . . . . . . . . 14
11 Initial Result for the Black-Scholes Neural Network . . . . . . . . . . . . . . 15
12 Results with Normalized Data . . . . . . . . . . . . . . . . . . . . . . . . . . 15
13 Time for Model Trainings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
14 Loss Function of American Put Option per 500 Epoch . . . . . . . . . . . . 17
15 American Call and Put Approximation with 5000 Epoch . . . . . . . . . . . 18
16 Predicted Graph with 1 and 3 layers of neurons . . . . . . . . . . . . . . . . 19
17 Predicted graph of 1 layer and 14 neurons with ReLU instead of Sigmoid . 19
18 Illustration of Newton’s Method . . . . . . . . . . . . . . . . . . . . . . . . 21
19 Volatility Surface of BAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
20 Volatility Surface MSFT Jan 2020 . . . . . . . . . . . . . . . . . . . . . . . 22
21 Python Imports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
22 Target function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
23 Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
24 Loss Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
25 Learning Rate and Optimizer . . . . . . . . . . . . . . . . . . . . . . . . . . 24
26 Randomized Training Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
27 Training Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
28 Example of Loss Printed per 10 epoch . . . . . . . . . . . . . . . . . . . . . 26
29 Code for Graphing True vs Predicted . . . . . . . . . . . . . . . . . . . . . . 26
30 Number of Neurons and Layers . . . . . . . . . . . . . . . . . . . . . . . . . 27
31 Python Code for Option Model . . . . . . . . . . . . . . . . . . . . . . . . . 27
32 Python Code for Option Model . . . . . . . . . . . . . . . . . . . . . . . . . 28
33 Python Code for Training Model . . . . . . . . . . . . . . . . . . . . . . . . 28
34 Python Code for Training Data . . . . . . . . . . . . . . . . . . . . . . . . . 29
35 2 layers and 3 neurons in each layer . . . . . . . . . . . . . . . . . . . . . . 29
36 2 layers and 9 neurons in each layer . . . . . . . . . . . . . . . . . . . . . . 30
37 2 layers and 15 neurons in each layer . . . . . . . . . . . . . . . . . . . . . . 30
38 4 layers with 3 neurons in each layer . . . . . . . . . . . . . . . . . . . . . . 30
39 5 layers and 3 neurons in each layer . . . . . . . . . . . . . . . . . . . . . . 31
40 Import Statements for Code in Python . . . . . . . . . . . . . . . . . . . . . 31
41 Definition of the Vanilla Option Class . . . . . . . . . . . . . . . . . . . . . 32
42 Definition of the Geometric Brownian Motion Class . . . . . . . . . . . . . . 32
43 Definition of the Black-Scholes-Merton Formula . . . . . . . . . . . . . . . . 33
44 Definition of a Function to Calculate a Group of BSM Prices Given a Tensor 33
45 Definition of a function to Calculate BSM Price given a Single Underlying
and Strike Price And Creation of a Random List of Strike Prices For Model
Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
46 Definition of a Neural Network Model . . . . . . . . . . . . . . . . . . . . . 34
47 Definition of the Loss Function as Mean Squared Error . . . . . . . . . . . . 34
48 Definition of the Method of Optimization for the Network . . . . . . . . . . 34
49 Creation of Tensors of Training Data . . . . . . . . . . . . . . . . . . . . . . 34
50 Definition of the Linear Transform Function . . . . . . . . . . . . . . . . . . 35
51 Loop for Training the Model . . . . . . . . . . . . . . . . . . . . . . . . . . 35
52 Loss Per Epoch while Training the Model . . . . . . . . . . . . . . . . . . . 36
53 Function for Linearly Transforming Model Output . . . . . . . . . . . . . . 36
54 Testing of the Model Versus the Training Data. See Figure 12b for an
Example of Trained Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
55 Testing of the Model Using Randomized Data. See Appendix 7.3 for Exam-
ple Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
56 Setup-Optimizer Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
57 Setup-Optimizer Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
58 Black-Scholes Approximation Output Graphs With Two Layers . . . . . . . 39
59 Black-Scholes Approximation Output Graphs With Three Layers . . . . . . 40
60 Loss Function of European Put Option per 50 Epoch . . . . . . . . . . . . . 41
61 Loss Function of European Call Option per 50 Epoch . . . . . . . . . . . . 41
62 Loss Function of American Call Option per 50 Epoch . . . . . . . . . . . . 42
63 Implied Volatility Function . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
64 Implied Volatility Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
65 Implied Volatility Market Price . . . . . . . . . . . . . . . . . . . . . . . . . 43
66 Implied Volatility Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
67 Implied Volatility Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . 43
68 Implied Volatility Surface Plot . . . . . . . . . . . . . . . . . . . . . . . . . 43
1 Introduction
In 1973, Black, Scholes and Merton derived a formula to calculate the theoretical value of
a European option contract [BS73]. Since then, the underlying Black-Scholes model has
been adapted to accommodate early exercises (American options), default risk, and some
other exotic option forms seen in both exchanges and over the counter (OTC) contracts
which usually have no closed form solutions and are computationally intensive [FPS03].
Nowadays, advancements in stochastic numerical methods further drive research on new
and more efficient ways of performing these calculations. In this paper, we explore the
possibility of applying machine learning to computations of option prices. We explore
the computational efficiency and accuracy of deep and reinforcement learning algorithms
compared to traditional methods, and analyze their applicability and limitations.
First, we revisit the classical method of computation to set a benchmark for our compar-
ison with machine learning. After proving the no-arbitrage condition in the discrete-time
Binomial Tree (CRR) Model, we present the computation of European and American op-
tions in CRR model, and also the closed form formulas for them in the continuous-time
Black-Scholes Model.
We demonstrate the machine learning methods by approximating simple functions
using neural networks. By comparing the difference between the value of the approximation
and target function at the data points, we can calculate the loss function and determine
how well our machine is “learning” the target function. Since neural networks come in
varying sizes, we must also determine the optimal complexity of our system, for each
approximation task, so it finds a balance between the accuracy of the approximation and
the computational cost. We then apply this machine learning technique to approximate
option prices derived in Section 2, for both CRR and Black-Scholes models, which shows
promising results.
Finally, we examine the classical numerical methods for implied volatility as an example
in option pricing where no closed-form solution is available, and show how the implied
volatility surface of one underlying stock can change under different market conditions.
Based on calculations with real market data, we show that computing these time-varying
quantities using traditional methods is very demanding in computational resources, which
points to a promising area of application for machine learning techniques.
1
2.1 A Brief Overview of Options
In finance, a derivative is a contract whose value is reliant upon an underlying asset.
Options are one type of derivative involving a pre-agreed upon price, known as the strike
price, and a specific expiry date beyond which the option has no value and can no longer
be exercised. While there are many types of options, the two most basic types are Calls
and Puts. A Call option gives the owner the right, but not the obligation, to buy the
underlying asset at the strike price K, while a Put option provides the right, but not the
obligation to sell the underlying at the strike price. In terms of time to exercise, these
two options can be further divided into two subgroups: American and European types.
European options can only be exercised at their expiry T , while American options can be
exercised at any time before, or on, the date of expiry. [Shr05]
Mathematically, the payoff of a Call option at expiry T is
$$P_{Call}(T) = \max\{0, S_T - K\},$$
because the option is in-the-money only if the underlying price ($S_T$) is greater than the
previously stipulated strike. If the stock price ($S_T$) is less than the strike, it makes no sense
to pay the strike value for the underlying, as it can be bought more cheaply on the market, and the
payoff is zero. Similarly, the payoff for a Put option is
$$P_{Put}(T) = \max\{0, K - S_T\},$$
because the option is in-the-money only if the underlying price (ST ) is less than the previous
stipulated strike (K). If the stock price is greater than the strike it makes no sense to sell
the stock for the lower strike price as there is a better deal on the market and thus the
payoff is zero. Graphically, Figure 1 shows the payoffs for Call and Put at expiry time T
respectively.
Note that it is possible to sell, or short, an option. In that case, the payoff diagram is
the same as for holding the option, but reflected across the $S_T$ axis. Mathematically, the
payoffs become $\min\{0, K - S_T\}$ for the Call option and $\min\{0, S_T - K\}$ for the Put
option.
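These payoff formulas translate directly into code. The short sketch below is our own illustration (the report's code appears only as figures in the Appendix); the function names are ours.

import numpy as np

def call_payoff(S_T, K):
    # Long call: max(0, S_T - K)
    return np.maximum(0.0, S_T - K)

def put_payoff(S_T, K):
    # Long put: max(0, K - S_T)
    return np.maximum(0.0, K - S_T)

# Example with strike K = 100 and a few terminal stock prices
S_T = np.array([80.0, 100.0, 120.0])
print(call_payoff(S_T, 100.0))  # [ 0.  0. 20.]
print(put_payoff(S_T, 100.0))   # [20.  0.  0.]
# A short position flips the sign, e.g. -call_payoff(S_T, K) = min(0, K - S_T)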
Definition 2.1. An n-step binomial tree model B(n, p, S, u, d) describes a financial market
which includes two assets; one is risk-free and earns a constant rate of return equal to r.
The price of the risky asset is given by (as demonstrated in Figure 2):
$$S_t = S_{t-1}\left(u\,\mathbf{1}_{\{X_t = 1\}} + d\,\mathbf{1}_{\{X_t = 0\}}\right), \qquad t = 1, \dots, n,$$
[1] Retrieved from Wikipedia, https://upload.wikimedia.org/wikipedia/commons/2/2e/Arbre_Binomial_Options_Reelles.png.
2
Figure 1: Payoffs of European Call and Put Options: (a) European Call; (b) European Put
where $0 < d < u$, the $X_t$ are independent random variables equal to 1 with probability p and
0 with probability $1 - p$, and $\mathbf{1}_{\{X_t = i\}}$ is the indicator function taking value 1 when the condition
specified is met and value 0 otherwise.
Definition 2.3. An asset pricing model is called arbitrage free if there exist no arbitrage
opportunities within the model.
In the next theorem we establish the necessary and sufficient conditions for the nonex-
istence of arbitrage in B(n, p, S, u, d).
Theorem 2.1. The model B(n, p, S, u, d) is arbitrage free if and only if d < (1 + r) < u.
3
Figure 2: Binomial Price Tree
(⇒) Suppose the given asset model is arbitrage free. This means a risk-neutral probability
measure exists which assigns to the event $\{X_i = 1\}$ the probability [Rup06]
$$q = \frac{(1 + r) - d}{u - d}.$$
Since this is a probability we know $0 \le q \le 1$, which implies
$$0 \le \frac{(1 + r) - d}{u - d} \le 1.$$
However, note that if q = 0, d = 1 + r. In this case, we can construct an arbitrage
opportunity as follows: borrow S from the Money Market Account and invest it in the
risky asset. At expiry t = 1, we have either dS or uS in the risky asset. So after paying
off the debt from the money market account, either we have a profit of (u − d)S or zero
dollars. Thus we have a positive probability of making money and zero probability of
losing, starting from zero initial capital, which is an arbitrage opportunity. This violates
our assumption that the asset model is arbitrage free.
Note also that if q = 1, u = 1 + r. In this case, we can again construct an arbitrage
opportunity: short the risky asset and put the S dollars into the Money Market Account.
Then at t = 1, we have uS dollars in the Money Market Account which we can use to
close our short position in the risky asset leaving us with a profit of either (u − d)S or zero
dollars. Again, this is an arbitrage opportunity. Thus, we know 0 < q < 1, and therefore
d < (1 + r) < u.
(⇐) If d < (1 + r) < u then the model is arbitrage free:
4
If $d < (1 + r) < u$, then
$$q = \frac{(1 + r) - d}{u - d}$$
defines the risk-neutral measure $\tilde{P}$ for the upward movement of S, and $0 < q < 1$. Under this
measure, every discounted portfolio process $D_t X_t$ is a martingale [Shr05], where $D_t = e^{-rt}$
is the discounting factor. Thus $\tilde{E}[e^{-rt} X_t] = X_0$, where $\tilde{E}$ is the expectation under the risk-neutral
measure [Shr10].
Suppose there exists an arbitrage strategy, so that the corresponding portfolio X satisfies
$X_0 = 0$, so $\tilde{E}[D_n X_n] = 0$. On the other hand, there is zero probability of losing money, so
$$\tilde{P}\{X_n < 0\} = 0.$$
Together they imply that
$$\tilde{P}\{X_n > 0\} = 0.$$
Since $P$ is equivalent to $\tilde{P}$, this must also hold for $P$, which contradicts that X is an
arbitrage. Therefore, the model must be arbitrage free.
In the rest of the discussion we focus on the CRR(n; S, r, σ, T) model, which is a special
case of the binomial tree model defined above with per-period interest rate $e^{r\Delta t} - 1$,
$u = e^{\sigma\sqrt{\Delta t}}$, $d = \frac{1}{u} = e^{-\sigma\sqrt{\Delta t}}$, and risk-neutral measure
$$q = \frac{e^{r\Delta t} - e^{-\sigma\sqrt{\Delta t}}}{e^{\sigma\sqrt{\Delta t}} - e^{-\sigma\sqrt{\Delta t}}},$$
where $\Delta t = T/n$. This is the counterpart of the Black-Scholes model in the discrete-time setting.
The value of a European option at each node is computed backwards through the tree:
$$C_{t - \Delta t, i} = e^{-r\Delta t}\left(q C_{t,i} + (1 - q) C_{t, i+1}\right), \qquad (1)$$
where q is the risk-neutral probability, and the terminal value (the starting point of the
calculation), $C_{T,i} = (S_{T,i} - K)^+$ for the call option and $C_{T,i} = (K - S_{T,i})^+$ for the put
option, is the payoff of the option at the expiration date.
Unlike European options, American options allow holders to exercise at any time up
to and including the expiration date. To properly calculate the output tree, we follow the
dynamic programming principle: first, we calculate the option value at each node if it is
not exercised at this point, using the expected discounted payoff under the risk neutral
measure, the same as for European options above. Then we update this value by allowing
early exercise, i.e., finding the maximum between the option value if the holder waits to
exercise later and the value if exercised now (see, e.g., the binomial options pricing model
entry on Wikipedia, https://en.wikipedia.org/wiki/Binomial_options_pricing_model). Thus the
value of the American call option $A_{t,i}$ can be calculated using the following recursive formula
(for put options, we only need to change the value of immediate exercise accordingly):
$$A_{t - \Delta t, i} = \max\left\{S_{t - \Delta t, i} - K,\; e^{-r\Delta t}\left(q A_{t,i} + (1 - q) A_{t, i+1}\right)\right\}. \qquad (2)$$
The terminal value is the same as for the European options, because at the expiration date
T , the holder of the option has to decide whether to exercise or not, and waiting is not an
option anymore.
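The recursion in Equation (2) can be implemented as a simple backward induction over the tree. The sketch below is our own illustration using the CRR parametrization introduced above (the report's implementation appears only as figures in Appendix 7.2.2); the function name and signature are assumptions, not the report's code.

import numpy as np

def crr_price(S0, K, r, sigma, T, n, kind="call", american=False):
    # Price an option on a CRR(n; S, r, sigma, T) tree by backward induction.
    dt = T / n
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    disc = np.exp(-r * dt)
    q = (np.exp(r * dt) - d) / (u - d)          # risk-neutral probability

    payoff = (lambda s: np.maximum(s - K, 0.0)) if kind == "call" \
        else (lambda s: np.maximum(K - s, 0.0))

    # Terminal stock prices S0 * u^j * d^(n-j), j = 0..n, and terminal payoffs
    j = np.arange(n + 1)
    V = payoff(S0 * u**j * d**(n - j))

    # Step backwards: discounted expectation, then (for American) early exercise
    for step in range(n, 0, -1):
        V = disc * (q * V[1:] + (1 - q) * V[:-1])
        if american:
            i = np.arange(step)
            S = S0 * u**i * d**(step - 1 - i)
            V = np.maximum(V, payoff(S))
    return V[0]

# Example: crr_price(100, 100, 0.05, 0.2, 1.0, 500, kind="put", american=True)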
return µ, the interest rate r, and the volatility σ are positive constants, $W_t$ is a Brownian
motion, and a risk-free asset $B_t$ follows $\frac{dB_t}{B_t} = r\,dt$. The stock price follows a lognormal
distribution:
$$\ln\frac{S_T}{S_0} \sim N\!\left(\left(r - \frac{\sigma^2}{2}\right)T,\ \sigma^2 T\right)$$
under the risk-neutral measure.
The Call and Put prices ($C_0$ and $P_0$ respectively) with maturity T and strike price K
have closed-form solutions in this model [Shr10]:
$$C_0 = S_0 \Phi(d_1) - K e^{-rT} \Phi(d_2)$$
and
$$P_0 = K e^{-rT} \Phi(-d_2) - S_0 \Phi(-d_1),$$
where
$$d_1 = \frac{1}{\sigma\sqrt{T - t}}\left[\ln\frac{S_0}{K} + \left(r + \frac{\sigma^2}{2}\right)(T - t)\right], \qquad d_2 = d_1 - \sigma\sqrt{T - t},$$
and Φ is the cumulative distribution function of the Standard Normal Distribution:
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{t^2}{2}}\, dt.$$
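For later reference, the closed-form prices above can be evaluated directly; the following is a minimal sketch of such a function (the report's own version is shown as Figure 43 in the Appendix), taking t = 0.

import numpy as np
from scipy.stats import norm

def bsm_price(S0, K, r, sigma, T, kind="call"):
    # Black-Scholes-Merton price of a European call or put at time 0
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    if kind == "call":
        return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)
    return K * np.exp(-r * T) * norm.cdf(-d2) - S0 * norm.cdf(-d1)

# Example: bsm_price(100, 100, 0.05, 0.2, 1.0) is approximately 10.45 for the call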
6
network iterates through compositions of simple functions, and steps towards the optimal
composition, which minimizes the aggregate error of the approximation over the batch, the
so-called loss function.
A neural network is typically made up of layers of neurons. These "neurons" are simple
functions with closed form expressions. A layer can have any number of neurons and a
network can contain any number of internal layers. By nature, a network will always
include an input and output layer. The input layer takes in the input and passes it into
each neuron in the following layer. The output layer takes all values produced by the last
layer and condenses them into the specified output size [PCa19]. For example, consider a
neural network with 1 layer containing only 1 neuron - a linear function. The network tries
to find the best linear function to approximate the relationship between the given batch of
(x, f(x)) pairs, which is actually a linear regression problem:
$$\min_{a, b}\,\big(f(x) - (ax + b)\big). \qquad (3)$$
Note that in this case, a unique solution is guaranteed if there are only two data points.
The relationship for any number of points greater than that can be solved exactly only
if the function f itself is linear, as explored in Section 3.1. If that is not the case, the
relationship can be approximated.
For more complex problems, we can construct neural networks with more layers and
neurons for better approximations. Consider another layer of one linear function added to
the above model, so that the result from equation (3) is passed as the input to another
linear function. Then the problem expands to approximating f with the composition of two
linear functions:
$$\min_{a_1, a_2, b_1, b_2}\,\Big(f(x) - \big((a_1 x + b_1) a_2 + b_2\big)\Big), \qquad (4)$$
where the $a_i$'s and $b_i$'s are constants to be chosen for the $i$th layer.
A single layer can also be made up of multiple neurons. In this case, each neuron is
a different simple function, and rather than a composition of functions like Equation (4),
this network processes multiple functions simultaneously and selects the best performing
for the next iteration using an activation function, discussed below. It is worth noting
that the number of neurons and layers directly impacts the computational load and time
required for the network to execute the approximation.
Figure 3 shows a basic neural network with 2 internal layers composed of 3 neurons
each. The network begins with the input-output pairs, (x, f (x)). The input layer passes x
to each neuron in the first internal layer. Each ai in this first layer is multiplied by x, which
represents a linear function in x. Then, ai x passes through an activation function. This
activation function chooses whether to turn this neuron's result "on" or "off", i.e., whether
to feed it into the next layer of neurons, based on whether the approximation is good enough. The
two most common types of activation functions are ReLU and Sigmoid,
7
Figure 3: A Basic Neural Network Diagram
$$\mathrm{ReLU}: \ \phi(x) = \max\{0, x\}, \qquad \mathrm{Sigmoid}: \ \phi(x) = \frac{1}{1 + e^{-x}}.$$
After ai x passes through the activation function, the constant bi is added and the result is
composed with each simple function of the next layer. Finally, each product of the neurons
of the second layer passes through another activation function. The results of the activation
functions are then passed into the output layer, where a single value is determined. During
the training of the model, this value, $N(x)$, is then compared to the true given value, $f(x)$.
The loss is assessed and adjustments are made to the $a_i$'s and $b_i$'s to minimize the
loss. Mathematically, the above procedure can be represented as:
$$N(x) = \phi_2\!\left(\phi_1\!\left(x \begin{bmatrix} a_{11} & a_{12} & a_{13} \end{bmatrix} + \begin{bmatrix} b_1 & b_1 & b_1 \end{bmatrix}\right) \begin{bmatrix} a_{21} \\ a_{22} \\ a_{23} \end{bmatrix} + b_2\right), \qquad (5)$$
where $\phi_i$ is an activation function, $a_{ij}$ is the coefficient associated with the $j$th neuron on
the $i$th layer, and $b_i$ is the constant associated with the $i$th layer.
The above procedure is repeated a large number of times. Each such iteration is
called an "epoch", which uses a newly generated data set and further reduces the estimation
error, until it is less than a pre-determined tolerance. The measure of accuracy for
the model training is defined as the Mean Squared Error (MSE),
$$MSE = \sum_{i=1}^{n} \big(f(x_i) - N(x_i)\big)^2,$$
where $N(x_i)$ is the value estimated by the neural network for each $x_i$. The MSE and the learning
rate, which determines how fast our neural network changes itself, are then used to optimize
8
Figure 4: The Neural Network in Python for Linear Functions
the neural network. In each epoch, the MSE is multiplied by the learning rate and the
value is used to modify the multiplier of each neuron. The complete code for this example
can be found in Appendix 7.2.1.
f (x) = x + 2. (6)
The neural network is defined using the PyTorch Python package, as shown in Figure 4.
There are 2 layers of 30 neurons each. Note that the step invoking the activation
function ReLU is removed from the network because the target function itself is linear,
and having ReLU in the network negatively impacts the accuracy and efficiency of the
model. After initializing the model, the training data, n = 1000 pairs of (x, f (x) = x + 2),
are randomly generated. These newly generated pairs are fed into the neural network for a
total of 10000 epochs.
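The structure just described can be reproduced in a few lines of PyTorch. The sketch below is our own reconstruction of the setup in this section and in Figure 4, not a copy of the Appendix code; the optimizer choice and learning rate are illustrative assumptions.

import torch
import torch.nn as nn

# Two hidden layers of 30 neurons each; no activation, since the target is linear
model = nn.Sequential(nn.Linear(1, 30), nn.Linear(30, 30), nn.Linear(30, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10000):
    x = torch.rand(1000, 1) * 10       # a freshly generated batch each epoch
    y = x + 2                          # target f(x) = x + 2
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if epoch % 1000 == 0:
        print(epoch, loss.item())      # print the loss periodically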
Once the model is trained, it is tested using another randomly generated batch of
input-output pairs (x, f (x)). The closer the predicted value N (x) from the neural network
is to the actual value f (x), the more accurate the network is. The results of the optimized
model are illustrated in Figure 5. The graph shows that the model is accurate since the
predicted results are linear and completely overlap the actual points.
In order to approximate non-linear functions, like option prices, we need to add activation
functions back to the neural network constructed above for linear functions. We
test the accuracy of this neural network on non-linear polynomials of different degrees,
and the results are poor (as shown in Figure 6) due to the insufficient number of neurons
9
Figure 5: Actual and Predicted Values of f (x) = x + 2
and layers. We carry out a sequence of experiments by increasing the number of neurons
and layers. The loss function decreases with the complexity of the neural network, and
the improvement of the fit is visibly noticeable in Figure 7, with results of the Sigmoid
activation function on the left and ReLU on the right. From our experiments, shown in
Appendix 7.2.3 and Figure 7, the number of neurons plays a larger role than the number
of layers in determining the accuracy of the predicted graph.
10
Figure 7: Approximation with 100 and 1000 neurons
11
(a) 4th-Degree Polynomial with 3 layers (b) 4th-Degree Polynomial with 4 layers
time efficiency. Further experiments with Sigmoid and ReLU as activation functions also
show that polynomials of different degrees work better with different setups. With the Sigmoid
function, the algorithm runs slightly faster compared to ReLU but is left with a larger loss for
the same number of neurons and layers, mainly because the approximation
tends to diverge from the true value for x's with large absolute values. Thus it would
be a good idea to include a model optimizer, which customizes the structure of the neural
network and which we could use at the later stages of our project.
The optimizer that we wrote requires four inputs: the maximum number of neurons,
the maximum number of layers to test, the acceptable loss, and the target function that we
want to test the neural network against. The optimizer then runs every combination, starting
from 2 neurons and 2 layers up to the input maxima. Two setup statistics are returned: one
is the setup with the lowest loss, and the other is the setup with the shortest execution time
whose loss is below the acceptable level of 0.001. For the later stages of our project, we mostly
use the second setup due to its practicality. The code for the setup-optimizer can be found in
Appendix 7.2.5.
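The idea of the setup-optimizer can be sketched as a plain grid search. The code below is our own illustration of that idea, not the report's implementation (which appears as Figures 56 and 57); the training length, learning rate, and input range are assumptions.

import time
import torch
import torch.nn as nn
import torch.nn.functional as F

def setup_optimizer(max_neurons, max_layers, acceptable_loss, target_fn,
                    epochs=2000, lr=1e-2):
    # Try every (layers, neurons) combination from 2 up to the given maxima.
    # Return the lowest-loss setup and the fastest setup below acceptable_loss.
    best = (None, float("inf"))      # (setup, loss)
    fastest = (None, float("inf"))   # (setup, seconds)
    x = torch.rand(500, 1) * 4 - 2   # training inputs in [-2, 2]
    y = target_fn(x)
    for layers in range(2, max_layers + 1):
        for neurons in range(2, max_neurons + 1):
            sizes = [1] + [neurons] * layers + [1]
            mods = []
            for i in range(len(sizes) - 1):
                mods.append(nn.Linear(sizes[i], sizes[i + 1]))
                if i < len(sizes) - 2:
                    mods.append(nn.Sigmoid())
            model = nn.Sequential(*mods)
            opt = torch.optim.Adam(model.parameters(), lr=lr)
            start = time.time()
            for _ in range(epochs):
                loss = F.mse_loss(model(x), y)
                opt.zero_grad()
                loss.backward()
                opt.step()
            elapsed, final = time.time() - start, loss.item()
            if final < best[1]:
                best = ((layers, neurons), final)
            if final < acceptable_loss and elapsed < fastest[1]:
                fastest = ((layers, neurons), elapsed)
    return best, fastest

# Example: setup_optimizer(10, 4, 0.001, lambda x: x**4 - 3 * x**2)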
Lastly, just to see how close we can get to 0 loss, we leave our machine running for a
few hours with over 10000 neurons in each layer. We are able to drive the loss function to 0
to more than 5 decimal places, but the process is impractical due to the amount of time
12
Figure 9: Approximated Graph vs True Graph with Different Activation Functions: (a) with ReLU; (b) with Sigmoid
needed. Nevertheless, we might see different results given a more powerful machine such as
a supercomputer.
13
Figure 10: Illustration of the BSM Neural Network
Once the model is trained, a second set of simulated stock prices is generated, which
are used both to calculate the true option price using the Black-Scholes formula, and the
approximation using the trained neural network. The true and predicted prices are plotted
against the underlying stock price in Figure 11. As illustrated in the graph, this first version
of the program performs poorly. Not only is the error large, the predictions are qualitatively
wrong: the predicted price is a linear function of the stock price with a slope of nearly zero.
Naturally, this leads to an investigation into why the model behaves this way. There
are possibly many approaches to improving the approximation (see e.g. [Lu+19]). After
some experimentation, we discover that uniformly transforming the data leads to better
results. Regarding all data points as a vector x, the transformed data vector $x_N$ is defined as
$$x_N = \frac{u - l}{\max(x) - \min(x)}\,(x - \min(x)) + l,$$
where u and l are the upper and lower scaling parameters for the transform respectively.
Different choices of l and u produce different accuracy in the approximation using neural
networks. After some experimentation, it was deduced that a normalization of the f(x)
vector (l = 0, u = 1) and a scaling of the x vector (l = −1, u = 1) correspond to the best
accuracy. The exact reason for this is unclear. However, it is possible that the network is
better able to detect the pattern f(x) with a less noisy data set. As seen in the left panel of
Figure 12, after the above linear transformation, the model is able to accurately calculate
14
Figure 11: Initial Result for the Black-Scholes Neural Network
(a) Normalized Training Data Results (b) Normalized Random Data Results
the values of the Call options for the training data. Additionally, as seen in the right panel
of Figure 12, the model is able to much more accurately predict the prices of the options
for random, normalized data.
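The transform itself is a one-line function. A minimal sketch (our own version of what Figure 50 in the Appendix shows as an image) is given below, together with an inverse mapping, which we add as an assumption, for reading predictions back on the original scale.

import torch

def linear_transform(x, l, u):
    # Rescale a tensor linearly so that its values lie in [l, u]
    return (u - l) * (x - x.min()) / (x.max() - x.min()) + l

def inverse_transform(y_scaled, y_min, y_max, l, u):
    # Map values in [l, u] back to the original range [y_min, y_max]
    return (y_scaled - l) * (y_max - y_min) / (u - l) + y_min

# Inputs scaled to [-1, 1]; targets (option prices) normalized to [0, 1]
# x_scaled = linear_transform(stock_prices, -1.0, 1.0)
# y_scaled = linear_transform(call_prices, 0.0, 1.0)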
While the result after the data transformation is vastly superior to the original one,
further experiments are performed by varying the number of neurons and layers. From
the data summarized in Table 1 and the figures in Appendix 7.3, it is clear that, in most
cases, increasing the number of neurons increases accuracy, but increasing the number of
layers increases computation time and yields a higher loss. In the table, the Model Column
denotes the number of neurons in the first, second, and third layer respectively. The Loss
column contains the remaining loss after 5000 iterations of training, and the Time Column
indicates the time for the training of each network in seconds. The total elapsed time
for each model is also plotted in Figure 13a. The two peaks in the graph correspond to
15
Table 1: Loss and training time for each neural network model

Model               Loss         Time (s)
1   [50, 12]        0.00057448   4.441021919
2   [50, 12, 10]    0.00055705   5.121997118
3   [50, 30]        0.00072565   4.549135208
4   [50, 50]        0.00055633   4.46467185
5   [50, 50, 100]   0.00111636   5.457895279
6   [50, 100]       0.00067071   4.867865562
7   [80, 20]        0.00055201   4.295755148
8   [100, 80]       0.00044758   4.736923695
9   [100, 100]      0.00052513   4.812922478
10  [100, 120]      0.00042859   5.106760263
11  [100, 150]      0.00049361   5.278343201
12  [500, 100]      0.00030057   7.834435225
Networks 2 and 5, which have three layers, and Figure 13b shows the times with these two
models removed. Typically, increasing neurons and adding another layer to an otherwise
identical model increases the computation time. The code of the Black-Scholes Neural
Network can be found in Appendix 7.2.4.
16
Figure 14: Loss Function of American Put Option per 500 Epoch
We run more tests on the neural network because, even though our input data are
randomly generated, they do not have the same volatility or magnitude as real
data. As such, the loss function may have been scaled down, and we could end up
misinterpreting the results. It turns out that the machine is in fact working better than we
anticipated. The reason for this unusually low loss function is probably the large
number of epochs. When we scale down the total number of epochs and print the loss 10 times as often,
we can clearly see the initial inaccuracy of the neural network and how fast it corrects
itself. Only 5000 epochs are needed to get an acceptable result from the predicted graph, as
shown in Figure 15 for American Call and Put options. Aside from the slight difference
for intermediate initial stock prices, the predicted graph closely resembles the true graph. In most
cases, the neural network is able to quickly converge to a loss of less than 0.001, and on rare
occasions down to 0.0001. The full code and output are provided in Appendix 7.2.2 and
Appendix 7.4.
17
(a) American Call True vs Predicted (b) American Put True vs Predicted
Graph Graph
Figure 15: American Call and Put Approximation with 5000 Epoch
works if and only if the activation function is not polynomial. In this section, we test this
theory with some examples.
Knowing that, given enough time, neurons, and layers, any graph can be approximated,
we want to test the efficiency of neural networks of different structures given limited
resources. To do so, we choose a fixed number of neurons, which we divide among 3
layers in one case, while keeping all of them in a single layer in the other.
From the approximation, both have roughly the same loss of 0.0003, as shown in Figure
16. However, the calculation for the single layer is much faster at 3.02 s, while
the 3-layer network takes 4.41 s. When we switch from Sigmoid to ReLU in the single
layer, the loss drops to 0, giving us a perfect prediction in Figure 17. With the above
demonstration of the Universal Approximation Theorem, for any financial quantity that
we want to approximate, given that it closely resembles a polynomial function, we can be
confident in accurately generating a predicted graph using a neural network with only one
layer of a sufficiently large number of neurons.
18
(a) Predicted Graph of 3 Layers with 6, (b) Predicted Graph of 1 layer with 14
5, and 3 neurons in each layer neurons
Figure 17: Predicted graph of 1 layer and 14 neurons with ReLU instead of Sigmoid
19
are easy to retrieve from the market. Volatility describes the variability of the stock price,
and past practice has been to use the value estimated from historical data as a proxy. This
backward-looking method attracts criticism because what happened in the past does not
necessarily indicate the future.
Implied volatility is a forward-looking alternative, in that we can use the market price of an
option contract and invert the Black-Scholes formula to obtain the market's expected
value of the underlying's volatility for the duration of the contract. To do so, we have to
calculate the Vega of the option.
Vega measures the sensitivity of the option's premium to volatility. It is important
to notice that higher volatility is better for both Calls and Puts, as it increases
the probability of the option ending up in the money. This agrees with the following
closed-form expression from the Black-Scholes formula, which is always strictly positive
and is the same for both call and put options:
$$\mathrm{Vega} = S N'(d_1) \sqrt{T}. \qquad (7)$$
In the following, we demonstrate the derivation of the implied volatility using call
option prices; the calculation for the put option follows the same procedure. With
Vega being strictly positive, the option price as a function of the volatility is invertible, and we
can calculate the implied volatility by matching the market price $C_{mkt}$ and the theoretical
value $C_{BSM}(\sigma)$.
The equation $C_{mkt} - C_{BSM}(\sigma) = 0$ does not have a closed-form solution, so we
need help from numerical methods. In numerical analysis, Newton's method is used for
approximating the roots of differentiable functions. It is an iterative method; each
step uses the intersection of the tangent line at the current point $x_n$ with the x-axis to determine
the next value $x_{n+1}$:
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \qquad (8)$$
The procedure continues until the absolute value of xn+1 − xn is smaller than a pre-set
threshold. Figure 18 illustrates this process, where xn ’s are the sequence of approximations
for the implied volatility and $f(x) = C_{mkt} - C_{BSM}(x)$. The Python code of this calculation
is in Section 7.5.
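A minimal sketch of this root-finding step is given below (the report's own implementation is reproduced only as figures in Section 7.5). It reuses the closed-form call price from Section 2 and the Vega from Equation (7); the starting guess and tolerance are illustrative assumptions.

import numpy as np
from scipy.stats import norm

def bsm_call(S0, K, r, sigma, T):
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def implied_vol(C_mkt, S0, K, r, T, sigma0=0.2, tol=1e-8, max_iter=100):
    # Newton's method on f(sigma) = C_mkt - C_BSM(sigma), where f'(sigma) = -Vega
    sigma = sigma0
    for _ in range(max_iter):
        d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
        vega = S0 * norm.pdf(d1) * np.sqrt(T)                  # Equation (7)
        step = (C_mkt - bsm_call(S0, K, r, sigma, T)) / vega   # equals -f/f'
        sigma += step
        if abs(step) < tol:
            break
    return sigma

# Example: implied_vol(10.45, 100, 100, 0.05, 1.0) is approximately 0.20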
20
Figure 18: Illustration of Newton’s Method
Figure 19: Volatility Surface of BAC: (a) December 2019; (b) January 2020
The following are some examples of how the implied volatility surface, viewed as a function
of the underlying asset price and time to maturity, can change. Figure 19 compares the surfaces
for call options on the stock of Bank of America (NYSE: BAC) in December 2019 and January 2020.
The surface's shape for the two dates is very different. This change happens due to
different market expectations of the performance of the BAC stock. An implied volatility
surface is not stable: news and changes in economic policy or in the balance sheet affect
both the liquidity and the pricing of option contracts.
The next example, in Figure 20, shows the volatility surface for call options on the stock of
Microsoft (NASDAQ: MSFT) in January 2020. It highlights that, although the majority of deep
out-of-the-money and deep in-the-money contracts have relatively higher prices, or equivalently,
higher implied volatilities, the surface does not form the expected "smile". As
mentioned above, cases like this are not uncommon, because option prices change in
accordance with the market's beliefs.
Analysis of the implied volatility surface assists traders, risk managers, and mathematicians
on a daily basis. Approximating implied volatility surfaces could help traders,
especially high-frequency traders, obtain accurate estimates quickly, allowing finance pro-
21
Figure 20: Volatility Surface MSFT Jan 2020
fessionals to leverage new mathematical models when devising and executing investment
strategies. However, the need for faster computations highlights the limitations of
traditional numerical methods. The use of machine learning may help estimate the
volatility surface from fewer data points (each of which requires a numerical calculation) than
what is needed for the traditionally used interpolation methods, and ease the computationally
heavy burden brought by the large number of strikes and maturities and the ever-changing
option prices.
6 Conclusion
Our results show that neural networks are an effective tool for predicting the trend of many
data sets. By training them against the CRR and BSM models, we were able to reach desirable
results on estimating option prices relatively quickly, within 5 to 10 seconds. This
approach can help approximate functions or models that cannot be solved using traditional
methods due to their complexity or the limited resources available for the computation.
Machine learning techniques can act as a malleable model that gives us a "good
enough" approximation of a function in a reasonable timeframe. Even though there are
still some limitations due to this being model-based machine learning, we believe that neural
networks can lay the foundation for future development in computational finance.
22
7 Appendix
7.1 Derivation of Vega
$$\frac{\partial C}{\partial \sigma} = S\,\frac{\partial N(d_1)}{\partial \sigma} - K e^{-rT}\,\frac{\partial N(d_2)}{\partial \sigma} \qquad (9)$$

$$= S\,\frac{\partial N(d_1)}{\partial d_1}\,\frac{\partial d_1}{\partial \sigma} - K e^{-rT}\,\frac{\partial N(d_2)}{\partial d_2}\,\frac{\partial d_2}{\partial \sigma} \qquad (10)$$

$$= S\,\frac{1}{\sqrt{2\pi}}\, e^{-\frac{d_1^2}{2}}\; \frac{\sigma^2 T^{\frac{3}{2}} - \left[\ln\frac{S}{K} + \left(r + \frac{\sigma^2}{2}\right)T\right] T^{\frac{1}{2}}}{\sigma^2 T} \;-\; K e^{-rT}\,\frac{S}{K}\, e^{rT}\,\frac{1}{\sqrt{2\pi}}\, e^{-\frac{d_1^2}{2}}\; \frac{-\left[\ln\frac{S}{K} + \left(r + \frac{\sigma^2}{2}\right)T\right] T^{\frac{1}{2}}}{\sigma^2 T} \qquad (11)$$

$$= S\,\frac{1}{\sqrt{2\pi}}\, e^{-\frac{d_1^2}{2}}\; \frac{\sigma^2 T^{\frac{3}{2}} - \left[\ln\frac{S}{K} + \left(r + \frac{\sigma^2}{2}\right)T\right] T^{\frac{1}{2}}}{\sigma^2 T} \;-\; S\,\frac{1}{\sqrt{2\pi}}\, e^{-\frac{d_1^2}{2}}\; \frac{-\left[\ln\frac{S}{K} + \left(r + \frac{\sigma^2}{2}\right)T\right] T^{\frac{1}{2}}}{\sigma^2 T} \qquad (12)$$

$$= S N'(d_1) \sqrt{T}. \qquad (13)$$
23
Figure 22: Target function
24
Figure 26: Randomized Training Data
25
Figure 28: Example of Loss Printed per 10 epoch
26
7.2.2 Approximating the CRR Model Using a Neural Network
27
Figure 32: Python Code for Option Model
28
Figure 34: Python Code for Training Data
29
Figure 36: 2 layers and 9 neurons in each layer
30
Figure 39: 5 layers and 3 neurons in each layer
31
Figure 41: Definition of the Vanilla Option Class
32
Figure 43: Definition of the Black-Scholes-Merton Formula
Figure 44: Definition of a Function to Calculate a Group of BSM Prices Given a Tensor
33
Figure 45: Definition of a function to Calculate BSM Price given a Single Underlying and
Strike Price And Creation of a Random List of Strike Prices For Model Training
34
Figure 50: Definition of the Linear Transform Function
35
Figure 52: Loss Per Epoch while Training the Model
Figure 54: Testing of the Model Versus the Training Data. See Figure 12b for an Example
of Trained Data
36
Figure 55: Testing of the Model Using Randomized Data. See Appendix 7.3 for Example
Output
37
7.2.5 Setup-Optimizer
38
7.3 True vs Prediction Graphs
(a) Results with 2 Internal Layers, H1 (b) Results with 2 Internal Layers, H1
= 50 and H2 = 30 = 50 and H2 = 50
(c) Results with 2 Internal Layers, H1 (d) Results with 2 Internal Layers, H1
= 50 and H2 = 100 = 80 and H2 = 20
(e) Results with 2 Internal Layers, H1 (f) Results with 2 Internal Layers, H1
= 100 and H2 = 80 = 100 and H2 = 100
39
(a) Results with 2 Internal Layers, H1 (b) Results with 2 Internal Layers, H1
= 100 and H2 = 150 = 500 and H2 = 100
(c) Results with 3 Internal Layers, H1 (d) Results with 3 Internal Layers, H1
= 50, H2 = 12, and H3 = 100 = 50, H2 = 50, and H3 = 21
40
7.4 Loss Function
41
Figure 62: Loss Function of American Call Option per 50 Epoch
42
Figure 65: Implied Volatility Market Price
43
References
[BS73] Fischer Black and Myron Scholes. “The Pricing of Options and Corporate Lia-
bilities”. In: The Journal of Political Economy 81.3 (1973), pp. 637–654.
[FPS03] Jean-Pierre Fouque, George Papanicolaou, and K. Ronnie Sircar. Derivatives in
Financial Markets with Stochastic Volatility. Cambridge University Press, 2003,
p. 201.
[Shr05] Steven Shreve. Stochastic Calculus for Finance I: The Binomial Asset Pricing
Model. Springer Finance, 2005.
[Rup06] David Ruppert. Statistics and Finance: An Introduction. Springer, 2006.
[Shr10] Steven Shreve. Stochastic Calculus for Finance II: Continuous Time Models.
Springer Finance, 2010.
[Lu+19] Lu Lu et al. “Dying relu and initialization: Theory and numerical examples”. In:
arXiv preprint arXiv:1903.06733 (2019).
[PCa19] Adam Paszke, Soumith Chintala, and Edward Yang et al. Pytorch Documenta-
tion. 2019.
44