Prediction of Stock Returns using Machine Learning
A project report submitted
to
MANIPAL ACADEMY OF HIGHER EDUCATION
For Partial Fulfillment of the Requirement for the
Award of the Degree
of
Bachelor of Technology
in
Information Technology
by
Sahil Singh
Reg. No. 160911051
Under the guidance of
Dr. Sanjay Singh
Professor
Department of I & CT
Manipal Institute of Technology
AUGUST 2020
I dedicate my thesis to my friends and family.
DECLARATION
I hereby declare that this project work entitled Prediction of Stock Returns
Using Machine Learning is original and has been carried out by me
in the Department of Information and Communication Technology of Manipal
Institute of Technology, Manipal, under the guidance of Dr. Sanjay Singh,
Professor, Department of Information and Communication Technology,
M. I. T., Manipal. No part of this work has been submitted for the award of a
degree or diploma either to this University or to any other University.
Place: Manipal
Date: 12-08-20
Sahil Singh
CERTIFICATE
This is to certify that this project entitled Prediction of Stock Returns
using Machine Learning is a bona fide project work done by Mr. Sahil
Singh (Reg. No. 160911051) at Manipal Institute of Technology, Manipal,
independently under my guidance and supervision for the award of the Degree
of Bachelor of Technology in Information Technology.
Dr. Sanjay Singh
Professor
Department of I & CT
Manipal Institute of Technology
Manipal, India
Dr. Balachandra
Professor & Head
Department of I & CT
Manipal Institute of Technology
Manipal, India
ACKNOWLEDGEMENTS
I would like to thank my internal guide for this project, Dr. Sanjay Singh,
who guided me in the right direction to conduct my research on the subject. I
would also like to thank my college, Manipal Institute of Technology, for
providing the lab in which I carried out my project work.
ABSTRACT
Financial market forecasting has long been a challenging problem for both
researchers and industry practitioners, as markets generally have a very low
signal-to-noise ratio. It can therefore be considered one of the toughest
problems in the machine learning domain. There have been attempts to solve
this problem using modified ML techniques. For instance, McNally, Roche, and
Caton [8] tried to predict the directional movement of the price of Bitcoin
using LSTM networks. They show that LSTM networks outperform standard time
series forecasting techniques like ARIMA, although the best accuracy achieved
is only 52 percent, which further demonstrates the difficulty of financial
market prediction.
It was clear that some data transformation techniques would be needed
to modify the noisy price data before using it as input to the model. Two
approaches have been taken to solve the prediction problem: fundamental
analysis, in which key financial ratios that reflect the health of the
company are used as inputs, and technical analysis, in which transformations
are applied to the historical price data to smooth out the inherent noise
before it is given to the model, similar to time series analysis.
The yearly predictions using fundamental analysis inputs yielded an MCC of
0.11, indicating a weak positive correlation between predictions and actual
results. The technical analysis process yielded an accuracy of 58 percent in
directional predictions.
CCS CONCEPTS
• Applied computing → Forecasting; Decision analysis;
• Computing methodologies → Neural networks; Deep belief networks;
Contents
Acknowledgements
Abstract
List of Tables
List of Figures
Abbreviations
1 Introduction
1.1 Problem Definition
1.2 Objectives
2 Methodology
2.1 Fractal Structure of the Markets
2.1.1 Estimation of Hurst’s Exponent
2.1.2 Use of Hurst’s Exponent
2.1.3 Hurst Exponent as a Feature
2.2 Using Control Systems theory to Filter Time Series Data
2.2.1 Butterworth Filter
2.2.2 High Pass Digital Filters
2.2.3 The Problem of Spectral Dilation
2.2.4 Automatic Gain Control
2.2.4.1 Calculation of K
2.2.5 Roofing Filter
2.3 RSI
2.3.1 Modified RSI
2.4 MLP-LSTM
2.5 Prediction using Fundamental Analysis
2.5.1 General Approach
2.5.2 Normalization Details
2.5.3 Features of the Data
2.5.4 AdaBoost Classifier
3 Results
3.1 Results on Fundamental Analysis
3.2 Results on Technical Analysis
4 Conclusion
Appendices
A
A.1 Butterworth Filter
A.2 High pass filter
References
Project Detail
List of Tables
3.1 Performance of Neural Net and AdaBoost Classifier in Fundamental Analysis classification task
A.1 Project Detail
List of Figures
2.1 A visual representation of behaviour of time series based on the Hurst’s Exponent value
2.2 Power-frequency and phase-frequency plot for Butterworth filter
2.3 Power-frequency and phase-frequency plot for Moving Average filter
2.4 High Pass Filter Gain-Frequency plot
2.5 Details of the MLP-LSTM neural network used
2.6 MLP-LSTM
ABBREVIATIONS
LDA : Latent Dirichlet Allocation
API : Application Programming Interface
ML : Machine Learning
LSTM : Long Short-Term Memory networks
ARIMA : Autoregressive Integrated Moving Average
MCC : Matthews Correlation Coefficient
CNN : Convolutional Neural Network
tp : True Positives
fp : False Positives
fn : False Negatives
tn : True Negatives
CAPM : Capital Asset Pricing Model
Chapter 1
Introduction
The problem of predicting future stock returns is among the most
challenging machine learning problems: in financial markets there is
no guarantee that a pattern exists, and the signal-to-noise ratio of
the data is very low. The risk of overfitting increases as a result,
and novel techniques have to be devised to deal with such problems.
There are two common approaches to predicting returns: fundamental
analysis (which relies on financial statements for prediction) and
technical analysis (time series prediction). In this project, an effort
has been made to devise an algorithm that combines both approaches.
Researchers have made efforts in this domain. For example, McNally,
Roche, and Caton [8] used LSTM to forecast the direction of the price
of Bitcoin. While they found that LSTM performs better than benchmark
techniques like ARIMA, it achieved only 52 percent accuracy, which
limits the practical application of the model to real-world trading.
This project attempts to achieve greater accuracy.
Algorithmic trading is on the rise everywhere due to the availability
of large amounts of data. It is important for an individual investor
or an investment firm to take advantage of these developments to
remain competitive.
The technical analysis system attains an accuracy of 58 percent in
predicting the direction of returns 10 days ahead, while the
fundamental analysis system attains an MCC of 0.11. It is important
to note that the results of these models could be further improved by
using more relevant data. Hedge funds and other investment firms
generally have access to alternate datasets that are not freely
available, and can therefore further improve the models using their
own data.
Recently, a technique called manifold mixup (Verma et al. [10]) has
been devised to address the problem of overfitting in neural networks
in classification problems. The basic idea is to treat the output as a
weighted combination of the classes, so that the resulting continuous
values produce a smooth decision function that is less likely to
overfit. It is useful for reducing overfitting as well as for data
augmentation in cases where sufficient data is not available. In this
project, modifications have been made to use it to solve the problem
of classification with imbalanced classes.
Modelling spatio-temporal time series requires modelling both the
temporal and the spatial aspects of the data. One approach to such a
problem was taken by Wang et al. [11], who used a CNN-LSTM neural
network for dimensional sentiment analysis of textual data. Each
sentence of a text was treated as a separate region and encoded by its
own CNN, and the outputs of these CNNs were fed as input to the LSTM
blocks. In this project a similar strategy has been devised, wherein
an MLP-LSTM neural network is used to model the multivariate OHLC
(Open, High, Low and Close of each day for a given stock) time series
data.
1.1 Problem Definition
• The fractal nature of the markets, which plays an important
role in understanding the state of the markets and in effect
any time series, has largely been ignored by practitioners.
• Traditional time series analysis uses ad hoc methods like
Moving Averages for smoothing and data transformation,
which introduce lag and do not cleanly remove high-frequency
noise.
• Practitioners often do not use proper metrics for judging
the performance of a classifier.
1.2 Objectives
• To use well studied techniques for the pre-processing of fi-
nancial time series data.
• Develop an approach specifically tailored for a multivariate
time series.
• Use Machine Learning Classification techniques on financial
statements to predict their annual stock returns.
Chapter 2
Methodology
2.1 Fractal Structure of the Markets
It has long been suspected that markets have a fractal structure,
i.e. the time series looks the same whether one samples the data
hourly, daily, weekly, etc. This makes intuitive sense, as the shape
of the time series curve is related to the fluctuation in prices,
which is a reflection of risk. Therefore, for a market to be stable,
the inherent risk should scale with the time horizon of investment
according to a fractal law. The problem of determining the Hurst’s
exponent of a time series was first studied in the field of
hydrology. Hurst [5] studied the problem of predicting river and lake
levels for the design of reservoirs, using past data to predict river
levels for the forthcoming year. This work gave rise to what is known
as Hurst’s exponent, which characterizes the nature of a time series,
i.e. whether it is mean reverting, persistent, or following a random
Brownian motion. The Hurst’s exponent H is defined by the following
relation.
\[
E\left[\frac{R(n)}{S(n)}\right] = C\,n^{H} \quad \text{as } n \to \infty \tag{2.1}
\]
where R(n) is the range of the cumulative sum of deviations from the mean,
S(n) is the standard deviation,
E[·] is the expected value,
n is the time period of observation over which the above quantities are
measured, and
C is a constant.
In a time series that is self-similar, H is related to the fractal
dimension D by D = 2 − H, such that 1 < D < 2. For more details
on the fractal dimension, see Mandelbrot [7].
2.1.1 Estimation of Hurst’s Exponent
There are many techniques to estimate the Hurst’s exponent; the one
used in this project is Rescaled Range (R/S) analysis as given in
Gilmore et al. [4].
A time series of length N is divided into shorter sub-series of
lengths n = N, N/2, N/4, ..., and the rescaled range is then
calculated for each sub-series.
Consider a time series of length n, $X = X_1, X_2, \ldots, X_n$.
1. The mean m is given by $m = \frac{1}{n}\sum_{i=1}^{n} X_i$.
2. The next step is to create a mean-adjusted series: $Y_t = X_t - m$
for $t = 1, 2, 3, \ldots, n$.
3. We calculate the cumulative sum of the deviations:
$Z_t = \sum_{i=1}^{t} Y_i$ for $t = 1, 2, 3, \ldots, n$.
4. Then we compute the range R:
$R(n) = \max(Z_1, Z_2, \ldots, Z_n) - \min(Z_1, Z_2, \ldots, Z_n)$.
5. The standard deviation S of the considered interval is calculated:
$S(n) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - m)^2}$.
6. The Hurst exponent is then calculated by performing a linear
regression of $\log\left[\frac{R(n)}{S(n)}\right]$ on $\log n$, where
the slope of the line gives the value of H.
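The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the project: the function names `rescaled_range` and `hurst_exponent`, the `min_chunk` cut-off, and the choice to average R/S over all non-overlapping sub-series of a given length are assumptions made here for the example.

```python
import math

def rescaled_range(x):
    """R(n)/S(n) for one sub-series, following steps 1 to 5."""
    n = len(x)
    m = sum(x) / n                      # step 1: mean
    z, running = [], 0.0
    for value in x:                     # steps 2 and 3: cumulative deviations
        running += value - m
        z.append(running)
    r = max(z) - min(z)                 # step 4: range
    s = math.sqrt(sum((v - m) ** 2 for v in x) / n)  # step 5: std deviation
    return r / s if s > 0 else 0.0

def hurst_exponent(series, min_chunk=8):
    """Slope of log(R/S) against log(n) for n = N, N/2, N/4, ... (step 6)."""
    total = len(series)
    log_n, log_rs = [], []
    size = total
    while size >= min_chunk:            # min_chunk is an assumed cut-off
        chunks = [series[i:i + size] for i in range(0, total - size + 1, size)]
        values = [v for v in (rescaled_range(c) for c in chunks) if v > 0]
        if values:
            log_n.append(math.log(size))
            log_rs.append(math.log(sum(values) / len(values)))
        size //= 2
    if len(log_n) < 2:
        raise ValueError("series too short for a regression")
    mean_x = sum(log_n) / len(log_n)
    mean_y = sum(log_rs) / len(log_rs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_n, log_rs))
    sxx = sum((a - mean_x) ** 2 for a in log_n)
    return sxy / sxx                    # ordinary least-squares slope
```

For a 150-point window the loop would use sub-series of lengths 150, 75, 37, 18 and 9. Estimates from so few points are noisy, so the value is better read as a rough indicator than as a precise measurement.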
2.1.2 Use of Hurst’s Exponent
The value of Hurst’s exponent tells us about the nature of the
time series, which in turn gives us information about the certainty
of the prediction. An H value of 0.5 tells us that the time series
behaves like a random walk (Brownian motion) and hence is random,
indicating that it is not possible to make accurate predictions. For
H > 0.5 we say that the time series is persistent, which means that
it tends to keep moving in the same direction, whereas for H < 0.5
it is mean reverting, which means it is likely to be range bound.
Figure 2.1: A visual representation of behaviour of time series based on the Hurst’s
Exponent value
2.1.3 Hurst Exponent as a Feature
In this project, the Hurst’s Exponent has been used as a feature
of the multivariate time series. Since our model gives the
probability of an up or down move for a given period, the value of
the exponent is an important indicator of the nature of the most
recent time series, as described in the previous section, which
in turn tells the model whether it is possible in the first place
to make an accurate prediction. The value of Hurst’s Exponent
is calculated using the closing prices of the previous 150 days.
Therefore, for every day, the corresponding value is calculated
using a look-back period of 150 days.
2.2 Using Control Systems theory to Filter
Time Series Data
The principles of Control Systems theory have been widely studied
and applied in the field of electronics, mainly for the purpose of
smoothing the incoming signal or getting rid of unwanted
frequencies. Since any time series can be regarded as a signal
(analog or digital), these same principles can be applied for
modifying or cleaning the time series data before entering it into
our prediction model.
Definition 2.1 (Laplace Transform) The Laplace transform is the
mapping of the function f(t), where t is time, into the s-plane such
that
\[
F(s) = \int_{0}^{\infty} e^{-st} f(t)\,dt \tag{2.2}
\]
where s is a complex number, s = a + ib.
Definition 2.2 (Transfer Function) A transfer function is a
function that denotes the mapping of input to the output, generally
denoted as
\[
\frac{O(s)}{I(s)} = H(s) \tag{2.3}
\]
where O(s) is the output and I(s) is the input.
Definition 2.3 (Z-Transform) The Z-transform is obtained from the
Laplace transform through the substitution $Z = e^{s\Delta}$, where
$\Delta$ is the sampling interval, so that $Z^{-1}$ represents a delay
of one sample. The following relation holds when applying it to a
function f(t):
\[
f(t-k) = Z^{-k} f(t) \tag{2.4}
\]
Definition 2.4 (Moving Average) A Moving Average, F(t),
of a time series f(t) at a point t is defined as
\[
\frac{F(t)}{f(t)} = \frac{Z^{-1} + Z^{-2} + \cdots + Z^{-N}}{N} \tag{2.5}
\]
where N is the period of the moving average.
Definition 2.5 (Gain (dB)) Gain in decibels for an angular
frequency ω is given by
\[
G = 20\log_{10}\left(|H(\omega)|\right) \tag{2.6}
\]
where H(ω) is the transfer function.
For our purpose, we will be using digital filters, since financial
market time series data is sampled at discrete intervals rather than
being continuous.
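As an illustration of how the definitions of the moving average and of gain fit together, the following sketch evaluates the moving-average transfer function on the unit circle, i.e. with each Z^{-k} replaced by e^{-iωk}, and reports its gain in decibels. The function name `moving_average_gain_db` is hypothetical, and ω is assumed to be expressed in radians per sample.

```python
import cmath
import math

def moving_average_gain_db(omega, n):
    """Gain (dB) of an n-period moving average at angular frequency omega.

    The transfer function (Z^-1 + ... + Z^-n) / n is evaluated with
    Z^-k replaced by exp(-1j * omega * k).
    """
    h = sum(cmath.exp(-1j * omega * k) for k in range(1, n + 1)) / n
    magnitude = abs(h)
    if magnitude == 0:
        return float("-inf")            # exact null of the filter
    return 20 * math.log10(magnitude)

# Gain of a 10-period average at a few frequencies (radians per sample).
for omega in (0.0, 0.1, 0.5, 1.0, 2.0):
    print(omega, round(moving_average_gain_db(omega, 10), 2))
```

At ω = 0 the gain is 0 dB, so a constant series passes unchanged. At higher frequencies the response falls in a series of lobes rather than monotonically.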
2.2.1 Butterworth Filter
The simplest and most widely used technique for smoothing data is
the moving average, as given in Definition 2.4. However, obtaining
more smoothing requires more data points as inputs to the Moving
Average, which creates lag, i.e. the filter takes time to react to
changes. Traders want an entry signal as fast as possible, but for
our model to interpret the data properly we also want sufficient
smoothing. This trade-off is addressed by the digital version of the
Butterworth filter as given in Ehlers [1]. The Butterworth filter
removes the higher frequencies in a signal or time series so that
the result is less noisy. Moving Averages do not perform this removal
of high-frequency components.
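The equations of a two-pole Butterworth filter are as follows:
\[
\begin{aligned}
a &= e^{-\sqrt{2}\,\pi/T} \\
b &= 2a\cos\left(1.414 \cdot 1.25 \cdot \pi / T\right) \\
c_2 &= b \\
c_3 &= -a^{2} \\
c_1 &= 1 - c_2 - c_3 \\
O(t) &= c_1 I(t) + c_2 O(t-1) + c_3 O(t-2)
\end{aligned}
\tag{2.7}
\]
Since $c_1 = 1 - c_2 - c_3$, the filter has unit gain at zero frequency.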
Here O(t) is the filtered time series and I(t) is the original
time series. T is the time period corresponding to the desired
cutoff frequency, F = 1/T, such that only frequencies below F
are retained. As can be seen in Figure 2.2, the gain response is
flat for frequencies below the cutoff frequency, after which it
drops sharply, eliminating high-frequency components. In comparison,
a Moving Average filter does not remove the higher-frequency
components and its frequency response is not smooth, as shown in
Figure 2.3.
The code is given in Appendix A.1.
Figure 2.2: Power,Frequency and phase,frequency plot for Butterworth filter
Figure 2.3: Power,Frequency and phase,frequency plot for Moving Average filter
2.2.2 High Pass Digital Filters
A fundamental requirement of time series analysis is making the
time series stationary, that is, having constant mean and variance
throughout the series. This is usually attempted by taking the
difference of successive terms. However, this renders the resulting
output quite jittery, not to mention the high-frequency noise
already present in such data. Therefore a balance is required between
high-frequency and low-frequency components. This is where high-pass
filters come in. They attenuate the frequency components below
the cutoff frequency and let the higher ones pass.
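The following is the transfer function of a single-pole high-pass
filter:
\[
\begin{aligned}
\alpha &= \frac{\cos\left(0.707 \cdot 2\pi / T\right) + \sin\left(0.707 \cdot 2\pi / T\right) - 1}{\cos\left(0.707 \cdot 2\pi / T\right)} \\
\frac{O(t)}{I(t)} &= \frac{\left(1 - \alpha/2\right)\left(1 - Z^{-1}\right)}{1 - (1 - \alpha)Z^{-1}}
\end{aligned}
\tag{2.8}
\]
where I(t) is the input time series and O(t) is the output time
series.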
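Reading Z^{-1} as a one-step delay, the transfer function above corresponds to the recursion O(t) = (1 − α/2)(I(t) − I(t−1)) + (1 − α)O(t−1). A minimal sketch of that recursion follows; the function name `highpass1` and the zero initial condition are assumptions made for this example.

```python
import math

def highpass1(series, period):
    """Single-pole high-pass filter written as a difference equation."""
    x = 0.707 * 2 * math.pi / period
    alpha = (math.cos(x) + math.sin(x) - 1) / math.cos(x)
    out = [0.0] * len(series)           # assumed start-up value O(0) = 0
    for t in range(1, len(series)):
        out[t] = ((1 - alpha / 2) * (series[t] - series[t - 1])
                  + (1 - alpha) * out[t - 1])
    return out

# A constant series has no component above zero frequency, so it is removed.
print(highpass1([5.0] * 10, period=48))
```

A constant input is mapped to zero, while a series alternating between +1 and −1, the highest frequency a sampled series can carry, passes with an amplitude close to one once the start-up transient has decayed.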
Figure 2.4: High Pass Filter Gain-Frequency plot
2.2.3 The Problem of Spectral Dilation
Spectral Dilation, as described in Ehlers [2], is the increase in
the amplitude of signal components as their frequency decreases. This
results in the output signal being dominated by these lower
frequencies. This effect is again due to the fractal nature of the
time series: when the time interval under consideration increases,
the range of price swings also increases, thus increasing the
amplitude. The power, or gain, increases in proportion to
$1/F^{\alpha}$, where F is the frequency and α = 2H, with H being the
Hurst’s Exponent of the time series. The amplitude increases at
6 dB per octave for α = 1, i.e. for a time series that is a Brownian
motion. For persistent time series the increase is even greater. A
single-pole high-pass filter attenuates at a rate of only −6 dB per
octave and is therefore not enough for persistent time series. For
this reason we use a two-pole high-pass filter, so that the
attenuation exceeds the spectral dilation gain. The code for
a two-pole high-pass filter is provided in Appendix A.2.
2.2.4 Automatic Gain Control
It is well known that many ML algorithms perform best when the
input is normalized to the range −1 to 1 or 0 to 1. In this project,
the technique of Automatic Gain Control described in Ehlers [3] is
applied to the filter output to maintain a steady gain. The steps
taken are:
1. The peak value is initially set to 0. The first value of the
series is then set as the peak.
2. Moving forward through the series, check whether the value at the
current time step is greater than the peak. If it is not, the
new peak value is calculated as Peak(t) = Peak(t−1) × K, where
K is decided beforehand. If the value at the current time step
is greater than the peak, the peak takes this new value.
3. If the peak value is greater than 0, divide the value at the
current time step by the peak value.
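The procedure can be sketched as a short loop. This is an illustrative version rather than the project's code: the function name `automatic_gain_control` is hypothetical, and two choices are assumptions made here so that the output stays within −1 to 1. The peak is decayed before the comparison, and the comparison uses absolute values because the filtered series swings around zero.

```python
def automatic_gain_control(series, k=0.991):
    """Normalize a zero-centred series by a slowly decaying running peak."""
    peak = 0.0
    out = []
    for value in series:
        peak *= k                       # decay the previous peak
        if abs(value) > peak:           # a larger swing resets the peak
            peak = abs(value)
        out.append(value / peak if peak > 0 else 0.0)
    return out

print(automatic_gain_control([4.0, 1.0], k=0.5))   # [1.0, 0.5]
```

With K close to 1 the peak decays slowly, so the normalization adapts gradually when the amplitude of the swings shrinks.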
2.2.4.1 Calculation of K
The gain decay factor for a theoretical sine wave would be
\[
\text{Gain} = K^{\text{Period}/2} \tag{2.9}
\]
Since we will be considering only periods between 10 and 48
days in our project, the effective gain becomes $K^{24-5} = K^{19}$. A
reasonable value of attenuation would be −1.5 dB, therefore we get
\[
-1.5 = 20\log_{10}\left(K^{19}\right)
\]
solving which gives a K value of 0.991.
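The arithmetic can be checked directly: rearranging the relation gives K = 10^(−1.5 / (20 × 19)). The helper name `agc_decay_factor` below is hypothetical and exists only for this check.

```python
import math

def agc_decay_factor(attenuation_db=-1.5, half_period=19):
    """Solve attenuation_db = 20 * log10(K ** half_period) for K."""
    return 10 ** (attenuation_db / (20 * half_period))

k = agc_decay_factor()
print(round(k, 3))                          # 0.991
print(round(20 * math.log10(k ** 19), 6))   # -1.5
```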
2.2.5 Roofing Filter
To obtain the advantages of both stationarity and smoothing, a
combination of a high-pass and a low-pass filter has to be used. In
this project, the daily closing prices are first passed through
a high-pass filter with a period of 48 days, removing all components
with periods longer than that, and the resulting output is smoothed
by passing it through a two-pole Butterworth filter with a period of
10 days, removing all components with periods shorter than that.
Hence the final series contains components with periods between
10 days and 48 days.
2.3 RSI
The RSI (Relative Strength Index) value is an indicator of the
strength of the trend. It is calculated as follows:
\[
RS = \frac{SMMA(U, n)}{SMMA(D, n)}
\]
where SMMA(U, n) is the average of positive returns over the
last n time periods and SMMA(D, n) is the same for negative
returns.
Using the relative strength RS, the RSI is calculated by the
following formula:
\[
RSI = 100 - \frac{100}{1 + RS}
\]
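A minimal sketch of this calculation is given below. It follows the description in the text and uses plain averages of the upward and downward moves over the last n periods. The function name `rsi`, the use of price differences as the returns, and the convention of returning 100 when there are no downward moves are assumptions of this example.

```python
def rsi(prices, n):
    """RSI from the last n price changes, using simple averages."""
    changes = [b - a for a, b in zip(prices[:-1], prices[1:])][-n:]
    up = sum(c for c in changes if c > 0) / n       # average upward move
    down = sum(-c for c in changes if c < 0) / n    # average downward move
    if down == 0:
        return 100.0                    # assumed convention: no down moves
    rs = up / down
    return 100 - 100 / (1 + rs)

print(rsi([1, 2, 1, 2, 1], 4))   # 50.0, up and down moves balance
print(rsi([1, 2, 3, 4], 3))      # 100.0, only upward moves
```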
2.3.1 Modified RSI
The RSI equation can be rearranged and written as
\[
RSI = \frac{100 \cdot SMMA(U, n)}{SMMA(U, n) + SMMA(D, n)} \tag{2.10}
\]
where 100 is just a scaling term and can be ignored. Let
$D(t) = SMMA(U(t), n) + SMMA(D(t), n)$ denote the denominator, where
$SMMA(U(t), n)$ is the average of positive returns over the previous
n time steps corresponding to time t, and likewise for
$SMMA(D(t), n)$, in which the argument D(t) is the series of absolute
magnitudes of negative returns. We apply the Butterworth filter in the
calculation of the RSI, where c1, c2 and c3 are given in equation (2.7):
\[
\begin{aligned}
RSI(t) = {} & c_1 \cdot \frac{1}{2}\left(\frac{SMMA(U(t), n)}{D(t)} + \frac{SMMA(U(t-1), n)}{D(t-1)}\right) \\
& + c_2 \cdot RSI(t-1) + c_3 \cdot RSI(t-2)
\end{aligned}
\tag{2.11}
\]
Before being fed into the RSI, the original time series is
passed through a roofing filter as described in the previous section,
with a two-pole high-pass filter having a critical period of 48 and
the Butterworth filter having a critical period of 10. The modified
RSI values with periods 5 and 14 are used as features for our neural
net along with Hurst’s Exponent.
2.4 MLP-LSTM
The time series we had was multivariate, so it was clear that a
simple LSTM network would not suffice. Hence a modified version of
the LSTM was created, in which there is a Multi-Layered Perceptron
for each LSTM block and the output of the MLP is fed into the input
of the corresponding LSTM block.
Three inputs were taken for each timestep: the RSI value with period
5, the RSI value with period 14, and Hurst’s exponent calculated over
the previous 150 days. A look-back period of 20 days was used, so
each training sample was a 20 × 3 matrix. The output target was a
binary value indicating whether the return over the next 10 days was
positive or negative.
In Figure 2.6, we show the structure of the MLP-LSTM, where $X_{it}$
is the ith feature at timestep t. The output from the MLP is
taken as the input for the LSTM block.
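The construction of the training samples described above (a 20 × 3 window of features and a binary label for the 10-day-ahead return) can be sketched as follows. The function name `make_samples` and the labelling rule, which compares the close 10 days ahead with the current close, are assumptions of this example and not taken from the project code.

```python
def make_samples(features, closes, lookback=20, horizon=10):
    """Build (window, label) pairs from per-day feature rows.

    features: one row per day, e.g. [rsi_5, rsi_14, hurst]
    closes:   closing price per day, aligned with features
    """
    samples, labels = [], []
    for t in range(lookback - 1, len(closes) - horizon):
        window = features[t - lookback + 1:t + 1]    # lookback x 3 matrix
        samples.append(window)
        # Assumed rule: 1 if the close `horizon` days ahead is higher.
        labels.append(1 if closes[t + horizon] > closes[t] else 0)
    return samples, labels

# Toy data: 40 days with 3 features per day and steadily rising prices.
features = [[float(i), float(i), float(i)] for i in range(40)]
closes = [100.0 + i for i in range(40)]
samples, labels = make_samples(features, closes)
print(len(samples), len(samples[0]), len(samples[0][0]))   # 11 20 3
```

The last `horizon` days produce no sample because their future return is not yet known, which keeps information from the future out of the labels.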
2.5 Prediction using Fundamental Analysis
2.5.1 General Approach
The financial statements of the companies for the last 20 years
were downloaded, and the key financial ratios were taken as inputs
and fed into a prediction model after being normalized. The output
to be predicted was the directional return for the next year.
2.5.2 Normalization Details
MinMax normalization was used to scale the ratios to the range
of 0 to 1. However, this normalization was performed sector-wise,
i.e. the maximum and minimum values used for stocks in the auto
sector are different from those used for the IT sector. The rationale
is that when an investor is deciding whether to buy a stock, he or
she compares it with similar stocks, which is why financial ratios
should be scaled relative to other stocks in the same sector.
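Sector-wise MinMax scaling can be illustrated with a small helper that computes the minimum and maximum of a ratio separately for each sector. The function name `sector_minmax` and the sector labels are hypothetical, and mapping a sector with a single distinct value to 0 is an assumption of this sketch.

```python
def sector_minmax(values, sectors):
    """Scale each value to [0, 1] using the min and max of its own sector."""
    lowest, highest = {}, {}
    for value, sector in zip(values, sectors):
        lowest[sector] = min(value, lowest.get(sector, value))
        highest[sector] = max(value, highest.get(sector, value))
    scaled = []
    for value, sector in zip(values, sectors):
        span = highest[sector] - lowest[sector]
        # Assumed convention: a sector with no spread maps to 0.
        scaled.append((value - lowest[sector]) / span if span > 0 else 0.0)
    return scaled

ratios = [10, 20, 30, 1, 3]
sectors = ["auto", "auto", "auto", "it", "it"]
print(sector_minmax(ratios, sectors))   # [0.0, 0.5, 1.0, 0.0, 1.0]
```

A ratio of 3 in the second sector maps to 1.0 even though it is smaller than every value in the first, which is the intended effect of comparing stocks only with their peers.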
Figure 2.5: Details of the MLP-LSTM neural network used
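Figure 2.6: MLP-LSTM (diagram: the features X1t, X2t, X3t, ..., Xnt of
timestep t pass through an MLP whose output O1 forms the input x⟨t⟩ of
the LSTM cell; the cell carries the hidden state from h⟨t−1⟩ to h⟨t⟩
and the cell state from c⟨t−1⟩ to c⟨t⟩)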
2.5.3 Features of the Data
1. The data had 64 features and 11,200 data points.
2. There was a class imbalance as there were more stocks with
negative returns than positive ones.
2.5.4 AdaBoost Classifier
A well-suited algorithm for classifying data with class imbalance is
the AdaBoost Classifier. The classifier works in the following way:
1. Classification is performed using a weak learner and the results
are recorded.
2. In the next stage, the data vectors that were assigned an
incorrect label have a greater probability of being chosen for
the next round of classification. After the data vectors are
chosen, the classification process is repeated.
3. This process goes on for n stages. The final classifier is the
sum of the classifiers from all the stages, weighted inversely to
the error of their classifications.
Chapter 3
Results
Definition 3.1 (True Positives (TP)) The number of labels that
have been predicted as positive correctly.
Definition 3.2 (False Positives (FP)) The number of labels that
have been predicted as positive incorrectly.
Definition 3.3 (False Negatives (FN)) The number of labels
that have been predicted as negative incorrectly.
Definition 3.4 (True Negatives (TN)) The number of labels
that have been predicted as negative correctly.
Definition 3.5 (Recall) Recall is the proportion of true
positive cases among all positive labels.
\[
\text{Recall} = \frac{TP}{TP + FN}
\]
Definition 3.6 (Precision) Precision is the proportion of true
positives to the total number of labels that have been identified as
positive by the classifier.
\[
\text{Precision} = \frac{TP}{TP + FP}
\]
Definition 3.7 (F1-Score) F1-Score is the harmonic mean of
Precision and Recall.
\[
\text{F1-Score} = \frac{2\,(\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}}
\]
Definition 3.8 (Matthews Correlation Coefficient) The Matthews
Correlation Coefficient is defined according to the following
equation:
\[
MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{3.1}
\]
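The definitions above translate directly into code. The sketch below computes the four metrics from the confusion-matrix counts. The function name `classification_metrics` is hypothetical, and returning 0 when a denominator is zero is an assumption made here.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Recall, precision, F1-score and MCC from confusion-matrix counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"recall": recall, "precision": precision, "f1": f1, "mcc": mcc}

# A classifier that is right exactly half the time in each class has MCC 0.
print(classification_metrics(tp=5, fp=5, fn=5, tn=5)["mcc"])   # 0.0
```

Unlike accuracy, MCC stays at 0 for a classifier that always predicts the majority class, which is why it is informative on imbalanced data.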
3.1 Results on Fundamental Analysis
Accuracy alone is not a good enough measure to judge the performance
of a classifier, so we use additional metrics such as MCC and
F1-Score, which are defined above. F1-score measures the balance
between precision and recall, and MCC measures the correlation
between the model’s predictions and the observed data. We get an MCC
of 0.11 for our fundamental analysis classifier using AdaBoost,
indicating a weak positive correlation between our predictions and
the observations. A comparative study has also been done between an
MLP classifier and the AdaBoost Classifier. In Table 3.1, accuracy
and F1-score for the various pre-processing types are shown. In
training the neural nets, the method of manifold mixup has been used
to generate new training vectors such that the effect of class
imbalance can be mitigated. In general, higher weights were assigned
to vectors belonging to the less frequently occurring class label
and lower weights to those belonging to the more frequently occurring
one. These weights have been assigned using two types of
distributions, namely beta and normal. The term balanced beta implies
the use of Beta(0.5, 0.5) as the distribution for assigning the
weights. Otherwise the weights were assigned in the following manner:
For Beta:
1. If the frequency of occurrence of Class 0 is $f_1$ as a fraction
of the total number of training vectors, i.e. $0 < f_1 < 1$, then our
weight assignment distribution will be
$\alpha \sim \text{Beta}(1 - f_1, f_1)$.
2. The training vectors are created such that the new training
input vector $X_{new}$ and the corresponding training output
vector $Y_{new}$ are given by the following relations:
\[
\begin{aligned}
X_{new} &= \alpha X_0 + (1 - \alpha) X_1 \\
Y_{new} &= \alpha Y_0 + (1 - \alpha) Y_1
\end{aligned}
\tag{3.2}
\]
where $X_0$, $X_1$ are input training vectors belonging to
Class 0 and Class 1 respectively, and likewise for $Y_0$, $Y_1$.
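The Beta-weighted mixing step can be sketched as below. The sketch follows equation (3.2) literally and mixes input vectors. The function name `mixup_pair` is hypothetical, and pairing one Class 0 vector with one Class 1 vector per call is an assumption about how the pairs are formed.

```python
import random

def mixup_pair(x0, y0, x1, y1, f1, rng=random):
    """Mix one Class 0 sample with one Class 1 sample as in equation (3.2).

    f1 is the fraction of training vectors in Class 0 (0 < f1 < 1), so
    alpha ~ Beta(1 - f1, f1) tends to be small when Class 0 is frequent.
    """
    alpha = rng.betavariate(1 - f1, f1)
    x_new = [alpha * a + (1 - alpha) * b for a, b in zip(x0, x1)]
    y_new = alpha * y0 + (1 - alpha) * y1
    return x_new, y_new, alpha

random.seed(0)
x_new, y_new, alpha = mixup_pair([0.0, 0.0], 0, [1.0, 1.0], 1, f1=0.6)
print(alpha, x_new, y_new)
```

Because the mean of Beta(1 − f1, f1) is 1 − f1, the more frequent class receives the smaller average weight, which is how the procedure counteracts the imbalance.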
For Normal:
1. The weight assignment procedure remains the same as above,
except that in this case $\alpha \sim \text{Normal}(1 - f_1, 0.1k)$,
where k can take any integer value assigned by the user, according
to how much variation the user wants in the weights. normal2
indicates a k value of 1, while norm4 indicates a k value of 4. In
this case the weights will be more centered around the mean
value than in the case of the Beta distribution.
2. normal features9, normal features30 and so on indicate that
feature pruning has been used for those models to see whether they
perform better. The number indicates the number of features
included, which are selected on the basis of having a higher
correlation with annual returns than the others.
For AdaBoost Classifier:
AdaBoost has been used along with a Decision Tree classifier. depth
indicates a maximum depth of 1 for the tree in a single iteration,
while depth6 indicates a maximum depth of 6.
In summary, a comparative analysis has been done between different
ways of using manifold mixup to correct class imbalance and the
classic approach of using the AdaBoost Classifier.
Table 3.1: Performance of Neural Net and AdaBoost Classifier in Fundamental
Analysis classification task
      model type    preprocessing type    accuracy    F1 Score
0     neural net    balanced beta         0.470859    0.485842
1     neural net    beta                  0.679448    0.205323
2     neural net    normal                0.671779    0.286667
3     neural net    normal2               0.564417    0.466165
4     neural net    beta2                 0.630368    0.377261
5     neural net    beta3                 0.664110    0.220641
6     ada           depth                 0.662577    0.402174
7     ada           depth6                0.682515    0.378378
8     neural net    norm4                 0.673313    0.116183
9     neural net    normal features9      0.579755    0.424370
10    neural net    normal features30     0.613497    0.388350
11    neural net    normal features5      0.595092    0.394495
3.2 Results on Technical Analysis
The best accuracy obtained from the technical analysis time series
classification was 58.5 percent, with a final cross-entropy loss of
0.512, which is encouraging given the low signal-to-noise ratio of
financial market data.
Chapter 4
Conclusion
From the results above it is reasonable to conclude that our models
predict the direction of returns better than random. Investors and
traders who do not have time to research the markets to select
stocks could use these models as recommender systems.
In this project, the time series data used was sampled at a daily
interval; in the future, similar methods can be applied to data
sampled at shorter time intervals. The larger amount of data made
available by a higher sampling frequency would make the model more
robust and adaptive to all kinds of conditions.
Only one technical indicator has been studied in this project as
a feature. Other technical indicators in the category of
oscillators, such as the stochastic oscillator (see Ni, Liao, and
Huang [9]), can be studied for their potential use as features.
It is also observed that the scope of this project was limited
to data that is freely available. Since financial markets are very
competitive, there is not much capitalizable information present in
freely available data. Alternate datasets such as consumer surveys
and other proprietary datasets could therefore be used to make more
accurate predictions.
Another area not explored in this project, which has been gaining
popularity, is sentiment analysis. It uses NLP (Natural Language
Processing) techniques to gauge public sentiment or opinion on a
particular financial product. For example, Karalevicius, Degrande,
and De Weerdt [6] found that media sentiment significantly affects
Bitcoin’s price and that investors tend to overreact to news over
short time frames. Therefore, in future modifications of the system
presented in this project, sentiment analysis could be added along
with the other methods.
Appendices
Appendix A
A.1 Butterworth Filter
Listing A.1: Two pole butterworth filter Python code
import numpy as np

def butterworth2(self, cl, period):
    # cl: input series (e.g. closing prices); period: critical period of the filter
    cl1 = cl
    cl = np.zeros(len(cl))
    a = np.exp(-1.414 * 3.14159 / period)
    b = 2 * a * np.cos(1.414 * (3.14159 / 2) / period)
    c2 = b
    c3 = -a * a
    c1 = 1 - c2 - c3
    for i in range(3, len(cl)):
        # The source line is cut off after "+c"; the final term is completed
        # from the recursion in equation (2.11).
        cl[i] = c1 * (cl1[i] + cl1[i - 1]) / 2 + c2 * cl[i - 1] + c3 * cl[i - 2]
    return cl
A.2 High pass filter
Listing A.2: Two pole High-pass filter Python code
def highpass2(self, cl, period):
    hp = np.zeros(len(cl))
    coselement = np.cos(0.707 * 2 * np.pi / period)
    sinelement = np.sin(0.707 * 2 * np.pi / period)
    alpha = (coselement + sinelement - 1) / coselement
    print(alpha)
    peak = np.zeros(len(cl))
    for i in range(3, len(cl)):
        # The remainder of the next line is cut off in the source listing.
        hp[i] = (1 - alpha / 2) ** 2 * (cl[i] - 2 * cl[i - 1] + cl[i
    return hp
References
[1] John F Ehlers. “Cycle Analytics for Traders,+ Downloadable Software:
Advanced Technical Trading Concepts”. In: John Wiley & Sons, 2013,
pp. 31–33.
[2] John F Ehlers. “Cycle Analytics for Traders,+ Downloadable Software:
Advanced Technical Trading Concepts”. In: John Wiley & Sons, 2013,
pp. 77–79.
[3] John F Ehlers. “Cycle Analytics for Traders,+ Downloadable Software:
Advanced Technical Trading Concepts”. In: John Wiley & Sons, 2013,
pp. 54–55.
[4] M Gilmore et al. “Investigation of rescaled range analysis, the Hurst
exponent, and long-time correlations in plasma turbulence”. In: Physics
of Plasmas 9.4 (2002), pp. 1312–1317.
[5] Harold E Hurst. “The problem of long-term storage in reservoirs”. In:
Hydrological Sciences Journal 1.3 (1956), pp. 13–27.
[6] Vytautas Karalevicius, Niels Degrande, and Jochen De Weerdt. “Using
sentiment analysis to predict interday Bitcoin price movements”. In: The
Journal of Risk Finance (2018).
[7] Benoit B Mandelbrot. “Self-affine fractals and fractal dimension”. In:
Physica scripta 32.4 (1985), p. 257.
[8] S. McNally, J. Roche, and S. Caton. “Predicting the Price of Bitcoin
Using Machine Learning”. In: 2018 26th Euromicro International Con-
ference on Parallel, Distributed and Network-based Processing (PDP).
2018, pp. 339–343.
[9] Yensen Ni, Yi-Ching Liao, and Paoyu Huang. “Momentum in the Chi-
nese stock market: Evidence from stochastic oscillator indicators”. In:
Emerging Markets Finance and Trade 51.sup1 (2015), S99–S110.
[10] Vikas Verma et al. “Manifold mixup: Encouraging meaningful on-manifold
interpolation as a regularizer”. In: stat 1050 (2018), p. 13.
[11] Jin Wang et al. “Dimensional sentiment analysis using a regional CNN-
LSTM model”. In: Proceedings of the 54th Annual Meeting of the Asso-
ciation for Computational Linguistics (Volume 2: Short Papers). 2016,
pp. 225–230.
Table A.1: Project Detail
Student Details
Student Name: Your Name
Registration Number: 160911051
Section/Roll No.: A/35
Email Address: [email protected]
Phone No. (M): 7004164643
Project Details
Project Title: Prediction of Stock Returns using Machine Learning
Project Duration: 4-6 Months
Date of Reporting: 03-01-2020
Faculty Name: Dr. Sanjay Singh
Full Contact Address with PIN Code: Department of Information and Communication Technology,
Manipal Institute of Technology, Manipal-576104