Deep Learning and Time Series-to-Image Encoding for Financial Forecasting
IEEE/CAA JOURNAL OF AUTOMATICA SINICA, VOL. 7, NO. 3, MAY 2020

Abstract—In the last decade, market financial forecasting has attracted high interest among researchers in pattern recognition. Usually, the data used for analysing the market, and then gambling on its future trend, are provided as time series; this aspect, along with the high fluctuation of this kind of data, rules out the use of very efficient classification tools that are popular in the state of the art, like the well-known convolutional neural network (CNN) models such as Inception, ResNet, AlexNet, and so on. This forces researchers to train new tools from scratch, which can be very time consuming. This paper exploits an ensemble of CNNs, trained over Gramian angular field (GAF) images generated from time series related to the Standard & Poor's 500 index future; the aim is the prediction of the future trend of the U.S. market. A multi-resolution imaging approach is used to feed each CNN, enabling the analysis of different time intervals for a single observation. A simple trading system based on the ensemble forecaster is used to evaluate the quality of the proposed approach. Our method outperforms the buy-and-hold (B&H) strategy in a time frame where the latter provides excellent returns. Both quantitative and qualitative results are provided.

Index Terms—Convolutional neural networks (CNNs), ensemble of CNNs, financial forecasting, Gramian angular fields (GAF) imaging.

Manuscript received October 19, 2019; revised December 22, 2019, February 6, 2020; accepted March 9, 2020. This work was supported by the "Bando Aiuti per progetti di Ricerca e Sviluppo-POR FESR 2014-2020-Asse 1, Azione 1.1.3. Project AlmostAnOracle-AI and Big Data Algorithms for Financial Time Series Forecasting". Recommended by Associate Editor MengChu Zhou. (Corresponding author: Silvio Barra.)
Citation: S. Barra, S. M. Carta, A. Corriga, A. S. Podda, and D. R. Recupero, "Deep learning and time series-to-image encoding for financial forecasting," IEEE/CAA J. Autom. Sinica, vol. 7, no. 3, pp. 683–692, May 2020.
The authors are with the Department of Mathematics and Computer Science, University of Cagliari, Cagliari 09121, Italy (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JAS.2020.1003132

I. Introduction
Since the dawn of the financial market, people have been trying to build tools able to provide insights and information about stock price variations in the near future, so as to increase the chances of investing in the right company, future, etc. [1]. Since then, the market has become much bigger, and the available instruments for financial forecasting have reached an unprecedented efficiency. Nowadays, research in this area is one of the most active among pattern recognition related topics, and at the same time one of the most challenging. This is mainly due to the fact that stock prices are often influenced by factors that are hard to predict, like political events, the behaviour of other stock markets and, last but not least, the psychology of investors [2]; these aspects tend to model the market as an entity that is dynamic, non-linear, non-parametric, and chaotic [3].
Notwithstanding the above components, research in this field has grown in recent years both in terms of literature production and of tools generation [4], [5]; also, an increasing number of studies involve learning approaches like machine learning classifiers and neural network models. Essentially, these techniques cover most of the research achieved in the field. Neural network approaches to financial forecasting started approximately at the end of the 1980s and the beginning of the 1990s. In those years, one of the first artificial neural networks was proposed [1], with the aim of predicting the Tokyo stock exchange price indexes (TOPIX Index), by taking six metric vectors as input. In the same year, the authors in [6] built a recurrent neural network for stock price pattern recognition, exploiting the triangle pattern as a clue to the trend of future stock prices. In [7], a similar approach exploited eleven market indicators for building and training a recurrent neural network for the monthly transition of the stock price index [8]. Several approaches have also been proposed that face the financial series prediction topic by using the dendritic neuron model (DNM) technique [9], whose performance has been shown on both Asian [10] and U.S. markets [11]. These techniques have evolved a lot in the last 10 years, as has finance itself [12]. As a consequence, dozens of scientific papers have been published proposing approaches that aim at predicting stock prices by exploiting news data extracted from the most popular social networks, in order to model the uncertainty that lies behind market fluctuations [13]. Financial forecasting is indeed one of the research branches in which sentiment analysis has found fertile ground [14], [15].
Alongside artificial neural networks (ANN), machine learning approaches have also had the opportunity to show their efficiency over the years. In [16], the authors compared the capabilities of support vector machines [17] in market prediction related issues against those obtained by using radial basis function (RBF) networks and back propagation (BP). In [18], a recent literature review compares modern machine learning approaches in the financial forecasting field.
Interesting results have also been obtained when fusing together the techniques described above: as an example, in [19] the authors fused ANN with decision trees (DT). The rationale behind this hybrid approach is that, while ANNs are able to provide quite good performance in forecasting the
market trend, a DT model is stronger in generating potential rules that describe the forecasting decisions. Similarly, in [20], a two-stage fusion approach is proposed for predicting the CNX Nifty and the S&P Bombay Stock Exchange (BSE) Sensex from Indian stock markets. Specifically, the first stage uses a support vector regressor (SVR), whereas the second one exploits, in turn, ANN, random forest, and SVR. Ten indicators have been selected as input to the prediction models. In general, however, predicting the daily direction (positive or negative) of the market requires solving a classification problem, whereas directly predicting the profit requires addressing a regression problem. A comparison of different market price prediction approaches is shown in [21], where classification-based approaches, such as [22]–[24], usually provide better results than regression-based competitors. As reported in [25], moreover, "there is no general consensus on best forecasting technique for price prediction", particularly since "price series is inherently a non-stationary series having non-constant mean and variance", which makes it not always advantageous to represent the problem through linear models such as regression.
Furthermore, the current trend of research seems to be oriented towards the analysis of the data in their raw form, that is, as time series, without any change of dimensionality. Inspired by recent successes of supervised and unsupervised learning techniques in computer vision, and with the aim of changing this trend, in this paper we propose a trading system based on the forecast of the daily direction of a market index, exploiting the discriminatory capabilities of convolutional neural networks (CNNs) when dealing with GAF images. Indeed, CNNs have shown very good accuracy when applied to pattern recognition on image and video data [26], [27]. We encode time series as images to allow machines to visually recognise, classify and learn structures and patterns. Reformulating features of time series as visual clues has attracted much attention in computer science and physics [28].
Model training, evaluation and testing are executed on the Standard & Poor's 500 (S&P500) index future. Time series of the future prices are processed in a twofold way: first, different intervals of time are considered, in order to analyse the same trend from different points of view; then, GAF images are built for each of the defined time intervals. The main contributions of the proposed approach are as follows.
• The proposed system exploits the GAF imaging approach for encoding time series data as images;
• The composition of GAF images in a multi-resolution structure helps improve the market prediction results;
• The classification phase is carried out by organising in an ensemble a set of CNNs that share the same architecture, but each of them uses a different kernel initialization function. A majority voting-based policy is adopted;
• Comparisons both with state-of-the-art baseline approaches (e.g., the buy-and-hold (B&H) strategy) and with the results of an existing competitor method (Calvi et al. [29]) have been performed, showing that the proposed system is capable of obtaining a higher profit in the same investment period.
The remainder of this paper is organised as follows: Section II outlines the developed method, from the generation of the GAF images to the description of the ensemble of CNNs. Sections III and IV, respectively, describe the experimental settings, along with the data preparation and the dataset description, and show and discuss the qualitative and quantitative results. Section V concludes the paper and outlines future work.

II. The Proposed Approach
As pointed out in Section I, most of the research in market prediction and financial forecasting is based on ANN or machine learning approaches. These models are commonly trained on time series data describing a market index in the past, with the goal of predicting its future trend. The aim of this work is to achieve market prediction over the S&P500 index by using an ensemble of CNNs, with the training phase executed over GAF images (specifically, the GADF). This particular kind of imaging technique is detailed in Section II-B. The rightmost part of Fig. 1 shows how the proposed trading system works: first, the data of the original time series are aggregated according to 4 intervals of time; then, consecutive time frames of 20 observations are extracted from each time series, in order to generate the related set of GADF images. Twenty similar CNNs (whose architecture is shown in the leftmost part of Fig. 1) are trained over these images, and a threshold-driven ensemble decides which action to perform on the day after the observations, as described in what follows.

A. Trading Strategy
We design our system to simulate a classic intraday trading strategy. This strategy generally consists of buying or selling a specific financial instrument (in our case, the S&P500 index future), making sure that any open position is closed before the market closes on the same trading day. Specifically, we model our strategy such that, for each single trading day, the final output of our system is one of the following actions:
• A long action, which consists of buying the stock, and then selling it before the market closes;
• A short action, which consists of selling the stock (using the mechanism of the uncovered sale), and then buying it back before the market closes;
• A hold action, which consists of deciding not to invest on that day.
The ideal target of this strategy requires the system to choose the action that maximizes the economic return (i.e., the profit) of the day, given a prediction about the stock price trend on that day (i.e., whether the price will rise or fall). Thus, a long action is performed whenever our system predicts that the price will rise on that day; conversely, a short action is chosen whenever our system predicts that the price will fall on that day; finally, a hold action is performed whenever the system is not confident enough about the market behaviour.

B. Gramian Angular Fields Imaging
The GAF imaging is an elegant way to encode time series as images. It was proposed by Wang and Oates in [30].
BARRA et al.: DEEP LEARNING AND TIME SERIES-TO-IMAGE ENCODING FOR FINANCIAL FORECASTING 685
[Fig. 1 panels. Left: CNN architecture, INPUT 40×40×3 → 3×3 CONV 32 → MAX_POOL 2×2 → DROPOUT 0.25 → 3×3 CONV 64 → 3×3 CONV 64 → MAX_POOL 2×2 → DROPOUT 0.25 → 3×3 CONV 128 → 3×3 CONV 128 → MAX_POOL 2×2 → FC 1024 → output 0/1. Right: aggregated series D1–D4 → GADF encoding → CNN1, CNN2, CNN3, …, CNNN → threshold ensemble → long/short.]
Fig. 1. On the leftmost side of the figure the architecture of the convolutional neural network is shown. On the rightmost side, the overall process of the proposed trading system is depicted.
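The threshold-driven ensemble on the right of Fig. 1, combined with the intraday actions of Section II-A, can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names are hypothetical, and each CNN is assumed to emit 1 for long and 0 for short. An exact tie between long and short votes is treated as a hold. The agreement rule of Section III-C is read as requiring at least ceil(t * N) votes for the majority action, which is what makes t = 1 (unanimity) attainable. The daily profit assumes a single index future contract worth $50 per index point (Section III-A), entered at the open and closed at the close, with no fees or slippage. The last two helpers accumulate the daily results into the maximum drawdown and the RoMaD of (5).

```python
import math
from itertools import accumulate

LONG, SHORT, HOLD = "long", "short", "hold"


def ensemble_action(votes, t):
    """Map binary CNN votes (1 = long, 0 = short) to long, short or hold."""
    if not 0.5 <= t <= 1.0:
        raise ValueError("t must lie in [0.5, 1]")
    n = len(votes)
    longs = sum(votes)
    shorts = n - longs
    if longs == shorts:
        return HOLD  # a tie has no majority action (assumption of this sketch)
    action, support = (LONG, longs) if longs > shorts else (SHORT, shorts)
    # round() strips floating-point noise from t * n before taking the ceiling
    required = math.ceil(round(t * n, 9))
    return action if support >= required else HOLD


def daily_pnl(action, open_price, close_price, multiplier=50.0):
    """Intraday result of one contract: enter at the open, exit at the close."""
    move = (close_price - open_price) * multiplier
    if action == LONG:
        return move
    if action == SHORT:
        return -move
    return 0.0  # hold: no position that day


def coverage(actions):
    """Fraction of days on which the ensemble traded instead of holding."""
    return sum(a != HOLD for a in actions) / len(actions)


def max_drawdown(pnls):
    """Largest peak-to-trough drop of the cumulative P&L curve, starting at 0."""
    peak, mdd = 0.0, 0.0
    for equity in accumulate(pnls):
        peak = max(peak, equity)
        mdd = max(mdd, peak - equity)
    return mdd


def romad(pnls):
    """Net profit divided by maximum drawdown, as in (5)."""
    mdd = max_drawdown(pnls)
    return math.inf if mdd == 0 else sum(pnls) / mdd


# Usage on hypothetical votes from N = 20 networks and hypothetical prices.
days = [
    ([1] * 13 + [0] * 7, 2500.0, 2510.0),   # 13 of 20 nets vote long
    ([0] * 20, 2510.0, 2490.0),             # unanimous short
    ([1] * 10 + [0] * 10, 2490.0, 2495.0),  # tie
]
actions = [ensemble_action(v, 0.6) for v, _, _ in days]
pnls = [daily_pnl(a, o, c) for a, (_, o, c) in zip(actions, days)]
print(actions)  # ['long', 'short', 'hold']
print(pnls)     # [500.0, 1000.0, 0.0]
```

With t = 0.7 the first day would become a hold (13 votes are fewer than the 14 required), which illustrates the trade-off between agreement and coverage that the different thresholds explore.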
The main reasons that led to the definition of this approach concern the possibility of using existing pre-trained models, rather than training recurrent neural networks from scratch or using 1D-CNN models, which may prove inconvenient.
In order to build the GAF images, the real observations of the time series must first be rescaled. Let \(X = \{x_1, x_2, \ldots, x_n\}\) be the considered time series with \(n\) components; the rescaling to the interval \([-1, 1]\) is achieved by applying the min-max normalization
\[ \tilde{x}_i = \frac{(x_i - \max(X)) + (x_i - \min(X))}{\max(X) - \min(X)}. \tag{1} \]
Hence, the scaled series is \(\tilde{X} = \{\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n\}\). This is transformed to a polar coordinate system by taking the arccosine of each component of the scaled time series as the angle, and the normalized time stamp as the radius:
\[ \theta_i = \arccos(\tilde{x}_i), \quad \tilde{x}_i \in \tilde{X}; \qquad r_i = \frac{t_i}{N}, \quad t_i \le N \tag{2} \]
where \(t_i\) is the time stamp of the \(i\)-th observation and \(N\) is a constant that regularizes the span of the polar coordinate system [30]. Finally, the Gramian angular summation field (GASF) and the Gramian angular difference field (GADF) are obtained from the cosine of the sum and the sine of the difference, respectively, of each pair of angles of the time series:
\[ \mathrm{GASF} = [\cos(\theta_i + \theta_j)] = \tilde{X}' \cdot \tilde{X} - \sqrt{I - \tilde{X}^2}' \cdot \sqrt{I - \tilde{X}^2} \]
\[ \mathrm{GADF} = [\sin(\theta_i - \theta_j)] = \sqrt{I - \tilde{X}^2}' \cdot \tilde{X} - \tilde{X}' \cdot \sqrt{I - \tilde{X}^2} \tag{3} \]
where \(I\) is the unit row vector \([1, 1, \ldots, 1]\), the prime denotes transposition, and the square root is taken element-wise.
Fig. 2 shows the process for transforming a time series to the GADF and GASF images. Note that the equations in (3) produce a single-channel \(n \times n\) matrix as the output of the encoding process. This matrix can be rendered as a heatmap, whose color scale ranges from 0 (blue) to 1 (red). In a subsequent step, we applied an RGB color map to the image, thus obtaining a three-channel matrix (further details on this process can be found in [28] and [30]). Note that the application of the color map is not strictly required by our approach; however, preliminary experiments showed that applying the color map to the images led to better results and, additionally, to faster and more stable network convergence.

C. Multi-Resolution Time Series Imaging
Time series data carry an important factor, namely the variation of a feature over time and, therefore, how quickly the data change. The speed at which the features change provides many insights about the evolution of the event: unfortunately, this peculiarity is hidden, or even impossible to identify, when the data granularity is too coarse. As a trivial example, consider how important the granularity of information is in weather forecasting: a change from sunny to rainy that occurs in a few minutes conveys different information from the same change over a 24-hour period. Moreover, the observations contained in a time series are often not taken at the same interval of time, i.e., the distance between two consecutive observations is not always the same, which forces the researcher to aggregate the data to make this distance uniform throughout the entire time series. Given the two factors described above, our approach proposes a multi-resolution imaging, which aggregates the data under K different intervals of time, thus creating K different, but analogous, time series. Let D = {T, F_1, F_2, ..., F_N} be the original time series, in which T defines the moments at which the observations of a given event are made, and F_i, with i = 1, 2, ..., N, are the features describing the event. Given two consecutive observations in D, say D_j and D_{j+1}, let I_D be the time distance between them. The new K aggregated time series are D_1, D_2, ..., D_K, and the interval between two consecutive observations in D_k is I_{D_k} > I_D for each k = 1, ..., K.
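The encoding in (1)–(3) and the multi-resolution aggregation of this subsection can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code. The function names are hypothetical, and the price series is a synthetic random walk rather than S&P500 data. Each aggregation interval is approximated by a fixed number of consecutive 5-minute samples (12, 48, 96 and 288 for 1 h, 4 h, 8 h and 24 h), whereas real data would be grouped by timestamp and trading session. The close-open value of a block is taken as its last minus its first price. The 2×2 tiling order of the four 20×20 GADFs is assumed, and the RGB color map step is omitted, so the resulting mosaic is 40×40 with a single channel instead of 40×40×3.

```python
import numpy as np


def rescale(x):
    """Rescale a 1D series to [-1, 1] as in (1)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:  # flat window; mapping it to zeros is an assumption
        return np.zeros_like(x)
    return ((x - hi) + (x - lo)) / (hi - lo)


def gasf_gadf(x):
    """Return (GASF, GADF) of a 1D series via the angles of (2)."""
    # np.clip absorbs round-off just outside [-1, 1] before arccos.
    theta = np.arccos(np.clip(rescale(x), -1.0, 1.0))
    gasf = np.cos(theta[:, None] + theta[None, :])  # cos(theta_i + theta_j)
    gadf = np.sin(theta[:, None] - theta[None, :])  # sin(theta_i - theta_j)
    return gasf, gadf


def gadf_matrix_form(x):
    """GADF through the matrix expression of (3), used as a cross-check."""
    xt = np.clip(rescale(x), -1.0, 1.0)
    s = np.sqrt(1.0 - xt ** 2)  # element-wise sqrt(I - X~^2)
    return np.outer(s, xt) - np.outer(xt, s)


def close_minus_open(prices, block):
    """Close - open of consecutive fixed-size blocks of a regularly sampled
    price series (last minus first price of each block)."""
    p = np.asarray(prices, dtype=float)
    n = len(p) // block
    blocks = p[: n * block].reshape(n, block)
    return blocks[:, -1] - blocks[:, 0]


def compose_2x2(images):
    """Tile four equally sized images into one 2x2 mosaic."""
    a, b, c, d = images
    return np.block([[a, b], [c, d]])


# Usage on hypothetical data: a random walk sampled every 5 minutes.
rng = np.random.default_rng(0)
prices = 2000.0 + np.cumsum(rng.normal(0.0, 1.0, size=20 * 288))
# 12, 48, 96, 288 bars = 1 h, 4 h, 8 h, 24 h at 5-minute sampling.
block_sizes = (12, 48, 96, 288)
windows = [close_minus_open(prices, b)[-20:] for b in block_sizes]
gadfs = [gasf_gadf(w)[1] for w in windows]  # four 20 x 20 GADFs
mosaic = compose_2x2(gadfs)                  # one 40 x 40 image
print(mosaic.shape)  # (40, 40)
```

The function gadf_matrix_form evaluates the closed form of (3) and should agree with the angle-based computation up to floating-point error, which is a quick way to check the sign convention sin(θi − θj).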
[Fig. 2 panels, left to right: dataset plot (S&P500 value vs. date, 2018-01-19 to 2018-03-02), a line plot on a polar axis, GADF, GASF.]
Fig. 2. The process leading to the generation of the GAF images: from left to right, the data are first plotted and then the coordinate system is transformed to a polar plane; finally, the GADF and GASF images are generated according to the functions defined in (3).

Fig. 3 shows the composition approach, in which (a)–(d) are four GADF images built from four time series that differ in their aggregation intervals. The composition aims at building a single image that captures the evolution of the time series over a fixed period of time.

[Fig. 3 panels: (a), (b), (c), (d).]
Fig. 3. The policy of composition of the multi-resolution GADF images; in particular, (a)–(d) refer to the same label, but the observations considered in each are aggregated in four different ways.

D. Ensemble of CNNs
Once the time series are converted to GADF images, the training phase can take place. Fig. 1 shows the architecture of the convolutional neural network involved (on the left). It consists of a simplified version of the VGG-16 network [31], in which the convolutional layers produce an output of the same size as their input. The activation function in each of them is a ReLU, while the softmax function is used in the classification layer.

TABLE I
For Each Trained CNN, a Different Weight Initialization Method is Applied. In the Table, the Set of Configuration Parameters is Reported
Init. method | Configuration parameters | Seed
Orthogonal | gain = 1.0 | –
Lecun_uniform | – | –
VarianceScaling | scale = 1, mode = "fan_in", distr. = "normal" | –
RandomNormal | mean = 0.0, stddev = 0.05 | –
RandomUniform | minval = −0.05, maxval = 0.05 | –
TruncatedNormal | mean = 0.0, stddev = 0.05 | –
Glorot_normal | – | –
Glorot_uniform | – | –
He_normal | – | –
He_uniform | – | –
Orthogonal | gain = 1.0 | 42

The ensemble configuration involves the training of twenty convolutional neural networks, each with a different weight initialization method; the configuration parameters for each network are detailed in Table I. The choice of a different weight initialization method for each network arises from the need to exclude most of the randomness factors that can affect the final prediction. This ensures that the answer of the ensemble originates from a proper convergence of the weights, no matter how they are initialized. Once the
nets are trained, the test samples are given as input to all the networks. The final decision of the ensemble is taken by applying a majority voting-based approach, which returns a specific output r ∈ {0, 1} according to the percentage of networks that agree on the same classification. In the experimental results section, six different thresholds on this agreement percentage are tested, from 0.5 to 1.

III. Experimental Settings
This section describes the settings of the experimental phase. In particular, Section III-A presents and describes the S&P500 dataset, along with the B&H investment strategy applied to it, which represents the baseline comparison method for the majority of trading systems. Then, Section III-B defines the validation approach. Finally, Section III-C presents the ensemble policy.

A. Evaluation and S&P500
The S&P500 is arguably the most important U.S. index. Born in 1789, at the beginning it only consisted of 90 titles. Since 1957, the year in which computers started to be actively applied to the financial market, the number of quoted companies has grown to 500. As a result, the S&P500 became one of the most influential markets in the U.S., even overtaking the Dow Jones index. For the trading operations, the S&P500 futures have been used. These are futures contracts on a stock or a financial index. These derivative securities are used by investors and portfolio managers to hedge their equity positions against a loss in stocks; in other words, the S&P500 index future is used by those who want to hedge risk over a certain period of time.
Currently, the S&P500 index future is one of the most widely traded index future contracts in the U.S., and its value is computed by multiplying the S&P500 value by $50. As an example, if the S&P500 is at a level of 2500, then the market value of a future contract is 2500 × $50 = $125 000.
The S&P500 data are publicly available on many platforms over the Internet; also, they can be downloaded at different levels of granularity, according to the scope of the research being carried out. Historically, the data are tracked and gathered once every 15 s, but some platforms only provide daily data. More in detail, for each of the acquired records, the following data are available:
• Date of the observation: mm/dd/yyyy;
• Interval of the observation: hh:mm:ss;
• Open: the value at which the index opened on the specified date;
• Close: the value at which the index closed on the specified date;
• High: the highest value reached by the index on the specified date;
• Low: the lowest value reached by the index on the specified date.
We decided to use the S&P500 as a benchmark for several reasons. First of all, it is one of the most relevant stock markets in the world. Second, it is extremely challenging to forecast its behaviour. Indeed, it shows a high level of volatility, where this measure indicates the mean fluctuation of the market over a certain time interval. The practical importance of this factor lies in the assumption that there is a tight correlation between the mean fluctuation and the quantification of the risks related to the asset [33]. In recent years, many studies have been conducted in this area, to the point that a specific branch of market-related research focuses on the prediction of this fluctuation [34]. On top of this, we can use the performance of B&H applied to the S&P500 as a direct benchmark. B&H is both a strategy that is easy to replicate and an extremely significant competitor when considering a time frame in which the market shows a large and fairly constant growth.
Indeed, B&H is a passive investment strategy in which an investor buys stocks and holds them for a long time, hoping that they will gradually increase in value. This strategy works as follows: given an investment period, B&H "buys" a stock at the beginning, "holds" it for the entire investment period, and sells it at the end. The net profit is the difference between the price at the end of the period and the price at the beginning. Fig. 4 shows the time frame selected for comparing our approach against the B&H strategy (from 2009-02-01 to 2014-07-31); it is easy to notice that this investment period is extremely favourable for B&H and extremely challenging for our approach.

B. Walks' Definition
Unlike the common cross-validation approaches typically applied to image classification, such as the leave-one-out cross validation (LOOCV) or the k-fold cross validation, time series data need a more specific approach, since it must account for the semantic link between the observation at time t and the one at time t + 1. The walk-forward validation strategy properly fits this scenario, since the folds considered for the validation are temporally split and internally processed as one sample. As a consequence, the use of GADF helps maintain the correlation between two consecutive observations, thus keeping the semantics of the succession unaltered.
For our research we considered an investment period that goes from 2000-02-01 to 2015-01-30, for a total of 16 years and 4569 observations, related to the actual days on which the financial market was open. Each observation has been labelled according to the close-open difference of the following day, as follows:
• 1, if the close-open value of the following day is positive;
• 0, otherwise.
The data are divided into training, validation and testing sets; the walks are shown in Table II: in particular, each model has been trained over a period of ten years, validated over the following six months and finally tested over the last six months.

C. CNN Training and Ensemble Policy
According to the considerations exposed in Section II-C, we used K = 4 for the evaluation of our approach. Starting from the original time series of S&P500, in which the observations
are sampled at intervals of 5 minutes, the data have been aggregated according to 4 new intervals:
• 1 hour;
• 4 hours;
• 8 hours;
• 1 day.
This means that, starting from the original time series, we aggregated the data in four different ways, and from each of them we selected 20 samples for predicting the following market day (20 1-hour blocks make one GADF image for the first time series, 20 4-hour blocks make one more GADF image, etc.). The close-open value has been computed for each sample, and a 20 × 20 × 3 GADF image has been built for each series and composed according to the process shown in Fig. 3. By following this procedure, the aggregated GADF image has dimension 40 × 40 × 3. For each of the walks defined in the previous section, the ensemble of CNNs is executed over the test samples; the majority voting approach works as follows: each CNN outputs a single value, 0 or 1, depending on whether it suggests performing a short or a long action, respectively. When working as an ensemble, the final prediction is taken according to the answers of all the CNNs, following the trading strategy defined in Section II-A. Note that in cases where not all the CNNs agree on the same result, a coverage-based approach is used: let N be the number of networks involved in the ensemble, t ∈ [0.5, 1] be a threshold that indicates the required percentage of agreement, and A be the action (long or short) most predicted by the nets for the considered day. Then, the system performs A on that day if and only if the number n of networks that voted A satisfies n ≥ tN; conversely, if the agreement threshold is not reached, the system does not perform any operation and just holds.

[Fig. 4 plot: buy-and-hold net profit (y-axis, 600 to 2500) over the years 1999–2015 (x-axis), with the training, validation, and testing periods of the walks marked.]
Fig. 4. The net profit obtained by the application of the buy-and-hold strategy is shown over the period which has been taken into account for the experiments (https://www.investing.com/indices/us-spx-500-futures).

TABLE II
The 11 Walks Used for Training, Validation, and Testing the Models and How They are Composed
# | Training (since → to) | Validation (since → to) | Test (since → to)
1 | 2000-02-01 → 2009-01-30 | 2009-02-01 → 2009-07-31 | 2009-08-02 → 2010-01-31
2 | 2000-08-01 → 2009-07-31 | 2009-08-02 → 2010-01-31 | 2010-02-01 → 2010-07-30
3 | 2001-02-01 → 2010-01-31 | 2010-02-01 → 2010-07-30 | 2010-08-01 → 2011-01-31
4 | 2001-08-01 → 2010-07-30 | 2010-08-01 → 2011-01-31 | 2011-02-01 → 2011-07-31
5 | 2002-02-01 → 2011-01-31 | 2011-02-01 → 2011-07-31 | 2011-08-01 → 2012-01-31
6 | 2002-08-01 → 2011-07-31 | 2011-08-01 → 2012-01-31 | 2012-02-01 → 2012-07-31
7 | 2003-02-02 → 2012-01-31 | 2012-02-01 → 2012-07-31 | 2012-08-01 → 2013-01-31
8 | 2003-08-01 → 2012-07-31 | 2012-08-01 → 2013-01-31 | 2013-02-01 → 2013-07-31
9 | 2004-08-01 → 2013-07-31 | 2013-08-01 → 2014-01-31 | 2014-02-02 → 2014-07-31
10 | 2004-08-01 → 2013-07-31 | 2013-08-01 → 2014-01-31 | 2014-02-02 → 2014-07-31
11 | 2005-02-01 → 2014-01-31 | 2014-02-02 → 2014-07-31 | 2014-08-01 → 2015-01-30

IV. Results and Discussion
In this section, the quantitative and qualitative results are shown and discussed. In the following, we report the metrics we have used.
• The Accuracy is a well-known metric, defined as the ratio between the number of correct predictions and the total number of test samples;
• The Coverage is a metric (expressed as a percentage) that indicates how often the networks in the ensemble agree on an action, according to the threshold set;
• The annualized Sharpe Ratio is a financial risk index used to help investors understand the return of an investment compared to its risk; the greater the value of the Sharpe Ratio, the more attractive the risk-adjusted return;
• As the term suggests, the Net Profit is the actual profit earned in the testing period;
• The Sortino Ratio is a variation of the Sharpe Ratio that differentiates harmful volatility from total overall volatility by using the standard deviation of negative portfolio returns; it is defined as follows:
\[ \text{Sortino Ratio} = \frac{R_p - r_f}{\sigma_d} \tag{4} \]
where \(R_p\) is the actual portfolio return, \(r_f\) is the risk-free rate, and \(\sigma_d\) is the standard deviation of the downside;
• The return over maximum drawdown (RoMaD) relates the return obtained by a portfolio in a given period to its maximum drawdown; it is computed as follows:
\[ \text{RoMaD} = \frac{\text{NetProfit}}{\text{MDD}} \tag{5} \]
where the Net Profit is the effective gain or loss realized by the investment, and MDD is the maximum drawdown, defined as the maximum loss from a peak to a trough of a portfolio before a new peak is attained. Currently, the MDD is a widely preferred way to evaluate the riskiness of an investment.
The results of the proposed approach have been compared against the following baselines:
• B&H strategy: as explained in Section III, this represents the baseline comparison method for the majority of trading system approaches;
• Random guessing: this simple technique exploits 10 random classifiers, combined in an ensemble that uses a majority voting approach to perform long or short operations. The comparison against this random predictor serves as evidence that our approach does not act randomly, that the performed actions (longs or shorts) have a strong basis, and that the criteria are correctly learnt from the past trend of the market;
• 1D-CNN: this approach does not apply the GADF transformation to the time series; therefore, the time series are

proposed approach is the only one capable of noticeably outperforming the baseline performance of B&H. However, a peak in performance is appreciable with the threshold set to 0.6, at which the highest net profit is obtained. The quantitative results are shown in Table III and further confirm the good qualitative scores. Moreover, the table shows the large difference between the net profit obtained by our approach and that of the B&H strategy. With ensemble thresholds higher than 0.7, the results tend to degrade, mainly because it becomes harder to obtain the required agreement among the nets in the ensemble. Nevertheless, even with the threshold set to 1 (meaning that all the nets must agree to trigger a long or short action), we managed to earn $1.375; thus, we still do not lose money, even though the risk is quite high, as shown by the Sharpe Ratio and Sortino Ratio plots. In general, the good performance for all the threshold values is a clear indicator of the robustness of our proposed approach.
In addition to comparing our results to the baselines, we have also considered the approach proposed by Calvi et al. [29], which consists of the daily prediction of the S&P500 index, as in our case, but using a support tensor machine (STM) as the predictor (a tensor extension of the better-known support vector machine). Fig. 6 shows the gap between the net profit given by our method and those returned by the B&H and by the work in [29]. The comparison does not take into account the market period related to the 2008 Financial Crisis (whose effects impacted the years between 2007 and early 2009), when outperforming the B&H strategy would be unfairly easy, due to the dramatic performance of the index in this period. Conversely, the considered period (from late 2009 until 2016) shows a strong increase of the S&P500 index, which favours the B&H strategy. Nevertheless, our method is able to outperform it, which is not the case for the approach in [29].
Finally, note that, in the financial forecasting domain, the prediction accuracy is very close to 50%, mainly due to the high variability of the indexes, which greatly complicates the decision between long and short actions. In this scenario, using a single CNN (both in terms of prediction and, where appropriate, in terms of probability of the prediction) usually introduces a major drawback: the initialization seed may significantly affect the final result. Moreover, improving the accuracy does not usually make the net profit grow accordingly. It follows that, by slightly changing the initialization, the same network could give worse results for the same period, or in any case could behave very differently when tested on different markets. This reveals the need of
directly applied and processed by 1D-CNN. This test aims at stabilizing the results; the ensemble architecture, together with
showing the benefits of the GADF transformations in the different weight initialization policies, has shown to have a
proposed approach. much more robust behaviour, allowing us to obtain more
Fig. 5 shows the performances of the simple trading system stable results and thus alleviate the randomness. As a result,
based on our forecasting approach (in blue) against those according to the set thresholds, we perform a long operation
described before: the B&H strategy (in red), the 1D-CNN (in only when the ensemble suggests to buy when a certain
green) and the random guessing (in orange). The performances ensemble voting policy is satisfied; otherwise we perform a
are computed over all the walks that have been taken into short operation. The mitigation of the randomness yields two
account; on the x-axis of each plot the considered threshold for simple but significant consequences.
the ensemble is indicated. Overall, the thresholds from 0.5 to • When we lose, we tend to lose very little;
0.7 let us obtain the best achievements. As shown in the plot, the • When we win, we tend to win considerably.
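The threshold-based voting policy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each net is assumed to emit a binary vote (1 = up, 0 = down), and a side is taken only when the share of nets voting for it reaches the threshold and strictly exceeds the opposite share. When neither side qualifies, the sketch stays out of the market; this is our reading of the Coverage column of Table III and of the remark that at threshold 1 "all the nets must agree to trigger a long or short action", whereas the prose above says a short is opened otherwise. The names `ensemble_decision`, `coverage_and_accuracy`, `LONG` and `SHORT` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

LONG, SHORT = 1, -1  # hypothetical action encoding; None means "no trade"


def ensemble_decision(votes: Sequence[int], threshold: float) -> Optional[int]:
    """Map the binary votes of the nets (1 = up, 0 = down) to an action.

    Assumed policy: go long (short) only if the share of nets voting up (down)
    reaches `threshold` and is strictly larger than the opposite share, so an
    exact split never triggers a trade; otherwise stay out of the market.
    """
    n = len(votes)
    n_up = sum(votes)
    share_up, share_down = n_up / n, (n - n_up) / n
    if share_up >= threshold and share_up > share_down:
        return LONG
    if share_down >= threshold and share_down > share_up:
        return SHORT
    return None


def coverage_and_accuracy(
    daily_votes: Sequence[Sequence[int]], true_moves: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Coverage: share of days on which a trade is made.
    Accuracy: share of traded days whose action matches the true move (+1/-1)."""
    actions: List[Optional[int]] = [ensemble_decision(v, threshold) for v in daily_votes]
    traded = [(a, m) for a, m in zip(actions, true_moves) if a is not None]
    coverage = len(traded) / len(actions)
    accuracy = sum(a == m for a, m in traded) / len(traded) if traded else float("nan")
    return coverage, accuracy


if __name__ == "__main__":
    # Three days of votes from ten hypothetical nets: 7-3 up, 5-5 split, 0-10 down.
    votes = [[1] * 7 + [0] * 3, [1] * 5 + [0] * 5, [0] * 10]
    print(coverage_and_accuracy(votes, [1, 1, 1], threshold=0.6))
```

Raising `threshold` toward 1 reduces coverage, since fewer days reach the required agreement, which mirrors the qualitative trend of the Coverage column in Table III.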
690 IEEE/CAA JOURNAL OF AUTOMATICA SINICA, VOL. 7, NO. 3, MAY 2020
[Fig. 5: four panels plotting Net Profit, RoMaD, Annualized Sharpe Ratio and Sortino Ratio against the ensemble Threshold (0.5 to 1.0).]
Fig. 5. The comparison among the tested approaches. The blue, orange and green lines represent, respectively, the results obtained by our approach, by a random guessing approach and by a 1D-CNN; the red line shows the results of the buy-and-hold strategy. In each plot, the x-axis indicates the ensemble threshold, while the y-axis shows the Net Profit, the RoMaD, the annualized Sharpe Ratio and the Sortino Ratio, respectively, from the top-left panel to the bottom-right one.
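The random-guessing baseline plotted in Fig. 5 can be approximated as follows. This is a hedged sketch: the text only states that 10 random classifiers are combined by majority voting, so the uniform 50/50 vote of each classifier and the coin-flip tie-breaking (needed with an even number of voters) are our assumptions. The function names `random_guessing_actions` and `hit_rate` are hypothetical.

```python
import random
from typing import List, Sequence

LONG, SHORT = 1, -1  # hypothetical encoding of the two trading actions


def random_guessing_actions(n_days: int, n_classifiers: int = 10, seed: int = 0) -> List[int]:
    """Random-guessing baseline: every day each of `n_classifiers` random voters
    picks "up" or "down" with probability 0.5, and the majority decides whether to
    go long or short. Ties are broken with a coin flip (our assumption)."""
    rng = random.Random(seed)
    actions: List[int] = []
    for _ in range(n_days):
        ups = sum(rng.random() < 0.5 for _ in range(n_classifiers))
        downs = n_classifiers - ups
        if ups != downs:
            actions.append(LONG if ups > downs else SHORT)
        else:
            actions.append(rng.choice([LONG, SHORT]))
    return actions


def hit_rate(actions: Sequence[int], true_moves: Sequence[int]) -> float:
    """Fraction of days on which the action matches the realized move (+1 up, -1 down)."""
    return sum(a == m for a, m in zip(actions, true_moves)) / len(actions)


if __name__ == "__main__":
    rng = random.Random(123)
    moves = [rng.choice([LONG, SHORT]) for _ in range(10_000)]  # synthetic market moves
    print(f"random-guessing hit rate: {hit_rate(random_guessing_actions(10_000), moves):.3f}")
```

On any sequence of market moves, the hit rate of such a predictor hovers around 50%, which is why a model that consistently earns more than this baseline is unlikely to be acting randomly.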
TABLE III
The Quantitative Results of the Proposed Approach. For Each Metric, the Best Result is Highlighted in Bold

Proposed approach
Threshold  Coverage (%)  Accuracy (%)  Net Profit ($)  RoMaD   Annualized Sharpe Ratio  Sortino Ratio
0.500      95.433        52.638        66.625          8.289   1.196                    0.452
0.600      72.423        54.810        82.312          11.95   1.808                    0.828
0.700      47.716        56.564        62.600          8.517   1.518                    1.009
0.800      22.950        56.632        46.2125         9.105   1.596                    1.504
0.900      7.494         53.125        17.4875         3.970   0.452                    0.370
1.000      1.990         55.882        1.375           0.365   −1.445                   −0.524
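The risk-adjusted metrics reported in Table III (annualized Sharpe Ratio, Sortino Ratio in (4), MDD and RoMaD in (5)) can be computed from a daily series as sketched below. This is an illustrative sketch, not the authors' evaluation code. Assumptions: returns are daily with 252 trading days per year for annualization, `rf` is a per-period risk-free rate, $R_p$ in (4) is taken as the mean per-period return, $\sigma_d$ as the population standard deviation of the negative returns, and the P&L curve used for MDD starts at zero before the first trade. All function names are hypothetical.

```python
import math
import statistics
from typing import Sequence

TRADING_DAYS = 252  # assumption: daily returns, 252 trading days per year


def annualized_sharpe(returns: Sequence[float], rf: float = 0.0) -> float:
    """Mean excess return over its sample standard deviation, scaled by sqrt(252)."""
    excess = [r - rf for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(TRADING_DAYS)


def sortino_ratio(returns: Sequence[float], rf: float = 0.0) -> float:
    """Eq. (4): (R_p - r_f) / sigma_d, with R_p the mean per-period return and
    sigma_d the population standard deviation of the negative returns."""
    downside = [r for r in returns if r < 0]
    sigma_d = statistics.pstdev(downside) if downside else 0.0
    if sigma_d == 0:
        return math.nan  # undefined without dispersion in the downside returns
    return (statistics.mean(returns) - rf) / sigma_d


def max_drawdown(pnl: Sequence[float]) -> float:
    """Largest peak-to-trough drop of the cumulative P&L curve, as a positive amount."""
    equity = peak = mdd = 0.0
    for x in pnl:
        equity += x
        peak = max(peak, equity)
        mdd = max(mdd, peak - equity)
    return mdd


def romad(pnl: Sequence[float]) -> float:
    """Eq. (5): Net Profit / MDD, both in currency units."""
    mdd = max_drawdown(pnl)
    return sum(pnl) / mdd if mdd > 0 else math.nan


if __name__ == "__main__":
    daily_pnl = [100, -50, 30, -80, 200]  # toy P&L in dollars
    print("MDD:", max_drawdown(daily_pnl), "RoMaD:", romad(daily_pnl))
```

In the toy P&L above, the equity curve peaks at 100, falls to 0 and then recovers to 200, so the MDD is 100 and the RoMaD is 200 / 100 = 2. A common alternative for $\sigma_d$ is the downside deviation computed over all periods, with positive returns counted as zero; the choice changes the absolute value of the Sortino Ratio.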
[Fig. 6: cumulative profit plotted against Dates (2010 to 2014).]
Fig. 6. The comparison, in terms of cumulative profit, between the proposed approach (in blue) and the approach in [29] (in black). The B&H baseline is shown in orange as a benchmark.

This result is particularly significant, because our approach beats the B&H strategy in years in which the latter performs well.

V. Conclusions
In this paper we have proposed an innovative approach for forecasting market behaviour by using deep learning technologies and by encoding time series to GAF images. The developed CNNs have been applied to the GAF images for a classification task. Moreover, these CNNs have been combined into an ensemble, and a majority voting strategy has been used to select the final classification. Good results have been obtained on the S&P500 future, the market on which we trained, validated and tested our networks and the overall ensemble. The GAF imaging technique has thus been applied within the financial technology domain, bringing the benefits of CNNs to it. Moreover, our approach, which combines deep learning and GAF imaging, outperformed the B&H baseline strategy in a classification task where both long and short actions can be performed. The tuning of several hyperparameters is being analysed by our team and will be the subject of future work. We are currently studying the stacking policy of the GAF images (as shown in Fig. 3), and how accuracy and net profit vary with the value of K (currently set to 4). Note that our approach outperforms the B&H strategy, although the latter was still very competitive and profitable within the considered period. As our approach is highly promising, there are several directions we would like to explore:
• First of all, we would like to apply our method to other markets and understand the benefits it brings with respect to the baselines;
• Then, something we are already exploring consists of applying the results of our ensemble to real trading platforms. The goal is to simulate the real earnings we would have obtained on past data for a given market. The platform we are currently experimenting with is MultiCharts¹;
• We would also like to test our approach on different classification tasks within several domains such as sentiment

¹ https://www.multicharts.com/

Acknowledgments
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X GPU used for this research.

References
[1] T. Kimoto, K. Asakawa, M. Yoda, and M. Takeoka, “Stock market prediction system with modular neural networks,” in Proc. Int. Joint Conf. Neural Networks, San Diego, CA, USA, 1990, pp. 1–6.
[2] Y. D. Zhang and L. E. Wu, “Stock market prediction of S&P500 via combination of improved BCO approach and BP neural network,” Expert Syst. Appl., vol. 36, no. 5, pp. 8849–8854, Jul. 2009.
[3] T. Z. Tan, C. Quek, and G. S. Ng, “Brain-inspired genetic complementary learning for stock market prediction,” in Proc. IEEE Congr. Evolutionary Computation, Edinburgh, Scotland, UK, 2005, pp. 2653–2660.
[4] S. Soni, “Applications of ANNs in stock market prediction: a survey,” Int. J. Comput. Sci. Eng. Technol., vol. 2, no. 3, pp. 71–83, 2011.
[5] T. Lintonen and T. Raty, “Self-learning of multivariate time series using perceptually important points,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 6, pp. 1318–1331, Nov. 2019.
[6] K. Kamijo and T. Tanigawa, “Stock price pattern recognition-a recurrent neural network approach,” in Proc. Int. Joint Conf. Neural Networks, San Diego, CA, USA, 1990, pp. 215–221.
[7] C. H. Lee and K. C. Park, “Prediction of monthly transition of the composition stock price index using recurrent back-propagation,” in Artificial Neural Networks, I. Aleksander and J. Taylor, Eds. Amsterdam, Netherlands: Elsevier, 1992, pp. 1629–1632.
[8] E. Guresen, G. Kayakutlu, and T. U. Daim, “Using artificial neural network models in stock market index prediction,” Expert Syst. Appl., vol. 38, no. 8, pp. 10389–10397, Aug. 2011.
[9] S. C. Gao, M. C. Zhou, Y. R. Wang, J. J. Cheng, H. Yachi, and J. H. Wang, “Dendritic neuron model with effective learning algorithms for classification, approximation, and prediction,” IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 2, pp. 601–614, Feb. 2019.
[10] D. B. Jia, S. X. Zheng, L. Yang, Y. Todo, and S. C. Gao, “A dendritic neuron model with nonlinearity validation on Istanbul stock and Taiwan futures exchange indexes prediction,” in Proc. 5th IEEE Int. Conf. Cloud Computing and Intelligence Systems, Nanjing, China, 2018, pp. 242–246.
[11] T. L. Zhou, S. C. Gao, J. H. Wang, C. Y. Chu, Y. Todo, and Z. Tang, “Financial time series prediction using a dendritic neuron model,” Knowl.-Based Syst., vol. 105, pp. 214–224, Aug. 2016.
[12] J. D. Farmer and A. W. Lo, “Frontiers of finance: evolution and efficient markets,” Proc. Natl. Acad. Sci. USA, vol. 96, no. 18, pp. 9991–9992, Aug. 1999.
[13] V. S. Pagolu, K. N. Reddy, G. Panda, and B. Majhi, “Sentiment analysis of twitter data for predicting stock market movements,” in Proc. Int. Conf. Signal Processing, Communication, Power and Embedded System, Paralakhemundi, India, 2016, pp. 1345–1350.
[14] A. Mittal and A. Goel, “Stock prediction using twitter sentiment analysis,” 2012. [Online]. Available: http://cs229.stanford.edu/proj2011/GoelMittal-StockMarketPredictionUsingTwitterSentimentAnalysis.pdf
[15] N. Oliveira, P. Cortez, and N. Areal, “The impact of microblogging data for stock market prediction: using twitter to predict returns, volatility, trading volume and survey sentiment indices,” Expert Syst. Appl., vol. 73, pp. 125–144, May 2017.
[16] T. B. Trafalis and H. Ince, “Support vector machine for regression and applications to financial forecasting,” in Proc. IEEE-INNS-ENNS Int. Joint Conf. Neural Networks 2000. Neural Computing: New Challenges and Perspectives for the New Millennium, Como, Italy, 2000, pp. 348–353.
[17] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20, no. 3, pp. 273–297, Sept. 1995.
[18] B. M. Henrique, V. A. Sobreiro, and H. Kimura, “Literature review: machine learning techniques applied to financial market prediction,” Expert Syst. Appl., vol. 124, pp. 226–251, Jun. 2019.
[19] C. F. Tsai and S. P. Wang, “Stock price forecasting by hybrid machine learning techniques,” in Proc. Int. MultiConf. Engineers and Computer Scientists, Hong Kong, China, 2009, pp. 60.
[20] J. Patel, S. Shah, P. Thakkar, and K. Kotecha, “Predicting stock market index using fusion of machine learning techniques,” Expert Syst. Appl., vol. 42, no. 4, pp. 2162–2172, Mar. 2015.
[21] D. Shah, H. Isah, and F. Zulkernine, “Stock market analysis: a review and taxonomy of prediction techniques,” Int. J. Financ. Stud., vol. 7, no. 2, pp. 26, Jun. 2019.
[22] M. Ballings, D. van den Poel, N. Hespeels, and R. Gryp, “Evaluating multiple classifiers for stock price direction prediction,” Expert Syst. Appl., vol. 42, no. 20, pp. 7046–7056, Nov. 2015.
[23] S. Basak, S. Kar, S. Saha, L. Khaidem, and S. R. Dey, “Predicting the direction of stock market prices using tree-based classifiers,” North Am. J. Econ. Finance, vol. 47, pp. 552–567, Jan. 2019.
[24] S. Dey, Y. Kumar, S. Saha, and S. Basak, “Forecasting to classification: predicting the direction of stock market price using Xtreme gradient boosting,” 2016. [Online]. Available: https://doi.org/10.13140/RG.2.2.15294.48968
[25] S. K. Aggarwal, L. M. Saini, and A. Kumar, “Price forecasting using wavelet transform and LSE based mixed model in Australian electricity market,” Int. J. Energy Sector Manage., vol. 2, no. 4, pp. 521–546, Nov. 2008.
[26] P. M. Kebria, A. Khosravi, S. M. Salaken, and S. Nahavandi, “Deep imitation learning for autonomous vehicles based on convolutional neural networks,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 1, pp. 82–95, Jan. 2020.
[27] D. Freire-Obregón, F. Narducci, S. Barra, and M. Castrillón-Santana, “Deep learning for source camera identification on mobile devices,” Pattern Recognit. Lett., vol. 126, pp. 86–91, Sept. 2019.
[28] Z. G. Wang and T. Oates, “Imaging time-series to improve classification and imputation,” in Proc. 24th Int. Joint Conf. Artificial Intelligence, 2015.
[29] G. G. Calvi, V. Lucic, and D. P. Mandic, “Support tensor machine for financial forecasting,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Brighton, United Kingdom, 2019, pp. 8152–8156.
[30] Z. G. Wang and T. Oates, “Encoding time series as images for visual inspection and classification using tiled convolutional neural networks,” in Proc. Workshops at the 29th AAAI Conf. Artificial Intelligence, 2015.
[31] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv: 1409.1556, 2014.
[32] K. M. He, X. Y. Zhang, S. Q. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016, pp. 770–778.
[33] P. Cizeau, Y. H. Liu, M. Meyer, C. K. Peng, and H. E. Stanley, “Volatility distribution in the S&P500 stock index,” Phys. A Stat. Mech. Appl., vol. 245, no. 3–4, pp. 441–445, Nov. 1997.
[34] M. Martens, “Measuring and forecasting S&P500 index-futures volatility using high-frequency data,” J. Futur. Mark., vol. 22, no. 6, pp. 497–518, Jun. 2002.

Silvio Barra received the B.Sc. (cum laude) and the M.Sc. (cum laude) degrees in computer science from the University of Salerno, Italy, in 2009 and 2012, respectively. In 2017, he received the Ph.D. degree from the University of Cagliari, Italy. Currently, he is a research fellow at the University of Cagliari. He is a Member of the GIRPR. His main research interests include pattern recognition, biometrics, and video analysis and analytics.

Salvatore Mario Carta is an Associate Professor at the Department of Mathematics and Computer Science of the University of Cagliari, Italy. He received the Ph.D. degree in electronics and computer science from the University of Cagliari in 2003, and in 2005 he joined the Department of Mathematics and Computer Science of the University of Cagliari as an Assistant Professor. In 2006 and 2007 he was a Guest Researcher (invited Professor) at the Swiss Federal Institute of Technology, hosted by the Laboratoire des Systèmes Intégrés (LSI). His research interests include clustering algorithms, social media analysis, and text mining for behavioral pattern identification and recommendation in groups of users and in single users; AI algorithms for credit scoring, fraud detection and intrusion detection; AI algorithms for financial forecasting and robo-trading; e-coaching platforms and AI algorithms for healthy lifestyles. He is the author of more than 100 conference and journal papers in these research fields, with more than 1200 citations, and is a Member of the ACM and of the IEEE. He founded three hi-tech companies, spin-offs of the University of Cagliari, and is currently leading one of them.

Andrea Corriga is a Ph.D. candidate at the Department of Mathematics and Computer Science of the University of Cagliari, Italy. He received the master's degree in computer science in 2018. His research interests include financial forecasting using deep learning and machine learning approaches.

Alessandro Sebastian Podda is a Postdoctoral Researcher at the Department of Mathematics and Computer Science of the University of Cagliari, Italy. In 2018, he received the Ph.D. degree in mathematics and informatics, supported by a grant from RAS (Autonomous Region of Sardinia). Previously, he received the B.Sc. and M.Sc. degrees in informatics (both with honours) from the University of Cagliari. Currently, he is a Technical Coordinator of the research projects DoUtDes and SardCoin. His current research interests mainly include deep learning, financial forecasting, information security, blockchains, and smart contracts.

Diego Reforgiato Recupero has been an Associate Professor at the Department of Mathematics and Computer Science of the University of Cagliari, Italy, since December 2016. He received the Ph.D. degree in computer science from the University of Naples Federico II, Italy, in 2004. From 2005 to 2008 he was a Postdoctoral Researcher at the University of Maryland College Park, USA. He has won several awards in his career (such as the Marie Curie International Reintegration Grant, Marie Curie Innovative Training Network, Best Research Award from the University of Catania, Computer World Horizon Award, Telecom Working Capital, and Startup Weekend). He co-founded 6 companies within the ICT sector and is actively involved in European projects and research (with one of his companies he won more than 30 FP7 and H2020 projects). His current research interests include sentiment analysis, semantic web, natural language processing, human robot interaction, financial technology, and smart grid. In all of them, machine learning, deep learning and big data are key technologies employed to effectively solve several tasks. He is the author of more than 100 conference and journal papers in these research fields, with more than 1000 citations.