This is a repository copy of Explainable artificial intelligence modeling to forecast Bitcoin prices.

White Rose Research Online URL for this paper:
https://eprints.whiterose.ac.uk/199663/

Version: Accepted Version

Article:
Goodell, JW, Jabeur, SB, Saadaoui, F et al. (1 more author) (2023) Explainable artificial
intelligence modeling to forecast Bitcoin prices. International Review of Financial Analysis,
88. 102702. ISSN 1057-5219

https://doi.org/10.1016/j.irfa.2023.102702

Reuse
This article is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs
(CC BY-NC-ND) licence. This licence only allows you to download this work and share it with others as long
as you credit the authors, but you can’t change the article in any way or use it commercially. More
information and the full terms of the licence here: https://creativecommons.org/licenses/

Takedown
If you consider content in White Rose Research Online to be in breach of UK law, please notify us by
emailing [email protected] including the URL of the record and the reason for the withdrawal request.

[email protected]
https://eprints.whiterose.ac.uk/
Explainable artificial intelligence modeling to forecast Bitcoin prices

John W. GOODELL1, Sami BEN JABEUR2, Foued SAÂDAOUI3, Muhammad Ali Nasir4

Abstract

Forecasting cryptocurrency behaviour is an increasingly important issue for investors. However, proposed
analytical approaches typically suffer from a lack of explanatory power. In response, we propose an
explainable artificial intelligence (XAI) framework for cryptocurrency pricing, including a new feature
selection method integrated with a game-theory-based SHapley Additive exPlanations approach and an
explainable forecasting framework. This new approach, extendable to other uses, improves forecasting
performance as well as model generalizability and interpretability. We demonstrate that XAI modeling is
capable of predicting cryptocurrency prices during the recent cryptocurrency downturn, which has been
associated in part with the Russia–Ukraine war. Modeling reveals the critical inflection points of the daily
financial and macroeconomic determinants of the transitions between low and high daily prices. We contribute
to financial operating systems research and practice by introducing XAI techniques to enhance the
transparency and interpretability of machine learning applications and to support various decision-making
processes.

Keywords: Decision support systems; Explainable artificial intelligence; SHAP value; Feature selection;
Cryptocurrency prices

1. Introduction

Since the inception of bitcoin and concomitant underlying blockchain technology, digital currencies
and assets have expanded to thousands of assets, several blockchains, and hosts of solutions for a variety of
financial and business uses. Over time, competitors have attempted to develop new digital assets that improve
on Bitcoin’s paradigm as a store of value and a transactional asset. However, bitcoin remains the most popular
crypto asset in terms of market capitalization and so, arguably, representative of the cryptocurrency market.
Fluctuations in the prices of digital currencies naturally lead investors, scholars, and policymakers
to have concerns. Anonymous, decentralized, and unregulated crypto markets may manifest bubbles that
threaten financial stability (Atsalakis et al., 2019). However, the behaviour of crypto markets has been
difficult to predict. Consequently, an ability to accurately forecast bitcoin prices may not only help investors
make decisions but also help governments design regulatory policies (Liu et al., 2021). Forecasting
bitcoin prices is a serious issue when it comes to risk management, and it merits careful consideration by
investors and financial institutions (Nasir et al., 2019).

1 College of Business, the University of Akron, 259 S Broadway St, Akron, OH 44325, UNITED STATES,
E-mail: [email protected], Corresponding author.
2 Institute of Sustainable Business and Organizations, Confluence: Sciences et Humanités, UCLY, ESDES, FRANCE,
E-mail: [email protected]
3 Department of Statistics, Faculty of Sciences, King Abdulaziz University, P.O. BOX 80203, Jeddah 21589, SAUDI
ARABIA, E-mail: [email protected]
4 Department of Economics, University of Leeds, Department of Land Economy, University of Cambridge, UNITED
KINGDOM, Email: [email protected]

Our model testing period focuses on the recent decline in the value of cryptocurrencies from February
to June of 2022. This decline has been attributed in part to the response of stock markets to the conflict
between Russia and Ukraine (Bissoondoyal-Bheenick et al., 2022; Shih et al., 2019; Boungou & Yatié, 2022;
Saâdaoui et al., 2022), as well as to possible liquidations by prominent cryptocurrency investors (Khalfaoui
et al., 2022a). Khalfaoui et al. (2022a) note that the co-movements of war attention and cryptocurrency prices
depend on the investment horizon and the current state of the market, documenting that war attention has a
short-term negative (positive) impact on the value of cryptocurrencies. However, the factors conditioning the
influence of the Russia–Ukraine war on cryptocurrencies are still largely unexplored. In this regard, we
investigate the predictive power of several potential conditioning factors, related to the Russia–Ukraine war
and its economic consequences, in forecasting BTC prices. We hope to assist cryptocurrency investors in
broadening their analysis of the causes behind the recent sharp drops in cryptocurrency values.
Traders and policymakers must develop effective warning systems to forecast asset prices, including
those of digital currencies. Large volumes of information are captured and stored on many big-data
platforms for analytical purposes. However, in many cases, this large quantity of information, rather than
providing advantages for optimum decision-making, instead complicates decisions. Further, gathering,
storing, and processing this information can be prohibitively expensive (Ghaddar & Naoum-Sawaya, 2018).
Decision-makers must identify the essential facts in this wealth of data to construct an efficient and
practical prediction model without compromising the accuracy of the predicted output. Such goals can be
achieved through feature selection, which is an essential element of data preparation for machine learning
(ML) models (Simumba et al., 2022; Ben Jabeur et al., 2021).
This study contributes to the literature on forecasting bitcoin prices in several ways. First, we develop
a new approach in which an improved SHapley Additive exPlanations (SHAP) algorithm based on feature
importance selection (FS-SHAP) is used to forecast the prices of financial assets, including bitcoin. Our
results suggest that FS-SHAP can enhance the accuracy and interpretability of artificial intelligence (AI)
in the prediction of BTC prices. Feature selection is a critical step in many machine learning applications
(Chandrashekar and Sahin, 2014). According to Labbé et al. (2022), feature selection is a necessary procedure
for avoiding overfitting and reducing database size without major information loss. We carry out a number of
numerical experiments using FS-SHAP and compare the results to other state-of-the-art approaches from the
relevant technical literature. These experiments indicate that our methodology achieves comparable or better
outcomes than the alternative methods, with the additional benefit of being simpler and easily interpretable.
Further, we build explainable artificial intelligence (XAI) models to develop and investigate the
data-driven nexus between financial and macroeconomic determinants and bitcoin prices. An explainable
framework has been presented in order to meet the interpretative requirements imposed by external
stakeholders for machine learning models (Zhang et al., 2022). The whole-process ensemble method establishes
an FS-SHAP model with excellent accuracy and stability, from feature selection through predictor creation.
This model is intended to search for a feature subset swiftly and efficiently with good accuracy. Through
FS-SHAP, we demonstrate the utility of this modeling. We do this both to improve forecasting of bitcoin
prices and to identify factors that impact these prices, as well as to motivate the use of our methodology
for future research in a variety of forecasting contexts.
Demonstrating the utility of XAI modeling, we highlight the inflection points of the critical
macroeconomic factors above or below which bitcoin prices respond. Based on this utility, we
demonstrate that long-term forecasting of ongoing daily bitcoin prices benefits from identifying these factors.
We infer from this that other forecasting contexts will be enhanced by this modeling. We add to existing
studies by providing additional determinants for predicting bitcoin prices. In this regard, we investigate key
macroeconomic determinants that influence bitcoin prices, such as climate policy uncertainty, public
attention to inflation and recession, uncertainty factors (Twitter uncertainty), and news sentiment. Our study
contributes to behavioural economics theory by combining insights from psychology and economics to
explain how people make decisions (Tversky and Kahneman, 1992; Kahneman and Tversky, 2013).
According to this argument, individuals care more about the negative repercussions of inflation and recession
than the positive ones. This article provides fresh insights into how media framing and public dialogue
influence perceptions of economic news, particularly inflation and recession concerns during the
Russia–Ukraine war.
Our study also provides new evidence on the role of geopolitical risks in predicting cryptocurrency
prices, particularly during the Russia–Ukraine war, and contributes to a growing body of literature on the
financial repercussions of wars, terrorist acts, and other types of collective violence (Boubaker Glick &
Taylor, 2010; Moretti et al., 2014; Pástor & Veronesi, 2013; Caldara & Iacoviello, 2022; Saâdaoui et al.,
2022). Moreover, climate policy uncertainty relates to political economy theory (Stigler, 2021). Uncertainty
around climate policy may result in uncertainty regarding regulations for a number of businesses, including
the cryptocurrency industry. If investors become more risk-averse as a result of worries about the effect of
climate change or the possibility of legislative changes connected to climate change, this might lead to a
reduction in demand for Bitcoin, which would ultimately result in prices falling. Our research offers a
thorough account of the effects that global warming may have on the Bitcoin market.
The remainder of the paper is organized as follows: Section 2 provides background and critically
discusses the literature on forecasting crypto prices; Section 3 describes the data sources and the data
processing steps; Section 4 outlines the variational mode decomposition (VMD) and machine learning
models adopted for this research; Section 5 reports on the causality settings and the predictive performance
of the ML methods, and provides discussion; Section 6 concludes.

2. Literature review

Initially, research on forecasting cryptocurrency dynamics focused primarily on the potential drivers of
bitcoin prices. Early studies find that gold, oil, natural gas, and bitcoin prices are correlated, with these
assets together being portfolio diversifiers (Symitsi & Chalvatzis, 2019; Maghyereh & Abdoh, 2020;
Goodell and Goutte, 2021; Zhang et al., 2022; Khalfaoui et al., 2022). Parvini et al. (2022) report that gold
rates have strong power for predicting bitcoin prices in both training and validation periods. Basher and
Sadorsky (2022) provide evidence of the relationship between oil and the bitcoin price, finding that the oil
price volatility index is a significant predictor of both the bitcoin and gold price direction, consistent with
bitcoin serving as a replacement for gold in terms of diversifying this type of volatility. However, some
studies dispute whether cryptocurrencies are diversifiers for equity (e.g., Goodell and Goutte, 2021).
Rehman and Kang (2021) examine the time-frequency nexus between bitcoin prices and oil. Their
results revealed a significant relationship with oil over a period of 128 to 256 days. More recently, based on
Granger causality testing, Li et al. (2022) investigate extreme risk transmission between bitcoin and crude
oil assets. They report the existence of extreme asymmetry in the oil-bitcoin linkage, as well as time-varying
causality at different timescales.
The most recent work of Basher and Sadorsky (2022) includes a set of macroeconomic factors as
features in forecasting the bitcoin price, consisting of economic policy uncertainty (EPU) and inflation.
Moreover, the prices of the S&P500, the volatility index, the U.S. dollar index, Ethereum, Litecoin, and
Ripple were included among the predicting factors, as a number of previous studies had shown a connection
between bitcoin and these markets (Celeste et al., 2020; Jiang et al., 2022; Wu et al., 2022; Nguyen, 2022;
Al-Shboul et al., 2022). For example, Fang et al. (2019) state that global EPU improves the prediction of
bitcoin volatility. Demir et al. (2018) investigate the predictive power of EPU for bitcoin returns, evidencing
that EPU has positive and significant impacts in both the lower and higher quantiles. In another study that
employs the vector autoregressive approach, Blau et al. (2021) examine the relationship between bitcoin and
inflation expectation rates, finding that changes in bitcoin cause changes in the forward inflation rate. During
the recent COVID-19 pandemic, Choi and Shin (2022) examine the relationships among inflation,
uncertainty, and bitcoin. Their results suggest that bitcoin appreciates in response to inflation shocks, so that
bitcoin can be a hedge against future price increases. However, considering the 2022 fall in cryptocurrency
values during a time of global resurgence of inflation, forecasting bitcoin prices clearly requires considering
a number of factors.
In terms of methodological choice, different approaches have been explored for predicting the price
of Bitcoin and other cryptocurrencies. For instance, in a study based on generalized autoregressive
conditional heteroskedasticity (GARCH) models, Katsiampa (2017) reports that the autoregressive component
GARCH model has the best goodness-of-fit to the data among the models considered. In another study, Sun
et al. (2020) use the light gradient-boosting machine (LightGBM) and 42 features to forecast the price of
cryptocurrencies. Based on their findings, LightGBM modeling is more accurate and reliable for making
predictions. Han et al. (2020) combine genetic algorithms and the NARX neural network to predict Bitcoin
returns. The authors compare the hybrid model to the feed-forward model, finding that the latter is superior
for predicting Bitcoin geometric returns. Mallqui and Fernandes (2019) propose recurrent neural networks
and a tree classifier technique to predict bitcoin price direction. According to them, this proposed
methodology leads to the best performance. There is also a suggestion that, compared to GARCH models,
hybrid artificial neural network modeling performs better and improves forecasting (see Kristjanpoller &
Minutolo, 2018).
Focusing on intraday technical trading and using artificial neural networks for bitcoin return
prediction, Nakano et al. (2018) use a deep-learning approach. They conclude that their method significantly
improves on the efficiency of a buy-and-hold investment approach. Using technical analysis at high
frequencies, Alonso-Monsalve et al. (2020) investigate the feasibility of neural networks with a convolutional
component as an alternative to conventional multilayer perceptrons for trend classification of cryptocurrency
exchange rates. Based on 18 technical indicators, their results indicate that the convolutional LSTM neural
network performs best, with good results especially for the Bitcoin, Ether and Litecoin cryptocurrencies.
Table 1 presents a summary of recent representative literature on forecasting cryptocurrency prices.

(Insert Table 1)

An overview of previous research on Bitcoin forecasting suggests various contrasting approaches
that suffer from a lack of explanatory power. Therefore, we are motivated to propose and demonstrate an
explainable artificial intelligence (XAI) framework, including a new feature selection method integrated with
the game-theory-based SHapley Additive exPlanations approach and an explainable framework for forecasting
cryptocurrency prices. Further, this new approach can be extended to other uses, as it offers improved
forecasting performance as well as improved model generalizability and interpretability. We therefore expect
this novel modeling to have applications in future research beyond cryptocurrency price forecasting.

3. Data and variables

We use daily time series covering the period from August 2016 to the end of June 2022. Our main
variable of interest is the Bitcoin price series. We use the BTC price index (BTC) from www.investing.com.
Based on previous literature, 15 predictor variables are identified for inclusion in this investigation of the
effectiveness of the forecasting models, namely Litecoin (LTC), Ethereum (ETH), the S&P500 (SP500), Gold,
Brent crude oil future (Oil), Volatility (VIX), the U.S. dollar index (USDI), the 5-year forward inflation
expectation rate (YIFR), inflation (INF), recession (REC), the geopolitical risk index (GPR), Twitter-based
economic uncertainty (TEU), the news-based sentiment index (NSI), and the infectious disease equity market
volatility index (INFECTION). Table 2 provides definitions and sources of the variables included in our data
set, while Fig. 1 illustrates the correlation matrix and Table 3 reports the descriptive statistics. Moreover,
the BDS test of Broock et al. (1996) is applied to three different forms of the BTC price: level data, returns,
and log-returns (test statistics: 109.65, 12.89 and 5.38, respectively, with all p-values under $10^{-5}$). This
implies that a more sophisticated nonlinear model may be better suited to the data. In other words, a linear
model may not accurately capture the underlying relationships between the variables, and a nonlinear model
may provide a better fit.

(Insert Table 2)

(Insert Table 3)

(Insert Figure 1)

Data are divided into training and out-of-sample subsets. Among the several available methods for
splitting the sample, we use about 80% of the dataset to train the model, with the remaining 20% used for
validation. This approach is similar to previous research (Parvini et al., 2022). To evaluate performance, we
use the out-of-sample R2 ($R^2_{OOS}$). This indicator is defined as follows:

$$R^2_{OOS} = 1 - \frac{\sum_{t=1}^{n} (z_t - \hat{z}_t)^2}{\sum_{t=1}^{n} (z_t - \bar{z}_t)^2} \qquad (1)$$

where $z_t$ is the true observation, $\hat{z}_t$ the predicted value at time t, $\bar{z}_t$ the historical mean of the bitcoin price, and
n the total number of observations in the out-of-sample dataset.
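
As an illustration of Equation (1), the following minimal Python sketch computes the out-of-sample R2 on a toy series. It assumes a chronological 80/20 split and takes the benchmark $\bar{z}_t$ to be the training-sample mean held fixed over the test period. Both choices, the function names `chronological_split` and `r2_oos`, and all numbers are illustrative assumptions rather than the authors' implementation.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a series into an initial training part and a final test part."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def r2_oos(actual, predicted, benchmark):
    """Out-of-sample R^2: 1 - SSE(model) / SSE(benchmark), as in Eq. (1)."""
    if not (len(actual) == len(predicted) == len(benchmark)):
        raise ValueError("all inputs must have the same length")
    sse_model = sum((z - z_hat) ** 2 for z, z_hat in zip(actual, predicted))
    sse_bench = sum((z - z_bar) ** 2 for z, z_bar in zip(actual, benchmark))
    if sse_bench == 0:
        raise ValueError("benchmark forecast has zero squared error")
    return 1.0 - sse_model / sse_bench


# Toy data (hypothetical, not taken from the paper).
prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
train, test = chronological_split(prices)

# Assumption: the historical mean is the training mean, kept constant.
hist_mean = sum(train) / len(train)
benchmark = [hist_mean] * len(test)

forecast = [17.5, 19.5]  # hypothetical model forecasts for the test period
score = r2_oos(test, forecast, benchmark)
print(f"R2_OOS = {score:.4f}")
```

A positive value indicates that the model's squared forecast error is smaller than that of the historical-mean benchmark; a value of zero means the two are equally accurate.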

4. Methodology

4.1 Feature selection methodology

Recently, feature selection has emerged as a difficult challenge in a variety of machine-learning
areas, including regression problems (Jiménez-Cordero et al., 2021). The proposed framework uses
the FS-SHAP algorithm to select the most important variables for a given model, thereby improving
prediction performance. The methodology can be summarized in seven steps, including training the Extra
Trees model, computing the Shapley value for all features, and selecting the k highest-ranking features. Figure
2 provides a visual summary of the methodology and its key components. The following section outlines the
proposed approach for selecting the most relevant features during the construction of machine learning
models.

4.1.1. Shapley values for feature selection

Initial features may be noisy and redundant in certain cases, negatively impacting the model
training stage. As a result, efficient modeling requires the use of a strong feature extraction
approach. This paper uses SHapley Additive exPlanations (SHAP), proposed by Lundberg and Lee (2017).
The SHAP values are useful for illustrating how each attribute contributes to the model’s final prediction.
SHAP values are a relatively new metric used in machine learning to explain the predictions of any
decision-tree-based model. The SHAP values are determined by defining the output of a tree conditional on
a subset of features S, $h_x(S) = E[h(x) \mid x_S]$, and are calculated as follows:

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(K - |S| - 1)!}{K!} \left[ h_x(S \cup \{i\}) - h_x(S) \right], \qquad (5)$$

where N is the set of all input features and K is the number of input features.
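
To make Equation (5) concrete, the sketch below enumerates every subset S and accumulates the weighted marginal contributions exactly. It is a brute-force illustration that is only practical for a handful of features, not the tree-specific algorithm used in practice. The toy linear model, the background means, and the rule of replacing absent features by their background mean (which equals the conditional expectation only under the assumption of a linear model with independent features) are all assumptions introduced here.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values: weighted sum over all subsets S of N minus {i}."""
    K = n_features
    phi = [0.0] * K
    for i in range(K):
        others = [j for j in range(K) if j != i]
        for size in range(K):
            # Weight |S|! (K - |S| - 1)! / K! from Eq. (5).
            weight = factorial(size) * factorial(K - size - 1) / factorial(K)
            for subset in combinations(others, size):
                S = frozenset(subset)
                phi[i] += weight * (value_fn(S | {i}) - value_fn(S))
    return phi


# Hypothetical toy model h(x) = 2*x0 - 1*x1 + 0.5*x2.
weights = [2.0, -1.0, 0.5]
background_mean = [1.0, 0.0, 4.0]  # assumed feature means
x = [3.0, 2.0, 0.0]                # instance to explain


def model(v):
    return sum(w * vi for w, vi in zip(weights, v))


def h_x(S):
    """Approximate E[h(x) | x_S]: features outside S are set to their mean."""
    mixed = [x[j] if j in S else background_mean[j] for j in range(len(x))]
    return model(mixed)


phi = shapley_values(h_x, len(x))
print(phi)
```

For this linear example each attribution reduces to the coefficient times the feature's deviation from its mean, and the attributions add up to the difference between the prediction for x and the prediction at the background means.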


In this paper, a SHAP value technique based on feature importance selection (FS-SHAP) is proposed
to predict bitcoin prices. It is necessary to remove redundant and unneeded variables while maintaining the
accuracy of different machine-learning models (Ben Jabeur et al., 2022). According to García et al. (2016),
feature selection results in increased interpretability, simpler modeling, shortened learning time, and
improved generalization. To accomplish this goal, the FS-SHAP method is used to reduce
dimensionality. This approach not only improves forecasting performance but also improves model
generalization and interpretability. The larger the SHAP value, the more important the corresponding
variable. Input features are therefore ranked by score from highest to lowest. We selected the top-ranked
factors whose score was higher than k for the feature sets, as illustrated in Table 4.
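
The ranking and selection step described above can be sketched as follows. The sketch assumes that a matrix of SHAP values (rows are observations, columns are features) is already available, and it scores each feature by its mean absolute SHAP value, a common global-importance summary that is assumed here rather than confirmed by the text. The function names, the threshold, and the numbers are hypothetical.

```python
def rank_features_by_shap(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value, highest first."""
    n_rows = len(shap_matrix)
    scores = {}
    for col, name in enumerate(feature_names):
        scores[name] = sum(abs(row[col]) for row in shap_matrix) / n_rows
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_features(ranking, threshold=None, top_k=None):
    """Keep features scoring above a threshold and/or the top_k best ones."""
    if threshold is not None:
        ranking = [(name, s) for name, s in ranking if s > threshold]
    if top_k is not None:
        ranking = ranking[:top_k]
    return [name for name, _ in ranking]


# Hypothetical SHAP values for three of the paper's predictors.
names = ["ETH", "Gold", "VIX"]
shap_matrix = [
    [0.5, -0.1, 0.02],
    [-0.7, 0.2, -0.01],
    [0.6, -0.3, 0.03],
]

ranking = rank_features_by_shap(shap_matrix, names)
selected = select_features(ranking, threshold=0.1)
print(ranking)
print(selected)
```

Either selection rule (a score threshold or a fixed number of top features) yields a reduced feature set that can then be passed to the forecasting models.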

(Insert Table 4)

(Insert Figure 2)

4.1.2. Granger causality

Granger causality, based on stochastic linear regression modeling, is often used to determine whether
one economic variable may aid in the forecasting of another economic variable (Granger, 1969). It uses the
Fisher test to see whether lagged information on a variable Y tells us anything important about a variable X
when the lagged X is also present. If $X_t$ and $Y_t$ are two stationary time series, a simple causal model with
autoregressive lag length p, estimated by ordinary least squares, may be used to assess the causal relationship
between X and Y as follows:

$$X_t = C_t + \sum_{i=1}^{p} \beta_i X_{t-i} + \sum_{i=1}^{p} \alpha_i Y_{t-i} + \mu_t \qquad (2)$$

The null hypothesis $H_0$ is that Y does not cause X (i.e., $\alpha_1 = \alpha_2 = \dots = \alpha_p = 0$). Granger causality
is a popular method for analyzing time series data in many fields, including economics and finance. Even
though this framework is very popular, it has been the subject of ongoing debate about whether or not it can
be used to find causal relationships between time series (Shojaie & Fox, 2022).
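
The test based on Equation (2) can be sketched with two OLS fits: a restricted regression on lags of X only, and an unrestricted regression that adds lags of Y. The sketch below computes the resulting F statistic with NumPy on simulated data. It assumes a constant intercept, a lag length of p = 1, and a hypothetical data-generating process; converting the statistic into a p-value would additionally need the F distribution (for example `scipy.stats.f.sf`), which is omitted here.

```python
import numpy as np


def _rss(design, target):
    """Residual sum of squares of an OLS fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f_stat(x, y, p=1):
    """F statistic for H0: lags of y do not help predict x (Eq. 2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x) - p                      # usable observations
    target = x[p:]
    const = np.ones((n, 1))
    x_lags = np.column_stack([x[p - i:len(x) - i] for i in range(1, p + 1)])
    y_lags = np.column_stack([y[p - i:len(y) - i] for i in range(1, p + 1)])
    restricted = np.hstack([const, x_lags])            # alpha_i = 0
    unrestricted = np.hstack([const, x_lags, y_lags])  # alpha_i free
    rss_r = _rss(restricted, target)
    rss_u = _rss(unrestricted, target)
    dof = n - unrestricted.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / dof)


# Simulated example (assumption): x depends on the previous value of y.
rng = np.random.default_rng(0)
y = rng.normal(size=500)
x = np.empty(500)
x[0] = 0.0
x[1:] = 0.8 * y[:-1] + 0.1 * rng.normal(size=499)

f_y_to_x = granger_f_stat(x, y, p=1)  # should be large
f_x_to_y = granger_f_stat(y, x, p=1)  # should be small
print(f_y_to_x, f_x_to_y)
```

A large statistic in one direction only is the pattern expected when Y helps forecast X but not the reverse.
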
4.1.3. Variational mode decomposition causality

Variational mode decomposition (VMD) is used to decompose non-linear and non-stationary signals
into orthogonal sub-signals, known as intrinsic mode functions (IMFs) and trends, which reflect different
time scales. Variational mode decomposition is a new adaptive technique for multi-scale time series
decomposition (Saâdaoui et al., 2022b); it works on the premise of adaptively breaking down an input time
series $X_t$ into a number of modes $u_t^{(m)}$. According to Wang et al. (2015), VMD is expressed as a constrained
variational problem as follows:

$$X_t = \sum_{m=1}^{M} u_t^{(m)} \qquad (3)$$

where M is the number of modes and $u_t^{(m)}$ is the mth intrinsic mode function (IMF).
Thus, VMD-based cross-correlation provides a multi-scale approach for examining in depth
the scale-by-scale lead-lag connection between two signals. It is feasible to extend the VMD principle to the
study of scale-by-scale causality. At each VMD scale m (for each IMF), the test has two competing
hypotheses:

$$\begin{cases} H_0: u_{t,x}^{(m)} \nRightarrow u_{t,y}^{(m)} \\ H_a: u_{t,x}^{(m)} \Rightarrow u_{t,y}^{(m)} \end{cases} \qquad (4)$$

The null hypothesis $H_0$, indicated by the crossed-out arrow, states that component $u_{t,x}^{(m)}$ does not cause
component $u_{t,y}^{(m)}$.

4.2. Machine learning models

4.2.1. Linear regression

Linear regression (LR) is a standard method in the field of machine learning. Ordinary least
squares (OLS) is often used to estimate the intercept and slope regression parameters. The model may be
summarized as follows:

$$Y = \beta_0 + \sum_{i=1}^{p} \beta_i X_i + \varepsilon, \qquad (6)$$

where Y represents the response variable, $X_i$ represents the ith predictor, $\beta_i$ represents the parameter
estimated through OLS regression, and $\varepsilon$ is the error term. Then, by assessing multicollinearity,
significant independent variables are chosen (Sarstedt & Mooi, 2014). According to Nolan and Ojeda-Revah
(2013), OLS provides a poor fit and leads to faulty predictions in the absence of normality of the error terms.
Alternatively, machine learning models may be used when errors are not normally distributed.
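
A minimal sketch of estimating the parameters of Equation (6) by OLS is given below. It uses NumPy's least-squares solver on a small noise-free toy data set; the helper names `fit_ols` and `predict_ols` and the data are assumptions made for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Estimate [beta_0, beta_1, ..., beta_p] by ordinary least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict_ols(beta, X):
    """Fitted values beta_0 + sum_i beta_i * X_i."""
    X = np.asarray(X, dtype=float)
    return beta[0] + X @ beta[1:]


# Hypothetical data generated from Y = 1 + 2*X1 - 0.5*X2 without noise.
X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 3.0]])
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]

beta = fit_ols(X, y)
print(beta)
```

Because the toy data contain no noise, the estimated coefficients recover the intercept and slopes used to generate them.
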
4.2.2. Support vector machine

Support vector regression (SVR) is a regression technique introduced in 1998 (Smola and Schölkopf,
2004). It has a powerful generalization capacity and can tackle practical issues such as small sample size,
high dimensionality, strong nonlinearity, and local extrema (Huang et al., 2022). The nonlinear support
vector regression algorithm (Vapnik et al., 1996) attempts to handle nonlinear regression problems of the
following form:

$$\hat{Y} = \omega^{H} \varphi(x) + c \qquad (7)$$

where x is the vector of input variables, $\varphi(x)$ is a function that maps the input vector x into a
higher-dimensional feature space, $\omega$ is the weight vector, c is the intercept, and H denotes the transpose
operator. The fundamental benefit of SVR is that it uses the structural risk minimization principle to reduce
an upper limit on the generalization error, rather than the empirical risk minimization principle to reduce the
training error. As a result, it is designed to reach the global optimum rather than a local one. In addition,
even with a small data sample, SVR may provide high generalization results.
4.2.3. Random forest

Random forest (RF) is a set of tree predictors in which each tree is generated by sampling an
independent random vector; it can be used for both regression and classification. Here, random forest is used
as a regression algorithm. According to Breiman (2001), in regression, numerical values are obtained from
the tree predictor, in contrast to the class labels obtained from the random forest classifier. According to
Jabeur et al. (2021), the function can be estimated as follows:

$$\hat{Y} = \frac{1}{T} \sum_{k=1}^{T} g_k(X) \qquad (8)$$

where $g_k(X)$ is the prediction of the kth random tree learner, T is the number of trees, and X is the vector of
input variables. The dataset is divided into homogeneous subsets at random using the bootstrap sampling
algorithm. Each tree is grown and trained using a random subset of the data, and its validity and accuracy
are estimated using the remaining samples.
4.2.4. XGBoost regression algorithm

XGBoost, developed by Chen and Guestrin (2016) as an ensemble machine learning technique, has
been used in various research fields. XGBoost is a sequential ensemble method in which decision trees are
built one after another. Every sample in the dataset is assigned a weight, which determines how much
emphasis it receives when the next decision tree is built. The following equation is used to combine the
outputs of the trees into the final prediction:

$$\hat{Y}_i = \sum_{t=1}^{T} f_t(x_i) \qquad (9)$$

where T denotes the number of trees, and $f_t(x_i)$ is the output of the tth regression tree for the input $x_i$.
Appropriate hyperparameters must be determined when developing an XGBoost model to improve
prediction performance. In particular, the hyperparameters crucial to the initialization of the model should
be validated against a range of values (Jabeur et al., 2021). Furthermore, this approach offers efficient and
valuable answers to previously unresolved optimization problems.
4.2.5. Extra trees regression

As an extension of the RF method, the extra trees (ET, or extremely randomized trees) algorithm
proposed by Geurts et al. (2006) is a machine learning approach that is less prone to overfitting a dataset.
This approach is very similar to the RF algorithm in that it builds several decision tree models. There are two
significant distinctions. The first is that the split points of the descriptors are chosen fully at random at each
node. The second is that each tree is constructed using the whole dataset rather than a bootstrap sample. In
the regression case, the function is computed as follows:

$$\hat{Y} = \sum_{j=1}^{T} \left( y_j - \bar{y}(t) \right)^{\top} \left( y_j - \bar{y}(t) \right) \qquad (10)$$

where $\bar{y}(t)$ is the sample mean of the output vector at node t, and $y_j$ is the median of the output vector at node
t. In contrast to RF, each tree is built using the whole training data set (Schmid et al., 2022). This additional
degree of randomization is intended to enhance the decorrelation process, resulting in less variance and
possibly even greater predictive accuracy in certain circumstances.
4.2.6. Deep neural networks

Recently there has been increasing scholarly attention to the effective implementation of deep neural
networks (DNN) in a number of different contexts (Krauss et al., 2017; Jabeur et al., 2022). Deep neural
networks can readily handle models with very complicated and nonlinear predictor-outcome interactions,
and they outperform strong models from classical machine learning in a variety of circumstances.
The model is trained using a stochastic gradient descent training approach and a feedforward neural network
architecture. Deep neural networks may be formulated by combining numerous single-layer networks to
construct a DNN with k layers:

$$g_{DNN}(X) = \underbrace{g_{1NN}(g_{1NN}(\dots g_{1NN}(X)))}_{k} \qquad (11)$$

where $g_{1NN}$ is a one-layer perceptron with an activation function such as a sigmoid, a hyperbolic
tangent, or a rectified linear unit. It is important to note that the dimensions of different layers are not always
the same. Deep neural networks may include anything from two hidden layers to possibly hundreds of layers,
depending on the application. Combined with the dimension of each layer, this may quickly lead to networks
with tens of millions of degrees of freedom.
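
The composition in Equation (11) can be sketched as a chain of single-layer functions applied one after another. The sketch below shows only the forward pass; the layer sizes, the random weights, and the choice of a rectified linear unit in every layer are assumptions for illustration, and no training is performed.

```python
import numpy as np


def relu(z):
    """Rectified linear unit activation."""
    return np.maximum(z, 0.0)


def dense_layer(weights, bias, activation):
    """Build one single-layer network g_1NN(x) = activation(W x + b)."""
    def layer(x):
        return activation(weights @ x + bias)
    return layer


def compose(layers):
    """Chain k single-layer networks into g_DNN, as in Eq. (11)."""
    def network(x):
        for layer in layers:
            x = layer(x)
        return x
    return network


# Hypothetical architecture: 3 inputs -> 4 units -> 2 units -> 1 output.
rng = np.random.default_rng(0)
sizes = [3, 4, 2, 1]
layers = [
    dense_layer(rng.normal(size=(n_out, n_in)), np.zeros(n_out), relu)
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]
network = compose(layers)

output = network(np.array([0.5, -1.0, 2.0]))
print(output.shape)
```

The weight matrices have different shapes from layer to layer, which reflects the remark that layer dimensions need not coincide.
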
4.2.7. Long–short term memory

An enhanced deep learning technique built on a recurrent neural network is called Long–Short Term
Memory (LSTM). Recurrent neural networks are excellent at mining time series and semantic information
from input that has sequential properties. The input layer at each instant also influences the hidden layer at
that instant, as does the hidden layer at the instant before. The output values at time t are calculated as follows:
𝑌̂ = 𝛿(𝑊𝑓 . [ℎ𝑡−1 , 𝑥𝑡 ) + 𝑐𝑓 ) (12)

where Wf is the weight matrix, cf is the bias, and the operator 𝛿() is defined as a sigmoid layer that pushes the
values between 0 and 1. The bottom layer is the hidden layer represented by ht and is the input of LSTM at
day t. The training dataset is utilized to continually update the parameters in the network throughout the
training process, with the model being saved after training. The rectified linear unit function performs
calculations more quickly than the sigmoid and tanh functions while simultaneously solving the gradient
vanishing issue in the positive interval. This function serves as the activation function for all layers.
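
A minimal numerical sketch of Equation 12 is given below. It evaluates a single sigmoid layer on the concatenation of the previous hidden state and the current input. The dimensions (50 hidden units, 11 input features) follow Appendix A, while the random weights and the function name `gate` are assumptions made only for illustration. A full LSTM cell contains several such layers, which are not shown here.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; maps any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def gate(h_prev, x_t, W_f, c_f):
    """Equation 12: sigmoid(W_f . [h_{t-1}, x_t] + c_f)."""
    concat = np.concatenate([h_prev, x_t])
    return sigmoid(W_f @ concat + c_f)

rng = np.random.default_rng(1)
hidden, features = 50, 11  # sizes taken from Appendix A
W_f = rng.normal(scale=0.1, size=(hidden, hidden + features))  # illustrative weights
c_f = np.zeros(hidden)
out = gate(np.zeros(hidden), rng.normal(size=features), W_f, c_f)
```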
4.3. Explainable artificial intelligence framework

In the past, researchers have used machine learning and ensemble strategies to improve the predictive
performance of forecasting bitcoin prices. However, investors are still cautious about fully adopting these
approaches (Zhang et al., 2022). Because of the complexity of the models, machine learning techniques do
not expose the fundamental processes that they use, and it may be challenging to explain and confirm the
models’ predictions. In complicated models, this issue is sometimes referred to as the ‘black box’ problem
(Adadi & Berrada, 2018). To fill this gap, researchers have developed explainable artificial intelligence or
XAI. Because XAI aims to give users a clearer understanding of opaque AI systems so that they can better
employ such tools in their work, it is congruent with the notion of ‘human-in-the-loop’ (Zhang et al.,
2022).
The novel approach of XAI consists of a complete process ensemble technique as well as an
explainable framework. It was designed to satisfy the interpretative needs of traders and investors while
maintaining a high level of prediction accuracy. Under the banner of XAI, there are many views and
reasonings in the literature (Adadi & Berrada, 2018): interpretations to justify, interpretations to control,
interpretations to discover, and explanations for better classification or regression tasks. Our research intends
to provide a feature selection methodology based on the TreeExplainer for financial forecasting that improves
the predictive performance of ML modeling.
TreeExplainer is a novel local feature attribution approach for trees, developed by Lundberg et al.
(2020), that precisely calculates the traditional Shapley values from game theory. The Shapley values are
then calculated according to Equation 1 and employed as feature attributes.
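
To make the attribution idea concrete, the sketch below computes exact Shapley values by brute-force enumeration of feature coalitions, using the standard game-theoretic weighting. This is not TreeExplainer, which obtains the same quantities efficiently for tree models. The value function `toy_value` and its numbers are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (
                factorial(size) * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

def toy_value(subset):
    """Hypothetical model payoff: additive effects plus a 0-1 interaction."""
    effects = {0: 2.0, 1: 3.0, 2: -1.0}
    total = sum(effects[j] for j in subset)
    if 0 in subset and 1 in subset:
        total += 4.0
    return total

phi = shapley_values(toy_value, 3)
```

In this toy case the interaction term is split equally between features 0 and 1, and the attributions sum to the difference between the full-coalition payoff and the empty-coalition payoff. The enumeration is exponential in the number of features, so it is practical only for very small examples.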
To deepen our analysis, we use Local Interpretable Model-Agnostic Explanations (LIME). LIME fits a
local surrogate linear regression model, which is more intuitive and understandable than the underlying
artificial intelligence method (Ribeiro et al., 2016). After the model has been trained, the prediction for an
unseen sample may be explained using LIME. In essence, LIME assumes that, although the overall
relationships between inputs and outputs may be non-linear, linear surrogate models may be built to track
how an individual prediction changes as the input data are perturbed (Stevenson et al., 2021). Equation 5
was used to calculate the local linear explanation values as follows:
$h(z') = \phi_0 + \sum_{i=1}^{K} \phi_i z'_i$ (13)

where $z' \in [0,1]^K$, $K$ is the number of simplified input variables, and $\phi_i \in \mathbb{R}$. LIME techniques are useful
because they can be applied to many different types of machine learning models. The XAI framework used to
predict the BTC price is presented in Fig. 2.
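
The local surrogate idea can be sketched as a weighted linear regression fitted to perturbed copies of one instance. The example below is written in the spirit of LIME but is a simplification: it perturbs the original features directly rather than the simplified inputs $z'$, and the kernel width, noise scale, and `toy_model` are assumptions chosen for illustration.

```python
import numpy as np

def local_linear_explanation(predict_fn, x, n_samples=2000, scale=0.1,
                             kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around the instance x."""
    rng = np.random.default_rng(seed)
    # Perturb the instance with small Gaussian noise.
    Z = x + rng.normal(scale=scale, size=(n_samples, x.size))
    y = predict_fn(Z)
    # Samples closer to x receive larger weights.
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # Weighted least squares via row scaling by sqrt(w).
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]  # intercept (prediction at x), local slopes

def toy_model(Z):
    """Hypothetical black box; linear here so the surrogate can be checked."""
    return 3.0 + 2.0 * Z[:, 0] - 1.0 * Z[:, 1]

x0 = np.array([0.5, -1.0])
phi0, phi = local_linear_explanation(toy_model, x0)
```

Because the toy model is itself linear, the surrogate recovers its slopes exactly. For a nonlinear model the coefficients describe the behaviour only in the neighbourhood of the chosen instance.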

5. Results and discussion

5.1. Causality test results

Cross-correlation statistics between bitcoin prices and each of the exogenous variables, all calculated
in differences, are plotted in Figure 3. This preliminary visual analysis is important because it helps to
determine the most important variables that can be added to a forecasting model of the bitcoin price. In
addition to the cross-correlation functions, we also report the autocorrelation function of Bitcoin returns. This
is also important for the preliminary identification of the autoregression order for the bitcoin variable. Most
notable in these results is that two variables appear to be able to contribute to the explanation of Bitcoin
returns. These are the LTC and ETH variables, whose cross-correlation functions show a lagging
relationship with a certain regularity up to a lag of 40 days. For the rest of the variables, the cross-correlations
are clearly irregular, which may be a sign that they are not significant for the Bitcoin
forecast. However, this preliminary analysis is insufficient to establish the explanatory power of each
variable. It is therefore meaningful to move on to a more advanced methodology to better understand
the most relevant explanatory variables.
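
The cross-correlation screening described above can be sketched as follows. The code differences two level series and computes the sample correlation between one series and the other shifted by up to 40 lags. The simulated data, in which one series leads the other by three periods, are purely synthetic and stand in for the actual variables.

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """Sample correlation between x at time t and y at time t + k, k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    return np.array([np.sum(x[: n - k] * y[k:]) / n for k in range(max_lag + 1)])

# Synthetic example: the differences of y follow those of x with a 3-period delay.
rng = np.random.default_rng(0)
dx = rng.normal(size=500)
dy = np.roll(dx, 3) + 0.1 * rng.normal(size=500)
x_level = np.cumsum(dx)
y_level = np.cumsum(dy)

# Work in differences, as in the text, and inspect lags up to 40.
ccf = cross_correlation(np.diff(x_level), np.diff(y_level), max_lag=40)
best_lag = int(np.argmax(np.abs(ccf)))
```

A clear peak at a positive lag, as in this synthetic case, is the kind of regular lagging pattern that motivates keeping a variable as a candidate predictor.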

(Insert Figure 2)

The second step of our preliminary analysis is inferential, with multi-scale causality tests being
applied to the same time series. In other words, we perform a causality test between bitcoin returns and each
of the 15 different variables. The choice of multi-scale causality is due to the data showing some nonlinearity
with local stationarity. Their distributions are also nonstandard. The multi-resolution analysis provides a
powerful tool for decomposing the series into sub-signals with more regular properties than their primitives.

(Insert Table 5)

In these tests, we employ the third-order VMD approach for the decomposition of the time series into intrinsic
mode functions (IMFs). This technique is preferred to other methodologies such as wavelets, mainly because
of its efficiency in dealing with nonlinear time series. This characteristic has been emphasized in several
previous works, such as Wardana (2016) and Krishnan and Soman (2022). The results reported in Table 5
indicate that only the two variables already identified in the cross-correlations, that is, LTC and ETH, are
found to cause bitcoin returns on all three scales simultaneously.
However, another finding that should be highlighted here is that three other variables are evidenced
to have partial explanatory power. These variables are INF, on Scale 1 (IMF1), as well as the GPR and TEU
variables on Scale 3 (IMF3). This suggests that the multi-scale VMD-based causality test makes it possible
to detect what could not be observed by the cross-correlations, or by the classic single-scale Granger
approach.
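
For readers who want to see the mechanics of a single-scale test, the sketch below implements a basic bivariate Granger F-statistic with ordinary least squares. In a multi-scale setting, one would apply such a test to each pair of modes after decomposition. That per-scale application is an assumption about the workflow, and the VMD step itself is not shown. The simulated series and the lag order p = 2 are illustrative only.

```python
import numpy as np

def lagged_matrix(series, p):
    """Columns hold series at t-1, ..., t-p for t = p, ..., n-1."""
    n = len(series)
    return np.column_stack([series[p - k: n - k] for k in range(1, p + 1)])

def granger_f_stat(y, x, p):
    """F-statistic for the null that p lags of x do not help predict y."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[p:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged_matrix(y, p)])
    unrestricted = np.hstack([restricted, lagged_matrix(x, p)])

    def rss(A):
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / dof)

# Synthetic example in which x drives y with a one-period delay.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()

f_x_to_y = granger_f_stat(y, x, p=2)
f_y_to_x = granger_f_stat(x, y, p=2)
```

A large statistic in one direction and a small one in the other indicates one-way predictive causality. In practice the statistic would be compared with the critical value of the corresponding F distribution.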

(Insert Figure 3 here)

(Insert Figure 4 here)

Recent studies confirm the strength of feature selection based on SHAP values (Sigrist &
Leuenberger, 2022; Jabeur et al., 2021b). SHAP employs the game-theoretic concept proposed by Shapley
(1953) to calculate the importance of individual independent variables. Figure 4 depicts the SHAP summary
plot for all variables, which ranks factors according to their relevance in influencing the BTC price. Each dot
in the plot represents a single data point in the dataset. A higher SHAP
value indicates that the model predicts higher BTC price values and vice versa. We can observe that the top
four features for XGBoost are REC, GPR, YIFR and SP500. There are several important gaps in the literature
which lead to these findings. Variables such as inflation, geopolitical risk, and recession are elements that
define the economic cycle and have significant influences on the pricing of assets (Thorbecke, 1997).
According to Basher et al. (2022), business cycle factors should be significant forecasters if bitcoin prices
are influenced by economic circumstances.
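
A common way to turn SHAP values into a feature ranking is to average their absolute values over all observations, which is also the ordering used in a SHAP summary plot. The sketch below applies that rule to a small made-up matrix of SHAP values. The feature labels, the numbers, and the top-k cut-off are hypothetical, and the exact selection rule of FS-SHAP may differ.

```python
import numpy as np

def rank_features_by_shap(shap_values, feature_names, top_k):
    """Rank features by mean absolute SHAP value and keep the top_k."""
    importance = np.abs(shap_values).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order[:top_k]]

# Made-up SHAP matrix: rows are observations, columns are features.
shap_matrix = np.array([
    [3.0, -2.0, 1.0, 0.5],
    [-3.0, 2.0, -1.0, 0.5],
])
names = ["A", "B", "C", "D"]  # hypothetical feature labels
selected = rank_features_by_shap(shap_matrix, names, top_k=2)
```

Taking absolute values matters here: feature A pushes predictions up for one observation and down for the other, so its signed mean would be zero even though it is the most influential feature.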

5.2. Performance comparison

In this section, we compare machine and deep learning techniques such as linear regression (LR),
random forest (RF), eXtreme Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM),
Support Vector Machine (SVM), and Deep Learning Neural Networks (DL). Appendix A presents LSTM
model details. Based on the testing dataset, the performance of the machine learning models is presented and
compared in terms of the $R^2_{OOS}$. In this study, about 80% of the dataset was utilized to train the model, with
the remaining 20% for the test dataset. The predictive accuracy of all models was compared to the predictive
accuracy of the LSTM model, with 14-day lags.
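
The evaluation setup can be sketched as follows: build 14-day lagged inputs, split them chronologically 80/20, and score forecasts with an out-of-sample R-squared. The benchmark-relative form of $R^2_{OOS}$ used here, with the training-sample mean as the benchmark, is an assumption, since the exact definition is not restated in this section. The input series is a placeholder.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Each row holds n_lags consecutive values; the target is the next value."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    X = np.column_stack([series[i: n - n_lags + i] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y

def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling so the test set lies strictly after the training set."""
    cut = int(len(y) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]

def r2_oos(y_true, y_pred, benchmark):
    """One minus the ratio of model to benchmark squared errors (assumed definition)."""
    sse_model = np.sum((y_true - y_pred) ** 2)
    sse_bench = np.sum((y_true - benchmark) ** 2)
    return 1.0 - sse_model / sse_bench

series = np.arange(100.0)  # placeholder series, not the study's data
X, y = make_lagged(series, n_lags=14)
X_tr, X_te, y_tr, y_te = chronological_split(X, y)

perfect = r2_oos(y_te, y_te, y_tr.mean())
naive = r2_oos(y_te, np.full_like(y_te, y_tr.mean()), y_tr.mean())
```

Under this definition a perfect forecast scores 1, a forecast equal to the benchmark scores 0, and a model that does worse than the benchmark scores below zero, which is how a negative $R^2_{OOS}$ can arise.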

(Insert Table 6 here)

Table 6 presents the results of univariate regression for all predictors. Among the proposed
predictors, LTC yields the most accurate outcomes with the highest value of $R^2_{OOS}$ (0.896) based on the
LSTM model. This finding is consistent with Elsayed et al. (2022), who posit a significant causal relationship
among cryptocurrencies. In addition, REC appears to produce the second most accurate outcomes, with the
second-highest value of $R^2_{OOS}$ (0.865). This result is in line with García (2013) and Dias et al. (2022), who document
that the capacity to forecast stock returns using the substance of the news is most prominent during recession
periods. Moreover, the VIX and GPR predictors provide good performance compared to the other predictors,
with $R^2_{OOS}$ of 0.525 for VIX and 0.431 for GPR, as shown in Table 6. This outcome is in line with
Basher and Sadorsky (2022), who highlight that the direction of bitcoin prices may be predicted using
macroeconomic factors.
According to Parvini et al. (2022), the disadvantage of univariate regression is that it does not take
advantage of the synergistic and cooperative effects of numerous inputs, which are vital when utilizing
machine learning algorithms. In this regard, the results of multivariate regression are presented in Table 7.
Overall, results indicate that LSTM based on the SHAP feature importance selection method ($R^2_{OOS}$
= 0.689) performs better than VMD-based causality and Granger causality. As can be seen in Table 7, LSTM
with FS-SHAP pushes $R^2_{OOS}$ significantly into positive territory, with a figure of 68.9%, averaged
over all forecasting algorithms. We observe an increase in the $R^2_{OOS}$ when applying FS-SHAP to all models.
Meanwhile, ET based on the FS-SHAP method generates the second-best value of $R^2_{OOS}$ compared to DL,
SVM, LR, RF and XGBoost. LSTM and ET expand the model to accommodate nonlinear predictive
relationships.
Gu et al. (2020) find that allowing for nonlinearities significantly improves predictions. Nonetheless,
the forecasting performance of OLS regression is the weakest among all forecasting algorithms under the
FS-SHAP method. When we incorporate our set of ten predictors in the OLS panel model, predictability
rapidly diminishes, as indicated by the $R^2_{OOS}$ falling deep into the negative territory of -73.7%. These findings
are not surprising, and are in line with the previous studies of Gu et al. (2020) and Leippold et al. (2022),
who document that when there is a large number of variables, OLS regression’s efficiency drops
dramatically, leading to very volatile out-of-sample predictions.

(Insert Table 7 here)

The benefits of forecasting using machine learning projections are substantial. In our testing, the
OLS model is statistically clearly rejected in favour of nonlinear machine learning techniques. More
sophisticated statistical techniques in machine learning may be able to overcome the potentially serious
limitations of these traditional approaches.
To deepen our analysis, we compared the performance of different models in terms of RMSE for
each feature selection technique. The LSTM model based on FS-SHAP method provided the best RMSE of
2.031. However, for the VMD, the ET model based on IMF3 provided the best RMSE of 72.111. Overall,
the findings show that the FS-SHAP feature selection strategy enhances forecasting accuracy and
model generalization. In terms of RMSE, the ET model based on IMF3 beats other models,
indicating that it is a promising model for predicting bitcoin values.
5.3. Model explainability

‘Explainability’ refers to consistency and interpretability, with explanations simple enough to
understand (Chakraborty et al., 2021b; Chakraborty et al., 2021a). In this study, we apply the SHAP
and LIME frameworks to the Extra Trees regression (ET) model to give straightforward,
human-readable explanations, while correctly representing the real-world bitcoin price forecasting
process. In fact, Chakraborty et al. (2021b) pointed out that deep learning neural networks
automatically learn the input features that subsequently undergo several layers of nonlinear
transformations, making them noninterpretable to end-users. From a theoretical point of view,
interpretable models such as extreme gradient boosting and extra trees regression provide certain
advantages over “black-box” models such as deep learning, Support Vector Machine and
long short-term memory. Moreover, based on the performance comparison, it can be concluded that the Extra
Trees model provides the lowest RMSE among all the models tested, indicating that it is a promising
model for predicting bitcoin values.

SHAP ‘local interpretability’ analysis based on FS-SHAP values is shown in Figure 5. This describes
in detail how the feature values and interactions affect the model predictions. Each dot represents a data
point, and the variation in height across features at any given feature value is a result of their interaction and
dependence in the model. In Figure 5, we illustrate the effect of the variations in REC, GPR, YIFR, SP500,
PACC and INF estimates on the models’ predictions to identify the critical inflection points, above or below
which bitcoin price improves.
Figure 5 highlights that REC interacts with GPR, and high REC and high GPR values drive up the
SHAP values, corresponding to higher bitcoin values. Our findings are in line with Caldara and Iacoviello
(2022), who highlight that the impacts of risk and uncertainty indicators on bitcoin returns are negative,
although bitcoin may nevertheless be used as a hedge against the effects of international crises.
To further examine the critical determinants and inflection points of the important factors that drive
bitcoin prices, we conduct a detailed ‘local interpretability’ analysis with LIME, depicted in Figure 6. These
graphs provide the predictions of 12 instances from the Extra trees model to enhance their clarity. These
reveal that the pertinent daily REC, GPR, YIFR, SP500, PACC, INF, TEU, ETH, Gold and INFECTION are
19.10, 62.49, 1.82, 2617.61, 32.64, 35.06, 93.06, 164.67, 1279.13, and 0.00, respectively. The VIX, USDI,
NSI, LTC and Oil can be ignored, since they have a very small effect on bitcoin prices, according to the
global interpretability analysis shown in Figure 9. Based on these interpretations, we can design ten different
features or conditions based on these inflection points:
If REC ≤ 19.10, GPR ≤ 62.49, YIFR ≤ 1.82, SP 500 ≤ 2617.61, PACC > 32.64, INF ≤ 35.06, TEU
≤ 93.06, ETH ≤ 164.67, Gold ≤ 1279.13, and INFECTION ≤ 0.000, the BTC price will be low.
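
The rule above can be written as a simple check on one daily observation. The thresholds are copied from the rule as stated. The dictionary layout, the function name `low_price_rule`, and the two example observations are hypothetical.

```python
# Thresholds from the LIME-based rule; "<=" means at or below, ">" means above.
THRESHOLDS = {
    "REC": ("<=", 19.10),
    "GPR": ("<=", 62.49),
    "YIFR": ("<=", 1.82),
    "SP500": ("<=", 2617.61),
    "PACC": (">", 32.64),
    "INF": ("<=", 35.06),
    "TEU": ("<=", 93.06),
    "ETH": ("<=", 164.67),
    "Gold": ("<=", 1279.13),
    "INFECTION": ("<=", 0.0),
}

def low_price_rule(observation):
    """Return True when every condition of the low-BTC-price rule holds."""
    for name, (op, threshold) in THRESHOLDS.items():
        value = observation[name]
        satisfied = value <= threshold if op == "<=" else value > threshold
        if not satisfied:
            return False
    return True

# Hypothetical observations used only to exercise the rule.
calm_day = {"REC": 10.0, "GPR": 50.0, "YIFR": 1.5, "SP500": 2500.0, "PACC": 40.0,
            "INF": 30.0, "TEU": 80.0, "ETH": 150.0, "Gold": 1200.0, "INFECTION": 0.0}
stressed_day = dict(calm_day, GPR=120.0)

calm_is_low = low_price_rule(calm_day)
stressed_is_low = low_price_rule(stressed_day)
```

Note that PACC is the only condition with a strict "greater than" comparison, so it is handled separately from the other nine.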

Regarding the effect of public attention to climate change, our result is in line with Fang and Peress
(2009) and Ouadghiri et al. (2021), who demonstrate that stocks with extensive media awareness have much
lower returns. This finding could be explained by the fact that investors with different motivations, including
traditional sustainable investors, neo-sustainable investors, and opportunistic self-interested investors, tend
to favor the stocks of sustainable firms when there is a great deal of public attention on environmental issues.

(Insert Figure 5)

(Insert Figure 6)

5.4. Further analysis

To gain a deeper understanding of the feature importance and contribution to the model's predictions,
we created SHAP force plots using the SHAP values generated by our SHAP-based ET model. Force plots
are a powerful tool for visualizing the contribution of individual features to the final prediction for a single
instance. To explore the relative importance of different features for predicting Bitcoin prices, we generated
force plots for select observations during the Russia-Ukraine war from our dataset. Figure 7 depicts SHAP
force plots for two randomly selected samples during the Russia-Ukraine war: (a) sample 2070, and (b)
sample 2161, from April and July 2022 respectively, based on the ET model. The red colour highlights the
features that increase the prediction, while the blue colour highlights the features that decrease the prediction.
For sample number 2070, REC and INFECTION generated a negative SHAP value, which negatively affected
the BTC price. However, in the other sample (number: 2161 in Fig. 7b), the recession created a SHAP value
that positively affected the BTC price value. Moreover, in both samples, ETH, INF, and GPR created SHAP
values that contributed positively to the BTC value. The resulting plots helped us to identify the key features
driving the model's predictions for each sample, providing a more comprehensive understanding of the
underlying patterns and relationships in the data. Overall, the use of force plots enabled us to conduct a more
detailed investigation and arrive at more robust conclusions regarding the relative significance of various
characteristics when it comes to estimating future BTC value.

(Insert Figure 7)

6. Conclusions

Predictions of financial series are inherently difficult. However, recent developments in analytics
and explainable artificial intelligence modeling suggest opportunities to overcome the difficulties associated
with forecasting. Considering this context, we propose and develop an explainable artificial intelligence
(XAI) framework, including a new feature selection method integrated with the game-theory-based SHapley
Additive exPlanations approach and an explainable frame for forecasting. We use the cryptocurrency market
to test this experimental methodology, and, in so doing, employ an improved feature selection method based
on SHAP value to predict bitcoin prices.
FS-SHAP integrates the advantages of Extra trees and feature importance for enhancing prediction.
Results demonstrate that FS-SHAP not only improves forecasting but also improves model generalization
and explainability. Considering our findings, we draw several conclusions. First, based on the FS-SHAP
value from the multivariate regression, we conclude that the highest $R^2_{OOS}$ is achieved by LSTM, followed by
Extra trees, among OLS regression and complex models.
The global explainability component of the SHAP framework reveals the relative importance among
a set of factors impacting BTC price across the sample period. We find these factors have an order of
importance of recession > geopolitical risk > 5-year forward inflation > S&P 500 > climate change > inflation
> Twitter economic uncertainty > Ethereum > Gold > Infectious disease equity market volatility index.
The local interpretability of the LIME (XAI) framework allows quantification of the critical
inflection points for each predictor that leads to transitions from low to high BTC prices.
From a practical standpoint, our research offers some insights for investors and policymakers. First,
our work demonstrates that the combined SHAP and LIME global and local explanations accurately describe
the real-world bitcoin price and explain the nonlinear ML models by offering rule-based explanations that
are humanly comprehensible (Chakraborty et al., 2023). It is important to use AI models that are naturally
interpretable along with XAI methods to make accurate predictions. This will help us understand how the AI
approach works and provide new interpretable knowledge in large datasets that would be impractical to find using
traditional statistical methods. To address these issues which result from models’ lack of transparency, we
employ XAI models with tree-based ensembles, which are more interpretable than deep learning models
(e.g., LSTM). Further, by shedding light on how economic downturn, geopolitical risk, inflation and climate
change affect bitcoin prices, our findings can help investors and traders to better identify these risks and
incorporate them into their investment decisions.
The FS-SHAP framework has shown promise in both theoretical and practical applications, but more
work is required to ensure that it can provide accurate and reliable data for a wider range of themes and
underlying factors. In the future, XAI methods might be used for other types of data (e.g., images and texts)
and other ML tasks, broadening the scope of the field (e.g., unsupervised learning and semi-supervised
learning). However, this work presents XAI to academics and practitioners to address the difficulty of
interpreting the results of AI applications in predicting financial time series. This paper equips academics
and industry experts with the information they need to develop more transparent AI applications in the
financial sector.
Based on our findings, future research could investigate the impact of additional variables on BTC
prices, such as social media sentiment, regulatory changes, and global economic conditions. Moreover,
future research could explore ways to enhance the interpretability and explainability of deep learning models
to provide a more comprehensive understanding of their behaviour.

Appendix A. Selection of hyperparameters and training range for LSTM model.
Layer Type    Output Shape    Number of Parameters    Activation
LSTM          (None, 50)      12,400                  Tanh, Sigmoid
Dense         (None, 1)       51                      Sigmoid
Training Hyperparameters:
- Loss function: Mean Squared Error (MSE)
- Optimizer: Adam
- Number of training epochs: 100 (early stopping with a patience of 5)
- Batch size: 14
- Input sequence length (lags): 14
- Number of input features: 11
- Number of repeats: 10
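
The parameter counts in the table can be reproduced from the layer sizes with the standard formulas for an LSTM layer and a dense layer. The short check below assumes a standard LSTM cell with four gates, each with input weights, recurrent weights, and a bias. The function names are illustrative.

```python
def lstm_param_count(units, n_features):
    """Four gates, each with input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (n_features + units) + units)

def dense_param_count(n_inputs, n_outputs):
    """One weight per input-output pair plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

lstm_params = lstm_param_count(units=50, n_features=11)
dense_params = dense_param_count(n_inputs=50, n_outputs=1)
```

With 50 units and 11 input features, these formulas give 12,400 and 51 parameters, matching the table.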

References

Adadi, A., Berrada, M., 2018. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence
(XAI). IEEE Access 6, 52138–52160. https://doi.org/10.1109/ACCESS.2018.2870052
Akyildirim, E., Cepni, O., Corbet, S., Uddin, G.S., 2021a. Forecasting mid-price movement of Bitcoin futures
using machine learning. Annals of Operations Research 1–32.
Akyildirim, E., Goncu, A., Sensoy, A., 2021b. Prediction of cryptocurrency returns using machine learning.
Annals of Operations Research 297, 3–36. https://doi.org/10.1007/s10479-020-03575-y
Alonso-Monsalve, S., Suárez-Cetrulo, A.L., Cervantes, A., Quintana, D., 2020. Convolution on neural
networks for high-frequency trend prediction of cryptocurrency exchange rates using technical
indicators. Expert Systems with Applications 149, 113250.
https://doi.org/10.1016/j.eswa.2020.113250
Al-Shboul, M., Assaf, A., Mokni, K., 2022. When bitcoin lost its position: Cryptocurrency uncertainty and
the dynamic spillover among cryptocurrencies before and during the COVID-19 pandemic.
International Review of Financial Analysis 83, 102309. https://doi.org/10.1016/j.irfa.2022.102309
Atsalakis, G.S., Atsalaki, I.G., Pasiouras, F., Zopounidis, C., 2019. Bitcoin price forecasting with neuro-
fuzzy techniques. European Journal of Operational Research 276, 770–780.
https://doi.org/10.1016/j.ejor.2019.01.040
Baker, S.R., Bloom, N., Davis, J., Renault, T., 2021. Twitter-Derived Measures of Economic Uncertainty 1–
14.
Baker, S.R., Bloom, N., Davis, S.J., Kost, K., 2019. Daily infectious disease equity market volatility tracker.
National Bureau of Economic Research.
Basher, S.A., Sadorsky, P., 2022. Forecasting Bitcoin price direction with random forests: How important
are interest rates, inflation, and market volatility? Machine Learning with Applications 9, 100355.
https://doi.org/10.1016/j.mlwa.2022.100355
Ben Jabeur, S., Stef, N., Carmona, P., 2022. Bankruptcy Prediction using the XGBoost Algorithm and
Variable Importance Feature Engineering. Computational Economics.
https://doi.org/10.1007/s10614-021-10227-1
Bissoondoyal-Bheenick, E., Do, H., Hu, X., Zhong, A., 2022. Sentiment and stock market connectedness:
Evidence from the U.S. – China trade war. International Review of Financial Analysis 80, 102031.
https://doi.org/10.1016/j.irfa.2022.102031
Blau, B.M., Griffith, T.G., Whitby, R.J., 2021. Inflation and Bitcoin: A descriptive time-series analysis.
Economics Letters 203, 109848. https://doi.org/10.1016/j.econlet.2021.109848
Boungou, W., Yatié, A., 2022. The impact of the Ukraine–Russia war on world stock market returns.
Economics Letters 215, 110516. https://doi.org/10.1016/j.econlet.2022.110516
Breiman, L., 2001. Random forests. Machine learning 45, 5–32.
Broock, W.A., Scheinkman, J.A., Dechert, W.D., LeBaron, B., 1996. A test for independence based on the
correlation dimension. Econometric reviews 15, 197–235.

Caldara, D., Iacoviello, M., 2022. Measuring geopolitical risk. American Economic Review 112, 1194–1225.
Celeste, V., Corbet, S., Gurdgiev, C., 2020. Fractal dynamics and wavelet analysis: Deep volatility and return
properties of Bitcoin, Ethereum and Ripple. The Quarterly Review of Economics and Finance 76,
310–324. https://doi.org/10.1016/j.qref.2019.09.011
Chakraborty, D., Alam, A., Chaudhuri, S., Başağaoğlu, H., Sulbaran, T., Langar, S., 2021a. Scenario-based
prediction of climate change impacts on building cooling energy consumption with explainable
artificial intelligence. Applied Energy 291, 116807. https://doi.org/10.1016/j.apenergy.2021.116807
Chakraborty, D., Başağaoğlu, H., Alian, S., Mirchi, A., Moriasi, D.N., Starks, P.J., Verser, J.A., 2023.
Multiscale extrapolative learning algorithm for predictive soil moisture modeling & applications.
Expert Systems with Applications 213, 119056. https://doi.org/10.1016/j.eswa.2022.119056
Chakraborty, D., Başağaoğlu, H., Winterle, J., 2021b. Interpretable vs. noninterpretable machine learning
models for data-driven hydro-climatological process modeling. Expert Systems with Applications
170, 114498. https://doi.org/10.1016/j.eswa.2020.114498
Chandrashekar, G., Sahin, F., 2014. A survey on feature selection methods. Computers & Electrical
Engineering 40, 16–28.
Chen, T., Guestrin, C., 2016. XGBoost: A scalable tree boosting system. Presented at the Proceedings of the
22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794.
Chen, W., Xu, H., Jia, L., Gao, Y., 2021. Machine learning model for Bitcoin exchange rate prediction using
economic and technology determinants. International Journal of Forecasting 37, 28–43.
https://doi.org/10.1016/j.ijforecast.2020.02.008
Choi, S., Shin, J., 2022. Bitcoin: An inflation hedge but not a safe haven. Finance Research Letters 46,
102379. https://doi.org/10.1016/j.frl.2021.102379
Demir, E., Gozgor, G., Lau, C.K.M., Vigne, S.A., 2018. Does economic policy uncertainty predict the Bitcoin
returns? An empirical investigation. Finance Research Letters 26, 145–149.
https://doi.org/10.1016/j.frl.2018.01.005
Derbentsev, V., Matviychuk, A., Soloviev, V.N., 2020. Forecasting of Cryptocurrency Prices Using Machine
Learning, in: Pichl, L., Eom, C., Scalas, E., Kaizoji, T. (Eds.), Advanced Studies of Financial
Technologies and Cryptocurrency Markets. Springer Singapore, Singapore, pp. 211–231.
https://doi.org/10.1007/978-981-15-4498-9_12
Dias, I.K., Fernando, J.M.R., Fernando, P.N.D., 2022. Does investor sentiment predict bitcoin return and
volatility? A quantile regression approach. International Review of Financial Analysis 84, 102383.
https://doi.org/10.1016/j.irfa.2022.102383
Elsayed, A.H., Gozgor, G., Lau, C.K.M., 2022. Risk transmissions between bitcoin and traditional financial
assets during the COVID-19 era: The role of global uncertainties. International Review of Financial
Analysis 81, 102069. https://doi.org/10.1016/j.irfa.2022.102069
Fang, L., Bouri, E., Gupta, R., Roubaud, D., 2019. Does global economic uncertainty matter for the volatility
and hedging effectiveness of Bitcoin? International Review of Financial Analysis 61, 29–36.
https://doi.org/10.1016/j.irfa.2018.12.010
Fang, L., Peress, J., 2009. Media coverage and the cross‐section of stock returns. The Journal of Finance 64,
2023–2052.
Fehr, E., Gächter, S., 2000. Fairness and retaliation: The economics of reciprocity. Journal of economic
perspectives 14, 159–182.
García, D., 2013. Sentiment during Recessions. The Journal of Finance 68, 1267–1300.
https://doi.org/10.1111/jofi.12027
García, S., Luengo, J., Herrera, F., 2016. Tutorial on practical tips of the most influential data preprocessing
algorithms in data mining. Knowledge-Based Systems 98, 1–29.
https://doi.org/10.1016/j.knosys.2015.12.006
Geurts, P., Ernst, D., Wehenkel, L., 2006. Extremely randomized trees. Machine Learning 63, 3–42.
https://doi.org/10.1007/s10994-006-6226-1
Ghaddar, B., Naoum-Sawaya, J., 2018. High dimensional data classification and feature selection using
support vector machines. European Journal of Operational Research 265, 993–1004.
https://doi.org/10.1016/j.ejor.2017.08.040
Glick, R., Taylor, A.M., 2010. Collateral damage: Trade disruption and the economic impact of war. The
Review of Economics and Statistics 92, 102–127.

Goodell, J.W., Goutte, S., 2021. Diversifying equity with cryptocurrencies during COVID-19. International
Review of Financial Analysis 76, 101781.
Granger, C.W., 1969. Investigating causal relations by econometric models and cross-spectral methods.
Econometrica: journal of the Econometric Society 424–438.
Gu, S., Kelly, B., Xiu, D., 2020. Empirical asset pricing via machine learning. The Review of Financial
Studies 33, 2223–2273.
Guo, H., Zhang, D., Liu, S., Wang, L., Ding, Y., 2021. Bitcoin price forecasting: A perspective of underlying
blockchain transactions. Decision Support Systems 151, 113650.
https://doi.org/10.1016/j.dss.2021.113650
Han, J.-B., Kim, S.-H., Jang, M.-H., Ri, K.-S., 2020. Using Genetic Algorithm and NARX Neural Network
to Forecast Daily Bitcoin Price. Computational Economics 56, 337–353.
https://doi.org/10.1007/s10614-019-09928-5
Huang, H., Wei, X., Zhou, Y., 2022. An overview on twin support vector regression. Neurocomputing 490,
80–92. https://doi.org/10.1016/j.neucom.2021.10.125
Jabeur, S.B., Ballouk, H., Arfi, W.B., Khalfaoui, R., 2021a. Machine Learning-Based Modeling of the
Environmental Degradation, Institutional Quality, and Economic Growth. Environmental Modeling
& Assessment. https://doi.org/10.1007/s10666-021-09807-0
Jabeur, S.B., Ballouk, H., Mefteh-Wali, S., Omri, A., 2022. Forecasting the macrolevel determinants of
entrepreneurial opportunities using artificial intelligence models. Technological Forecasting and
Social Change 175, 121353. https://doi.org/10.1016/j.techfore.2021.121353
Jabeur, S.B., Mefteh-Wali, S., Viviani, J.-L., 2021b. Forecasting gold price with the XGBoost algorithm and
SHAP interaction values. Annals of Operations Research. https://doi.org/10.1007/s10479-021-
04187-w
Jiang, S., Li, Y., Lu, Q., Wang, S., Wei, Y., 2022. Volatility communicator or receiver? Investigating
volatility spillover mechanisms among Bitcoin and other financial markets. Research in International
Business and Finance 59, 101543. https://doi.org/10.1016/j.ribaf.2021.101543
Jiménez-Cordero, A., Morales, J.M., Pineda, S., 2021. A novel embedded min-max approach for feature
selection in nonlinear Support Vector Machine classification. European Journal of Operational
Research 293, 24–35. https://doi.org/10.1016/j.ejor.2020.12.009
Kahneman, D., Tversky, A., 2013. Prospect theory: An analysis of decision under risk, in: Handbook of the
Fundamentals of Financial Decision Making: Part I. World Scientific, pp. 99–127.
Katsiampa, P., 2017. Volatility estimation for Bitcoin: A comparison of GARCH models. Economics Letters
158, 3–6. https://doi.org/10.1016/j.econlet.2017.06.023
Khalfaoui, R., Gozgor, G., Goodell, J.W., 2022a. Impact of Russia-Ukraine war attention on cryptocurrency:
Evidence from quantile dependence analysis. Finance Research Letters 103365.
https://doi.org/10.1016/j.frl.2022.103365
Khalfaoui, R., Jabeur, S.B., Dogan, B., 2022b. The spillover effects and connectedness among green
commodities, Bitcoins, and US stock markets: Evidence from the quantile VAR network. Journal of
Environmental Management 306, 114493. https://doi.org/10.1016/j.jenvman.2022.114493
Krauss, C., Do, X.A., Huck, N., 2017. Deep neural networks, gradient-boosted trees, random forests:
Statistical arbitrage on the S&P 500. European Journal of Operational Research 259, 689–702.
Kristjanpoller, W., Minutolo, M.C., 2018. A hybrid volatility forecasting framework integrating GARCH,
artificial neural network, technical analysis and principal components analysis. Expert Systems with
Applications 109, 1–11. https://doi.org/10.1016/j.eswa.2018.05.011
Labbé, M., Landete, M., Leal, M., 2022. Dendrograms, minimum spanning trees and feature selection.
European Journal of Operational Research. https://doi.org/10.1016/j.ejor.2022.11.031
Leippold, M., Wang, Q., Zhou, W., 2022. Machine learning in the Chinese stock market. Journal of Financial
Economics 145, 64–82. https://doi.org/10.1016/j.jfineco.2021.08.017
Li, D., Hong, Y., Wang, L., Xu, P., Pan, Z., 2022. Extreme risk transmission among bitcoin and crude oil
markets. Resources Policy 77, 102761. https://doi.org/10.1016/j.resourpol.2022.102761
Liu, M., Li, G., Li, J., Zhu, X., Yao, Y., 2021. Forecasting the price of Bitcoin using deep learning. Finance
Research Letters 40, 101755. https://doi.org/10.1016/j.frl.2020.101755
Lundberg, S.M., Erion, G., Chen, H., DeGrave, A., Prutkin, J.M., Nair, B., Katz, R., Himmelfarb, J., Bansal,
N., Lee, S.-I., 2020. From local explanations to global understanding with explainable AI for trees.
Nature Machine Intelligence 2, 56–67. https://doi.org/10.1038/s42256-019-0138-9

Lundberg, S.M., Lee, S.-I., 2017. A Unified Approach to Interpreting Model Predictions, in: Guyon, I.,
Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (Eds.), Advances
in Neural Information Processing Systems. Curran Associates, Inc.
Maghyereh, A., Abdoh, H., 2020. Tail dependence between Bitcoin and financial assets: Evidence from a
quantile cross-spectral approach. International Review of Financial Analysis 71, 101545.
https://doi.org/10.1016/j.irfa.2020.101545
Mallqui, D.C.A., Fernandes, R.A.S., 2019. Predicting the direction, maximum, minimum and closing prices
of daily Bitcoin exchange rate using machine learning techniques. Applied Soft Computing 75, 596–
606. https://doi.org/10.1016/j.asoc.2018.11.038
Moretti, E., Steinwender, C., Van Reenen, J., 2014. The intellectual spoils of war? Defense R&D,
productivity and spillovers, in: American Economic Association Annual Meeting.
Nakano, M., Takahashi, A., Takahashi, S., 2018. Bitcoin technical trading with artificial neural network.
Physica A: Statistical Mechanics and its Applications 510, 587–609.
https://doi.org/10.1016/j.physa.2018.07.017
Nguyen, K.Q., 2022. The correlation between the stock market and Bitcoin during COVID-19 and other
uncertainty periods. Finance Research Letters 46, 102284. https://doi.org/10.1016/j.frl.2021.102284
Nolan, J.P., Ojeda-Revah, D., 2013. Linear and nonlinear regression with stable errors. Journal of
Econometrics 172, 186–194. https://doi.org/10.1016/j.jeconom.2012.08.008
Ouadghiri, I.E., Guesmi, K., Peillex, J., Ziegler, A., 2021. Public Attention to Environmental Issues and
Stock Market Returns. Ecological Economics 180, 106836.
https://doi.org/10.1016/j.ecolecon.2020.106836
Oyedele, A.A., Ajayi, A.O., Oyedele, L.O., Bello, S.A., Jimoh, K.O., 2023. Performance evaluation of deep
learning and boosted trees for cryptocurrency closing price prediction. Expert Systems with
Applications 213, 119233. https://doi.org/10.1016/j.eswa.2022.119233
Parvini, N., Abdollahi, M., Seifollahi, S., Ahmadian, D., 2022. Forecasting Bitcoin returns with long short-
term memory networks and wavelet decomposition: A comparison of several market determinants.
Applied Soft Computing 121, 108707. https://doi.org/10.1016/j.asoc.2022.108707
Pástor, Ľ., Veronesi, P., 2013. Political uncertainty and risk premia. Journal of Financial Economics 110,
520–545.
Rehman, M.U., Kang, S.H., 2021. A time–frequency comovement and causality relationship between Bitcoin
hashrate and energy commodity markets. Global Finance Journal 49, 100576.
https://doi.org/10.1016/j.gfj.2020.100576
Ribeiro, M.T., Singh, S., Guestrin, C., 2016. “Why should I trust you?” Explaining the predictions of any
classifier, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, pp. 1135–1144.
Saâdaoui, F., Ben Jabeur, S., Goodell, J.W., 2022a. Causality of geopolitical risk on food prices: Considering
the Russo–Ukrainian conflict. Finance Research Letters 49, 103103.
https://doi.org/10.1016/j.frl.2022.103103
Saâdaoui, F., Mefteh-Wali, S., Jabeur, S.B., 2022b. Multiresolutional statistical machine learning for testing
interdependence of power markets: A Variational Mode Decomposition-based approach. Expert
Systems with Applications 208, 118161. https://doi.org/10.1016/j.eswa.2022.118161
Sarstedt, M., Mooi, E., 2014. A concise guide to market research. The Process, Data, and 12.
Schmid, L., Gerharz, A., Groll, A., Pauly, M., 2022. Tree-based ensembles for multi-output regression:
Comparing multivariate approaches with separate univariate ones. Computational Statistics & Data
Analysis 107628. https://doi.org/10.1016/j.csda.2022.107628
Shapiro, A.H., Sudhof, M., Wilson, D.J., 2020. Measuring news sentiment. Journal of Econometrics.
https://doi.org/10.1016/j.jeconom.2020.07.053
Shih, I.-F., Haan, M.N., Paul, K.C., Yu, Y., Sinsheimer, J.S., Ritz, B., 2019. The Roles of Physical Activity
and Inflammation in Mortality, Cognition, and Depressive Symptoms Among Older Mexican
Americans. American Journal of Epidemiology 188, 1944–1952.
https://doi.org/10.1093/aje/kwz180
Shojaie, A., Fox, E.B., 2022. Granger Causality: A Review and Recent Advances. Annual Review of
Statistics and Its Application 9, 289–319. https://doi.org/10.1146/annurev-statistics-040120-010930

Sigrist, F., Leuenberger, N., 2022. Machine learning for corporate default risk: Multi-period prediction,
frailty correlation, loan portfolios, and tail probabilities. European Journal of Operational Research.
https://doi.org/10.1016/j.ejor.2022.06.035
Simumba, N., Okami, S., Kodaka, A., Kohtake, N., 2022. Multiple objective metaheuristics for feature
selection based on stakeholder requirements in credit scoring. Decision Support Systems 155,
113714. https://doi.org/10.1016/j.dss.2021.113714
Smola, A.J., Schölkopf, B., 2004. A tutorial on support vector regression. Statistics and Computing 14, 199–
222.
Stevenson, M., Mues, C., Bravo, C., 2021. The value of text for small business default prediction: A Deep
Learning approach. European Journal of Operational Research 295, 758–771.
https://doi.org/10.1016/j.ejor.2021.03.008
Stigler, G.J., 2021. The theory of economic regulation, in: The Political Economy. Routledge, pp. 67–81.
Sun, X., Liu, M., Sima, Z., 2020. A novel cryptocurrency price trend forecasting model based on LightGBM.
Finance Research Letters 32, 101084. https://doi.org/10.1016/j.frl.2018.12.032
Symitsi, E., Chalvatzis, K.J., 2019. The economic value of Bitcoin: A portfolio analysis of currencies, gold,
oil and stocks. Research in International Business and Finance 48, 97–110.
https://doi.org/10.1016/j.ribaf.2018.12.001
Thorbecke, W., 1997. On Stock Market Returns and Monetary Policy. The Journal of Finance 52, 635–
654. https://doi.org/10.1111/j.1540-6261.1997.tb04816.x
Tversky, A., Kahneman, D., 1992. Advances in prospect theory: Cumulative representation of uncertainty.
Journal of Risk and Uncertainty 5, 297–323.
Vapnik, V., Golowich, S., Smola, A., 1996. Support vector method for function approximation, regression
estimation and signal processing. Advances in Neural Information Processing Systems 9.
Wang, Y., Markert, R., Xiang, J., Zheng, W., 2015. Research on variational mode decomposition and its
application in detecting rub-impact fault of the rotor system. Mechanical Systems and Signal
Processing 60, 243–251.
Wu, Chang-Che, Ho, S.-L., Wu, Chih-Chiang, 2022. The determinants of Bitcoin returns and volatility:
Perspectives on global and national economic policy uncertainty. Finance Research Letters 45,
102175. https://doi.org/10.1016/j.frl.2021.102175
Zhang, C. (Abigail), Cho, S., Vasarhelyi, M., 2022. Explainable Artificial Intelligence (XAI) in auditing.
International Journal of Accounting Information Systems 46, 100572.
https://doi.org/10.1016/j.accinf.2022.100572
Zhang, H., Hong, H., Guo, Y., Yang, C., 2022. Information spillover effects from media coverage to the
crude oil, gold, and Bitcoin markets during the COVID-19 pandemic: Evidence from the time and
frequency domains. International Review of Economics & Finance 78, 267–285.
https://doi.org/10.1016/j.iref.2021.12.005
Zhang, Z., Wu, C., Qu, S., Chen, X., 2022. An explainable artificial intelligence approach for financial
distress prediction. Information Processing & Management 59, 102988.
https://doi.org/10.1016/j.ipm.2022.102988

Table 1. The main recent studies in forecasting the Bitcoin price

| Studies | Time span | Data source | Methods | Variables type | Best model |
|---|---|---|---|---|---|
| Liu et al. (2021) | July 2013 to December 2019 | Wind, Choice Financial Thomson Datastream | BPNN, PCA-SVR, SVR, SDAE | Cryptocurrency market, public attention, and macroeconomic environment | SDAE |
| Atsalakis et al. (2019) | September 13, 2011, to October 12, 2017 | bitcoincharts.com | ANN, ANFIS, PATSOS | Bitcoin prices | PATSOS |
| Derbentsev et al. (2020) | August 1, 2015, to December 1, 2019 | Yahoo Finance | ANN, RF, BART | Cryptocurrency market | BART |
| Guo et al. (2021) | 2016-01-01 to 2018-12-31 | WalletExplorer, CoinMarketCap, Google Trends Data | ARIMA, ARIMAX, CNN, MLP, LSTM, Seq2Seq, BNN, SFM, WT-CATCN | Market prices data, social interest data, inter-exchange transaction data | WT-CATCN |
| Basher & Sadorsky (2022) | September 17, 2014, to December 29, 2021 | Yahoo Finance, St. Louis Federal Reserve | Boost Logit, Logit, RF, Tree bag, Tune RF | Market prices data, macroeconomic variables | RF and tree bagging |
| Akyildirim et al. (2021a) | 2 January 2020 to 10 September 2020 | Chicago Mercantile Exchange | KNN, NB, Logit, RF, SVM, XGBoost, ARIMA | Bitcoin futures data | KNN and RF |
| Akyildirim et al. (2021b) | 1 April 2013 to 23 June 2018 | Kaiko digital asset store | ANN, Logit, RF, SVM, ARIMA | Cryptocurrencies traded in the global markets | SVM |
| Parvini et al. (2022) | August 8, 2015, to April 4, 2020 | "Investing" website | LSTM, OLS, LASSO, AdaBoost, RF, SVR, DWT-LSTM | Market prices data | DWT-LSTM |
| Chen et al. (2021) | | https://bitcoincharts.com/markets/ | | | |
| Oyedele et al. (2023) | January 1, 2018, to December 31, 2021 | Yahoo Finance | CNN, DFNN, | Cryptocurrency market | CNN |
Notes: BPNN: back propagation neural network; PCA-SVR: principal component analysis-based support vector regression;
SVR: support vector regression; SDAE: stacked denoising autoencoders; ANN: artificial neural networks; ANFIS: adaptive
neuro-fuzzy inference system; PATSOS: neuro-fuzzy controller forecasting system; WT-CATCN: wavelet transform-based casual
multi-head attention temporal convolutional network; ARIMA: autoregressive integrated moving average; CNN: convolutional
neural networks; MLP: multilayer perceptron; LSTM: long short-term memory; Seq2Seq: sequence to sequence; BNN: Bayesian
neural networks; SFM: state-frequency memory recurrent neural networks; KNN: k-nearest neighbours; Logit: logistic
regression; NB: naive Bayes; RF: random forest; SVM: support vector machine; XGBoost: extreme gradient boosting;
OLS: ordinary least squares; Ridge: ridge regression; LASSO: least absolute shrinkage and selection operator;
AdaBoost: adaptive boosting; DWT: discrete wavelet transform; BART: Binary Autoregressive Tree model; DFNN: Deep Forward
Neural Networks; GRU: Gated Recurrent Units.

Table 2. Definitions of the variables.

| Variable | Acronym | Definition and source |
|---|---|---|
| Bitcoin | BTC | BTC is a cryptocurrency. Source: investing.com. |
| S&P500 | SP500 | Standard & Poor 500 index. A market-capitalization-weighted index of 500 leading publicly traded companies in the U.S. Source: Bloomberg. |
| Gold | Gold | Gold commodity. Source: Bloomberg. |
| Brent crude oil future | Oil | Brent crude oil. Source: Bloomberg. |
| Volatility | VIX | VIX index. Source: Federal Reserve Bank of St. Louis. |
| U.S. dollar index | USDI | Source: Federal Reserve Bank of St. Louis. |
| 5-Year Forward Inflation Expect | YIFR | Projected inflation over a five-year period beginning five years from now. Source: Federal Reserve Bank of St. Louis. |
| Ethereum | ETH | ETH cryptocurrency. Source: investing.com. |
| Litecoin | LTC | LTC cryptocurrency. Source: investing.com. |
| Ripple | XRP | XRP cryptocurrency. Source: investing.com. |
| Inflation | INF | Daily normalized search volume for the term ‘inflation’ on Google Trends in the USA. Source: Google Trends. |
| Recession | REC | Daily normalized search volume for the term ‘recession’ on Google in the USA. Source: Google Trends. |
| Geopolitical risk | GPR | The GPR index reflects automated text-search results of the electronic archives of 10 newspapers. Source: Caldara and Iacoviello (2022). |
| Twitter-based economic uncertainty (USA) | TEU | The TEU-USA is the total number of daily English-language tweets in the United States that include both uncertainty and economy phrases. Source: Baker et al. (2021). |
| News-based sentiment index | NSI | Score on a sentiment index based on newspapers dealing with topics pertaining to economic issues. Source: Shapiro et al. (2020). |
| Infectious disease equity market volatility index | INFECT | Index corresponds to the newspaper-based Infectious Disease Equity Market Volatility Tracker. Source: Baker et al. (2019). |
| Climate change | PACC | The daily normalized search volume for the term ‘climate change’ on Google Trends in the USA. Source: Google Trends. |

Table 3. Descriptive statistics

| Variable | Mean | Median | Standard Deviation | Kurtosis | Skewness | Minimum | Maximum |
|---|---|---|---|---|---|---|---|
| BTC | 16240.16 | 8737.85 | 17338.0647 | 0.2438 | 1.2669 | 513.4 | 67527.9 |
| LTC | 87.9542 | 62.545 | 67.7193 | 1.1989 | 1.1425 | -21.5801 | 386.82 |
| SP 500 | 3148.612 | 2896.245 | 742.3328 | -0.5327 | 0.7681 | 1775.474 | 5585.957 |
| Gold | 1516.442 | 1437.177 | 270.4096 | -1.2032 | 0.3980 | 948.7482 | 2369.536 |
| Oil | 60.8540 | 60.215 | 18.0246 | 1.4916 | 0.6131 | -14.56 | 134.7699 |
| VIX | 18.5049 | 16.305 | 9.1102 | 9.3730 | 2.2638 | 0.450585 | 89.8002 |
| USDI | 114.2861 | 114.3591 | 3.3550 | 0.7670 | 0.2527 | 103.4899 | 127.5749 |
| YIFR | 1.9764 | 2 | 0.2441 | 0.59400 | -0.6134 | 0.86 | 2.67 |
| ETH | 888.0576 | 295.625 | 1173.9198 | 1.1893 | 1.5560 | 6.7 | 4808.38 |
| INF | 46.9106 | 45.30922 | 16.6495 | 5.9768 | 1.5795 | 11.2862 | 177.8136 |
| REC | 15.8089 | 12.4896 | 20.3228 | 20.9517 | 3.692 | -20.1914 | 194.4609 |
| GPR | 99.4759 | 90.01543 | 57.2009 | 9.9916 | 2.2813 | 3.5695 | 539.5826 |
| TEU | 129.4994 | 93.05706 | 119.9770 | 12.8128 | 3.0591 | 6.3049 | 1134.894 |
| INFECTION | 7.6142 | 0.905 | 12.0921 | 9.4655 | 2.5237 | 8.88E-16 | 112.93 |
| NSI | -0.0205 | 0.014567 | 0.20789 | 1.2870 | -1.0292 | -0.67612 | 0.333454 |
| PACC | 24.7938 | 21.45019 | 19.0926 | 23.2655 | 3.2280 | -17.0035 | 251.6004 |

Figure 1. Correlation matrix

Table 4. FS-SHAP values algorithm based on feature importance selection
Algorithm 1: FS-SHAP value

1. Train Extra Trees on (X, Y)
2. Tune the model (Extra Trees) using all features
3. Compute the Shapley value for all features
4. Select the k highest-ranking features
5. Build the data set with the key features (X, Y)
6. Train the model and tune hyper-parameters on the key features (X, Y)
7. Evaluate prediction performance
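
The ranking and selection steps of Algorithm 1 (steps 3 and 4) can be sketched in plain Python. This is only an illustration under stated assumptions: the paper uses an Extra Trees model, whereas the sketch below substitutes a hypothetical toy function `toy_predict`, computes exact Shapley values by enumerating feature coalitions (feasible only for a handful of features), and replaces absent features with values from background rows. The helper names, the toy data, and the three feature labels are all assumptions made for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values of predict at x by coalition enumeration.

    Features outside a coalition take their values from each background
    row in turn, and the resulting predictions are averaged.
    """
    n = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            z = [x[j] if j in subset else row[j] for j in range(n)]
            total += predict(z)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_features(predict, X, background):
    """Global importance of each feature: mean absolute Shapley value."""
    n = len(X[0])
    importance = [0.0] * n
    for x in X:
        phi = shapley_values(predict, x, background)
        for j in range(n):
            importance[j] += abs(phi[j]) / len(X)
    return importance


def select_top_k(importance, names, k):
    """Step 4 of Algorithm 1: keep the k highest-ranking features."""
    order = sorted(range(len(names)), key=lambda j: importance[j], reverse=True)
    return [names[j] for j in order[:k]]


# Hypothetical stand-in for a fitted model (NOT Extra Trees): the third
# feature has no effect, so it should rank last.
def toy_predict(z):
    return 3.0 * z[0] + 0.5 * z[1] + 0.0 * z[2]


NAMES = ["ETH", "LTC", "Gold"]  # labels borrowed for illustration only
X = [[1.0, 2.0, 3.0], [2.0, 0.0, 1.0], [0.0, 1.0, 5.0], [3.0, 3.0, 2.0]]

importance = rank_features(toy_predict, X, X)
selected = select_top_k(importance, NAMES, k=2)
print(selected)
```

In practice the enumeration would be replaced by a tree-specific explainer applied to the tuned Extra Trees model; the selected features would then feed steps 5 to 7.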

Figure 2. Conceptual representation of the research methodology based on the explainable machine
learning framework

Table 5. Monoscale and multiscale (VMD-based) causality of the set of exogenous variables on the Bitcoin price.

| Direction | IMF1 F-stat | IMF1 c-value | IMF2 F-stat | IMF2 c-value | IMF3 F-stat | IMF3 c-value | Granger causality F-stat | Granger causality c-value |
|---|---|---|---|---|---|---|---|---|
| LTC → BTC | 10.884 | 2.6090 | 59.338 | 2.3761 | 62.6861 | 2.9999 | 11.711 | 3.8457 |
| S&P500 → BTC | 0.1774 | 3.8458 | 1.4317 | 3.8458 | 0.9098 | 3.8458 | 1.1275 | 3.8457 |
| Gold → BTC | 0.0068 | 3.8458 | 0.2423 | 3.8458 | 0.3882 | 3.8458 | 0.0021 | 3.8457 |
| Oil → BTC | 0.5248 | 3.8458 | 2.2635 | 3.8458 | 0.0615 | 3.8458 | 0.1888 | 3.8457 |
| VIX → BTC | 0.0065 | 3.8458 | 0.6779 | 3.8458 | 0.6415 | 3.8458 | 0.8784 | 3.8457 |
| USDI → BTC | 1.1399 | 3.8458 | 0.3227 | 3.8458 | 0.1533 | 3.8458 | 0.9124 | 3.8457 |
| YIFR → BTC | 2.7714 | 3.8458 | 0.0092 | 3.8458 | 0.0245 | 3.8458 | 1.3134 | 3.8457 |
| ETH → BTC | 26.851 | 2.2182 | 75.169 | 2.3761 | 209.96 | 2.3761 | 27.302 | 3.8457 |
| INF → BTC | 5.9020 | 3.8458 | 1.9152 | 3.8458 | 0.9289 | 3.8458 | 0.0052 | 3.8457 |
| REC → BTC | 0.8637 | 3.8458 | 3.6704 | 3.8458 | 0.1346 | 3.8458 | 2.0575 | 3.8457 |
| GPR → BTC | 0.7294 | 3.8458 | 0.1144 | 3.8458 | 6.8142 | 3.8458 | 0.0275 | 3.8457 |
| TEU → BTC | 0.1458 | 3.8458 | 0.8722 | 3.8458 | 4.2877 | 2.9999 | 0.9615 | 3.8457 |
| INFECT → BTC | 0.4189 | 3.8458 | 2.6627 | 3.8458 | 1.4720 | 3.8458 | 0.9301 | 3.8457 |
| NSI → BTC | 1.3053 | 3.8458 | 0.1576 | 3.8458 | 0.0583 | 3.8458 | 0.0363 | 3.8457 |
| PACC → BTC | 0.0005 | 3.8458 | 0.2102 | 3.8458 | 0.0053 | 3.8458 | 0.8670 | 3.8457 |
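
Table 5 reports, for each direction, an F statistic next to a critical value (c-value); a variable is read as Granger-causing BTC when its F statistic exceeds the c-value. The sketch below shows how such an F statistic can be computed by comparing a restricted autoregression of the target with an unrestricted one that adds lags of the candidate driver. It is a minimal illustration, assuming a lag order of 1 and synthetic data; the paper's lag selection and the VMD decomposition into IMFs are not reproduced, and the helper names `ols_rss` and `granger_f_stat` are hypothetical.

```python
import random


def ols_rss(X, y):
    """Residual sum of squares of an OLS fit, via the normal equations."""
    k = len(X[0])
    # Augmented system [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum(
        (yi - sum(b * v for b, v in zip(beta, row))) ** 2
        for row, yi in zip(X, y)
    )


def granger_f_stat(x, y, lag=1):
    """F statistic for H0: lags of x do not help predict y."""
    n = len(y)
    target = y[lag:]
    # Restricted model: constant plus own lags of y
    restricted = [
        [1.0] + [y[t - l] for l in range(1, lag + 1)] for t in range(lag, n)
    ]
    # Unrestricted model: additionally includes lags of x
    unrestricted = [
        row + [x[t - l] for l in range(1, lag + 1)]
        for row, t in zip(restricted, range(lag, n))
    ]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    dof = len(target) - (2 * lag + 1)
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Synthetic example: y is driven by the previous value of x, not vice versa.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0.0, 1.0) for t in range(1, 200)]

f_x_to_y = granger_f_stat(x, y, lag=1)
f_y_to_x = granger_f_stat(y, x, lag=1)
print(f_x_to_y, f_y_to_x)
```

With one lag and a large sample, the 5% critical value of the F distribution is close to 3.84, which is in line with the c-values of 3.8457 and 3.8458 shown in the table.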

Figure 3. Cross-correlation between Bitcoin and fifteen exogenous variables (panels a to d).

The last subfigure at the bottom right represents the autocorrelation function of Bitcoin.

Figure 4. SHAP feature importance based on Extra Tree Regression (ET) model.

Red and blue dots indicate the positive and negative impact of the features on the outcome. The y-axis
represents the feature importance, that is, the impact of each feature on the Bitcoin price predicted by the
model. The higher the value on the y-axis, the more important the feature is for predicting the outcome.

Table 6. The performance of the univariate forecasting models

Variables
LSTM DL SVM LR RF XGBoost ET
LTC R2OOS 0.896 0.285 0.292 -0.091 0.201 0.253 0.163
18.819 47.312 62.004 52.988 51.315 54.582
RMSE 0.508
SP 500 R2OOS 0.115 -0.591 -0.588 -0.346 -0.667 -0.606 -0.733
0.168
RMSE 365.322 493.181 492.921 453.774 505.033 495.671
Gold R2OOS 0.208 0.001 -0.082 -0.227 -0.349 -0.072 -0.423
0.168
RMSE 116.488 130.811 136.319 145.169 152.166 135.647
Oil R2OOS -0.199 -2.321 -2.412 -2.027 -2.665 -2.677 -2.540
0.316
RMSE 18.952 31.610 32.053 30.188 33.219 33.273
VIX R2OOS 0.525 0.023 0.074 -0.276 -0.279 0.044 -0.435
0.081
RMSE 4.045 5.799 5.651 6.634 6.643 5.743
USDI R2OOS 0.059 -0.781 -1.380 -0.412 -1.109 -0.854 -1.331
RMSE 1.6 0.025 4.001 3.448 1.454 3.753 4.187
YIFR R2OOS 0.069 -1.813 -1.407 -2.788 -2.231 -2.182 -2.225
RMSE 0.131 0.025 4.001 3.260 3.984 3.735 4.187
ETH R2OOS 0.053 -2.109 -2.039 -2.119 -2.197 -2.185 -2.194
0.559
RMSE 836.962 1536.039 1519.117 1539.038 1558.166 1555.133
INF R2OOS 0.351 -0.585 -0.537 -0.580 -0.690 -0.664 -0.665
0.311
RMSE 16.061 25.273 24.893 25.242 26.103 25.900
REC R2OOS 0.865 -0.184 -0.230 -0.067 -0.210 -0.168 -0.168
0.006
RMSE 10.220 31.119 31.729 29.548 31.459 30.914
GPR R2OOS 0.431 -0.373 -0.311 -0.332 -0.442 -0.334 -0.444
0.256
RMSE 66.110 102.811 100.506 101.289 105.392 101.396
TEU R2OOS 0.051 -0.537 -0.426 -1.187 -1.502 -0.728 -1.707
0.072
RMSE 50.024 64.102 61.904 76.665 82.004 68.149
INFECTION R2OOS 0.545 -0.846 -0.588 -1.008 -1.201 -0.561 -1.329
RMSE 8.358 0.048 10.138 10.776 11.281 9.50 11.590
NSI R2OOS 0.019 -0.190 -0.237 -0.546 -0.312 -0.199 -0.339
RMSE 0.040 0.003 0.114 0.125 0.115 0.114 0.116
PACC R2OOS 0.814 -0.06 0.000 -0.018 -0.057 0.002 -0.082
RMSE 10.223 0.040 27.832 28.139 28.716 27.864 28.989
Notes: Daily out-of-sample predictive R2OOS and RMSE of the univariate forecasting models, for each variable and ML model.
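
The two metrics reported in Tables 6 and 7 can be illustrated with a short sketch. The exact definition of R2OOS used by the authors is not restated in this section, so the code assumes the common form that compares the model's squared forecast errors with those of a constant benchmark forecast (here the training-sample mean); under that assumption a negative value means the model forecasts worse than the benchmark. The function names and the toy numbers are hypothetical.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error of the forecasts."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r2_oos(actual, predicted, benchmark):
    """Out-of-sample R2 relative to a constant benchmark forecast (assumed form)."""
    sse_model = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sse_benchmark = sum((a - benchmark) ** 2 for a in actual)
    return 1.0 - sse_model / sse_benchmark


# Toy data: the benchmark is the mean of the training observations.
train = [10.0, 12.0, 11.0, 13.0]
test_actual = [12.0, 14.0, 13.0]
good_forecast = [12.5, 13.5, 13.0]
poor_forecast = [20.0, 20.0, 20.0]

benchmark = sum(train) / len(train)
print(r2_oos(test_actual, good_forecast, benchmark), rmse(test_actual, good_forecast))
print(r2_oos(test_actual, poor_forecast, benchmark), rmse(test_actual, poor_forecast))
```

A forecast equal to the benchmark yields an R2OOS of zero under this definition, which is why values below zero in the tables indicate underperformance relative to the naive forecast.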

Table 7. The performance of the multivariate forecasting
Feature selection technique: FS-SHAP; Granger causality; VMD causality (IMF1, IMF2, IMF3)
Number of features: FS-SHAP 10; Granger causality 2; IMF1 3; IMF2 2; IMF3 4
Models R2OOS RMSE R2OOS RMSE R2OOS RMSE R2OOS RMSE R2OOS RMSE
LSTM 0.689 2.031 0.419 8.000 -0.110 876.139 -0.065 836.962 -0.110 876.139
DL -0.274 3.811 0.350 15.496 -3.032 1572.61 -3.065 1536.039 -3.032 1572.61
SVM -0.156 43.257 0.403 20.770 -3.251 1570.081 -0.860 1519.117 -3.251 1570.081
LR -0.737 36.755 0.096 11.588 -2.621 1674.312 -37.503 1539.038 -2.621 1674.312
RF -0.013 28.068 -0.363 12.258 -2.432 1581.789 -1.248 1558.166 -2.432 1581.789
XGBoost -0.011 28.041 -0.390 12.138 -2.503 1607.513 -1.111 1555.133 -2.503 1607.513
ET 0.010 27.789 -0.331 8.924 -2.593 1637.223 -0.714 72.111 -2.593 1637.223

Figure 5. SHAP local dependence interpretability plots for the six important features based on Extra
Trees model.

Panel annotations: low BTC price when GPR < ~19; GPR < ~62; YIFR < ~1.80; SP 500 < ~2617; PACC > ~32; INF < ~35.

Figure 6. Local interpretation plots of the ten most important features based on FS-TreeSHAP value (ET model).

Figure 7. SHAP force plots for two randomly selected samples during the Russia-Ukraine war: (a) sample 2070
and (b) sample 2161, from April and July 2022 respectively, based on the ET model.
