Financial Sentiment Analysis Survey
KELVIN DU, School of Computer Science and Engineering, Nanyang Technological University, Singapore
FRANK XING, Department of Information Systems and Analytics, National University of Singapore, Singapore
RUI MAO∗, School of Computer Science and Engineering, Nanyang Technological University, Singapore
ERIK CAMBRIA, School of Computer Science and Engineering, Nanyang Technological University, Singapore

Financial Sentiment Analysis (FSA) is an important domain application of sentiment analysis that has gained increasing attention in the past decade. FSA research falls into two main streams. The first stream focuses on defining tasks and developing techniques for FSA, and its main objective is to improve the performance of various FSA tasks by advancing methods and using or curating human-annotated datasets. The second stream focuses on using financial sentiment, implicitly or explicitly, for downstream applications in financial markets, and has received more research effort. Its main objective is to discover appropriate market applications for existing techniques. More specifically, the applications of FSA mainly include hypothesis testing and predictive modeling in financial markets. This survey conducts a comprehensive review of FSA research in both the technique and application areas and proposes several frameworks to help understand the two areas’ interactive relationship. This article defines a clearer scope for FSA studies and conceptualizes the relationship among FSA, investor sentiment, and market sentiment. Major findings, challenges, and future research directions for both FSA techniques and applications are also summarized and discussed.

CCS Concepts: • Computing methodologies → Natural language processing; Neural networks; Machine learning algorithms; • Information systems → Information retrieval; Sentiment analysis; • Applied computing → Law, social and behavioral sciences.

Additional Key Words and Phrases: Financial Sentiment Analysis, Financial Forecasting, Natural Language Processing, Information System, Machine Learning, Deep Learning

ACM Reference Format:
Kelvin Du, Frank Xing, Rui Mao, and Erik Cambria. 2024. Financial Sentiment Analysis: Techniques and Applications. 1, 1 (February 2024), 40 pages. https://doi.org/XXXXXXX.XXXXXXX

∗ Corresponding author.
Authors’ addresses: Kelvin Du, School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore, [email protected]; Frank Xing, Department of Information Systems and Analytics, National University of Singapore, Singapore, Singapore, [email protected]; Rui Mao, School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore, [email protected]; Erik Cambria, School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore, [email protected].
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
© 2024 Association for Computing Machinery.
Manuscript submitted to ACM

1 INTRODUCTION
Sentiment analysis is a field of study that analyzes people’s sentiments, attitudes, opinions, emotions, evaluations, and appraisals towards various entities such as events, topics, services, products, individuals, organizations, issues, and their attributes [92]. Financial Sentiment Analysis (FSA), which in broad terms studies investor sentiment and financial textual sentiment [78], is an important domain application for sentiment analysis. Given the intricate nature of the
financial market, individuals involved in varying market conditions exhibit diverse cognitive patterns [111], rendering it challenging to dynamically comprehend and analyze the market for robust financial decision-making. To address the challenge posed by the market’s active shifts, automated FSA has gained increasing attention in the past decade [170]. It has proven to be a powerful tool to support business decision-making and perform financial forecasting [101, 102]. The application scenarios include corporate disclosures, annual reports, earnings calls, financial news, social media interactions, and more [170, 183]. Sentiment analysis is a suitcase problem and is domain-dependent. The phenomenon of domain-dependence is more pronounced in the finance domain [106] because of both topic concentration and the use of highly professional language. For example, words such as “liability” and “debt” are considered negative in general-purpose sentiment analysis, whereas they are frequent and carry a neutral meaning in the financial context.
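
The domain-dependence described above can be illustrated with a minimal sketch. Both word lists below are hypothetical toy lexicons invented for this example (they are not taken from any published resource); the only point is that scoring the same sentence with a general-purpose list and a finance-adapted list can give different results.

```python
import re

# Hypothetical toy lexicons (assumptions for illustration only).
GENERAL_LEXICON = {"liability": -1, "debt": -1, "loss": -1, "growth": 1, "strong": 1}
# In the finance-adapted list, routine accounting terms are neutral (score 0).
FINANCE_LEXICON = {"liability": 0, "debt": 0, "loss": -1, "growth": 1, "strong": 1}


def lexicon_score(text, lexicon):
    """Sum the polarity of every known word in the text (unknown words score 0)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)


sentence = "The company refinanced its debt and reported strong revenue growth."
general = lexicon_score(sentence, GENERAL_LEXICON)   # "debt" pulls the score down
finance = lexicon_score(sentence, FINANCE_LEXICON)   # "debt" is treated as neutral
```

Under these assumed lexicons the finance-adapted score is higher, because the word “debt” no longer counts against the sentence.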

In terms of where the ground truth comes from, financial sentiment indicators are categorized into market-derived [84] and human-annotated sentiments [99]. Market-derived sentiments are proxies computed from market dynamics, such as price movement and trading volume, and thus may include noise from other sources. For example, positive news is generally related to large changes in price for a short time, while the effect of negative news lasts longer [35]. The subjective human-annotated sentiments, however, are specifically labeled by professionals [106] or investors themselves [181]. FSA has received great attention from researchers and investors and has become a prominent research topic in recent years [5, 158], mainly due to the increase in online materials such as digital news, the World Wide Web, and social media. FSA research is shifting from human-annotated to market-derived sentiment. More specifically, the application of FSA in financial forecasting has become more popular in recent years.
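
To make the notion of a market-derived label concrete, the following sketch labels each day by the sign of the next-day return. The labelling rule, the function name, and the 1% neutral band are assumptions chosen for illustration; they are not a scheme taken from the studies cited above.

```python
def market_derived_labels(closes, threshold=0.01):
    """Label day t by the simple return from day t to day t+1.

    closes: closing prices ordered by date.
    threshold: hypothetical cut-off; returns strictly inside
               (-threshold, threshold) are labelled neutral.
    """
    labels = []
    for today, tomorrow in zip(closes, closes[1:]):
        ret = (tomorrow - today) / today
        if ret >= threshold:
            labels.append("positive")
        elif ret <= -threshold:
            labels.append("negative")
        else:
            labels.append("neutral")
    return labels


closes = [100.0, 102.0, 101.5, 99.0]
labels = market_derived_labels(closes)
```

Because such labels come from prices rather than from the text itself, any price move caused by unrelated factors ends up in the label, which is the source of noise mentioned above.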

In the realm of FSA research, Kearney and Liu [78] conducted a comprehensive survey in 2014, focusing primarily on FSA techniques rooted in dictionaries and machine learning. [107] presented a brief review of various FSA methodologies in 2019, encompassing data sources, lexicon-based approaches, traditional machine learning, and deep learning techniques. While prior reviews have tended to be skewed towards either FSA techniques or applications [78, 107, 127, 183], this survey aims to provide a comprehensive review of the most recent FSA research that bridges the two aspects. We believe that linking techniques and applications can give researchers an overarching understanding of FSA studies and, more importantly, facilitate better adoption of FSA in downstream applications to generate more promising results. Our work entails an extensive examination of the most recent FSA studies, offering a dual perspective from both technical and applied standpoints. Notably, our investigation extends beyond the confines of computer science literature, establishing connections with other disciplines such as information systems and finance. In particular, we delve into the foundational principles of financial forecasting, lending support to the market predictability of financial sentiment from a financial theory standpoint. We have defined the scope of FSA research, reaffirming the intricate relationship between FSA, investor sentiment, and market sentiment. Furthermore, we have scrutinized the genesis of financial sentiment, whether implicit or explicit, in its applications within financial markets. This analysis sheds light on the dynamic interplay between FSA techniques and their practical applications, ultimately facilitating a more seamless integration of financial sentiment in downstream tasks. Besides, we deliver structured summaries for different technical trends, tasks, features, and applications. Finally, building upon the most recent brief survey on FSA [107], we additionally review FSA tasks with more recent benchmark datasets, learning approaches, pre-trained language models, word representation techniques, and evaluation methods. We also highlight FSA applications, including data sources, hypothesis testing, and predictive modeling.

This study aims to answer the following four groups of research questions. For easy navigation, our findings are elaborated in Section 6.
(1) FSA studies have evolved over the years with more data available. What is the scope of FSA in today’s context, and what is the relationship among FSA, investor sentiment, and market sentiment?
(2) What trends are emerging from the latest tasks, benchmark datasets, and methods in the FSA technique studies?
(3) FSA has been widely used in financial applications since Engle and Ng suggested the asymmetric and affective impact of news on market volatility in 1993. How many data sources, tasks, methods, and financial markets have been researched in FSA applications?
(4) How is financial sentiment involved in financial forecasting, and is the focus on FSA techniques or applications?
Our contributions can be summarized from the following four aspects:
(1) We have conducted a comprehensive review of the latest FSA studies from both the technique and application perspectives. This effort fills the gap in the literature by providing a detailed and referential anchoring point for FSA research.
(2) Our field of investigation goes beyond the computer science literature and links to other disciplines, such as information systems and finance. Specifically, we review the underlying principles of financial forecasting and provide support to the market predictability of financial sentiment from a financial theory perspective.
(3) We have defined the scope of FSA research and re-confirmed the relationship among FSA, investor sentiment, and market sentiment.
(4) We have reviewed how financial sentiment is generated, implicitly or explicitly, during its applications in financial markets, and the interactive relationship between FSA techniques and applications, which will facilitate a better adoption of financial sentiment in downstream application tasks.

The remainder of this article is organized as follows: Section 2 provides the background of FSA, including its definition, motivation, and importance; Section 3 provides the literature review framework; Sections 4 and 5 review existing studies on FSA techniques and applications, respectively; Section 6 presents the research findings of this survey; Section 7 lists challenges and future directions; finally, Section 8 offers concluding remarks.

2 BACKGROUND
The term “sentiment” is used in the context of analyzing evaluative texts automatically and detecting predictive judgments from negatively and positively opinionated texts [15]. This term first appeared in the studies by [28] and [165], where researchers were interested in market sentiment analysis [158]. Traditionally, investor sentiment is collected via surveys that regularly ask for opinions on the markets. With the growth of textual data such as news texts and social media collections, and the advancement of automatic processing technologies, the media has become an important source of investor sentiment. The task of FSA is to perform sentiment analysis on financial texts.
In computational finance, the adoption of robo-readers to process and analyze texts is an emerging technology trend [106]. Computational finance is an area of knowledge that emerged in the 1980s and uses computational methods to solve problems in finance. From this perspective, FSA is also a research area under computational finance. Generally, FSA techniques refer to the methods used to perform sentiment analysis (e.g., extraction of sentiment polarities or intensities) on financial texts, which can be categorized into lexicon, machine learning, deep learning, hybrid, and pre-trained language model approaches. FSA applications refer to the adoption of financial sentiment in downstream tasks such as hypothesis testing and predictive modeling; the most important application of FSA is the forecasting of financial markets. The Efficient Market Hypothesis (EMH), proposed by Fama in the 1970s [47], is a critical foundation of modern financial market analysis. It hypothesizes that financial markets are efficient and that prices incorporate all available market information. To tie the concept to reality, Fama classified the efficient market into three forms, namely the strong, semi-strong, and weak forms. Under the weak form, it is assumed that the information set is merely historical prices, and that any current or private information will not influence the market. The weak form test was renamed to tests for return predictability by Fama in [48]. The semi-strong form, which was renamed to event studies in [48], states that stock prices reflect all public and historical information, while private information fails to influence market movements. The strong form describes a market condition in which the price of securities reflects all information, including public, private, and historical price information, with a presumption that it is free to trade and access information, which is rarely the case. It is worth noting that Fama acknowledged that such a market is rarely seen in reality and is useful for theoretical purposes only. However, the intellectual dominance of EMH had become less universal by the start of the 21st century. A large number of statisticians and financial economists began to believe that stock prices could be predicted at least partially [105]. Under behavioral finance, investors make decisions from a psychological perspective [3], and their state of mind, or sentiment, influences them when making those decisions [11, 98]. Today, the debate on market predictability is no longer about whether investor sentiment affects markets but about how to measure the sentiment and quantify its effect [7].

[Figure 1: diagram relating three concepts. (1) Financial Sentiment, drawn from data sources such as financial reports, filings and earnings calls; public news (e.g., financial news, macroeconomic news); and social media (e.g., Twitter, StockTwits). (2) Investor Sentiment, measured by financial sentiment as a proxy and by surveys (AAII Sentiment Survey, Sentix, Michigan Consumer Sentiment Index (MCSI), Investors Intelligence Advisors Sentiment). (3) Market Sentiment, the general outlook of investors toward a particular security or financial market, measured by market sentiment indicators (Baker and Wurgler Index, CBOE Volatility Index (VIX), Bullish Percent Index (BPI), Margin Debt, Put-Call Ratio / High-Low Index).]
Fig. 1. Financial Sentiment, Investor Sentiment, and Market Sentiment.

2.1 Relationship among Different Sentiment Agents
The relationship among market sentiment, investor sentiment, and financial textual sentiment is illustrated in Figure 1. Firstly, investor sentiment, which indicates the degree of deviation of an asset value from its economic fundamentals, can be defined as investors’ optimism or pessimism about future market activity [6] or as the way investors form beliefs [10]. Investor sentiment can be expressed and measured in two main forms: surveys and financial texts [199]. Popular surveys include the American Association of Individual Investors (AAII) Investor Sentiment Survey¹, Sentix Investor Confidence², and the Investors Intelligence Sentiment Index³. The AAII Investor Sentiment Survey provides valuable insights into the perspectives of individual investors regarding the future direction of the market over a six-month period through a weekly survey in which investors can vote Bullish, Neutral, or Bearish. The Sentix Investor Confidence Index assesses the prospective economic outlook for the eurozone over a six-month period, and is derived from a comprehensive survey involving investors and analysts. A reading above zero signifies a positive or optimistic outlook, while a reading below zero indicates a negative or pessimistic perspective. The Investors Intelligence Sentiment Index operates on contrarian principles and surveys more than one hundred independent market newsletters, evaluating the current stance of each author regarding the market, whether it be bullish, bearish, or indicating a correction. Investor sentiment can also be measured through textual data such as microblogs and analyst reports. This measure is derived from various communication platforms that exist on the internet and can be a proxy for investor sentiment or non-informational trading. In this sense, [3] summarized that some studies have adopted investor sentiment derived from social networks such as Twitter [196], StockTwits [140, 146], or Facebook [156]; message boards such as RagingBull.com [168] and Yahoo! Finance [81]; or Google searches [27].
¹ https://www.aaii.com/sentimentsurvey
² https://www.sentix.de/
³ https://www.investorsintelligence.com/x/us_advisors_sentiment.html

Secondly, financial (textual) sentiment is measured by the degree of positivity or negativity in financial texts. Investor sentiment and financial textual sentiment are not independent but connected with each other. Investor sentiment can be measured by financial textual sentiment, especially given the growth of financial textual data today. This is because financial textual sentiment contains both subjective and objective information. The subjective information includes subjective judgment and analysis from investors and analysts, which is normally published on social media and self-media. The objective information, e.g., political and macroeconomic news, breaking news, and annual reports released by companies, is the objective reflection of conditions within the general environment, industries, markets, and firms [78]. The objective information is often leading in the sense that it influences investors’ judgment, while the subjective information is often lagging because it is driven by investor sentiment and reflects investors’ opinions. Thus, financial sentiment interacts with investor sentiment. From this perspective, financial sentiment can serve as a measure or proxy of investor sentiment, which subsequently influences trading strategies in the markets.

Financial textual sentiment analysis differs from classic sentiment analysis in several key aspects. Firstly, it involves the frequent use of metaphorical expressions in financial communication, where figures of speech are employed to convey sentiments or describe market conditions. For example, “The market is riding a bull” is a common metaphor signifying a robust, upward market movement. Secondly, precision and brevity are of paramount importance in the financial world. Professionals use concise language to efficiently convey complex information. For instance, instead of stating “The company experienced a substantial increase in revenue and a corresponding improvement in profitability,” a financial analyst might write, “The company posted robust revenue growth, driving higher profits,” which requires FSA to decode sentiments from concise sentence structures. Thirdly, the financial industry employs a unique set of terms and jargon with specific meanings. A thorough understanding of these terms is crucial for accurate interpretation and analysis of financial texts in FSA. For example, the “Price-to-Earnings (P/E) ratio” is a fundamental financial metric used to assess a company’s valuation. A high P/E ratio may indicate that investors hold high expectations for future earnings. Furthermore, unlike classic sentiment analysis, which typically focuses on text alone, financial texts often integrate qualitative text with quantitative data, which requires FSA not only to understand the language used in financial texts but also to process and analyze numerical information in conjunction with the textual context, in order to gain a comprehensive understanding of the sentiment. Lastly, FSA is often direction-dependent: the direction of events or changes holds critical importance. For instance, the word “profit” may carry either positive or negative sentiment depending on the direction. An increase in profit is generally regarded as positive, while a decrease is seen as negative. In practice, [181] concluded that there are six areas that cause FSA to fail, i.e., irrealis mood (conditional mood, subjunctive mood, imperative mood), rhetoric (negative assertion, personification, sarcasm), dependent opinion, unspecified aspects, unrecognized words (entity, microtext, jargon), and external reference. Understanding these specificities is crucial for accurately analyzing financial sentiment, as they provide context and nuance to the language used in financial texts.
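
The direction-dependence noted above (“profit” is positive when it rises and negative when it falls) can be sketched with a simple rule that multiplies the desirability of a metric by the direction of its change. The word lists and the function name are hypothetical and deliberately tiny; a practical system would need much richer resources and syntactic analysis.

```python
import re

# Hypothetical word lists (assumptions for illustration only).
DESIRABLE = {"profit", "revenue", "earnings"}      # more is good
UNDESIRABLE = {"loss", "cost", "costs"}            # more is bad
UP = {"rose", "increased", "grew", "jumped"}
DOWN = {"fell", "decreased", "declined", "dropped"}


def directional_polarity(sentence):
    """Return +1, -1, or 0 by combining a metric word with a direction word."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    direction = (1 if tokens & UP else 0) - (1 if tokens & DOWN else 0)
    metric = (1 if tokens & DESIRABLE else 0) - (1 if tokens & UNDESIRABLE else 0)
    # Same sign means good news (profit up, costs down); opposite sign means bad news.
    return direction * metric


up = directional_polarity("Quarterly profit rose sharply.")
down = directional_polarity("Quarterly profit fell sharply.")
costs = directional_polarity("Operating costs fell.")
```

With this rule a sentence that mentions “profit” without any direction word receives a score of 0, which reflects the point that the word alone does not determine the sentiment.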

Further, market sentiment is the collective outlook of investors towards a specific financial market or security [160]. It is often used interchangeably with investor sentiment but is fundamentally distinct, as investors may hold varying viewpoints at different periods and in different markets. Market sentiment reflects the trading behavior of investors in a specific market, which is driven by investor sentiment; in other words, it is the aggregated effect of investor sentiment [160]. It encapsulates the prevailing atmosphere or mood within the market, representing the crowd’s psychological disposition, discernible through the trading activity and price fluctuations of the securities being exchanged. Broadly speaking, ascending prices signal an optimistic or bullish market sentiment, whereas descending prices signal a pessimistic or bearish market sentiment. The influence of investor sentiment on the market is asymmetric, which means that its impact varies across different market regimes. Market sentiment can be measured by various proxy financial metrics, such as the degree of price movements and volatility computed from historical market data, which are therefore backward-looking, lagging indicators. One of the most well-known market sentiment indicators is the Chicago Board Options Exchange (CBOE) Volatility Index (VIX)⁴, which measures the market’s expected volatility over the next 30 days based on real-time prices of S&P 500 Index options. The VIX tends to be higher when there is a greater level of fear and uncertainty in the market and lower in bull markets. Besides, the Equity Market Sentiment Index (EMSI) [9], High-Low Index, Bullish Percent Index (BPI), and the Baker and Wurgler Index [6, 7] are also popular indices for market sentiment or prevailing investor sentiment. In particular, the Baker-Wurgler Index is generated from the first principal component of six proxies from market variables: the closed-end fund discount (CEFD), dividend premium, equity issues, first-day return, IPO activity, and trading volume. [101] argued that the market sentiment of financial events does not equal the semantic sentiment of financial news. The former represents the market’s reaction to financial events as reflected in financial asset prices, while the latter represents the semantic understanding of the sentiment of news. [101] showed experimentally that using market sentiment representations can better predict stock price movements than using semantic sentiment representations.
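
The idea of building a composite index from the first principal component of several proxies can be sketched as below. This is a minimal illustration on synthetic data, assuming NumPy is available; it standardizes the proxies and projects them onto the first principal direction, and it is not the exact procedure of the Baker-Wurgler Index, whose data and additional construction steps are not reproduced here. The function name and the synthetic series are assumptions.

```python
import numpy as np


def first_principal_component(proxies):
    """Return (scores, loadings) of the first principal component.

    proxies: array of shape (n_periods, n_proxies).
    """
    X = np.asarray(proxies, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each proxy
    # SVD of the standardized matrix; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    # The sign of a principal component is arbitrary; fix it so loadings sum >= 0.
    if loadings.sum() < 0:
        loadings = -loadings
    return Z @ loadings, loadings


# Synthetic example: six noisy proxies driven by one latent "sentiment" series.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
proxies = latent[:, None] + rng.normal(scale=0.5, size=(200, 6))
index, loadings = first_principal_component(proxies)
```

On such synthetic data the resulting index tracks the latent series closely, which is the motivation for using the common component of several noisy proxies rather than any single one.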
⁴ https://www.cboe.com/tradable_products/vix/

2.2 FSA Research Scope
The scope of FSA studies can be broadly categorized into technique-driven and application-driven studies. The technique-driven FSA study, similar to sentiment analysis adopted in other domains, focuses on sentiment analysis of financial texts. However, the application-driven FSA study is unique to the finance domain, as financial sentiment can be used as a proxy of investor sentiment to make predictions in financial markets. Fundamentally, the objectives and scopes of FSA techniques and applications differ significantly. Nevertheless, an interactive relationship between techniques and applications also exists. For instance, the benchmark datasets employed for FSA technique studies usually require human annotation with sentiment polarities or intensity scores, and need to be sufficient, representative, and precise to train an unbiased model in different domains. On the other hand, the data sources adopted for FSA application studies are normally annotated by financial metrics computed from market data. They also require the data to be in a time series with a substantial amount of financial texts and representative periods to model the relationships between investor sentiment and financial metrics. As for the tasks, FSA techniques focus on granularity, which refers to the level at which sentiment is detected (e.g., targeted aspect-based sentiment analysis vs. sentence-level sentiment analysis). However, FSA applications explore various financial application scenarios, such as stock market movement prediction, financial risk prediction, portfolio management, FOREX market prediction, and cryptocurrency market prediction. The interactive relationship between FSA techniques and applications is twofold. First, the sentiment analysis lexicons and models developed through FSA techniques can be used for FSA applications by deriving the sentiment representation explicitly. Second, techniques such as feature engineering, text representation methods, algorithms, and evaluation metrics can also be considered for FSA applications. One point to highlight is that certain FSA applications such as portfolio management require more complex methods (e.g., reinforcement learning) and evaluations (e.g., trading simulation). Considering the differences between FSA techniques and applications in task definitions, datasets, and methods, and their connections in learning algorithms, lexicon resources, and feature engineering methods, we believe an appropriate scope of an FSA survey should cover both domains.
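
As a simple illustration of the time-series requirement in application studies, the sketch below computes the Pearson correlation between a daily sentiment series and the market return a fixed number of days later. It assumes NumPy is available; the function name and the synthetic series are hypothetical, and a lagged correlation is only one of many possible tests (it does not establish causality).

```python
import numpy as np


def lagged_correlation(sentiment, returns, lag=1):
    """Pearson correlation between sentiment on day t and return on day t + lag."""
    s = np.asarray(sentiment, dtype=float)
    r = np.asarray(returns, dtype=float)
    if len(s) != len(r) or lag < 0 or lag >= len(s):
        raise ValueError("series must have equal length and 0 <= lag < length")
    if lag == 0:
        return float(np.corrcoef(s, r)[0, 1])
    # Align sentiment[t] with returns[t + lag].
    return float(np.corrcoef(s[:-lag], r[lag:])[0, 1])


# Synthetic data in which each return is 0.1 times the previous day's sentiment.
sentiment = [0.2, -0.5, 0.1, 0.7, -0.3, 0.4]
returns = [0.0, 0.02, -0.05, 0.01, 0.07, -0.03]
rho = lagged_correlation(sentiment, returns, lag=1)
```

In this constructed example the lag-1 correlation is essentially 1; real sentiment and return series are far noisier, which is why representative periods and sufficient data are needed.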

2.3 Summary of Conceptualization
To summarize, it is a consensus that investor sentiment affects market dynamics, and that the measurement of investor sentiment and the quantification of its effect are critical to market prediction. Generally, investor sentiment can be measured by financial textual sentiment, sentiment surveys, and indices constructed from market data. Financial textual sentiment captures subjective judgment expressed by investors and also contains objective reflection, which drives investor sentiment. Market sentiment is the reflection of investor sentiment in investment behaviors. Investor sentiment is the psychological state of investors, and it can be partially measured by financial textual sentiment and market sentiment because of the interactions between investor sentiment and the other two aspects. Thus, FSA can be defined as a field of study that analyzes people’s sentiment from financial texts, measures and quantifies investor sentiment from financial textual sentiment, and is finally grounded in the applications of market prediction and financial decision-making. We define the scope of this survey as covering both FSA techniques and FSA applications, motivated by the distinct yet interconnected nature of their datasets, methodologies, and targets.

3 LITERATURE REVIEW FRAMEWORK
The research in FSA can be categorized into two main streams. The first stream focuses on the tasks in FSA, and its main objective is to study the techniques that are able to improve the performance of various FSA tasks such as paragraph- and sentence-level sentiment analysis, (targeted) aspect-based sentiment analysis, and the development of financial lexicons and sentiment analysis models [1, 4, 5, 30, 40, 58, 66, 72, 73, 77, 88, 96, 97, 99, 109, 110, 113, 122, 133, 136, 139, 147, 152, 157, 160, 178]. The second stream is application-driven or market-driven; it has received more attention in recent years and treats financial sentiment as an intermediate output. Its main objective is to use financial sentiment for downstream applications, such as causality and correlation testing and financial forecasting [6, 13, 20, 31, 32, 34, 38, 43, 59, 63, 70, 75, 76, 79, 80, 85, 86, 95, 101, 104, 124, 128, 134, 145, 150, 151, 153, 154, 162, 167, 172–174, 177, 179, 180, 184, 185, 187, 190, 198, 200]. The sentiment can be represented in an explicit or implicit manner. The explicit representation refers to the generation of sentiment words, polarity, or intensity scores [167], while the implicit representation of textual sentiment refers to the generation of feature embeddings [95, 101]. Figure 2 illustrates our survey framework.
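
The difference between the explicit and implicit representations described above can be sketched as follows. The explicit route yields a single interpretable polarity score, while the implicit route yields a feature vector that carries no sentiment label of its own. The lexicon is a hypothetical toy list, and the hashed bag-of-words vector is only a stand-in for a learned text embedding; neither is taken from the cited studies.

```python
import re
import zlib

LEXICON = {"beat": 1, "growth": 1, "miss": -1, "lawsuit": -1}   # hypothetical


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def explicit_sentiment(text):
    """Explicit representation: one human-readable polarity score."""
    return sum(LEXICON.get(tok, 0) for tok in tokenize(text))


def implicit_features(text, dim=8):
    """Implicit representation: a feature vector without a sentiment label.

    A hashed bag-of-words is used here in place of a learned embedding.
    """
    vec = [0.0] * dim
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    return vec


headline = "Earnings beat expectations on strong cloud growth"
score = explicit_sentiment(headline)       # scalar input to a downstream model
features = implicit_features(headline)     # vector input to a downstream model
```

Either output can be passed to a forecasting model; the explicit score is easier to interpret, whereas the implicit vector leaves it to the downstream model to learn which textual features matter.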

[Figure 2: diagram linking FSA Techniques and FSA Applications through explicit and implicit sentiment representations, with Background above and Main Findings and Future Directions below. The FSA Techniques panel (human-driven annotation; main objective is to study the techniques that can improve performance of various FSA tasks) is organized into Dataset, FSA Tasks, Methods, and Evaluation. The FSA Applications panel (market-driven annotation; main objective is to exploit sentiment for financial applications, such as causality and correlation testing and predictive modelling) is organized into Data Sources, Financial Applications, Financial Metrics, Methods, and Evaluation.]
Fig. 2. FSA Review Framework.


4 FSA TECHNIQUES

4.1 Tasks

Sentiment analysis can be performed in a coarse-grained [170] or fine-grained manner. Fine-grained sentiment analysis can be studied from two perspectives: granularity and expression. Granularity refers to the level at which sentiment is detected, and it includes the document level [121], paragraph level [52], sentence level [195], and aspect level [141]. In the financial domain, aspect-level approaches are known as Aspect-based FSA. For finer granularity, a target element can be introduced so that sentiment is detected for that particular target. This is also known as stance detection, as defined by [159], or Targeted FSA. The task is to detect whether the text is favorable or unfavorable toward a specific given target. The most challenging but pragmatic task is Targeted Aspect-based FSA (TABFSA), which aims to extract entities and aspects and detect their corresponding sentiment in financial texts. While most current FSA studies still perform sentiment polarity detection (i.e., classification into positive or negative), sentiment can also be expressed as an intensity score, which is more consequential and nuanced for FSA than for other sentiment analysis domains. Intensity score-based FSA therefore requires regression models.

4.2 Benchmark Datasets

The textual data used for FSA include email communications, social media posts (e.g., tweets), corporate reports, and daily news [18]. Financial corpora are labeled either through manual annotation [21, 35, 52, 57, 123, 181] or based on stock price [84]. Popular benchmark datasets are summarized in Table 1. We summarize the available FSA benchmark datasets not only to understand trends and templates in FSA annotation but also to help researchers find public datasets that can be used to evaluate model performance and improve model generalization. Overall, there are one document-level, four sentence-level, two target-level, and one targeted aspect-level datasets. The annotation is becoming more granular, moving toward the target and aspect levels. It is also observed that each public release of FSA benchmark
Financial Sentiment Analysis: Techniques and Applications 9
detect the targets, aspects, and their corresponding sentiment scores.

"sentence": "Royal Mail chairman Donald Brydon set to step down", "info": ["snippets": "[’set to step down’]", "target": "Royal Mail", "sentiment_score": "-0.374", "aspects": "[’Corporate/Appointment’]"]
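
As a rough illustration of how an annotated record like the one above might be consumed, the following sketch assumes the record is stored as JSON with the same field names. The bracket structure of `RAW` (an object whose `info` field is a list of objects) and the helper name `parse_record` are assumptions made for this example, not details confirmed by the dataset description.

```python
import ast
import json

# Hypothetical JSON rendering of the example record; only the field names
# and values come from the text, the bracket structure is assumed.
RAW = """
{"sentence": "Royal Mail chairman Donald Brydon set to step down",
 "info": [{"snippets": "['set to step down']",
           "target": "Royal Mail",
           "sentiment_score": "-0.374",
           "aspects": "['Corporate/Appointment']"}]}
"""


def parse_record(raw):
    """Return the sentence and a list of typed per-target annotations."""
    record = json.loads(raw)
    annotations = []
    for item in record["info"]:
        annotations.append({
            "target": item["target"],
            # The score is stored as a string, so convert it to a float.
            "score": float(item["sentiment_score"]),
            # Snippets and aspects are stringified lists.
            "snippets": ast.literal_eval(item["snippets"]),
            # Aspect labels are hierarchical, with levels separated by "/".
            "aspects": [a.split("/") for a in ast.literal_eval(item["aspects"])],
        })
    return record["sentence"], annotations


sentence, annotations = parse_record(RAW)
print(sentence)
print(annotations)
```

Keeping the score as a float and the aspect as a list of levels makes the record usable for both the regression view (intensity score) and the classification view (aspect label) of the task.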

4.2.4 Topic-Specific Sentiment Analysis. [163] created a benchmark dataset of 297 news documents extracted from the Thomson Reuters Newswire for topic-specific sentiment analysis of economic texts. It covers ten event topics with significant financial impact, such as Apple’s iPad, the EuroZone crisis, GM’s IPO, and the United-Continental merger. The 297 selected documents are equally distributed across all topics. A team of three experienced annotators was instructed to read and annotate the news documents as if they were investors in the company described in the topic statement [163]. The annotation uses a 7-point scale: very negative, negative, slightly negative, neutral, slightly positive, positive, and very positive. The Kappa statistic, Intraclass Correlation, Robinson’s A, the Finn coefficient, and average percentage agreement are used to evaluate the degree of agreement between annotators and to measure how reliable the annotation scheme is. This dataset is not publicly released but is available on request.
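
To make two of the agreement measures named above concrete, the sketch below computes percentage agreement and a kappa statistic for one pair of annotators. It is only an illustration: the kappa variant used in [163] is not specified here, so Cohen's kappa for two annotators is assumed, and the label sequences are invented (the 7-point scale is encoded as integers from -3 to 3).

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both annotators pick the same label
    # if each labels independently according to their own label frequencies.
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical annotations on a 7-point scale (-3 = very negative, 3 = very positive).
annotator_1 = [3, 2, 0, -1, -3, 1, 0, 2]
annotator_2 = [3, 1, 0, -1, -2, 1, 0, 2]

agreement = percent_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
print(agreement, kappa)
```

With three annotators, such a pairwise measure would typically be averaged over all pairs or replaced by a multi-rater statistic; that choice is left open here.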

4.2.5 StockSen [181]. The StockSen dataset consists of 55,171 financial tweets from StockTwits dated between 2019-06-06 and 2019-08-26. This dataset uses user annotations to investigate the common mistakes made by lexicon-based, machine learning-based, and deep learning-based methods. It shows that sentiment prediction models of the same type tend to have similar error patterns, and it identifies six main error types that cause FSA to fail. However, this dataset is not publicly released but is available on request.

4.2.6 SentiEcon GS-1000 [122]. SentiEcon GS-1000 is a manually annotated gold standard dataset that contains 1,000 sentences extracted from the Esmeraldas Great Recession News Corpus. Two domain experts classified each sentence as positive, negative, or none. The annotators were instructed to consider only the information available in the sentences. Annotation was carried out independently, and a consensus was reached where the labels differed.

4.2.7 FinLin [29]. The FinLin⁵ corpus was released by Daudert in 2022 and aims to provide a novel and publicly available dataset for FSA to complement current knowledge and foster research on FSA [29]. It contains a total of 3,811 texts, including 3,204 stocktwits, 394 news articles, 127 company reports, and 86 investor reports. The corpus is annotated with a relevance score in the range [0.0, 1.0] and a sentiment score in the range [−1.0, 1.0]. Similarly, this dataset is not publicly released but is available on request.

"type": "SW", "text": "$GM hot diggity dog", "created_at": "2018-09-19T13:12:15z", "entity": "GM", "id": 137731223, "sentiment": 0.7444, "relevance": 0.0989, "annotations": ["sentiment": -0.0121, "spans": ["text_span": "hot"], "relevance": 0.6963, "sentiment": 0.7733, "spans": ["text_span": "hot diggity dog"], "relevance": 0.1049, "sentiment": 0, "spans": ["text_span": " hot diggity dog"], "relevance": 0.1159]

4.2.8 SEntFiN 1.0 [157]. To address the scarcity of benchmark datasets for fine-grained FSA, a challenging task that requires extensive human annotation effort, [157] released SEntFiN 1.0 and made it publicly available to promote further research. SEntFiN is a human-annotated dataset that includes 10,753 news headlines with their entities and corresponding sentiment. News headlines commonly contain multiple entities with different sentiment expressions, and SEntFiN has 2,847 headlines that contain multiple entities, which may have conflicting sentiment [157].

"S No.": 1, "Title": "SpiceJet to issue 6.4 crore warrants to promoters", "Decisions": [’SpiceJet’: ’neutral’], "Words": 8

⁵ https://github.com/TDaudert/FinLin

4.2.9 Comparison of Benchmark Datasets. Table 1 shows that news data is the primary source for constructing benchmark datasets, followed by microblogs. News data is widely used for constructing sentiment analysis data in various fields, such as NewsMTSC for target-dependent sentiment classification on policy issues [64]. The number of entries varies across datasets, and those annotated by polarity typically have more labeled entries. For instance, the FiQA dataset contains only 498 entries derived from news and 675 from posts, which presents a potential challenge for model training and generalization. It is also worth noting that datasets originating from news sources adhere more closely to formal English. Microblogs such as tweets, on the other hand, tend to feature more informal expressions and greater potential for aspect ambiguity, for example around ticker names. This introduces additional complexity to FSA tasks. Furthermore, in terms of granularity, fine-grained FSA datasets remain limited, with FiQA being the current preference for aspect-based FSA. Finally, the majority of datasets annotate sentiment solely in terms of polarity, without capturing the intensity of sentiment.

4.3 Evaluation Metrics

4.3.1 Regression. The first group of metrics measures the closeness between the predicted value and the ground truth in a regression task. Popular metrics include Weighted Cosine Similarity (WCS), Mean Squared Error (MSE), and the coefficient of determination, or R-squared ($R^2$).

\[ \mathrm{WCS} = \frac{|P|}{|G|} \times \frac{\sum_{i=1}^{n} (G_i \times P_i)}{\sqrt{\sum_{i=1}^{n} G_i^2} \times \sqrt{\sum_{i=1}^{n} P_i^2}} \tag{1} \]

where $P$ is the vector of scores predicted by the model and $G$ is the vector of ground truth scores.

\[ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{2} \]

where $y_i$ is the gold standard score and $\hat{y}_i$ is the score predicted by the model.

\[ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{3} \]

where $y_i$ is the gold standard score, $\hat{y}_i$ is the score predicted by the model, and $\bar{y}$ is the mean of the gold standard scores.

$R^2$ is a popular and intuitive measure of how closely the predictions fit the observations, usually read on a 0 to 1 scale (it becomes negative when a model fits worse than always predicting the mean). Because an $R^2$ value closer to 1 signifies a good model, model performance can be evaluated from $R^2$ even without comparison with other models. However, $R^2$ cannot determine whether the predictions are biased. MSE is the most common metric for regression tasks and is often used in conjunction with $R^2$. By squaring the errors, it places more weight on large errors and thus penalizes models that produce outlier predictions. The WCS metric was introduced by [25] to evaluate sentiment scores on a continuous scale between -1 and 1. It compares the proximity of the ground truth vector and the prediction vector without requiring exact correspondence between them for any given instance. WCS is derived by weighting the cosine similarity by the proportion of scored instances, $|P|/|G|$, which rewards models that attempt to predict all entries in the dataset.
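
The three regression metrics can be sketched in a few lines of plain Python. This is a minimal illustration rather than an official scorer: the convention of marking unscored instances with `None` in the WCS function is an assumption made here to show the $|P|/|G|$ weighting, and the example scores are invented.

```python
import math


def wcs(gold, pred):
    """Weighted cosine similarity; `None` in `pred` marks an unscored instance."""
    scored = [(g, p) for g, p in zip(gold, pred) if p is not None]
    dot = sum(g * p for g, p in scored)
    norm_g = math.sqrt(sum(g * g for g, _ in scored))
    norm_p = math.sqrt(sum(p * p for _, p in scored))
    cosine = dot / (norm_g * norm_p)
    # Weight by the proportion of instances that received a prediction.
    return len(scored) / len(gold) * cosine


def mse(gold, pred):
    """Mean squared error between gold and predicted scores."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(gold, pred)) / len(gold)


def r_squared(gold, pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = sum(gold) / len(gold)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(gold, pred))
    ss_tot = sum((y - mean) ** 2 for y in gold)
    return 1 - ss_res / ss_tot


# Hypothetical gold and predicted sentiment intensity scores in [-1, 1].
gold = [0.5, -0.4, 0.1, -0.8]
pred = [0.4, -0.2, 0.0, -0.6]
partial = [0.4, -0.2, None, -0.6]  # the third instance is left unscored

full_wcs = wcs(gold, pred)
partial_wcs = wcs(gold, partial)
print(mse(gold, pred), r_squared(gold, pred), full_wcs, partial_wcs)
```

In this toy case the partially scored system has a slightly higher raw cosine on the instances it did score, yet its WCS is lower because of the 3/4 weighting, which is the behavior the metric is designed to reward.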

4.3.2 Classification. The second group of metrics measures the categorical agreement between the predicted value and the ground truth in classification tasks. Popular metrics include Accuracy, Matthews Correlation Coefficient (MCC), and F1-Score.

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{4} \]

\[ \mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP) \times (TP + FN) \times (TN + FP) \times (TN + FN)}} \tag{5} \]

\[ \mathrm{F1\text{-}Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{6} \]

Here, Precision = TP/(TP + FP) and Recall = TP/(TP + FN). True Positive (TP) is the count of positive instances that are correctly predicted as positive, and True Negative (TN) is the count of negative instances that are correctly predicted as negative. Similarly, False Positive (FP) is the count of negative instances that are incorrectly predicted as positive, and False Negative (FN) is the count of positive instances that are incorrectly predicted as negative. Accuracy can be calculated easily for both binary and multi-class classification. However, it is not a reliable measure when the data is imbalanced, because a classifier that favors the majority class receives an over-optimistic estimate. F1-Score addresses this issue and has been widely adopted in most application areas of machine learning. However, [23] argues that the F1-Score is independent of true negatives, which is considered a conceptual flaw, and shows that MCC, which factors in TP, TN, FP, and FN together, can produce a score for the evaluation of binary classification that is more informative and truthful than accuracy and F1-Score.
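
The contrast between the three metrics on imbalanced data can be seen in a small worked example. The sketch below computes them directly from confusion-matrix counts; the counts describe a hypothetical test set with 95 positive and 5 negative instances and a classifier that labels almost everything positive. Returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, F1-Score, and MCC from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical imbalanced case: the classifier finds 94 of 95 positives
# but misclassifies all 5 negatives as positive.
metrics = classification_metrics(tp=94, fp=5, tn=0, fn=1)
print(metrics)
```

Accuracy and F1-Score both come out above 0.9 here, while MCC is slightly below zero, reflecting that the classifier has learned nothing about the negative class.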

4.4 Methods

4.4.1 Lexicon Approaches. The lexicon-based method detects the semantic orientation of a text based on the semantic orientation of the words in it. Lexicon construction is a key element of sentiment analysis and can be accomplished in a manual, semi-automatic, or automatic manner [133]. The manual approach requires intensive effort from creators with expert knowledge; it is slow, but its accuracy is generally higher. The automatic approach, on the other hand, is fast and scalable but often sacrifices some accuracy. The biggest advantage of the lexicon-based method is that it is unsupervised: no annotated dataset is required to perform FSA, which removes the need for arduous manual annotation of texts. Lexicons are also useful for creating features for supervised learning tasks. The challenge with the lexicon-based approach is that lexicons are time-consuming to build and hard to generalize [160]. In addition, the approach can only detect explicit sentiment and is usually less accurate than learning-based methods because of limits in coverage and in the quantification of sentiment intensity. More importantly, sentiment analysis is sensitive to domain [169], and generic domain-independent lexicons are often ineffective in FSA [133]. A general-purpose sentiment lexicon may misclassify common words in financial texts [97]. For example, words like “liability” and “debt” are considered negative in general-purpose sentiment analysis but are frequent and often neutral in the financial context. This makes it difficult to generalize sentiment classifiers and underlines the need for finance domain-specific sentiment analysis [97].
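
The effect of the “liability” and “debt” example can be sketched with a toy word-counting scorer. The word lists below are tiny, hypothetical stand-ins and are not taken from any of the lexicons discussed in this survey; the scoring rule (positive minus negative hits, normalized by total hits) is one common simple choice, assumed here for illustration.

```python
import re

# Hypothetical word lists for illustration only.
POSITIVE = {"growth", "profit", "improve"}
GENERAL_NEGATIVE = {"liability", "debt", "loss", "decline"}
FINANCE_NEGATIVE = {"loss", "decline"}  # "liability" and "debt" treated as neutral


def lexicon_score(text, positive, negative):
    """Return (pos - neg) / (pos + neg) over matched words, or 0.0 if none match."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in positive for token in tokens)
    neg = sum(token in negative for token in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


text = "The company reported long-term debt and a tax liability alongside profit growth."
general = lexicon_score(text, POSITIVE, GENERAL_NEGATIVE)
finance = lexicon_score(text, POSITIVE, FINANCE_NEGATIVE)
print(general, finance)
```

The same sentence is scored as neutral with the general-purpose list and as clearly positive with the finance-adjusted list, which is the kind of domain sensitivity the paragraph above describes.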

Lexicons for the financial domain are scarce [133] compared with general-purpose lexicons. In the context of FSA, there are six popular finance domain-specific lexicons, as shown in Table 2: Henry’s Financial Dictionary (HFD), the Loughran and McDonald (LM) Word List, the Stock Market Sentiment Lexicon (SMSL), SentiEcon, Senti-DD, and FinSenticNet. HFD, which includes 104 positive words and 85 negative words, is the first dictionary created specifically for the financial domain; it was built from earnings press releases. It is used to measure the tone of earnings press releases, which is an important element of the firm-investor communication process [66]. The weakness of HFD is its limited number of words, which can result in low coverage. One prominent effort that advances the development of
types: (1) linguistic features (e.g., n-grams, RF n-gram, verb, NER, and word cluster); (2) sentiment lexicon features (e.g., the proportion of positive and negative words, and the maximum, minimum, and sum of sentiment scores); (3) domain-specific features such as number (e.g., + number, - number, + number %, - number %), keyword + number (e.g., call + number %, put + number %), metadata (e.g., binary features such as source, user/official and entities/sentiment, value features, and other features), and punctuation; and (4) word embeddings. In machine learning approaches, feature selection is equally important, as noisy features need to be identified and excluded. Popular feature selectors include Chi-squared, ANOVA, and mutual information [160]. In terms of algorithms, Support Vector Machines (SVM) are among the most widely adopted, and Bagging, Random Forest (RF), AdaBoost, Gradient Boosting (GB), and XGBoost (XGB) are also popular choices. For example, [73] designed all four types of features above on the SemEval 2017 Task 5 dataset, adopted a hill climbing algorithm to select the best features, and explored seven algorithms: Bagging, RF, AdaBoost, GB, LASSO, Support Vector Regression (SVR), and XGB. Ensemble learning was applied to different combinations of algorithms, such as SVR + GB and SVR + XGB + AdaBoost + Bagging, and achieved promising results. [30] established a strong baseline with a traditional feature engineering-based machine learning approach (MSE=0.0958) by treating aspect extraction as a classification task and sentiment detection as a regression task, using a Support Vector Classifier (SVC) and SVR, respectively. The features generated include n-gram, tokenization, word replacement, and word embeddings using Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec. [5] adopted linguistic features (uni-grams, bi-grams, and tri-grams) and semantic features (BabelNet synsets and semantic frames). The features are selected by the word-score correlation metric proposed by [39], and the sentiment regressor is trained using an SVR, which achieved an accuracy of 0.726 for microblogs and 0.655 for news headlines. [88] proposed an association rule mining-based Hierarchical Sentiment Classifier (HSC), which adopts the concept of financial and non-financial performance indicators to classify financial texts into positive, neutral, or negative polarity on the PhraseBank dataset. [152] used ontology information as a source of features in an SVM model on the SemEval 2017 Task 5 dataset. The newly created domain ontology models various concepts from the financial domain and has four classes: sentiment, entity, property, and action. [147] adopted a feature-light method that consists of an SVR with various kernels and word embedding vectors as features.
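
As an illustration of the domain-specific number features mentioned above (+ number, - number, + number %, - number %), the following sketch counts signed numbers in a text with a regular expression. The pattern, the feature names, and the example sentence are assumptions made for this example and are not the feature extractor used in [73].

```python
import re

# A sign, an optional space, a number, and an optional percent sign.
# The lookbehind skips hyphens inside words or dates such as "2019-06-06".
SIGNED_NUMBER = re.compile(r"(?<![\w.])([+-])\s?(\d+(?:\.\d+)?)\s?(%?)")


def number_features(text):
    """Count signed numbers, split by sign and by presence of a percent sign."""
    features = {"+num": 0, "-num": 0, "+num%": 0, "-num%": 0}
    for sign, _value, percent in SIGNED_NUMBER.findall(text):
        features[sign + "num" + percent] += 1
    return features


features = number_features("$AAPL revenue +12.5% yoy, margin -3% and EPS +0.40")
print(features)
```

Counts like these can be appended to lexicon or n-gram features before training a regressor such as an SVR; the keyword + number variant would additionally condition on a preceding keyword.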

4.4.3 Deep Learning Approaches. Deep learning models have achieved remarkable performance in FSA. They are able to construct complicated representations from textual data with a high level of abstraction [160]. The most popular deep learning algorithms used in FSA include Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and their variants. For example, when target-aspect identification is jointly considered, [72] treated aspect extraction as a multi-class classification problem, as this task does not involve multiple aspects for one target, and adopted a bidirectional Long Short-Term Memory (LSTM) network to extract aspects using word embeddings such as GloVe, Google-News-Word2Vec, Godin, FastText, and the Keras built-in embedding layer. A multi-channel CNN is used for the sentiment analysis task, with an enhanced vector combined from the dependency tree, the sentence word vector, and the snippet and target vectors. Bayesian optimization is used for hyper-parameter tuning, and the resulting model achieves an F1-Score of 0.69 for aspect extraction and an MSE of 0.112 for sentiment analysis. [139] ensembled CNNs and RNNs with a voting strategy and a ridge regression for aspect and sentiment prediction. [99] proposed FSA with Hierarchical Query-driven Attention (FISHQA) for document-level financial polarity detection. FISHQA outperforms benchmark models including SVM + Bag-of-Words (BoW), SVM + BoW TF-IDF, CNN-word [82], Bi-LSTM [61], LSTM-GRNN [164], and HAN [189] on a dataset of 7,648 documents annotated by three domain experts according to whether the bonds of the companies mentioned in the document will face the risk of default in the future. FISHQA achieved an accuracy of 0.9446 and an F1-Score of 0.9449, followed by HAN, a model that adopts hierarchical networks with a randomly initialized attention mechanism, with an accuracy of 0.9177 and an F1-Score of 0.9166.

4.4.4 Hybrid Approaches. Hybrid approaches ensemble lexicon methods, machine learning models, and deep learning models and often yield superior performance. [1] proposed a method that ensembles LSTM, CNN, Gated Recurrent Unit (GRU), and SVM. For LSTM [67], CNN, and GRU, the word representation is learned through a stacked denoising auto-encoder network [171] using Word2Vec [119] and GloVe [137]. In the SVR model, word TF-IDF, lexicon features, and Vader sentiment [71] are chosen as features. The method achieved a cosine similarity of 0.797 for microblog data and 0.786 for news headlines on SemEval 2017 Task 5. A hybrid of deep learning and lexicon-based techniques proposed by [58] combines LSTM, CNN, a Vector Averaging MLP, and a Feature Driven MLP whose features include character n-grams (TF-IDF weighted counts of a continuous sequence of N characters), word n-grams (TF-IDF weighted counts of a continuous sequence of N words), POS tags, and lexicon features (e.g., positive count, negative count, net count, sum of positive scores, sum of negative scores, and maximum of positive and negative scores). It achieved promising results (microblogs: Cosine=0.751, news headlines: Cosine=0.697) on the same dataset. The highest score (Cosine=0.745) for SemEval-2017 Task 5 Track 2 news headlines is reported by [109], which combines GloVe and DepecheMood to represent words and feeds them into a CNN followed by global max-pooling; the output is then concatenated with the Vader sentiment score and passed through two levels of dropout and fully-connected layers. [77] combined the representations learned from a CNN and a Bidirectional GRU (Bi-GRU) with an attention mechanism with manually engineered lexical, sentiment, and metadata features and obtained WCS scores of 0.723 and 0.744 for the Microblogs and News Headline tracks in SemEval-2017 Task 5, respectively. Recently, MetaPro was proposed to improve FSA by understanding metaphors in financial text [113]. The linguistic intuition is that metaphors frequently appear in financial news and cause errors in sentiment analysis. MetaPro paraphrases metaphors into their literal counterparts as a data pre-processing step, so that a sentiment classifier can achieve better performance in downstream applications. MetaPro consists of a multitask learning-based module for metaphor identification [112] and a WordNet [51]-based metaphor interpretation module [114]. A novel soft-parameter sharing method, termed Gated Bridging Mechanism (GBM), and a knowledge-enhanced masked word prediction technique are proposed. The average accuracy gain of three state-of-the-art sentiment analysis classifiers is 4.7% on the SemEval 2017 Task 5 news headline dataset.

4.4.5 Pre-Trained Language Models. The emergence of pre-trained language models and transfer learning has brought Natural Language Processing (NLP) research into a new era. A neural network model is first pre-trained on a large corpus of text; pre-trained models such as Bidirectional Encoder Representations from Transformers (BERT) [36] capture rich contextual information, which enables them to be adapted to various downstream tasks. Fine-tuning is the subsequent step in transfer learning: the pre-trained model is further trained on a task-specific dataset, which adjusts its parameters to the requirements of the target application and thus enhances its performance on the given task. Domain adaptation is a related concept that addresses the challenge of applying a pre-trained model to a specific domain or application for which it was not originally trained. The model is adapted to perform effectively in the target domain, even if that domain differs significantly from the one on which the model was pre-trained. In the finance domain, domain-specific transformer-based models, such as FinBERT [4, 96, 197] for FSA and FinBERT-MRC [197] for financial named entity recognition, have significantly enhanced the performance of various financial NLP tasks. In terms of the adoption of pre-trained language models in FSA, [186] reported a superior MSE of 0.08 using ULMFiT [68] on FiQA Task 1. A more recent fine-tuned language model, FinBERT [4], which is further pre-trained on the TRC2-financial corpus⁶, reported the best performance (MSE=0.07, R²=0.55) on FiQA Task 1. Similarly, [96] trained FinBERT using two general-domain corpora, English Wikipedia and BooksCorpus, and three financial-domain corpora, FinancialWeb, YahooFinance, and RedditFinanceQA. It achieved state-of-the-art performance on financial sentence boundary detection on the FinSBD English dataset, with a mean score of 0.97, and on FSA on the PhraseBank dataset, with an accuracy of 0.94 and an F1-Score of 0.93. More recently, [178] proposed a Semantic and Syntactic Enhanced Neural Model (SSENM), which obtains its input representation from a BERT model and incorporates a dependency graph and context words to supervise a target representation. This model captures semantic contextual information through a self-attentive mechanism. An edge-enhanced Graph Convolutional Network (E-GCN) is included to aggregate node-to-node features, and a Manifold Mixup strategy is developed to generate pseudo data in training to address the over-fitting problem potentially caused by limited data size. SSENM achieves significantly improved performance, with a WCS of 0.8441 for news headlines and 0.8333 for microblogs in SemEval-2017 Task 5 and an MSE of 0.0717 in FiQA Task 1. Finally, it is important to highlight that while models based primarily on transformer encoder architectures, such as BERT, have significantly enhanced FSA, autoregressive decoder architectures like GPT (Generative Pre-trained Transformer) [142] have also demonstrated promise in FSA [50, 69, 194]. Bloomberg has introduced BloombergGPT [176], a Large Language Model for finance that outperforms similarly sized open models on financial NLP tasks such as sentiment analysis, named entity recognition, news classification, and question answering.

4.4.6 Word Representation Techniques. Word representation is an important component of sentiment analysis. This section introduces both classic and state-of-the-art word representation techniques that can be used for FSA. In general, word representation techniques can be grouped into classical models and representation learning models [126]. Classical models include categorical word representations (e.g., BoW and one-hot encoding). These are the most straightforward methods for text representation, but they cannot capture positional and structural information or semantic relationships between words [72, 95]. Another classical model is the weighted word representation, which includes TF-IDF. TF-IDF can reduce the impact of common words, but it is still built on the BoW model and therefore fails to capture the sequence of words in a document as well as the semantic and syntactic information of words [126]. To address the shortcomings of classical models, researchers have studied methods for learning distributed word representations in a low-dimensional space. Overall, representation learning can be categorized into contextual and non-contextual word representations. For non-contextual embeddings, the most popular model is Word2Vec, which was developed by [119] and can capture the semantics of words and the relations between them, including sentimental similarity among words [160]. The problem with Word2Vec is that it focuses only on local context window knowledge and ignores global statistical information. Global Vectors (GloVe) [108] was thus presented. Meanwhile, FastText, which decreases the training time while maintaining performance, was introduced by [12]. While non-contextual word representation models retain the semantic and syntactic information of the text, the problem of how to keep the full context-specific representation remains. Contextual word representation techniques, such as generic context word representation (Context2Vec) [117], contextualized word vectors (CoVe) [116], Embeddings from Language Models (ELMo) [138], universal language model fine-tuning (ULMFiT) [68], and transformer-based pre-trained language models, have therefore been proposed to resolve this problem. It has been shown that transformer-based pre-trained language models are more efficient than LSTM or CNN models for language representation. Models such as GPT [143], XLNet [188], BERT [36], and the Robustly optimized BERT approach (RoBERTa) [94] belong to the transformer family. GPT and XLNet are decoder-only architectures, while BERT and RoBERTa are encoder-only architectures. One of the most popular models for sentence embeddings is Sentence-BERT [144], a modification of the pre-trained BERT network that uses siamese and triplet network structures, enabling it to produce semantically meaningful sentence embeddings.

⁶ https://trec.nist.gov/data/reuters/reuters.html
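
The limitations of the classical representations can be made concrete with a small sketch. The toy corpus below is invented, and the TF-IDF variant shown (relative term frequency times the unsmoothed logarithm of inverse document frequency) is one of several common formulations, assumed here for simplicity.

```python
import math
from collections import Counter

# Hypothetical tokenized corpus; the first two documents contain the same
# words in a different order.
docs = [
    "profit beats forecast".split(),
    "forecast beats profit".split(),
    "profit falls".split(),
]


def bow(tokens):
    """Bag-of-words: word counts with all order information discarded."""
    return Counter(tokens)


def tf_idf(tokens, corpus):
    """Unsmoothed TF-IDF weights for one document of the corpus."""
    weights = {}
    for term, count in Counter(tokens).items():
        document_frequency = sum(term in doc for doc in corpus)
        idf = math.log(len(corpus) / document_frequency)
        weights[term] = (count / len(tokens)) * idf
    return weights


same_bow = bow(docs[0]) == bow(docs[1])
weights = tf_idf(docs[2], docs)
print(same_bow, weights)
```

The two reordered documents receive identical BoW vectors, and TF-IDF assigns zero weight to “profit” because it occurs in every document, while the rarer “falls” keeps a positive weight. Neither representation encodes word order or meaning, which is the gap that distributed and contextual representations aim to close.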

4.4.7 Summary of FSA Techniques. We summarize the characteristics of different technical trends in Table 3. The evolution of FSA techniques has been marked by a progression from lexicon-based methods, conventional machine learning, and hybrid approaches to more advanced techniques involving deep learning and language models. Most of the research on FSA techniques has adopted the long-established PhraseBank, SemEval 2017 Task 5, and FiQA Task 1 as benchmark datasets. First, the creation of financial lexicons still attracts researchers’ attention, although the lexicon approach is more often used in combination with learning-based methods in FSA. One trend in the development of financial lexicons is a shift from single-word entries to multi-word and direction-dependent expressions. This is particularly important in the finance domain, as the sentiment of a financial term can be opposite for different directional words. Meanwhile, manual creation is still the main approach to building financial lexicons; it requires intensive effort from creators with expert knowledge but generally yields higher accuracy. However, some research [42, 133] is pushing lexicon construction from manual to automatic approaches, which address the slow process and low coverage of the manual approach and build lexicons with increased speed and coverage. Second, in traditional machine learning methods, feature engineering is an important step, and there are generally four types of features: lexicon features, linguistic features, domain-specific features, and word embeddings. One type of feature that is less frequently investigated but has been demonstrated to be effective in FSA is domain-specific features such as numbers and keywords + numbers, especially when many numbers are mentioned in the texts. In the financial context, for instance, the keyword “revenue” followed by a positive symbol “+” and a percentage can be a positive sign. Lastly, deep learning methods, represented by CNN and LSTM, and pre-trained language models are the mainstream techniques in FSA and have improved performance significantly. Notably, the finance domain-specific BERT, namely FinBERT, has been trained in different studies using various data sources such as Reuters Corpora, Yahoo Finance, Reddit Finance, corporate reports, earnings call transcripts, and analyst reports, and has pushed the boundary of research in FSA techniques. The most recent study by [41] achieved state-of-the-art performance on SemEval 2017 Task 5 and FiQA Task 1 by incorporating multiple knowledge sources into the fine-tuning process of language models including RoBERTa.
870
871
872
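The keyword + number feature mentioned above can be illustrated with a short Python sketch. This is a minimal
illustration under stated assumptions, not the feature extractor of any cited study: the keyword list `KEYWORDS`, the
regular expression, and the function name `keyword_number_features` are all hypothetical and introduced here only for
the example.

```python
import re

# Hypothetical keyword list; a real system would use a curated financial vocabulary.
KEYWORDS = ("revenue", "profit", "sales", "loss")

# A keyword, a short gap without digits, signs, or sentence punctuation,
# then an explicit sign, a number, and a percent sign.
PATTERN = re.compile(
    r"\b(?P<keyword>" + "|".join(KEYWORDS) + r")\b"
    r"[^.;\d+-]{0,30}?"
    r"(?P<sign>[+-])\s*(?P<value>\d+(?:\.\d+)?)\s*%",
    re.IGNORECASE,
)


def keyword_number_features(text):
    """Return (keyword, signed percentage) pairs found in the text."""
    features = []
    for match in PATTERN.finditer(text):
        value = float(match.group("value"))
        if match.group("sign") == "-":
            value = -value
        features.append((match.group("keyword").lower(), value))
    return features
```

Each extracted pair could then be turned into a model input, for example a per-keyword signed value. Whether a positive
number is good news still depends on the keyword ("loss +3%" is not a positive sign), which is the direction
dependence discussed above.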
5 FSA APPLICATIONS

FSA has been widely used in financial applications since the asymmetric and affective impact of news on market
volatility was discovered [45]. The application of FSA is mainly contextualized for two broader analytical purposes,
i.e., hypothesis testing and predictive modeling in financial markets. Unlike the annotated benchmark datasets for
FSA techniques, which aim to develop an accurate sentiment analysis model from an FSA technique standpoint, the
data sources for FSA applications are important in their own right, because they offer supplementary information
crucial for calibrating financial sentiment to a specific application scenario. We first review data sources for FSA
applications and then investigate how the techniques are adjusted for various application types.

Ref. | Dataset | Task | Method | Feature/Lexicon | Algorithm | Evaluation Metrics and Performance
--- | --- | --- | --- | --- | --- | ---
[136] | PhraseBank | Sentence-level | Lexicon | LM and Senti-DD | Lexicon | Precision: 0.8238, Recall: 0.8128, F1-Score: 0.8105
[42] | PhraseBank | Sentence-level | Lexicon | FinSenticNet | Lexicon | Accuracy: 0.7619, F1-Score: 0.7216
[88] | PhraseBank | Sentence-level | Machine Learning | Performance indicator tags | Classification based on Multiple Association Rules (CMAR) | Precision: Pos: 0.83, Neg: 0.93, Neut: 0.86, Recall: Pos: 0.82, Neg: 0.93, Neut: 0.81, F1-Score: Pos: 0.83, Neg: 0.93, Neut: 0.83
[96] | PhraseBank | Sentence-level | Language Model | FinBERT | FinBERT | Accuracy: 0.94, F1-Score: 0.93
[73] | SemEval 2017 Task 5 | Targeted, Sentence-level | Machine Learning | Linguistic, sentiment lexicon, domain-specific features and word embeddings | AdaBoost, Bagging, Random Forest, Gradient Boosting, LASSO, SVM and XGBoost | WCS (News: 0.7779, Microblogs: 0.7107)
[40] | SemEval 2017 Task 5 | Targeted, Sentence-level | Machine Learning | Lexical features, semantic features | Linear regression with SGD, Lasso with SGD, Ridge regression with SGD, SVR and RF | WCS (News: 0.655, Microblogs: 0.726)
[152] | SemEval 2017 Task 5 | Targeted, Sentence-level | Machine Learning | Ontology based features | SVM | WCS (News and Microblogs: 0.7050)
[147] | SemEval 2017 Task 5 | Targeted, Sentence-level | Machine Learning | Word embeddings | SVM | WCS (News: 0.733)
[1] | SemEval 2017 Task 5 | Targeted, Sentence-level | Hybrid Approach | Lexicon features and word embeddings | SVM, GRU, LSTM, CNN | WCS (News: 0.786, Microblogs: 0.797)
[58] | SemEval 2017 Task 5 | Targeted, Sentence-level | Hybrid Approach | Word embeddings using Word2Vec and GloVe, word n-grams, character n-grams, POS-tag, lexicons, pointwise mutual information | LSTM, CNN, Vector Averaging MLP and feature driven MLP | WCS (News: 0.697, Microblogs: 0.751)
[77] | SemEval 2017 Task 5 | Targeted, Sentence-level | Hybrid Approach | Word embeddings, lexical, sentiment and metadata features | CNN, Bidirectional GRU (Bi-GRU) | WCS (News: 0.744)
[109] | SemEval 2017 Task 5 | Targeted, Sentence-level | Hybrid Approach | GloVe and DepecheMood word embeddings, VADER lexicon | CNN, MLP | WCS (News: 0.745)
[113] | SemEval 2017 Task 5 | Sentence-level | Hybrid Approach | RoBERTa | WordNet, RoBERTa, Transformer, masked word prediction, GBM | Avg. F1-Score: +0.040, Avg. accuracy: +0.047
[178] | SemEval 2017 Task 5 | Targeted, Sentence-level | Language Model | BERT representation | Semantic and Syntactic Enhanced Neural Model (SSENM) | WCS (News: 0.8441, Microblogs: 0.8333)
[41] | SemEval 2017 Task 5 | Targeted, Sentence-level | Language Model | RoBERTa | RoBERTa | WCS (News: 0.8483, Microblogs: 0.8122)
[30] | FiQA Task 1 | Targeted, Aspect-level | Machine Learning | Linguistic and word embeddings | SVM | Aspect Extraction: F1-Score (News: 0.4240, Post: 0.5775), Sentiment Analysis: MSE (News: 0.1052, Microblogs: 0.1281)
[72] | FiQA Task 1 | Targeted, Aspect-level | Deep Learning | GloVe, Google-News-Word2Vec, Godin, FastText, and Keras in-built embedding layer | CNN, LSTM | Aspect Extraction: F1-Score (0.69), Sentiment Analysis: MSE (0.112)
[139] | FiQA Task 1 | Targeted, Aspect-level | Deep Learning | Pre-trained word embeddings using CNN | Ridge, Random Forest, CNN, GRU/Bi-GRU and Bi-LSTM | Aspect Extraction: F1-Score (0.6530), Sentiment Analysis: MSE (0.0926)
[178] | FiQA Task 1 | Targeted, Aspect-level | Language Model | BERT representation | Semantic and Syntactic Enhanced Neural Model (SSENM) | Sentiment Analysis: MSE (0.0717), R² (0.4878)
[4] | FiQA Task 1 | Targeted, Aspect-level | Language Model | FinBERT | FinBERT | Sentiment Analysis: MSE (0.07)
[41] | FiQA Task 1 | Targeted, Aspect-level | Language Model | RoBERTa | RoBERTa | MSE: 0.0490
[157] | SEntFiN | Targeted, Sentence-level | Language Model | Pre-trained sentence representation | RoBERTa, FinBERT | Accuracy: 0.9429, F1-Score: 0.9327
[42] | SEntFiN | Targeted, Sentence-level | Lexicon | FinSenticNet | Lexicon | Accuracy: 0.5920, F1-Score: 0.5939
[99] | 7648 documents (non-neg: 3681, neg: 3957) | Document-level | Deep Learning | Word embeddings | Hierarchical Query-driven Attention (FISHQA) using GRU | Accuracy: 0.9446, F1-Score: 0.9449
[198] | CCF BDCI, CCKS | Sentence-level | Language Model | RoBERTa | RoBERTa | Accuracy: 96.774%

market price movement and concluded that sentiment attitude does not seem to Granger-cause stock price changes,
while emotion does Granger-cause stock price changes on specific occasions. The addition of sentiment emotions has
increased the model accuracy for certain stock price predictions. The sentiment attitude and stock price time series
are verified to be stationary by analyzing the autocorrelation and partial autocorrelation and performing the
augmented Dickey-Fuller or the Ljung-Box t-statistic tests. [158] developed and applied an active learning approach to
perform sentiment analysis of tweet streams in the equity market. The Granger causality test demonstrates that
sentiment derived from stock-related tweets can serve as an indicator of stock price movements a few days in advance.
The results are improved further by adopting the SVM classifier to categorize tweets into three sentiment polarities
(i.e., positive, negative, and neutral) instead of two polarities only (i.e., positive and negative) [158]. [31] find
that Twitter activity is related to market participants’ interest in and attention to 8-K filings, which in turn
affect stock price and volume reactions to 8-K filings. The results also show that positive abnormal sentiment is not
significantly associated with stock price reactions but is significantly negatively associated with stock volume
reactions. [62] examined the correlation between four distinct investor emotions (fear, gloom, joy, stress) and S&P
500 index returns using the Threshold Generalized Auto-regressive Conditional Heteroskedasticity (TGARCH) model and
discovered that the fear emotion has a significant and lasting impact on conditional volatility and market returns. It
is also found that the abnormal returns associated with emotion experience rapid reversal within 5 days. [26]
investigates the effect of investor sentiment on stock price crash risk, which is measured by the Negative
Coefficient of Skewness (NCSKEW) and the Down-to-Up Volatility (DUVOL) of the weekly stock return, tests the main
hypothesis with OLS analysis, and concludes that the risk of a crash is affected by investor sentiment.

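The two crash-risk measures named above can be computed from a series of weekly returns. The sketch below follows the
definitions that are commonly used in the crash-risk literature; whether [26] uses exactly this specification (for
example, which firm-specific return series is fed in) is an assumption, and the function names are hypothetical. The
returns are demeaned inside each function as a simplification.

```python
import math


def ncskew(weekly_returns):
    """Negative coefficient of skewness; larger values indicate higher crash risk."""
    n = len(weekly_returns)
    if n < 3:
        raise ValueError("need at least three weekly returns")
    mean = sum(weekly_returns) / n
    w = [r - mean for r in weekly_returns]
    sum_sq = sum(x ** 2 for x in w)
    sum_cu = sum(x ** 3 for x in w)
    return -(n * (n - 1) ** 1.5 * sum_cu) / ((n - 1) * (n - 2) * sum_sq ** 1.5)


def duvol(weekly_returns):
    """Down-to-up volatility: log ratio of down-week to up-week return variance."""
    n = len(weekly_returns)
    mean = sum(weekly_returns) / n
    down = [r - mean for r in weekly_returns if r < mean]
    up = [r - mean for r in weekly_returns if r >= mean]
    if len(down) < 2 or len(up) < 2:
        raise ValueError("need at least two down weeks and two up weeks")
    down_sq = sum(x ** 2 for x in down)
    up_sq = sum(x ** 2 for x in up)
    return math.log(((len(up) - 1) * down_sq) / ((len(down) - 1) * up_sq))
```

In a hypothesis-testing setup of the kind described above, either measure would serve as the dependent variable in an
OLS regression on a lagged sentiment proxy plus controls.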
5.2.2 Stock Market Prediction. Stock market prediction is a challenging task due to the inherently noisy and volatile
characteristics of the market. Most earlier research in market prediction uses historical stock trading data, the
technical indicators of stock trading, and macroeconomic variables as input data. The inclusion of financial textual
data and the application of NLP techniques in financial forecasting is an emerging research field [101, 124, 182,
183]. Traditional financial news providers such as Bloomberg and Thomson Reuters pioneered commercial FSA services
[124, 134]. Today, many investment banks and fund managers are exploiting financial sentiment to make better
predictions of the financial market. Financial institutions such as Two Sigma and D. E. Shaw have included financial
sentiment signals, in addition to traditional structured transaction data, to improve their machine learning models
for algorithmic trading [124]. Practical traders agree that any result above 50% adds value to their day-to-day
trading [128]. Conventionally, there are two major schools of thought in stock market analysis: fundamental analysis
and technical analysis [128]. Fundamental analysis evaluates stocks from the perspective of their intrinsic value,
from economic and industry conditions to the financial strength of individual companies. Financial indicators such as
earnings, expenses, assets, and liabilities are part of fundamental analysis. Technical analysis attempts to identify
opportunities from statistical trends in the movements of a stock’s price and volume. Popular technical indicators
include Simple Moving Average (SMA), Exponential Moving Average (EMA), and Moving Average Convergence/Divergence
(MACD). Natural language-based financial forecasting techniques, as suggested by [19], could be classified as
technical analysis. Essentially, intrinsic value does not change with sentiment, whereas indicators that measure
market sentiment, such as the High-Low Index and the Bullish Percent Index, are important indicators in technical
analysis.

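The three technical indicators named above can be sketched in plain Python. This is an illustrative sketch rather than
a reference implementation: the EMA is seeded with the first price (other conventions seed it with an SMA), and the
12/26/9 spans used for MACD are the customary defaults, assumed here rather than taken from the survey.

```python
def sma(prices, window):
    """Simple moving average; one value per full window."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]


def ema(prices, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [prices[0]]  # assumption: seed with the first observation
    for price in prices[1:]:
        out.append(alpha * price + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return the MACD line, its signal line, and the histogram."""
    macd_line = [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram
```

In a sentiment-augmented predictor, series like these would typically be concatenated with a daily sentiment score to
form the input features.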
Ref. | Data Source | Period | Text Representation | Markets | Method | Task | Performance
--- | --- | --- | --- | --- | --- | --- | ---
[38] | Thomson Reuters and Bloomberg | Oct-2006 to Nov-2013 | Event Embedding | US | CNN | Stock price prediction, S&P500 index prediction | Accuracy: 64.21%, MCC: 0.40 for index and Accuracy: 65.48%, MCC: 0.41 for individual stock
[95] | Thomson Reuters | Oct-2011 to Jul-2017 | Feature extraction from news title using CNN and from event tuple using TransE model | US | SVM, LSTM | Stock price movement prediction | Accuracy: 55.44%, F1-Score: 0.7133
[32] | News and comments from Engadget | 1-Jan-2006 to 15-Aug-2008 | Sentiment (objective, subjective, negative, positive) using SentiWordNet | US | SVM with Multiple Kernel Learning | Stock price change rate prediction | Sharp (MAE: 0.2368, MAPE: 1.3501, RMSE: 0.3025), Panasonic (MAE: 0.2673, MAPE: 1.3178, RMSE: 0.3435), Sony (MAE: 0.7001, MAPE: 1.4630, RMSE: 0.8865)
[179] | Thomson Reuters | Jan-2007 to Aug-2012 | Frames, BOW, and part-of-speech specific DAL (FWD) features and SemTree data representations | US | SVM | Stock price movement prediction, Stock price change prediction | MCC: ConsumerStaples (0.1550 for change and 0.1147 for movement), Information Technology (0.1017 for change and 0.0801 for movement), Telecommunication Services (0.3049 for change and 0.0770 for movement)
[185] | Twitter | 1-Jan-2014 to 1-Jan-2016 | Learnt Embeddings | US | GRU, VAEs | Stock price movement prediction | Accuracy: 0.5823, MCC: 0.080796
[79] | Earning calls | 1-Jan-2010 to 31-Dec-2017 | Market, semantic (Doc2Vec, BoW) and pragmatic features | US | Ridge Regression, Logistic Regression, LSTM, Ensemble | Change in target price | Regression (MSE: 0.00137, R²: 0.1718), Classification (Accuracy: 0.482, F1-Score: 0.475)
[134] | Twitter | 22-Dec-2012 to 29-Oct-2015 | Sentiment indicators based on SMSL lexicon which combines with AAII, II, UMSC, Sentix | US | MR, NN, SVM, RF, Ensemble | Return, trading volume and volatility | Daily return (Lowest NMAE: NDQ, 7.58), Daily trading volume (Lowest NMAE: DJIA, 5.84), Daily volatility (Lowest NMAE: DJIA, 2.79)
[200] | EDT Dataset | PRNewswire: 1-Mar-2020 to 30-Apr-2021, Businesswire: 16-Aug-2020 to 6-May-2021 | High-level event detector incorporates entire article’s representation and Low-level model detects events, which results to discover events at article-level | US | Multi-class Classification with MLM Loss | Stock price movement prediction | Trade at End Strategy:- Average Return: 1.74%, Exceed return of $84443 (844%) in 1-day trading and Trade at Best Strategy:- Average Return: 9.11%
[76] | Stocktwits | 4-Mar-2013 to 28-Feb-2018 | CNN as base model of sentiment index | US | Empirical Mode Decomposition (EMD) with LSTM based model | Stock price prediction | Accuracy: 70.56%, MAPE: 1.65%, MAE: 2.39
[177] | EastMoney.com | 1-Jul-2017 to 30-Apr-2020 | Skip-gram | China | CNN and LSTM | Stock price prediction | MAE: 2.38, MSE: 7.27, RMSE: 2.69
[101] | Hundsun Electronics | 2-January-2018 to 18-June-2021 | Market-driven sentiment representations | China | Pre-training, BiLSTM and GCN | Stock price movement prediction | Accuracy: 0.6726, MCC: 0.3452
[63] | Twitter | 14-June-2017 to 30-Aug-2017 | Twitter Sentiment Score (TSS) | UK | Linear Regression | Stock price prediction | Accuracy: 67.22%
[175] | Twitter | Jan-2017 to Nov-2017 | Daily tweet-level embeddings | US | Cross-modal attention based Hybrid Recurrent Neural Network (CH-RNN) | Stock price movement prediction | Accuracy: 59.15%
[161] | Twitter | 14-June-2017 to 30-Aug-2017 | Tweet embeddings by self-supervised learning | US | Attentive LSTM | Stock price movement prediction | Accuracy (BIGDATA22: 54.81%, ACL18: 58.72%, CIKM18: 55.86%), MCC (BIGDATA22: 0.0952, ACL18: 0.2065, CIKM18: 0.0899)
[149] | Twitter | 1-Jan-2014 to 1-Jan-2016 | Universal Sentence Encoders (USE) | US | Graph Attention Network | Stock price movement prediction | F1-Score: 0.605, Accuracy: 0.608, MCC: 0.195

Table 5. Stock Market Prediction

Investor sentiment refers to the degree to which investors’ beliefs about future firm valuation deviate from
fundamental information, and existing studies show that investor sentiment has a significant impact on stock prices and
market participants’ activities [118]. The studies in stock market prediction include stock index, stock price, stock
price movement, return rate, and volatility using time series models (e.g., ARIMA and GARCH), machine learning,
deep learning, and reinforcement learning approaches. [38] proposed a novel neural tensor network to learn event
embeddings, and a deep CNN to model the combined effects of long-term and short-term events for event-driven stock
price movement prediction on the S&P 500 index and its individual stocks. Accuracy and MCC are used to measure
the model performance, and a simulation is performed to evaluate the profitability of the proposed model, which
demonstrates that the deep learning method is effective in event-driven market prediction. [95] extracted features
from news titles via CNN and from event tuples using knowledge graph embedding (i.e., the TransE model), and combined
them with daily trading and technical analysis data. An event can be defined as a tuple (Agent, Predicate, Object);
e.g., in “Apple sues Samsung”, the agent is Apple, the predicate is sue, and the object is Samsung. This approach is
evaluated using an SVM model as a machine learning method and an LSTM model as a deep learning method to predict stock
price movement. The best-performing model is achieved through joint learning of event tuples and text, which solves
the text sparsity problem in feature extraction. [32] combined technical indicators with sentiment information to
predict future prices using regression models and demonstrated that combining technical indicators and sentiment
indicators produces better predictions than using either alone. The sentiment information is used explicitly: the
sentiment (i.e., subjective, objective, negative, positive) of each news item and comment is obtained using
SentiWordNet, and the counts of positive, negative, and objective texts for the target company are used as sentiment
features. [179] proposed FWD features (Frame, bag-of-Words, part-of-speech DAL score) and SemTree data
representations, and adopted SVM to
predict stock price movement (polarity and change). [185] proposed StockNet, a deep generative model consisting of a
Market Information Encoder, a Variational Movement Decoder, and an Attentive Temporal Auxiliary, for stock market
prediction based on binary movement, where a rise in stock price is denoted by one and a fall by zero. Two years of
Twitter data are selected, targeting 88 stocks (8 stocks from the Conglomerates industry and the top 10 stocks in
terms of capital size in each of Basic Materials, Consumer Goods, Financial, Healthcare, Industrial Goods, Technology,
Utilities, and Services). The proposed model is evaluated by Accuracy and MCC and achieved state-of-the-art
performance on a new stock movement prediction dataset, which is also made publicly available⁷. Wu et al. [175]
presented a novel cross-modal attention based hybrid recurrent neural network (CH-RNN), which is inspired by the
DA-RNN model, and also released a social text-driven stock prediction dataset built by aggregating stock prices from
Yahoo Finance alongside relevant social media discourse, primarily from Twitter.

Earnings calls are another data source used for financial forecasting. Earlier research has shown that more
information is disclosed in earnings calls [54] than in company filings alone and that they influence investor
sentiment in the short term [14]. The most recent studies using earnings calls include [79], which used 12,285
earnings calls for the S&P 500 companies. It studies the pragmatic correlation with analysts’ pre-call judgment and
predicts the changes in analysts’ post-call forecasts, both as a regression problem and as a 3-class classification
task, using Ridge Regression and Logistic Regression respectively. The models are evaluated by MSE and R² for the
regression model and by F1-Score and accuracy for the classification model. The results demonstrate that earnings
calls are moderately predictive of changes in analysts’ target prices. Earnings calls have also shown predictive power
for investment sentiment in the short term, increasing absolute returns [24].

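Several of the studies above frame prediction as binary movement classification and report Accuracy together with the
Matthews Correlation Coefficient (MCC). The sketch below shows how such labels and metrics can be computed. It is an
illustration only: the function names are hypothetical, and labeling an unchanged price as a fall is a simplifying
assumption made here (individual studies may instead drop small movements).

```python
import math


def movement_labels(prices):
    """1 if the next price is higher than the current one, else 0 (flat days count as 0 here)."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(prices, prices[1:])]


def accuracy_and_mcc(y_true, y_pred):
    """Accuracy and Matthews Correlation Coefficient for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the confusion matrix is empty; return 0.0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc
```

MCC is informative alongside accuracy because a classifier that always predicts "rise" can score above 50% accuracy in
a rising market while its MCC stays at zero.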
Multiple studies have investigated the integration of company relationships into the prediction of stock market
movements. Notably, [22] incorporated company relationships using Graph Convolutional Neural Networks, while [149]
proposed a deep attentive learning approach for predicting stock movements based on information from social media
texts and company correlations. Recently, [192] introduced the Data-axis Transformer with Multi-Level contexts (DTML)
for stock movement prediction. DTML leverages temporal and global market context to learn dynamic correlations
and outperforms existing approaches, resulting in a substantial annualized return in investment simulations. Another
initiative to advance stock market movement prediction involves self-supervised learning from sparse noisy
tweets [161], for which a newly created dataset for stock market forecasting is also made publicly available⁸. The
proposed SLOT method employs self-supervised learning to generate shared embeddings for stocks and tweets, enabling
accurate predictions for less popular stocks. Moreover, it exploits multi-level relationships between stocks inferred
from tweets, thereby bolstering its robustness. [101] put forward two important hypotheses: (1) market sentiment
does not equal semantic sentiment; (2) the stock price of a target company is also affected by its related companies.
Thus, they proposed a multi-source aggregated classifier for stock price movement prediction. They first pre-trained a
market-driven sentiment classifier to generate sentiment representations for news. Then, they proposed a classifier
to predict the stock price movement for a target company, which aggregates the quantitative indicators and news
sentiment of the target company and the news sentiment of its related companies. This model achieved 67.26% accuracy
and an MCC of 0.3452, averaged over 6 blue chip stocks in the Chinese market. Backtesting also shows improvements
over strong baselines in the Sharpe ratio.

7 https://github.com/yumoxu/stocknet-dataset
8 https://github.com/deeptrade-public/slot

While many studies have combined textual sentiment with fundamental and technical indicators, another type of
research combines various sources of sentiment with different frequencies for stock market prediction. Principal
Component Analysis (PCA) and Kalman Filter (KF) are two popular methods to combine sentiment indicators with
different frequencies. [134] applied the KF procedure, which is able to aggregate various sources of sentiment with
distinct frequencies (e.g., daily, weekly, monthly) and generate a more representative and less noisy latent variable
as a newly created daily sentiment indicator. Specifically, the daily sentiment indicator from Twitter and the weekly
and monthly sentiment indicators from surveys (e.g., AAII, II, UMSC, Sentix) are aggregated using the KF procedure.
Multiple Regression, Neural Networks, SVM, RF, and Ensemble Averaging are used to perform predictions. [134] also
addresses common issues with evaluation in stock market prediction, namely that out-of-sample data is not used to
evaluate model performance or that the test set is too small. In addition, statistical tests are rarely used to
evaluate prediction accuracy; [134] resolves this by applying the Diebold-Mariano test in addition to the Normalized
Mean Absolute Error (NMAE) criterion. Unlike earlier studies which used textual features to predict market movements,
[200] proposed event-driven trading strategies that predict stock market movements by detecting corporate events,
which are considered the driving force of market movements, from news articles. The bi-level event detection model is
trained with masked-language model (MLM) loss. The authors employed two trading strategies in experiments on the EDT
dataset⁹. The first strategy, Trade at End, holds each opened position for k days and closes it on the last day. This
strategy gave an estimated 1.74% average return and an excess return of 844% in 1-day trading. The second strategy,
Trade at Best, completes the transaction within k trading days at the best price. This method resulted in an estimated
9.11% average return, which exceeded those of all the sentiment-based models.

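The idea of fusing sentiment indicators observed at different frequencies into one daily latent series can be
sketched with a one-dimensional Kalman filter. The sketch below is not the specification used in [134]; it assumes a
random-walk latent sentiment, indicators already standardized to a common scale, and hand-picked noise variances. The
function name and all parameters are hypothetical.

```python
def kalman_daily_sentiment(observations, noise_var, process_var=0.01, x0=0.0, p0=1.0):
    """Filter a latent daily sentiment level from mixed-frequency indicators.

    observations: one dict per day mapping source name -> observed value;
                  a source that did not report on a given day is simply absent.
    noise_var:    dict mapping source name -> measurement noise variance.
    """
    x, p = x0, p0
    path = []
    for day in observations:
        # Predict step: random-walk state, so only the uncertainty grows.
        p = p + process_var
        # Update step: fold in each available indicator sequentially.
        for source, value in day.items():
            gain = p / (p + noise_var[source])
            x = x + gain * (value - x)
            p = (1.0 - gain) * p
        path.append(x)
    return path
```

A daily Twitter indicator would appear in every dict, while a weekly survey would appear only on its release days; on
days with no observation at all the estimate is carried forward unchanged and only its uncertainty increases.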
The expected return can be investigated from a time series or a cross-section perspective. The time series
perspective concerns how average returns change over time, while the cross-section perspective concerns how average
returns change across different stocks or portfolios. While the majority of studies investigate the effects of
investor sentiment from a time series perspective, [6] was the first to demonstrate, via a set of empirical results,
that investor sentiment, broadly defined, significantly affects the cross-section of stock returns. When estimated
sentiment is high, stocks that are unattractive to arbitrageurs and at the same time attractive to optimists and
speculators tend to generate relatively low subsequent returns [6]. This includes distressed stocks, extreme growth
stocks, high volatility stocks, non-dividend-paying stocks, small stocks, unprofitable stocks, and younger stocks.

5.2.3 Financial Risk Prediction. One essential indicator of instability and risk is financial volatility, which is a
popular metric used in financial forecasting. Volatility is commonly defined as the standard deviation of a stock’s
returns over a pre-defined period of time. The volatility of return is defined as follows [172]:

\[ v_{[s,\,s+\tau]} = \sqrt{\frac{\sum_{t=s}^{s+\tau} (r_t - \bar{r})^2}{\tau}} \tag{7} \]

where $r_t$ is the return and $\bar{r}$ is the mean of return.

\[ r_t = \frac{P_t}{P_{t-1}} - 1 \tag{8} \]

where $P_t$ is the adjusted close price.

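Equations (7) and (8) translate directly into code. The following sketch assumes that the mean return is taken over
the same window [s, s+τ] and that `s` indexes into the list of returns (so index 0 holds the first computable return);
both choices, and the function names, are assumptions made for illustration.

```python
import math


def simple_returns(adj_close):
    """Eq. (8): r_t = P_t / P_{t-1} - 1 for consecutive adjusted close prices."""
    return [adj_close[t] / adj_close[t - 1] - 1 for t in range(1, len(adj_close))]


def volatility(returns, s, tau):
    """Eq. (7): volatility over the window of returns indexed s .. s + tau (inclusive)."""
    window = returns[s:s + tau + 1]
    if tau <= 0 or len(window) != tau + 1:
        raise ValueError("window [s, s + tau] must lie inside the return series")
    mean = sum(window) / len(window)
    return math.sqrt(sum((r - mean) ** 2 for r in window) / tau)
```

Because the window holds τ + 1 returns and the sum is divided by τ, this coincides with the usual sample standard
deviation of the windowed returns.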
There are studies in risk prediction using various data sources such as financial reports [83, 130, 166, 172],
financial news [130, 166], message boards [129], and earnings calls [174]. [172] investigates the significance of
sentiment words for financial risk by formulating a regression task to predict future real-valued risk given sentiment
and a ranking task to rank the risk levels, using financial reports from 1996 to 2006, i.e., Section 7: management’s
discussion and analysis of

9 https://github.com/Zhihan1996/TradeTheEvent/tree/main/data

Ref. | Data Source | Period | Text Representation | Markets | Method | Task | Performance
--- | --- | --- | --- | --- | --- | --- | ---
[134] | Twitter | 22-Dec-2012 to 29-Oct-2015 | Sentiment indicators based on SMSL lexicon which combines with AAII, II, UMSC, Sentix | US | MR, NN, SVM, RF, Ensemble | Return, trading volume and volatility | Daily return (Lowest NMAE: NDQ, 7.58), Daily trading volume (Lowest NMAE: DJIA, 5.84), Daily volatility (Lowest NMAE: DJIA, 2.79)
[145] | 10-K reports of companies from the U.S. SEC | 2006 to 2015 | Feature vector generated by the weights of lexicons from LM and the word weighting scheme includes TC, TF, TF-IDF and BM25 | US | GARCH, SVM | Volatility | MSE: 0.111, R²: 0.527
[174] | Quarterly earning calls from the US stock market | 2006 to 2013 | Uni-grams, bi-grams, named entities, part-of-speech tags and probabilistic frame-semantic features | US | Linear Regression, Linear SVM, Gaussian SVM, Gaussian Copula Models | Volatility | Spearman (Pre-2009: 0.425, 2009: 0.422, Post-2009: 0.375), Kendall: (Pre-2009: 0.315, 2009: 0.310, Post-2009: 0.282)
[184] | StockTwits | 14-Aug-2017 to 22-Aug-2018 | Sentiment polarity score computed by augmented sentic computing | US | SAVING, GARCH, EGARCH, TARCH, GJR, GP-vol, VRNN, NSVM, LSTM, s+LSTM | Volatility | Negative Log-Likelihood (NLL): -3.0642
[172] | 10-K Form, an annual report required by the Securities and Exchange Commission (SEC) | 1996 to 2006 | LM lexicon is chosen with six types of sentiment words i.e., positive, negative, uncertainty, legal context, strong and weak confidence. The BoW model is adopted and TF-IDF and LOG1P are selected as word features to represent the 10-K reports | US | SVR | Volatility | Regression (MSE: 0.14894), Ranking (Kendall’s Tau: 0.60458, Spearman’s Rho: 0.63403)
[34] | RavenPack, Twitter, Thomson Reuters | 2019 | Latent Dirichlet Allocation (LDA) model to extract feature vectors | UK | Logistic Regression | Volatility | Accuracy (Headlines: 65%, Tweets: 65%, Stories: 67%), F1-Score (Headlines: 64%, Tweets: 64%, Stories: 63%)

sentiment to predict stock return fluctuation. The proposed model outperforms not only pure statistical models such
as GARCH and its variants, which are commonly used econometric time series models for volatility prediction, and the
Gaussian-process volatility model, but also the latest autoregressive deep neural network architectures, e.g., the
neural stochastic volatility model and the variational recurrent neural network. [34] proposed a market volatility
classifier based on Latent Dirichlet Allocation (LDA) topic modeling. The paper suggests a strong negative correlation
between positive tweets and next-day volatility by observing the relationship between financial news, tweets, and the
FTSE100. The study also indicated that the model’s accuracy depends on the number of topics chosen.

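For readers unfamiliar with the GARCH baselines mentioned above, the conditional variance recursion of a standard
GARCH(1,1) model can be sketched as follows. The parameters are supplied by hand here, whereas in practice they are
estimated by maximum likelihood; the function name and the choice to start from the unconditional variance are
assumptions for illustration, and the returns are assumed to be zero-mean.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances sigma^2_t = omega + alpha * r_{t-1}^2 + beta * sigma^2_{t-1}.

    Returns len(returns) + 1 values; the last one is the one-step-ahead forecast.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("parameters must satisfy omega > 0, alpha, beta >= 0, alpha + beta < 1")
    variance = omega / (1.0 - alpha - beta)  # assumption: start at the unconditional variance
    path = [variance]
    for r in returns:
        variance = omega + alpha * r ** 2 + beta * variance
        path.append(variance)
    return path
```

Sentiment-aware volatility models of the kind discussed above can be read as replacing or augmenting this purely
return-driven recursion with text-derived inputs.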
5.2.4 Portfolio Management. One line of research that received less attention earlier [104] but has emerged recently
is to exploit the opinions posted by investors to invest in stock markets by optimizing portfolios. Existing
approaches largely treat market prediction as a classification, regression, or ranking task but are not optimized for
making profitable investment decisions. However, decision-making and trading strategies should be incorporated to
improve practical applicability [104]. To address this challenge, researchers started to investigate the adoption of
financial sentiment for portfolio management. [86] applied a semi-supervised learning method to stock microblogs and
proposed a Follow-the-Loser online portfolio selection strategy. The proposed approach includes a user model, an
emotion classifier using MLP, and a portfolio selection strategy. The concept of online portfolio selection is to
apply online learning from the machine learning literature to the portfolio selection problem, which aims to maximize
the cumulative returns over multiple sequential periods of time [86]. This is different from offline or batch
portfolio selection (e.g., Markowitz’s Mean-Variance Theory), which balances the expected return and risk over a
single period of time. [167] first predicts the quality of opinions and then makes investment recommendations. The
quality estimation for investment opinions adopts features associated with the author, content, and stocks under
discussion, and opinions with the highest predicted qualities are selected as high-quality opinions. When generating
portfolios, a score is generated for each stock by aggregating the sentiment about the stock in the opinions, weighted
by the predicted qualities of the opinions. Experiments conducted on a real-world dataset demonstrate the
effectiveness of the proposed strategy in recommending high-quality opinions and profitable portfolios. [187] applied
the Gaussian inverse reinforcement learning method, in which the market dynamics are modeled as a Markov decision
process and investor sentiment is regarded as a series of actions taken at different market states. The S&P500 index
return, measured at 15-minute intervals, is used as the market return, and the sentiment from Thomson Reuters news is
used as the proxy of investor sentiment toward the general U.S. market. Markets often do not react to noisy signals,
which are prevalent in investor sentiment signals. The investor sentiment reward-based trading system is designed to
filter out noisy signals and extract only effective signals that generate either negative or positive market
responses. Annualized performance measures, including the mean return, volatility, Sharpe ratio, and Sterling ratio,
are adopted to measure the model performance. [150] proposed Policy for Return Optimization using FInancial news and
online Text (PROFIT), which formulates stock prediction as a reinforcement learning problem and leverages financial
news and tweets to model stock-affecting signals and optimize trading decisions to increase profitability. The trading
performance is evaluated by Sharpe ratio, Sortino ratio, cumulative return, and maximum drawdown and compared with
baselines spanning different formulations including regression, classification, and ranking. The proposed PROFIT
system has outperformed benchmark systems significantly.
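The trading performance measures named above (cumulative return, Sharpe ratio, Sortino ratio, and maximum drawdown)
can be computed from a series of periodic portfolio returns as sketched below. The conventions used, such as
annualizing with 252 trading periods, a zero risk-free rate, and a zero target return for the downside deviation, are
common defaults assumed here and not necessarily those of the cited studies. The Sterling ratio is omitted because its
definition varies across sources.

```python
import math


def cumulative_return(returns):
    """Compounded return over the whole period."""
    wealth = 1.0
    for r in returns:
        wealth *= 1.0 + r
    return wealth - 1.0


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((e - mean) ** 2 for e in excess) / (len(excess) - 1)
    if variance == 0:
        raise ValueError("returns have zero variance")
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)


def sortino_ratio(returns, target=0.0, periods_per_year=252):
    """Like the Sharpe ratio, but penalizing only returns below the target."""
    excess = [r - target for r in returns]
    mean = sum(excess) / len(excess)
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    if downside == 0:
        raise ValueError("no returns below the target")
    return mean / downside * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded wealth curve, as a fraction of the peak."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, (peak - wealth) / peak)
    return worst
```

Reporting several of these together is useful because a strategy can have a high cumulative return while still
suffering a deep drawdown along the way.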
1295 [190] has proposed a novel and generic state-augmented RL framework called State Augmented Reinforcement Learning
1296
(SARL), which can integrate heterogeneous data sources into standard RL training pipelines for learning portfolio
1297
1298
management strategies. The portfolio performance is measured by portfolio value and Sharpe ratio. The Bitcoin [74]
1299 and HighTech [37] datasets are used for the experiments which have achieved significantly better portfolio value
and Sharpe ratio than baseline models including equal weight and Deep Portfolio Management [74]. [43] proposed
stock embedding, a vector representation of stocks in a financial market, which uses a neural network framework
consisting of a text feature distiller and a price movement classifier to acquire such vectors from stock price history and
news articles. Beyond price prediction, the stock embedding can be applied to other financial tasks such as portfolio
optimization. The proposed method outperformed baseline models in both price movement classification
and portfolio optimization tasks. [104] included public mood from online news and social media and selected RF,
Multi-Layer Perceptron, and LSTM to take a historical series of lagged data and public mood and generate the optimal
portfolio allocation automatically. The proposed methodology consistently outperforms the equal-weighted portfolio
allocation strategy and shows that including financial sentiment is always beneficial to the model performance.
[85] proposed a sentiment-aware deep reinforcement model for portfolio allocation on a daily basis. The sentiment
polarity is added using Valence Aware Dictionary and sEntiment Reasoner (VADER) [71]. The trading performance
is evaluated by Sharpe ratio and annualized return, on both of which the model is more robust than the benchmarks.
[20] presented a sentiment-based RF model to generate a portfolio of Chinese stocks
that can achieve higher returns. The proposed method shows the importance of choosing suitable methods for stock
characteristics and stock selection methods in a highly volatile market. This method generated higher returns than the
Shanghai Stock Index. [151] proposed a modified LSTM, namely time-aware LSTM (t-LSTM), that learns time-aware
representations to produce a ranked list of predicted stock return ratios based on expected profit. The model captures
the relevant market trends by using hierarchy-based temporal attention for ranking stocks. In intra-day situations, the
model outperformed the state-of-the-art methods by over 8%, and by 10% in risk-adjusted returns.
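The quality-weighted scoring step described above for [167] can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record fields (`stock`, `sentiment`, `quality`), the normalization by total quality, and the top-k selection rule are assumptions made for illustration only.

```python
from collections import defaultdict

def stock_scores(opinions):
    """Aggregate opinion sentiment per stock, weighted by predicted opinion quality.

    Each opinion is a dict with hypothetical keys:
      'stock'     - ticker symbol
      'sentiment' - +1 (bullish) or -1 (bearish)
      'quality'   - predicted opinion quality in [0, 1]
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for op in opinions:
        weighted_sum[op["stock"]] += op["quality"] * op["sentiment"]
        weight_total[op["stock"]] += op["quality"]
    # Assumption: normalize by total quality so that heavily discussed stocks
    # are not favoured by opinion volume alone.
    return {s: weighted_sum[s] / weight_total[s]
            for s in weighted_sum if weight_total[s] > 0}

def top_k(scores, k):
    """Return the k tickers with the highest scores (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [s for s, _ in ranked[:k]]

# Toy opinions with made-up tickers.
opinions = [
    {"stock": "AAA", "sentiment": +1, "quality": 0.9},
    {"stock": "AAA", "sentiment": -1, "quality": 0.1},
    {"stock": "BBB", "sentiment": -1, "quality": 0.8},
    {"stock": "CCC", "sentiment": +1, "quality": 0.5},
]
scores = stock_scores(opinions)
portfolio = top_k(scores, 2)
```

In this toy example the low-quality bearish opinion on AAA barely offsets the high-quality bullish one, which is the intended effect of weighting by predicted quality.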

5.2.5 FOREX Market Prediction. There has been a significant focus on forecasting stock market trends, while the
foreign exchange market has received comparatively less attention in predictive efforts. Earlier studies investigated
the short-run connection between macroeconomic fundamentals and exchange rates using the Flexible Fourier Form
regression method, with absolute returns as a measure of volatility [90]. The impact of macroeconomic announcements
(e.g., GDP, interest rates, and consumer confidence indexes), which are collected from the Bloomberg World Economic
Calendar, on USD/EUR exchange rate volatility is estimated. The observations are divided into 5-minute intervals, totaling
288 in 24 hours, from 28-Oct-2003 to 20-Jan-2004, and the results suggest that macroeconomic news significantly increases
the volatility of exchange rates immediately after the announcement. The results also show that the degree of significance
varies by news category and country. Also, [46] concludes that macroeconomic news arrivals affect currency markets
over time. The average news effects correspond to the direct channel for price impact, which is absorbed immediately,
but total news effects are not reflected quickly. [75] presented Forex-foreteller (FF), which utilizes news articles to
forecast the movement of foreign currency markets. FF combines language models, topic clustering, and sentiment
analysis to identify relevant news articles, which are used together with historical stock index and currency exchange
values to build a linear regression model that performs forecasting and generates warning messages. [128] proposed an
approach that adopted TF-IDF weighted features scaled by a sentiment sum score from SentiWordNet to predict
intra-day directional movements of a currency pair using SVM, and demonstrated the existence of a promising predictive
relationship between the foreign exchange market and financial news. [153] proposed a FOREX market prediction system
that performs sentiment analysis of news headlines by exploiting word sense disambiguation to predict the directional
movement of the EUR/USD exchange rate, and reported improved prediction accuracy. [154] presented an approach that
includes news story events in the economic calendar to predict intra-day directional movement of currency pairs using SVM,
RF, and XGBoost algorithms and achieved promising results. [180] investigates the efficacy of high-frequency news
Ref. | Data Source | Period | Text Representation | Markets | Method | Task | Performance
[86] | Yahoo! JAPAN textream | 1-Jan-2013 to 1-Jan-2016 | Emotion polarity | Japan | MLP, reliability and cumulative reliability score for each individual using sentiment polarity, Follow-the-Winner and Follow-the-Loser strategy | Portfolio selection | Cumulative return: 1.151
[167] | StockTwits | 2014 | Sentiment polarity provided in tweets | US | Linear regression for opinion quality score | Portfolio selection | Cumulative return: 30%
[187] | Thomson Reuters | 2-Jan-2008 to 31-Dec-2015 | News sentiment from Thomson Reuters News | US | Gaussian inverse reinforcement learning | Portfolio selection | Return: 17.39%, Sharpe ratio: 0.85, Sterling ratio: 0.76 (about 3 times all the market benchmarks)
[150] | Twitter (English), financial news headlines (Chinese) aggregated by Wind.com.cn from major financial websites like Sina and Hexun for China and Hong Kong | US: Jan-2014 to Dec-2015, China and Hong Kong: Jan-2015 to Dec-2015 | BERT Embeddings | US, China, Hong Kong | Deep Deterministic Policy Gradient (DDPG) framework | Portfolio selection | Sharpe ratio (US: 1.03, China and Hong Kong: 1.29), Sortino ratio (US: 1.87, China and Hong Kong: 1.99), Cumulative return (US: 29.64, China and Hong Kong: 40.88), Maximum drawdown (US: 5.01, China and Hong Kong: 6.78)
[190] | Bitcoin of 10 different cryptocurrencies, HighTech daily closing asset price and financial news | Bitcoin: 30-Jun-2015 to 30-Jun-2017, HighTech: 20-Oct-2006 to 20-Nov-2013 | Auto Phrase, GloVe, Word2Vec and FastText Embeddings | US | State augmented reinforcement learning | Portfolio selection | Portfolio value is improved by 140.9% and 15.7% for Bitcoin and HighTech respectively as compared to the state-of-the-art RL algorithm DPM, Sharpe ratio (Bitcoin: 14.78, HighTech: 7.73)
[43] | Wall Street Journal (WSJ), Reuters & Bloomberg (R&B) | WSJ: Jan-2000 to Dec-2015, R&B: Oct-2006 to Nov-2013 | Stock Embeddings | US | Deep learning | Price movement prediction, portfolio selection | Price movement prediction (WSJ: Accuracy is 0.601 for 16 years, 0.550 for 3 years and 0.525 for 1 year, R&B: Accuracy is 0.688 for 7 years, 0.675 for 3 years and 0.512 for 1 year), Portfolio selection (An average of the realised annual gains increased to 17.2% and 35.5% for WSJ and R&B respectively)
[104] | Financial and sentiment data for 15 different stocks | 24-Jan-2012 to 2-Jun-2017 | The number of positive, negative, and neutral comments, a measure of change in positive and negative comments compared with the previous days (change) and a measure of positive and neutral versus negative reviews (sentiment score) for each day and each stock | US | EW, LSTM, MLP, RF | Portfolio selection | Wealth of the portfolio (LSTM + Sentiment for five portfolios: 2.23, 2.72, 2.30, 2.81 and 1.65)
[151] | US: Twitter, China and Hong Kong: Wind.com.cn | US: Jan-2014 to Dec-2015, China and Hong Kong: Jan-2015 to Dec-2015 | Time-aware representations of news and tweets | US, China, Hong Kong | Time-aware LSTM (t-LSTM) | Portfolio selection | US S&P 500: Return ratio: 1.34, Sharpe ratio: 0.96, China and Hong Kong: Return ratio: 1.44, Sharpe ratio: 1.19
[85] | Wharton Research Data Services | 1-Jan-2001 to 2-Oct-2018 | Sentiment polarity using VADER | US | Deep reinforcement learning | Portfolio selection | Sharpe ratio: 2.07, Annualized return: 22%
[20] | RESSET database | 1-Jan-2016 to 31-Jul-2018 | Lexical Model to give confidence index | China | DT, LR, SVM, RF | Portfolio selection | Accuracy: 79.6%, Holding period yield: 5.41
[59] | Bloomberg | Jan-1990 to Dec-2018 | 3D standardized features | US | RF, LSTM | Portfolio selection | Daily return (LSTM: 0.64%, RF: 0.54%)
[162] | Sina Guba and Eastmoney Guba, RESSET | 2008 to 2018 | Gubalex - stock sentiment lexicon | China | Probabilistic Neural Network (PNN) | Price movement prediction, portfolio selection | Accuracy: 86.3%

Table 7. Portfolio Management.

Ref. | Data Source | Period | Text Representation | Markets | Method | Task | Performance
[75] | Bloomberg | Apr-2010 to Mar-2013 | LM lexicon to identify relevant keywords. AFINN dictionary to measure general emotions | FOREX | Linear Regression | Predict the change in currency value and generate warning messages | Precision (Argentina: 0.18, Brazil: 0.28, Chile: 0.33, Colombia: 0.25), Recall (Argentina: 0.60, Brazil: 0.63, Chile: 1, Colombia: 1)
[128] | MarketWatch.com, Google RSS reader API | 2008 to 2011 | TF-IDF weighted features scaled by sentiment sum score using SentiWordNet | FOREX | SVM | FOREX price movement prediction | Accuracy: 0.8333, Precision (Pos: 0.6667, Neg: 0.8889), Recall (Pos: 0.6667, Neg: 0.8889)
[153] | MarketWatch.com | 2008 to 2012 | WSD-Sentiment + TF-IDF, SentiWordNet | FOREX, EUR/USD | SVM | FOREX price movement prediction | Accuracy: 0.5926, Precision: 0.5710, Recall: 0.5735
[154] | Forexfactory.com, First Word FX News | 1-Oct-2015 to 31-Oct-2017 | TF, TF-IDF, LM, and FX dictionaries | FOREX, EUR/USD, GBP/USD, USD/CHF, USD/JPY | SVM, RF, XGB | FOREX price movement prediction | Accuracy (EUR/USD: 0.638, GBP/USD: 0.663, USD/CHF: 0.631, and USD/JPY: 0.641)
[180] | Dow Jones Newswire | 1-Jan-2016 to 30-Jun-2018 | Sentiment generated from FinBERT | FOREX, EUR/USD | SVM | FOREX price movement prediction | Accuracy: 0.503, F1-Score: 0.538

Table 8. FOREX Market Prediction.

sentiment, which is represented by a 4-dimensional time series extracted by a FinBERT-based model, for FOREX
market prediction without including other semantic features. Experimental results show that their model outperforms
benchmark approaches for sentiment analysis, and the authors conclude that news sentiment alone may have predictive
power, though relatively weak, for FOREX price movements.
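To make the idea of sentiment-scaled TF-IDF features concrete, the following minimal Python sketch amplifies the TF-IDF weight of sentiment-bearing terms. It is only an illustration in the spirit of [128]: the toy lexicon, the smoothed IDF, and the scaling rule `tf * idf * (1 + |sentiment|)` are assumptions, not the formulation or the SentiWordNet scores used in that work.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; [128] derived scores from SentiWordNet instead.
TOY_SENTIMENT = {"surges": 0.8, "gains": 0.5, "falls": -0.6, "weak": -0.4}

def tfidf_scaled_by_sentiment(docs, lexicon):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf * idf * (1 + |sentiment|), so sentiment-bearing terms
    are amplified while neutral terms keep their plain TF-IDF weight.
    """
    tokenised = [d.lower().split() for d in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokenised for term in set(toks))
    features = []
    for toks in tokenised:
        counts = Counter(toks)
        vec = {}
        for term, c in counts.items():
            tf = c / len(toks)
            # Smoothed IDF so that terms shared by all documents are not zeroed.
            idf = math.log(n_docs / doc_freq[term]) + 1.0
            vec[term] = tf * idf * (1.0 + abs(lexicon.get(term, 0.0)))
        features.append(vec)
    return features

headlines = ["euro surges on strong data", "euro falls on weak data"]
vectors = tfidf_scaled_by_sentiment(headlines, TOY_SENTIMENT)
```

The resulting sparse vectors could then be fed to a classifier such as an SVM to predict the direction of a currency pair; that step is omitted here.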
Ref. | Data Source | Period | Text Representation | Markets | Method | Task | Performance
[80] | Google Trends, Twitter | Dec-2018 to May-2019 | Google trend rate, positive sentiment rate, negative sentiment rate | Cryptocurrency | Hidden Markov Model | Cryptocurrency market movement prediction | Accuracy: 54%, AUC: 52%
[70] | Sina-Weibo | 2021 | Cryptocurrency sentiment dictionary | Cryptocurrency | Autoregression, LSTM | Cryptocurrency market movement prediction | Precision: 87%, Recall: 92.5%
[148] | Twitter | 28-May-2021 to 25-Sep-2021 | Sentiment score from fine-tuned FinBERT | Cryptocurrency | LSTM | Financial sentiment, BTC volume correlation | Sentiment: Accuracy: 0.8352, F1-Score: 0.8515, BTC Volume: Pearson’s R: 0.1584
[132] | Twitter | 01-Sep-2021 to 01-Nov-2021 | Sentiment score from VADER | Cryptocurrency | Granger causality test, Vector Autoregression | Cryptocurrency price | MAPE: 0.0038
[87] | Twitter | 4-Jun-2018 to 4-Aug-2018 | VADER, LM lexicon and cryptocurrency lexicon | Cryptocurrency | Granger causality test | Cryptocurrency price return | Significant predictive power (p < 0.05) on Bitcoin, Bitcoin Cash and Litecoin
[155] | Twitter | 4-Sep-2014 to 31-Aug-2018 | Volume of tweets | Cryptocurrency | Granger causality test, Vector Autoregression | Volume, return and realized volatility | Number of previous day tweets are significant drivers of Bitcoin volume and realized volatility, but not returns.
[55] | Twitter | 24-Jun-2019 to 12-Aug-2019 | LM lexicon, stock market opinion lexicon | Cryptocurrency | Regression | Return and volatility | Bitcoin prices are partially predicted by momentum on social media sentiment

Table 9. Cryptocurrency Market Prediction.

5.2.6 Cryptocurrency Market Prediction. Cryptocurrencies have experienced remarkable value growth, surpassing the
most substantial historical bubbles of the past three centuries [2]. More studies have recently been conducted to understand
the dynamics of cryptocurrency market behavior over time, with a primary focus on causality and
correlation tests [87, 132, 155]. Research has demonstrated that investor sentiment holds significant nonlinear predictive
power for the returns of major cryptocurrencies [125]. [80] proposed a hidden Markov model that constructs a transition
matrix from Markov chains on positive and negative sentiment, trading volume, and closing price to predict the upward
or downward market trend. This study also observed that the market tends to respond more to positive sentiments in a
bearish market and more to negative sentiments in a bullish market. However, the study only used Bitcoin,
and other cryptocurrencies were not included in the dataset. [70] analyzed the sentiment of
Chinese social media and its effects on the cryptocurrency market. The study proposed an LSTM-based RNN model to
predict the cryptocurrency price, which achieved better precision and recall than baseline autoregressive models.
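As a simplified illustration of the transition-matrix idea, the Python sketch below estimates first-order transition probabilities from an observed sequence of market directions. It models a plain (observable) Markov chain rather than the hidden Markov model of [80], and the state labels and example sequence are hypothetical.

```python
from collections import Counter

def transition_matrix(states):
    """Estimate first-order Markov transition probabilities from a state sequence."""
    labels = sorted(set(states))
    # Count consecutive (from_state, to_state) pairs.
    pair_counts = Counter(zip(states, states[1:]))
    matrix = {}
    for a in labels:
        row_total = sum(pair_counts[(a, b)] for b in labels)
        # A state with no observed outgoing transition gets an empty row.
        matrix[a] = ({b: pair_counts[(a, b)] / row_total for b in labels}
                     if row_total else {})
    return matrix

# Hypothetical daily market directions ("U" = up, "D" = down).
moves = ["U", "U", "D", "U", "D", "D", "U", "U"]
P = transition_matrix(moves)
```

Here `P["D"]["U"]` is the estimated probability that a down day is followed by an up day. In an HMM the same kind of matrix governs hidden regimes, with sentiment, volume, and price entering as observations.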

5.2.7 Explainable FSA Applications. Explainability is of paramount importance in FSA applications, where
decisions can have significant consequences [16]. [191] classified explanation procedures into textual, visual, by
example, simplification, and feature relevance. The majority of studies in FSA applications emphasize explainability
through visual, feature relevance, and simplification techniques. Specifically, in visual explanation, [33] employs
knowledge graphs to establish visual connections among event entities extracted from stock news articles. This method
provides users with a graphical representation of the relationship between features and their corresponding predictions.
Another approach, presented in [89], conducts deconvolution on the penultimate layer preceding the
output to generate a visual attentive map. This approach, named CLEAR (Class Enhanced Attentive Response), produces
a graphical representation indicating the timeframe during which the stock-picking agent exhibits the highest degree
of attention, along with a separate plot corresponding to the sentiment class of the stock. Regarding feature relevance
explainability, [17] adopts various configurations of a permutation importance technique to prune less significant
technical indicators. Subsequently, decision tree techniques are implemented for stock market forecasting. The proposed
method was compared with LIME and exhibited greater reliability. In a similar vein, [135] employs aspect-based
sentiment analysis to examine the correlation between stock price movement and the most pertinent aspects identified
in tweets. The polarity of each aspect is derived using a SenticNet-based graph convolutional network (GCN) [91]. This
approach mirrors the feature relevance technique, with its focus on discerning top-contributing aspects along with their
associated polarity values. Notably, this work emphasizes the interplay between financial variables rather than making
direct financial predictions. This information serves as a foundation for further analysis, leveraging the relationship
between the price movement of individual stocks and the sentiment associated with popular terms detected in tweets. In
the simplification procedure, [8] integrates sentiment analysis of text with technical analysis of historical stock prices to
train a random forest stock forecasting model, which is further explained through LIME. Furthermore, [60] implements
LIME in conjunction with LSTM-CNN, accurately identifying pivotal words that align with the target sentiment. In
[193], text explanations are generated using the natural language generation transformer
decoder GPT-2 [143], with the added constraint of incorporating specific keywords within the generated text. The
presented methodology, known as soft-constrained dynamic beam allocation (SC-DBA), involves the extraction of
keywords associated with different tiers of anticipated market volatility. This extraction process is facilitated by a
distinct network designed for analyzing harvested news titles. The quantitative evaluation is based
on assessing both the fluency and practical relevance of the generated explanation.
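The permutation importance technique mentioned above for pruning technical indicators can be sketched in plain Python as follows. This is a generic illustration, not the configuration used in [17]: the stand-in model, the synthetic data, the number of repeats, and the pruning threshold of 0.01 are all assumptions.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows for which the model prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the dataset with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical indicators: feature 0 drives the label, feature 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
y = [int(row[0] > 0) for row in X]
model = lambda row: int(row[0] > 0)  # stand-in for a fitted classifier

importances = permutation_importance(model, X, y)
keep = [j for j, imp in enumerate(importances) if imp > 0.01]  # prune the rest
```

Shuffling the noise feature leaves accuracy unchanged, so its importance is zero and it is pruned, whereas shuffling the informative feature reduces accuracy to roughly chance level.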

5.2.8 Summary of FSA Applications. We summarize different FSA applications in Tables 5, 6, 7, 8 and 9 by application
type. The most frequently used data sources in FSA applications are microblogs and news, followed by annual
reports filed by companies. The financial metrics predicted include stock index, stock price, stock price movement, stock
price change, return rate, volatility, stock market crash risk, and FOREX rate change. As for methods for predictive
modeling, similar to many other NLP tasks, deep learning has received more attention than traditional machine learning
(e.g., SVM) and time series modeling (e.g., GARCH) in recent years. Its more promising performance is attributed
to its capability to perform high-level abstraction from complex data, though it is less explainable than feature-based
machine learning. The variants of Recurrent Neural Networks (e.g., Attentive LSTM, DA-RNN) and advancements
in Graph Convolution Networks (e.g., Graph Attention Networks) have demonstrated remarkable state-of-the-art
performance. Meanwhile, Reinforcement Learning has started to be applied to portfolio optimization tasks, and we have
recently observed more work adopting it for intelligent asset allocation. In terms of markets, the US stock market is the
most studied market, and the tasks are moving beyond market movement prediction to risk prediction and portfolio
management. Meanwhile, there has been a notable increase in research aimed at forecasting both the FOREX and
cryptocurrency markets using textual data. In particular, the exploration of the cryptocurrency market is still in its
infancy, primarily centered on causality and correlation tests. Early investigations reveal that the cryptocurrency market
exhibits reduced predictability when compared to the stock market, a trait attributed to the heightened dynamism
inherent in cryptocurrency markets. As for model performance, practical traders agree that any results above 50%
accuracy are value-added to their day-to-day trading [128], and the literature reviewed in our survey has demonstrated
that effectiveness. The evaluation metrics adopted in FSA techniques also apply to FSA applications, depending on
whether the task is regression (e.g., MSE) or classification (e.g., F1-Score). Additionally, trading simulation results
are consistently adopted as a measurement of performance for portfolio management systems, and popular metrics
include accumulated return, average percentage gain per transaction, and Sharpe ratio.
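For reference, the trading-simulation metrics named above can be computed from a series of periodic returns as in the following minimal Python sketch. The annualization factor of 252 trading days, the zero default risk-free rate, and the use of the sample standard deviation are common conventions assumed here; individual studies may define these quantities differently.

```python
import math

def cumulative_return(returns):
    """Compound a sequence of simple periodic returns."""
    value = 1.0
    for r in returns:
        value *= 1.0 + r
    return value - 1.0

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized mean excess return divided by its sample standard deviation.

    Requires at least two returns with non-zero variance.
    """
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((e - mean) ** 2 for e in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)

def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded portfolio value."""
    value, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

# Toy return series, deliberately extreme to make the drawdown visible.
daily = [0.10, -0.50, 0.20]
total = cumulative_return(daily)
drawdown = max_drawdown(daily)
```

In this example the portfolio value moves from 1.0 to 1.1, 0.55, and 0.66, giving a cumulative return of -34% and a maximum drawdown of 50%.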

6 MAIN FINDINGS
6.1 FSA Scopes and Sentiment Types
To address our first research question (what is the scope of FSA in today’s context, and what is the relationship among
FSA, investor sentiment, and market sentiment), we surveyed recent publications. FSA studies have evolved with the
increase of financial textual data over the years, particularly in public news and social media. In today’s context, the
scope of FSA research has been extended to a field of study that not only analyzes people’s sentiments from financial
texts but also investigates the predictive power of financial textual sentiment for the financial market. The objectives and
scopes of FSA techniques are fundamentally different from those of FSA applications, but there is also an interactive relationship
between them. While FSA techniques aim to study the techniques that can improve the performance of various FSA
tasks (e.g., targeted aspect-based FSA) with human-driven annotation, the main objective of FSA applications is to
exploit sentiment for financial applications, such as causality and correlation testing and financial forecasting with
market-driven annotation computed from real-world market data. Here, financial sentiment serves as a proxy of investor
sentiment, which affects the market dynamics. A complex connection exists among FSA, investor sentiment, and
market sentiment. Market sentiment is the aggregated effect of investor sentiment and a reflection of investor sentiment
in investment behaviors. Investor sentiment can be partially measured by financial textual sentiment, sentiment
surveys, and market sentiment. This has established the theoretical foundation for using multiple data sources such as
financial texts, sentiment surveys, and market data to perform financial forecasting.

6.2 Trends in FSA Techniques
Our second research question is: What trends are emerging from the latest tasks, benchmark datasets, and methods
in the newest FSA technique studies? We observed that more benchmark datasets (e.g., SEntFiN in 2022) are
annotated with substantial entries for fine-grained FSA tasks. Meanwhile, the creation of financial lexicons is extending
from conventional word-level to concept-level and direction-dependent expressions. As for methods, the feature
engineering process has factored in domain-specific features such as numbers in finance. Deep learning and
hybrid approaches, which ensemble lexicon, machine learning, and deep learning methods, have shown promising model
performance. Moreover, pre-trained language representation models such as BERT are able to capture general language
representation from large-scale corpora but lack domain-specific knowledge [93]. To improve the domain application of
pre-trained language models, researchers have attempted to train domain-specific pre-trained language models such as
FinBERT [4, 96], but this requires a large domain-specific corpus (e.g., news, corporate reports, earnings call transcripts and
analyst reports) and substantial computing resources. This has pushed the boundary of research in FSA techniques
and improved the model performance significantly. Finally, autoregressive decoder architectures such as GPT have
shown potential in FSA. Bloomberg has unveiled BloombergGPT, a Large Language Model specialized in finance, which
surpasses similarly sized open models in financial NLP tasks, including FSA.

6.3 Trends in FSA Applications
Our third research question is: What data sources, tasks, methods, and financial markets can be used in FSA
application-focused domains? FSA applications have gained more attention than FSA techniques in recent years, largely
owing to the increase of various textual data sources and technologies. The main application of FSA is in causality
and correlation testing and predictive modeling in financial markets, also termed natural language-based financial
forecasting by [183]. While earlier studies focused on causality and correlation testing, the
use of NLP techniques to extract sentiment from financial textual data for financial forecasting is an emerging
research field. We have identified six financial forecasting tasks including stock market movement prediction, stock
market risk prediction, portfolio management, FOREX market prediction, and cryptocurrency market prediction.
Financial sentiment, which serves as a proxy for investor sentiment or non-informational trading, has demonstrated
its effectiveness in financial forecasting, especially in the stock market. Sentiment from the three main sources (i.e.,
corporation-released, media-expressed, and internet-posted texts) has been found to have important effects on market
movement. In particular, negative sentiment has proved to be the strongest influence. Media-expressed
texts (e.g., financial news) are the most commonly used data source, followed by internet-posted texts (e.g., Twitter
and StockTwits) and corporation-released texts (e.g., annual reports). The media-expressed texts have been applied to
all six financial forecasting tasks, while corporation-released texts are used more for stock market risk prediction. A
more recent trend is to combine different sources of sentiment for financial forecasting. Currently, the application of
FSA in financial markets focuses on the stock market, FOREX market, and cryptocurrency market, where the first
is the most commonly studied and the latter two are emerging. Many other financial markets, such as
commodities, money, derivatives, futures, and insurance, have not been explored yet. It is also observed that FSA
applications are shifting from market predictions to trading strategies such as portfolio management to improve practical
applicability. In terms of methods, deep learning and reinforcement learning for financial market prediction are regarded
as among the most attractive topics. In every instance, explainability is paramount in FSA applications due to
the profound consequences that decisions can entail.
1573
1574
6.4 FSA and Financial Forecasting
1575 Our fourth research question is: How financial sentiment is involved in financial forecasting and the focus is on FSA
1576
techniques or applications? We found that financial sentiment can be represented in an explicit or implicit manner for
1577
1578 financial forecasting, with the latter (implicit) being more frequently adopted. The explicit use of financial sentiment
1579 is to derive sentiment polarity, intensity, or sentiment lexicons and use it for predictive modeling while the implicit
1580 use is to generate the features and text representation such as word or event embeddings. The approaches to generate
1581
features and word representations in FSA applications are similar to word representation techniques, summarized under
1582
1583 FSA techniques in Section 4.4.6. The implicit use of sentiment is more popular than explicit use and also can achieve
1584 better performance, largely attributed to the fact that this way carries more complete information than direct sentiment
1585 extraction. Notably, the FSA model demonstrates potential in the filtration of textual data, allowing for the retention of
1586
more pertinent information crucial for financial forecasting. Empirical studies also indicate that embeddings generated
1587
1588 from models trained on market sentiment exhibit superior performance in financial forecasting compared to those
1589 trained on semantic sentiment. A thorough investigation into the interplay between financial sentiment models and
1590 their influence on the performance of financial applications is a crucial area for further exploration. Meanwhile, the
1591
algorithms and evaluation metrics adopted in FSA techniques can also be applied to financial forecasting except for
1592
1593 complex application tasks such as portfolio management which requires reinforcement learning and trading simulations
1594 in many studies. Our view is that the application of financial sentiment in predictive modeling also can be regarded as
1595 an FSA task that uses market-driven annotation, as compared to the human-driven annotation in the research in FSA
1596
techniques.
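The notion of market-driven annotation can be illustrated with a minimal Python sketch that derives labels from price data instead of human judgment. The labeling rule (sign of the next-day return with a neutral band) and the threshold value are assumptions chosen for illustration; the surveyed studies use a variety of horizons and thresholds.

```python
def market_driven_labels(prices, threshold=0.0):
    """Label each day by the sign of the next-day return.

    Returns one 'positive' / 'negative' / 'neutral' label per day except the
    last, for which no next-day price exists.
    """
    labels = []
    for today, tomorrow in zip(prices, prices[1:]):
        ret = (tomorrow - today) / today
        if ret > threshold:
            labels.append("positive")
        elif ret < -threshold:
            labels.append("negative")
        else:
            labels.append("neutral")
    return labels

# Hypothetical closing prices; texts published on day t would receive labels[t].
closes = [100.0, 102.0, 101.0, 101.0]
labels = market_driven_labels(closes, threshold=0.005)
```

Texts published on a given day inherit that day's label, so a sentiment model trained on such pairs learns market reactions rather than human-perceived polarity.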

7 CHALLENGES AND FUTURE DIRECTIONS
7.1 FSA Techniques
Firstly, as elaborated by [106], [4] and [96], there is a lack of high-quality and large-scale open-source finance
domain-specific annotations for FSA. The main challenge is that the creation of FSA benchmark datasets is usually expensive
and requires expert knowledge [96, 106]. The research in fine-grained FSA has gained more attention after the release
of the SemEval 2017 Task 5 and FiQA Task 1 datasets. In terms of data annotation, the nested target annotation schema,
which is proposed by [100] and goes beyond the traditional target, aspect, and sentiment annotation, could open a
new space for FSA. Secondly, lexical resources are limited and scattered. Since finance is a highly professional domain,
general-purpose sentiment lexicons usually fail to take into account the domain-specific connotations and the heavy
reference to prior knowledge. For example, words like “liability” and “debt” are considered negative in general-purpose
1613 sentiment analysis, but are frequent and have neutral meanings in the financial context. This makes it difficult to
1614
generalize the sentiment classifiers and underlines the need for finance domain-specific sentiment analysis [97]. Further,
1615
1616
sentiment intensity scores are more consequential and nuanced for FSA compared to other domains. Whereas most of
1617 the current FSA studies still adopt a polarity detection fashion (i.e., classification to positive or negative). Further, it is
1618 important to improve the capability of generalization for BERT-based models. The successful application of FinBERT
1619
in FSA tasks is largely dependent on the corpora used to pre-train the language model. Presently, financial news,
1620
1621
annual reports, earning call transcripts, and analyst reports are adopted by variant FinBERT but microblog corpora
1622 have not been explored. Next, the incorporation of knowledge and adoption of GCN are demonstrated to be useful
1623 in sentiment analysis but few of the earlier research have attempted to incorporate knowledge in FSA, which could
1624
be a promising direction for research in FSA techniques. Finally, incorporating linguistic intuitions in deep learning
1625
1626
models [56, 65, 115] is another direction to improve the rationality of model design, because many deep learning-based
1627 methods focused on algorithm novelties and ignored linguistic intuitions. The future study could focus on improving
1628 the six areas that cause FSA fail, i.e., irrealis mood (conditional mood, subjunctive mood, imperative mood), rhetoric
1629
(negative assertion, personification, sarcasm), dependent opinion, unspecified aspects, unrecognized words (entity,
1630
1631
microtext, jargons), and external reference. Future research can explore the incorporation of multimodal data into FSA,
1632 such as text, images, audio, and video, to gain a more comprehensive understanding of financial sentiment. This could
1633 involve analyzing earnings call transcripts, financial news articles, social media posts, and even multimedia content.
1634
Also, the contextual FSA which is to develop methods that can understand and analyze the context in which financial
1635
1636
sentiments are expressed will be crucial. This could involve sentiment disambiguation, where sentiment is understood
1637 in relation to specific events, companies, or market conditions.
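
The lexicon mismatch noted above, where “liability” and “debt” are scored as negative by a general-purpose resource but are neutral in financial text, can be illustrated with a minimal sketch. Both lexicons below are hypothetical toy dictionaries written for this illustration; they are not drawn from any published lexicon, and the averaging rule is only one of many possible scoring schemes.

```python
import re

# Hypothetical toy lexicons for illustration only (not a published resource).
GENERAL_LEXICON = {"liability": -1.0, "debt": -1.0, "loss": -1.0, "growth": 1.0}
# Assumed finance-aware variant: "liability" and "debt" are neutral accounting terms.
FINANCE_LEXICON = {"liability": 0.0, "debt": 0.0, "loss": -1.0, "growth": 1.0}


def lexicon_score(text, lexicon):
    """Return the mean polarity of tokens found in the lexicon (0.0 if none match)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


sentence = "The company refinanced its debt and reported a lower tax liability."
general = lexicon_score(sentence, GENERAL_LEXICON)  # both matched words count as negative
finance = lexicon_score(sentence, FINANCE_LEXICON)  # both matched words count as neutral
print(general, finance)
```

Under these assumptions, the same factual sentence is labeled negative by the general-purpose lexicon and neutral by the finance-aware one, which is the kind of discrepancy that motivates domain-specific lexical resources.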

7.2 FSA Applications

One challenge for FSA applications is the lack of publicly released textual data sources that form time series with a
substantial amount of financial text and cover representative periods, which are needed to model the relationships
between investor sentiment and financial markets. Meanwhile, the attention of FSA applications is moving beyond the
stock market to other financial markets and shifting from stock market prediction to trading strategies such as
portfolio management to improve practical applicability. The adoption of reinforcement learning could open a new avenue
for portfolio management, but it remains relatively unexplored. Further, earlier studies focus on news sentiment,
whereas the emotions derived from news could also influence investors’ behaviors in financial markets [44]. Moreover,
the financial market is driven by different types of news, including macroeconomic factors, geopolitics, and
company-specific factors [170], which means that sentiment can be derived from different perspectives, such as
macroeconomic, microstructure, event-oriented, and company-specific [99]. From this perspective, aspect-based FSA could
be adopted to extract aspect-level features to forecast future market performance [186], which could also improve model
interpretability and explainability. Explainability is of paramount importance in FSA applications, where decisions can
entail significant consequences. Methods for generating human-understandable explanations for model predictions will
become a focus. With the rise of cryptocurrencies and emerging markets, there will likely be growing interest in
sentiment analysis tailored to these specific assets and markets. FSA can continue to play a crucial role in assessing
risk and managing portfolios. One potential area of future research is to develop models that can provide more accurate
risk assessments and aid in making more informed investment decisions. Lastly, future research could delve deeper into
the intersection of behavioral economics and FSA, exploring the psychological factors that influence sentiment and
decision-making in financial markets.
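
As a rough illustration of the aspect-level features mentioned above, the following sketch aggregates aspect-tagged sentiment scores into one feature vector per day. The records, the aspect names, the score range, and the choice to fill days without text for an aspect with 0.0 are all assumptions made for this example; it does not reproduce the method of any cited work.

```python
from collections import defaultdict

# Hypothetical aspect-level sentiment records: (date, aspect, score in [-1, 1]).
records = [
    ("2024-01-02", "macroeconomic", -0.5),
    ("2024-01-02", "company-specific", 0.75),
    ("2024-01-02", "company-specific", 0.25),
    ("2024-01-03", "macroeconomic", 0.125),
]

# Assumed fixed aspect order, so every day yields a vector of the same length.
ASPECTS = ["macroeconomic", "company-specific"]


def daily_aspect_features(records, aspects):
    """Map each date to the mean sentiment per aspect (0.0 when an aspect has no text)."""
    buckets = defaultdict(list)
    for date, aspect, score in records:
        buckets[(date, aspect)].append(score)
    features = {}
    for date in sorted({r[0] for r in records}):
        row = []
        for aspect in aspects:
            scores = buckets.get((date, aspect), [])
            row.append(sum(scores) / len(scores) if scores else 0.0)
        features[date] = row
    return features


features = daily_aspect_features(records, ASPECTS)
print(features)
```

Because each column corresponds to a named aspect, a downstream forecasting model trained on such vectors can attribute its output to, for example, macroeconomic versus company-specific sentiment, which is one way aspect-level features may support interpretability.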

8 CONCLUSION

This survey conducted a comprehensive review of FSA research from both technique and application perspectives,
including their interactive relationship. The scope of FSA research has been redefined, and the relationship among FSA,
investor sentiment, and market sentiment has been conceptualized. Specifically, in reviewing FSA techniques, we covered
the latest benchmark datasets and the methods used for FSA, which include lexicon-based, machine learning, deep
learning, and hybrid approaches, pre-trained language models, and word representation techniques. As for FSA
applications, we found that the main FSA applications in financial markets are hypothesis testing and predictive
modeling. Predictive modeling has received more attention in recent years, particularly in the stock market and FOREX
market. In terms of tasks in predictive modeling, the application in the stock market has moved beyond traditional
market movement prediction to market risk prediction and portfolio management. Market prediction is typically treated
as a classification or regression problem, whereas portfolio management incorporates decision-making and trading
strategies, which improves practical applicability. The FOREX market has been studied less than the stock market but
has become an emerging field. In terms of methods for predictive modeling, machine learning, deep learning, and
reinforcement learning have become the three mainstream approaches.

ACKNOWLEDGMENTS

The authors would like to thank Sunisth Kumar, who helped collect and review some papers about FSA applications under
the NTU-India Connect Programme.

REFERENCES

[1] Md Shad Akhtar, Abhishek Kumar, Deepanway Ghosal, Asif Ekbal, and Pushpak Bhattacharyya. 2017. A multilayer perceptron based ensemble technique for fine-grained financial sentiment analysis. In Proceedings of EMNLP. 540–546.
[2] Khamis Hamed Al-Yahyaee, Mobeen Ur Rehman, Walid Mensi, and Idries Mohammad Wanas Al-Jarrah. 2019. Can uncertainty indices predict Bitcoin prices? A revisited analysis using partial and multivariate wavelet approaches. The North American Journal of Economics and Finance 49 (2019), 47–56.
[3] M Ángeles López-Cabarcos, Ada M Pérez-Pico, Maria Luisa López Perez, et al. 2020. Investor sentiment in the theoretical field of behavioural finance. Economic Research-Ekonomska Istraživanja 33, 1 (2020), 2101–2228.
[4] Dogu Araci. 2019. FinBERT: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063 (2019).
[5] Mattia Atzeni, Amna Dridi, and Diego Reforgiato Recupero. 2017. Fine-grained sentiment analysis on financial microblogs and news headlines. In Semantic Web Evaluation Challenge. Springer, 124–128.
[6] Malcolm Baker and Jeffrey Wurgler. 2006. Investor sentiment and the cross-section of stock returns. The Journal of Finance 61, 4 (2006), 1645–1680.
[7] Malcolm Baker and Jeffrey Wurgler. 2007. Investor sentiment in the stock market. Journal of Economic Perspectives 21, 2 (2007), 129–152.
[8] Harit Bandi, Suyash Joshi, Siddhant Bhagat, and Dayanand Ambawade. 2021. Integrated technical and sentiment analysis tool for market index movement prediction, comprehensible using XAI. In 2021 ICCICT. IEEE, 1–8.
[9] Arindam Bandopadhyaya and Anne Leah Jones. 2006. Measuring investor sentiment in equity markets. Journal of Asset Management 7, 3 (2006), 208–215.
[10] Nicholas Barberis, Andrei Shleifer, and Robert Vishny. 1998. A model of investor sentiment. Journal of Financial Economics 49, 3 (1998), 307–343.
[11] Anna Blajer-Golebiewska, Dagmara Wach, and Maciej Kos. 2018. Financial risk information avoidance. Economic Research-Ekonomska Istraživanja 31, 1 (2018), 521–536.
[12] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5 (2017), 135–146.
[13] Johan Bollen, Huina Mao, and Xiaojun Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational Science 2, 1 (2011), 1–8.
[14] Robert M Bowen, Angela K Davis, and Dawn A Matsumoto. 2002. Do conference calls affect analysts’ forecasts? The Accounting Review 77, 2 (2002), 285–316.
[15] Erik Cambria, Qian Liu, Sergio Decherchi, Frank Xing, and Kenneth Kwok. 2022. SenticNet 7: A Commonsense-based Neurosymbolic AI Framework for Explainable Sentiment Analysis. In LREC. 3829–3839.
[16] Erik Cambria, Rui Mao, Melvin Chen, Zhaoxia Wang, and Seng-Beng Ho. 2023. Seven Pillars for the Future of Artificial Intelligence. IEEE Intelligent Systems 38, 6 (2023), 62–69.
[17] Salvatore Carta, Alessandro Sebastian Podda, Diego Reforgiato Recupero, and Maria Madalina Stanciu. 2021. Explainable AI for financial forecasting. In International Conference on Machine Learning, Optimization, and Data Science. Springer, 51–69.
[18] Samuel WK Chan and Mickey WC Chong. 2017. Sentiment analysis in financial texts. Decision Support Systems 94 (2017), 53–64.
[19] Jaebin Jay Chang. 2020. Natural Language Processing as a Predictive Feature in Financial Forecasting. (2020).
[20] Mingqin Chen, Zhenhua Zhang, Jiawen Shen, Zhijian Deng, Jiaxing He, and Shiting Huang. 2020. A Quantitative Investment Model Based on Random Forest and Sentiment Analysis. Journal of Physics: Conference Series 1575, 1 (2020), 012083. https://doi.org/10.1088/1742-6596/1575/1/012083
[21] Siyi Chen and Frank Xing. 2023. Understanding Emojis for Financial Sentiment Analysis. In Proceedings of the 44th International Conference on Information Systems (ICIS). 1–16.
[22] Yingmei Chen, Zhongyu Wei, and Xuanjing Huang. 2018. Incorporating corporation relationship via graph convolutional neural networks for stock price prediction. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 1655–1658.
[23] Davide Chicco and Giuseppe Jurman. 2020. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21, 1 (2020), 1–13.
[24] Lauren Cohen, Dong Lou, and Christopher J Malloy. 2020. Casting conference calls. Management Science 66, 11 (2020), 5015–5039.
[25] Keith Cortis, André Freitas, Tobias Daudert, Manuela Huerlimann, Manel Zarrouk, Siegfried Handschuh, and Brian Davis. 2017. SemEval-2017 Task 5: Fine-grained sentiment analysis on financial microblogs and news. Association for Computational Linguistics (ACL).
[26] Huijie Cui and Yanan Zhang. 2020. Does investor sentiment affect stock price crash risk? Applied Economics Letters 27, 7 (2020), 564–568.
[27] Zhi Da, Joseph Engelberg, and Pengjie Gao. 2011. In search of attention. The Journal of Finance 66, 5 (2011), 1461–1499.
[28] Sanjiv R. Das and Mike Y. Chen. 2007. Yahoo! For Amazon: Sentiment Extraction from Small Talk on the Web. Management Science 53, 9 (2007), 1375–1388.
[29] Tobias Daudert. 2022. A multi-source entity-level sentiment corpus for the financial domain: the FinLin corpus. Language Resources and Evaluation 56, 1 (2022), 333–356.
[30] Dayan de França Costa and Nadia Felix Felipe da Silva. 2018. INF-UFG at FiQA 2018 Task 1: predicting sentiments and aspects on financial tweets and news headlines. In Companion Proceedings of the The Web Conference 2018. 1967–1971.
[31] Roger S Debreceny, Asheq Rahman, and Tawei Wang. 2021. Is User-Generated Twittersphere Activity Associated with Stock Market Reactions to 8-K Filings? Journal of Information Systems 35, 2 (2021), 195–217.
[32] Shangkun Deng, Takashi Mitsubuchi, Kei Shioda, Tatsuro Shimada, and Akito Sakurai. 2011. Combining technical analysis with sentiment analysis for stock price prediction. In IEEE DASC. 800–807.
[33] Shumin Deng, Ningyu Zhang, Wen Zhang, Jiaoyan Chen, Jeff Z Pan, and Huajun Chen. 2019. Knowledge-driven stock trend prediction and explanation via temporal convolutional network. In Companion Proceedings of The 2019 World Wide Web Conference. 678–685.
[34] Justina Deveikyte, Hélyette Geman, Carlo Piccari, and Alessandro Provetti. 2020. A Sentiment Analysis Approach to the Prediction of Market Volatility. arXiv preprint arXiv:2012.05906 (2020).
[35] Ann Devitt and Khurshid Ahmad. 2007. Sentiment polarity identification in financial news: A cohesion-based approach. In Proceedings of ACL. 984–991.
[36] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT. 4171–4186.
[37] Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. 2014. Using structured events to predict stock price movement: An empirical investigation. In Proceedings of EMNLP. 1415–1425.
[38] Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. 2015. Deep learning for event-driven stock prediction. In IJCNN.
[39] Adam Drake, Eric Ringger, and Dan Ventura. 2008. Sentiment regression: Using real-valued scores to summarize overall document sentiment. In IEEE ICSC. 152–157.
[40] Amna Dridi, Mattia Atzeni, and Diego Reforgiato Recupero. 2019. FineNews: fine-grained semantic sentiment analysis on financial microblogs and news. International Journal of Machine Learning and Cybernetics (2019), 1–9.
[41] Kelvin Du, Frank Xing, and Erik Cambria. 2023. Incorporating Multiple Knowledge Sources for Targeted Aspect-based Financial Sentiment Analysis. ACM Transactions on Management Information Systems 14, 3 (2023), 1–24. https://doi.org/10.1145/3580480
[42] Kelvin Du, Frank Xing, Rui Mao, and Erik Cambria. 2023. FinSenticNet: A Concept-Level Lexicon for Financial Sentiment Analysis. In IEEE SSCI. 109–114.
[43] Xin Du and Kumiko Tanaka-Ishii. 2020. Stock embeddings acquired from news articles and price history, and an application to portfolio optimization. In Proceedings of ACL. 3353–3363.
[44] Darren Duxbury, Tommy Gärling, Amelie Gamble, and Vian Klass. 2020. How emotions influence behavior in financial markets: a conceptual analysis and emotion-based account of buy-sell preferences. The European Journal of Finance 26, 14 (2020), 1417–1438.
[45] Robert F Engle and Victor K Ng. 1993. Measuring and testing the impact of news on volatility. The Journal of Finance 48, 5 (1993), 1749–1778.
[46] Martin DD Evans and Richard K Lyons. 2005. Do currency markets absorb news quickly? Journal of International Money and Finance 24, 2 (2005), 197–217.
[47] Eugene F Fama. 1970. Efficient capital markets: A review of theory and empirical work. The Journal of Finance 25, 2 (1970), 383–417.
[48] Eugene F Fama. 1991. Efficient capital markets: II. The Journal of Finance 46, 5 (1991), 1575–1617.
[49] Weiguo Fan and Michael D Gordon. 2014. The power of social media analytics. Commun. ACM 57, 6 (2014), 74–81.
[50] Georgios Fatouros, John Soldatos, Kalliopi Kouroumali, Georgios Makridis, and Dimosthenis Kyriazis. 2023. Transforming Sentiment Analysis in the Financial Domain with ChatGPT. Machine Learning with Applications 14 (2023), 100508.
[51] Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database. Bradford Books.
[52] Paul Ferguson, Neil O’Hare, Michael Davy, Adam Bermingham, Paraic Sheridan, Cathal Gurrin, and Alan F Smeaton. 2009. Exploring the use of paragraph-level annotations for sentiment analysis of financial blogs. (2009).
[53] Kenneth L Fisher and Meir Statman. 2000. Investor sentiment and stock returns. Financial Analysts Journal 56, 2 (2000), 16–23.
[54] Richard Frankel, Marilyn Johnson, and Douglas J Skinner. 1999. An empirical examination of conference calls as a voluntary disclosure medium. Journal of Accounting Research 37, 1 (1999), 133–150.
[55] Xiang Gao, Weige Huang, and Hua Wang. 2021. Financial Twitter Sentiment on Bitcoin Return and High-Frequency Volatility. Virtual Economics 4, 1 (2021), 7–18.
[56] Mengshi Ge, Rui Mao, and Erik Cambria. 2022. Explainable Metaphor Identification Inspired by Conceptual Metaphor Theory. In Proceedings of AAAI. 10681–10689.
[57] Manoochehr Ghiassi, James Skinner, and David Zimbra. 2013. Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network. Expert Systems with Applications 40, 16 (2013), 6266–6282.
[58] Deepanway Ghosal, Shobhit Bhatnagar, Md Shad Akhtar, Asif Ekbal, and Pushpak Bhattacharyya. 2017. IITP at SemEval-2017 task 5: an ensemble of deep learning and feature based models for financial sentiment analysis. In Proceedings of SemEval-2017. 899–903.
[59] Pushpendu Ghosh, Ariel Neufeld, and Jajati Keshari Sahoo. 2022. Forecasting directional movements of stock prices for intraday trading using LSTM and random forests. Finance Research Letters 46 (2022), 102280.
[60] Shilpa Gite, Hrituja Khatavkar, Ketan Kotecha, Shilpi Srivastava, Priyam Maheshwari, and Neerav Pandey. 2021. Explainable stock prices prediction from financial news articles using sentiment analysis. PeerJ Computer Science 7 (2021), e340.
[61] Alex Graves and Jürgen Schmidhuber. 2005. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks 18, 5-6 (2005), 602–610.
[62] John Griffith, Mohammad Najand, and Jiancheng Shen. 2020. Emotions in the stock market. Journal of Behavioral Finance 21, 1 (2020), 42–56.
[63] Xinyi Guo and Jinfeng Li. 2019. A Novel Twitter Sentiment Analysis Model with Baseline Correlation for Financial Market Prediction with Improved Efficiency. In SNAMS. 472–477.
[64] Felix Hamborg, Karsten Donnay, and Bela Gipp. 2021. Towards target-dependent sentiment classification in news articles. In Diversity, Divergence, Dialogue: 16th International Conference, iConference 2021, Proceedings, Part II 16. Springer, 156–166.
[65] Sooji Han, Rui Mao, and Erik Cambria. 2022. Hierarchical Attention Network for Explainable Depression Detection on Twitter Aided by Metaphor Concept Mappings. In Proceedings of COLING. 94–104.
[66] Elaine Henry. 2008. Are investors influenced by how earnings press releases are written? The Journal of Business Communication (1973) 45, 4 (2008), 363–407.
[67] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[68] Jeremy Howard and Sebastian Ruder. 2018. Universal Language Model Fine-tuning for Text Classification. In ACL. 328–339.
[69] Nan Hu, Peng Liang, and Xu Yang. 2023. Whetting All Your Appetites for Financial Tasks with One Meal from GPT? A Comparison of GPT, FinBERT, and Dictionaries in Evaluating Sentiment Analysis. SSRN: https://ssrn.com/abstract=4426455.
[70] Xin Huang, Wenbin Zhang, Yiyi Huang, Xuejiao Tang, Mingli Zhang, Jayachander Surbiryala, Vasileios Iosifidis, Zhen Liu, and Ji Zhang. 2021. LSTM Based Sentiment Analysis for Cryptocurrency Prediction. In DASFAA.
[71] Clayton Hutto and Eric Gilbert. 2014. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of AAAI, Vol. 8.
[72] Hitkul Jangid, Shivangi Singhal, Rajiv Ratn Shah, and Roger Zimmermann. 2018. Aspect-based financial sentiment analysis using deep learning. In Companion Proceedings of the The Web Conference 2018. 1961–1966.
[73] Mengxiao Jiang, Man Lan, and Yuanbin Wu. 2017. ECNU at SemEval-2017 task 5: An ensemble of regression algorithms with effective features for fine-grained sentiment analysis in financial domain. In Proceedings of SemEval-2017. 888–893.
[74] Zhengyao Jiang, Dixing Xu, and Jinjun Liang. 2017. A deep reinforcement learning framework for the financial portfolio management problem. arXiv preprint arXiv:1706.10059 (2017).
[75] Fang Jin, Nathan Self, Parang Saraf, Patrick Butler, Wei Wang, and Naren Ramakrishnan. 2013. Forex-foreteller: Currency trend modeling using news articles. In Proceedings of ACM SIGKDD. 1470–1473.
[76] Zhigang Jin, Yang Yang, and Yuhong Liu. 2020. Stock closing price prediction based on sentiment analysis and LSTM. Neural Computing and Applications 32 (07 2020).
[77] Sudipta Kar, Suraj Maharjan, and Thamar Solorio. 2017. RiTUAL-UH at SemEval-2017 Task 5: Sentiment Analysis on Financial Data Using Neural Networks. In Proceedings of SemEval-2017. 877–882.
[78] Colm Kearney and Sha Liu. 2014. Textual sentiment in finance: A survey of methods and models. International Review of Financial Analysis 33 (2014), 171–185.
[79] Katherine Keith and Amanda Stent. 2019. Modeling Financial Analysts’ Decision Making via the Pragmatics and Semantics of Earnings Calls. In Proceedings of ACL. 493–503.
[80] Kwansoo Kim, Sang-Yong Lee, and Saïd Assar. 2021. The dynamics of cryptocurrency market behavior: sentiment analysis using Markov chains. Industrial Management and Data Systems ahead-of-print (11 2021).
[81] Soon-Ho Kim and Dongcheol Kim. 2014. Investor sentiment from internet message postings and the predictability of stock returns. Journal of Economic Behavior & Organization 107 (2014), 708–729.
[82] Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of EMNLP. 1746–1751.
[83] Shimon Kogan, Dimitry Levin, Bryan R Routledge, Jacob S Sagi, and Noah A Smith. 2009. Predicting risk from financial reports with regression. In Proceedings of NAACL-HLT. 272–280.
[84] Moshe Koppel and Itai Shtrimberg. 2006. Good news or bad news? Let the market decide. In Computing Attitude and Affect in Text: Theory and Applications. Springer, 297–301.
[85] Prahlad Koratamaddi, Karan Wadhwani, Mridul Gupta, and Sriram G. Sanjeevi. 2021. Market sentiment-aware deep reinforcement learning approach for stock portfolio allocation. Engineering Science and Technology, an International Journal (2021).
[86] Shinta Koyano and Kazushi Ikeda. 2017. Online portfolio selection based on the posts of winners and losers in stock microblogs. In IEEE SSCI.
[87] Olivier Kraaijeveld and Johannes De Smedt. 2020. The predictive power of public Twitter sentiment for forecasting cryptocurrency prices. Journal of International Financial Markets, Institutions and Money 65 (2020), 101188.
[88] Srikumar Krishnamoorthy. 2018. Sentiment analysis of financial news articles using performance indicators. Knowledge and Information Systems 56, 2 (2018), 373–394.
[89] Devinder Kumar, Graham W Taylor, and Alexander Wong. 2017. Opening the black box of financial AI with CLEAR-Trade: A class-enhanced attentive response approach for explaining and visualizing deep learning-driven stock market prediction. arXiv preprint arXiv:1709.01574 (2017).
[90] Helinä Laakkonen. 2004. The impact of macroeconomic news on exchange rate volatility. Bank of Finland discussion paper 24 (2004).
[91] Bin Liang, Hang Su, Lin Gui, Erik Cambria, and Ruifeng Xu. 2022. Aspect-based sentiment analysis via affective knowledge enhanced graph convolutional networks. Knowledge-Based Systems 235 (2022), 107643.
[92] Bing Liu. 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies 5, 1 (2012), 1–167.
[93] Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng, and Ping Wang. 2020. K-BERT: Enabling language representation with knowledge graph. In Proceedings of AAAI, Vol. 34. 2901–2908.
[94] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:1907.11692 [cs.CL]
[95] Yang Liu, Qingguo Zeng, Huanrui Yang, and Adrian Carrio. 2018. Stock price movement prediction from financial news with deep learning and knowledge graph embedding. In Pacific Rim Knowledge Acquisition Workshop. Springer, 102–113.
[96] Zhuang Liu, Degen Huang, Kaiyu Huang, Zhuang Li, and Jun Zhao. 2020. FinBERT: A Pre-trained Financial Language Representation Model for Financial Text Mining. In IJCAI. 4513–4519.
[97] Tim Loughran and Bill McDonald. 2011. When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance 66, 1 (2011), 35–65.
[98] Brian M Lucey and Michael Dowling. 2005. The role of feelings in investor decision-making. Journal of Economic Surveys 19, 2 (2005), 211–237.
[99] Ling Luo, Xiang Ao, Feiyang Pan, Jin Wang, Tong Zhao, Ningzi Yu, and Qing He. 2018. Beyond Polarity: Interpretable Financial Sentiment Analysis with Hierarchical Query-driven Attention. In IJCAI. 4244–4250.
[100] Yun Luo, Hongjie Cai, Linyi Yang, Yanxia Qin, Rui Xia, and Yue Zhang. 2022. Challenges for Open-domain Targeted Sentiment Analysis. arXiv preprint arXiv:2204.06893 (2022).
[101] Yu Ma, Rui Mao, Qika Lin, Peng Wu, and Erik Cambria. 2023. Multi-source Aggregated Classification for Stock Price Movement Prediction. Information Fusion 91 (2023), 515–528.
[102] Yu Ma, Rui Mao, Qika Lin, Peng Wu, and Erik Cambria. 2024. Quantitative Stock Portfolio Optimization by Multi-task Learning Risk and Return. Information Fusion 104 (2024), 102165.
[103] Macedo Maia, Siegfried Handschuh, André Freitas, Brian Davis, Ross McDermott, Manel Zarrouk, and Alexandra Balahur. 2018. WWW’18 open challenge: financial opinion mining and question answering. In Companion Proceedings of the The Web Conference 2018. 1941–1942.
[104] Lorenzo Malandri, Frank Z Xing, Carlotta Orsenigo, Carlo Vercellis, and Erik Cambria. 2018. Public mood–driven asset allocation: The importance of financial sentiment in portfolio management. Cognitive Computation 10, 6 (2018), 1167–1176.
[105] Burton G Malkiel. 2003. The efficient market hypothesis and its critics. Journal of Economic Perspectives 17, 1 (2003), 59–82.
[106] Pekka Malo, Ankur Sinha, Pekka Korhonen, Jyrki Wallenius, and Pyry Takala. 2014. Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology 65, 4 (2014), 782–796.
[107] Xiliu Man, Tong Luo, and Jianwu Lin. 2019. Financial sentiment analysis (FSA): A survey. In IEEE ICPS. 617–622.
[108] Christopher D Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of ACL: system demonstrations. 55–60.
[109] Youness Mansar, Lorenzo Gatti, Sira Ferradans, Marco Guerini, and Jacopo Staiano. 2017. Fortia-FBK at SemEval-2017 Task 5: Bullish or Bearish? Inferring Sentiment towards Brands from Financial News Headlines. In Proceedings of SemEval-2017. 817–822.
[110] Huina Mao, Pengjie Gao, Yongxiang Wang, and Johan Bollen. 2014. Automatic construction of financial semantic orientation lexicon from large-scale Chinese news corpus. Institut Louis Bachelier 20, 2 (2014), 1–18.
[111] Rui Mao, Kelvin Du, Yu Ma, Luyao Zhu, and Erik Cambria. 2023. Discovering the Cognition behind Language: Financial Metaphor Analysis with MetaPro. In IEEE ICDM. IEEE, 1211–1216.
[112] Rui Mao and Xiao Li. 2021. Bridging Towers of Multitask Learning with a Gating Mechanism for Aspect-based Sentiment Analysis and Sequential Metaphor Identification. In Proceedings of AAAI. 13534–13542.
[113] Rui Mao, Xiao Li, Kai He, Mengshi Ge, and Erik Cambria. 2023. MetaPro Online: A Computational Metaphor Processing Online System. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), Vol. 3. 127–135.
[114] Rui Mao, Chenghua Lin, and Frank Guerin. 2018. Word Embedding and WordNet Based Metaphor Identification and Interpretation. In Proceedings of ACL, Vol. 1. 1222–1231.
[115] Rui Mao, Chenghua Lin, and Frank Guerin. 2019. End-to-End Sequential Metaphor Identification Inspired by Linguistic Theories. In Proceedings of ACL. 3888–3898.
[116] Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. 2017. Learned in translation: Contextualized word vectors. NeurIPS 30 (2017).
[117] Oren Melamud, Jacob Goldberger, and Ido Dagan. 2016. context2vec: Learning generic context embedding with bidirectional LSTM. In Proceedings of SIGNLL. 51–61.
[118] G Mujtaba Mian and Srinivasan Sankaraguruswamy. 2012. Investor sentiment and stock market response to earnings news. The Accounting Review 87, 4 (2012), 1357–1384.
[119] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. NeurIPS 26 (2013).
[120] Karo Moilanen, Stephen Pulman, and Yue Zhang. 2010. Packed feelings and ordered sentiments: Sentiment parsing with quasi-compositional polarity sequencing and compression. In Proc. WASSA Workshop at ECAI. 36–43.
[121] Rodrigo Moraes, João Francisco Valiati, and Wilson P Gavião Neto. 2013. Document-level sentiment classification: An empirical comparison between SVM and ANN. Expert Systems with Applications 40, 2 (2013), 621–633.
[122] Antonio Moreno-Ortiz, Javier Fernández-Cruz, and Chantal Pérez Hernández. 2020. Design and evaluation of SentiEcon: A fine-grained economic/financial sentiment lexicon from a corpus of business news. In Proceedings of LREC. 5065–5072.
[123] Mohamed M Mostafa. 2013. More than words: Social networks’ text mining for consumer brand sentiments. Expert Systems with Applications 40, 10 (2013), 4241–4251.
[124] Andrius Mudinas, Dell Zhang, and Mark Levene. 2019. Market trend prediction using sentiment analysis: lessons learned and paths forward. arXiv preprint arXiv:1903.05440 (2019).
[125] Muhammad Abubakr Naeem, Imen Mbarki, and Syed Jawad Hussain Shahzad. 2021. Predictive role of online investor sentiment for cryptocurrency
1899
market: Evidence from happiness and fears. International Review of Economics & Finance 73 (2021), 496–514.
1900
[126] Usman Naseem, Imran Razzak, Shah Khalid Khan, and Mukesh Prasad. 2021. A comprehensive survey on word representation models: From
1901
classical to state-of-the-art word representation language models. Transactions on Asian and Low-Resource Language Information Processing 20, 5
1902 (2021), 1–35.
1903 [127] Arman Khadjeh Nassirtoussi, Saeed Aghabozorgi, Teh Ying Wah, and David Chek Ling Ngo. 2014. Text mining for market prediction: A systematic
1904 review. Expert Systems with Applications 41, 16 (2014), 7653–7670.
1905 [128] Arman Khadjeh Nassirtoussi, Saeed Aghabozorgi, Teh Ying Wah, and David Chek Ling Ngo. 2015. Text mining of news-headlines for FOREX
1906 market prediction: A Multi-layer Dimension Reduction Algorithm with semantics and sentiment. Expert Systems with Applications 42, 1 (2015),
1907 306–324.
1908
[129] Thien Hai Nguyen and Kiyoaki Shirai. 2015. Topic modeling based sentiment analysis on social media for stock market prediction. In Proceedings
of ACL-IJCNLP. 1354–1364.
1909
[130] Clemens Nopp and Allan Hanbury. 2015. Detecting risks in the banking system by sentiment analysis. In Proceedings of EMNLP. 591–600.
1910
[131] Neil O’Hare, Michael Davy, Adam Bermingham, Paul Ferguson, Páraic Sheridan, Cathal Gurrin, and Alan F Smeaton. 2009. Topic-dependent
1911
sentiment analysis of financial blogs. In Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion. 9–16.
[132] Sotirios Oikonomopoulos, Katerina Tzafilkou, Dimitrios Karapiperis, and Vassilios Verykios. 2022. Cryptocurrency Price Prediction using Social Media Sentiment Analysis. In 2022 13th International Conference on Information, Intelligence, Systems & Applications (IISA). IEEE, 1–8.
[133] Nuno Oliveira, Paulo Cortez, and Nelson Areal. 2016. Stock market sentiment lexicon acquisition using microblogging data and statistical measures. Decision Support Systems 85 (2016), 62–73.
[134] Nuno Oliveira, Paulo Cortez, and Nelson Areal. 2017. The impact of microblogging data for stock market prediction: Using Twitter to predict returns, volatility, trading volume and survey sentiment indices. Expert Systems with Applications 73 (2017), 125–144.
[135] Keane Ong, Wihan van der Heever, Ranjan Satapathy, Gianmarco Mengaldo, and Erik Cambria. 2023. FinXABSA: Explainable Finance through Aspect-Based Sentiment Analysis. In 2023 IEEE International Conference on Data Mining Workshops (ICDMW). 773–782.
[136] Jihye Park, Hye Jin Lee, and Sungzoon Cho. 2021. Automatic Construction of Context-Aware Sentiment Lexicon in the Financial Domain Using Direction-Dependent Words. arXiv preprint arXiv:2106.05723 (2021).
[137] Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of EMNLP. 1532–1543.
[138] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep Contextualized Word Representations. In Proceedings of NAACL-HLT.
[139] Guangyuan Piao and John G Breslin. 2018. Financial aspect and sentiment predictions with deep neural networks: an ensemble approach. In Companion Proceedings of the The Web Conference 2018. 1973–1977.
[140] Juan Piñeiro-Chousa, Marcos Vizcaíno-González, and Ada María Pérez-Pico. 2017. Influence of social media over the stock market. Psychology & Marketing 34, 1 (2017), 101–108.
[141] Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, et al. 2016. SemEval-2016 Task 5: Aspect Based Sentiment Analysis. In International Workshop on Semantic Evaluation. 19–30.
[142] Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. 2018. Improving language understanding by generative pre-training. (2018).
[143] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9.
[144] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Conference on Empirical Methods in Natural Language Processing.
[145] Navid Rekabsaz, Mihai Lupu, Artem Baklanov, Allan Hanbury, Alexander Dür, and Linda Anderson. 2017. Volatility prediction using financial disclosures sentiments with word embedding-based IR models. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2017).
[146] Thomas Renault. 2017. Intraday online investor sentiment and return patterns in the US stock market. Journal of Banking & Finance 84 (2017), 25–40.
[147] Leon Rotim, Martin Tutek, and Jan Šnajder. 2017. TakeLab at SemEval-2017 Task 5: Linear aggregation of word embeddings for fine-grained sentiment analysis of financial news. In Proceedings of SemEval-2017. 866–871.
[148] Jayit Saha, Smit Patel, Frank Xing, and Erik Cambria. 2022. Does Social Media Sentiment Predict Bitcoin Trading Volume?. In Proceedings of the 43rd International Conference on Information Systems (ICIS). 1–9.
[149] Ramit Sawhney, Shivam Agarwal, Arnav Wadhwa, and Rajiv Shah. 2020. Deep attentive learning for stock movement prediction from social media text and company correlations. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 8415–8426.
[150] Ramit Sawhney, Arnav Wadhwa, Shivam Agarwal, and Rajiv Shah. 2021. Quantitative Day Trading from Natural Language using Reinforcement Learning. In Proceedings of NAACL-HLT. 4018–4030.
[151] Ramit Sawhney, Arnav Wadhwa, Shivam Agarwal, and Rajiv Ratn Shah. 2021. FAST: Financial News and Tweet Based Time Aware Network for Stock Trading. In Proceedings of EACL. 2164–2175.
[152] Kim Schouten, Flavius Frasincar, and Franciska de Jong. 2017. Commit at semeval-2017 task 5: Ontology-based method for sentiment analysis of financial headlines. In Proceedings of SemEval-2017. 883–887.
[153] Saeed Seifollahi and Mehdi Shajari. 2019. Word sense disambiguation application in sentiment analysis of news headlines: an applied approach to FOREX market prediction. Journal of Intelligent Information Systems 52, 1 (2019), 57–83.
[154] Hamed Naderi Semiromi, Stefan Lessmann, and Wiebke Peters. 2020. News will tell: Forecasting foreign exchange rates based on news story events in the economy calendar. The North American Journal of Economics and Finance 52 (2020), 101181.
[155] Dehua Shen, Andrew Urquhart, and Pengfei Wang. 2019. Does twitter predict Bitcoin? Economics Letters 174 (2019), 118–122.
[156] Antonios Siganos, Evangelos Vagenas-Nanos, and Patrick Verwijmeren. 2017. Divergence of sentiment and stock market trading. Journal of Banking & Finance 78 (2017), 130–141.
[157] Ankur Sinha, Satishwar Kedas, Rishu Kumar, and Pekka Malo. 2022. SEntFiN 1.0: Entity-aware sentiment analysis for financial news. Journal of the Association for Information Science and Technology (2022).
[158] Jasmina Smailović, Miha Grčar, Nada Lavrač, and Martin Žnidaršič. 2014. Stream-based active learning for sentiment analysis in the financial domain. Information Sciences 285 (2014), 181–203.
[159] Parinaz Sobhani, Saif Mohammad, and Svetlana Kiritchenko. 2016. Detecting stance in tweets and analyzing its interaction with sentiment. In Proceedings of *SEM. 159–169.
[160] Sahar Sohangir, Dingding Wang, Anna Pomeranets, and Taghi M Khoshgoftaar. 2018. Big Data: Deep Learning for financial sentiment analysis. Journal of Big Data 5, 1 (2018), 1–25.
[161] Yejun Soun, Jaemin Yoo, Minyong Cho, Jihyeong Jeon, and U Kang. 2022. Accurate Stock Movement Prediction with Self-supervised Learning from Sparse Noisy Tweets. In IEEE International Conference on Big Data. 1691–1700.
[162] Yunchuan Sun, Mengting Fang, and Xinyu Wang. 2018. A novel stock recommendation system using Guba sentiment analysis. Personal and Ubiquitous Computing 22 (2018), 575–587.
[163] Pyry Takala, Pekka Malo, Ankur Sinha, and Oskar Ahlgren. 2014. Gold-standard for Topic-specific Sentiment Analysis of Economic Texts. In LREC, Vol. 2014. Citeseer, 2152–2157.
[164] Duyu Tang, Bing Qin, and Ting Liu. 2015. Document modeling with gated recurrent neural network for sentiment classification. In Proceedings of EMNLP. 1422–1432.
[165] Richard M Tong. 2001. An operational system for detecting and tracking opinions in on-line discussion. In Working Notes of the ACM SIGIR 2001 Workshop on Operational Text Classification, Vol. 1.
[166] Ming-Feng Tsai and Chuan-Ju Wang. 2014. Financial keyword expansion via continuous word vector representations. In Proceedings of EMNLP. 1453–1458.
[167] Wenting Tu, David W Cheung, Nikos Mamoulis, Min Yang, and Ziyu Lu. 2016. Investment recommendation using investor opinions in social media. In Proceedings of ACM SIGIR. 881–884.
[168] Robert Tumarkin and Robert F Whitelaw. 2001. News or noise? Internet postings and stock prices. Financial Analysts Journal 57, 3 (2001), 41–51.
[169] Peter D Turney and Michael L Littman. 2003. Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems 21, 4 (2003), 315–346.
[170] Marjan van de Kauter, Diane Breesch, and Véronique Hoste. 2015. Fine-grained analysis of explicit and implicit sentiment in financial news articles. Expert Systems with Applications 42, 11 (2015), 4999–5010.
[171] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol, and Léon Bottou. 2010. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research 11, 12 (2010).
[172] Chuan-Ju Wang, Ming-Feng Tsai, Tse Liu, and Chin-Ting Chang. 2013. Financial sentiment analysis for risk prediction. In Proceedings of IJCNLP. 802–808.
[173] Gang Wang, Tianyi Wang, Bolun Wang, Divya Sambasivan, Zengbin Zhang, Haitao Zheng, and Ben Y Zhao. 2015. Crowds on wall street: Extracting value from collaborative investing platforms. In Proceedings of ACM CSCW. 17–30.
[174] William Yang Wang and Zhenhao Hua. 2014. A semiparametric Gaussian copula regression model for predicting financial risks from earnings calls. In Proceedings of ACL. 1155–1165.
[175] Huizhe Wu, Wei Zhang, Weiwei Shen, and Jun Wang. 2018. Hybrid deep sequential modeling for social text-driven stock prediction. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM). 1627–1630.
[176] Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, and Gideon Mann. 2023. BloombergGPT: A large language model for finance. arXiv preprint arXiv:2303.17564 (2023).
[177] Shengting Wu, Yuling Liu, Ziran Zou, and Tien-Hsiung Weng. 2022. S_I_LSTM: stock price prediction based on multiple data sources and sentiment analysis. Connection Science 34, 1 (2022), 44–62.
[178] Chunli Xiang, Junchi Zhang, Fei Li, Hao Fei, and Donghong Ji. 2022. A semantic and syntactic enhanced neural model for financial sentiment analysis. Information Processing & Management 59, 4 (2022), 102943.
[179] Boyi Xie, Rebecca Passonneau, Leon Wu, and Germán G Creamer. 2013. Semantic frames to predict stock price movement. In Proceedings of ACL. 873–883.
[180] Frank Xing, Duc Hong Hoang, and Dinh-Vinh Vo. 2020. High-frequency news sentiment and its application to forex market prediction. In Proceedings of HICSS.
[181] Frank Xing, Lorenzo Malandri, Yue Zhang, and Erik Cambria. 2020. Financial sentiment analysis: an investigation into common mistakes and silver bullets. In Proceedings of COLING. 978–987.
[182] Frank Z Xing, Erik Cambria, and Roy E Welsch. 2018. Intelligent asset allocation via market sentiment views. IEEE Computational Intelligence Magazine 13, 4 (2018), 25–34.
[183] Frank Z Xing, Erik Cambria, and Roy E Welsch. 2018. Natural language based financial forecasting: a survey. Artificial Intelligence Review 50, 1 (2018), 49–73.
[184] Frank Z Xing, Erik Cambria, and Yue Zhang. 2019. Sentiment-aware volatility forecasting. Knowledge-Based Systems 176 (2019), 68–76.
[185] Yumo Xu and Shay B Cohen. 2018. Stock movement prediction from tweets and historical prices. In Proceedings of ACL. 1970–1979.
[186] Steve Yang, Jason Rosenfeld, and Jacques Makutonin. 2018. Financial aspect-based sentiment analysis using deep representations. arXiv preprint arXiv:1808.07931 (2018).
[187] Steve Y Yang, Yangyang Yu, and Saud Almahdi. 2018. An investor sentiment reward-based trading system using Gaussian inverse reinforcement learning algorithm. Expert Systems with Applications 114 (2018), 388–401.
[188] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In NeurIPS. 5754–5764.
[189] Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of NAACL-HLT. 1480–1489.
[190] Yunan Ye, Hengzhi Pei, Boxin Wang, Pin-Yu Chen, Yada Zhu, Ju Xiao, and Bo Li. 2020. Reinforcement-learning based portfolio management with augmented asset movement prediction states. In Proceedings of AAAI, Vol. 34. 1112–1119.
[191] Wei Jie Yeo, Wihan van der Heever, Rui Mao, Erik Cambria, Ranjan Satapathy, and Gianmarco Mengaldo. 2023. A comprehensive review on financial explainable AI. arXiv preprint arXiv:2309.11960 (2023).
[192] Jaemin Yoo, Yejun Soun, Yong-chan Park, and U Kang. 2021. Accurate multivariate stock movement prediction via data-axis transformer with multi-level contexts. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2037–2045.
[193] Jie Yuan and Zhu Zhang. 2020. Connecting the dots: forecasting and explaining short-term market volatility. In Proceedings of ACM ICAIF. 1–8.
[194] Boyu Zhang, Hongyang Yang, and Xiao-Yang Liu. 2023. Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models. arXiv preprint arXiv:2306.12659 (2023).
[195] Pu Zhang and Zhongshi He. 2015. Using data-driven feature enrichment of text representation and ensemble technique for sentence-level polarity classification. Journal of Information Science 41, 4 (2015), 531–549.
[196] Wei Zhang, Xiao Li, Dehua Shen, and Andrea Teglio. 2016. Daily happiness and stock returns: Some international evidence. Physica A: Statistical Mechanics and its Applications 460 (2016), 201–209.
[197] Yuzhe Zhang and Hong Zhang. 2023. FinBERT–MRC: Financial Named Entity Recognition Using BERT Under the Machine Reading Comprehension Paradigm. Neural Processing Letters (2023), 1–21.
[198] Lin Zhao, Lin Li, and Xinhao Zheng. 2021. A BERT based Sentiment Analysis and Key Entity Detection Approach for Online Financial Texts. 2021 IEEE CSCWD (2021), 1233–1238.
[199] Guofu Zhou. 2018. Measuring Investor Sentiment. Annual Review of Financial Economics 10, 1 (2018), 239–259.
[200] Zhihan Zhou, Liqian Ma, and Han Liu. 2021. Trade the Event: Corporate Events Detection for News-Based Event-Driven Trading. In Findings of ACL-IJCNLP. 2114–2124.