
Expert Systems With Applications 116 (2019) 209–226


Monitoring the public opinion about the vaccination topic from tweets
analysis
Eleonora D’Andrea a, Pietro Ducange b, Alessio Bechini a, Alessandro Renda a,c,
Francesco Marcelloni a,∗
a Dipartimento di Ingegneria dell'Informazione, University of Pisa, Largo Lucio Lazzarino 1, 56122 Pisa, Italy
b SMARTEST Research Center, eCampus University, Novedrate (CO), Italy
c University of Florence, Florence, Italy

Article history:
Received 22 January 2018
Revised 4 September 2018
Accepted 5 September 2018
Available online 6 September 2018

Keywords:
Opinion mining
Stance detection in tweets
Text mining
Tweet classification
Vaccines

Abstract

The paper presents an intelligent system to automatically infer trends in the public opinion regarding the stance towards the vaccination topic: it enables the detection of significant opinion shifts, which can be possibly explained with the occurrence of specific social context-related events. The Italian setting has been taken as the reference use case. The source of information exploited by the system is represented by the collection of vaccine-related tweets, fetched from Twitter according to specific criteria; subsequently, tweets undergo a textual elaboration and a final classification to detect the expressed stance towards vaccination (i.e., in favor, not in favor, and neutral). In tuning the system, we tested multiple combinations of different text representations and classification approaches: the best accuracy was achieved by the scheme that adopts the bag-of-words, with stemmed n-grams as tokens, for text representation and the support vector machine model for the classification. By presenting the results of a monitoring campaign lasting 10 months, we show that the system may be used to track and monitor the public opinion about vaccination decision making, in a low-cost, real-time, and quick fashion. Finally, we also verified that the proposed scheme for continuous tweet classification does not seem to suffer particularly from concept drift, considering the time span of the monitoring campaign.

© 2018 Elsevier Ltd. All rights reserved.
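The representation summarized above (a bag-of-words with stemmed n-grams as tokens) can be illustrated with a minimal Python sketch. This is not the authors' code: their pipeline relies on Weka in Java, and the toy tweets and the crude suffix-stripping stemmer below are placeholder assumptions.

```python
# Illustrative sketch: binary bag-of-words over stemmed unigrams and bigrams,
# the text representation the paper reports as the best-performing one.

def crude_stem(word):
    # Toy stand-in for a real stemmer (the actual system uses Weka's facilities).
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def ngram_tokens(text, n_max=2):
    # Stemmed unigrams plus bigrams, mirroring "n-grams as tokens" with n up to 2.
    words = [crude_stem(w) for w in text.lower().split()]
    tokens = list(words)
    for n in range(2, n_max + 1):
        tokens += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return tokens

def bow_vectors(corpus):
    # Fixed vocabulary from the corpus; one binary feature per distinct token.
    vocab = sorted({t for doc in corpus for t in ngram_tokens(doc)})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for doc in corpus:
        v = [0] * len(vocab)
        for t in ngram_tokens(doc):
            v[index[t]] = 1
        vectors.append(v)
    return vocab, vectors

corpus = ["vaccines save lives", "vaccines cause worries"]  # invented toy tweets
vocab, vectors = bow_vectors(corpus)
```

In the real system these vectors feed an SVM classifier; the sketch stops at the feature-extraction step, which is the part the abstract specifies exactly.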

1. Introduction used for the early detection of real-time events, such as traffic con-
gestions and incidents (D’Andrea, Ducange, Lazzerini, & Marcelloni,
Among existing social networks, over the past few years 2015), earthquakes (Sakaki, Okazaki, & Matsuo, 2013), crime events
Twitter1 has reached a widespread diffusion as a personal and (Gerber, 2014), or riots (Alsaedi, Burnap, & Rana, 2017).
handy information channel. Users typically broadcast information However, analyzing tweets is more challenging than analyzing
about personal or public real-life events, or simply express their messages from other media like blogs, e-mails, etc. because of their
thoughts, viewpoints or opinions on a given topic, product, service, limited length, which forces to operate at the sentence level rather
event, etc., through a public Status Update Message called tweet. A than at the document level. Further, tweets are typically unstruc-
tweet may contain also meta-information such as timestamp, lo- tured and irregular, may contain informal or abbreviated words
cation (in terms of GPS (Global Positioning System) coordinates, (e.g., acronyms, hashtags), colloquial, idiomatic, or ironic expres-
or user profile location), username, links, hashtags, emoticons, and sions, misspellings or grammatical errors, making the conveyed in-
mentions. Recently, it has been shown that tweets may represent formation particularly noisy and fragmentary. This aspect is further
a source of valuable information (Giachanou & Crestani, 2016): worsened by the data sparsity phenomenon, i.e., a great amount of
in fact, they are public and can be easily crawled with no pri- terms in a corpus occurs less than 10 times (Saif, Yulan, & Alani,
vacy limitations, and their content can be analyzed with proper 2012).
text/data mining techniques. Indeed, Twitter has been successfully In the described setting, the extraction of meaningful infor-
mation out of tweets resorts to text mining techniques, including
methods from the fields of data mining, machine learning, statis-

Corresponding author. tics, and Natural Language Processing (NLP). Text mining refers
E-mail addresses: [email protected] (E. D’Andrea), pietro.ducange@ to the process of automatic information mining also from un-
uniecampus.it (P. Ducange), [email protected] (A. Bechini), alessandro.
renda@unifi.it (A. Renda), [email protected] (F. Marcelloni).
structured natural language text. Text mining is hampered by the
1
Twitter, www.twitter.com vagueness of natural language, due to the habit of people to make

https://doi.org/10.1016/j.eswa.2018.09.009
0957-4174/© 2018 Elsevier Ltd. All rights reserved.
210 E. D’Andrea et al. / Expert Systems With Applications 116 (2019) 209–226

frequent use of idioms, grammatical variations, slang expressions, overall drop in vaccination coverage.2 It is thus clear that the auto-
or to assume an implicit context for a given word (Gupta, Gurpreet, matic monitoring of information over social networks is of primary
& Lehal, 2009). importance for inferring the stance of the public opinion towards
Sentiment analysis and opinion mining over the Web (especially this topic: e.g., countermeasures can be taken in case of spreading
in social networks, forums, microblogs, etc.) have recently and of fake news, or the effect of social events can be detected. In this
rapidly become emergent topics, mainly because of their poten- scenario, a social event is a topic that rapidly attracts the attention
tial in uncovering trends of the public opinion or social emotions of social networks’ users in a certain time interval, by increasing
(Rushdi Saleh, M., T., Montejo-Ráez, & Ureña-López, 2011; Mostafa, the number of user reactions (Nguyen & Jung, 2017).
2013; E. S. Tellez et al., 2017a, b). The terms "sentiment analysis" This paper proposes an intelligent system for real-time monitor-
and "opinion mining" are currently used to refer to special sub- ing and analysis of the public opinion about the vaccination topic
fields of the text mining research aimed at automatically deter- on the Twitter stream. The reference case study relates to the Ital-
mining, in natural language texts, the sentiment, or the opinion ian setting, currently with about 7 millions active Twitter users,
polarity (e.g., positive-biased or negative-biased) towards a certain where the vaccination topic has caused harsh debates. The system
target (Liu, 2010; Liu, 2015; Ribeiro, Araújo, Gonçalves, Gonçalves, introduced in this work employs text mining and machine learn-
& Benevenuto, 2016). They are considered challenging tasks, as ing techniques, appropriately tuned, adapted, and integrated so to
even human experts may disagree about the sentiment associated build the overall intelligent system. We present an experimental
with a text, e.g., because of the presence of ambiguity, sarcasm, or study aimed to determine the most effective solution out of some
irony: such interpretation problems become even more difficult in different state-of-the-art approaches for text representation and
the case of short and informal texts like tweets (Gokulakrishnan, classification. The chosen approach was integrated into the actual
Priyanthan, Ragavan, Prasath, & Perera, 2012; Valdivia, Luzíón, & final system and used for the on-the-field real-time monitoring of
Herrera, 2017). the public opinion. Thus, first, we employ text mining techniques
Regarding a text mining activity performed on tweets, it is to solve a multi-class classification problem by assigning the cor-
particularly important to identify its goals, so to correctly use rect class label (in favor of vaccination, not in favor of vaccination,
the proper term to refer to it. According to the definitions re- and neutral) to tweets. Then, we inspect the trend of polarity of the
cently provided in Mohammad, Kiritchenko, Sobhani, Zhu, and public opinion of Italian Twitter users, in particular in correspon-
Cherry, (2016), sentiment analysis tasks aim to generally determine dence with local peaks of the daily number of tweets related to
whether a piece of text is positive, negative, or neutral, or alter- the vaccination topic. These peaks generally correspond to events
natively to determine the speaker’s opinion along with the rela- concerning vaccination that may have raised the public opinion in-
tive target (the entity towards which the opinion is expressed). terest. An example of a social event of this type is the planned
On the other hand, in a stance detection task the target of inter- projection (subsequently canceled) in the Senate of the Italian Re-
est is pre-chosen, and the opinion towards it must be determined, public of the controversial anti-vaccine documentary film “Vaxxed:
no matter whether it is explicitly mentioned or not in the text. from cover-up to catastrophe”.3 Among other such events we can
Notably, stance detection corresponds to a classification problem, also recall news about the vaccination drop rates in Italy, and the
with the additional difficulty that in a text a stance polarity can discussion about the introduction of vaccination-related laws.
be expressed with any sort of sentiment polarities towards dis- The paper has the following structure. Section 2 reviews the
parate targets (Mohammad et al., 2016; Mohammad, Sobhani, & state of the art and the related work about sentiment analysis,
Kiritchenko, 2017). opinion mining applications and stance detection, with reference
The exploitation of web opinion mining services is becoming to Twitter’s messages. Section 3 describes the proposed system for
prominent in several contexts like marketing, politics, recommen- stance detection about the vaccination topic from tweets, referring
dation systems, healthcare, etc. (Cambria, 2016; Pandey, Singh, Ra- to the Italian scenario. Section 4 compares the results obtained
jpoot, & Saraswat, 2017; Ducange, Pecori, & Mezzina, 2017). Dif- by the proposed system with recent stance detection approaches.
ferent approaches have been proposed to study the reactions of Section 5 shows the results of a 10-months monitoring campaign,
social network’s users to major events, so to uncover, explain, or presenting the relative trend of public opinion about vaccination,
predict the events themselves. Typical examples are the prediction spotting out the influence of particular events and analyzing pos-
of movements in stock markets (Bollen, Mao, & Zeng, 2011) and sible concept drift. Finally, Section 6 draws concluding remarks.
the outcome of political elections (Zhou, Tao, Yong, & Yang, 2013).
Further, the monitoring of public health concerns (e.g. regarding 2. State of the art and related work
vaccines or disease outbreaks) is attracting more and more inter-
est: in Ji, Chun, Wei, and Geller, (2015), the concern level is quanti- Since its early introduction in the research community in
fied on the basis of the number of negative shared tweets, and it is Web search and information retrieval (Dave, Lawrence, & Pennock,
correlated over time to the occurrence of news, with the purpose 2003), the term "Opinion Mining" has been used to emphasize
of identifying in real-time the effect of news on public concerns. the uncovering of judgments towards targets of interest in text
The vaccination topic has become controversial in recent years, analysis, but often it is assumed to cover a wider range of types
also because of the news of the alleged connection (stated in a of text analysis. Within this paper, according to Giachanou and
research article later retracted) between autism and MMR vac- Crestani, (2016), sentiment analysis is broadly intended to be the
cine, against measles, mumps, and rubella. The influence of stances study of opinion, sentiment, mood, and emotion expressed in
spread in social networks over individual behaviors and sentiments texts. Sentiment analysis is a broad research area having several
has been statistically detected (Salathé, Vu, Khandelwal, & Hunter, sub-tasks. In its most general form, it can deal with detecting
2013). Indeed, online discussion groups have arisen, influencing the sentiment polarity, e.g., positive, neutral, and negative, or with
opinion of the population over vaccination decision-making in sev- identifying specific emotions, e.g., hate, anger, joy, and sadness. A
eral countries (Bello-Orgaz, Hernandez-Castro, & Camacho, 2017). specific task is called subjectivity detection and consists in discrim-
Hence, in some cases, a drop in vaccination rates has been no- inating between objective (neutral) and subjective (opinionated)
ticed, increasing the risk of re-emergence of eradicated diseases.
For example, the Italian Ministry of Health has detected, in March 2
Vaccination data in Italy, http://www.repubblica.it/salute/prevenzione/2017/05/
2017, an increase of 230% of the number of measles cases, and an 12/news/i_vaccini_in_italia_i_dati-165262703/, (accessed 16 October 2017).
3
Vaxxed: from cover-up to catastrophe, http://vaxxedthemovie.com/.
E. D’Andrea et al. / Expert Systems With Applications 116 (2019) 209–226 211

texts. This can be even more challenging than polarity classifica- candidates. A different text representation is proposed in
tion, e.g., in the case of a news article citing people’s opinions or Aisopos, Papadakis, and Varvarigou, (2011), employing n-gram
vice versa (Liu, 2010; Cambria, 2016). Moreover, stance detection is graphs with distance-weighted edges, and making use of two clas-
the task aimed to determine the polarity of the opinion (in favor, sifiers (MNB and C4.5 decision tree) to perform both a two-way
against, or neutral) expressed in a text towards a given target en- and a three-way classification of tweets posted in a time span of
tity, e.g., a topic, a product, a service, a person (Mohammad et al., seven months. In Valdivia et al. (2017), the classification results,
2016; Mohammad et al., 2017). The distinctive trait of stance de- obtained starting from a baseline model comprising text elabora-
tection, in the comparison with general sentiment analysis, is that tion and SVM classification, are improved by applying a major-
the opinion polarity is detected towards a predefined target entity ity vote to several methods for filtering out neutral reviews. In
that may also be not explicitly mentioned in the text. a recent work (E. S. Tellez et al., 2017a) it has been shown that
Sentiment analysis and stance detection can resort to different the use of some traditional machine learning approaches (in the
classes of approaches: machine learning, lexicon-based, and hybrid specific case, text elaboration, BOW vector representation, and an
approaches (Medhat, Hassan, & Korashy, 2014). As regards texts, SVM classifier) can lead to good results in polarity classification of
they can be passed to classification systems in different formal tweets.
representations that should be able to encode the contents with Deep learning techniques have been intensely employed for text
the required precision. Text can be represented as vector of num- representation and classification. In particular, several techniques
bers considering different schemes such as standard bag-of-words leverage word embeddings: they consist in a dense, continuous, rep-
(BOW) (D’Andrea et al., 2015), word embeddings (Tang et al., 2014) resentation of words in a low-dimensional space. The advantage of
and a combination of BOW and other features (for instance part- this vector representation is the possible encoding of general se-
of-speech and word embeddings) (Mohammad et al., 2017). Ma- mantic and syntactic relationships between words, mapping simi-
chine learning employs supervised or unsupervised techniques to lar words in close points of the representation space. Several unsu-
automatically extract knowledge directly from texts. Lexicon-based pervised learning methods have been proposed to generate word
approaches rely instead on a predefined sentiment lexicon (e.g., vector representations from raw text. Well-known word embed-
WordNet (Miller, 1995), SentiWordNet (Baccianella, Esuli, & Sebas- ding generation models like Word2Vec (Mikolov, Sutskever, Chen,
tiani, 2010), SenticNet (Cambria, Havasi, & Hussain, 2012)), a col- Corrado, & Dean, 2013), Glove (Pennington, Socher, & Manning,
lection of words from the considered language, annotated with 2014) and Fast-Text (Bojanowski, Grave, Joulin, & Mikolov, 2017)
the relative sentiment polarities and strength values. The lexicon typically require text corpora containing billions of words to be
is then used along with statistical or semantic methods to per- trained. Word2vec learns representations by training a shallow neu-
form sentiment analysis. Lexicon-based approaches are best suited ral network to reconstruct the linguistic contexts of words from
for general boundless contexts (i.e., without topic), with well- target words or vice versa. FastText is a variant of Word2vec that
formed and grammatically correct texts. On the contrary, they be- breaks words into several n-grams (sub-words). In GloVe the train-
have worse in bounded contexts (i.e., concerning a certain topic) ing is performed on aggregated global word-word co-occurrence
or when an informal language is used, e.g., in social networks, due statistics from a corpus. Recently, several works either exploits pre-
to the absence of context-related words in the lexicon. Further, trained publicly available vectors, or train the models on specific
in social networks like Twitter, the language undergoes continu- corpora. Among deep learning architectures used in this field, Long
ous changes. In fact, a new invented hashtag or word can quickly Short-Term Memory (LSTM) Networks (Hochreiter & Schmidhu-
gain popularity. It is thus clear that lexicon-based approaches, re- ber, 1997) and Convolutional Neural Networks (CNNs) (LeCun, Ben-
lying on predefined dictionaries, struggle to cope with such a dy- gio, & Hinton, 2015) represent state-of-the-art models. The authors
namic setting. Supervised machine-learning approaches make use in Cliché, (2017) pre-trained three well-known unsupervised learn-
of sets of labelled texts, to be exploited for model training. Among ing models, i.e., FastText, GloVe, and Word2vec, using 100 million
the wide assortment of machine learning methods, deep learning unlabeled tweets. To enrich the word representation with polar-
approaches have recently become particularly popular also in the ity information they fine-tuned word vectors using a distant su-
field of sentiment analysis (Zhang, Wang, & Liu, 2018). They ex- pervision with a dataset of 5 million positive and 5 million nega-
ploit multiple layers of nonlinear processing units for feature ex- tive tweets. Finally, they employed the fine-tuned word vectors to
traction and transformation. Lower layers near to the inputs learn initialize a LSTM and a CNN model. An ensemble of such models
simple features, whereas higher layers learn more complex fea- achieved the best absolute performance in the SemEval-2017 Inter-
tures thanks to the representation produced by lower layers. national Workshop on Semantic Evaluation, task 4 (Sentiment analy-
In the following, we recall a few recent works for each class of sis in Twitter) (Rosenthal, Noura, & Preslav, 2017). The authors in
approaches, focusing on the analysis of Twitter messages. As re- Xiong, Hailian, Weiting, and Donghong, (2018) learn sentiment-
gards lexicon-based approaches, we mention two works Basile and specific word embedding by exploiting both lexicon and distant
Nissim, (2013) and Ortega Fonseca, and Montoyo, (2013). In the supervised information. They fed several neural networks with a
former, a tool based only on a polarity lexicon is applied on word representation combining word-level sentiment (i.e., lexicon
a topic-specific and a general dataset related to Italian tweets, information) and tweet-level sentiment (e.g. hashtag and emoti-
both considering three classes. In the latter, the authors employ con) to obtain a multi-level sentiment-enriched word embeddings.
a three-step technique including text preprocessing, polarity de- Recently, in the framework of stance detection in Twitter, the au-
tection, and rule-based classification based on WordNet and Sen- thors of Dey, Ritvik, and Saroj, (2018), discussed a two-phases text
tiWordnet lexicons. As it refers to machine learning solutions, su- classification scheme. In the first phase, a given tweet is classi-
pervised learning is the dominant approach in the literature. In fied as neutral or subjective with respect to the given topic. In
Chien and Tseng, (2011) an SVM is used for the evaluation of the second phase, the stance of a subjective tweet is classified as
the quality of information in product reviews. For the support to in a favor or against towards the topic. In both phases LSTM net-
decision-making in marketing, it has been proposed a framework works are adopted as classification models. In general, it is impor-
for summarization and SVM classification of opinions on Twitter tant to notice that the accuracy of Deep Learning methods is typi-
(Li & Li, 2013). A Naive Bayes (NB) model on unigram features has cally achieved by resorting to massive training sets to support the
been chosen in a system for real-time analysis of tweets related learning phase.
to 2012 U.S. elections (Wang, Can, Kazemzadeh, Bar, & Narayanan, Among hybrid approaches, we can mention some recent works.
2012), with the aim of inferring the public sentiment toward In Ortigosa and Carro, (2014), the authors combine lexical-based
212 E. D’Andrea et al. / Expert Systems With Applications 116 (2019) 209–226

techniques and SVM classification to perform sentimental analy- of vaccine-related tweets from Twitter. The second module “Text
sis on Spanish Facebook messages. In Agarwal, Xie, Vovsha, Ram- representation” (see Section 3.2) applies a sequence of text elabo-
bow, and Passonneau, (2011), the authors employ a combination ration steps to preprocessed tweets in order to transform them in
of unigrams and a selected set of features, on a manually anno- numeric vectors. In the third module “Text classification and trend
tated three-class dataset of English tweets. In Castellucci, Croce, analysis” (see Section 3.3), the public stance towards the vaccina-
Cao, and Basili, (2016), for a binary classification, texts are rep- tion topic is studied. More in detail, first, an appropriate class label
resented by using different categories of features, combining a (namely, in favor of vaccination, not in favor of vaccination, and neu-
BOW representation of the text, word-embedding semantic at- tral, i.e., neither in favor nor not in favor of vaccination) is assigned
tributes, polarity information, along with other attributes. In to each tweet using a supervised learning model. Then, the classi-
Mohammad et al. (2017), the authors, by employing word embed- fied tweets are analyzed to identify the trend over time of the pub-
ding features in addition to n-grams, improve the accuracy of an lic stance in Twitter, with particular reference to local peaks of the
SVM classifier for stance detection in tweets. daily number of tweets concerning vaccination. We observed that
Taking into account vaccination, which is the target topic for local peaks of the daily number of tweets are related to particular
our research, we can note that several works in the literature deal events concerning vaccination (discussions in Parliament, approval
with the healthcare topic, including both texts and social network of law establishing vaccination requirements, etc.). We have veri-
messages. In Botsis, Nguyen, Woo, Markatou, and Ball, (2011), the fied the presence of these events by analysing news on the vac-
authors propose a text classification of reports collected from the cination topic. The following sub-sections describe in detail the
U.S. Vaccine Adverse Event Reporting System related to the H1N1 steps performed in the three modules and the supervised learn-
vaccine, by employing SVMs. Chew, and Eysenbach, (2010) ana- ing stage (see Section 3.4). Although the focus of the paper is on
lyze the content of tweets related to the H1N1 outbreak to deter- Italian vaccine-related tweets, the proposed system is general and
mine the kind of information exchanged by social media users. In easily adaptable to any other topic or language.
Salathé et al. (2013), the authors employ a hybrid approach based
on NB and maximum entropy classifiers to classify tweets as neg- 3.1. Collection of tweets
ative, positive and neutral with respect to the user’s vaccination
intent against H1N1. In Du, Xu, Song, and Tao, (2017), an SVM clas- The first module of the system consists of two main steps, i.e.,
sifier is employed to assess the human papillomavirus (HPV) vac- fetch and cleaning of tweets, and preprocessing of tweets.
cination sentiment trend from tweets.
In this paper, we propose an intelligent system for monitor- (1) Fetch and cleaning. In this step, tweets are fetched according to
ing public opinion regarding the stance towards the vaccination some search criteria (e.g., keywords, time and date of posting,
topic, with specific reference to the Italian case. Tweets are clas- location of posting, hashtags). Although it is possible to resort
sified adopting the BOW representation of the texts, using n-grams to customizable tools designed for this purpose (Bechini, Gazzè,
as tokens, followed by an SVM model. Indeed, we experimentally Marchetti, & Tesconi, 2016), our main requirement was to have
showed that, for the specific context of stance detection on Twitter, a full coverage of the relevant tweets (and this is not guaran-
the adopted text classification scheme outperforms recent state-of- teed by using the plain Twitter APIs). We have reached our goal
the-art approaches, including text classification models based on by employing the Java library GetOldTweets4 , which performs
deep-learning. In addition to the elaboration and classification of HTTP GET requests to directly collect tweets meeting the pro-
tweets, the system let us check in real-time increments of interest vided search criteria: In practice, it is able to carry out the same
and stance changes of the public opinion, so that we can off-line researches that may be performed using the Twitter Search web
associate them to context-related events of possible influence; this page.5
type of analysis is not present in works mentioned above. More-
over, we also verified that, over the time span of the real-time The downloaded set of raw tweets is reduced with the aim of
monitoring campaign, the system is characterized by a low clas- discarding:
sification concept drift. • duplicate tweets, i.e., tweets having same tweet id, possibly
The word embedding process deserves further discussion, as fetched in different searches;
its proper use in sentiment analysis is not straightforward (Uysal • tweets written in other languages than the target one (Italian):
& Yi Lu, 2017). Two recent works Mohammad et al. (2017) and this may occur because of the presence of keywords/hashtags
Uysal and Yi Lu, (2017) have shown that training a word embed- with the same spelling in different languages (this has been ac-
ding model on a “background” domain-related corpus is beneficial complished using the Apache Tika6 library for Java).
for the task of stance or sentiment classification of tweets: both
the background corpus and the classification dataset consisted of Regarding retweets (i.e., other users’ tweets simply re-shared),
tweets collected in the same time window. However, in the present we decided to maintain them in the dataset, as we think that the
work, we adopted three publicly available word embeddings, pre- retweeting action, in this context, is a way of supporting/sharing
trained on the Wikipedia corpus. The rationale for this choice is the same opinion of another user.
twofold: (i) we do not have at our disposal a sufficiently large cor-
pus of domain-related tweets in Italian, i.e. collected according to (1) Preprocessing. In this second step, tweets are preprocessed by
the same criteria used for the vaccination dataset, to emulate the applying a Regular Expression (RE) filter, in order to extract
training procedure presented in Mohammad et al. (2017); (ii) some only the text of each tweet, and remove all useless meta-
recent works have successfully adopted pre-trained word embed- information. In fact, each fetched raw tweet contains the tweet
dings for text classification (Wang et al., 2016; Uysal & Yi Lu, 2017). id, the user id, the timestamp, the location (if provided), a
retweet flag, and the tweet’s content. The tweet’s content may
3. The architecture of the proposed system for stance detection include the user’s text, hashtag(s), link(s), and mention(s). More

In the following, we present the system to perform stance de- 4


GetOldTweets library available at https://github.com/Jefferson-Henrique/
tection on Twitter, with reference to the vaccination topic in Italy. GetOldTweets-java/, (accessed 2017/06/30).
The system consists of three modules (see Fig. 1). The first mod- 5
https://twitter.com/search-home
ule “Collection of tweets” (see Section 3.1) regards the collection 6
Apache, Tika https://tika.apache.org/, (accessed 2017/06/30).
E. D’Andrea et al. / Expert Systems With Applications 116 (2019) 209–226 213

Fig. 1. Modules of the proposed system (grey blocks require information from the preliminary supervised learning stage).

in detail, using an RE filter, the tweet id, the user id, the location, and the retweet flag are discarded. The timestamp is temporarily discarded for the purposes of the text mining elaboration, but it will be reconsidered for the analysis of the public opinion trend over time. From the tweet's content we discard links, mentions, numbers and special characters (e.g., punctuation marks, brackets, slashes, quotes, etc.). Hashtags are not completely discarded; they are instead reduced to words (by eliminating the hash (#) symbol), so as not to lose relevant information. In fact, a common writing habit of Twitter's users is to use hashtags in sentences, in place of normal words. Finally, a case-folding operation is applied to the texts, in order to convert all characters to lower case form.

Hence, each tweet is represented as a sequence of characters. We denote the j-th tweet of the set as tweet_j, with j = 1, ..., N, where N is the total number of tweets considered in the subsequent steps.

3.2. Text representation

As discussed in Section 2, several methods have been proposed in the specialized literature for text representation and classification. To identify the most suitable scheme in our specific case, we experimented with three categories of methods:

1. BOW text representation followed by classical machine learning algorithms for classification (D'Andrea et al., 2015).
2. A combination of BOW and word embeddings for text representation followed by classical machine learning algorithms for classification (Mohammad et al., 2017).
3. Deep learning-based approaches for text elaboration and classification (Cliché et al., 2017).

It is important to underline that the data preparation steps represent a crucial issue for the success of the overall system: this has been experimentally assessed also in the context of multilingual emotion classification (Balahur & Turchi, 2014; Becker, Moreira, & dos Santos, 2017). Thus, we cannot claim that the choices explored and selected through our experimentations are necessarily the optimal ones, but they have been found to deliver very good performances.

Similar to the work discussed in Balahur and Turchi (2014) and Becker et al. (2017), we carried out an intensive experimental analysis for the identification of the parameters of the BOW text representation module. In particular, we considered different methods for the tokenization (word tokenizer, alphabetic tokenizer, N-gram with different values of N) and different strategies for the feature representation (binary approach, IDF and TF-IDF). For the sake of brevity, we show just the best combination that we have obtained.

In Section 4, we discuss in detail the results achieved by eleven methods selected from the three categories above. For our specific text classification task, the BOW text representation followed by an SVM classification model achieves the best results. Thus, we adopt this scheme for the text representation and classification in our system.

As regards the text representation module, its main steps are described in detail in D'Andrea et al. (2015). The main aim of the module is to transform the set of strings representing the stream of tweets into a set of numeric vectors, by eliminating noise and extracting useful information. In the following, we briefly recall the sequence of steps applied to the tweets, whereas Fig. 2 shows how a sample (vaccine-related) tweet is transformed as it undergoes the different text elaboration steps. The elaboration is carried out by employing the Java API for Weka (Waikato Environment for Knowledge Analysis) (Hall et al., 2009). The text elaboration steps, namely tokenization, stop-word filtering, stemming, stem filtering, and feature representation, are described in detail in the following.

(1) Tokenization consists in transforming a stream of characters into a stream of processing units, called tokens (e.g., words, phrases). Thus, during this step, by choosing n-grams as tokens (with n up to 2) and after removing punctuation marks and special symbols (e.g., accents, hyphens), each tweet is converted into a set of tokens, according to the BOW representation. At the end of this step, the j-th tokenized tweet, tweet^T_j = {t^T_j1, ..., t^T_jh, ..., t^T_jHj}, is represented as the sequence of n-grams contained in it, where t^T_jh is the h-th token and H_j is the number of tokens in tweet^T_j.

(2) Stop-word filtering consists in removing stop-words, i.e., words providing little or no useful information to the text analysis: these words can hence be considered as noise. Common stop-words include articles, conjunctions, prepositions, pronouns, etc. Other stop-words are those typically appearing very often in sentences of the considered language (language-specific stop-words), or in the particular context analyzed (domain-specific stop-words). In this work, we employ a reduced version of the stop-word list for the

Fig. 2. Steps of the text elaboration (second module) applied to a sample tweet.

Italian language, available at the Snowball Tartarus website.7 More precisely, we remove from the stop-word list: (i) all the verbal forms, and (ii) the words "non" (not) and "contro" (against), as we experimentally found that such words become important for opinion mining and sentiment analysis, and thus they should be part of the text analysis process. At the end of this step, each tweet is cleaned from stop-words and thus reduced to a sequence of relevant tokens, tweet^SW_j = {t^SW_j1, ..., t^SW_jk, ..., t^SW_jKj}, where t^SW_jk is the k-th token and K_j, with K_j ≤ H_j, is the number of relevant tokens in tweet^SW_j.

(3) Stemming is a process typically required in dealing with fusional languages, like English and Italian (in our specific case). It consists of reducing each token (i.e., word) to its stem or root form, so as to group words having closely related semantics. In this work, we exploit the Snowball Tartarus stemmer for the Italian language,8 based on Porter's algorithm (Porter, 1980). Hence, at the end of this step each tweet is represented as a sequence of stems, tweet^S_j = {t^S_j1, ..., t^S_jl, ..., t^S_jLj}, where t^S_jl is the l-th stem and L_j, with L_j ≤ H_j, is the number of stems in tweet^S_j.

(4) Stem filtering consists in filtering out the stems which are not considered relevant in the training dataset for the supervised learning stage (described in detail in Section 3.4). Thus, each tweet is cleaned from stems not belonging to the set of relevant stems RS_tr, and is represented as a sequence of relevant stems, tweet^SF_j = {t^SF_j1, ..., t^SF_jp, ..., t^SF_jPj}, where t^SF_jp is the p-th relevant stem in tweet^SF_j, and P_j, with P_j ≤ L_j, is the total number of relevant stems in tweet^SF_j. Let F be the number of relevant stems identified in the training dataset.

(5) Feature representation consists in building, for each tweet, the corresponding vector of numeric features, i.e., X = {X_1, ..., X_f, ..., X_F}, in order to represent all the tweets in the same F-dimensional feature space. The set of F features corresponds to the set RS_tr = {ŝ_1, ..., ŝ_f, ..., ŝ_F} of relevant stems. Each tweet is thus associated with a vector of numeric features tweet^FR_j = x_j = {x_j1, ..., x_jf, ..., x_jF}, where each element x_jf is set as follows:

x_jf = TF_jf · w_f   if the relevant stem ŝ_f is in tweet^SF_j,
x_jf = 0             otherwise.

In the above equation, TF_jf is the term frequency of the relevant stem ŝ_f in the j-th tweet, whereas the weight w_f expresses the importance in the training dataset of the f-th feature, namely the f-th relevant stem ŝ_f, and is computed during the supervised learning stage (the computation of this weight is discussed in Section 3.4).

3.3. The text classification and trend analysis stage

As regards the text classification and trend analysis module, two steps are performed, i.e., classification and trend analysis.

(1) Classification. The fetched tweets are classified using a supervised classification model, namely an SVM classifier, previously trained during the supervised learning stage (see Section 3.4). The model assigns to each tweet, now represented with x_j, a possible class label belonging to C = {C_1, ..., C_r, ..., C_R}, with R being the number of classes considered (in this work we have R = 3).

(2) Trend analysis. The classified tweets are analyzed over time, in order to infer changes (offline or even in real-time) in the public opinion about the vaccination topic. Such changes (e.g., spikes in the total number of tweets) may appear in correspondence with known or unknown social context-related events.

3.4. The supervised learning stage

As stated previously, a supervised learning stage is required before performing some of the steps of the second and third modules of the system, namely stem filtering, feature representation, and classification.

To this aim, we need a collection of N_tr labelled tweets as training set. The training tweets were fetched using a set of context-related keywords as search criteria, and were preprocessed as described in Section 3.1. Then, each tweet of the training set went

7 Snowball stop-words list (Italian), http://snowball.tartarus.org/algorithms/italian/stop.txt (last accessed 2018/07/16).
8 Snowball stemmer (Italian), http://snowball.tartarus.org/algorithms/italian/stemmer.html (last accessed 2018/07/16).

through the following text mining steps: tokenization, stop-word filtering, and stemming. Finally, the complete set of stems CS_tr was extracted from the N_tr training tweets:

CS_tr = {s_1, ..., s_q, ..., s_Q} = ∪_{j=1}^{N_tr} tweet^S_j,

i.e., CS_tr is the union of the Q stems extracted from the set of training tweets after the stemming step.

The importance of each stem s_q in CS_tr is represented by means of a weight w_q, computed as the Inverse Document Frequency (IDF) index (Salton & Buckley, 1988) as IDF_q = ln(N_tr / N_q), where N_q is the number of tweets containing stem s_q.

Then, each training tweet is represented as a vector of features in R^Q, i.e., x_t = {x_t1, ..., x_tq, ..., x_tQ}, where

x_tq = TF_tq · w_q   if tweet^S_t contains stem s_q,
x_tq = 0             otherwise,

with TF_tq being the term frequency (TF) of the q-th stem in the t-th training tweet. Thus, we employ the well-known TF-IDF index.

Finally, in order to select the set RS_tr of relevant stems ŝ_f, a feature selection algorithm was applied as follows. First, the quality of each stem s_q was evaluated by means of the well-known Information Gain (IG) value (Patil & Atique, 2013) between feature S_q (corresponding to stem s_q) and the possible class labels in C. IG is computed as IG(C|S_q) = H(C) − H(C|S_q), where H(C) represents the entropy of C, and H(C|S_q) represents the entropy of C after the observation of S_q. Then, the stems are ranked in descending order of IG and F stems, with F ≤ Q, are selected among them. We experimented with different values for F. Consequently, each feature vector is reduced to the representation in R^F (discussed in step 5 of Section 3.2).

Lastly, the supervised classification models are trained by setting the values of their parameters. In our system, we adopted an SVM classification model. The SVM has been used successfully for text classification in the literature (D'Andrea et al., 2015; Mohammad et al., 2017; Tellez et al., 2017a, 2017b). SVMs are discriminative classification algorithms based on a separating hyper-plane according to which new samples can be classified. The best hyper-plane is the one with the largest minimum distance from the training samples and is computed based on the support vectors (i.e., samples of the training set). The SVM classifier employed in this work is the implementation described in Keerthi, Shevade, Bhattacharyya, and Murthy (2001).

4. Comparing stance detection approaches

In this section, we compare the results achieved by recent state-of-the-art approaches for stance detection. Obviously, in the experimental comparison, we consider the dataset extracted for the specific context of stance detection regarding vaccination in Italy. First, we describe how we generated the adopted dataset. Then, we show the results achieved by the different methods selected for our experimental comparison campaign. We recall that this campaign was carried out for identifying the most suitable scheme, to embed in our stance detection system, for text representation and classification. It is intended that the results depend also on the data preparation steps chosen for our system, out of the wide range of possible ones, as underlined in Balahur and Turchi (2014) and Becker et al. (2017).

4.1. Data set extraction

In order to compare the different text representation and classification approaches, we needed to collect and label a set of vaccine-related tweets. Thus, we collected tweets by using, as search criteria, the date of posting and a set of vaccine-related keywords, chosen based on a preliminary analysis consisting of: i) the reading of newspaper articles about vaccines and vaccine-related events, and ii) interviews with medical experts. As date of posting, we considered a time span of five months, from September 1st, 2016, to January 31st, 2017. We chose this time span as the controversy about vaccines in Italy deeply increased in this period, according to Google Trends data.9 The keywords employed refer to different sub-contexts: i) the vaccination topic itself; ii) diseases possibly caused by negative effects attributed to vaccines; and iii) vaccine-preventable diseases. Further, we also took into account three widely used hashtags, namely #libertadiscelta (hashtag for "freedom of choice"), #iovaccino (hashtag for "I vaccinate"), and #novaccino (hashtag for "no vaccine"). Based on these criteria, we took into account 38 keywords (including synonyms and singular/plural variations of the keywords). The set of keywords employed is listed in Table 1. We wish to point out that in a few cases the keywords were used in combination (i.e., logic AND) with the keywords "vaccino", "vaccini", in order to fetch only tweets related to the vaccination context. We have read a large number of the tweets collected by using this procedure and we can confirm that almost the totality of the tweets are related to vaccination.

Next, we cleaned the set of tweets by removing duplicated tweets and non-Italian tweets. In fact, some keywords, e.g., "autismo", are spelled in the same way also in Spanish, and some keywords, e.g., "big pharma", "vaxxed", may lead to fetching English tweets.

Finally, we randomly selected and manually labelled N_tr = 693 training tweets (about 3% of the fetched tweets) to employ in the learning stage. The training dataset consisted of 219 tweets of class not in favor of vaccination, i.e., tweets expressing a negative opinion about vaccination, 255 tweets of class in favor of vaccination, i.e., tweets expressing a positive opinion about vaccination, and 219 tweets of class neutral. The class neutral may include news tweets about people who died or fell ill due to vaccines or to missed vaccinations, neutral opinion tweets, and off-topic tweets containing the selected keywords (e.g., tweets related to the vaccination of pets). Fig. 3 shows an example of manually labelled training tweets. We chose to manually label tweets despite it being an expensive and tedious activity. Recently, an emerging common practice to automatically label tweets exploits the kind of emoticon associated with the tweet (Gokulakrishnan et al., 2012; Aisopos et al., 2011; Agarwal et al., 2011). However, we did not take into account this approach as it presents a few problems: (i) the emoticon is often absent (especially in tweets concerning health topics); (ii) emoticons are rarely associated with tweets containing negative sentiments or contrary stances (Park, Barash, Fink, & Cha, 2013); (iii) some emoticons, e.g., those for "sad" or "happy", may actually help us to distinguish between positive-sentiment tweets and negative-sentiment tweets, while other emoticons, e.g., "surprise", may lead to a wrong labelling, as they are not clearly associated with a specific sentiment. In addition, stance detection is different from sentiment analysis. E.g., the tweet "Il film Vaxxed è molto interessante, felice che venga diffuso! :-)" ("The film Vaxxed is very interesting, happy it is distributed! :-)") expresses a positive sentiment (manifested also through the emoticon), but a not in favor stance about the vaccination topic.

4.2. Experimental comparisons

The experiments were performed using a 10-fold stratified cross validation (CV) procedure. Being 693 the number of labelled tweets, at each iteration, the classification model is trained on

9 Google Trends, https://trends.google.it/trends/.

Table 1
Set of keywords (with corresponding English translation) used to fetch tweets (please note that, in some cases, the keywords differ from each other only for the singular or
plural form).

Context Italian keyword (English translation)

Vaccination topic “complotto vaccini” (vaccines conspiracy); “copertura vaccinale” (vaccination coverage); “vaccini”, “vaccino”
(vaccine(s)); “big pharma”; “rischio vaccinale”, “rischi vaccinali” (vaccine risk(s)); “vaxxed”; “trivalente” (trivalent);
“esavalente” (hexavalent); “vaccinati”, “vaccinata”, “vaccinato”, “vaccinate” (vaxxed); “quadrivalente” (quadrivalent);
“vaccinazione”, “vaccinazioni” (vaccination(s)); “libertà vaccinale” (vaccination freedom); “obiezione vaccinale”
(vaccination objection); “età vaccinale” (vaccination age); “cocktail vaccinale” (vaccination cocktail); “controindicazioni
vaccinali” (vaccine contraindications)
Negative effects attributed to “paralisi flaccida” (flaccid paralysis); “autismo” (autism); “malattie autoimmuni” (autoimmune diseases); “evento
vaccines avverso”, “eventi avversi” (adverse event(s));
Vaccine-preventable diseases “meningite” (meningitis), “morbillo” (measles); “rosolia” (rubella); “parotite” (mumps); “pertosse” (whooping cough);
“poliomelite” (polio); “varicella” (varicella); “MPR” (italian acronym for measles, mumps, rubella);
Hashtags #novaccino (hashtag for “no vaccine”); #iovaccino (hashtag for “I vaccinate”); #libertadiscelta (hashtag for “freedom of
choice”)
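To illustrate how the context-dependent keywords of Table 1 can be AND-combined with the anchor terms "vaccino"/"vaccini" to restrict the search to the vaccination context, the sketch below builds one query string per keyword. The helper name and the quoted-phrase query syntax are our own assumptions for illustration only; the actual fetching was performed with the GetOldTweets Java library.

```python
# Hypothetical helper sketching the keyword-combination strategy of
# Section 4.1; it is not the fetching code actually used in the paper.

VACCINE_ANCHORS = ["vaccino", "vaccini"]

def build_queries(keywords, needs_anchor):
    """Return one search query per keyword; keywords flagged in
    needs_anchor are AND-combined (both terms required) with a
    vaccine anchor term to stay inside the vaccination context."""
    queries = []
    for kw in keywords:
        if kw in needs_anchor:
            # logic AND: both quoted phrases must appear in the tweet
            queries.append(f'"{kw}" "{VACCINE_ANCHORS[0]}"')
        else:
            queries.append(f'"{kw}"')
    return queries
```

For instance, the ambiguous keyword "autismo" would only be searched together with "vaccino", while an unambiguous keyword such as "trivalente" would be searched on its own.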

Fig. 3. Some examples of manually labelled tweets.

about 624 tweets, and tested on about 69 tweets. We repeated the 10-fold stratified CV two times, using two different seed values to randomly partition the data into folds. We recall that, for each fold, we consider a specific training set, which is used for learning the parameters for the text representation and the classification model. Indeed, all the compared schemes include a first phase for transforming texts into vectors of features and then a phase for the classification.

The first scheme that we experimented with adopts the BOW for the text representation and classical machine learning classification models. We tried different BOW schemes, including different tokenization methods and the presence or the absence of the stemming stage. We also experimented with the following classification models: C4.5 decision tree (Quinlan, 1993), Naïve Bayesian (NB) (John & Langley, 1995), Multinomial NB (MNB) (Mccallum & Nigam, 1998), Random Forest (RF) (Breiman, 2001), Simple Logistic (SL) (Landwehr, Hall, & Frank, 2005), and SVM (Platt, 1999). The scheme discussed in Section 3 is the one that achieved the best results, thus it was considered in the comparison with the other approaches. During the training of the models, we identified on average Q = 9529 features, reduced to F = 2000 features after the feature selection step. In the following, we denote the two schemes as BOW + SVM_ALL and BOW + SVM_2000, respectively.

The second scheme taken into consideration is an extension of the previous one: the BOW representation, using n-grams as tokenization method, was extended by using word embeddings as extra features. We took inspiration from a similar approach that was recently adopted in Mohammad et al. (2017), where the authors also compared a number of state-of-the-art schemes for stance detection. In particular, the authors extended the BOW representation with word embeddings, achieving the best results in their experimental comparison. Their word embedding model was obtained by means of a training stage with a domain-related corpus containing tweets in English. As stated in Yang, Craig, and Iadh (2017), in order to train a new word embedding model, millions of tweets may be necessary. Since we do not have such a huge amount of domain-related tweets in Italian, differently from the work in Mohammad et al. (2017), we adopted three publicly available word embeddings, pre-trained on the Wikipedia corpus. We considered the pre-trained vectors from Fast-Text,10 Glove11 and Word2Vec.12 We verified that some recent works have successfully adopted pre-trained word embeddings for text classification (Wang et al., 2016; Uysal & Yi Lu, 2017). Thus, we experimented with three schemes that we denote as BOW + FAST-TEXT + SVM, BOW + GLOVE + SVM and BOW + W2V + SVM, respectively. In each of the schemes, the dimension of the word embedding space is equal to 300, thus the total number of adopted features is equal to 9829.

Finally, we also experimented with two popular schemes for text representation and classification based on deep learning. Both schemes adopt word embeddings for text representation. Convolutional Neural Networks (CNN) and Long Short-Term Memory networks (LSTM) are employed for the classification stage. The adopted models are inspired by the network architectures presented in Cliché (2017). Albeit with different parametrizations, similar solutions have been exploited in recent works (Wang et al., 2016; Uysal & Yi Lu, 2017; Yang et al., 2017; Xiong et al., 2018).

The models were implemented using the Python Keras library.13 The preprocessed tweets, as discussed in Section 3.1, were converted into sequences of tokens by using the Keras tokenizer and padded to a fixed length equal to 80 with a special pad token.

10 https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
11 http://hlt.isti.cnr.it/wordembeddings/glove_wiki_window10_size300_iteration50.tar.gz
12 http://hlt.isti.cnr.it/wordembeddings/skipgram_wiki_window10_size300_neg-samples10.tar.gz
13 https://keras.io/

Table 2
Definitions of the metrics employed.

Metric name   Definition
Accuracy      Acc = (TP + TN) / (TP + FP + FN + TN)
Precision     Prec = TP / (TP + FP)
Recall        Rec = TP / (TP + FN)
F-measure     Fmeasure = 2 · (Prec · Rec) / (Prec + Rec)
AUC           AUC = (Rec − FP / (FP + TN) + 1) / 2

Also in this case, we adopted pre-trained word embeddings, considering Fast-Text, Glove and Word2Vec, which contain more or less 2500 words each. The dimension of the word embedding space is equal to 300 for all the three word embedding models. Since in our dataset we found 2776 different tokens, new words were initialized with random samples from a uniform distribution over [0,1), according to the findings presented in Yang et al. (2017). As regards the classification models, we carried out a deep experimental campaign for identifying the most suitable architectures and the most performing training parameters.

As regards the CNN, we defined three convolutional layers (Conv1D with ReLU activation) characterized by three different filter sizes in {3, 4, 5} and 100 filtering matrices each. The number of hidden neurons was set to 30. A dropout layer was added after the pooling layer and the hidden fully connected layer with the aim of reducing overfitting.

As regards the LSTM, the bidirectional layer consisted of two 100-cell LSTM layers. The 200 final hidden states were concatenated and fed into a fully connected layer of 30 units. Dropout was added in the LSTM layers and after the fully connected hidden layer.

Both models were trained by minimizing the categorical cross-entropy loss at the output softmax layer, which consists of three neuronal units. We adopted the Adam optimizer with learning rate of 0.001 and batch size 128. In order to find the best hyperparameter configurations for the LSTM and CNN models separately, we performed a grid search using different values of the dropout rate, different numbers of epochs, and different pre-trained word embeddings. We selected the models that delivered the highest accuracy on our dataset, using a 10-fold cross validation. The best configuration for the CNN was the one with the Fast-Text word embedding, dropout rate equal to 0.2 and 40 epochs of training. The best configuration for the LSTM was the one with the Word2Vec word embedding, dropout rate equal to 0.4 and 60 epochs of training. In the following, we denote these two schemes as Fast-Text + CNN and W2V + LSTM. However, we will also show the best results achieved by adopting the three word embedding schemes combined with the CNN and the LSTM. Thus, we will also consider the following schemes: GLOVE + CNN, W2V + CNN, Fast-Text + LSTM and GLOVE + LSTM.

We evaluated the models in terms of widely-used metrics (Forman, 2003), namely accuracy, precision, recall, F-measure, and Area Under the Curve (AUC). Table 2 provides the definitions of the metrics employed. For the sake of simplicity, we explain the metrics referring to the case of a binary classification (i.e., positive class vs. negative class), as the adaptation to a multi-class problem is straightforward: in that case, the metrics are computed for each class. First, we need to define a few elements of a classification task: (i) true positives (TP) is the number of real positive tweets correctly classified as positive; (ii) true negatives (TN) is the number of real negative tweets correctly classified as negative; (iii) false positives (FP) is the number of real negative tweets incorrectly classified as positive; (iv) false negatives (FN) is the number of real positive tweets incorrectly classified as negative. Thus, accuracy is the number of tweets correctly labeled, i.e., the sum of TPs and TNs, over the total number of tweets. Precision is the number of TPs over the total number of tweets labeled as belonging to the class. Recall is defined as the number of TPs over the total number of tweets that actually belong to the class. The F-measure is the weighted harmonic mean of precision and recall. The AUC is the area underlying the Receiver Operating Characteristic (ROC) curve and is approximated with the equation in Table 2.

Table 3 shows the average results achieved by the different methods discussed above. It is worth noting that the methods based on deep learning achieve the worst results. The remaining methods achieve similar results, even though BOW + SVM_ALL shows the highest accuracy. According to previous studies (Uysal & Yi Lu, 2017), this outcome is not completely unexpected. Indeed, deep architectures have proven to be a successful approach in many areas, but they typically require large training sets. In the type of application considered in the paper, also because of the limited number of tweets available in Italian and the known issues related to the small length of each data item, the generation of a training set suitable for deep architectures would be very tedious and almost unfeasible, and would make the application itself not very appealing.

In order to verify if there exist statistical differences among the values of accuracy achieved by the eleven classification models, we also performed a statistical analysis of the results. Similar to the analysis carried out in our previous work in D'Andrea et al. (2015), and as suggested in Derrac, Garcia, Molina, and Herrera (2011), we applied non-parametric statistical tests: for each classifier we generated a distribution consisting of the 20 values of the accuracy on the test set obtained by repeating two times the 10-fold cross validation. We selected BOW + SVM_ALL as the control model and we statistically compared the results achieved by this model with the ones achieved by the remaining models. We applied the Wilcoxon signed-rank test (Wilcoxon, 1945), which detects significant differences between two distributions. In all the tests, we used α = 0.05 as level of significance. Table 4 shows the results of the Wilcoxon signed-rank test: R+ and R− denote, respectively, the sum of ranks for the folds in which the first model outperformed the second, and the sum of ranks for the opposite condition. Whenever the p-value is lower than the level of significance, we can reject the statistical hypothesis of equivalence. Otherwise, no statistical differences can be identified. Thus, BOW + SVM_ALL is statistically equivalent only to BOW + SVM_2000, BOW + FASTTEXT + SVM and BOW + W2V + SVM. On the other hand, BOW + SVM_ALL statistically outperforms the remaining models.

Since we aim to select the simplest scheme for text representation, we decided to embed the BOW + SVM_2000 scheme in our stance detection system. We can conclude that the selected scheme is the most suitable one for the task of detecting stance in vaccination discussions in Italy. Indeed, it is the simplest one (it adopts just 2000 features for text representation) and achieves results that are comparable with (even slightly better than) the ones achieved by the recent state-of-the-art method introduced in Mohammad et al. (2017). The authors of Mohammad et al. (2017) showed that their scheme, which adopts the BOW text representation extended with word embeddings and the SVM as classification model, is able to achieve better results, in the framework of stance detection, than the winner of the SemEval 2016 competition (Mohammad et al., 2016).

5. Online monitoring

In this section, first we show the outcomes of the real-time monitoring analysis on Twitter of the stance of people towards the vaccination topic in Italy. Then, since over time the terms used to express stance about vaccination may change, we present

Table 3
Average results obtained by using the different approaches discussed in the text.

Classifier            Class         F-measure  Precision  Recall  AUC   Accuracy
BOW + SVM_ALL         Not in favor  0.60       62.6%      56.6%   0.73
                      In favor      0.65       64.5%      65.5%   0.74  65.4%
                      Neutral       0.71       68.6%      74.0%   0.80
BOW + SVM_2000        Not in favor  0.59       61.5%      56.2%   0.73
                      In favor      0.64       63.2%      63.9%   0.74  64.8%
                      Neutral       0.72       69.4%      74.4%   0.81
BOW + FASTTEXT + SVM  Not in favor  0.59       57.9%      60.3%   0.75
                      In favor      0.73       73.3%      72.6%   0.82  64.2%
                      Neutral       0.61       62.1%      60.4%   0.72
BOW + GLOVE + SVM     Not in favor  0.56       59.5%      53.0%   0.74
                      In favor      0.70       66.9%      72.1%   0.79  62.2%
                      Neutral       0.61       59.9%      61.6%   0.71
BOW + W2V + SVM       Not in favor  0.59       61.1%      56.6%   0.73
                      In favor      0.72       68.9%      74.9%   0.81  63.7%
                      Neutral       0.60       60.7%      60.0%   0.72
FASTEXT + CNN         Not in favor  0.57       57.8%      57.9%   0.69
                      In favor      0.63       64.3%      62.7%   0.70  62.9%
                      Neutral       0.68       69.6%      68.0%   0.77
GLOVE + CNN           Not in favor  0.55       54.8%      56.6%   0.67
                      In favor      0.63       64.5%      62.4%   0.71  60.5%
                      Neutral       0.63       65.1%      62.2%   0.73
W2V + CNN             Not in favor  0.57       57.2%      58.0%   0.69
                      In favor      0.62       62.9%      61.6%   0.70  62.5%
                      Neutral       0.69       70.4%      68.1%   0.77
FASTEXT + LSTM        Not in favor  0.55       54.6%      58.4%   0.67
                      In favor      0.63       61.5%      63.6%   0.70  61.2%
                      Neutral       0.66       72.7%      61.1%   0.75
GLOVE + LSTM          Not in favor  0.56       55.5%      58.5%   0.68
                      In favor      0.62       62.2%      63.2%   0.70  61.8%
                      Neutral       0.67       73.3%      63.4%   0.76
W2V + LSTM            Not in favor  0.57       56.6%      59.8%   0.68
                      In favor      0.59       59.3%      62.0%   0.69  61.9%
                      Neutral       0.69       76.2%      63.9%   0.77
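The per-class figures reported in Table 3 follow the one-vs-rest definitions given in Table 2. As a minimal illustration (plain Python, not the Weka code actually used in the experiments), the metrics for one class can be computed from the four raw counts as follows:

```python
# Illustrative re-statement of the metrics of Table 2 for one class
# treated as "positive" against all the remaining classes.

def class_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, F-measure and the AUC
    approximation of Table 2, from TP/FP/FN/TN counts."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    # F-measure: harmonic mean of precision and recall
    f_measure = 2 * prec * rec / (prec + rec)
    # AUC approximation of Table 2: (Rec - FP/(FP + TN) + 1) / 2
    auc = (rec - fp / (fp + tn) + 1) / 2
    return acc, prec, rec, f_measure, auc
```

For instance, a class with TP = 50, FP = 10, FN = 50 and TN = 90 yields accuracy 0.7, precision about 0.83, recall 0.5, F-measure 0.625 and approximated AUC 0.7.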
Table 4
Results of the Wilcoxon Signed-Rank test on the accuracies obtained on the test set.

Comparison R+ R- p-values Hypotesis

BOW + SVM_ALL vs. BOW + SVM_20 0 0 27 18 0.528926 Not-rejected


BOW + SVM_ALL vs. BOW + FASTTEXT + SVM 35.5 19.5 0.386271 Not-rejected
BOW + SVM_ALL vs. BOW + GLOVE + SVM 55 0 0.003842 Rejected
BOW + SVM_ALL vs. BOW + W2V + SVM 30 15 0.343253 Not-rejected
BOW + SVM_ALL vs. FASTEXT + CNN 42 3 0.016172 Rejected
BOW + SVM_ALL vs. GLOVE + CNN 53 2 0.007267 Rejected
BOW + SVM_ALL vs. W2V + CNN 43 2 0.012851 Rejected
BOW + SVM_ALL vs. FASTEXT + LSTM 55 0 0.003842 Rejected
BOW + SVM_ALL vs. GLOVE + LSTM 41 4 0.022327 Rejected
BOW + SVM_ALL vs. W2V + LSTM 53 2 0.007267 Rejected
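The comparisons in Table 4 are based on the Wilcoxon signed-rank test applied to paired accuracy values. With SciPy, a single comparison can be reproduced along the following lines; the two accuracy vectors are made-up placeholders rather than the values behind the table:

```python
from scipy.stats import wilcoxon

# Hypothetical per-fold accuracies of two schemes evaluated on the same folds
acc_a = [0.65, 0.66, 0.64, 0.67, 0.63, 0.66, 0.65, 0.64, 0.66, 0.65]
acc_b = [0.61, 0.63, 0.60, 0.62, 0.59, 0.62, 0.61, 0.60, 0.63, 0.61]

# Paired two-sided test; the statistic is the smaller of the two rank sums
stat, p_value = wilcoxon(acc_a, acc_b)
rejected = p_value < 0.05  # reject the null hypothesis of equal performance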
E. D’Andrea et al. / Expert Systems With Applications 116 (2019) 209–226 219

Fig. 4. Daily number of tweets by class (from September 1st, 2016 to June 30th, 2017).

Fig. 5. Number of cumulated tweets by class (from September 1st, 2016 to June 30th, 2017).

an experimental study for detecting the possible presence of concept drift. Finally, we analyze the content of some tweets classified as belonging to the three classes considered in the paper.

5.1. Outcome of the online monitoring

The monitoring analysis lasted 10 months, from September 1st, 2016 to June 30th, 2017. During this time interval, we fetched N = 112,397 tweets using the same set of keywords presented in Table 1. From this set of tweets, we removed the training tweets. The remaining tweets were preprocessed and classified. As stated before, we selected the BOW + SVM_2000 scheme as the most suitable one for text representation and classification. The adopted scheme was trained using the entire training set extracted from September 1st, 2016 to January 31st, 2017.

In the following, we show and discuss the outcome of the online monitoring analysis over the 10 months. Further, we deepen the analysis in correspondence with the local peaks of the daily number of tweets. We have verified that these peaks are related to particular events concerning vaccination (discussions in Parliament, approval of laws establishing vaccination requirements, etc.) and are therefore very useful to evaluate the effectiveness of our system. Figs. 4–6 give an overview of the number of tweets per day per class. More precisely, Fig. 4 depicts a stacked histogram of the number of not in favor of vaccination, in favor of vaccination, and neutral tweets classified by our system per day during the time span considered. Fig. 5 shows the cumulated number of tweets by class over the same time interval, and Fig. 6 compares the daily opinion trends limited to subjective tweets (in favor of vaccination and not in favor of vaccination). In all figures, we can easily see how the number of tweets increases in correspondence with some days (local peaks of the daily number of tweets). By analyzing the news of these days, we have discovered the presence of specific events related to vaccination (the number above each peak indicates the event). In particular, we have identified the following events:

(1) event #1: Cancellation of the projection of the documentary film "Vaxxed: from cover-up to catastrophe" in the Italian Republic Senate on September 28th, 2016 [14];

[14] www.ilfattoquotidiano.it/2016/09/28/vaccini-senato-annulla-proiezione-%E2%80%AAdel-documentario-vaxxed-cover-catastrophe/3062895/ (accessed 16 October 2017).

Fig. 6. Number of subjective tweets (in favor of vaccination vs. not in favor of vaccination) per day (from September 1st, 2016 to June 30th, 2017).

(2) event #2: Expected projection of the documentary film "Vaxxed: from cover-up to catastrophe" in the Italian Republic Senate on October 4th, 2016 [15];
(3) event #3: Speech by the President of the Italian Republic about vaccines on October 24th, 2016 [16];
(4) event #4: Approval of the law establishing vaccination requirements for school children in the Emilia Romagna Region, Italy, on November 22nd, 2016 [17];
(5) event #5: Death of a school teacher from meningitis in Rome, Italy, news of December 28th, 2016 [18];
(6) event #6: Agreement between the Italian Health Minister and the Italian Regions about the vaccination requirement on January 26th, 2017 [19];
(7) event #7: Cancellation of the projection of the documentary film "Vaxxed: from cover-up to catastrophe" at the European Parliament on February 7th, 2017 [20];
(8) event #8: 230% increase in measles cases in Italy, news of March 16th, 2017 [21];
(9) event #9: Italian TV show Report focusing on vaccines causes controversy on April 17th, 2017 [22];
(10) event #10: Fake vaccinations in the Italian city of Treviso, news of April 19th, 2017 [23];
(11) event #11: Fake vaccinations in the Friuli Region, Italy, news of May 3rd, 2017 [24];
(12) event #12: NY Times article against an Italian political party opposed to vaccines, news of May 4th, 2017 [25];
(13) event #13: Five-fold increase in measles cases in Italy in April 2017, news of May 4th, 2017 [26];
(14) event #14: Approval of the decree on the vaccination requirement (12 vaccines) in Italian kindergartens on May 19th, 2017 [27];
(15) event #15: The President of the Italian Republic signs the decree on the 12-vaccination requirement in Italian schools on June 7th, 2017 [28];
(16) event #16: Child sick with leukemia died of measles in Monza, Italy, news of June 22nd, 2017 [29].

From Fig. 4 we can observe that, in the absence of context-related events, the number of tweets per day is quite low (e.g., about 100–200 tweets per day until September 28th, 2016). This value rapidly grows when context-related events occur (e.g., it exceeds 500 on September 28th, 2016, when event #1 occurs). Further, by observing Figs. 4 and 5, we can see that some events produced a higher spike in the daily number of tweets, i.e., higher than 2000. These spikes occur immediately after November 22nd, 2016, April 17th, 2017, and June 22nd, 2017. In correspondence with these dates, we can identify the following triggering events: i) event #4 on November 22nd, 2016; ii) event #9 on April 17th, 2017; iii) event #10 on April 19th, 2017; and iv) event #16 on June 22nd, 2017.

The effect of a triggering event may be more or less emphasized depending on the flow of the event itself and on the perception of the event by Twitter users. Further, the effect of the event (in terms of number of shared tweets) may be observable almost immediately, as typically happens with viral news, or some hours/days later. E.g., the spike corresponding to event #4 actually occurs the day after, i.e., on November 23rd, 2016. Further, events very close in time may contribute to the same spike. E.g., the spike

[15] www.lastampa.it/2016/09/28/italia/film-contro-i-vaccini-in-senato-la-polemica-cancella-levento-LnDCe2j3uTq8KukEEey8uJ/pagina.html (accessed 10 Sept. 2018).
[16] www.repubblica.it/salute/medicina/2016/10/24/news/mattarella_sconsiderato_chi_critica_vaccini-150471038/ (accessed 16 October 2017).
[17] www.repubblica.it/salute/prevenzione/2016/11/22/news/vaccini_obbligatori_emilia_romagna_immunita_gregge-152543276/ (accessed 16 October 2017).
[18] www.ilpost.it/2016/12/28/meningite/ (accessed 16 October 2017).
[19] www.huffingtonpost.it/2017/01/26/vaccini-obbligatori-accordo-storico_n_14417108.html (accessed 16 October 2017).
[20] www.repubblica.it/salute/prevenzione/2017/02/07/news/vaccini_il_film_vaxxed_sull_autismo_al_parlamento_ue_lorenzin_scrive_a_tajani-157788259/ (accessed 16 October 2017).
[21] www.ilfattoquotidiano.it/2017/03/16/morbillo-i-dati-del-ministero-della-salute-preoccupante-aumento-dei-casi-230-in-un-anno-e-colpa-del-rifiuto-dei-vaccini/3456211/ (accessed 16 October 2017).
[22] www.repubblica.it/cronaca/2017/04/19/news/tra_inchieste_e_bufale-163328161/ (accessed 16 October 2017).
[23] www.repubblica.it/cronaca/2017/04/19/news/treviso_infermiera_fiale_vaccini-163380333/ (accessed 16 October 2017).
[24] www.repubblica.it/cronaca/2017/05/03/news/friuli_venezia_giulia_fingeva_vaccini_oltre_20mila_dosi_dubbie-164506727/ (accessed 16 October 2017).
[25] www.repubblica.it/politica/2017/05/03/news/nyt_contro_i_5_stelle_loro_negazione_populista_sull_efficacia_dei_vaccini_aumenta_la_diffusione_di_gravi_malattie_-164484892/ (accessed 16 October 2017).
[26] www.repubblica.it/salute/prevenzione/2017/05/04/news/morbillo_casi_aumento_2017-164596470/ (accessed 16 October 2017).
[27] www.repubblica.it/salute/2017/05/19/news/vaccini_oggi_testo_in_cdm_boschi_no_scherzi_su_salute-165815370/ (accessed 16 October 2017).
[28] www.ilfattoquotidiano.it/2017/06/07/vaccini-mattarella-firma-il-decreto-su-obbligo-per-iscrizione-a-scuola-bastera-autocertificazione-o-la-prenotazione/3642713/ (accessed 16 October 2017).
[29] milano.repubblica.it/cronaca/2017/06/23/news/morbillo_vaccini_bambino_morto_monza_dubbi_ospedale-168926326/ (accessed 16 October 2017).
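The local peaks exploited in this analysis can also be located automatically from the daily counts. The sketch below flags a day as a spike when its count is above a threshold (here the 500-tweets-per-day level mentioned in this section) and is a local maximum; the count series and the criterion are illustrative assumptions, not the authors' actual procedure:

```python
def find_spikes(daily_counts, threshold=500):
    """Return the indices of days whose tweet count exceeds the threshold
    and is a local maximum with respect to the neighboring days."""
    spikes = []
    for i in range(1, len(daily_counts) - 1):
        count = daily_counts[i]
        if count > threshold and count >= daily_counts[i - 1] and count > daily_counts[i + 1]:
            spikes.append(i)
    return spikes

# Made-up daily series: a quiet baseline with two event-driven bursts
counts = [120, 150, 130, 560, 510, 140, 160, 2100, 900, 180]
spike_days = find_spikes(counts)
```

Each detected index can then be mapped back to a calendar day and matched against the event list above.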

Fig. 7. Distribution of opinion polarity over the classes by event. The opinion polarity for event #4 regards November 22nd–24th, 2016, that for event #9 and #10 (considered
together) regards April 17th–20th, 2017, and that for event #16 regards June 22nd–24th, 2017.

after April 17th, 2017 may be caused by the event itself (event #9), but also event #10, occurring just two days later, can likely be a cause. Thus, events #9 and #10 may be merged into a single aggregated event for the purpose of the analysis.

Further, in addition to the total number of tweets per day, independently of the class, we can observe (in Figs. 4–6) also spikes (i.e., sudden changes) in the number of tweets of a given class. More precisely, spikes of neutral tweets are the most notable, both in terms of frequency and of amplitude. However, neutral tweets mostly correspond to news tweets and indicate users talking about vaccines or sharing news in correspondence with the event, whereas sometimes they also correspond to personal objective texts without a clear opinion.

In order to better understand the opinion polarity after the occurrence of a context-related event, we can study the distribution of tweets over the three classes. Let us consider event #4. The event causes a main spike in the number of tweets on November 23rd, 2016, and two minor spikes (i.e., with more than 500 daily tweets) on November 22nd and November 24th, 2016. Thus, we can aggregate the opinions shared during these three days in order to analyze the effects of event #4. In the time interval November 22nd–24th, 2016, 3566 tweets were shared: about 59% of the tweets were classified as neutral, about 23% as in favor of vaccination, and about 18% as not in favor of vaccination. Similarly, we can repeat this analysis for event #16, and for events #9 and #10 considered together. More precisely, the effects of event #4 span November 22nd–24th, 2016, those of events #9 and #10 together span April 17th–20th, 2017, and those of event #16 span June 22nd–24th, 2017. The number of days to consider in order to study the opinion polarity in correspondence with each event depends on the number of daily tweets, and was decided by visually inspecting Fig. 4. Fig. 7 summarizes the distribution of opinion polarity for the events considered. We can observe that the stance is overall neutral for the aggregated events #9 and #10. It is considerably biased towards in favor of vaccination for event #16 (+11%) and slightly biased towards in favor of vaccination for event #4 (+5%). By taking into account the overall time span of 10 months, the distribution of opinions over the three classes is: in favor of vaccination for about 19%, neutral for about 64%, and not in favor of vaccination for about 17%. Having the majority of tweets classified as neutral is a common behavior in social networks (Ghiassi, Skinner, & Zimbra, 2013). Further, by taking into account only subjective tweets (i.e., by discarding neutral tweets), we can state that, overall during the 10 months, about 52% of the opinions are in favor of vaccination, whereas about 48% are not in favor of vaccination. Hence, the opinion is slightly biased towards the in favor of vaccination class. Obviously, this is an aggregated result, which may hide the variations occurring during the time span. Hence, we made a monthly analysis. Figs. 8 and 9 show the distribution of tweets over the three classes per month, and the number of tweets shared per month, respectively. From Fig. 8, we can notice that the number of tweets expressing a subjective opinion increased significantly in Spring 2017. More precisely, the amount of neutral tweets, initially over 70%, decreases to around 50% in May 2017, making May the month with the highest percentage of subjective opinions. Further, we can observe from Fig. 4 that also the total number of daily tweets has grown during the time span. These facts suggest that the number of people talking about vaccination increased, as a consequence of vaccine-related events.

5.2. Concept drift detection and analysis

When dealing with continuous classification of data streams along time, the issue of concept drift should be analyzed. Indeed, classification models are usually trained using data extracted in a specific time interval. Then, such models are used for classifying the new instances received in streaming. Since the characteristics of the phenomenon under observation can change along time, the performance of the classification models may deteriorate due to this concept drift. Thus, once the presence of concept drift is detected in the classification system, appropriate strategies for reducing it should be applied (Gama, Žliobaitė, Bifet, Pechenizkiy, & Bouchachia, 2014).

In the context analyzed in this work, in which we carry out a real-time classification of the stance about vaccination in Italy from tweets, Twitter users may change over time the words and/or phrases used for expressing their opinion. For this reason, we decided to carry out an additional experimental analysis for detecting the presence of concept drift along the time span under observation. To this aim, we analyzed the tweets belonging to 7 local peaks of the daily number of tweets, which correspond to seven events, namely #4, #5, #6, #8, #10, #14 and #16, out of the sixteen events described in Section 5.1. We randomly read several tweets and manually labelled around 60 tweets for each event, trying to identify 20 tweets for each class. We limited the analysis to a subset of the sixteen events because the labelling task is quite tedious and time-consuming.

To detect the presence of drift, we evaluated the F-measure, the precision, the recall, the AUC per class, and the overall accuracy obtained by classifying the labelled tweets of each selected peak. First, we evaluated the performance obtained on each event by the classification model trained using the initial training set, extracted from September 1st, 2016 to January 31st, 2017 (we checked that the tweets of events #4, #5, and #6 were not included in the training set). Then, before evaluating the classification performance on a specific event, we re-trained the classification model extending the training set by considering the labelled tweets of the previously analyzed events. As an example, when we analyzed event #10, we re-trained the classification models considering the initial training set extended with the labelled tweets of events #4, #5, #6 and #8. This incremental learning procedure was proposed in Costa, Silva, Antunes, and Ribeiro (2014), where three

Fig. 8. Distribution of opinion polarity over the classes by month.

Fig. 9. Number of tweets per month.
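The incremental re-training procedure used in the concept drift analysis (score each event with a model re-trained on the initial training set plus the labelled tweets of all previously analyzed events) can be sketched as follows; `train_model` and `accuracy` are hypothetical stand-ins for the actual classifier training and evaluation routines:

```python
def incremental_evaluation(initial_train, event_batches, train_model, accuracy):
    """For each event batch, evaluate a model re-trained on the initial
    training set extended with all previously labelled event batches."""
    results = []
    train_set = list(initial_train)
    for batch in event_batches:
        model = train_model(train_set)        # re-train on the extended set
        results.append(accuracy(model, batch))
        train_set.extend(batch)               # add this event's labelled tweets
    return results

# Toy stand-ins: the "model" simply memorizes the labelled pairs it has seen
train_model = lambda data: set(data)
accuracy = lambda model, batch: sum(pair in model for pair in batch) / len(batch)

events = [[("t1", 0), ("t2", 1)], [("t1", 0), ("t3", 1)]]
scores = incremental_evaluation([("t0", 1)], events, train_model, accuracy)
```

Note that each event is always scored before its own labels enter the training set, so the evaluation never tests on data the model was trained on.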

different solutions were compared for approaching the problem of handling concept drift in the classification of Twitter streams. The procedure resulted to be the one with the best performance. In the following, we refer to the incrementally trained classifier as BOW + SVM_2000_INC.

Tables 5 and 6 show the results of the concept drift analysis for the BOW + SVM_2000 and BOW + SVM_2000_INC classifiers, respectively. Furthermore, Fig. 10 shows a plot of the trends of the accuracy over the time span of the selected events for both classification schemes. In the figure, the blue dotted line and the continuous black line show the trend of the accuracy achieved, respectively, by BOW + SVM_2000 and BOW + SVM_2000_INC.

The analysis of Table 5 and Fig. 10 shows that the incremental solution outperforms the BOW + SVM_2000 scheme only in events #6 and #8. On the other hand, in the worst case the accuracy of BOW + SVM_2000_INC deteriorates down to a value below 60%. Conversely, the accuracy of BOW + SVM_2000 remains quite stable along the time window. Thus, we can affirm that our selected solution does not look to be particularly affected by concept drift.

We can conclude that adopting incremental learning does not lead to an effective reduction of the concept drift. Indeed, although in the first events the incremental learning produces a gain in accuracy, in the subsequent events we observe a decrease. This trend may be due to a strong dependency of the accuracy on specific words, which are used in some events, but not in all. Anyway, given the complexity of the overall scenario and the relatively small amount of labelled data available for the re-training procedure, the incremental solution does not provide very effective improvements.

We believe that an average accuracy of 62.75% on the selected and labelled tweets (a test set of around 420 tweets) can be considered a good result. Indeed, a recent work discussed in Dey et al. (2018) obtains, on the SemEval 2016 stance detection Twitter task dataset [30], a best-case accuracy of 60.2% on the test set. On the other hand, the BOW + SVM_2000 scheme may be

[30] http://saifmohammad.com/WebPages/StanceDataset.htm

Fig. 10. Accuracy trends over the time span of seven peaks of the daily number of tweets.

Table 5
Results of the concept drift analysis for BOW + SVM_2000.

Event Class F-measure Precision Recall AUC Accuracy

Not in favor 0.51 52.9% 50.0% 0.64
#4 In favor 0.63 66.7% 60.0% 0.73 62.1%
Neutral 0.70 65.2% 75.0% 0.76

Not in favor 0.55 72.7% 44.4% 0.68
#5 In favor 0.51 61.1% 44.0% 0.70 63.2%
Neutral 0.75 61.5% 96.0% 0.80

Not in favor 0.53 64.3% 45.0% 0.65
#6 In favor 0.55 48.0% 63.2% 0.68 61.9%
Neutral 0.75 75.0% 75.0% 0.84

Not in favor 0.53 63.2% 46.2% 0.79
#8 In favor 0.64 67.9% 61.3% 0.77 64.7%
Neutral 0.73 63.2% 85.7% 0.82

Not in favor 0.64 75.0% 56.3% 0.70
#10 In favor 0.43 40.0% 46.2% 0.62 63.0%
Neutral 0.73 70.4% 76.0% 0.77

Not in favor 0.50 55.6% 45.5% 0.65
#14 In favor 0.66 66.7% 64.3% 0.70 61.6%
Neutral 0.67 60.7% 73.9% 0.76

Not in favor 0.72 73.1% 70.4% 0.76
#16 In favor 0.50 47.6% 52.6% 0.65 62.1%
Neutral 0.62 63.2% 60.0% 0.81

Not in favor 0.57 65.3% 51.1% 0.70
Average In favor 0.56 56.9% 55.9% 0.69 62.6%
Neutral 0.71 65.6% 77.4% 0.79

Table 6
Results of the concept drift analysis for BOW + SVM_2000_INC.

Event Class F-measure Precision Recall AUC Accuracy

Not in favor 0.51 52.9% 50.0% 0.64
#4 In favor 0.63 66.7% 60.0% 0.73 62.1%
Neutral 0.70 65.2% 75.0% 0.76

Not in favor 0.60 75.0% 50.0% 0.72
#5 In favor 0.49 62.5% 40.0% 0.69 63.2%
Neutral 0.74 60.0% 96.0% 0.79

Not in favor 0.65 70.6% 60.0% 0.74
#6 In favor 0.60 57.1% 63.2% 0.72 68.3%
Neutral 0.78 76.0% 79.2% 0.85

Not in favor 0.64 71.4% 57.7% 0.80
#8 In favor 0.70 68.8% 71.0% 0.78 71.8%
Neutral 0.80 75.0% 85.7% 0.86

Not in favor 0.69 68.8% 68.8% 0.73
#10 In favor 0.44 42.9% 46.2% 0.65 63.0%
Neutral 0.69 70.8% 68.0% 0.75

Not in favor 0.53 55.6% 50.0% 0.70
#14 In favor 0.67 68.4% 65.0% 0.73 60.0%
Neutral 0.61 56.5% 65.0% 0.74

Not in favor 0.61 58.6% 63.0% 0.68
#16 In favor 0.43 44.4% 42.1% 0.63 59.1%
Neutral 0.72 73.7% 70.0% 0.82

Not in favor 0.60 64.7% 57.1% 0.72
Average In favor 0.57 58.7% 55.4% 0.70 63.9%
Neutral 0.72 68.2% 77.0% 0.80
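The per-class precision, recall, and F-measure and the overall accuracy reported in Tables 5 and 6 can be computed with scikit-learn as sketched below on made-up labels; the per-class AUC is omitted because it additionally requires classification scores rather than hard labels:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

labels = ["not in favor", "in favor", "neutral"]
y_true = ["neutral", "in favor", "not in favor", "neutral", "in favor", "neutral"]
y_pred = ["neutral", "in favor", "neutral", "neutral", "not in favor", "neutral"]

# Arrays are returned in the order given by `labels`
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, zero_division=0)
overall_accuracy = accuracy_score(y_true, y_pred)
```

On real data, `y_true` would hold the manually assigned labels of a peak's tweets and `y_pred` the classifier's output, yielding one row triple of each table.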

adapted/updated when an appreciable concept drift is detected. This can happen when new keywords become of interest, or when the way of writing or the opinion of users actually changes with respect to the keywords.

5.3. Analysis of a selected set of tweets on major events

Lastly, we discuss the outcome of the classification of a few tweets fetched in correspondence with the major events mentioned above, i.e., events #4, #9, #10, and #16. Table 7 shows 22

Table 7
A few examples of tweet classification for some major events.

Event Text of tweet – [English translation] Actual class Assigned class Note
#4 “Meningite, donna morta a Firenze. Non era vaccinata contro meningococco di tipo C” – [“Meningitis, a woman died in Florence. She was not vaccinated against type-C meningococcus”] Neutral Neutral Hit, news tweet
#4 “Ai miei tempi varicella e morbillo si curavano in casa, a letto, con pozioni magiche Not in favor Not in favor Hit
della mamma! Ora solo vaccini contro il male di vivere!” – [“In my time, varicella and
measles were cared for at home, in bed, with mom’s magic potions! Now only vaccines
against the evil of living!”]
#4 “Si ringraziano i criminali imbecilli che fanno propaganda contro i vaccini.” – [“Thanks In favor Not in favor Miss, due to
criminal idiots who make propaganda against vaccines.”] irony/sarcasm
#4 “Epidemia di meningite in Toscana, ma tranquilli continuate a non vaccinare i bambini, In favor Not in favor Miss, due to
che tanto non succede niente!” – [“Meningitis outbreak in Tuscany, but be calm and irony/sarcasm
still do not vaccinate children, nothing will happen!”]
#4 “Dovessi fare io le leggi, proibirei a questa bella gente l’accesso a tutti i luoghi pubblici e In favor In favor Hit
imporrei la vaccinazione coatta.” – [“If I had to do myself the laws, I would prohibit to
such nice people the access to all public places and impose the forced vaccination.”]
#4 “Trattamenti periodici e vaccinazioni del tuo cane. Ricordi tutte le scadenze? Neutral Neutral Hit
Scommettiamo di no…” – [“Periodic treatments and vaccinations for your dog. Do you
remember all deadlines? We bet not…”]
#4 “Vergogna, votano contro la vaccinazione in Emilia Romagna che per fortuna passa. In favor In favor Hit
Oscurantismo e medioevo spacciati per progresso.” – [“Shame! They vote against
vaccination in Emilia Romagna, which fortunately was approved. Obscurity and Middle
Ages passed off as progress.”]
#4 “I vaccini funzionano, servono e la realtà lo ha confermato. Dibattito si può fare solo In favor In favor Hit
con prove che dicano diversamente.” – [“Vaccines work, they are useful and reality has
confirmed it. Debate can only be done only with evidence that show differently.”]
#9, “In Italia un caso di difterite: le conseguenze del calo dei vaccini.” – [“In Italy a case of Neutral Not in favor Miss, news
#10 diphtheria: the consequences of the fall in vaccines.”] tweet
#9, “Visto che i vaccini sono obbligatori, perchè non li fanno gratis? Evidentemente adesso Not in favor Not in favor Hit
#10 vogliono fare arricchire qualche casa farmaceutica.” – [“Since vaccines are mandatory,
why they do not do it for free? Obviously now they want to enrich some
pharmaceutical company.”]
#9, “Quindi, la mia opinione è che alcuni vaccini devono essere obbligatori, anche per tutela In favor In favor Hit
#10 dei genitori.” – [“So, in my opinion some vaccines must be mandatory, also for
protecting the parents.”]
#9, “Non fate girare queste bufale assurde. Ormai è stato dimostrato da anni che l’autismo In favor Neutral Miss
#10 non ha nulla a che fare con i vaccini.” – [“Do not spread these absurd fake news. It has
been shown for years that autism has nothing to do with vaccines.”]
#9, “Loro votano No ai vaccini, e a noi ci chiamano serial killer dei loro figli!” – [“They say In favor Neutral Miss, due to
#10 No to vaccines, and call us serial killers of their children!”] irony/sarcasm
#9, #10 “Il primo effetto del vaccino contro l’influenza è la voglia di dormire altre ore.” – [“The first effect of the flu vaccine is the desire to sleep a few more hours.”] Neutral Neutral Hit
#9, “Peccato non esista un vaccino contro la stupidita!” – [“What a pity there is no vaccine In favor In favor Hit
#10 against stupidity!”]
#16 “Le aborrite case farmaceutiche farebbero molti piu profitti senza vaccini, cara signora. In favor Neutral Miss, due to
Sai quanti farmaci per le complicanze del morbillo?” – [“The abhorred pharmaceutical irony/sarcasm
companies would earn more without vaccines, dear lady. Do you know how many
medicines for the complications of measles?”]
#16 “Noi che siamo quasi uguali di età e abbiamo avuto varicella e morbillo, e siamo Not in favor Not in favor Hit
sopravvissuti… Perchè ora fanno i vaccini?” – [“We are almost the same age and we got
varicella and measles, and we survived … Why now they make vaccines?”]
#16 “Contagiato dai fratelli non vaccinati per scelta della famiglia. No, non levamogliela la In favor Not in favor Miss, due to
patria potestà…” – [“Infected by unvaccinated siblings by choice of the family. No, let’s irony/sarcasm
not take away their parental rights…”]
#16 “Morto di morbillo il bimbo che non potendosi vaccinare, contava sull’immunità di gregge… Lo avete ucciso voi antivax!!” – [“The child who could not be vaccinated and relied on herd immunity died of measles… You antivax killed him!!”] In favor In favor Hit
#16 “Un bambino leucemico non puo vaccinarsi e si becca il morbillo da qualcuno non In favor In favor Hit
vaccinato per scelta… Ecco il risultato… Chiaro?” – [“A leukemia child can not
vaccinate and contracts measles from someone not vaccinated by choice … Here’s the
result… Clear?”]
#16 “Bruno, ok, i vaccini servono. Ma che senso ha l’obbligo di vaccini, con quelle sanzioni, Neutral Not in favor Miss, due to
pure per malattie non infettive come il tetano?” – [“Bruno, ok, vaccines are needed. But ambiguity
what is the sense of mandatory vaccines, with those sanctions, even for non-infectious for
diseases like tetanus?”] discording
opinions
#16 “Vaccini, il decreto diventa piu morbido su sanzioni e patria potestà.” – [Vaccines, the Neutral Neutral Hit
decree becomes softer on sanctions and parental rights.”]

randomly chosen tweets. For each tweet, we show the actual class label, the class label assigned by the system, and the outcome of the classification. More precisely, the system correctly classifies 14 tweets, whereas it misclassifies 8 tweets. Regarding the misclassified tweets, we can observe that 6 of them contain ironic or sarcastic expressions, making the classification more difficult. In fact, whereas humans are easily able to detect irony or sarcasm in a text, in the field of sentiment analysis and opinion mining irony detection is a challenging task, given that the presence of irony may completely reverse the text polarity (Giachanou & Crestani, 2016). Further, one misclassified tweet is ambiguous, that is, there are discordant opinions within the same tweet.

In conclusion, the proposed intelligent system can be successfully employed for stance detection in the context of vaccination in Italy. Detecting users’ opinions about vaccines or shifts of the public opinion concurrently with social context-related events may be important for Public Healthcare Organizations in order to promote actions aimed at avoiding outbreaks of eradicated diseases.

Further, the system may be employed to early detect the spread of fake/incomplete news (e.g., in the case of an unexpectedly rising negative opinion about vaccination caused by the diffusion of fake news).

6. Conclusions

In this work, we have discussed how to perform stance classification on Twitter with reference to the vaccination topic in Italy. The proposed approach fetches and pre-processes vaccine-related tweets and employs an SVM model to classify tweets as belonging to one among three classes, namely in favor of vaccination, not in favor of vaccination, and neutral, with an accuracy of 64.84%. The results achieved are in line with or outperform those of similar works in the literature. The aim is to monitor and track shifts of the Italian public opinion about vaccinations, with reference to the social context-related events which may influence the public opinion itself. In fact, an early detection of an opinion shift may be of the utmost importance for Public Healthcare Organizations in order to promote actions aimed at avoiding outbreaks of eradicated diseases. We have also shown the results of a monitoring campaign lasting 10 months, from September 2016 to June 2017. In particular, we have analyzed how the polarity of the public opinion changes in correspondence with the local peaks of the daily number of tweets. These peaks correspond to specific events related to the vaccination topic. Finally, we have shown that our system does not suffer particularly from concept drift.

Author contributions section

E. D’Andrea, P. Ducange and F. Marcelloni conceived the initial idea. E. D’Andrea and P. Ducange developed the overall approach, performed the experiments, and wrote the relative paper section. A. Renda, with the supervision of A. Bechini, dealt with the experiments on neural networks and wrote the corresponding part of the paper. A. Bechini took care of the Introduction and the state of the art section of the paper. Francesco Marcelloni supervised all the work, wrote some parts of the paper, and revised the final version of the paper. Thus, all authors contributed to the final manuscript.

Acknowledgements

This work was partially supported by the project funded by “Progetti di Ricerca di Ateneo - PRA 2017” of the University of Pisa.

References

Agarwal, A., Xie, B., Vovsha, I., Rambow, O., & Passonneau, R. (2011). Sentiment analysis of twitter data. In Proceedings of the Workshop on Languages in Social Media (pp. 30–38).
Bello-Orgaz, G., Hernandez-Castro, J., & Camacho, D. (2017). Detecting discussion communities on vaccination in twitter. Future Generation Computer Systems, 66, 125–136.
Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2, 1–8.
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146.
Botsis, T., Nguyen, M. D., Woo, E. J., Markatou, M., & Ball, R. (2011). Text mining for the Vaccine Adverse Event Reporting System: Medical text classification using informative feature selection. Journal of the American Medical Informatics Association, 18(5), 631–638.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Cambria, E. (2016). Affective computing and sentiment analysis. IEEE Intelligent Systems, 31, 102–107.
Cambria, E., Havasi, C., & Hussain, A. (2012). SenticNet 2: A semantic and affective resource for opinion mining and sentiment analysis. In Proceedings of the 25th Florida Artificial Intelligence Research Society Conference (FLAIRS) (pp. 202–207).
Castellucci, G., Croce, D., Cao, D. D., & Basili, R. (2016). User mood tracking for opinion analysis on Twitter. In G. Adorni, S. Cagnoni, M. Gori, & M. Maratea (Eds.), AI*IA 2016: Advances in Artificial Intelligence. Lecture Notes in Computer Science, vol. 10037 (pp. 76–88). Cham: Springer.
Pandey, A. C., Singh, A., Rajpoot, D., & Saraswat, M. (2017). Twitter sentiment analysis using hybrid cuckoo search method. Information Processing & Management, 53, 764–779.
Chew, C., & Eysenbach, G. (2010). Pandemics in the age of twitter: Content analysis of tweets during the 2009 H1N1 outbreak. PLoS ONE, 5(11), 1–13.
Chien, C. C., & Tseng, Y.-D. (2011). Quality evaluation of product reviews using an information quality framework. Decision Support Systems, 50, 755–768.
Cliche, M. (2017). BB_twtr at SemEval-2017 Task 4: Twitter sentiment analysis with CNNs and LSTMs. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017).
Costa, J., Silva, C., Antunes, M., & Ribeiro, B. (2014). Concept drift awareness in twitter streams. In Proceedings of the IEEE 13th International Conference on Machine Learning and Applications (ICMLA) (pp. 294–299). doi:10.1109/ICMLA.2014.53.
D’Andrea, E., Ducange, P., Lazzerini, B., & Marcelloni, F. (2015). Real-time detection of traffic from Twitter stream analysis. IEEE Transactions on Intelligent Transportation Systems, 16(4), 2269–2283. doi:10.1109/TITS.2015.2404431.
Dave, K., Lawrence, S., & Pennock, D. M. (2003). Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of the 12th International Conference on World Wide Web (pp. 519–528).
Derrac, J., Garcia, S., Molina, D., & Herrera, F. (2011). A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm and Evolutionary Computation, 1(1), 3–18.
Dey, K., Shrivastava, R., & Kaushik, S. (2018). Topical stance detection for twitter: A two-phase LSTM model using attention. In Proceedings of the European Conference on Information Retrieval (ECIR 2018).
Du, J., Xu, J., Song, H.-Y., & Tao, C. (2017). Leveraging machine learning-based approaches to assess human papillomavirus vaccination sentiment trends with Twitter data. BMC Medical Informatics and Decision Making, 17(Suppl 2), 63–70. doi:10.1186/s12911-017-0469-6.
Ducange, P., Pecori, R., & Mezzina, P. (2017). A glimpse on big data analytics in the framework of marketing strategies. Soft Computing, 22(1), 325–342. doi:10.1007/s00500-017-2536-4.
Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 3, 1289–1305.
Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., & Bouchachia, A. (2014). A survey on concept drift adaptation. ACM Computing Surveys (CSUR), 46(4).
Gerber, M. S. (2014). Predicting crime using Twitter and kernel density estimation. Decision Support Systems, 61, 115–125.
Ghiassi, M., Skinner, J., & Zimbra, D. (2013). Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network. Expert Systems with Applications, 40, 6266–6282.
Aisopos, F., Papadakis, G., & Varvarigou, T. (2011). Sentiment analysis of social media Giachanou, A., & Crestani, F. (2016). Like it or not: a survey of twitter sentiment
content using N-Gram graphs. In Proc. of the 3rd ACM SIGMM Int. Workshop on analysis methods. ACM Computing Survey, 49(2), 1–28 2841.
Social Media (pp. 9–14). Gokulakrishnan, B., Priyanthan, P., Ragavan, T., Prasath, N., & Perera, A. (2012). Opin-
Alsaedi, N., Burnap, P., & Rana, O. (2017). Can we predict a riot? Disruptive event de- ion mining and sentiment analysis on a Twitter data stream. In Proceedings
tection using twitter. ACM Transactions on Internet Technology, 17(2), 1–18 1826. of International Conference on Advances in ICT for Emerging Regions (ICTer2012)
Baccianella, S., Esuli, A., & Sebastiani, F. (2010). SentiWordNet 3.0: an enhanced (pp. 182–188).
lexical resource for sentiment analysis and opinion mining. In Proceedings Gupta, V., Gurpreet, S., & Lehal, S. (2009). A survey of text mining techniques and
of International Conference on Language Resources and Evaluation (LREC) applications. Journal of Emerging Technologies in Web Intelligence., 1(1), 60–76.
(pp. 2200–2204). Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009).
Balahur, A., & Turchi, M. (2014). Comparative experiments using supervised learning The WEKA data mining software: an update. SIGKDD Explorations Newsletter, 11,
and machine translation for multilingual sentiment analysis. Computer Speech & 10–18.
Language, 28(1), 56–75. https://doi.org/10.1016/j.csl.2013.03.004. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computa-
Basile, V., & Nissim, M. (2013). Sentiment analysis on Italian tweets. In Proceedings tion, 9(8), 1735–1780.
of 4th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Ji, X., Chun, S. A., Wei, Z., & Geller, J. (2015). Twitter sentiment classification for
Media Analysis. measuring public health concerns. Social Network Analysis and Mining, 5(1), 25.
Bechini, A., Gazzè, D., Marchetti, A., & Tesconi, M. (2016). Towards a general ar- doi:10.1007/s13278-015-0253-5.
chitecture for social media data capture from a multi-domain perspective. In John, G. H., & Langley, P. (1995). Estimating continuous distributions in Bayesian
Proceedings of 2016 IEEE International Conference on Advanced Information Net- classifiers. In Proceedings of.11th Conference on Uncertainty in Artificial Intelligence
working and Applications (AINA) (pp. 1093–1100). doi:10.1109/AINA.2016.75. (pp. 338–345).
Becker, K., Moreira, V. P., & dos Santos, A. G. L. (2017). Multilingual emotion classifi- Keerthi, S. S., Shevade, S. K., Bhattacharyya, C., & Murthy, K. R. K. (2001). Improve-
cation using supervised learning: comparative experiments. Information Process- ments to Platt’s SMO algorithm for SVM classifier design. Neural Computation,
ing & Management, 53(3), 684–704. https://doi.org/10.1016/j.ipm.2016.12.008. 13(3), 637–649.
Landwehr, N., Hall, M., & Frank, E. (2005). Logistic model trees. Machine Learning, 59(1-2), 161–205.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Li, Y.-M., & Li, T.-Y. (2013). Deriving market intelligence from microblogs. Decision Support Systems, 55, 206–217.
Liu, B. (2010). Sentiment analysis and subjectivity. In N. Indurkhya, & F. J. Damerau (Eds.), Handbook of natural language processing (Second Edition). Boca Raton, FL: Taylor and Francis Group.
Liu, B. (2015). Sentiment analysis: Mining opinions, sentiments, and emotions. New York, NY, USA: Cambridge University Press. ISBN 9781107017894.
McCallum, A., & Nigam, K. (1998). A comparison of event models for Naive Bayes text classification. In AAAI Workshop on Learning for Text Categorization.
Medhat, W., Hassan, A., & Korashy, H. (2014). Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 5(4), 1093–1113.
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38, 39–41.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (pp. 3111–3119).
Mostafa, M. M. (2013). More than words: Social networks' text mining for consumer brand sentiments. Expert Systems with Applications, 40(10), 4241–4251.
Mohammad, S., Kiritchenko, S., Sobhani, P., Zhu, X., & Cherry, C. (2016). SemEval-2016 Task 6: Detecting stance in tweets. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) (pp. 31–41).
Mohammad, S. M., Sobhani, P., & Kiritchenko, S. (2017). Stance and sentiment in tweets. ACM Transactions on Internet Technology (TOIT), 17(3).
Nguyen, D. T., & Jung, J. E. (2017). Real-time event detection for online behavioral analysis of big social data. Future Generation Computer Systems, 66, 137–145.
Ortega, R., Fonseca, A., & Montoyo, A. (2013). SSA-UO: Unsupervised Twitter sentiment analysis. In Proceedings of the Second Joint Conference on Lexical and Computational Semantics.
Ortigosa, A., Martín, J. M., & Carro, R. M. (2014). Sentiment analysis in Facebook and its application to e-learning. Computers in Human Behavior, 31, 527–541.
Park, J., Barash, V., Fink, C., & Cha, M. (2013). Emoticon style: Interpreting differences in emoticons across cultures. In Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media.
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Patil, L. H., & Atique, M. (2013). A novel feature selection based on information gain using WordNet. In 2013 Science and Information Conference (pp. 625–629).
Platt, J. (1999). Fast training of support vector machines using sequential minimal optimization. In B. Schoelkopf, C. J. C. Burges, & A. J. Smola (Eds.), Advances in kernel methods: Support vector learning (pp. 185–208). Cambridge, MA: MIT Press.
Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130–137.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.
Ribeiro, F. N., Araújo, M., Gonçalves, P., Gonçalves, M. A., & Benevenuto, F. (2016). SentiBench – a benchmark comparison of state-of-the-practice sentiment analysis methods. EPJ Data Science, 5, 23.
Rosenthal, S., Farra, N., & Nakov, P. (2017). SemEval-2017 Task 4: Sentiment analysis in Twitter. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017).
Rushdi Saleh, M., Martín-Valdivia, M. T., Montejo-Ráez, A., & Ureña-López, L. A. (2011). Experiments with SVM to classify opinions in different domains. Expert Systems with Applications, 38(12), 14799–14804.
Saif, H., He, Y., & Alani, H. (2012). Alleviating data sparsity for Twitter sentiment analysis. In 21st International Conference on the World Wide Web (pp. 2–9).
Sakaki, T., Okazaki, M., & Matsuo, Y. (2013). Tweet analysis for real-time event detection and earthquake reporting system development. IEEE Transactions on Knowledge and Data Engineering, 25(4), 919–931.
Salathé, M., Vu, D. Q., Khandelwal, S., & Hunter, D. R. (2013). The dynamics of health behavior sentiments on a large online social network. EPJ Data Science, 2(1), 1–12. doi:10.1140/epjds16.
Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24, 513–523.
Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T., & Qin, B. (2014). Learning sentiment-specific word embedding for Twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1555–1565).
Tellez, E. S., Miranda-Jiménez, S., Graff, M., Moctezuma, D., Siordia, O. S., & Villaseñor, E. A. (2017a). A case study of Spanish text transformations for Twitter sentiment analysis. Expert Systems with Applications, 81, 457–471.
Tellez, E. S., Miranda-Jiménez, S., Graff, M., Moctezuma, D., Suárez, R. R., & Siordia, O. S. (2017b). A simple approach to multilingual polarity classification in Twitter. Pattern Recognition Letters, 94, 68–74.
Uysal, A. K., & Murphey, Y. L. (2017). Sentiment classification: Feature selection based approaches versus deep learning. In Proceedings of the 2017 IEEE International Conference on Computer and Information Technology (CIT).
Valdivia, A., Luzón, M. V., & Herrera, F. (2017). Neutrality in the sentiment analysis problem based on fuzzy majority. In Proceedings of the IEEE International Conference on Fuzzy Systems, Naples (pp. 1–6).
Wang, H., Can, D., Kazemzadeh, A., Bar, F., & Narayanan, S. (2012). A system for real-time Twitter sentiment analysis of 2012 U.S. presidential election cycle. In Proceedings of the ACL System Demonstrations (pp. 115–120).
Wang, P., Xu, B., Xu, J., Tian, G., Liu, C. L., & Hao, H. (2016). Semantic expansion using word embedding clustering and convolutional neural network for improving short text classification. Neurocomputing, 174, 806–814.
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80–83.
Xiong, S., Lv, H., Zhao, W., & Ji, D. (2018). Towards Twitter sentiment classification by multi-level sentiment-enriched word embeddings. Neurocomputing, 275, 2459–2466.
Yang, X., Macdonald, C., & Ounis, I. (2017). Using word embeddings in Twitter election classification. Information Retrieval Journal, 21(2-3), 1–25.
Zhang, L., Wang, S., & Liu, B. (2018). Deep learning for sentiment analysis: A survey. WIREs Data Mining and Knowledge Discovery, 8(4), e1253.
Zhou, X., Tao, X., Yong, J., & Yang, Z. (2013). Sentiment analysis on tweets for social events. In Proceedings of the 2013 IEEE 17th International Conference on Computer Supported Cooperative Work in Design (CSCWD) (pp. 557–562).