TERMS: Textual Emotion Recognition in Multidimensional Space
https://doi.org/10.1007/s10489-022-03567-4
Abstract
Microblogs generate a vast amount of data in which users express their emotions regarding almost all aspects of everyday life. Capturing affective content from these context-dependent and subjective texts is a challenging task. We propose an intelligent probabilistic model for textual emotion recognition in multidimensional space (TERMS) that captures the subjective emotional boundaries and contextual information embedded in a text for robust emotion recognition. Because discrete label assignment cannot capture such subjective perceptions, the model employs a soft assignment: it maps varying emotional perceptions into a multidimensional space and generates them as distributions via a Gaussian mixture model (GMM). To strengthen these emotion distributions, TERMS integrates a probabilistic emotion classifier that captures the contextual and linguistic information in texts. Integrating these two components, the context-aware emotion classifier and the learned GMM parameters, provides complete coverage for accurate emotion recognition. Large-scale experiments show that, compared with baseline and state-of-the-art models, TERMS achieves better distinguishability, prediction, and classification performance. In addition, TERMS provides insights into emotion classes, annotation patterns, and the model's application in different scenarios.
Keywords Emotion recognition · Text classification · Valence-Arousal · Gaussian mixture model · Emotion distribution · Subjectivity
as the definition of emotions can differ for each individual based on their background and culture. Dimensional models, on the other hand, are flexible in personalizing emotions in terms of valence, arousal, and other dimensions. Dimensional models project each emotion as coordinates, expressed as numerical values, in a space of continuous valence and arousal dimensions. Valence (x-axis) represents the pleasantness of a stimulus, and arousal (y-axis) shows the intensity of an emotion provoked by a stimulus [16, 17]. Any affective experience can be expressed as a combination of these two independent dimensions, which is then interpreted as representing a particular emotion [18]. This method enables a personalized and quantified analysis because emotions are projected into a multidimensional numeric space, which is effective for analyzing the fuzzy boundaries between different emotion features.

The texts on microblogging sites are usually written in a casual style in which the short length and inconsistent language make it difficult to fully recognize and predict the affective information [3]. We anticipated that dealing with informal and ambiguous texts would be crucial in designing a model that accurately identifies emotions in microblogs. Designing such a model, however, is challenging for the following reasons [19]. First, user-generated text in a microblog may contain linguistic variations and contextual information. For instance, in the text “Thanxxx mom for cooking the same meal every day,” the word “thanxxx” is a linguistic variation of “thanks,” which requires an understanding of the semantic similarity between the two terms. In addition, the term “thanks” is usually associated with joy or a positive sentiment, but in this instance it expresses annoyance. Therefore, to accurately classify the user’s intended emotions, it is crucial to consider contextual information. Second, user-generated text in a microblog can be highly opinionated and subjective, so that different users may perceive different emotions from the same text [20, 21]. For example, the text “the virus is spreading” can communicate both fear and sadness, depending partly on the reader’s state of mind. Therefore, capturing the varying emotional perceptions and fuzzy emotional boundaries is essential for personalized and complete coverage of the possible emotional content embedded in a text.

To address this problem, we propose a probabilistic model for textual emotion recognition in multidimensional space (TERMS), which takes the contextual information and subjective nature of microblog text into account for emotion recognition. Contextual information consists of the additional details needed to interpret a text, such as its topic, structure, patterns, and sentiment orientation. In view of this, TERMS introduces a probabilistic context-aware emotion classifier that takes the syntactic structure and semantic meaning of a text into account to expose the relevant contextual and linguistic information. The syntactic structures, which capture the patterns of the text, are extracted automatically via a graph-based algorithm and further enriched with embeddings to gather semantic content. Second, to cater to the subjectivity of emotions, we note that emotional perceptions are inherently subjective and cannot be covered by a single point or a discrete emotion label. Therefore, we consider varying perceptions and generate them as distributions. A distribution exhibits multiple perspectives and better reflects the nuances of the emotional content embedded in a text. TERMS maps the multiple emotional perspectives of every text as distributions (numerical values) into a multidimensional space, which better personalizes emotion variations. TERMS models the subjective emotional content of the text as a probabilistic emotion distribution through a Gaussian mixture model (GMM) and learns its parameters for a soft assignment. To effectively recognize emotions, TERMS integrates the proposed context-aware emotion classifier and the GMM-modeled emotion distribution to describe emotions through the low-level textual feature space and the high-level emotion space, respectively. Moreover, owing to its probabilistic and generative nature, TERMS is conveniently scalable and assigns soft labels in a multidimensional space.

To our knowledge, a model of this kind, which caters to subjectivity by parameterizing emotion distributions in an emotional space, has only been applied to music excerpts [22, 23] and speech [24]. Modeling texts has been challenging due to their single-modal nature: unlike the rich representations of music and speech, texts do not provide additional cues such as tone, expression, and prosody for understanding the full emotional content. The challenges are further escalated by microblogs’ self-focused topics and short, informal writing format. For microblog texts, the integrated TERMS approach is a novel attempt to model varying perspectives as distributions in an emotion space. We cater to these challenges through the context-aware classifier and personalized emotion distributions in TERMS. The main contributions of the article are summarized as follows:

– We propose TERMS, a probabilistic model for textual emotion recognition in multidimensional space, which takes the contextual information and subjective nature of a microblog’s text into account.
– We propose the soft modeling of affective content in a multidimensional space by parameterizing the emotion distribution through a GMM, which provides insight into dealing with subjectivity and indistinct emotional boundaries.
– The integrated TERMS approach enhances emotion recognition by estimating emotional weightage combined with multiple emotional perceptions for each text, thus taking full advantage of both deterministic and dimensional models.
– We annotated our collected data with multiple annotators to conduct large-scale simulations evaluating the performance of TERMS. Our simulation results show that, compared with baseline and state-of-the-art models, TERMS achieves better distinguishability, prediction, and classification performance.

The rest of the article is organized as follows. Section 2 summarizes the related work. Section 3 presents the overview, preliminaries, and the proposed probabilistic TERMS model. Section 4 describes the evaluation, comparative models, performance metrics, setup, and overall results. Section 5 discusses the predicted results and the impact of the number of annotators on the model’s prediction performance, and Section 6 concludes the paper.

2 Related work

Affective computing is an established research field that is burgeoning due to its relevance in many application domains that require emotion recognition from different forms of user-generated data, such as texts, music, speech, and images [11]. Two interrelated driving factors in this flourishing field are social networks and microblogs. Microblogs provide an effective platform for emotion recognition because they offer a wide variety of self-focused topics published in real time [2]. The texts are explicit and succinct, with relatively clear projections of users’ emotions. Their focused nature and higher density of affective terms make these platforms more useful for emotion recognition than topic-based platforms, such as product and movie reviews [3]. Work on microblog text emotion recognition can be broadly divided into two categories: deterministic and dimensional models. We survey these two categories in this section.

2.1 Deterministic models

There is a substantial, well-vetted body of research on microblog emotion recognition that focuses on classifying texts into a set of discrete emotion classes [25, 26]. Deterministic models utilize supervised, unsupervised, or semi-supervised methods by employing statistical models, such as machine learning and deep learning.

Using machine-learning models, Meo and Sulis [12] considered structural and lexical features from text to automatically identify affective content and compared the results with latent factors and traditional classifiers. Suttles and Ide [27] recognized emotions from texts based on Plutchik’s eight emotional classes by applying distant supervision. Perikos and Hatzilygeroudis [28] used an ensemble classifier scheme combining knowledge-based and statistical machine-learning classification methods for the automatic identification of emotions in text. Symeonidis et al. [29] applied soft computing techniques, namely NB, support vector machines (SVM), logistic regression, and convolutional neural networks (CNN), to analyze emotions.

Recent significant additions to the emotion classification domain are two of the largest and most dynamic emotion corpora, GoEmotions and Vent [30, 31]. GoEmotions is a manually annotated dataset of 58k English Reddit comments, labelled by readers for 27 emotion categories [30]. Likewise, the Vent dataset contains more than 33M comments from social media sites, explicitly tagged by their writers with 705 emotions [31]. These datasets are widely used in recent academic work [32, 33].

Regarding deep learning, the performance of textual emotion recognition tasks has been enhanced by the statistically rich and granular frameworks of deep learning models [34]. Abdul-Mageed and Ungar [13] proposed a model named Emonet to predict emotions in eight emotional classes based on gated recurrent neural networks (GRNN). Another renowned emotion prediction model is DeepMoji, presented by Felbo et al. [35], which was trained on billions of emoji-labeled tweets for affective modeling and recognition. Rosenthal et al. [36] identified the sentiment of tweets for the SemEval-2017 Task 4: Sentiment Analysis in Twitter challenge. The same series provided SemEval-2018 Task 1: Affect in Tweets, a challenge that included a multi-label emotion classification subtask in which teams used state-of-the-art methodologies to predict emotions from the affective content of microblogs [37]. Zhang et al. [38] implemented a multilayer CNN with an attention mechanism that modelled context representations to perform target-dependent sentiment classification. Sadr et al. [39] proposed a multi-view deep network that takes into account intermediate features extracted from convolutional and recursive neural networks to enhance classification performance. Deep-learning models are effective; however, they require complex computations and extensive training data for better performance, whereas our proposed model is relatively simple and performs well even on limited data.

2.2 Dimensional models

Another significant way to represent affective states is dimensional models, which provide a continuous, fine-grained alternative for conducting affective text analysis [11]. These models contribute to understanding the
2676 Y. Ghafoor et al.
conveyance of emotions through language and how the emotional dimensions influence people’s behaviour [40]. Russell [41] proposed a dimensional representation model of affect, named the circumplex model, that distinguishes three components: valence, arousal, and dominance (VAD). Studies have modeled affective states on a valence-arousal map by adopting various machine-learning approaches [16, 42] and lexicon-based methods [43, 44]. Hasan et al. [45] presented a model for real-time emotion tracking by building on [46] and developing the EmotexStream framework. Preotiuc-Pietro et al. [18] predicted valence and arousal in Facebook posts using linear regression and released an expert-annotated dataset. Mohammad and Bravo-Marquez [47] provided the first emotion intensity dataset (EmoInt) using a best-worst scaling technique. Buechel and Hahn [17, 48] published a benchmark dataset called Emobank (10,548 sentences), in which each sentence was manually annotated on the VAD dimensions. Recent studies proposed frameworks that learn from Emobank and categorical emotion annotation corpora to predict continuous VAD scores [49, 50]. Cheng et al. [51] proposed a bidirectional long short-term memory (BiLSTM) model that identifies and forecasts sentiment information in terms of VA-values, and integrated it into a deep learning model to optimise government social management. Another recent experimental study tested the role of five emotional dimensions (valence, arousal, dominance, approach-avoidance, and uncertainty) in the intervention effect of the Learning Mindset study [52]. The SemEval-2018 Task 1: Affect in Tweets challenge asked for the prediction of intensities (arousal) and valence from a stream of texts in terms of regression and ordinal classification [37, 53]. The winning team [54] proposed a unified architecture for both subtasks using an ensemble of multiple prediction models and heterogeneous feature extraction methods. Dimensional models provide useful measures of emotions; however, they are unable to capture varying perceptions of emotions, which are subjective and may differ for the affective content of the same text.

To address the subjective nature of emotion perceptions, extensions of VA-based models were proposed in which emotions are represented as probability distributions rather than as points in the VA emotional space [55]. In view of this, recent studies used Gaussian parameter-based approaches to estimate emotion distributions in the VA-space that take covariance information into account along with the mean [22, 56]. This approach estimates an emotion distribution as a Gaussian with integrated methods. Zhao et al. [57] presented work that predicts an image’s continuous probability distribution by using a GMM in a VA-space. Another work by Sun et al. [58] aimed to unify discrete and dimensional emotion models by introducing a typical fuzzy emotion subspace for affective video content analysis. Gaussian distributions in dimensional models have also been widely applied in music-listening behavior analysis [22, 23, 59]. In the work conducted by Wang et al. [22], an acoustic GMM was employed to classify music using valence and arousal, which increased the accuracy of acoustic classification. Such approaches have also been widely adopted for speech emotion recognition [24, 60, 61]. However, to our knowledge, the Gaussian parameter-based approach has not been applied to microblog texts, which motivated us to adapt this approach for textual emotion recognition. Transferring it across media has been challenging because of the single-modal nature of texts, which contain little information for apprehending underlying emotions and intensities compared with speech and music, which are enriched with emotional cues such as tone, expression, accent, and prosody. This single mode of information can affect both the classification task and the annotations. We address this issue by proposing a context-aware emotion classifier with a GMM in the VA-space, which captures the nuances of embedded emotions and varying perceptions in a text.

3 The probabilistic TERMS model

For the purposes of this discussion, a text in a microblog refers to a single statement posted by a locutor, that is, the person writing the text. The text is an expression that reflects the emotional state of the locutor; it can be a thought, mood, or opinion of the locutor based on his or her prevailing emotional state. The emotions relevant to this study are those felt by the locutor and embedded in the writing of the posted text. The proposed model aims to recognize these embedded emotions from the texts.

The texts posted on microblogs are rich in emotions and seemingly succinct and straightforward. One might therefore assume that these explicit texts can be conveniently assigned to the eight emotion classes defined by Plutchik [15]: anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. However, a given tweet contains complex granular details and is embedded with (i) contextual information and (ii) multiple perspectives; thus, it is not easy to classify a microblog text with a straightforward emotion allocation approach. This study proposes solutions to these problems, starting with the preliminaries in Section 3.1. TERMS is designed to address these problems through three major modules. The first module, textual emotion classification (EmoClass), solves the first issue with the help of a context-aware classifier that estimates the emotion probabilities for each class based on syntactic templates and word embeddings (elaborated in Section 3.2). To handle the second issue,
Fig. 1 An illustration of the TERMS probabilistic process. EmoClass is a textual emotion classification module that outputs emotion probabilities for each text over the specified affective classes. EmoGMM is an emotion GMM modeling module that takes in the probabilities and combines them with VA-ratings to parameterize emotion distributions in a VA-space. The prediction module employs a single affective Gaussian on the weighted GMMs to predict an emotion distribution for each unseen text.
part of the TERMS model is referred to as textual emotion classification, or EmoClass. In the following, we explain the proposed emotion classifier.

Emotion Classifier: To estimate the emotion probabilities $p(z = k \mid x^{(i)})$, we generalize an emotion classifier from our previous works, Saravia et al. [64, 65]. We employ this classifier because it provides in-depth contextual information through syntactic templates. For a given text, the classifier assigns probabilities to each associated emotion class z according to its affinity with the context-aware emotion patterns extracted from the text. Specifically, it is a graph-based algorithm that constructs syntactic templates from the corpus to extract context-aware emotion patterns. We refer to these features as context-aware because they take the syntactic structure and semantic meaning of a text into account to construct pattern-based emotion features. The syntactic structures offered by graph construction help to automatically expose the relevant linguistic information (i.e., contextual and latent information) from a large-scale emotion corpus, whereas word embeddings are applied to the extracted patterns to capture and preserve the semantic relationships between them. This is followed by emotion probability computation, in which each pattern is assigned a weight that identifies its relevance to an emotion category. In the context of emotion classification, patterns and their weights play the role of features.

The graph-based emotion feature extraction algorithm is summarized in the following steps:

a) Graph construction. Given an emotion corpus, we construct a graph $G(V, A)$, where the vertices $V$ are a set of nodes that represent the tokens extracted from the corpus, and the edges $A$ represent the relationships between words extracted using a window approach [65]. This helps to retain the syntactic structure of the data. For an arc $a_i \in A$, its normalized weight is computed as

$$ w(a_i) = \frac{\mathrm{freq}(a_i)}{\max_{a_j \in A} \mathrm{freq}(a_j)}, \tag{4} $$

where $\mathrm{freq}(a_i)$ is the frequency of arc $a_i$.

Token categorization. To extract the emotion patterns, we divide the syntactic structures into two families of words: connector words (cw) and subject words (sw). This provides the foundation for extracting context-aware emotion patterns, as the structures are sequences of these words. The sw correspond to words that are high in subjective content, while the cw are the most frequent words in a text that have high connectivity to influential nodes. To find the cw, we use eigenvector centrality, and to estimate the sw, we compute the clustering coefficient, as elaborated in [65].

Pattern extraction. The syntactic templates constructed from the cw and sw are applied to the dataset, resulting in the patterns. The subject words in the extracted patterns are replaced with an asterisk (“*”), a proxy that caters to linguistic nuances and unknown words not present in the training corpus. Furthermore, this enhances the applicability of the model to other domains.

b) Enriched patterns. The extracted patterns are enriched with word embeddings to make them pertinent for emotion classification and to capture the perspectives and semantic relationships between patterns. We employ agglomerative clustering to link the patterns
to relevant clusters based on the sw component. The details of this procedure can be found in [65]. As a result, the enriched patterns contain both the semantic information provided by the word embeddings and the contextual information gained through the graph components, hence providing context-aware emotion patterns.

c) Emotion probability. The enriched emotion patterns are then weighted with respect to each emotion category, which shows how relevant a pattern is to the respective category. This outputs a score for each emotion for a given text, which we refer to as the emotion probability. It is computed as follows:

$$ p(z = k \mid x^{(i)}) \leftarrow \frac{\exp(-t s_k)}{\sum_{j=1}^{K} \exp(-t s_j)}, \tag{5} $$

where $s_k$ is the score of emotion k computed with a customized version of term frequency-inverse document frequency (tf-idf) proposed in [65], K is the number of emotions, and t is an adjusting coefficient that scales the scores, $0 < t \le 1$.

3.3 Emotion GMM (EmoGMM)

The subjectivity in emotion perceptions is inherent and can be summarised as emotion distributions. The emotion distribution in the VA-space is described as a bivariate Gaussian distribution with parameters $\{\mu_k, \Sigma_k\}$ associated with emotion k:

$$ \mathbf{y} \sim \mathcal{N}(\mu_k, \Sigma_k). \tag{6} $$

Since the distribution of $\mathbf{y}$ given an emotion class $z = k$ is Gaussian, following [22] for the rest of the analysis, we have

$$ p(\mathbf{y} \mid z = k) = \mathcal{N}(\mathbf{y} \mid \mu_k, \Sigma_k), \tag{7} $$

where the parameters $\mu_k$ and $\Sigma_k$ are also associated with the k-th emotion class. This transformation of $z \to \mathbf{y}$ in the VA-space is the second module in TERMS, referred to as EmoGMM. It maps the associated emotion classes z into the VA-space by parameterizing the emotion distributions. The probability density function of $\mathbf{y}$ is then given by

$$ p(\mathbf{y}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{y} \mid \mu_k, \Sigma_k), \tag{8} $$

where $\pi_k$ is a mixing coefficient, which we reparameterize as

$$ \pi_k = p(z = k \mid x^{(i)}). \tag{9} $$

That is, $\pi_k$ is set to the emotion probability computed by EmoClass in (5). It is used as the weighted mixing coefficient for modeling EmoGMM, and we interpret it as the probability of emotion k for a given text.

For any given text, the emotion distribution is denoted as $p(\mathbf{y} \mid x^{(i)})$. An emotion distribution is a weighted combination of $\{\mathcal{N}(\mu_k, \Sigma_k)\}_{k=1}^{K}$ that uses $p(z = k \mid x^{(i)})$ as the weights. Accordingly, by combining (5), (8), and (9), the emotion distribution of $\mathbf{y}$ given text $x^{(i)}$ is

$$ p(\mathbf{y} \mid x^{(i)}) = \sum_{k=1}^{K} \mathcal{N}(\mathbf{y} \mid \mu_k, \Sigma_k)\, p(z = k \mid x^{(i)}), \tag{10} $$

where $\{p(z = k \mid x^{(i)})\}_{k=1}^{K}$ are the weights of the K emotions for a given text $x^{(i)}$, i.e., the emotion probabilities computed by the proposed emotion classifier. The computed $z = k$ connects EmoClass to an emotional space by parameterizing the emotion probabilities with a GMM. The process of training a GMM with emotion probabilities as input is referred to as EmoGMM (see Fig. 1). This learning process requires annotated VA-ratings of texts for the GMM estimation, where each text is labeled by multiple annotators. With those VA-ratings and emotion probabilities, $\{\mu_k, \Sigma_k\}_{k=1}^{K}$ can be estimated with the expectation-maximization (EM) algorithm [66]. The EM algorithm has been widely adopted to parameterize emotion distributions for music and speech, but it has rarely been employed to map emotional perceptions in the VA-space for text.

The EM algorithm solves the latent-variable parameter estimation problem numerically. In the E-step, it computes the expected values of the latent emotion assignments given the observed ratings and the current parameter estimates; in the M-step, it maximizes the log-likelihood function using the expectations computed in the E-step. A clear form of the likelihood function is therefore provided for applying the EM algorithm.

We denote by $\mathbf{y}_j^{(i)}$ the VA-rating of the i-th text given by the j-th annotator. $Y^{(i)} = \{\mathbf{y}_1^{(i)}, \ldots, \mathbf{y}_{N_{A_i}}^{(i)}\}$ is the set of VA-values rated by the annotators, where $N_{A_i}$ is the number of annotators for text i. Such VA-values are provided by the annotators for all N texts. Let $L = \{x^{(i)}, Y^{(i)}\}_{i=1}^{N}$ denote the entire annotated dataset.

We first derive the general form of the posterior probability of $z = k$ given $\mathbf{y}$:

$$
\begin{aligned}
p(z = k \mid \mathbf{y}) &= \frac{p(\mathbf{y}, z = k)}{p(\mathbf{y})} = \frac{p(z = k)\, p(\mathbf{y} \mid z = k)}{\sum_{m=1}^{K} p(z = m)\, p(\mathbf{y} \mid z = m)} \\
&= \frac{\pi_k \, \mathcal{N}(\mathbf{y} \mid \mu_k, \Sigma_k)}{\sum_{m=1}^{K} \pi_m \, \mathcal{N}(\mathbf{y} \mid \mu_m, \Sigma_m)}.
\end{aligned}
\tag{11}
$$
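To make Eqs. (4), (5), (10), and (11) concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that arcs are directed token pairs within a fixed window (the exact windowing rule of [65] is not given here), takes the customized tf-idf scores $s_k$ as given inputs, and uses hypothetical function names throughout.

```python
from collections import Counter

import numpy as np


def arc_weights(tokens, window=2):
    """Eq. (4): w(a_i) = freq(a_i) / max_j freq(a_j).

    Hypothetical arc definition: a directed arc (u, v) for every token v that
    appears within `window` positions after token u.
    """
    freq = Counter()
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            freq[(u, v)] += 1
    top = max(freq.values(), default=0)
    return {arc: f / top for arc, f in freq.items()} if top else {}


def emotion_probabilities(scores, t=1.0):
    """Eq. (5) as printed: exp(-t * s_k) / sum_j exp(-t * s_j), with 0 < t <= 1."""
    if not 0 < t <= 1:
        raise ValueError("t must satisfy 0 < t <= 1")
    logits = -t * np.asarray(scores, dtype=float)
    logits -= logits.max()  # shift for numerical stability; result is unchanged
    e = np.exp(logits)
    return e / e.sum()


def gaussian_pdf(y, mu, cov):
    """N(y | mu, cov) evaluated at every row of y."""
    diff = np.atleast_2d(y) - mu
    quad = np.einsum("nd,de,ne->n", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** len(mu) * np.linalg.det(cov))


def emotion_distribution(y, probs, mus, covs):
    """Eq. (10): p(y | x) = sum_k N(y | mu_k, Sigma_k) p(z=k | x)."""
    return sum(p * gaussian_pdf(y, m, c) for p, m, c in zip(probs, mus, covs))


def posterior(y, probs, mus, covs):
    """Eq. (11): p(z=k | y) for every row of y; returns a (K, n) array."""
    joint = np.array([p * gaussian_pdf(y, m, c) for p, m, c in zip(probs, mus, covs)])
    return joint / joint.sum(axis=0)
```

Shifting the exponents by their maximum leaves Eq. (5) unchanged but avoids overflow. As printed, Eq. (5) gives the largest probability to the smallest score $s_k$, and the sketch follows that sign.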
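$$ p(z = k \mid \mathbf{y}_j^{(i)}) = \frac{p(z = k \mid x^{(i)})\, \mathcal{N}(\mathbf{y}_j^{(i)} \mid \mu_k, \Sigma_k)}{\sum_{m=1}^{K} p(z = m \mid x^{(i)})\, \mathcal{N}(\mathbf{y}_j^{(i)} \mid \mu_m, \Sigma_m)}. \tag{12} $$

In the M-step, the update forms for the mean vector and covariance matrix are as follows:

$$ \mu_k^{\text{new}} \leftarrow \frac{\sum_{i,j} p(z = k \mid \mathbf{y}_j^{(i)})\, \mathbf{y}_j^{(i)}}{\sum_{i,j} p(z = k \mid \mathbf{y}_j^{(i)})}, \tag{13} $$

$$ = \log \prod_{i,j} \sum_{k=1}^{K} \mathcal{N}(\mathbf{y}_j^{(i)} \mid \mu_k, \Sigma_k)\, p(z = k \mid x^{(i)}). \tag{15} $$

inputs along with the number of iterations and stopping criteria, and outputs the mean and covariance $\{\mu_k^{\text{new}}, \Sigma_k^{\text{new}}\}_{k=1}^{K}$.

$$ p(\mathbf{y} \mid x_{\text{unseen}}) = \sum_{k=1}^{K} p(z = k \mid x_{\text{unseen}})\, \mathcal{N}(\mathbf{y} \mid \mu_k, \Sigma_k). \tag{16} $$

$$ \mu_{\text{pre}} = \sum_{k=1}^{K} p(z = k \mid x_{\text{unseen}})\, \mu_k, \tag{17} $$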
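The estimation and prediction steps can likewise be sketched with NumPy. In this hypothetical `fit_emogmm`, the mixing weights stay fixed to each text's EmoClass probabilities (Eq. (9)), the E-step follows Eq. (12), the mean update follows Eq. (13), and the log-likelihood of Eq. (15) serves as the stopping criterion together with an iteration cap. The covariance update and the moment-matched covariance returned by the hypothetical `predict` are standard GMM choices assumed here, because the corresponding equations are not shown in this section.

```python
import numpy as np


def gaussian_pdf(y, mu, cov):
    """N(y | mu, cov) evaluated at every row of y."""
    diff = y - mu
    quad = np.einsum("nd,de,ne->n", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** len(mu) * np.linalg.det(cov))


def fit_emogmm(probs, ratings, text_idx, mu, cov, n_iter=100, tol=1e-6):
    """EM for {mu_k, Sigma_k}; mixing weights stay fixed to p(z=k|x^(i)), Eq. (9).

    probs:    (N, K) EmoClass probabilities, one row per text
    ratings:  (M, 2) VA-ratings y_j^(i), one row per (text, annotator) pair
    text_idx: (M,) index i of the text that each rating belongs to
    mu, cov:  initial means (K, 2) and covariances (K, 2, 2)
    """
    mu, cov = np.array(mu, dtype=float), np.array(cov, dtype=float)  # copies
    weights = probs[text_idx]  # (M, K): p(z=k|x^(i)) for each rating
    K, dim = mu.shape
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step, Eq. (12): responsibilities p(z=k | y_j^(i))
        dens = np.column_stack([gaussian_pdf(ratings, mu[k], cov[k]) for k in range(K)])
        joint = weights * dens
        total = joint.sum(axis=1, keepdims=True)
        resp = joint / total
        ll = np.log(total).sum()  # log-likelihood, Eq. (15), at current parameters
        # M-step: responsibility-weighted means, Eq. (13)
        nk = resp.sum(axis=0)
        mu = resp.T @ ratings / nk[:, None]
        # Weighted covariances (standard EM update, assumed) plus a tiny ridge
        for k in range(K):
            diff = ratings - mu[k]
            cov[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + 1e-6 * np.eye(dim)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return mu, cov


def predict(probs_unseen, mu, cov):
    """Mean of Eq. (16) as in Eq. (17); covariance via moment matching (assumed)."""
    mu_pre = probs_unseen @ mu
    diff = mu - mu_pre
    cov_pre = (np.einsum("k,kde->de", probs_unseen, cov)
               + np.einsum("k,kd,ke->de", probs_unseen, diff, diff))
    return mu_pre, cov_pre
```

For example, with synthetic ratings clustered around (2, 2) and (8, 8), texts weighted 0.9/0.1 toward the matching emotion, and initial means (4, 4) and (6, 6), the fitted means move close to the two clusters; `predict` with equal emotion probabilities then returns their midpoint as $\mu_{\text{pre}}$.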
4 Performance evaluation emotion prediction model [35]. The second was that of
the winning team of the emotion classification subtask in
In this section, we report on the performance evaluation of SemEval-2018 Task 1: Affect in Tweets challenge [37, 69].
TERMS that was conducted with large-scale simulations. The third study is a semi-supervised approach for valence
and arousal prediction based on variational autoencoder
4.1 Data collection model [70] and the fourth is a context-aware model for
emotion classification and sentiment score prediction [71].
For the experimental analysis, we collected data from
Twitter, where texts have rich affective content. To collect DeepMoji It is an established model and has been used as
relevant data, we retrieved sentiment-related hashtags a foundation in many recent studies. It has been trained on
placed at the end of the text, which conveyed the emotion billions of tweets and uses the GRNN algorithm for emotion
in the text is felt by the locutor as stated in [13]. Based prediction. We used the model1 available on the GitHub
on this method, after some refinement, we gathered 4000 platform and finetuned it with our dataset.
texts from Twitter with labels that were the same as the
eight emotions in the wheel of emotion model presented
NTUA-SLP NBOW and NTUA-SLP LSTM The second com-
by Plutchik [15]. The eight emotion candidates were anger,
parative study is related to the SemEval-2018 Task-1 chal-
anticipation, disgust, fear, joy, sadness, surprise, and trust.
lenge, which proposed five subtasks related to intensity
The number of affective text selections was designed to
(arousal) and valence detection and multi-label emotion
maintain balance among all the classes of sentiments. The
classification. The first four subtasks required the identi-
statistics of the emotion distributions are shown in Table 2.
fication of arousal and valence scores in tweets in terms
Each of the selected texts was rated with VA-values by
of regression values (Subtasks 1 and 3) and ordinal clas-
five different annotators who passed a sample qualification
sification (Subtasks 2 and 4), and the fifth subtask was
test on Amazon Mechanical Turk (AMT), which is
emotion classification, the assignment of multiple labels
considered a reliable service to obtain high-quality data
to the tweets based on the best fit. We compared our
inexpensively and rapidly. The ratings by five different
TERMS model with the results of the fifth subtask and
annotators for each text makes the collection of 20000
arousal and valence regression subtasks (Subtasks 1 and
rating for the given 4000 texts. We adopted [67] to design an affective slider (AS) in the form of two slider bars to rate valence and arousal independently. Valence and arousal were both rated on a scale from 1 to 9, i.e., v ∈ [1, 9] and a ∈ [1, 9]. The rating interface is shown in Fig. 2.

4.2 Comparative models

For comparative evaluation, we tested TERMS against baseline models as well as state-of-the-art models. As baselines, we implemented models that are known to perform well in classification tasks and have been used extensively for emotion recognition. The baseline classifiers used for the comparative analysis are described below.

Baseline Classifiers For baseline models, we implemented four prevalent supervised models to compute emotion probabilities and parameterize distributions: multinomial naïve Bayes (NB) [1], support vector machine (SVM) [16], gradient boosting (GBM) [68], and convolutional neural network (CNN) [65]. All these approaches directly output the probability of each emotion category for a given text; thus, their outputs were used directly as emotion probabilities.

State-of-the-art models We also compared our model with four benchmark studies. The first was the DeepMoji

1 https://github.com/bfelbo/deepmoji
2 https://github.com/cbaziotis/ntua-slp-semeval2018

3). The winning team for the fifth subtask was NTUA-SLP [69], which also took second and fourth place in Subtasks 1 and 4, respectively. We obtained the team's pre-trained model² and applied it to our data. The team had implemented two approaches: NTUA-SLP NBOW and NTUA-SLP LSTM. NTUA-SLP NBOW used a neural bag-of-words (NBOW) model in which word2vec and affective word embeddings are fed into an SVM classifier. NTUA-SLP LSTM employed a transfer learning model consisting of a two-layer bidirectional long short-term memory (LSTM) network with a deep self-attention mechanism. We evaluated both approaches, NTUA-SLP NBOW and NTUA-SLP LSTM, in the comparative evaluation.

SRV-SLSTM This is a semi-supervised regression variational autoencoder (SRV) that predicts VAD (valence, arousal, dominance) scores. The model architecture consists of three modules: an encoder, a sentiment prediction module, and a decoder. The encoder uses an LSTM to encode text into hidden vectors, the sentiment prediction module scores the text via a two-layer stacked Bi-LSTM, and the decoder reconstructs the original text. We use the SRV-SLSTM model
Emotion        Anger  Anticipation  Disgust  Fear  Joy  Sadness  Surprise  Trust  Total
No. of texts   535    482           481      539   495  511      470       487    4000
publicly available on GitHub³ and applied it to our dataset.

Context-LSTM-CNN (C-LSTM-CNN) The model combines the strengths of LSTM and CNN with the lightweight context encoding algorithm Fixed-Size Ordinally Forgetting Encoding (FOFE) for emotion classification and sentiment score prediction based on context and long-range dependencies. The model used for the comparative evaluation is available on GitHub⁴.

3 https://github.com/wuch15/SRV-DSA
4 https://github.com/deansong/contextLSTMCNN

4.3 Evaluation measurements

We used the following performance metrics to evaluate the proposed TERMS model and the comparative models.

Distinguishability: This measures the average distance among the K emotions: the greater the average distance, the higher the distinguishability of emotions. We denote the average distance between the emotion distributions on the VA-space by AEmoD, which is computed as follows:

$$\mathrm{AEmoD} = \frac{1}{N_{pair}} \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} \lVert \mu_i - \mu_j \rVert \tag{19}$$

where $N_{pair} = K(K-1)/2$ is the number of emotion pairs, and $\mu_i$ and $\mu_j$ are the means of emotions i and j, respectively.

Prediction Correctness: This measures the correctness of the predicted emotions with respect to the direct observations provided by the annotators. The annotators' ratings were averaged for each text and used as the ground truth for the comparative evaluation. To quantify prediction correctness, we used the average Kullback-Leibler (AKL) divergence, the average Euclidean distance (AED), and the Pearson correlation coefficient (PCC). The AKL divergence [72] measures the dissimilarity between two distributions as an average difference. A smaller AKL indicates that the two distributions are similar, implying that the predicted emotion distribution is close to the ground truth. AKL is a suitable evaluation measure because it takes both the mean and the covariance of the distributions into account. In addition to the AKL divergence, we also calculated the AED, which measures the average Euclidean distance between the two emotion distributions. A smaller AED indicates higher prediction correctness. The PCC, denoted as r, was used to measure the correlation between the predicted emotions and the direct observations, separately for valence and arousal. Unlike the AKL, the PCC is concerned only with the position of the emotion distributions on the VA-space, measuring how closely the predictions follow the direct observations.

Classification Performance To evaluate the classifiers employed for soft emotion classification, we use standard evaluation metrics, namely precision, recall, and F1-score, computed with macro-averaging. We use macro-averaging because the emotion classes in the dataset are balanced. Precision ($P_e$) is the fraction of texts assigned to emotion class e that truly belong to e, whereas recall ($R_e$) is the fraction of texts truly belonging to e that are assigned to it [61]. The F1-score is the harmonic mean of precision and recall. These metrics are computed as follows, adapted from [37]:

$$P_e = \frac{\text{No. of texts correctly assigned to emotion class } e}{\text{No. of texts assigned to emotion class } e} \tag{20}$$

$$R_e = \frac{\text{No. of texts correctly assigned to emotion class } e}{\text{No. of texts in emotion class } e} \tag{21}$$

$$F_e = \frac{2 \times P_e \times R_e}{P_e + R_e} \tag{22}$$

$$\text{F1-Score} = \frac{1}{|E|} \sum_{e \in E} F_e \tag{23}$$

where E is the set of emotion classes.

To further validate the classification performance, the Jaccard index is computed as in [37]. The Jaccard index measures accuracy by dividing the size of the intersection of the predicted and ground-truth labels by the size of their union, as shown in (24). Here t refers to a text in the set of texts T, $G_t$ is the set of ground-truth labels, and $P_t$ is the set of predicted labels.

$$\mathrm{Jaccard} = \frac{1}{|T|} \sum_{t \in T} \frac{|G_t \cap P_t|}{|G_t \cup P_t|} \tag{24}$$

The described evaluation metrics are considered effective in assessing the efficiency of classifiers and have been
used in many pioneering studies [53, 38]. We selected these metrics because a higher score on each of them represents better classification performance.

Another evaluation that is essential to establish the classification performance of the TERMS model is Bayesian analysis [73]. In Bayesian analysis, the experiment is summarized by the posterior distribution. Here the posterior describes the distribution of the mean difference of accuracies between two classifiers. Formally, the interval [−0.01, 0.01] defines a region of practical equivalence (rope) for classifiers [73, 74]. By querying the posterior distribution, we infer the probability that TERMS is better than a comparative model. This is the posterior probability that the mean difference is positive, namely the integral of the posterior over the interval [0.01, ∞). Conversely, the integral over (−∞, −0.01] gives the probability that the proposed model is worse. The integral over the rope [−0.01, 0.01] gives the probability that the two classifiers are practically equivalent [73].

4.4 Setup

None of the comparative models uses a GMM to map the (elliptical) emotion distributions in the VA-space. We therefore used all the described baseline models and DeepMoji to map the emotion distributions in the VA-space in the same way as for the TERMS model. The baseline classifiers (NB, GBM, and SVM) were trained on bag-of-words (BoW) term-frequency features. The classifiers employed were MultinomialNB, GradientBoostingClassifier, and SVC (linear kernel), respectively, from the Python scikit-learn (sklearn) toolkit. To set the parameters of the classification models, we used GridSearchCV, which exhaustively evaluates all parameter combinations and retains the best combination to fit the data. For CNN, the TextCNN algorithm was used with the following settings:
- Adamax optimization;
- 128-dimensional word embeddings as features;
- a batch size of 100;
- convolutional layers with kernel sizes 2 to 5.

To train the models for emotion probability estimation, we collected another data set with similar textual content. It was gathered from Facebook and Twitter and, after refining, reduced to 14,350 texts. The texts were labeled with eight emotions (following the wheel of emotions model) by three experts in psychology and were also verified by the authors. This data set was used only to train the models that compute emotion probabilities for the primary data set (4000 texts). Once the emotion probabilities were estimated, they were combined with a GMM, as in the proposed model, using the same VA annotations for comparative evaluation.

The state-of-the-art models NTUA-SLP, SRV-SLSTM, and C-LSTM-CNN predict valence and arousal within their own frameworks; therefore, we did not combine their outputs with our GMM. NTUA-SLP LSTM used its multilayered design with three main steps: word-embedding pre-training, transfer learning, and fine-tuning. The first two steps were implemented as in [69]. For transfer learning, the BiLSTM network with a deep self-attention mechanism was pre-trained on the SemEval-2017 Task 4A dataset (SA2017). The pre-trained network was then combined with a final layer specific to each subtask, such as valence and arousal prediction and multi-label classification. We fine-tuned this final layer on our dataset for each subtask; the same 4000 rated texts were used to fine-tune the valence and arousal prediction subtasks. The experimental settings for SRV-SLSTM and C-LSTM-CNN were kept the same as in the original works, since the models appeared to perform best with those settings. SRV-SLSTM was trained with various ratios of labeled training data; however, it showed the best performance on 40% of
setting with an appropriate allocation of emotion polarities in the VA-space. The baseline models (Fig. 3d–f) also show a fair arrangement; however, they fall slightly short of projecting distinct emotion distributions. On close inspection, we observe that, compared to all the other models, our proposed TERMS model has higher distinguishability.

To quantify distinguishability, we computed the AEmoD for each model via (19). Figure 4 shows the results. A higher AEmoD indicates more widely separated, and hence more distinguishable, emotion distributions. The deep learning models performed well; however, the TERMS model achieved the highest distinguishability score (2.642), while all the other models scored lower. The graph-based approach of the TERMS emotion classifier provides better coverage in two ways. It captures rare words through syntactic relationships, and it disambiguates emotional meaning using the enriched and refined contextual information of the patterns. The emotion patterns capture fine-grained linguistic affect information, which helps in distinguishing the emotions.

Fig. 4 AEmoD for each model to determine distinguishability; the larger the value, the clearer the separation of the emotion distributions on the VA-space

4.5.2 Prediction correctness

We evaluated the prediction performance of TERMS and the comparative models by computing the distance between the ground truth and the predicted distributions via AKL and AED. Table 3 lists the AKL, AED, and the correlation coefficient r for valence and arousal for each model. Among all the models, the proposed TERMS model achieved the lowest AKL and AED scores (4.71 and 1.32, respectively), the highest correlation for valence (0.60), and the third-highest for arousal (0.30). The predicted distributions of our model were therefore the closest to the actual ratings, indicating better prediction performance of TERMS over the baseline and state-of-the-art models. Integrating a context-aware emotion classifier with the varying emotion perceptions modeled via the GMM distributions gave TERMS an edge in capturing the nuances of embedded emotions. The architecture of the proposed emotion classifier and the emotion patterns were the key components behind the higher prediction performance of TERMS compared to the other models.

NTUA-SLP LSTM also performed very well, with the highest correlation for arousal and the second-highest for valence after TERMS. We believe its two-layer bidirectional LSTM (BiLSTM) with a deep self-attention mechanism captured the salient words in tweets by gathering information from both directions of the text. This gave a fair estimation of the important words that were highly indicative of certain emotions. NTUA-SLP NBOW also performed well. This can be attributed to the pre-trained word2vec embeddings combined with 10 affective dimensions, which enabled the model to encode the correlation of each word with different affective dimensions and may have improved intensity prediction. SRV-SLSTM and C-LSTM-CNN also outperformed the baseline models in prediction. The results also indicate that arousal was more challenging to predict than valence, as r for arousal was lower than that for valence for all the models.

4.5.3 Classification performance

We evaluated the emotion classification performance of the proposed TERMS model and all the comparative models. Figure 5 presents the resulting precision, recall, F1-score, and Jaccard index. The TERMS emotion classifier achieved the highest values for precision (0.66), recall (0.65), F1-score (0.64), and Jaccard index (0.49), whereas the comparative models scored lower. Thus, TERMS outperformed all the comparative models in classification.

We attribute this to the context-aware emotion patterns. These capture the building blocks of a text by forming clearly distinguished syntactic patterns of connector words and subject words. This helped to expose contextual and latent information, which was then enriched with word embeddings to provide semantic relationships. The enriched emotion patterns captured fine details of the emotions embedded in a text. Examples include emotional intensity expressed through repeated characters in words like “looove”, and similar emotion-relevant verbs like “desire” and “fancy” that were useful for interpreting context. This ability to gather embedded emotional information enabled the emotion classifier to recognize emotions more effectively than the other models.
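The measures behind Fig. 4, Table 3 and Fig. 5 can be sketched in plain Python. This is a minimal, hypothetical illustration rather than the authors' evaluation code. The four functions cover:
- `aemod`: implements (19) over the unordered emotion pairs i < j.
- `gaussian_kl_2d`: gives the closed-form KL divergence between two bivariate Gaussians on the VA-space. This is the building block that GMM-level approximations such as those in [72] rely on; how TERMS averages it into the AKL is not specified in this section, so that step is left out.
- `macro_prf`: follows (20) to (23).
- `jaccard`: follows (24).

The function names, the input formats and the zero-division conventions are assumptions. The sketch assumes emotion means are given as (valence, arousal) pairs, predictions are single labels for (20) to (23), and ground truth and predictions are label sets for (24).

```python
import math
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]
Matrix = Tuple[Tuple[float, float], Tuple[float, float]]


def aemod(means: Sequence[Point]) -> float:
    """Eq. (19): mean Euclidean distance over all K(K-1)/2 unordered pairs of emotion means."""
    pairs = list(combinations(means, 2))
    return sum(math.dist(mu_i, mu_j) for mu_i, mu_j in pairs) / len(pairs)


def gaussian_kl_2d(mu0: Point, cov0: Matrix, mu1: Point, cov1: Matrix) -> float:
    """Closed-form KL(N(mu0, cov0) || N(mu1, cov1)) for two 2-D Gaussians."""
    (a, b), (c, d) = cov1
    det1 = a * d - b * c
    inv1 = ((d / det1, -b / det1), (-c / det1, a / det1))  # inverse of cov1
    (p, q), (r, s) = cov0
    det0 = p * s - q * r
    # trace(cov1^{-1} @ cov0)
    trace = inv1[0][0] * p + inv1[0][1] * r + inv1[1][0] * q + inv1[1][1] * s
    dx, dy = mu1[0] - mu0[0], mu1[1] - mu0[1]
    # squared Mahalanobis distance of the mean shift under cov1
    maha = dx * (inv1[0][0] * dx + inv1[0][1] * dy) + dy * (inv1[1][0] * dx + inv1[1][1] * dy)
    return 0.5 * (trace + maha - 2 + math.log(det1 / det0))


def macro_prf(y_true: List[str], y_pred: List[str], labels: List[str]) -> Tuple[float, float, float]:
    """Eqs. (20)-(23): per-class P_e, R_e, F_e, each averaged over the classes."""
    p_sum = r_sum = f_sum = 0.0
    for e in labels:
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == e and p == e)
        assigned = sum(1 for p in y_pred if p == e)  # texts assigned to class e
        in_class = sum(1 for t in y_true if t == e)  # texts truly in class e
        p_e = correct / assigned if assigned else 0.0
        r_e = correct / in_class if in_class else 0.0
        f_e = 2 * p_e * r_e / (p_e + r_e) if p_e + r_e else 0.0
        p_sum, r_sum, f_sum = p_sum + p_e, r_sum + r_e, f_sum + f_e
    k = len(labels)
    return p_sum / k, r_sum / k, f_sum / k


def jaccard(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Eq. (24): mean of |G_t & P_t| / |G_t | P_t| over texts (an empty union counts as 1)."""
    scores = [len(g & p) / len(g | p) if g | p else 1.0 for g, p in zip(gold, pred)]
    return sum(scores) / len(scores)


print(aemod([(0.0, 0.0), (3.0, 4.0), (0.0, 4.0)]))  # pairwise distances 5, 4, 3 -> 4.0
```

Summing only over i < j keeps the numerator consistent with $N_{pair} = K(K-1)/2$. Summing over all ordered pairs i ≠ j would double every distance.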
The model that performed closest to TERMS in classification was C-LSTM-CNN. Its architecture, combined with the FOFE algorithm, effectively captured the broader context of the focus sentence, which helped it identify emotions. NTUA-SLP NBOW, NTUA-SLP LSTM, and CNN also showed satisfactory classification performance. NTUA-SLP LSTM performed better on its own dataset for all the subtasks of SemEval-2018 Task 1; in our setting, however, NTUA-SLP NBOW achieved better classification performance. The classification performance of the deep learning models, CNN and DeepMoji, was substantially better than that of the conventional baseline models, which performed poorly on this task. Altogether, TERMS scored highest in the classification evaluations, followed by the state-of-the-art and deep learning models, with a large margin over the baseline models.

Fig. 5 Classification evaluation metrics for TERMS and all the comparative models. TERMS performs better, demonstrating higher precision, recall, F1-score, and Jaccard index

In addition to the macro-averaged classification metrics for precision, recall, and F1-score, we also evaluated classification performance with micro-averaged metrics. The results are displayed in Fig. 6. They show that the difference between the macro- and micro-averaged scores is trivial, confirming that the averaging method has little impact given the balanced emotion classes in the dataset.

Finally, we compared TERMS with the other models through Bayesian analysis; the results are given in Table 4. The table shows that TERMS performs better than the other models, as the posterior mean differences of accuracies are all positive. All the posteriors lie to the right of the rope, i.e., on the interval [0.01, ∞), as shown in the last two columns of Table 4. These results further support the better performance of the proposed model relative to the comparative models.
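The rope-based reading of Table 4 can be sketched as follows. This section does not say which Bayesian test produced Table 4, so the sketch assumes the correlated Bayesian t-test described by Benavoli et al. [73] for cross-validation results. The test takes per-fold accuracy differences between TERMS and a comparative model. The posterior of the mean difference is then a Student-t distribution whose scale is inflated by the correlation ρ = 1/k between the k folds. Its mass is split over (−∞, −0.01], the rope [−0.01, 0.01] and [0.01, ∞). The function name `rope_probabilities`, its inputs, and the fold values below are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple

from scipy.stats import t as student_t


def rope_probabilities(diffs: Sequence[float], n_folds: int, rope: float = 0.01) -> Tuple[float, float, float]:
    """Posterior mass left of, inside, and right of the rope [-rope, rope].

    diffs: per-fold accuracy differences (TERMS minus a comparative model).
    n_folds: folds per cross-validation run; rho = 1 / n_folds models the
    correlation caused by overlapping training sets.
    """
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two differences")
    var = variance(diffs)  # sample variance (n - 1 denominator)
    if var == 0:
        raise ValueError("differences must not all be equal")
    rho = 1.0 / n_folds
    scale = math.sqrt((1.0 / n + rho / (1.0 - rho)) * var)
    posterior = student_t(df=n - 1, loc=mean(diffs), scale=scale)
    p_left = posterior.cdf(-rope)                        # practically worse
    p_rope = posterior.cdf(rope) - posterior.cdf(-rope)  # practically equivalent
    p_right = posterior.sf(rope)                         # practically better
    return float(p_left), float(p_rope), float(p_right)


# Hypothetical 10-fold accuracy differences, all clearly above the rope.
diffs = [0.05, 0.06, 0.04, 0.07, 0.05, 0.06, 0.05, 0.04, 0.06, 0.05]
print(rope_probabilities(diffs, n_folds=10))
```

The ρ/(1 − ρ) term widens the posterior compared with a naive t-test. This reflects the fact that cross-validation folds share training data and are not independent.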
above challenges. In particular, the TERMS model captures rare and refined contextual emotional information through the proposed emotion classifier. To capture and learn from varying perceptions, TERMS uses a GMM to derive the emotion distribution in a VA-space. The probabilistic emotional information is merged with the GMM parameters learned from the VA ratings. This generates emotion distributions in the VA-space that cover the varying emotional perceptions. We validated the significance of the emotion distributions through a detailed comparative analysis with baseline and state-of-the-art models. The results show that TERMS achieved the best performance relative to the other models in terms of distinguishability, prediction correctness, and classification performance. Furthermore, the proposed model is scalable and adaptable, since different classifiers can be used to compute emotion probabilities and the learning process of the GMM is transparent. TERMS paves the way for the affective modeling of texts by parameterizing emotion distributions, with applications to behavior analysis, forecasting, healthcare, and affective human-computer interaction.

References

1. Perikos I, Hatzilygeroudis I (2018) A framework for analyzing big social data and modelling emotions in social media. In: IEEE Proceedings of BigDataService, pp 80–84
2. Basile P, Basile V, Nissim M, Novielli N, Patti V et al (2018) Sentiment analysis of microblogging data
3. Bermingham A, Smeaton A (2010) Classifying sentiment in microblogs: Is brevity an advantage? In: ACM Proceedings of CIKM, pp 1833–1836
4. Rintyarna BS, Sarno R, Fatichah C (2020) Enhancing the performance of sentiment analysis task on product reviews by handling both local and global context. Int J Inform Decis Sci 12(1):75–101
5. Dini L, Bittar A, Robin C, Segond F, Montaner M (2017) Soma: The smart social customer relationship management tool: Handling semantic variability of emotion analysis with hybrid technologies. In: Sentiment Analysis in Social Networks, pp 197–209
6. Ghanem B, Buscaldi D, Rosso P (2019) Textrolls: Identifying russian trolls on twitter from a textual perspective. arXiv:1910.01340
7. Abdullah M, Hadzikadic M (2017) Sentiment analysis of twitter data: Emotions revealed regarding Donald Trump during the 2015-16 primary debates. In: IEEE Proceedings of ICTAI, pp 760–764
8. Calvo RA, Milne DN, Hussain MS, Christensen H (2017) Natural language processing in mental health applications using non-clinical texts. Nat Lang Eng 23(5):649–685
9. Carrillo-de Albornoz J, Rodríguez Vidal J, Plaza L (2018) Feature engineering for sentiment analysis in e-health forums. PLoS One 13(11):e0207996
10. Torres EP, Torres EA, Hernández-Álvarez M, Yoo SG (2020) Emotion recognition related to stock trading using machine learning algorithms with feature selection. IEEE Access 8:199719–199732
11. Calvo RA, Mac Kim S (2013) Emotions in text: dimensional and categorical models. Comput Intell 29(3):527–543
12. Meo R, Sulis E (2017) Processing affect in social media: A comparison of methods to distinguish emotions in tweets. ACM T Internet Techn 17(1):1–25
13. Abdul-Mageed M, Ungar L (2017) Emonet: Fine-grained emotion detection with gated recurrent neural networks. In: Proceedings of ACL, pp 718–728
14. Ekman P, Sorenson ER, Friesen WV (1969) Pan-cultural elements in facial displays of emotion. Science 164(3875):86–88
15. Plutchik R (2001) The nature of emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. Am Sci 89(4):344–350
16. Paltoglou G, Thelwall M (2013) Seeing stars of valence and arousal in blog posts. IEEE Trans Affect Comput 4(1):116–123
17. Buechel S, Hahn U (2017) Emobank: Studying the impact of annotation perspective and representation format on dimensional emotion analysis. In: Proceedings of EACL (Short Papers), pp 578–585
18. Preotiuc-Pietro D, Schwartz HA, Park G, Eichstaedt JC, Kern M, Ungar L, Shulman EP (2016) Modelling valence and arousal in facebook posts. In: ACL Proceedings of WASSA, pp 9–15
19. Mohammad SM (2017) Challenges in sentiment analysis. In: A practical guide to sentiment analysis. Springer, pp 61–83
20. Mulcrone K (2012) Detecting emotion in text. UMM CSci Senior Seminar
21. Liu B (2010) Sentiment analysis and subjectivity. Handb Nat Lang Process 2(2010):627–666
22. Wang J-C, Yang Y-H, Wang H-M, Jeng S-K (2015) Modeling the affective content of music with a gaussian mixture model. IEEE Trans Affect Comput 6(1):56–68
23. Vinayagasundaram B, Mallik R, Aravind M, Aarthi RJ, Senthilrhaj S (2016) Building a generative model for affective content of music. In: IEEE Proceedings of ICRTIT, pp 1–6
24. Pribil J, Pribilova A, Matousek J (2019) Artefact determination by GMM-based continuous detection of emotional changes in synthetic speech. In: IEEE Proceedings of TSP, pp 45–48
25. Giachanou A, Crestani F (2016) Like it or not: A survey of twitter sentiment analysis methods. ACM Comput Surv 49(2):1–41
26. Seyeditabari A, Tabari N, Zadrozny W (2018) Emotion detection in text: A review. arXiv:1806.00674v1
27. Suttles J, Ide N (2013) Distant supervision for emotion classification with discrete binary values. In: Proceedings of CICLing, pp 121–136
28. Perikos I, Hatzilygeroudis I (2016) Recognizing emotions in text using ensemble of classifiers. Eng Appl Artif Intel 51:191–201
29. Symeonidis S, Effrosynidis D, Arampatzis A (2018) A comparative evaluation of pre-processing techniques and their interactions for twitter sentiment analysis. Expert Syst Appl 110:298–310
30. Demszky D, Movshovitz-Attias D, Ko J, Cowen A, Nemade G, Ravi S (2020) Goemotions: A dataset of fine-grained emotions. arXiv:2005.00547
31. Lykousas N, Patsakis C, Kaltenbrunner A, Gómez V (2019) Sharing emotions at scale: The vent dataset. In: Proceedings of the International AAAI Conference on Web and Social Media, vol 13, pp 611–619
32. Alvarez-Gonzalez N, Kaltenbrunner A, Gómez V (2021) Uncovering the limits of text-based emotion detection. arXiv:2109.01900
33. Malko A, Paris C, Duenser A, Kangas M, Mollá D, Sparks R, Wan S (2021) Demonstrating the reliability of self-annotated emotion data. In: Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access, pp 45–54
34. Peng S, Cao L, Zhou Y, Ouyang Z, Yang A, Li X, Jia W, Yu S (2021) A survey on deep learning for textual emotion analysis in social networks. Digital Communications and Networks
35. Felbo B, Mislove A, Søgaard A, Rahwan I, Lehmann S (2017) Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. arXiv:1708.00524
36. Rosenthal S, Farra N, Nakov P (2017) Semeval-2017 Task 4: Sentiment analysis in twitter. In: ACL Proceedings of SemEval-2017, pp 502–518
37. Mohammad S, Bravo-Marquez F, Salameh M, Kiritchenko S (2018) Semeval-2018 Task 1: Affect in tweets. In: ACL Proceedings SemEval, pp 1–17
38. Zhang S, Xu X, Pang Y, Han J (2020) Multi-layer attention based cnn for target-dependent sentiment classification. Neural Process Lett 51(3):2089–2103
39. Sadr H, Pedram MM, Teshnehlab M (2020) Multi-view deep network: A deep model based on learning features from heterogeneous neural networks for sentiment analysis. IEEE Access 8:86984–86997
40. Mohammad SM (2021) Sentiment analysis: Automatically detecting valence, emotions, and other affectual states from text. In: Emotion Measurement. Elsevier, pp 323–379
41. Russell JA (1980) A circumplex model of affect. J Pers Soc Psychol 39(6):1161–1178
42. Mohammad SM (2016) Sentiment analysis: Detecting valence, emotions, and other affectual states from text. In: Emotion measurement. Elsevier, pp 201–237
43. Warriner AB, Kuperman V, Brysbaert M (2013) Norms of valence, arousal, and dominance for 13,915 English lemmas. Behav Res Methods 45(4):1191–1207
44. Mohammad SM (2018) Obtaining reliable human ratings of valence, arousal, and dominance for 20,000 English words. In: Proceedings of ACL, pp 174–184
45. Hasan M, Rundensteiner E, Agu E (2018) Automatic emotion detection in text streams by analyzing twitter data. Int J Data Sci Anal 7(1):35–51
46. Hasan M, Rundensteiner E, Agu E (2014) Emotex: Detecting emotions in twitter messages. In: Proceedings of ASE, pp 1–10
47. Mohammad SM, Bravo-Marquez F (2017) WASSA-2017 shared task on emotion intensity. arXiv:1708.03700
48. Buechel S, Hahn U (2016) Emotion analysis as a regression problem-dimensional models and their implications on emotion representation and metrical evaluation. In: ACM Proceedings of ECAI, pp 1114–1122
49. Park S, Kim J, Ye S, Jeon J, Park HY, Oh A (2021) Dimensional emotion detection from categorical emotion. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp 4367–4380
50. Rawat T, Jain S (2021) A dimensional representation of depressive text. In: Data Analytics and Management. Springer, pp 175–187
51. Cheng Y-Y, Chen Y-M, Yeh W-C, Chang Y-C (2021) Valence and arousal-infused bi-directional lstm for sentiment analysis of government social media management. Appl Sci 11(2):880
52. Li M (2022) Application of sentence-level text analysis: The role of emotion in an experimental learning intervention. J Exp Soc Psychol 99:104278
53. Mohammad SM, Bravo-Marquez F (2017) Emotion intensities in tweets. arXiv:1708.03696
54. Duppada V, Jain R, Hiray S (2018) Seernet at semeval-2018 Task 1: Domain adaptation for affect in tweets. arXiv:1804.06137
55. Zhao S, Jia G, Yang J, Ding G, Keutzer K (2021) Emotion recognition from multiple modalities: Fundamentals and methodologies. IEEE Signal Proc Mag 38(6):59–73
56. Yang Y-H, Chen HH (2011) Prediction of the distribution of perceived music emotions using discrete samples. IEEE T Audio Spe 19(7):2184–2196
57. Zhao S, Yao H, Jiang X (2015) Predicting continuous probability distribution of image emotions in valence-arousal space. In: ACM Proceedings of MM, pp 879–882
58. Sun K, Yu J, Huang Y, Hu X (2009) An improved valence-arousal emotion space for video affective content representation and recognition. In: IEEE Proceedings of ICME, pp 566–569
59. Yang Y-H, Liu J-Y (2013) Quantitative study of music listening behavior in a social and affective context. IEEE T Multimed 15(6):1304–1315
60. Huang Z, Epps J (2016) Detecting the instant of emotion change from speech using a martingale framework. In: IEEE Proceedings of ICASSP, pp 5195–5199
61. Trabelsi I, Ayed DB, Ellouze N (2018) Evaluation of influence of arousal-valence primitives on speech emotion recognition. Int Arab J Inf Technol 15(4):756–762
62. Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
63. Mardia KV (1970) Measures of multivariate skewness and kurtosis with applications. Biometrika 57(3):519–530
64. Saravia E, Argueta C, Chen Y-S (2016) Unsupervised graph-based pattern extraction for multilingual emotion classification. Soc Netw Anal Min 6(1):1–21
65. Saravia E, Liu H-CT, Huang Y-H, Wu J, Chen Y-S (2018) Carer: Contextualized affect representations for emotion recognition. In: ACL Proceedings of EMNLP, pp 3687–3697
66. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the em algorithm. J R Stat Soc: Ser B (Methodol) 39(1):1–22
67. Betella A, Verschure PFMJ (2016) The affective slider: A digital self-assessment scale for the measurement of human emotions. PLoS One 11(2):e0148037
68. Tavares G, Mastelini S et al (2017) User classification on online social networks by post frequency. In: Anais Principais do XIII Simpósio Brasileiro de Sistemas de Informação, pp 464–471
69. Baziotis C, Athanasiou N, Chronopoulou A, Kolovou A, Paraskevopoulos G, Ellinas N, Narayanan S, Potamianos A (2018) Ntua-slp at semeval-2018 Task 1: Predicting affective content in tweets with deep attentive rnns and transfer learning. arXiv:1804.06658
70. Wu C, Wu F, Wu S, Yuan Z, Liu J, Huang Y (2019) Semi-supervised dimensional sentiment analysis with variational autoencoder. Knowl-Based Syst 165:30–39
71. Song X, Petrak J, Roberts A (2018) A deep neural network sentence level classification method with context information. arXiv:1809.00934
72. Hershey JR, Olsen PA (2007) Approximating the Kullback-Leibler divergence between Gaussian mixture models. In: IEEE Proceedings of ICASSP, pp IV–320
73. Benavoli A, Corani G, Demšar J, Zaffalon M (2017) Time for a change: a tutorial for comparing multiple classifiers through bayesian analysis. J Mach Learn Res 18(1):2653–2688
74. Kruschke JK (2015) Tutorial: Bayesian data analysis. In: CogSci
75. Zhu S, Li S, Zhou G (2019) Adversarial attention modeling for multi-dimensional emotion regression. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp 471–480
Affiliations
Yusra Ghafoor1 · Shi Jinping2 · Fernando H. Calderon1 · Yen-Hao Huang2 · Kuan-Ta Chen3 · Yi-Shin Chen4
1 Social Networks and Human-Centered Computing, Taiwan
International Graduate Program, Institute of Information
Science, Academia Sinica. Institute of Information Systems
and Applications, National Tsing Hua University,
Hsinchu City, Taiwan
2 Institute of Information Systems and Applications, National Tsing
Hua University, Hsinchu City, Taiwan
3 Institute of Information Science, Academia Sinica, Taipei, Taiwan
4 Department of Computer Science, National Tsing Hua University,
Hsinchu City, Taiwan