TERMS: Textual Emotion Recognition in Multidimensional Space

This paper proposes a probabilistic model called TERMS for textual emotion recognition in multidimensional space. TERMS maps varying emotional perceptions in texts to distributions in a multidimensional valence-arousal space using Gaussian mixture models. It also incorporates contextual information using a classifier to better recognize emotions from short, informal texts like microblogs.

Applied Intelligence (2023) 53:2673–2693

https://doi.org/10.1007/s10489-022-03567-4

TERMS: textual emotion recognition in multidimensional space

Yusra Ghafoor1 · Shi Jinping2 · Fernando H. Calderon1 · Yen-Hao Huang2 · Kuan-Ta Chen3 · Yi-Shin Chen4

Accepted: 29 March 2022 / Published online: 11 May 2022

Abstract
Microblogs generate a vast amount of data in which users express their emotions regarding almost all aspects of everyday life.
Capturing affective content from these context-dependent and subjective texts is a challenging task. We propose an intelligent
probabilistic model for textual emotion recognition in multidimensional space (TERMS) that captures the subjective
emotional boundaries and contextual information embedded in a text for robust emotion recognition. Discrete label
assignment cannot plausibly capture these; therefore, the model employs a soft assignment by mapping varying emotional
perceptions in a multidimensional space and generates them as distributions via the Gaussian mixture model (GMM). To
strengthen emotion distributions, TERMS integrates a probabilistic emotion classifier that captures the contextual and
linguistic information from texts. The integration of these two aspects, the context-aware emotion classifier and the
learned GMM parameters, provides complete coverage for accurate emotion recognition. Large-scale experiments show that,
compared to baseline and state-of-the-art models, TERMS achieves better distinguishability, prediction, and classification
performance. In addition, TERMS provides insights on emotion classes, the annotation patterns, and the model's application
in different scenarios.

Keywords Emotion recognition · Text classification · Valence-Arousal · Gaussian mixture model · Emotion distribution ·
Subjectivity

Corresponding author: Yi-Shin Chen. Extended author information available on the last page of the article.

1 Introduction

With the emergence of social media, a vast amount of big heterogeneous data is generated on various platforms, where users express their opinions regarding almost all aspects of everyday life. An essential piece of information that could be extracted from this user-generated data is the emotional content, which provides very expressive aspects of human lives [1]. In big heterogeneous data (texts, images, videos, and audio), text is one of the most abundant and effective mediums for understanding emotions. It expresses opinions succinctly; for example, microblogs contain a high density of relevant, sentiment-bearing terms that are readily accessible [2]. In microblogs, such as those found on Facebook and Twitter, emotions are expressed via short and direct text messages containing individual opinions that make them particularly valuable sources of information for effective emotion recognition [3]. Mining emotions from these large volumes of textual opinions in microblogs can provide expressive information for understanding collective human behavior that can be extremely valuable in many domains, such as product review analysis [4], marketing campaigns [5], political stance detection [6, 7], healthcare [8, 9], and stock market analysis [10]. Therefore, intelligent textual emotion recognition systems applicable to microblogs are highly desirable.

A great deal of research has been conducted on emotion recognition and the classification of microblog texts, which can be broadly categorized into two computational directions: deterministic and dimensional models [11]. Deterministic models use a discrete and finite set of emotion labels that most fit a given text, based on the strength of the predicted emotion [12, 13]. Such discrete emotion labels are generally taken from pioneering models, such as those of Ekman [14] and Plutchik [15], that specify the primary emotions.

However, the deterministic approach associates each text with a discrete label without the attribute of personalization; in other words, it does not capture personal differences
as the definition of emotions can differ for each individual based on their background and culture. Dimensional models, on the other hand, are flexible in personalizing emotions in terms of valence, arousal, and other dimensions. The dimensional models project each emotion as coordinates in a space of continuous dimensions of valence and arousal as numerical values. Valence (x-axis) represents the pleasantness of a stimulus, and arousal (y-axis) shows the intensity of an emotion provoked by a stimulus [16, 17]. Any affective experience can be expressed as a combination of these two independent dimensions, which is then interpreted as representing a particular emotion [18]. This method enables a personalized and quantified analysis as the emotions are projected in a multidimensional numeric space, which is effective and useful for analyzing the fuzzy boundaries for different emotion features.

The texts on microblogging sites are usually written in a casual style in which the short length and inconsistent language make it difficult to completely recognize and predict the affective information [3]. We anticipated that dealing with informal and ambiguous texts would be crucial in designing a model for accurately identifying emotions in microblogs. Designing such a model, however, is fairly challenging for the following reasons [19]. First, user-generated text in a microblog may contain linguistic variations and contextual information. For instance, in the text “Thanxxx mom for cooking the same meal every day,” the word “thanxxx” is a linguistic variation of thanks, which requires an understanding of the semantic similarity between the two terms. In addition, the term “thanks” is usually associated with joy or a positive sentiment, but in this instance, it refers to an annoyance. Therefore, to accurately classify the user’s intended emotions, it is crucial to consider contextual information. Second, user-generated text in a microblog can be highly opinionated and subjective in nature, where users may perceive different emotions from the same text [20, 21]. For example, the text “the virus is spreading” can communicate emotional states of both fear and sadness, which is partially dependent on the reader’s state of mind. Therefore, capturing the varying emotional perceptions and fuzzy emotional boundaries is essential for personalized and complete coverage of possible emotional content embedded in a text.

To address this problem, we propose a probabilistic model for textual emotion recognition in multidimensional space (TERMS), which takes the contextual information and subjective nature of the microblog text into account for emotion recognition. The contextual information requires additional details from a text to interpret the given information such as the topic, structure, patterns and sentiment orientation. In view of this, TERMS introduces a probabilistic context-aware emotion classifier that takes syntactic structure and semantic meaning of a text into account to expose the relevant contextual and linguistic information. The syntactic structures automatically capture the pattern of the text via a graph-based algorithm and are further enriched with embeddings to gather semantic content. Second, to cater to the subjectivity of emotions, we note that emotional perceptions are inherently subjective and cannot be covered by a single point or discrete emotion label. Therefore, we consider varying perceptions and generate them as distributions. A distribution is the exhibition of multiple perspectives and better reflects the nuances of emotion content embedded in a text. TERMS maps the multiple emotional perspectives of every single text as distributions (numerical values) into a multidimensional space, which better personalizes the emotion variations. TERMS models the subjective emotion content of the text as a probabilistic emotion distribution through a Gaussian mixture model (GMM) and learns its parameters for a soft assignment. To effectively recognize emotions, TERMS integrates the proposed context-aware emotion classifier and the GMM-modeled probabilistic emotion distribution to describe the emotions through low-level textual feature space and high-level emotion space, respectively. Moreover, due to its probabilistic and generative nature, TERMS is conveniently scalable, and assigns soft labels in a multidimensional space.

To our knowledge, a model of this kind that caters to subjectivity by parameterizing emotion distributions in an emotional space has only been applied to music excerpts [22, 23] and speech [24]. Modeling texts has been challenging due to their single-modal nature that does not provide added information of tone, expression and prosody to understand the full emotional content as compared to the rich representation of music and speech. The challenges are further escalated owing to microblogs’ self-focused topics and short, informal writing format. For microblog texts, the TERMS integrated approach is a novel attempt to model varying perspectives as distributions in emotion space. We cater to these challenges through a context-aware classifier and personalized emotion distributions in TERMS. The main contributions of the article are summarized as follows:

– We propose TERMS, a probabilistic model for textual emotion recognition in multidimensional space, which takes the contextual information and subjective nature of a microblog’s text into account.
– We propose the soft modeling of the affective content in a multidimensional space by parameterizing the emotion distribution through a GMM, which provides insight into dealing with subjectivity and indistinct emotional boundaries.
– The TERMS integrated approach enhances emotion recognition by estimating emotional weightage combined with multiple emotional perceptions for each text, thus
taking complete advantage of both the models, deterministic and dimensional.
– We annotated our collected data by different annotators in order to conduct large-scale simulations to evaluate the performance of TERMS. Our simulation results show that compared to baseline and state-of-the-art models, TERMS achieves better distinguishability, prediction, and classification performance.

The rest of the article is organized as follows. Section 2 summarizes the related work. Section 3 presents the overview, preliminaries, and the proposed probabilistic TERMS model. Section 4 describes the evaluation, comparative models, performance metrics, setup, and the overall results. Section 5 discusses the predicted results and the impact of the number of annotators on the model’s prediction performance, and Section 6 concludes the paper.

2 Related work

Affective computing is an established research field that is burgeoning due to its relevance in many application domains desiring the feature of emotion recognition from different forms of user-generated data such as texts, music, speech, and images [11]. Two of the driving interrelated factors in this flourishing field are social networks and microblogs. Microblogs provide an effective platform for emotion recognition as they provide a wide variety of self-focused topics published in real time [2]. The texts are explicit and succinct with relatively clear projections of users’ emotions. The focused nature and higher density of affective terms make these platforms highly useful for emotion recognition as compared to topic-based platforms, such as product and movie reviews [3]. The work on microblog text emotion recognition can be broadly divided into two categories, deterministic and dimensional models. We provide a comprehensive survey on these two categories in this section.

2.1 Deterministic models

There is a substantial well-vetted body of research on microblog emotion recognition, which focuses on classifying texts into a set of discrete emotion classes [25, 26]. Deterministic models utilize supervised, unsupervised, or semi-supervised methods by employing statistical models, such as machine learning and deep learning.

Using machine-learning models, Meo and Sulis [12] considered structural and lexical-based features from text to automatically identify affective content and compared the results with latent factors and traditional classifiers. Suttles and Ide [27] recognized emotions from texts based on Plutchik’s eight emotional classes by applying distant supervision. Perikos and Hatzilygeroudis [28] used an ensemble classifier schema by combining knowledge-based and statistical machine-learning classification methods for the automatic identification of emotions in text. Symeonidis et al. [29] applied soft computing techniques, namely naive Bayes (NB), support vector machines (SVM), logistic regression, and convolutional neural networks (CNN) for analyzing emotions. Recent significant additions in the emotion classification domain are two of the largest and most dynamic emotion corpora, GoEmotions and Vent [30, 31]. GoEmotions is a manually annotated dataset of 58k English Reddit comments, labelled for 27 emotion categories by the readers [30]. Likewise, the Vent dataset contains more than 33M comments from social media sites, tagged with 705 emotions explicitly by the writer [31]. These datasets are widely used in recent academic works [32, 33].

Regarding research with deep learning models, the performance of textual emotion recognition tasks is enhanced due to the statistically rich and granular framework of deep learning models [34]. Abdul-Mageed and Ungar [13] proposed a model named Emonet to predict emotions into eight emotional classes based on the gated recurrent neural networks algorithm (GRNN). Another renowned emotion prediction model is DeepMoji presented by Felbo et al. [35], which was trained on billions of emoji-labeled tweets for affective modeling and recognition. Rosenthal et al. [36] identified the sentiment of tweets as per the challenge of SemEval-2017 Task 4: Sentiment Analysis in Twitter. The same series provided SemEval-2018 Task 1: Affect in Tweets, a challenge that organized a subtask of multi-label emotion classification in which teams used state-of-the-art methodologies to predict emotions from microblog affective content [37]. Zhang et al. [38] implemented a multi-layer CNN with an attention mechanism that modelled context representations to perform target-dependent sentiment classification. Sadr et al. [39] proposed a multi-view deep network that takes into account intermediate features extracted from convolutional and recursive neural networks to enhance classification performance. The deep-learning models are effective; however, they require complex computations and extensive training data for better performance, while our proposed model is relatively simple and performs well even on limited data.

2.2 Dimensional models

Another significant way to represent affective states is dimensional models, which provide a continuous fine-grained alternative for conducting affective text analysis [11]. These models contribute to understanding the
conveyance of emotions through language and how the emotional dimensions influence people’s behaviour [40]. Russell [41] proposed a dimensional representation model of affect, named the circumplex model, that distinguishes three components: valence, arousal, and dominance (VAD). Studies have shown the modeling of affective states on a valence and arousal map by adopting varying machine-learning approaches [16, 42] and lexicon-based methods [43, 44]. Hasan et al. [45] present a model for real-time emotion tracking by employing [46] and developing an EmotexStream framework. Preotiuc-Pietro et al. [18] predicted valence and arousal on Facebook posts by performing linear regression and released an expert annotated dataset. Mohammad and Bravo-Marquez [47] provided the first emotion intensity dataset (EmoInt) using a best-worst scaling technique. Buechel and Hahn [17, 48] published a benchmark dataset called Emobank (10548 sentences) in which each sentence was manually annotated on the VAD dimensions. Recent studies proposed frameworks that learn from Emobank, the categorical emotion annotations corpus to predict continuous VAD scores [49, 50]. Cheng et al. proposed a Bi-directional Long Short-Term Memory (BiLSTM) model that identifies and forecasts the sentiment information in terms of VA-values and integrated it into a deep learning model to optimise Government social management [51]. Another recent experimental work aimed to test the role of five emotions (valence, arousal, dominance, approach-avoidant, and uncertainty) on the intervention effect of the Learning Mindset study [52]. The SemEval-2018 Task 1: Affect in Tweets challenge asked for the prediction of intensities (arousal) and valence from a stream of texts in terms of regression and ordinal classification [37, 53]. The winning team [54] proposed a unified architecture for both subtasks by using an ensemble of multiple prediction models and heterogeneous feature extraction methods. Dimensional models provide useful measures of emotions; however, they are unable to capture varying perceptions of emotions, which are subjective and might differ regarding the affective content of the same text.

To address the subjective nature of emotion perceptions, extensions of VA-based models were proposed where the representation of emotions was transformed to probability distributions from points on VA-emotional space [55]. In view of this, recent studies used Gaussian parameter-based approaches to estimate emotion distributions on the VA-space that take into account covariance information along with the mean [22, 56]. This approach estimates emotion distribution as a Gaussian with integrated methods. Zhao et al. [57] presented a work that predicts an image’s continuous probability distribution by using a GMM in a VA-space. Another work by Sun et al. [58] aimed to unify discrete and dimensional emotion models by introducing a typical fuzzy emotion subspace for affective video content analysis. Gaussian distributions in dimensional models have also been widely applied in music-listening behavior analysis [22, 23, 59]. In the work conducted by Wang et al. [22], an acoustic GMM was employed to classify music with the utilization of valence and arousal, which increased the accuracy of acoustic classification. Applications of such an approach have also been widely adopted for speech emotion recognition [24, 60, 61]. However, to our knowledge, the Gaussian parameter-based approach has not been applied to microblog texts, which motivated us to personalize this approach for textual emotion recognition. The transformation in mediums has been challenging due to the single-modal nature of texts that contain little information to apprehend underlying emotions and intensities relative to speech and music, which are enriched with emotional cues such as tone, expression, accent, and prosody. The single mode of information can impact the classification task and annotations. We address this issue by proposing a context-aware emotion classifier with a GMM in the VA-space, which captures the nuances of embedded emotions and varying perceptions in a text.

3 The probabilistic TERMS model

For the purposes of this discussion, the text in a microblog refers to a single statement posted by a locutor. A locutor in this article refers to the person who is writing a text. The text is an expression that reflects the emotional state of the locutor. The text can be a thought, mood, or an opinion of a locutor based on his or her prevailing emotional state. The emotions relevant for this study are the emotions felt by the locutor that were embedded in the writing of the posted text. The proposed model aims to recognize these embedded emotions from the texts.

The texts posted on microblogs are enriched with emotions, which are seemingly succinct and straightforward. It can be assumed or misunderstood that these explicit texts can be conveniently assigned eight emotion classes defined by Plutchik [15]. The emotional classes are anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. However, a given tweet contains complex granular details and is embedded with (i) contextual information and (ii) multiple perspectives; thus, it is not easy to classify a microblog’s text with a straightforward emotion allocation approach. This study proposes a solution to these problems starting with preliminaries in Section 3.1. TERMS is designed to address these problems through three major modules. The first module, textual emotion classification (EmoClass), solves the first issue with the help of a context-aware classifier that estimates the emotion probabilities for each class based on syntactic templates and word embeddings (elaborated in Section 3.2). To handle the second issue,
TERMS proposes emotion GMM (EmoGMM), which maps the multiple perspectives of each tweet into VA-space via a GMM and learns its parameters (detailed in Section 3.3). Lastly, the third module, prediction, jointly exploits the emotion probability and the learned parameters of the GMM to predict the emotion distribution of the text. This is explained in Section 3.4.

3.1 Preliminaries

Before introducing the details of our approach, we highlight a few notable concepts that are useful in understanding it. For clarity, the notations used are explained in Table 1.

Table 1 Notation

Notation                          Description
$X$                               Feature vector of all texts
$x^{(i)}$                         Feature vector of text i
$g(\theta_k)$                     Emotion distribution for emotion k
$\pi_k$                           Mixing coefficient for emotion k
$\mu_k$                           Mean of k-th emotion distribution
$\Sigma_k$                        Covariance matrix of k-th emotion distribution
$\mathcal{N}$                     Gaussian distribution
$Y$                               Labelled valence and arousal dataset
$y_j^{(i)}$                       The i-th text rated by j-th annotator
$N_{A_i}$                         Number of annotators for text i
$L$                               Labelled dataset of texts and VA-ratings
$\mu_{pre}$, $\Sigma_{pre}$       Predicted mean and covariance for a text
$K$                               Number of emotional classes

We denote the microblog texts as $X = \{x^{(1)}, x^{(2)}, \ldots, x^{(N)}\}$, where $x^{(i)} \in \mathbb{R}^M$ represents an M-dimensional feature vector of a tweet i. Let z be the associated discrete emotion or affective state, where $z \in \{1, 2, \ldots, K\}$. Consider that (v, a) represents a pair of valence and arousal values (or simply, VA-value), where v denotes a valence value and a represents an arousal value. Since each individual tweet is rated by different annotators for diverse perspectives (as explained in Section 4.1), we denote the valence and arousal ratings (or simply, VA-ratings) by y = (v, a). y is the position of the text $x^{(i)}$ on the multidimensional VA-space.

Let $g(\theta)$ be an arbitrary probability density function (PDF), parameterized by $\theta$. If the valence and arousal ratings y = (v, a) obey the PDF $g(\theta)$, then an emotion distribution is defined as follows:

$$y \sim g(\theta). \tag{1}$$

Since the emotions defined in VA-space are described by a distribution, (1) can be expressed via a mixture model as follows:

$$p(y) = \sum_{k=1}^{K} \pi_k \, g(\theta_k), \tag{2}$$

which illustrates that the emotion distribution of a text is a linear combination of K emotion distributions, where $g(\theta_k)$ is the k-th emotion distribution, termed the k-th component of the mixture. $\pi_k$ is called the mixing coefficient, representing the emotion probability of the k-th component.

To combine these distributions, we employ the widely used GMM that combines the K Gaussian distributions, also referred to as mixtures of Gaussians. On this account, $g(\theta_k)$ is specified as a bivariate Gaussian distribution as it maps emotions into a two-dimensional VA-space. The reasons for employing the GMM are as follows: (i) the GMM is able to approximate almost any continuous PDF to arbitrary accuracy by using a sufficient number of Gaussians (K) and by adjusting their parameters ($\theta_k$) as well as the coefficients ($\pi_k$) [62]; and (ii) the continuous text ratings are well modeled by the GMM, considering they follow a bivariate Gaussian distribution. To verify (ii), we tested whether the VA-ratings of each text from different annotators were consistent with a bivariate Gaussian distribution. The Mardia multivariate normality test [63] with a significance level of 0.05 was performed on our data to determine the adequacy of the GMM for modeling the emotion distributions. The pass rate was 100%, indicating that the VA-ratings of all the texts were consistent with a bivariate Gaussian distribution, which makes the GMM a natural and favorable choice.

Technically, TERMS follows a graphical approach with the form X → z → y. X → z is carried out via textual emotion recognition through our proposed classifier that outputs the posterior probability of texts into selected affective classes z (as detailed in the next section). z → y is the emotion GMM modeling on a VA-space. It maps the associated emotion classes z into VA-space by parameterizing emotion distributions (as described in Section 3.3). The process flow of the TERMS probabilistic model is demonstrated in Fig. 1.

3.2 Textual emotion classification (EmoClass)

To classify the microblog texts into emotions, X → z, we propose a classifier named the Emotion Classifier (EC) that outputs the posterior probability of the texts into selected affective classes z, as in (3). Adding posterior probabilities to emotion distributions would enrich the distributions with linguistic and contextual information.

$$p(z = k \mid x^{(i)}) \sim \mathrm{EC}. \tag{3}$$

We refer to the probabilities accumulated from the texts by the emotion classifier as the emotion probability. This
part of the TERMS model is referred to as textual emotion classification, or EmoClass. In the following, we explain the proposed emotion classifier.

Fig. 1 An illustration of the TERMS probabilistic process. EmoClass is a textual emotion classification module that outputs emotion probabilities for each text into specified affective classes. EmoGMM is an emotion GMM modeling that takes in the probabilities and combines them with VA-ratings to parameterize emotion distributions in a VA-space. The prediction module employs a single affective Gaussian on weighted GMMs to predict an emotion distribution for each unseen text

Emotion Classifier: To estimate the emotion probabilities $p(z = k \mid x^{(i)})$, we generalize an emotion classifier from our previous works, Saravia et al. [64, 65]. We employ this classifier as it provides in-depth contextual information through syntactic templates. For a given text, the classifier assigns probabilities to each associated emotion class z, according to affinity based on the context-aware emotion pattern extracted from the text. Specifically, it is a graph-based algorithm, which constructs syntactic templates from the corpus to extract context-aware emotion patterns. We refer to these features as context-aware as they take syntactic structures and semantic meaning of a text into account to construct pattern-based emotion features. The syntactic structures offered by a graph construction are useful to automatically expose the relevant linguistic information (i.e., contextual and latent information) from a large-scale emotion corpus, whereas to capture and preserve the semantic relationships between patterns, we implement word embeddings on the extracted patterns. This is followed by emotion probability computation, where each pattern is assigned a weight. The weight identifies the relevance of a pattern to an emotion category. In the context of emotion classification, patterns and their weights play the role of features.

The graph-based emotion feature extraction algorithm is summarized in the following steps:

a) Graph construction. Given an emotion corpus, we construct a graph G(V; A), where vertices V are a set of nodes that represent the tokens extracted from the corpus, and edges, denoted as A, represent the relationship of words extracted using a window approach [65]. This will help to retain the syntactic structure of the data. For an arc $a_i \in A$, its normalized weight can be computed as:

$$w(a_i) = \frac{freq(a_i)}{\max_{j \in A} freq(a_j)}, \tag{4}$$

where $freq(a_i)$ is the frequency of arc $a_i$.

Token categorization. To extract the emotion patterns, we divide the syntactic structures into two families of words, connector words (cw) and subject words (sw). This provides the foundation for extracting context-aware emotion patterns as the structures are the sequences of these words. The sw correspond to the words that are high on subjective content, while cw reflect the most frequent words in a text that have high connectivity to influential nodes. To find the cw, we use eigenvector centrality, and to estimate sw, we compute the clustering coefficient elaborated in [65].

Pattern extraction. The syntactic templates constructed based on the cw and sw are applied to the dataset, resulting in the patterns. The subject words in the extracted patterns are replaced with an asterisk (“*”), a proxy to cater to linguistic nuances and unknown words that are not present in the training corpus. Furthermore, it enhances the applicability of the model to other domains as well.

b) Enriched patterns. The extracted patterns are enriched with word embeddings to make them pertinent for emotion classification and to capture the perspectives and semantic relationships between patterns. We employ agglomerative clustering to link the patterns
TERMS: textual emotion recognition in multidimensional space 2679

to relevant clusters based on the sw component. The details of this procedure can be found in [65]. To this end, the resulting enriched patterns contain both the semantic information provided by the word embeddings and the contextual information gained through the graph components, hence providing context-aware emotion patterns.

c) Emotion probability. The enriched emotion patterns are then weighted with respect to each emotion category. The weight indicates how relevant a pattern is to the respective emotion category. This yields a score for each emotion for a given text. We refer to this score as the emotion probability. It is computed as follows:

$$p(z = k \mid x^{(i)}) \leftarrow \frac{\exp(-t s_k)}{\sum_{k'=1}^{K} \exp(-t s_{k'})}, \tag{5}$$

where $s_k$ is the score of emotion $k$ computed with a customized version of term frequency-inverse document frequency (tf-idf) proposed in [65], and $K$ is the number of emotions. $t$ is an adjusting coefficient that scales the scores, $0 < t \le 1$.

3.3 Emotion GMM (EmoGMM)

The subjectivity in emotion perceptions is inherent and can be summarised as emotion distributions. The emotion distribution in the VA-space is described as a bivariate Gaussian distribution with $\{\mu_k, \Sigma_k\}$ as its parameters associated with emotion $k$ as

$$y \sim \mathcal{N}(\mu_k, \Sigma_k) \tag{6}$$

Since the distribution of $y$ given an emotion class $z = k$ is Gaussian, by following [22] for the rest of the analysis, we have

$$p(y \mid z = k) \sim \mathcal{N}(\mu_k, \Sigma_k), \tag{7}$$

where the parameters $\mu_k$ and $\Sigma_k$ are associated with the $k$-th emotion class as well. This transformation of $z \to y$ in the VA-space is the second module in TERMS, referred to as EmoGMM. It maps the associated emotion classes $z$ into VA-space by parameterizing the emotion distributions.

The probability density function for $y$ is then given by the following:

$$p(y) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(y \mid \mu_k, \Sigma_k), \tag{8}$$

where $\pi_k$ is a mixing coefficient, which we reparameterize as

$$\pi_k = p(z = k \mid x^{(i)}). \tag{9}$$

$\pi_k$ is set as the computed emotion probability (5) from EmoClass. It is used as the weighted mixing coefficient for modeling EmoGMM. We interpret it as the probability of emotion $k$ for a given text.

For any given text, the emotion distribution is denoted as $p(y \mid x^{(i)})$. An emotion distribution would be a weighted combination of $\{\mathcal{N}(\mu_k, \Sigma_k)\}_{k=1}^{K}$ that uses $p(z = k \mid x^{(i)})$ as the weights. Accordingly, by combining (5), (8), and (9), the emotion distribution of $y$ given text $x^{(i)}$ is

$$p(y \mid x^{(i)}) = \sum_{k=1}^{K} \mathcal{N}(y \mid \mu_k, \Sigma_k) \, p(z = k \mid x^{(i)}), \tag{10}$$

where $\{p(z = k \mid x^{(i)})\}_{k=1}^{K}$ is the weight of the $k$-th emotion for a given text $x^{(i)}$, stating the emotion probabilities computed via the proposed emotion classifier. The computed $z = k$ connects the EmoClass to an emotional space by parameterizing the emotion probabilities with a GMM. The process of training a GMM with emotion probabilities as input is referred to as EmoGMM (see Fig. 1). This learning process requires annotated VA-ratings of texts for the GMM estimation, where each text is labeled by multiple annotators. With those VA-ratings and emotion probabilities, $\{\mu_k, \Sigma_k\}_{k=1}^{K}$ can be estimated by the expectation maximization (EM) algorithm [66]. The EM algorithm has been widely adopted to parameterize emotion distributions for music and speech, but rarely employed to map emotional perceptions in VA-space for a text.

The EM algorithm aims to solve the latent parameter estimation problem in a numerical way. It first computes possible values for the parameters to be estimated by taking expectations on all the known variables, which is called the E-step, and secondly, the M-step maximizes the log-likelihood function with the possible values computed in the E-step. Thus, a clear form of the likelihood function is provided for applying the EM algorithm.

We denote $y_j^{(i)}$ as the VA-rating of the $i$-th text given by the $j$-th annotator. $Y^{(i)} = \{y_1^{(i)}, \ldots, y_{N_{A_i}}^{(i)}\}$ is the set of VA-values rated by the annotators, in which $N_{A_i}$ is the number of annotators for text $i$. Such VA-values are provided by the annotators for all $N$ texts. Let $L = \{x^{(i)}, Y^{(i)}\}_{i=1}^{N}$ denote the entire annotated dataset.

We first derive the general form of the posterior probability of $z = k$ given $y$, denoted as follows:

$$
\begin{aligned}
p(z = k \mid y) &= \frac{p(z = k)\, p(y \mid z = k)}{\sum_{i=1}^{K} p(y, z = i)} \\
&= \frac{p(z = k)\, p(y \mid z = k)}{\sum_{i=1}^{K} p(z = i)\, p(y \mid z = i)} \\
&= \frac{\pi_k \, \mathcal{N}(y \mid \mu_k, \Sigma_k)}{\sum_{i=1}^{K} \pi_i \, \mathcal{N}(y \mid \mu_i, \Sigma_i)}.
\end{aligned}
\tag{11}
$$
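As an illustration of (5) and (10), the following minimal sketch computes emotion probabilities from raw scores and evaluates the resulting weighted mixture density at one VA-point. The function names, the example scores, and the means and covariances are hypothetical and are not taken from the paper. The sign in the exponent follows (5) as printed, so a smaller score $s_k$ yields a larger probability.

```python
import math

def emotion_probabilities(scores, t=1.0):
    """Eq. (5): p(z=k|x) = exp(-t*s_k) / sum over k' of exp(-t*s_k')."""
    if not 0 < t <= 1:
        raise ValueError("t must satisfy 0 < t <= 1")
    # Shift by the smallest score so the largest exponent is 0 (numerical
    # stability); the shift cancels in the ratio.
    low = min(scores)
    weights = [math.exp(-t * (s - low)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_pdf_2d(y, mean, cov):
    """Bivariate Gaussian density N(y | mean, cov), cov given as a 2x2 list."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    u, v = y[0] - mean[0], y[1] - mean[1]
    # Quadratic form (y - mean)^T cov^{-1} (y - mean) via the closed-form 2x2 inverse.
    quad = (d * u * u - (b + c) * u * v + a * v * v) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def emotion_distribution(y, probs, means, covs):
    """Eq. (10): mixture of emotion Gaussians weighted by p(z=k|x)."""
    return sum(p * gaussian_pdf_2d(y, m, c) for p, m, c in zip(probs, means, covs))

# Hypothetical example with K = 2 emotions (values chosen for illustration only).
probs = emotion_probabilities([0.2, 1.5], t=1.0)
means = [(7.0, 6.0), (2.5, 3.0)]
covs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
density = emotion_distribution((6.5, 6.0), probs, means, covs)
print(probs, density)
```

The closed-form 2x2 inverse keeps the sketch free of external dependencies; a numerical library would be the natural choice for anything beyond the bivariate case.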

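In the E-step, according to (11), we compute the posterior probability given $y_j^{(i)}$, as follows:

$$p(z = k \mid y_j^{(i)}) \leftarrow \frac{p(z = k \mid x^{(i)}) \, \mathcal{N}(y_j^{(i)} \mid \mu_k, \Sigma_k)}{\sum_{l=1}^{K} p(z = l \mid x^{(i)}) \, \mathcal{N}(y_j^{(i)} \mid \mu_l, \Sigma_l)}. \tag{12}$$

In the M-step, the updating forms for the mean vector and covariance matrix are as follows:

$$\mu_k^{new} \leftarrow \frac{\sum_{i,j} p(z = k \mid y_j^{(i)}) \, y_j^{(i)}}{\sum_{i,j} p(z = k \mid y_j^{(i)})}, \tag{13}$$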

$$\Sigma_k^{new} \leftarrow \frac{\sum_{i,j} p(z = k \mid y_j^{(i)}) \, (y_j^{(i)} - \mu_k^{new})(y_j^{(i)} - \mu_k^{new})^{T}}{\sum_{i,j} p(z = k \mid y_j^{(i)})}. \tag{14}$$

Thus, (12), (13), and (14) are the iteration forms for estimating $\{\mu_k, \Sigma_k\}_{k=1}^{K}$. We use $\mu_k^{new}$ and $\Sigma_k^{new}$ to compute the log-likelihood function and check whether it has converged. The general form of the log-likelihood function is given by

$$\ell = \sum_{i=1}^{N} \sum_{j=1}^{N_{A_i}} \log p(y_j^{(i)} \mid x^{(i)}). \tag{15}$$

The pseudocode of the EM algorithm for estimating the EmoGMM parameters by the VA-ratings is shown in Algorithm 1. The algorithm takes the emotion probability $\{p(z \mid x^{(i)})\}_{i=1}^{N}$ from EmoClass and $\{\mu_k^{0}, \Sigma_k^{0}\}_{k=1}^{K}$ as the inputs along with the number of iterations and stopping criteria, and outputs the mean and covariance parameters $\{\mu_k^{new}, \Sigma_k^{new}\}_{k=1}^{K}$ of each emotion distribution. We initialize the log-likelihood function $l_0$ and iterative parameter $r$ in line 1. The learning loop computes the EM algorithm by estimating the posterior probabilities using (12) and updating the mean vector and covariance with (13) and (14) in lines 2–6. Line 7 halts the loop as per the stopping criteria, while line 8 shows the assignment of the computed mean and covariance to the output parameters, which are utilized to map the emotion distributions in VA-space. We implement Algorithm 1 in its standard complexity of $O(NK)$, where $N$ is the number of tweets and $K$ is the number of emotion classes, with $K \ll N$.

3.4 TERMS prediction

To demonstrate the emotion distribution for each text on VA-space, this module provides statistical estimations. It represents the outcome of the model for the unseen texts by summarizing the weighted GMMs for each text, and it also serves to evaluate the performance of the emotion distribution on the unseen texts, as shown in the rightmost part of Fig. 1.

Consider $p(z = k \mid x_{unseen})$ as the unseen text emotion probability that is calculated as shown in Section 3.2, and $\mu_k$ and $\Sigma_k$ as the estimates of the GMM derived in Section 3.3; thus, the weighted GMM for unseen text is represented as follows:

$$p(y \mid x_{unseen}) = \sum_{k=1}^{K} \mathcal{N}(y \mid \mu_k, \Sigma_k) \, p(z = k \mid x_{unseen}). \tag{16}$$

To summarize the weighted GMM for the unseen text, $p(y \mid x_{unseen})$, we estimate a single affective Gaussian represented as $\mathcal{N}(\mu_{pre}, \Sigma_{pre})$ and thus approximated as follows:

$$\mu_{pre} = \sum_{k=1}^{K} p(z = k \mid x_{unseen}) \, \mu_k, \tag{17}$$

$$\Sigma_{pre} = \sum_{k=1}^{K} p(z = k \mid x_{unseen}) \left( \Sigma_k + \mu_k^{*} \mu_k^{*T} \right), \tag{18}$$

where $\mu_k^{*} = \mu_k - \mu_{pre}$.

The above computations indicate the position and shape of an unseen text in the VA-space. An affective Gaussian on the weighted GMM estimates a single mean and a covariance, thus providing a single distribution as the prediction outcome. This makes the evaluation between the predicted emotion distribution and the ground truth easier to estimate and comprehend in VA-space.
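The EM updates (12) to (14) and the single-Gaussian summary (17) and (18) can be sketched together as below. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the function names, the small `ridge` term added to the covariance diagonal, the convergence tolerance, and the toy ratings are all hypothetical. The sketch also assumes every emotion receives non-zero total responsibility and that no rating has zero density under all components.

```python
import math

def pdf2(y, mean, cov):
    """Bivariate Gaussian density N(y | mean, cov) using the 2x2 closed form."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    u, v = y[0] - mean[0], y[1] - mean[1]
    quad = (d * u * u - (b + c) * u * v + a * v * v) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def em_emogmm(probs, ratings, means, covs, max_iter=100, tol=1e-6, ridge=1e-6):
    """probs[i][k] = p(z=k|x_i); ratings[i] = list of (valence, arousal) pairs."""
    K, prev_ll = len(means), -math.inf
    for _ in range(max_iter):
        # E-step, Eq. (12): posterior of each emotion for every single rating.
        resp, ll = [], 0.0
        for p_i, ratings_i in zip(probs, ratings):
            for y in ratings_i:
                joint = [p_i[k] * pdf2(y, means[k], covs[k]) for k in range(K)]
                total = sum(joint)
                ll += math.log(total)  # log-likelihood under the current parameters
                resp.append((y, [q / total for q in joint]))
        # M-step, Eqs. (13)-(14): responsibility-weighted mean and covariance.
        new_means, new_covs = [], []
        for k in range(K):
            w = sum(r[k] for _, r in resp)
            mu = [sum(r[k] * y[d] for y, r in resp) / w for d in range(2)]
            cov = [[sum(r[k] * (y[m] - mu[m]) * (y[n] - mu[n]) for y, r in resp) / w
                    for n in range(2)] for m in range(2)]
            cov[0][0] += ridge  # assumption: tiny ridge keeps the covariance invertible
            cov[1][1] += ridge
            new_means.append(mu)
            new_covs.append(cov)
        means, covs = new_means, new_covs
        if abs(ll - prev_ll) < tol:  # stop once the log-likelihood change is small
            break
        prev_ll = ll
    return means, covs

def predict_single_gaussian(p_unseen, means, covs):
    """Eqs. (17)-(18): collapse the weighted GMM into one Gaussian."""
    mu_pre = [sum(p * m[d] for p, m in zip(p_unseen, means)) for d in range(2)]
    cov_pre = [[0.0, 0.0], [0.0, 0.0]]
    for p, m, c in zip(p_unseen, means, covs):
        diff = [m[0] - mu_pre[0], m[1] - mu_pre[1]]  # mu_k^* = mu_k - mu_pre
        for r in range(2):
            for s in range(2):
                cov_pre[r][s] += p * (c[r][s] + diff[r] * diff[s])
    return mu_pre, cov_pre

# Toy data: four texts, three annotators each, K = 2 emotions (illustrative only).
ratings = [[(7.0, 6.0), (8.0, 7.0), (7.5, 6.5)], [(8.0, 6.0), (7.0, 7.0), (7.5, 7.5)],
           [(2.0, 3.0), (3.0, 2.0), (2.5, 2.5)], [(2.0, 2.0), (3.0, 3.0), (2.5, 3.5)]]
probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
identity = [[1.0, 0.0], [0.0, 1.0]]
means_hat, covs_hat = em_emogmm(probs, ratings, [(6.0, 6.0), (4.0, 4.0)], [identity, identity])
mu_pre, cov_pre = predict_single_gaussian([0.7, 0.3], means_hat, covs_hat)
print(means_hat, mu_pre, cov_pre)
```

Because the mixing weights come from the classifier and stay fixed, only the means and covariances are updated, which is what distinguishes this loop from a standard GMM fit. The log-likelihood is evaluated inside the E-step, so the convergence check lags the parameter update by one iteration.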

4 Performance evaluation

In this section, we report on the performance evaluation of TERMS that was conducted with large-scale simulations.

4.1 Data collection

For the experimental analysis, we collected data from Twitter, where texts have rich affective content. To collect relevant data, we retrieved texts with sentiment-related hashtags placed at the end of the text, which indicate that the emotion in the text is felt by the locutor, as stated in [13]. Based on this method, after some refinement, we gathered 4000 texts from Twitter with labels that were the same as the eight emotions in the wheel of emotion model presented by Plutchik [15]. The eight emotion candidates were anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. The number of affective text selections was designed to maintain balance among all the classes of sentiments. The statistics of the emotion distributions are shown in Table 2.

Each of the selected texts was rated with VA-values by five different annotators who passed a sample qualification test on Amazon Mechanical Turk (AMT), which is considered a reliable service to obtain high-quality data inexpensively and rapidly. With five annotators per text, this yields 20000 ratings for the 4000 texts. We adopted [67] to design an affective slider (AS) in the form of two slider bars to rate valence and arousal independently. The ranges of the valence and arousal were set as v ∈ [1, 9] and a ∈ [1, 9]. The rating interface is shown in Fig. 2.

4.2 Comparative models

For comparative evaluations, we tested TERMS with baseline models as well as state-of-the-art models. We implemented baseline models that are known to perform well in classification tasks and have been extensively used for emotion recognition. The baseline classifiers used for the comparative analysis are elaborated below.

Baseline classifiers. For baseline models, we implemented four prevalent supervised models to compute emotion probabilities and parameterize distributions. The classifiers employed are multinomial naïve Bayes (NB) [1], support vector machine (SVM) [16], gradient boosting (GBM) [68], and convolutional neural network (CNN) [65]. All these approaches directly output the probability of each emotion category for a given text; thus, their outputs were directly used as emotion probabilities.

State-of-the-art models. We also compared our model with four benchmark studies. The first was the DeepMoji emotion prediction model [35]. The second was that of the winning team of the emotion classification subtask in the SemEval-2018 Task 1: Affect in Tweets challenge [37, 69]. The third study is a semi-supervised approach for valence and arousal prediction based on a variational autoencoder model [70], and the fourth is a context-aware model for emotion classification and sentiment score prediction [71].

DeepMoji. It is an established model and has been used as a foundation in many recent studies. It has been trained on billions of tweets and uses the GRNN algorithm for emotion prediction. We used the model¹ available on the GitHub platform and fine-tuned it with our dataset.

¹ https://github.com/bfelbo/deepmoji

NTUA-SLP NBOW and NTUA-SLP LSTM. The second comparative study is related to the SemEval-2018 Task 1 challenge, which proposed five subtasks related to intensity (arousal) and valence detection and multi-label emotion classification. The first four subtasks required the identification of arousal and valence scores in tweets in terms of regression values (Subtasks 1 and 3) and ordinal classification (Subtasks 2 and 4), and the fifth subtask was emotion classification, the assignment of multiple labels to the tweets based on the best fit. We compared our TERMS model with the results of the fifth subtask and the arousal and valence regression subtasks (Subtasks 1 and 3). The winning team for the fifth subtask was NTUA-SLP [69], which also took second and fourth place in Subtasks 1 and 4, respectively. We obtained the team's pre-trained model² and applied it to our data. The team had implemented two approaches: NTUA-SLP NBOW and NTUA-SLP LSTM. NTUA-SLP NBOW used a neural bag-of-words model (NBOW) with word2vec and affective word embeddings fed into an SVM classifier. NTUA-SLP LSTM employed a transfer learning model, which consisted of a two-layer bidirectional long short-term memory (LSTM) with a deep self-attention mechanism. We evaluated both implemented approaches, NTUA-SLP NBOW and NTUA-SLP LSTM, in the comparative evaluation.

² https://github.com/cbaziotis/ntua-slp-semeval2018

SRV-SLSTM. It is a semi-supervised regression variational autoencoder (SRV) that identifies VAD scores. The model architecture consists of three modules: encoder, sentiment prediction, and decoder. The encoder uses an LSTM to encode text into hidden vectors, the sentiment prediction module scores text via a 2-layer stacked Bi-LSTM, and the decoder reconstructs the original text. We use SRV-SLSTM model

Table 2 Emotion distribution statistics

Emotions      Anger  Anticipation  Disgust  Fear  Joy  Sadness  Surprise  Trust  Total
No. of texts  535    482           481      539   495  511      470       487    4000

publicly available on the GitHub platform³ and employed it on our dataset.

³ https://github.com/wuch15/SRV-DSA

Context-LSTM-CNN (C-LSTM-CNN). The model combines the strength of LSTM and CNN with the lightweight context encoding algorithm Fixed Size Ordinally Forgetting (FOFE) for emotion classification and sentiment score prediction based on contexts and long-range dependencies. The model used for comparative evaluation is available on the GitHub platform⁴.

⁴ https://github.com/deansong/contextLSTMCNN

4.3 Evaluation measurements

We used the following performance metrics to evaluate the proposed TERMS and comparative models.

Distinguishability: This shows the average distance among the K emotions: the greater the average distance, the higher the distinguishability of emotions. We denote the average distance between the emotion distributions on VA-space by AEmoD, which is computed as follows:

$$AEmoD = \frac{1}{N_{pair}} \sum_{i \neq j}^{K} \lVert \mu_i - \mu_j \rVert, \tag{19}$$

where $N_{pair} = \frac{K(K-1)}{2}$ and $\mu_i$ and $\mu_j$ are the means of emotion $i$ and $j$, respectively.

Prediction Correctness: This shows the correctness of the predicted emotions with respect to the direct observations, which were provided by the annotators. The ratings obtained from the annotators were averaged for each text and used as the ground truth for the comparative evaluation. To quantify the prediction correctness, we used the average Kullback-Leibler (AKL) divergence, average Euclidean distance (AED), and Pearson correlation coefficient (PCC). The AKL divergence [72] measures the distance and similarity between two distributions expressed as an average difference. A smaller AKL indicates the two distributions are similar, hence implying the predicted emotion distribution is close to the ground truth. AKL is a notable measure for evaluation as it takes both the mean and covariance of distributions into account for the correctness test. In addition to AKL divergence, we also calculated the AED, which shows the mean square difference between the two emotion distributions. A smaller value of AED indicates higher prediction correctness. PCC, denoted as r, was utilized to measure the correlation between the predicted emotion and direct observations. It was used with valence and arousal independently. Differing from the AKL, the PCC is only concerned with the position of emotion distributions on VA-space, by measuring how close the predictions are to the direct observations.

Classification Performance: To evaluate the performance of the classifiers employed for soft emotion classification, we use standard evaluation metrics, such as precision, recall, and F1-score computed with macro-averaging. The reason to use macro-averaging for these metrics is the balanced structure of emotion classes in the dataset. Precision ($P_e$) denotes the fraction of true positives predicted in the processed data, whereas recall ($R_e$) measures the fraction of true positives predicted from all the positives in the ground truth data [61]. The F1-score is the harmonic mean of the precision and recall. These performance metrics, adapted from [37], are estimated as follows:

$$P_e = \frac{\text{No. of texts correctly assigned to emotion class } e}{\text{No. of texts assigned to emotion class } e}, \tag{20}$$

$$R_e = \frac{\text{No. of texts correctly assigned to emotion class } e}{\text{No. of texts in emotion class } e}, \tag{21}$$

$$F_e = \frac{2 \times P_e \times R_e}{P_e + R_e}, \tag{22}$$

$$F1\text{-}Score = \frac{1}{|E|} \sum_{e \in E} F_e. \tag{23}$$

To further validate the classification performance, the Jaccard index is computed as in [37]. The Jaccard index computes the accuracy of the models by dividing the intersection size of the predicted and ground truth labels by the size of their union as shown in (24), where $t$ refers to a text, $G_t$ is the set of ground truths, and $P_t$ is the set of predicted labels.

$$Jaccard = \frac{1}{|T|} \sum_{t \in T} \frac{|G_t \cap P_t|}{|G_t \cup P_t|}. \tag{24}$$

The described evaluation metrics are considered effective in assessing the efficiency of classifiers and have been
in assessing the efficiency of classifiers and have been

Fig. 2 Valence and arousal rating interface. Top: arousal. Bottom: valence

used in many pioneering studies [53, 38]. We selected these evaluation metrics as higher scores in all of them represent higher classification performance.

Another evaluation that is essential to establish the better classification performance of the TERMS model is Bayesian analysis [73]. In Bayesian analysis, the experiment is summarised by the posterior distribution. The posterior describes the distribution of the mean difference of accuracies between the two classifiers. Formally, the interval [−0.01, 0.01] defines a region of practical equivalence (rope) for classifiers [73, 74]. By querying the posterior distribution, we infer the probability that TERMS is better than a comparative model, namely the integral of the posterior over the interval [0.01, ∞], where the mean difference is positive. Likewise, the integral over the interval [−∞, −0.01], where the mean difference is negative, gives the probability that the proposed model is not better, and the integral over the rope interval ([−0.01, 0.01]) gives the probability that the two classifiers are practically equivalent [73].

4.4 Setup

Since none of the models use a GMM to map the (elliptical) emotion distributions in the VA-space, we utilized all the described baseline models and DeepMoji to map the emotion distributions in the VA-space as had been done with the TERMS model. The baseline classifiers (NB, GBM, and SVM) use the bag-of-words (BoW) model with term frequency features to train the classifiers. The classifiers employed were MultinomialNB, GradientBoostingClassifier, and SVC (linear kernel), respectively, from the Python sklearn toolkit. For the parameter setting of the classification models, we used GridSearchCV, which exhaustively evaluates all the parameter combinations and retains the best combination to fit the data. For CNN, the TextCNN algorithm with Adamax optimization is used with word embeddings (128 dimensions) as features, a batch size of 100, and layers for kernel sizes 2 to 5.

To train the models for emotion probability estimation, we collected another data set with similar textual content. The data set was gathered from Facebook and Twitter, which, after refining, was reduced to 14350 texts. The texts were labeled with eight emotions (as per the wheel of emotion model) by three psychological experts from the field and were also verified by the authors themselves. This data set was used only for training models in order to compute emotion probabilities for the primary data set (4000 texts). Once the emotion probabilities were estimated, they were infused into a GMM like the proposed model with the same VA-annotations for comparative evaluation. The state-of-the-art models NTUA-SLP, SRV-SLSTM, and C-LSTM-CNN performed the prediction of valence and arousal in their own setting; therefore, we did not infuse them into our model. NTUA-SLP LSTM used its multilayered design with three main steps: word-embedding pre-training, transfer learning, and fine-tuning. The first two steps of the model were implemented as in [69]. For the transfer learning approach, the biLSTM network with deep self-attention mechanism was pre-trained on the SemEval-2017 Task 4A dataset (SA2017). The pre-trained model was combined with the final layer of the model, which was attributed to the subtasks, such as predicting valence and arousal and multi-label classification. We fine-tuned the final layer of the model for our dataset with respect to each subtask. The same 4000 rated texts were used to fine-tune the valence and arousal prediction subtasks. The experimental settings for SRV-SLSTM and C-LSTM-CNN were kept the same as in the original works, as the models seemed to perform best on the specified settings. SRV-SLSTM was trained for various ratios of labeled training data; however, it showed best performance on 40% of

labelled data; therefore, we compared our model to those scores. Each experiment was performed 10 times for SRV-SLSTM and the average results are reported in the paper. The approach for the C-LSTM-CNN model was modified in a similar way to [75] in order to return the dimensional emotion scores.

In addition, we did not assess these models (NTUA-SLP, SRV-SLSTM, and C-LSTM-CNN) for the metrics of distinguishability and prediction with AKL and AED, as the models' architectures were not designed for mapping emotion distributions in VA-space and had their own function for computing the VA-values. This eliminated the need to test them in our setting and enabled us to evaluate our model in the dynamic environment.

To evaluate the prediction performance of the TERMS and comparative models, five-fold cross-validation was carried out on the 4000 rated texts. The data was split into an 80/20 ratio, where for each fold, 80% was used as the training data, and the remaining 20% was used as the testing data. The validation process was completed five times, with each 20% of the set serving once as the testing data, in order to gather the overall results.

4.5 Results

The main take-away messages and simulation results are provided in this section. We first demonstrate results for distinguishability, followed by prediction correctness, and at the end the classification performance of the TERMS and comparative models is elaborated.

4.5.1 Distinguishability

This part compares the distinguishability achieved by TERMS and all the other models as displayed in Fig. 3. Figure 3a illustrates that all the emotion distributions for the proposed TERMS model are well separated and have a better adjustment (i.e., positive emotions on the right and negative on the left in all four quadrants of the VA-space), thus exhibiting well-discriminated emotion distributions. The deep learning models, such as CNN (Fig. 3b) and DeepMoji (Fig. 3c), show good distinguishability compared to other baseline models, where all the emotion distributions lie correctly on the valence dimension with better clarity. The DeepMoji model blended fairly well in the TERMS

Fig. 3 Distinguishability results


setting with an appropriate allocation of emotion polarities in the VA-space. The baseline models (Fig. 3d–f) also show fair adjustment; however, by a marginal difference, they fell short of distinct projections of emotion distributions. Upon close inspection, we observe that compared to all the other models, our proposed TERMS model has higher distinguishability.

To quantify distinguishability, we computed the AEmoD for each model via (19). Figure 4 shows the achieved results. A higher value of AEmoD indicates more scattered and distinguishable emotion distributions. From Fig. 4, we can see that the deep learning models performed well; however, the TERMS model achieved the highest distinguishability score of 2.642, while the other models scored lower. The graph-based approach of the TERMS emotion classifier provides better coverage by capturing rare words through syntactic relationships and disambiguating emotional meaning using the enriched and refined contextual information of the patterns. The emotion patterns capture fine-grained linguistic affect information, which helps in distinguishing the emotions.

Fig. 4 AEmoD for each model to determine distinguishability; the larger the value, the better the clarity in the emotion distributions on VA-space

4.5.2 Prediction correctness

We evaluated the prediction performance of TERMS and comparative models by computing the distance between the ground truth and the predicted distributions via AKL and AED. Table 3 lists the AKL, AED, and the correlation coefficient r for valence and arousal for each model. We found that among all the models, the proposed TERMS model achieved the lowest AKL and AED scores (4.71 and 1.32, respectively) and achieved the highest correlation for valence (0.60) and the third best for arousal (0.30). The results show the predicted distributions for our model were closest to the actual ratings, thus indicating the better prediction performance of TERMS over the baseline and state-of-the-art models. The integration of a context-aware emotion classifier with the varying emotion perceptions modeled via the GMM distributions provided an edge to TERMS in capturing the nuances of embedded emotions. The architecture of the proposed emotion classifier and the emotion patterns acted as the key components resulting in the higher prediction performance of TERMS, compared to other models. NTUA-SLP LSTM performed very well with the highest correlation in arousal prediction and the second-best for valence after TERMS. We believe the 2-layer bidirectional LSTM (BiLSTM) with a deep self-attention mechanism captured the salient words in tweets by gathering the information from both directions of text. It provided fair estimation of important words that were highly indicative of certain emotions. NTUA-SLP NBOW also performed well, which can be attributed to the fact that the pre-trained word2vec embeddings combined with the 10 affective dimensions enabled the model to encode the correlation of each word with different affective dimensions that could result in better intensity performance. SRV-SLSTM and C-LSTM-CNN also showed greater prediction performance compared to the baseline models. The results also indicated that arousal was more challenging to predict compared to valence, as the r of arousal was lower than that of valence for all the models.

4.5.3 Classification performance

We evaluated the performance of emotion classification for the proposed TERMS model and all the comparative models. Figure 5 presents the calculated results of precision, recall, F1-score, and Jaccard. We found that the TERMS emotion classifier achieved higher values for precision (0.66), recall (0.65), F1-score (0.64), and Jaccard (0.49). In contrast, the comparative models achieved lower scores than the TERMS model. Thus, TERMS outperformed all the comparative models in classification. This is due to the context-aware emotion patterns that captured the building blocks in text by creating the syntactic patterns of connector words and subject words with clear distinction. This helped to expose the contextual and latent information, which was followed by the enrichment with word embeddings to provide semantic relationships. The enriched emotion patterns captured the minute details of embedded emotions in a text, such as emotional intensity expressed through repeating characters in words like “looove” or similar emotion-relevant verbs like “desire” and “fancy” that were useful for interpreting context. This attribute of gathering the embedded emotional information enabled the emotion classifier to more effectively recognize the emotions relative to other models.
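The macro-averaged precision, recall, and F1-score of (20) to (23) and the Jaccard index of (24) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy labels are hypothetical, macro precision and recall are averaged over classes in the same way that (23) averages the per-class F-scores, and every text is assumed to have at least one gold or predicted label.

```python
def macro_scores(gold, pred):
    """Per-class P_e, R_e, F_e (Eqs. 20-22), averaged over classes (Eq. 23)."""
    classes = sorted(set(gold) | set(pred))
    precisions, recalls, f_scores = [], [], []
    for e in classes:
        correct = sum(g == e and p == e for g, p in zip(gold, pred))
        assigned = sum(p == e for p in pred)   # texts assigned to class e
        actual = sum(g == e for g in gold)     # texts that belong to class e
        prec = correct / assigned if assigned else 0.0
        rec = correct / actual if actual else 0.0
        f_e = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f_scores.append(f_e)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f_scores) / n

def jaccard_index(gold_sets, pred_sets):
    """Eq. (24): mean of |G_t intersect P_t| / |G_t union P_t| over all texts."""
    ratios = [len(g & p) / len(g | p) for g, p in zip(gold_sets, pred_sets)]
    return sum(ratios) / len(ratios)

# Toy single-label example (labels are illustrative only).
gold = ["joy", "joy", "anger", "anger"]
pred = ["joy", "anger", "anger", "anger"]
macro_p, macro_r, macro_f1 = macro_scores(gold, pred)

# Toy multi-label example for the Jaccard index.
jac = jaccard_index([{"joy"}, {"anger", "fear"}], [{"joy"}, {"anger"}])
print(macro_p, macro_r, macro_f1, jac)
```

With one label per text, the Jaccard ratio of a text is either 0 or 1, so the index reduces to plain accuracy; the set-based form matters only in the multi-label setting of [37].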

Table 3 Overall performance of prediction

Method           AKL    AED    r valence  r arousal
GBM              5.97   1.51   0.34       0.26
SVM              4.88   1.36   0.53       0.25
NB               5.45   1.42   0.52       0.23
CNN              5.07   1.35   0.58       0.24
DeepMoji         4.81   1.35   0.54       0.23
NTUA-SLP NBOW    NA     NA     0.56       0.39
NTUA-SLP LSTM    NA     NA     0.59       0.40
SRV-SLSTM        NA     NA     0.53       0.26
C-LSTM-CNN       NA     NA     0.56       0.28
TERMS            4.54   1.30   0.60       0.30
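The three prediction-correctness measures reported in Table 3 can be sketched as below. This is a hedged illustration, not the evaluation code behind the table: the function names are hypothetical, the KL divergence is taken from the ground-truth Gaussian to the predicted Gaussian (the direction is an assumption), and the Euclidean distance is taken between the two means (also an assumption, since the exact definition of AED is not spelled out). Averaging either quantity over all test texts would give AKL and AED.

```python
import math

def kl_gaussian_2d(mean0, cov0, mean1, cov1):
    """KL( N(mean0, cov0) || N(mean1, cov1) ) for bivariate Gaussians."""
    (a, b), (c, d) = cov1
    det1 = a * d - b * c
    inv1 = [[d / det1, -b / det1], [-c / det1, a / det1]]  # closed-form 2x2 inverse
    det0 = cov0[0][0] * cov0[1][1] - cov0[0][1] * cov0[1][0]
    # trace(cov1^{-1} cov0)
    trace = sum(inv1[i][j] * cov0[j][i] for i in range(2) for j in range(2))
    diff = [mean1[0] - mean0[0], mean1[1] - mean0[1]]
    # (mean1 - mean0)^T cov1^{-1} (mean1 - mean0)
    quad = sum(diff[i] * inv1[i][j] * diff[j] for i in range(2) for j in range(2))
    return 0.5 * (trace + quad - 2.0 + math.log(det1 / det0))

def euclidean_distance(mean0, mean1):
    """Distance between a ground-truth mean and a predicted mean in VA-space."""
    return math.dist(mean0, mean1)

def pearson_r(xs, ys):
    """Pearson correlation coefficient, applied to valence and arousal separately."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Toy values for illustration only.
identity = [[1.0, 0.0], [0.0, 1.0]]
kl_value = kl_gaussian_2d((0.0, 0.0), identity, (1.0, 0.0), identity)
distance = euclidean_distance((0.0, 0.0), (3.0, 4.0))
r_valence = pearson_r([2.0, 5.0, 7.0, 8.0], [3.0, 4.5, 6.0, 8.5])
print(kl_value, distance, r_valence)
```

The KL divergence compares both position and shape of the two Gaussians, whereas the distance and correlation use only the means, which matches the remark in Section 4.3 that the PCC is concerned only with position.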

The model that performed the closest to TERMS in classification performance was C-LSTM-CNN. The C-LSTM-CNN model's architecture combined with the FOFE algorithm effectively captured the large context of the focus sentence, which helped in better identification of emotions. NTUA-SLP NBOW, NTUA-SLP LSTM, and CNN also showed satisfactory classification performance. NTUA-SLP LSTM performed better on its own dataset for all the subtasks provided by SemEval-2018 Task 1. However, in our setting, NTUA-SLP NBOW performed better in terms of classification performance. The classification performance of the deep learning models, CNN and DeepMoji, was substantially better than that of the conventional baseline models, which showed a severe setback in performance for this task. Altogether, we observed that TERMS scored higher in classification evaluations, followed by the state-of-the-art and deep learning models, and with a large margin to baseline models.

In addition to macro-averaging classification metrics for precision, recall, and F1-score, we evaluated the classification performance with micro-averaging metrics as well. The results are displayed in Fig. 6, which shows that the difference between the macro and micro-averaging scores is trivial, confirming the minor impact of the averaging method on the balanced structure of emotion classes in the dataset.

Finally, we compared the TERMS model with the other comparative models using Bayesian analysis; the results are given in Table 4. The table shows that TERMS performs better than the other models, as the posterior probabilities of the mean difference of accuracies are all positive and above 0. All the posteriors are towards the right of the rope, i.e., on the interval [0.01, ∞], as shown in the last two columns of Table 4. The test results further strengthen the evidence for the better performance of the proposed model relative to comparative models.
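The rope-based reading of a posterior described in Section 4.3 can be sketched as follows, assuming that samples of the mean accuracy difference are already available (for example from a Bayesian correlated t-test; the sampling itself is not shown). The function name and the sample values are hypothetical and do not reproduce Table 4.

```python
def rope_probabilities(samples, rope=0.01):
    """Share of posterior mass left of, inside, and right of [-rope, rope]."""
    n = len(samples)
    worse = sum(s < -rope for s in samples)          # proposed model is not better
    equivalent = sum(-rope <= s <= rope for s in samples)  # practically equivalent
    better = sum(s > rope for s in samples)          # proposed model is better
    return {"worse": worse / n, "equivalent": equivalent / n, "better": better / n}

# Hypothetical posterior samples of (accuracy of TERMS - accuracy of another model).
samples = [-0.02, 0.0, 0.005, 0.02, 0.03, 0.5, 0.2, 0.015]
regions = rope_probabilities(samples)
print(regions)
```

A posterior whose mass lies almost entirely in the "better" region corresponds to the situation described above, where the whole interval sits to the right of the rope.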

Fig. 5 Classification evaluation metrics for TERMS and all the comparative models. TERMS performs better by demonstrating higher precision,
recall, F1-score, and Jaccard

Fig. 6 Classification evaluation metrics with macro and micro-averaging scores

5 Discussion

This section discusses TERMS from different aspects to provide insights on emotional classes, VA-annotations, and TERMS in various emotion prediction problems.

Recall by emotion class. The statistical performance of the TERMS emotion model has been shown, but it is also essential to discuss which emotion classes from the dataset were mainly misclassified by the model. To do so, we estimated recall by class for the proposed model to determine which emotion class has the highest count of false negatives, i.e., the class with the lowest recall. The results for recall by class are shown in Table 5. From the table, the emotion classes that show the lowest recalls are sadness (0.46) and anger (0.53), which indicates that most misclassifications were made in these classes. The error analysis below identifies the underlying causes of misclassification in these classes.

Both classes with the lowest recall belong to the negative polarity. We believe the texts related to negative emotions have a high element of sarcasm, satire, and irony in them, which makes these emotion classes difficult to comprehend. Sarcasm or sardonic statements depend on the prosodic information or non-verbal aspects of communication such as tone, pitch, volume, timbre, facial expressions, etc. Lack of these paralinguistic dimensions for anger and sadness can complicate the identification of such emotions from the texts. To support our reasoning, Table 6 lists sarcastic texts from our dataset that were misclassified and thus lowered recall.

Apart from humour and sarcastic comments, the openness of natural language invites ambiguity and misunderstanding. Lack of explicitness in a statement can make it difficult for the emotion detection model to interpret the fuzzy margin between emotions. Table 7 shows the examples from our dataset that were misclassified due to the lack of explicitness. Lastly, we believe a minor reason that led to low recall was word sense ambiguity. The misclassified texts based on word ambiguity are stated in Table 8.

VA-annotations. Another aspect of this study that needs an argumentative analysis is valence and arousal annotations. Figure 7 shows the valence and arousal values predicted by TERMS relative to the ground truth ratings gathered from AMT. From the figure, we can observe that in general the predictions follow the curve of the ratings in the ground truth. For valence, the overall difference between the predicted values and ground truth is smaller as compared to arousal. Predictions for arousal seem more conservative and restricted to individual differences. The arousal dimension normally is widely subjective and shows subtle variations among individuals, which makes this parameter challenging to comprehend.

Furthermore, this study analyzed the impact of the number of annotators on the VA-rating prediction. The model was trained with a reduced number of annotators, i.e., 3 and 2, to analyze the influence of the number of annotators on prediction performance. Figure 8 shows the PCC curves of valence and arousal for a varied number of annotators. It is explicable that the model performed better for the

Table 4 Bayesian analysis comparative results

Proposed  Others          t-value  p-value  Mean diff.  Lower  Upper
TERMS     C-LSTM-CNN      3.12     0.00     0.32        0.18   0.53
          NTUA-SLP NBOW   57.10    0.00     2.46        2.41   2.64
          NTUA-SLP LSTM   30.46    8E-18    0.23        0.21   0.23
          CNN             5.79     8E-09    0.29        0.18   0.38
          DeepMoji        15.87    7E-56    0.78        0.70   0.88
          GBM             7.64     2E-14    0.37        0.28   0.46
          SVM             42.92    0.00     1.88        1.81   1.97
          NB              40.19    0.00     1.76        1.99   2.16
Table 5 Recall by emotion class

Emotion  Anger  Anticipation  Disgust  Fear  Joy   Sadness  Surprise  Trust
Recall   0.53   0.64          0.70     0.61  0.69  0.46     0.73      0.82
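
The per-class recall reported in Table 5 follows the usual definition: for each emotion class, the number of correctly predicted texts divided by the number of texts that actually belong to that class (true positives plus false negatives). The following Python sketch illustrates that computation. The function name `recall_by_class` and the toy labels are assumptions made for illustration; they are not taken from the paper's code or dataset.

```python
from collections import Counter


def recall_by_class(actual, predicted):
    """Return {label: recall}, where recall = TP / (TP + FN) per actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    support = Counter(actual)  # TP + FN for each class
    true_pos = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {label: true_pos[label] / support[label] for label in support}


# Hypothetical toy labels (illustrative only, not the paper's data).
actual = ["anger", "anger", "anger", "sadness", "sadness", "sadness", "joy", "trust"]
predicted = ["anger", "anger", "joy", "trust", "sadness", "fear", "joy", "trust"]

recalls = recall_by_class(actual, predicted)
# The class with the lowest recall has the largest share of false negatives.
lowest = min(recalls, key=recalls.get)

for label, value in sorted(recalls.items()):
    print(f"{label:8s} {value:.2f}")
print("lowest recall:", lowest)
```

A class that never occurs in `actual` (such as `fear` in the toy data) has no defined recall and is therefore omitted from the result.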

Table 6 Misclassified texts (Sarcasm & satire)

Texts                                                                                   Actual   Predicted
I love when i can’t sleep.                                                              anger    trust
seriously?! we had to turn around because my mom forgot the chicken in the freezer.     anger    sadness
ummmm grow up? please. thank you!                                                       anger    joy
sorry sweetheart you downgraded                                                         anger    joy
lol oh really? is that what its all about?!! hahahaha                                   anger    sadness

Table 7 Misclassified texts (Lack of explicitness)

Texts                                                                                   Actual   Predicted
royal mail... why you loose my parcel?                                                  anger    trust
do some girls really think its attractive to look like prostitutes on a daily basis...  anger    disgust
that’s fucked up..                                                                      anger    joy
i don’t even know you anymore.                                                          sadness  trust
guess i’m not good enough for you...                                                    sadness  trust

Table 8 Misclassified texts (Word sense disambiguation)

Texts                                                                                   Actual   Predicted
this walking dead is very disappointing                                                 sadness  fear
there are so many disrespectful and disgusting men in this world.                       anger    disgust
had to say it....because this generation is going straight down the drain.              sadness  disgust
i hate it when my comforter smells like someone that i miss.                            sadness  disgust
i was in such a good mood...that’s gone out the window!                                 anger    joy

Fig. 7 Predicted values of valence and arousal by TERMS
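
As a minimal sketch of how predicted VA values can be compared with ground-truth ratings, the following Python snippet computes the Pearson correlation coefficient (PCC) and the mean absolute difference between two rating sequences. The helper names and the rating values are hypothetical and chosen only for illustration; the evaluation pipeline actually used in the paper is not shown in the source.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_abs_diff(x, y):
    """Average absolute gap between predictions and ground truth."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical ratings on an arbitrary scale (not the paper's data).
truth = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_offset = [1.5, 2.5, 3.5, 4.5, 5.5]  # follows the curve with a constant gap
pred_flat = [3.0, 3.0, 3.0, 3.0, 4.0]    # conservative, compressed predictions

for name, pred in [("offset", pred_offset), ("flat", pred_flat)]:
    print(name, round(pearson(truth, pred), 3), round(mean_abs_diff(truth, pred), 3))
```

The two toy prediction sets show why both quantities are worth inspecting: a prediction can track the shape of the ground-truth curve closely (high PCC) while still being offset from it, and a compressed prediction can have a lower PCC and a larger average gap.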
highest number of annotators. The difference in prediction performance across the numbers of annotators is evident, which suggests that an increased number of annotators would enhance the quality of the model’s performance. Notably, even a minor variation in the number of annotators resulted in a significant improvement in the model’s prediction performance. We anticipate that the model’s prediction quality and capacity to capture individual differences would improve substantially with a slight increase in the number of annotators in future implementations.

Fig. 8 PCC curves of valence and arousal for a varied number of annotators

Personalizing VA-annotations. For this study, the VA-ratings used for experiments were annotated through AMT. Quality checks for the VA-ratings were maintained during the annotation process; however, it is inferred that the annotation could be influenced by annotators’ personality traits or cultural differences. To test this inference, a small experiment was conducted with 200 texts from Twitter, annotated again for VA-ratings on AMT; before annotation, a personality test was conducted using the Big Five Inventory (BFI) to obtain personality scores in each dimension of the Big Five (extraversion, agreeableness, conscientiousness, neuroticism and openness to experience) [76]. The results show that personality traits do influence the VA-ratings, but in different ways. People who score high in the neuroticism dimension tend to emphasize negative emotions, which can lead to lower VA-ratings for positive emotions. In contrast, annotators high in agreeableness tend to give more excited and pleasant ratings for positive emotions (higher arousal and valence) and calmer ratings for negative emotions. Extraversion and openness to experience have minimal impact on the VA-ratings for negative emotions. Finally, annotators high in conscientiousness give higher VA-ratings for positive emotions; however, for negative emotions, they tend to signify unpleasantness (lower valence). The findings suggest that personality traits have a moderate influence on VA-rating behaviour. The model is limited in covering personality differences in VA-annotations and the impact they can have on emotion distributions. In future, we would like to integrate an aspect of personality variation and its influence in recognizing emotions in the dimensional VA-space.

TERMS in emotion prediction problem. TERMS has considerable significance in the prevailing global crisis, the coronavirus pandemic (COVID-19). The model is well equipped to identify the nuances of emotions and is applicable to any emotion prediction problem as severe as the COVID-19 crisis [77]. The trauma of COVID-19 has spread uncertainty and extreme emotional distress among people. Identifying the variation and uncertainty in emotional states is essential to understand the emotional needs of people in the crisis. The TERMS context-aware emotion classifier can be used to capture emotions from microblog texts posted before and during the COVID-19 pandemic to analyse the prevailing emotion dynamics. The resulting emotional classes can be related, through linear regression, to any variables that are significant to the pandemic (such as population, density, migration, etc.) for any city or location to study their impact on emotional states before and during the pandemic. This will provide an overview of the emotional standing or the cognitive narrative of the respective cities, which is essential in this global crisis to provide reassurance and to design contingency plans according to the emotional needs of city dwellers. Moreover, the model can be employed for political scenarios. Microblog texts related to political debates are high in emotional and sentimental content. Such texts contain varying perspectives, fuzzy opinions, and linguistic variations that require a probabilistic context-aware model with dimensional mapping to capture the nuance, depth and dimensions of emotions embedded in a text.

6 Conclusion

Microblog texts are explicit, relevant, and rich in emotional content; however, their aberrant and informal language makes emotion recognition a challenging task to be employed in real-world systems. It is essential to understand the contextualized information and linguistic variation with a complete coverage of varying emotional perceptions towards the same text in order to recognize emotions from texts accurately. In this article, we propose a probabilistic emotion recognition model TERMS that addresses the
above challenges. In particular, the TERMS model captures the rare and refined contextual emotional information through the proposed emotion classifier. To capture and learn from varying perceptions, TERMS utilizes a GMM to derive the emotion distribution in a VA-space. The emotional information in the probabilistic form is merged with learned GMM parameters from the VA-ratings to generate emotion distributions in VA-space to cover the varying emotional perceptions. We validate the significance of emotion distributions through a detailed comparative analysis with baseline and state-of-the-art models. The results show that TERMS achieved the best performance relative to other models based on the performance metrics of distinguishability, prediction, and classification performance. Furthermore, the proposed model is scalable and adaptable since different classifiers can be implemented to compute emotional probabilities as well as due to the transparent learning process of the GMM. TERMS paves the way for the affective modeling of texts by parameterizing emotion distributions with applications to behavior analysis, forecasting, healthcare, and affective human-computer interaction.

References

1. Perikos I, Hatzilygeroudis I (2018) A framework for analyzing big social data and modelling emotions in social media. In: IEEE Proceedings of BigDataService, pp 80–84
2. Basile P, Basile V, Nissim M, Novielli N, Patti V et al (2018) Sentiment analysis of microblogging data
3. Bermingham A, Smeaton A (2010) Classifying sentiment in microblogs: Is brevity an advantage? In: ACM Proceedings of CIKM, pp 1833–1836
4. Rintyarna BS, Sarno R, Fatichah C (2020) Enhancing the performance of sentiment analysis task on product reviews by handling both local and global context. Int J Inform Decis Sci 12(1):75–101
5. Dini L, Bittar A, Robin C, Segond F, Montaner M (2017) Soma: The smart social customer relationship management tool: Handling semantic variability of emotion analysis with hybrid technologies. In: Sentiment Analysis in Social Networks, pp 197–209
6. Ghanem B, Buscaldi D, Rosso P (2019) Textrolls: Identifying russian trolls on twitter from a textual perspective. arXiv:1910.01340
7. Abdullah M, Hadzikadic M (2017) Sentiment analysis of twitter data: Emotions revealed regarding Donald Trump during the 2015-16 primary debates. In: IEEE Proceedings of ICTAI, pp 760–764
8. Calvo RA, Milne DN, Hussain MS, Christensen H (2017) Natural language processing in mental health applications using non-clinical texts. Nat Lang Eng 23(5):649–685
9. Carrillo-de Albornoz J, Rodríguez Vidal J, Plaza L (2018) Feature engineering for sentiment analysis in e-health forums. PLoS One 13(11):e0207996
10. Torres EP, Torres EA, Hernández-Álvarez M, Yoo SG (2020) Emotion recognition related to stock trading using machine learning algorithms with feature selection. IEEE Access 8:199719–199732
11. Calvo RA, Mac Kim S (2013) Emotions in text: dimensional and categorical models. Comput Intell 29(3):527–543
12. Meo R, Sulis E (2017) Processing affect in social media: A comparison of methods to distinguish emotions in tweets. ACM T Internet Techn 17(1):1–25
13. Abdul-Mageed M, Ungar L (2017) Emonet: Fine-grained emotion detection with gated recurrent neural networks. In: Proceedings of ACL, pp 718–728
14. Ekman P, Sorenson ER, Friesen WV (1969) Pan-cultural elements in facial displays of emotion. Science 164(3875):86–88
15. Plutchik R (2001) The nature of emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. Am Sci 89(4):344–350
16. Paltoglou G, Thelwall M (2013) Seeing stars of valence and arousal in blog posts. IEEE Trans Affect Comput 4(1):116–123
17. Buechel S, Hahn U (2017) Emobank: Studying the impact of annotation perspective and representation format on dimensional emotion analysis. In: Proceedings of EACL (Short Papers), pp 578–585
18. Preotiuc-Pietro D, Schwartz HA, Park G, Eichstaedt JC, Kern M, Ungar L, Shulman EP (2016) Modelling valence and arousal in facebook posts. In: ACL Proceedings of WASSA, pp 9–15
19. Mohammad SM (2017) Challenges in sentiment analysis. In: A practical guide to sentiment analysis. Springer, pp 61–83
20. Mulcrone K (2012) Detecting emotion in text. UMM CSci Senior Seminar
21. Liu B (2010) Sentiment analysis and subjectivity. Handb Nat Lang Process 2(2010):627–666
22. Wang J-C, Yang Y-H, Wang H-M, Jeng S-K (2015) Modeling the affective content of music with a gaussian mixture model. IEEE Trans Affect Comput 6(1):56–68
23. Vinayagasundaram B, Mallik R, Aravind M, Aarthi RJ, Senthilrhaj S (2016) Building a generative model for affective content of music. In: IEEE Proceedings of ICRTIT, pp 1–6
24. Pribil J, Pribilova A, Matousek J (2019) Artefact determination by GMM-based continuous detection of emotional changes in synthetic speech. In: IEEE Proceedings of TSP, pp 45–48
25. Giachanou A, Crestani F (2016) Like it or not: A survey of twitter sentiment analysis methods. ACM Comput Surv 49(2):1–41
26. Seyeditabari A, Tabari N, Zadrozny W (2018) Emotion detection in text: A review. arXiv:1806.00674v1
27. Suttles J, Ide N (2013) Distant supervision for emotion classification with discrete binary values. In: Proceedings of CICLing, pp 121–136
28. Perikos I, Hatzilygeroudis I (2016) Recognizing emotions in text using ensemble of classifiers. Eng Appl Artif Intel 51:191–201
29. Symeonidis S, Effrosynidis D, Arampatzis A (2018) A comparative evaluation of pre-processing techniques and their interactions for twitter sentiment analysis. Expert Syst Appl 110:298–310
30. Demszky D, Movshovitz-Attias D, Ko J, Cowen A, Nemade G, Ravi S (2020) Goemotions: A dataset of fine-grained emotions. arXiv:2005.00547
31. Lykousas N, Patsakis C, Kaltenbrunner A, Gómez V (2019) Sharing emotions at scale: The vent dataset. In: Proceedings of the International AAAI Conference on Web and Social Media, vol 13, pp 611–619
32. Alvarez-Gonzalez N, Kaltenbrunner A, Gómez V (2021) Uncovering the limits of text-based emotion detection. arXiv:2109.01900
33. Malko A, Paris C, Duenser A, Kangas M, Mollá D, Sparks R, Wan S (2021) Demonstrating the reliability of self-annotated emotion data. In: Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access, pp 45–54
34. Peng S, Cao L, Zhou Y, Ouyang Z, Yang A, Li X, Jia W, Yu S (2021) A survey on deep learning for textual emotion analysis in social networks. Digital Communications and Networks
35. Felbo B, Mislove A, Søgaard A, Rahwan I, Lehmann S (2017) Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. arXiv:1708.00524
36. Rosenthal S, Farra N, Nakov P (2017) Semeval-2017 Task 4: Sentiment analysis in twitter. In: ACL Proceedings of SemEval-2017, pp 502–518
37. Mohammad S, Bravo-Marquez F, Salameh M, Kiritchenko S (2018) Semeval-2018 Task 1: Affect in tweets. In: ACL Proceedings of SemEval, pp 1–17
38. Zhang S, Xu X, Pang Y, Han J (2020) Multi-layer attention based cnn for target-dependent sentiment classification. Neural Process Lett 51(3):2089–2103
39. Sadr H, Pedram MM, Teshnehlab M (2020) Multi-view deep network: A deep model based on learning features from heterogeneous neural networks for sentiment analysis. IEEE Access 8:86984–86997
40. Mohammad SM (2021) Sentiment analysis: Automatically detecting valence, emotions, and other affectual states from text. In: Emotion Measurement. Elsevier, pp 323–379
41. Russell JA (1980) A circumplex model of affect. J Pers Soc Psychol 39(6):1161–1178
42. Mohammad SM (2016) Sentiment analysis: Detecting valence, emotions, and other affectual states from text. In: Emotion measurement. Elsevier, pp 201–237
43. Warriner AB, Kuperman V, Brysbaert M (2013) Norms of valence, arousal, and dominance for 13 915 English lemmas. Behav Res Methods 45(4):1191–1207
44. Mohammad SM (2018) Obtaining reliable human ratings of valence, arousal, and dominance for 20,000 English words. In: Proceedings of ACL, pp 174–184
45. Hasan M, Rundensteiner E, Agu E (2018) Automatic emotion detection in text streams by analyzing twitter data. Int J Data Sci Anal 7(1):35–51
46. Hasan M, Rundensteiner E, Agu E (2014) Emotex: Detecting emotions in twitter messages. In: Proceedings of ASE, pp 1–10
47. Mohammad SM, Bravo-Marquez F (2017) WASSA-2017 shared task on emotion intensity. arXiv:1708.03700
48. Buechel S, Hahn U (2016) Emotion analysis as a regression problem - dimensional models and their implications on emotion representation and metrical evaluation. In: ACM Proceedings of ECAI, pp 1114–1122
49. Park S, Kim J, Ye S, Jeon J, Park HY, Oh A (2021) Dimensional emotion detection from categorical emotion. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp 4367–4380
50. Rawat T, Jain S (2021) A dimensional representation of depressive text. In: Data Analytics and Management. Springer, pp 175–187
51. Cheng Y-Y, Chen Y-M, Yeh W-C, Chang Y-C (2021) Valence and arousal-infused bi-directional lstm for sentiment analysis of government social media management. Appl Sci 11(2):880
52. Li M (2022) Application of sentence-level text analysis: The role of emotion in an experimental learning intervention. J Exp Soc Psychol 99:104278
53. Mohammad SM, Bravo-Marquez F (2017) Emotion intensities in tweets. arXiv:1708.03696
54. Duppada V, Jain R, Hiray S (2018) Seernet at semeval-2018 Task 1: Domain adaptation for affect in tweets. arXiv:1804.06137
55. Zhao S, Jia G, Yang J, Ding G, Keutzer K (2021) Emotion recognition from multiple modalities: Fundamentals and methodologies. IEEE Signal Proc Mag 38(6):59–73
56. Yang Y-H, Chen HH (2011) Prediction of the distribution of perceived music emotions using discrete samples. IEEE T Audio Spe 19(7):2184–2196
57. Zhao S, Yao H, Jiang X (2015) Predicting continuous probability distribution of image emotions in valence-arousal space. In: ACM Proceedings of MM, pp 879–882
58. Sun K, Yu J, Huang Y, Hu X (2009) An improved valence-arousal emotion space for video affective content representation and recognition. In: IEEE Proceedings of ICME, pp 566–569
59. Yang Y-H, Liu J-Y (2013) Quantitative study of music listening behavior in a social and affective context. IEEE T Multimed 15(6):1304–1315
60. Huang Z, Epps J (2016) Detecting the instant of emotion change from speech using a martingale framework. In: IEEE Proceedings of ICASSP, pp 5195–5199
61. Trabelsi I, Ayed DB, Ellouze N (2018) Evaluation of influence of arousal-valence primitives on speech emotion recognition. Int Arab J Inf Technol 15(4):756–762
62. Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
63. Mardia KV (1970) Measures of multivariate skewness and kurtosis with applications. Biometrika 57(3):519–530
64. Saravia E, Argueta C, Chen Y-S (2016) Unsupervised graph-based pattern extraction for multilingual emotion classification. Soc Netw Anal Min 6(1):1–21
65. Saravia E, Liu H-CT, Huang Y-H, Wu J, Chen Y-S (2018) Carer: Contextualized affect representations for emotion recognition. In: ACL Proceedings of EMNLP, pp 3687–3697
66. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the em algorithm. J R Stat Soc: Ser B (Methodol) 39(1):1–22
67. Betella A, Verschure PFMJ (2016) The affective slider: A digital self-assessment scale for the measurement of human emotions. PLoS One 11(2):e0148037
68. Tavares G, Mastelini S et al (2017) User classification on online social networks by post frequency. In: Anais Principais do XIII Simpósio Brasileiro de Sistemas de Informação, pp 464–471
69. Baziotis C, Athanasiou N, Chronopoulou A, Kolovou A, Paraskevopoulos G, Ellinas N, Narayanan S, Potamianos A (2018) Ntua-slp at semeval-2018 Task 1: Predicting affective content in tweets with deep attentive rnns and transfer learning. arXiv:1804.06658
70. Wu C, Wu F, Wu S, Yuan Z, Liu J, Huang Y (2019) Semi-supervised dimensional sentiment analysis with variational autoencoder. Knowl-Based Syst 165:30–39
71. Song X, Petrak J, Roberts A (2018) A deep neural network sentence level classification method with context information. arXiv:1809.00934
72. Hershey JR, Olsen PA (2007) Approximating the Kullback Leibler divergence between Gaussian mixture models. In: IEEE Proceedings of ICASSP, pp IV–320
73. Benavoli A, Corani G, Demšar J, Zaffalon M (2017) Time for a change: a tutorial for comparing multiple classifiers through bayesian analysis. J Mach Learn Res 18(1):2653–2688
74. Kruschke JK (2015) Tutorial: Bayesian data analysis. In: CogSci
75. Zhu S, Li S, Zhou G (2019) Adversarial attention modeling for multi-dimensional emotion regression. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp 471–480
76. Guan J (2017) Proving personality-related differences in valence and arousal annotations in social media tasks, National Tsing Hua University, Hsinchu City
77. Ghafoor Y, Calderon FH, Chen LS-W, Chen Y-S (2021) Emotion interaction in cities. In: 2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI). IEEE, pp 91–98

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Yusra Ghafoor is a Ph.D. candidate at Institute of Information Science, Academia Sinica and Institute of Information Systems and Applications, National Tsing Hua University (NTHU). She is a Taiwan International Graduate Program (TIGP) student and studies in program Social Networks and Human Centered Computing (SNHCC). She received her M.S. in Electrical Engineering and Computer Science from National Taipei University of Technology (NTUT), Taiwan. She achieved Excellent student award and Best thesis award in her Master studies. Her research interests include data analysis, artificial intelligence, machine learning and natural language processing.

Shi Jinping received the B.E degree in Science and Technology of Remote Sensing from China University of Geosciences, Wuhan, China, in 2010, and the M.S. degree from National Tsing Hua University, Hsinchu, Taiwan, in 2017. From 2010 to 2014, he was a Software Engineer/Product Consultant for a GIS company in China. He is currently an Algorithm Engineer in an e-commerce company, Shenzhen, China, focusing on natural language processing and recommendation systems.

Fernando H. Calderon is a Honduran, born in 1987. He is a PhD student and Teacher Assistant with the Social Networks and Human Centered Computing program at Academia Sinica and the Institute of Information Systems and Applications, National Tsing Hua University since 2015. His current research interests include behavioral and sentiment analysis from social media, emotion recognition and computational mental health.

Yen-Hao Huang was born in Taiwan, in 1993. He is a PhD student and Teacher Assistant with the Institute of Information Systems and Applications, National Tsing Hua University since 2015. His current research interests include sentiment analysis, emotion recognition, computational mental health, natural language processing and deep learning.

Kuan-Ta Chen (a.k.a. Sheng-Wei Chen) was a Research Fellow at the Institute of Information Science and the Research Center for Information Technology Innovation (joint appointment) of Academia Sinica. Dr. Chen received his Ph.D. in Electrical Engineering from National Taiwan University in 2006, and received his B.S. and M.S. in Computer Science from National Tsing-Hua University in 1998 and 2000, respectively. He received the Young Scholar’s Creativity Award from Foundation for the Advancement of Outstanding Scholarship in 2013, and IEEE ComSoc MMTC Best Journal Paper Award in 2014. He was an Associate Editor of IEEE Transactions on Multimedia (IEEE TMM) and an Associate Editor of ACM Transactions on Multimedia Computing, Communications, and Applications (ACM TOMM).
Yi-Shin Chen joined National Tsing Hua University (NTHU) in 2004. Dr. Chen received her Ph.D. in Computer Science from University of Southern California in 2002, and received her B.B.A. and M.B.A. in information management from National Central University in 1996 and 1997, respectively. Currently, she is the principal investigator of the Artificial Intelligence Talent Cultivation Project for AI Techniques and Application Courses (funded by Minister of Education of Taiwan), and the standing director of Taiwanese Association for Artificial Intelligence. She is passionate about increasing society’s benefits through her research efforts. To help avoid a media monopoly, she focused her research efforts on Web intelligence and integration. Currently, she applies natural language processing techniques to understanding the characteristics of music therapy, emotion recognition, and mental illness.

Affiliations
Yusra Ghafoor1 · Shi Jinping2 · Fernando H. Calderon1 · Yen-Hao Huang2 · Kuan-Ta Chen3 · Yi-Shin Chen4

1 Social Networks and Human-Centered Computing, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica. Institute of Information Systems and Applications, National Tsing Hua University, Hsinchu City, Taiwan
2 Institute of Information Systems and Applications, National Tsing Hua University, Hsinchu City, Taiwan
3 Institute of Information Science, Academia Sinica, Taipei, Taiwan
4 Department of Computer Science, National Tsing Hua University, Hsinchu City, Taiwan

49 pages
NUI Galway Undergraduate Prospectus 2022
No ratings yet
NUI Galway Undergraduate Prospectus 2022
196 pages
Supporting Documents For Ipcr 2
No ratings yet
Supporting Documents For Ipcr 2
2 pages
Music Q1 W2 D1 L2 MU4RH-Ic-4 MU4RH-Ic-5
100% (3)
Music Q1 W2 D1 L2 MU4RH-Ic-4 MU4RH-Ic-5
3 pages
A Checklist For Bipa Module A1-C2
No ratings yet
A Checklist For Bipa Module A1-C2
3 pages
Practice 4 Advanced Handout
No ratings yet
Practice 4 Advanced Handout
7 pages
07 r05310304 Kinematics of Machinery
No ratings yet
07 r05310304 Kinematics of Machinery
9 pages
Central: Educational
No ratings yet
Central: Educational
2 pages

Applied Intelligence (2023) 53:2673–2693

TERMS: textual emotion recognition in multidimensional space

Accepted: 29 March 2022 / Published online: 11 May 2022

1 Introduction

sources of information for effective emotion recognition [3].

TERMS proposes emotion GMM (EmoGMM) that maps follows:

In the E-step, according to (11), we compute the posterior

3.4 TERMS prediction

To summarize the weighted GMM for the unseen text,

parameters of each emotion distribution. We initialize the

Table 2 Emotion distribution statistics

Fig. 2 Valence and arousal

labelled data; therefore, we compared our model to those

4.5 Results

Fig. 3 Distinguishability results

Table 3 Overall performance of prediction

Method   AKL    AED    r valence   r arousal
GBM      5.97   1.51   0.34        0.26

have a high element of sarcasm, satire and irony in them

Table 4 Bayesian analysis comparative results

Proposed   Others       t-value   p-value   Mean diff.   Lower   Upper
TERMS      C-LSTM-CNN   3.12      0.00      0.32         0.18    0.53

Table 5 Recall by emotion class

Emotions   Anger   Anti.   Disgust   Fear   Joy    Sad.   Surprise   Trust
Recall     0.53    0.64    0.70      0.61   0.69   0.46   0.73       0.82

Table 6 Misclassified texts (Sarcasm & satire)

Texts                        Actual   Predicted
I love when i can’t sleep.   anger    trust

Table 7 Misclassified texts (Lack of explicitness)

Texts                                    Actual   Predicted
royal mail... why you loose my parcel?   anger    trust

Table 8 Misclassified texts (Word sense disambiguation)

Texts                                     Actual    Predicted
this walking dead is very disappointing   sadness   fear

Fig. 7 Predicted values of

for negative emotions, they tend to signify unpleasantness

TERMS in emotion prediction problem TERMS have an

76. Guan J (2017) Proving personality-related differences in valence

Fernando H. Calderon is a

Shi Jinping received the B.E

Kuan-Ta Chen (a.k.a. Sheng-

Yi-Shin Chen joined National
