A Comparative Study of Subject-Dependent and Subject-
Independent Strategies for EEG-Based Emotion
Recognition using LSTM Network
Debarshi Nath, Anubhav, and Mrigank Singh
Department of Computer Science and Engineering
Delhi Technological University, Delhi, India
[email protected], [email protected], [email protected]

Divyashikha Sethia
Department of Computer Science and Engineering
Delhi Technological University, Delhi, India
[email protected]

Diksha Kalra and S. Indu
Department of Electronics and Communication Engineering
Delhi Technological University, Delhi, India
[email protected], [email protected]

ABSTRACT
This paper addresses the problem of EEG-based emotion recognition and classification and investigates the performance of classifiers for subject-dependent and subject-independent models separately. The results are compared with those of other classifiers and with existing work in the domain. We perform the experiments on the publicly available DEAP dataset with band power as the feature, and classification accuracies are reported with respect to the widely accepted Valence-Arousal Model. The best results were obtained by the LSTM model in the subject-dependent case, with accuracies of 94.69% and 93.13% on the valence and arousal scales respectively. SVM performed best for the subject-independent model, with accuracies of 72.19% on the valence scale and 71.25% on the arousal scale.

CCS Concepts
• Human-centered computing → Human computer interaction (HCI); • Computing methodologies → Machine learning; Artificial intelligence; Feature selection; Classification and regression trees; Machine learning algorithms; Machine learning approaches; Support vector machines; Neural networks.

Keywords
Electroencephalography (EEG); Brain Computer Interface (BCI); Emotion Recognition; Valence-Arousal Model; Long Short-Term Memory network

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ICCDA 2020, March 9–12, 2020, Silicon Valley, CA, USA
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7644-0/20/03…$15.00
https://doi.org/10.1145/3388142.3388167

1. INTRODUCTION
Assessment of emotion has been dealt with extensively in recent literature in the domain of Brain-Computer Interface. Recent research has mainly targeted people suffering from disorders that hinder or alter their ability to express emotions effectively [10]. Several modalities have facilitated the study of emotions, such as facial expressions, human speech and tone, and physiological signals. One such modality is the electroencephalogram (EEG). The human brain comprises billions of cells, of which a large number are neurons, while others aid and facilitate the activity of neurons. Whenever any activity occurs, it generates an electrical impulse in the brain, causing thousands of neurons to fire in sync. The activations of neurons within the cranium determine emotions and responses. Brain waves are generated by electrical pulses fired in sync by billions of neurons communicating with each other in the nervous system. They can be detected using electrodes placed on the scalp through invasive, non-invasive, or semi-invasive procedures, and we study them in the form of electroencephalography (EEG). In recent years, EEG has become an essential tool with applications in neuroscience, Brain-Computer Interfaces (BCIs), and commercial products. Analytical tools prevalent in EEG studies frequently use machine learning techniques to uncover information relevant for neural classification and neuroimaging.

Emotion, as easy as it is to feel, is equally challenging to study and measure objectively. Researchers have proposed several models for its objective analysis. The most prominent of these is the Valence-Arousal Model by Russell [11], which represents emotions on a 2-D circular space where arousal is the vertical axis and valence is the horizontal axis. Various emotions circumscribe the centre of the valence-arousal plane; this centre represents neutral valence and a medium value of arousal. The model is widely accepted and used extensively in emotion recognition studies. Figure 1 illustrates the ratings of valence and arousal for different emotions.
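As a concrete illustration of the Valence-Arousal plane described above, the short sketch below (our own illustration, not part of the original study) maps a (valence, arousal) rating pair on DEAP's 1-9 rating scale onto one of the four quadrants, taking the scale midpoint of 5 as the neutral boundary; the example emotions in the comments follow the usual circumplex reading:

```python
# Illustrative sketch only: quadrant lookup on the Valence-Arousal plane.
# The midpoint of 5 on the 1-9 scale is assumed as the neutral boundary.

def va_quadrant(valence: float, arousal: float, midpoint: float = 5.0) -> str:
    """Return the Valence-Arousal quadrant for a (valence, arousal) rating."""
    high_v = valence >= midpoint
    high_a = arousal >= midpoint
    if high_v and high_a:
        return "high valence / high arousal"   # e.g. excited, happy
    if high_v:
        return "high valence / low arousal"    # e.g. calm, relaxed
    if high_a:
        return "low valence / high arousal"    # e.g. angry, afraid
    return "low valence / low arousal"         # e.g. sad, bored

print(va_quadrant(7, 8))  # -> high valence / high arousal
```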
We also explore other models available for emotion representation. Bradley et al. [2] proposed the Approach and Withdrawal Model, also known as the Vector Model. This is likewise a 2-D model, in which valence determines the direction of emotion: a positive value of valence turns the emotion into the top vector, whereas a negative value of valence turns the emotion into the bottom vector. Watson et al. [4] describe a Positive-Negative Model, in which the horizontal axis represents low to high negative affect and the vertical axis represents low to high positive affect.

Figure 1. Valence and Arousal ratings corresponding to different emotions as reported in DEAP dataset. [6]

A typical EEG device measures electrical activity with the help of electrodes in contact with the subject's head, placed according to the 10-20 International System. The raw EEG signal is composed of many frequency bands and noise. Hence, we need to filter the raw EEG signal and then decompose it into its constituent frequency bands, such as Delta (0-4 Hz), Theta (4-8 Hz), Alpha (8-13 Hz), Beta (13-30 Hz), and Gamma (>30 Hz), for emotion recognition studies. These bands contain useful information about brain activities; Table 1 describes their significance. EEG signals therefore have broad applications, ranging from emotion recognition to the prediction of diseases and disorders such as Sleep Apnea, Epilepsy, and Alzheimer's disease.

Table 1: Different frequency bands and their significance

Bands    Frequency Range (Hz)    Significance
Delta    0.5 – 4                 Deep sleep, deepest level of relaxation
Theta    4 – 8                   REM sleep, deep and raw emotions, cognitive processing
Alpha    8 – 13                  Drowsy state, relaxation, calmness
Beta     13 – 30                 Conscious state, thought process
Gamma    > 30                    Two different senses at the same time

2. RELATED WORKS
This section comprises a detailed review of some prominent research related to emotion recognition from EEG signals. For efficient classification of emotions, various factors play a vital role, such as pre-processing of brain signals to remove artefacts, feature extraction, feature reduction techniques, and lastly the classifiers used for classification of emotions. Feature extraction plays a vital role in emotion recognition, as these features represent the information stored in EEG signals. Broadly, these features are of three types:

Time Domain Features
Frequency Domain Features
Time-Frequency Domain Features

Jenke et al. [5] have reviewed various feature extraction and selection methods, such as ReliefF and Min-Redundancy-Max-Relevance (mRMR), for the recognition of emotions using EEG signals. Various time-domain features are studied, such as Event-Related Potentials (ERPs), signal statistics (mean, standard deviation, power), Hjorth features, Fractal Dimension (FD), and Higher-Order Crossings (HOC). Frequency-domain features such as band power and Higher-Order Spectra (HOS), as well as time-frequency domain features such as the Hilbert-Huang Spectrum (HHS) and the Discrete Wavelet Transform (DWT), are also described in detail. From our review of the existing literature, we infer that there are two different types of approaches towards emotion recognition algorithms:

Subject-Dependent
Subject-Independent

In both approaches the feature extraction methodology remains the same. However, in the subject-dependent approach the classifier is trained for each subject individually, whereas in the subject-independent approach the classifier is trained across several subjects. We discuss some notable research related to subject-dependent and subject-independent emotion recognition strategies. Pandey and Seeja [9] used the DEAP dataset to create a subject-independent model with the wavelet transform as a feature and a deep neural network (DNN) as the classifier, achieving a maximum classification accuracy of 62.5% for valence and 64.25% for arousal. Liu and Sourina [8] proposed a real-time subject-dependent emotion recognition technique from EEG signals using fractal dimension (FD) in combination with statistical and Higher-Order Crossings (HOC) features and an SVM classifier. A 14-channel Emotiv device was used for data collection; sixteen subjects were shown visual stimuli, and eight different emotions were recognized. A maximum mean accuracy of 77.81% was achieved for classifying two different emotions. The proposed method was validated on the DEAP dataset, achieving a mean accuracy of 85.38% for classifying two emotions.

Various deep learning applications in Natural Language Processing (NLP) demonstrate the memorizing capability of the LSTM model; hence, the LSTM model can be instrumental in emotion recognition. We also describe other research employing the LSTM model for the task of emotion recognition. Alhagry et al. [1] employed the DEAP dataset for emotion recognition; an LSTM model was used for the classification of emotions of 32 subjects individually and achieved 85.65%, 85.45%, and 87.99% accuracy for arousal, valence, and liking respectively. Li et al. [7] classified human emotions from EEG signals using RASM as a feature extracted from the DEAP dataset, with an LSTM network as the classifier, achieving a mean accuracy of 76.67% for valence.

Yang et al. [14] employ a parallel combination of a Convolutional Neural Network (CNN) and an LSTM network to derive features
from EEG and physiological signals in the DEAP dataset, and then train a softmax classifier for emotion classification using the subject-dependent strategy, achieving mean accuracies of 90.80% and 91.03% for valence and arousal.

In this paper, we explore two different training strategies, namely subject-dependent and subject-independent. For both settings, we use band power features extracted from the raw EEG signals and test different classification techniques: KNN, SVM, Decision Tree, Random Forest, and the LSTM model. The next section describes the methodology adopted in this research.

3. METHODOLOGY
3.1 Dataset
The DEAP dataset [6] is a publicly available multimodal dataset designed specifically for emotion analysis. It contains recordings of EEG and other physiological signals for 32 participants while they watched 40 one-minute-long excerpts of selected videos. A standard EEG headset with 32 channels, together with sensors such as EOG, EMG, and temperature, recorded signals at a sampling frequency of 128 Hz. The recorded signals were then passed through a bandpass filter to remove noise and artefacts such as eye blinks. To assign an emotion to the signals in terms of valence, arousal, dominance, and liking, the participants rated each video with values ranging from 1 to 9.

3.2 Preprocessing
We observe the raw EEG signals of the DEAP dataset and look for similarities between the EEG signals of different subjects, as well as between different trials of the same subject. We infer that the EEG signal of each individual is unique: there is no similarity between the EEG signals of different individuals, even when the signals come from the same trial. This is intuitive, as each individual possesses different emotional limits and affinities. However, it is also worth noting that there are similarities in the EEG signals of the same subject across various trials. From these two observations, we conclude that there is scope for developing two separate models for studying the emotional states of subjects, which is also validated by the availability of separate research into the two models. A subject-dependent model facilitates the recognition of the emotional state of a particular individual when the model has been trained on that same individual beforehand. A subject-independent model helps determine the emotional state of a person whose EEG signals have never been recorded before. These are the two possible use cases of the two models.

The signals reported in the DEAP dataset have a duration of 63 s (3 s of pre-trial baseline and 60 s during the video). Although the 3 s pre-trial signal is not affected by the stimulus video, the signals recorded during the stimulus and the final emotion rating may have some correlation with the pre-trial signals, so in this study we do not remove the pre-trial signals before feature extraction. Following previous studies, we explore frequency-domain features and extract the band power of the different bands of the EEG signals. We use the Welch method with a Hanning window of 1 s to determine the Power Spectral Density, as displayed in Figure 2. To obtain fine-grained samples of information, we employ a stride of 0.25 s over the entire signal, thus obtaining 249 band power values at different time instances. We use only the EEG signals for feature extraction and modelling, in order to determine the regions of the brain responsible for a specific emotion.

Figure 2. Power Spectral Density of EEG signal

Even though the anatomy of the brain is similar for every human, the consciousness of everyone is unique. This fact explains the diversity in the activations of the brain, which is validated by the variance in EEG signals. Therefore, to train generalized models for the emotion recognition task, we require an extensive database. As the DEAP dataset reports the EEG and physiological signals of only 32 participants, it is not currently possible to account for such diversity. This lack of data motivates us to contrast the subject-dependent and subject-independent strategies for training classifiers for the emotion recognition task using EEG signals.

3.3 LSTM Model
In this paper, we test the LSTM network as an efficient tool for the prediction of an individual's emotion. In Natural Language Processing (NLP), LSTM networks play an influential role in remembering long-term as well as short-term dependencies. The LSTM cell employs the following equations to realize its memory retention property:

ĉ<t> = tanh(Wc [a<t−1> ; x<t>] + bc)      (1)
Γu = σ(Wu [a<t−1> ; x<t>] + bu)           (2)
Γf = σ(Wf [a<t−1> ; x<t>] + bf)           (3)
Γo = σ(Wo [a<t−1> ; x<t>] + bo)           (4)
c<t> = Γu ∗ ĉ<t> + Γf ∗ c<t−1>            (5)
a<t> = Γo ∗ tanh(c<t>)                    (6)

Here, x<t> and a<t> are the input and output values of the LSTM cell at time t. The symbols W and b represent the weights and biases of the different gates and of the candidate memory value. Γu, Γf, and Γo represent the update, forget, and output gates regulating the state of an LSTM cell. c<t> represents the memory of the cell, while ĉ<t> represents the candidate value for the memory update. The following equations define the σ(x) and tanh(x) non-linear activation functions:

sigmoid(x) = 1 / (1 + e^(−x))                 (7)
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))     (8)

Since previous activations of the brain and past events critically affect subsequent activations, remembering past events may enable the LSTM model to gain insights from the patterns of EEG signals. Hence, we test the LSTM model for the emotion recognition task using the EEG signals. Figure 3 displays
the best LSTM model configuration for the subject-dependent training strategy. To ensure GPU support, we implement the LSTM model in Python 3 using the Keras library with a TensorFlow backend on the Google Colab platform. We experiment with several configurations of the following LSTM model parameters:

Number of LSTM Nodes: 32, 40
Number of Dense 1 Nodes: 10, 20, 24, 32, 40
Activation for Dense 1: 'relu', 'sigmoid', 'tanh'
Learning Rate: 1, 1e-1, 1e-2, 1e-3, 1e-4
Learning Rate Decay: 1e-5, 1e-6

We get the best testing accuracies for the model having 40 nodes in the LSTM layer, 10 nodes in the Dense 1 layer with the 'tanh' activation function, and a single node with the 'sigmoid' activation function in the Dense 2 layer. To avoid over-fitting, we use 25% dropout between the LSTM layer and the Dense 1 layer. The model minimizes the log-loss function using the Stochastic Gradient Descent (SGD) optimization algorithm. The following equation displays the log-loss function:

L(y, ŷ) = −y log(ŷ) − (1 − y) log(1 − ŷ)     (9)

The best results are obtained with SGD configured with learning rate = 1e-2, learning rate decay constant = 1e-5, and momentum constant = 0.9.

Figure 3. Proposed LSTM Model

4. RESULTS
To contrast the subject-dependent and subject-independent strategies, we examine the performances of different classifiers such as KNN, SVM, Decision Tree, Random Forest, and the LSTM model. We train the LSTM model and the other classifiers on the identically pre-processed dataset with band power features extracted from the EEG signals. The DEAP dataset reports the values of valence and arousal on a scale of 1-9; we use 5 as the threshold value for categorizing low and high values. To determine the prediction accuracy, we convert the predicted values of the different classifiers into labels 0 (low) and 1 (high). Thus, we calculate the prediction accuracy using the equation:

Accuracy = (TP + TN) / (TP + FP + TN + FN)     (10)

Here, we determine the True Positive (TP), True Negative (TN), False Negative (FN), and False Positive (FP) values from the confusion matrix. We follow both the subject-dependent and subject-independent strategies by separately training and testing the classifiers. We test these classifiers several times to ensure that the observed results are significant. Table 2 highlights the average testing accuracies for both the subject-dependent and subject-independent training strategies.

Table 2. Testing accuracies for Subject-Dependent and Subject-Independent models

                  Subject-Dependent        Subject-Independent
Models           Valence    Arousal       Valence    Arousal
KNN               86.03      79.64         70.86      68.36
SVM               76.56      72.66         72.19      71.25
Decision Tree     71.10      67.97         58.13      55.63
Random Forest     81.25      81.19         61.95      61.25
LSTM              94.69      93.13         70.31      69.53

On analyzing the results, we observe that for the subject-dependent strategy the LSTM model outperforms the other classifiers by a large margin, whereas in the subject-independent strategy the SVM model performs better. In subject-dependent conditions, we observe a significant increment of about 16% and 18% on the valence and arousal scales, respectively, when comparing the performance of the LSTM model with the other classifiers. We obtain a maximum improvement in average testing accuracy of 18% for the valence scale and 20% for the arousal scale when comparing the SVM classifier with the LSTM model.

We also contrast our results with those obtained by Yang et al. [14], who follow a similar experimental procedure with a parallel combination of a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). Here, we observe a gain of 4% for valence and 2% for arousal in the average testing accuracies over all subjects. The proposed LSTM model for the subject-dependent approach shows a significant increment of 9% and 7.5% in valence and arousal when compared with Alhagry et al. [1], and of 14% and 19% in valence and arousal when compared with Xing et al. [13]. Figure 4 describes the average testing accuracies with positive and negative deviations for valence and arousal.
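The labeling and evaluation procedure above can be sketched as follows (a minimal illustration under our own naming, not the authors' code): ratings on the 1-9 scale are binarized at the threshold of 5, and accuracy is computed from the confusion-matrix counts as in Eq. (10). Whether a rating exactly at the threshold counts as high or low is an assumption here; the paper only states that 5 is used as the threshold.

```python
import numpy as np

def binarize(ratings, threshold=5.0):
    """Label a rating as 1 (high) when it exceeds the threshold, else 0 (low)."""
    # Assumption: ratings exactly at the threshold are treated as low.
    return (np.asarray(ratings, dtype=float) > threshold).astype(int)

def accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN), as in Eq. (10)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return (tp + tn) / (tp + fp + tn + fn)

# Example: four trials with valence ratings and hypothetical predictions.
labels = binarize([7.1, 4.0, 8.5, 3.2])   # -> [1, 0, 1, 0]
preds = [1, 0, 0, 0]                      # one false negative
print(accuracy(labels, preds))            # 3 of 4 correct -> 0.75
```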
Figure 4. Testing accuracies for 32 subjects using LSTM model for subject-dependent model

For subject-independent conditions, we observe a drop in the prediction accuracy of all classifiers. The performance of the LSTM model drops most significantly, by almost 24% for both valence and arousal. This drop in performance indicates that the LSTM model requires an extensive dataset to achieve generalization. For the subject-independent model, the best results were achieved by SVM, with better performance by approximately 6% on the valence scale and 7% on the arousal scale over the other classifiers.

5. CONCLUSION
In this work, we evaluate the power spectral density over the 32 EEG channels of the DEAP dataset. We decompose the signals into five frequency bands, Delta, Theta, Alpha, Beta, and Gamma, to derive the band power of each band. We use this band power over each trial as a feature for classifying the valence and arousal ratings of the subject. We evaluate the classification performance using KNN, SVM, Decision Tree, Random Forest, and LSTM as our classifiers and compare the results. We find an average increment of 16% for valence and 17% for arousal in the average testing accuracies for the subject-dependent model using the LSTM model over the other classifiers. The maximum classification accuracies of 94.69% for valence and 93.13% for arousal were achieved using the LSTM classifier, which outperforms the state-of-the-art classifiers. For the subject-independent model, the best results were achieved by SVM, with an average increment of approximately 6% on the valence scale and 7% on the arousal scale over the other classifiers. Thus, to create generalized deep learning models with excellent performance on emotion recognition, we require an extensive database.

For further work, the proposed experimental setup can be extended to real-time applications. For further improvements, more features from the EEG channels, along with physiological features, can be added and tested for their performance. Using a subset of the channels for feature generation, rather than all the EEG channels, may perform better in terms of accuracy, as demonstrated by the work of Wichakam and Vateekul [12]. Emotion models beyond the Valence-Arousal Model can also be addressed using the strategy of this paper, such as the 3-D emotion model used in the work of Dabas et al. [3].

6. REFERENCES
[1] Salma Alhagry, Aly Aly Fahmy, and Reda A. El-Khoribi. 2017. Emotion Recognition based on EEG using LSTM Recurrent Neural Network. International Journal of Advanced Computer Science and Applications 8, 10 (2017), 355–358.
[2] Margaret M. Bradley, Mark K. Greenwald, Margaret C. Petry, and Peter J. Lang. 1992. Remembering pictures: Pleasure and arousal in memory. Journal of Experimental Psychology: Learning, Memory, and Cognition 18, 2 (1992), 379–390.
[3] Harsh Dabas, Chaitanya Sethi, Chirag Dua, Mohit Dalawat, and Divyashikha Sethia. 2018. Emotion Classification Using EEG Signals. In Proc. ACM Int. Conf. Computer Science and Artificial Intelligence. ACM, 380–384.
[4] David Watson and Auke Tellegen. 1985. Toward a consensual structure of mood. Psychological Bulletin 98, 2 (1985), 219–235.
[5] Robert Jenke, Angelika Peer, and Martin Buss. 2014. Feature Extraction and Selection for Emotion Recognition from EEG. IEEE Trans. on Affective Computing 5, 3 (2014), 327–339.
[6] Sander Koelstra et al. 2011. DEAP: A Database for Emotion Analysis Using Physiological Signals. IEEE Trans. Affective Computing 3, 1 (2011), 18–31.
[7] Zhenqi Li, Xiang Tian, Lin Shu, Xiangmin Xu, and Bin Hu. 2017. Emotion Recognition from EEG using RASM and LSTM. In International Conference on Internet Multimedia Computing and Service. Springer, 310–318.
[8] Yisi Liu and Olga Sourina. 2014. Real-Time Subject-Dependent EEG-Based Emotion Recognition Algorithm. Springer Berlin Heidelberg, Berlin, Heidelberg, 199–223.
[9] Pallavi Pandey and K. R. Seeja. 2019. Subject independent emotion recognition from EEG using VMD and deep learning. Journal of King Saud University - Computer and Information Sciences (2019).
[10] Robert Plutchik. 1991. The Emotions. University Press of America.
[11] James A. Russell. 1980. A circumplex model of affect. Journal of Personality and Social Psychology 39, 6 (1980), 1161–1178.
[12] Itsara Wichakam and Peerapon Vateekul. 2014. An evaluation of feature extraction in EEG-based emotion prediction with support vector machines. In Proc. IEEE Int. Joint Conference on Computer Science and Software Engineering. IEEE, 106–110.
[13] Xiaofen Xing, Zhenqi Li, Tianyuan Xu, Lin Shu, Bin Hu, and Xiangmin Xu. 2019. SAE+LSTM: A New Framework for Emotion Recognition From Multi-Channel EEG. Frontiers in Neurorobotics 13 (2019), 37.
[14] Yilong Yang, Qingfeng Wu, Ming Qiu, Yingdong Wang, and Xiaowei Chen. 2018. Emotion Recognition from Multi-Channel EEG through Parallel Convolutional Recurrent Neural Network. In Proc. IEEE Int. Joint Conference on Neural Networks (2018).