
2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)

EEG-Based Emotion Recognition by Using Machine Learning and Deep Learning

DOI: 10.1109/CISP-BMEI56279.2022.9979849

Wenhui Tong, Li Yang, Yingmei Qin, Yanqiu Che, Chunxiao Han*


Tianjin Key Laboratory of Information Sensing & Intelligent Control, School of Automation and Electrical Engineering,
Tianjin University of Technology and Education,
Tianjin 300222, China
E-mail: [email protected]

Abstract—At present, there are many classification methods for emotion recognition. Based on the SEED dataset, this paper explores the recognition performance of three emotion recognition models: support vector machine (SVM), random forest (RF) and convolutional neural network (CNN). Firstly, we compared the accuracy of five features: power spectral density (PSD), differential entropy (DE), differential asymmetry (DASM), rational asymmetry (RASM), and the differences between DE features of 23 pairs of frontal-posterior electrodes (DCAU), over the total frequency band. The results show that the DE feature is the most conducive to emotion recognition. Secondly, we compared the accuracy of the DE feature in five frequency bands (Delta, Theta, Alpha, Beta and Gamma) and verified that the Gamma and Beta bands perform better than the others. We then traversed the recognition accuracy of each channel with the RF method and chose four channel combinations of 4, 6, 9 and 12 channels. Finally, we compared the DE features of these four combinations with the performance of all 62 channels: the accuracy of the four combinations was comparatively steady, and the best recognition rate was 89.20%, close to the recognition accuracy of the original 62 channels.

Keywords-EEG; emotion; support vector machine (SVM); random forest (RF); convolutional neural network (CNN)

I. INTRODUCTION

Emotions play a very important role in people's daily life and can easily influence people's behavior, judgment and decision-making in interpersonal communication. The neural activity produced by the cerebral cortex is recorded as electroencephalogram (EEG) signals. EEG signals record the activity of the cerebral cortex in a non-invasive manner and reflect the activity of the brain to a great extent[1]. Traditional emotion recognition is mainly based on the study of facial features, body movements and speech. These external features are easy to disguise and cannot reflect real emotions. EEG signals, by contrast, reflect the neurophysiological activities of the brain and can make up for the shortcomings of the traditional research methods.

Therefore, EEG-based emotion recognition has very broad application scenarios. In human-computer interaction scenarios[2], people can obtain various enhanced experiences, such as computer games, safe driving assistance systems, and auxiliary learning systems. In addition, emotion recognition is also promising in the auxiliary diagnosis of various emotional disorders, such as depression, post-traumatic stress disorder and anxiety. For example, the study of Li et al. [5] showed that the gamma band of the EEG signal is appropriate for distinguishing happiness from sadness. Nie et al. [4] proved that the occipital lobe and parietal lobe are the main areas of emotional EEG signals; they also showed that the alpha band response was mainly generated in the right occipital and parietal regions. Längkvist et al. [3] used a Deep Belief Network (DBN) and a Hidden Markov Model to recognize different stages of sleep on multi-modal clinical sleep datasets. A method for frequency band research was devised by Li and Lu [5] which can select the more sensitive frequency bands; their data revealed that the gamma band meets the requirements of EEG-based sentiment classification. Their conclusions also noted that some features are unrelated to emotion recognition or superfluous. However, research on the key channels and frequency bands of EEG emotion recognition is still lacking, and more study is needed.

The key to EEG emotion recognition is to select the right features, frequency bands and channels of the EEG signals. In this paper, based on the SEED dataset, we used three classification models, SVM, RF and CNN, to classify the emotions of 15 subjects from their EEG signals. Firstly, we compared the classification accuracy of the five features PSD, DE, DASM, RASM and DCAU in five different frequency bands, and found that the DE feature had higher accuracy for emotion recognition. After that, we compared the five frequency bands using the DE feature, and the Beta and Gamma bands showed better performance. Finally, we focused on the classification results of four different channel combinations. Based on the combinations of channels we picked, comparatively steady performance can be achieved in all experiments across different subjects. These experimental results suggest that, to some extent, brain activity can identify specific emotional states.
This work is supported by the Natural Science Foundation of Tianjin, China (Grant No. 18JCYBJC88200), the National Natural Science Foundation of China (Grant No. 62103301) and the Fund of Scientific Research Project of Tianjin Education Commission (Grant No. 2020KJ119).



II. EXPERIMENT

A. Dataset

The SEED dataset[4] is a public emotion recognition dataset provided by the BCMI Laboratory of Shanghai Jiaotong University, which contains EEG signals induced by subjects watching different movie clips. Fifteen Chinese movie clips (covering positive, neutral and negative emotions) were used as stimulus materials. Several points were kept in mind in the experimental design and the selection of movie clips: (a) the whole experimental session was kept to about 30 minutes to prevent the subjects from feeling tired; (b) the videos were easy to understand and did not require explanation; (c) the target emotion triggered by each video was as single as possible. Each movie clip was carefully selected to elicit a coherent emotional stimulus and to maximize the intended emotion. Table I shows details of the movies utilized in the experiment, where 0 represents negative emotions, 1 represents neutral emotions, and 2 represents positive emotions.

TABLE I. LABELS AND VIDEOS UTILIZED IN THE EXPERIMENT

Label   Video source
0       Tangshan Earthquake
0       1942
1       World Heritage in China
2       Flirting Scholar
2       Lost in Thailand

Each subject watches all 15 movie clips in a session, so each session consists of 15 trials. There is a 5 s prompt before each movie clip starts, and then the clip is played for about 4 minutes. To obtain timely feedback, the subject is required to fill out and report a self-assessment of the emotional reaction immediately after watching each clip. The self-assessment time is 45 s, followed by a 15 s rest. Two movie clips of the same emotion are never displayed successively. The experimental paradigm is shown in Fig 1.

Figure 1. Protocol of the EEG experiment
B. Subjects

In the SEED dataset[7], 15 subjects (7 males, 8 females, 19-28 years old, mean age 23.27 years, variance 2.37) were recruited to participate in the experiment. All participants were students from Shanghai Jiaotong University who reported normal vision and hearing. Participants sat comfortably during the experiment and avoided body movement in order to concentrate on the movie clips, and the surrounding environment was kept quiet to avoid affecting their attention while their EEG data were recorded. The experimental environment is shown in Fig 2[8]. The ESI NeuroScan system was used to record the EEG signal at a sampling rate of 1000 Hz through an active AgCl electrode cap whose 62 channels are arranged in accordance with the international 10-20 system, as shown in Fig 3.

Figure 2. The experimental environment[8]

Figure 3. 62 channel electrode distribution map

C. Data Preprocessing

The original EEG data was down-sampled to 200 Hz, a 0.5~50 Hz band-pass filter was applied, and the EEG fragments corresponding to each movie were extracted. There were three sessions and each session has 15 trials, so there were a total of 45 .mat files, each containing the corresponding emotional labels (the positive label is +1, the neutral label is 0, and the negative label is -1).
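As a rough illustration of this preprocessing pipeline, the sketch below band-pass filters and down-samples a raw recording with SciPy. The file name and the variable layout inside the .mat file are assumptions for illustration; the paper does not specify them.

```python
import numpy as np
from scipy.io import loadmat
from scipy.signal import butter, filtfilt, decimate

FS_RAW = 1000   # original sampling rate (Hz), per the paper
FS_OUT = 200    # target rate after down-sampling (Hz)

def preprocess(eeg, fs_raw=FS_RAW, fs_out=FS_OUT, band=(0.5, 50.0)):
    """Band-pass filter 0.5-50 Hz, then down-sample to 200 Hz.
    eeg: array of shape (n_channels, n_samples)."""
    b, a = butter(4, [band[0] / (fs_raw / 2), band[1] / (fs_raw / 2)], btype="band")
    filtered = filtfilt(b, a, eeg, axis=-1)   # zero-phase filtering
    factor = fs_raw // fs_out                 # 1000 Hz -> 200 Hz: factor 5
    return decimate(filtered, factor, axis=-1, zero_phase=True)

# Hypothetical usage: the file name and the key "eeg" are assumptions.
mat = loadmat("subject01_session1.mat")
trial = preprocess(np.asarray(mat["eeg"], dtype=float))
```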
III. METHODS

A. Feature Extraction

In this study, frequency-domain characteristics and their combinations were used. The frequency-domain characteristics were calculated using a 512-point short-time Fourier transform (STFT) with a non-overlapping 1 s Hanning window. Five features (PSD, DE, DASM, RASM, and DCAU) were compared. PSD characterizes the features extracted from the recorded EEG data and can be used for emotion analysis. DE[9] is a valid feature that extends the entropy concept from the time domain into the frequency domain; the complexity of a continuous random variable can be represented by Shannon entropy [10].
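To make the STFT-based feature computation concrete, here is a minimal sketch of per-window PSD extraction with SciPy, assuming a 200 Hz single-channel signal. The beta-band slice and the simple bin averaging are illustrative choices, not the authors' exact pipeline.

```python
import numpy as np
from scipy.signal import stft

def psd_features(x, fs=200):
    """PSD of each non-overlapping 1 s Hanning window of one channel,
    via a 512-point STFT as described in the text."""
    f, t, Z = stft(x, fs=fs, window="hann", nperseg=fs, noverlap=0, nfft=512)
    psd = np.abs(Z) ** 2                 # power in each frequency bin
    band = (f >= 14) & (f < 30)          # e.g. the beta band
    return psd[band].mean(axis=0)        # one beta-PSD value per window

x = np.random.randn(200 * 60)            # placeholder: 60 s of one channel
print(psd_features(x).shape)
```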

For the first time, Duan et al. [7] applied the DE feature to identify emotions; it discriminates emotions well because of the higher energy of the low frequencies of EEG data. If a random variable x obeys the Gaussian distribution N(μ, σ²), the DE feature can simply be calculated as:

h(X) = -\int f(x) \log f(x)\,dx = \frac{1}{2}\log(2\pi e\sigma^2)    (1)

For a fixed-length EEG segment, DE is equivalent to the logarithm energy spectrum in a certain frequency band[9]. Since EEG signals can be divided into five frequency bands (δ: 0.5~4 Hz, θ: 4~8 Hz, α: 8~14 Hz, β: 14~30 Hz, γ: 31~50 Hz), a DE value can be calculated for each frequency band.
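To make the band-wise DE computation concrete, here is a minimal sketch assuming a preprocessed 200 Hz signal: each 1 s segment is band-filtered, its variance is taken as the Gaussian σ², and Eq. (1) gives the DE value. This illustrates the formula, not the authors' exact code.

```python
import numpy as np
from scipy.signal import butter, filtfilt

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 14),
         "beta": (14, 30), "gamma": (31, 50)}

def de_per_band(x, fs=200):
    """Differential entropy of a 1 s EEG segment in each band, using
    DE = 0.5 * log(2*pi*e*sigma^2) under the Gaussian assumption of Eq. (1)."""
    feats = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        xb = filtfilt(b, a, x)                     # isolate the band
        feats[name] = 0.5 * np.log(2 * np.pi * np.e * np.var(xb))
    return feats

segment = np.random.randn(200)   # placeholder 1 s segment (200 samples at 200 Hz)
print(de_per_band(segment))
```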
Early studies [11] showed that brain activity has a certain asymmetry, that is, its energy manifests differently in the left and right hemispheres and in the frontal and posterior regions. We therefore also calculated the DASM and RASM features, the differences and ratios between the DE features of 27 pairs of hemispherically asymmetric channels: FP1, F7, F3, FT7, FC3, T7, P7, C3, TP7, CP3, P3, O1, AF3, F5, P7, FC5, FC1, C5, C1, CP5, CP1, P5, P1, PO7, PO5, PO3, and CB1 of the left hemisphere, and FP2, F8, F4, FT8, FC4, T8, P8, C4, TP8, CP4, P4, O2, AF4, F6, F8, FC6, FC2, C6, C2, CP6, CP2, P6, P2, PO8, PO6, PO4, and CB2 of the right hemisphere. DASM is defined as:

DASM = DE(X_{left}) - DE(X_{right})    (2)

and RASM is defined as:

RASM = DE(X_{left}) / DE(X_{right})    (3)

where X_{left} and X_{right} represent the pairs of electrodes on the left and right hemisphere. DCAU features characterize the differences between the DE features of 23 pairs of frontal-posterior channels (FC1-CP1, F3-P3, FT8-TP8, FCZ-CPZ, FPZ-OZ, FC6-CP6, FT7-TP7, F6-P6, FC2-CP2, F1-P1, FC5-CP5, F7-P7, F8-P8, F2-P2, FC3-CP3, F4-P4, FP2-O2, FC4-CP4, F5-P5, FZ-PZ, FP1-O1, AF3-CB1 and AF4-CB2). We define DCAU as:

DCAU = DE(X_{frontal}) - DE(X_{posterior})    (4)

where X_{frontal} and X_{posterior} represent the pairs of frontal and posterior channels.
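Given per-channel DE values, the asymmetry features of Eqs. (2)-(4) are simple pairwise differences and ratios. The sketch below assumes a dict mapping channel names to DE values and shows only a few representative pairs; the full pair lists are the ones enumerated above.

```python
# Minimal sketch of DASM/RASM/DCAU from per-channel DE values.
# `de` maps channel name -> DE value for one segment and one band;
# only a small subset of the pairs listed in the text is shown here.
LEFT_RIGHT_PAIRS = [("FP1", "FP2"), ("F7", "F8"), ("T7", "T8")]    # subset of 27 pairs
FRONT_BACK_PAIRS = [("FC1", "CP1"), ("F3", "P3"), ("FT8", "TP8")]  # subset of 23 pairs

def asymmetry_features(de):
    dasm = [de[l] - de[r] for l, r in LEFT_RIGHT_PAIRS]   # Eq. (2)
    rasm = [de[l] / de[r] for l, r in LEFT_RIGHT_PAIRS]   # Eq. (3)
    dcau = [de[f] - de[p] for f, p in FRONT_BACK_PAIRS]   # Eq. (4)
    return dasm, rasm, dcau
```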
B. Classifier Training

Three classifiers are used and evaluated in this study. Two traditional machine learning algorithms, SVM and RF, were used for classification and then compared with a deep learning model, the CNN.
feature extraction methods, CNN only need to transform EEG
SVM is a technique for linear classification that finds hyperplanes in an n-dimensional space, where n represents the number of features. The data points of the three main human emotions (positive, negative and neutral) are then classified using the hyperplane with the largest margin. Hyperplanes are linear boundaries; data points on either side of a hyperplane belong to different categories. The dimension of the hyperplane depends on the number of features, and 5 features were used in this study. The points close to the hyperplane are called support vectors, and they determine the placement of the hyperplane. For SVM, the first 60% of the data of an experiment is used as the training set for the model, and the remaining 40% is used as the testing set. We use the LIBSVM toolbox in MATLAB to build the SVM classifier and adopt the Radial Basis Function kernel, which is defined as:

K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)    (5)
Random Forest is an ensemble learning model that uses multiple decision trees to train on and predict data samples. Breiman [13] first introduced random forests by combining bagging ensemble learning theory with random feature selection. The RF generation process is as follows: (a) assuming that the total sample size is T, a sample set of size N is randomly drawn from the total sample by bootstrap sampling (random sampling with replacement); (b) each tree is grown as deep as possible under the optimal splitting rule, forming a decision tree without pruning, where a positive integer m << M is specified (M is the number of feature dimensions in each sample) and m candidate features are randomly selected from the M features at each split; (c) the above two steps are repeated k times to construct k decision trees, which together constitute the RF. The formula for the RF output probability value is:

p_c = \max_{1 \le i \le I} \frac{1}{k} \sum_{j=1}^{k} p_{ij}    (6)

in which I is the number of classes and k is the number of decision trees; p_c is the highest class probability averaged over all decision trees; p_{ij} is the probability that decision tree j assigns to class i. In this paper, we set the number of trees to 50, 100, 150, 200, 250 and 300, and use the TreeBagger function in MATLAB for the implementation.
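MATLAB's TreeBagger corresponds roughly to scikit-learn's RandomForestClassifier; the following hedged sketch sweeps the tree counts described above. The data and split are the same kind of placeholders as in the SVM sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder DE features and labels, split 60/40 chronologically as above.
X = np.random.randn(1000, 310)
y = np.random.choice([-1, 0, 1], size=1000)
split = int(0.6 * len(X))

# Sweep the forest sizes evaluated in the paper.
for n_trees in (50, 100, 150, 200, 250, 300):
    # max_features="sqrt" draws a random subset of m << M features per split
    rf = RandomForestClassifier(n_estimators=n_trees, max_features="sqrt")
    rf.fit(X[:split], y[:split])
    # predict_proba averages the per-tree class probabilities; taking the
    # most probable class implements the max-average rule of Eq. (6).
    print(n_trees, rf.score(X[split:], y[split:]))
```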
CNN is one of the typical representatives of deep learning algorithms [14] and can effectively reduce the complexity of network computing. It is a multilayer feed-forward neural network that performs well on text, speech, image, video and other data. Through multi-layer convolution, increasingly abstract signal features are extracted, which not only strengthens the informative signal features but also weakens the noise features. In this way, alternating convolutional and pooling layers extract image features well and finally achieve good results in the classification layer. In our study, overlapping sliding-window sampling was used to obtain enough data segments to enlarge the dataset; we set the sliding window to 6 s and the sliding step to 2 s. In contrast to the original feature extraction methods, the CNN only needs the EEG signals transformed into an image it can recognize, which retains only the important information of the EEG signals [15]. We first use the short-time Fourier transform (STFT) to transform the EEG signal into a two-dimensional matrix in image format, which can then be accepted by the CNN. The transformed image size is 15×32×200, in which 15 represents the number of subjects, 32 represents the time, and 200 represents the sampling frequency. For the pre-processed EEG signals, the CNN used to extract emotions includes 3 convolutional layers, 3 max-pooling layers, and fully connected layers which map the features into three labels. The convolutional layer performs convolution operations on the local receptive field of the previous layer. The image was input to the CNN, and to reduce the input data, we utilized the three convolutional layers to extract the main features of the EEG data. Next, we divided each subject's EEG dataset into three subsets with a ratio of 6:2:2, namely the training set, validation set and test set; the datasets were divided chronologically without overlapping. The convolution kernels of the three convolutional layers were 16×3×1, 32×2×1 and 64×2×1, and the stride of the convolution kernel was set to 1. After the convolution operation, a nonlinear factor is added by the activation function, so that the output of some neurons in the network becomes 0. This modest sparsity is conducive to accelerating network convergence and reducing the interdependence of parameters, thereby helping to avoid model overfitting and to improve the generalization ability of the model. Finally, we divided the features into three categories as output with two fully connected layers. We built the model on torch 1.1.0 using Python 3.7 and ran it successfully on NVIDIA Quadro M4000 GPUs.
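The description does not fix every hyperparameter, so the following PyTorch sketch is one plausible reading: three convolution plus max-pooling stages with ReLU activations and two fully connected layers mapping to the three classes. The single-channel 32×200 input, the ReLU choice, the padding and pooling sizes, and the hidden width of 128 are all assumptions.

```python
import torch
import torch.nn as nn

class EmotionCNN(nn.Module):
    """One plausible reading of the described network: three conv + max-pool
    stages followed by two fully connected layers and a 3-way output."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=2, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=2, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.pool = nn.AdaptiveMaxPool2d((4, 4))   # fixed-size output for the FC head
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 4 * 4, 128), nn.ReLU(), nn.Linear(128, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.pool(self.features(x)))

# Hypothetical input: a batch of 8 single-channel time-frequency "images"
# (32 time steps x 200 frequency samples, following the stated image size).
model = EmotionCNN()
logits = model(torch.randn(8, 1, 32, 200))
print(logits.shape)  # torch.Size([8, 3])
```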
IV. RESULTS

Firstly, we compared the performance of the different features; the comparison results are shown in Table II. We can see from the results that for CNN, DE features over the total frequency band reach a superior classification accuracy of 85.27%. For SVM and RF, a similar conclusion was confirmed: the accuracy of the DE feature was the highest over the total frequency band. These results show that, compared with the other features, the DE feature has superior performance for EEG emotion recognition. Although the recognition accuracy of the asymmetric features (DASM, RASM, and DCAU) is slightly lower than that of the DE features, they are almost equivalent in accuracy, indicating that there is indeed an asymmetry in brain activity (its energy manifests differently in the left and right hemispheres and in the frontal and posterior regions) during emotion recognition.

TABLE II. THE MEAN ACCURACY AND STANDARD DEVIATION (%) OF PSD, DCAU, DASM, RASM AND DE FEATURES OF THE THREE CLASSIFIERS IN TOTAL BANDS

Classifier   PSD           DCAU          DASM           RASM           DE
SVM          50/15.93      80.25/7.85    81.02/14.56    80.74/13.29    83.68/10.63
RF           69.44/16.12   70.21/8.24    80.94/14.43    78.01/14.01    82.71/10.79
CNN          66.67/16.75   72.31/8.52    75.63/14.37    81.21/13.97    85.27/10.56

Secondly, we compared the accuracy of the DE feature in the five frequency bands. Fig 4 shows the average accuracy of the three classifiers in each of the five bands. From Fig 4 we can conclude that the information conducive to recognizing emotions from EEG signals is mainly distributed in the Gamma and Beta bands, which indicates that brain activity during emotion processing is more correlated with Beta and Gamma band oscillations than with the other frequency bands.

Figure 4: The average accuracy of the three different classifiers in five frequency bands for all subjects.

Finally, we studied whether the recognition of the three emotions can be reduced to a smaller channel combination, thereby significantly improving practicality. We traversed the recognition accuracy of each channel with the RF method and designed four different channel combinations; Table III shows the four combinations evaluated in this paper. We then extracted the DE features of these combined channels and compared them with the performance of all 62 channels. Table IV shows the average accuracy of the three classification methods for the different channel combinations. With the DE feature over the total frequency bands, the full 62-channel set reaches a relatively high and stable accuracy of 89.62%. Using only 4 channels, we achieved a best average accuracy of 88.21%, only slightly lower than the 89.62% rate with all 62 channels. This is similar to the experimental conclusion of Lu et al. [9]. More importantly, these four channels are located at the temporal border of the human head and are therefore very easy to mount in practice. These conclusions provide an important theoretical basis for the development of wearable EEG devices for emotion recognition in the real world.

TABLE III. FOUR DIFFERENT CHANNEL COMBINATIONS

Channel combination   Channels
4 channels            FT7, FT8, T7, T8
6 channels            FT7, FT8, T7, T8, TP7, TP8
9 channels            FP1, FPZ, FP2, FT7, FT8, T7, T8, TP7, TP8
12 channels           FT7, FT8, T7, T8, C5, C6, TP7, TP8, CP5, CP6, P7, P8
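The per-channel traversal used to derive these combinations can be sketched as a loop that trains the RF on one channel's features at a time and ranks channels by held-out accuracy. This is an illustration of the idea, not the authors' code; the arrays, split and forest size are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rank_channels(X_by_channel, y, split):
    """X_by_channel: array (n_segments, n_channels, n_bands) of DE features.
    Trains one RF per channel and ranks channels by held-out accuracy."""
    scores = []
    for ch in range(X_by_channel.shape[1]):
        Xc = X_by_channel[:, ch, :]
        rf = RandomForestClassifier(n_estimators=100)
        rf.fit(Xc[:split], y[:split])
        scores.append(rf.score(Xc[split:], y[split:]))
    return np.argsort(scores)[::-1]          # channel indices, best first

# Placeholder data: 62 channels x 5 band-wise DE features per segment.
X3d = np.random.randn(600, 62, 5)
y = np.random.choice([-1, 0, 1], size=600)
print(rank_channels(X3d, y, split=int(0.6 * len(X3d)))[:12])
```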

TABLE IV. THE AVERAGE ACCURACY AND STANDARD DEVIATION (%) FOR DIFFERENT CLASSIFIERS IN FIVE DIFFERENT CHANNEL COMBINATIONS IN TOTAL FREQUENCY BANDS

Classifier   4 Channels    6 Channels    9 Channels    12 Channels   62 Channels
SVM          88.21/10.35   88.66/10.25   87.27/10.82   89.20/8.93    89.62/8.63
RF           75.69/11.83   79.32/10.85   81.25/11.24   81.48/9.49    81.93/8.91
CNN          78.21/11.71   80.98/10.79   79.63/11.75   83.92/9.85    85.37/8.94
DBN [8]      86.08/10.92   85.03/9.63    84.02/10.34   86.65/8.62    83.99/8.34

V. CONCLUSION

In this paper, we have combined machine learning and deep learning algorithms to reveal the primary features, frequency bands and channels for recognizing three human emotions (positive, neutral, and negative). Firstly, our experimental results show that, compared to the other four features (PSD, DASM, RASM, DCAU), the DE feature is the most conducive to emotion recognition. Secondly, we showed that the Beta and Gamma bands are the critical frequency bands and that the lateral temporal and prefrontal channels are the critical channels for EEG emotion recognition. Finally, we assembled four channel combinations according to the above conclusions. With these four combinations, we achieved relatively stable accuracies in our study, some even higher than with the full 62 channels.

REFERENCES

[1] Q. Ma and D. Guo, "Research progress of emotional brain mechanism," J. Adv. in Psyc. Sci., vol. 11, pp. 328-333, April 2003.
[2] R. Calvo and S. Mello, "Affective detection: An interdisciplinary review of models, methods, and their applications," IEEE Trans. on Affe. Com., vol. 1, pp. 18-37, July 2010.
[3] M. Längkvist, L. Karlsson, and A. Loutfi, "Sleep stage classification using unsupervised feature learning," Adv. in Arti. Neural. Sys., vol. 9, pp. 5-6, May 2012.
[4] X. Wang, D. Nie, and B. Lu, "Emotional state classification from EEG data using machine learning approach," J. of Neuro. Meth., vol. 129, pp. 94-106, June 2014.
[5] M. Li and B. Lu, "Emotion classification based on gamma-band EEG," Ann. Int. Conf. of the IEEE Eng. in Medi. and Biol. Soc., pp. 1223-1226, April 2009.
[6] K. Li, X. Li, Y. Zhang, and A. Zhang, "Affective state recognition from EEG with deep belief networks," IEEE Int. Conf. on Biol. and Biom. (BIBM), pp. 305-310, Dec 2013.
[7] R. Duan, J. Zhu, and B. Lu, "Differential entropy feature for EEG-based emotion classification," Int. IEEE/EMBS Conf. on Neural Eng. (NER), pp. 81-84, June 2013.
[8] W. Zheng and B. Lu, "Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks," IEEE Trans. on Auto. Men. Dev., vol. 7, pp. 162-175, July 2015.
[9] L. Shi, Y. Jiao, and B. Lu, "Differential entropy feature for EEG-based vigilance estimation," Proc. IEEE 35th Ann. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), pp. 6627-6630, Nov 2013.
[10] J. Gibbs, "Elementary principles in statistical mechanics: developed with especial reference to the rational foundations of thermodynamics," Cambridge, vol. 7, pp. 15-17, July 2010.
[11] R. Davidson and N. Fox, "Asymmetrical brain activity discriminates between positive and negative affective stimuli in human infants," Science, vol. 218, pp. 1235-1237, July 1982.
[12] Y. Lin, Y. Yang, and T. Jung, "Fusion of electroencephalogram dynamics and musical contents for estimating emotional responses in music listening," Front. Neurosci., vol. 8, pp. 1221-1224, June 2014.
[13] L. Breiman, "Random forests," Machine Learning, vol. 45, pp. 5-32, Sep 2001.
[14] J. Li, Z. Zhang, and H. He, "Hierarchical convolutional neural networks for EEG-based emotion recognition," Cogn. Com., vol. 10, pp. 368-380, Dec 2018.
[15] K. He, X. Zhang, and S. Ren, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE Trans. on Pat. Anal. and Mac. Intel., vol. 37, pp. 1904-1916, April 2015.

