EEG-Based Emotion Recognition by Using Machine Learning and Deep Learning
Abstract—At present, there are many classification methods for emotion recognition. Based on the SEED dataset, this paper explored the recognition performance of three emotion recognition models: support vector machine (SVM), random forest (RF) and convolutional neural network (CNN). Firstly, we compared the accuracy of five features in the total frequency band: power spectral density (PSD), differential entropy (DE), differential asymmetry (DASM), rational asymmetry (RASM), and the differences between the DE features of 23 pairs of frontal-posterior electrodes (DCAU). The results show that the DE feature is more conducive to emotion recognition. Secondly, we compared the accuracy of the DE feature in five frequency bands (Delta, Theta, Alpha, Beta and Gamma) and verified that the Gamma and Beta bands performed better than the others. Then, we evaluated the recognition accuracy of each channel individually with the RF method and chose four channel combinations of 4, 6, 9 and 12 channels. Finally, we compared the DE features of these four channel combinations with the performance of all 62 channels. The accuracy of the four channel combinations was comparatively stable, and the best recognition rate was 89.20%, which was similar to the recognition accuracy obtained with the original 62 channels.

Keywords-EEG; emotion; support vector machine (SVM); random forest (RF); convolutional neural network (CNN)

I. INTRODUCTION

Emotions play a very important role in people's daily lives and can easily influence people's behavior, judgment and decision-making in interpersonal communication. The electrical activity produced by the cerebral cortex and recorded at the scalp is called the electroencephalogram (EEG). EEG signals record the activity of the cerebral cortex non-invasively and reflect the activity of the brain to a great extent [1]. Traditional emotion recognition is mainly based on the study of facial expressions, body movements and speech. These external cues are easy to disguise and therefore may not reflect a person's real emotions. EEG signals, by contrast, reflect the neurophysiological activity of the brain, which can make up for the shortcomings of traditional research methods.

[...] such as computer games, safe driving assistance systems and auxiliary learning systems. In addition, emotion recognition is promising for the auxiliary diagnosis of various emotional disorders, such as depression, post-traumatic stress disorder and anxiety. For example, Li et al. [3] showed that gamma-band EEG signals are suitable for distinguishing happiness from sadness. Nie et al. [4] showed that the occipital and parietal lobes are the main areas producing emotional EEG signals, and also that the alpha band is mainly generated in the right occipital and parietal regions. Längkvist et al. [4] used a Deep Belief Network (DBN) and a Hidden Markov Model to identify different sleep stages in multi-modal clinical sleep datasets. Li and Lu [5] devised a frequency-band selection method that identifies the more emotion-sensitive bands; their data revealed that the gamma band meets the requirements of EEG-based emotion classification. Their conclusions also indicated that some features are irrelevant or redundant for emotion recognition. However, research on the key channels and frequency bands for EEG emotion recognition is still lacking, and more study is needed.

The key to EEG emotion recognition is selecting the features, frequency bands and channels of the EEG signals. In this paper, based on the SEED dataset, we used three classification models, SVM, RF and CNN, to classify the emotions of 15 subjects from their EEG signals. Firstly, we compared the classification accuracy of five features, PSD, DE, DASM, RASM and DCAU, in five frequency bands and found that the DE feature gave higher accuracy for emotion recognition. After that, we compared the five frequency bands using the DE feature and found that the Beta and Gamma bands performed better. Finally, we focused on the classification results of four channel combinations and showed that the channel combinations we selected achieve comparatively stable performance in all experiments across subjects. These results suggest that, to some extent, specific emotional states can be identified from brain activity.

II. EXPERIMENT
[Table fragment: 2 Flirting Scholar; 2 Lost in Thailand]
Authorized licensed use limited to: Sri Sivasubramanya Nadar College of Engineering. Downloaded on December 21,2023 at 11:40:30 UTC from IEEE Xplore. Restrictions apply.
can be better distinguished due to the higher energy of the low frequencies of EEG data. If a random variable x obeys the Gaussian distribution N(μ, σ²), the DE feature can simply be calculated as:

$$h(X) = -\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} \log\!\left(\frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}\right) dx = \frac{1}{2}\log\left(2\pi e\sigma^{2}\right) \tag{1}$$

For a fixed-length EEG segment, DE is equivalent to the logarithm of the energy spectrum in a given frequency band [9]. Since EEG signals can be divided into five frequency bands, δ: 0.5–4 Hz, θ: 4–8 Hz, α: 8–14 Hz, β: 14–30 Hz and γ: 31–50 Hz, a DE value can be calculated for each frequency band.

Early studies [10] showed that brain activity has a certain asymmetry, that is, its energy manifests differently in the left and right hemispheres and in the frontal and posterior regions. We therefore also calculated the DASM and RASM features, which are, respectively, the difference and the ratio between the DE features of 27 pairs of hemispherically asymmetric channels: FP1, F7, F3, FT7, FC3, T7, P7, C3, TP7, CP3, P3, O1, AF3, F5, P7, FC5, FC1, C5, C1, CP5, CP1, P5, P1, PO7, PO5, PO3 and CB1 of the left hemisphere and FP2, F8, F4, FT8, FC4, T8, P8, C4, TP8, CP4, P4, O2, AF4, F6, F8, FC6, FC2, C6, C2, CP6, CP2, P6, P2, PO8, PO6, PO4 and CB2 of the right hemisphere. DASM is defined as:

$$\mathrm{DASM} = \mathrm{DE}(X_{left}) - \mathrm{DE}(X_{right}) \tag{2}$$

and RASM is defined as:

$$\mathrm{RASM} = \frac{\mathrm{DE}(X_{left})}{\mathrm{DE}(X_{right})} \tag{3}$$

where X_left and X_right represent the pairs of electrodes on the left and right hemispheres. DCAU features characterize the difference between the DE features of 23 pairs of frontal and posterior channels (FC1-CP1, F3-P3, FT8-TP8, FCZ-CPZ, FPZ-OZ, FC6-CP6, FT7-TP7, F6-P6, FC2-CP2, F1-P1, FC5-CP5, F7-P7, F8-P8, F2-P2, FC3-CP3, F4-P4, FP2-O2, FC4-CP4, F5-P5, FZ-PZ, FP1-O1, AF3-CB1 and AF4-CB2). We define DCAU as:

$$\mathrm{DCAU} = \mathrm{DE}(X_{frontal}) - \mathrm{DE}(X_{posterior}) \tag{4}$$

where X_frontal and X_posterior represent the pairs of frontal and posterior channels.

B. Classifier Training

Three classifiers are used and evaluated in this study. First, two traditional machine learning algorithms, SVM and RF, were used for classification; their results were then compared with those of a deep learning model, a CNN.

SVM is a technique for linear classification that finds hyperplanes in an n-dimensional space, where n represents the number of features. The data points of the three main human emotions (positive, negative and neutral) are then classified using the hyperplane with the largest margin. A hyperplane is a linear boundary, and data points on either side of it belong to different categories. The dimension of the hyperplane depends on the number of features; 5 features were used in this study. The points closest to the hyperplane are called support vectors, and they determine the position of the hyperplane. For SVM, the first 60% of the data of an experiment is used as the training set and the remaining 40% as the test set. We use the LIBSVM toolbox in MATLAB to build the SVM classifier with the radial basis function (RBF) kernel, which is defined as:

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right), \quad \gamma > 0 \tag{5}$$

Because the RBF kernel implicitly maps the features into a high-dimensional feature space, the maximum-margin hyperplane is linear in that space, while the resulting decision boundary in the original feature space is nonlinear.

Random forest (RF) is an ensemble learning model that uses multiple decision trees to train on and predict data samples. Breiman [13] first introduced the random forest by combining decision trees with bagging ensemble learning. The RF is generated as follows: (a) Given a total of T samples, a sample set of size N is randomly drawn from them by bootstrap sampling (random sampling with replacement). (b) A decision tree is grown on this sample set as far as possible, without pruning; a positive integer m ≪ M is specified (M is the feature dimension of each sample), and at each split m features are randomly selected from the M features and the optimal split among them is used. (c) Steps (a) and (b) are repeated k times to construct k decision trees, which together constitute the RF. The RF output probability is:

$$p_c = \max_{1 \le i \le I} \frac{1}{k} \sum_{j=1}^{k} p_{ij} \tag{6}$$

where I is the number of classes, k is the number of decision trees, p_ij is the probability of class i given by the j-th decision tree, and p_c is the highest class probability averaged over all decision trees. In this paper, we set the number of trees to 50, 100, 150, 200, 250 and 300 and use the TreeBagger function in MATLAB to implement the RF.

CNN is one of the typical deep learning algorithms [14] and can effectively reduce the complexity of network computation. It is a multilayer feed-forward neural network and has achieved good results on text, speech, image and video data. Through multiple layers of convolution, increasingly abstract signal features are extracted, which strengthens the useful signal features and weakens the noise. In this way, alternating convolutional and pooling layers extract image features well, and the final classification layer achieves good recognition results. In our study, an overlapping sliding window was used to obtain enough data segments to enlarge the dataset, with a window length of 6 s and a step of 2 s. In contrast to hand-crafted feature extraction methods, the CNN approach only requires the EEG signals to be transformed into an image that the CNN can accept, and this image retains the important information of the EEG signals [15]. We
first use the short-time Fourier transform (STFT) to convert the EEG signal into a two-dimensional matrix in image format, which can then be accepted by the CNN. The transformed image size is 15×32×200, in which 15 represents the number of subjects, 32 represents the time, and 200 represents the sampling frequency. For the pre-processed EEG signals, the CNN that extracts emotional features consists of 3 convolutional layers, 3 max-pooling layers, and fully connected layers that map the features to the three emotion labels. Each convolutional layer performs convolution on the local receptive fields of the previous layer. The image is input to the CNN, and the three convolutional layers reduce the input data while extracting the main features of the EEG data. Next, we divided each subject's EEG dataset into a training set, a validation set and a test set in the ratio 6:2:2. The datasets are divided chronologically without overlap. The three convolutional layers use 16×3×1, 32×2×1 and 64×2×1 convolution kernels, respectively, with a stride of 1. After each convolution, a nonlinear activation function is applied, which sets the output of some neurons in the network to 0. This moderate sparsity helps accelerate network convergence and reduces the interdependence of parameters, thereby helping to avoid overfitting and improving the generalization ability of the model. Finally, two fully connected layers map the features to the three output categories. We built the model with torch 1.1.0 in Python 3.7, and it ran successfully on NVIDIA Quadro M4000 GPUs.

IV. RESULTS

Firstly, we compared the performance of the different features; the results are shown in Table II. For the CNN, the DE feature over the total frequency band achieved the highest classification accuracy, 85.27%. The same held for SVM and RF: the accuracy of the DE feature was the highest in the total frequency band. These results show that, compared with the other features, the DE feature performs best for EEG emotion recognition. The asymmetry features (DASM, RASM and DCAU) are only slightly less accurate than DE and almost equivalent to it, which suggests that brain activity is indeed asymmetric (its energy manifests differently in the left and right hemispheres and in the frontal and posterior regions) during emotional processing.

TABLE II. THE MEAN ACCURACY AND STANDARD DEVIATION (%) OF PSD, DCAU, DASM, RASM AND DE FEATURES OF THE THREE CLASSIFIERS IN TOTAL BANDS

Secondly, we compared the accuracy of the DE feature in the five frequency bands. Fig. 4 plots the average accuracy of the three classifiers in each of the five frequency bands. It shows that the information useful for recognizing emotions from EEG signals is mainly distributed in the Gamma and Beta bands, indicating that emotion-related brain activity is more strongly correlated with Beta and Gamma oscillations than with the other frequency bands.

Figure 4: The average accuracy of the three different classifiers in five frequency bands for all subjects.

TABLE III. FOUR DIFFERENT CHANNEL COMBINATIONS

Finally, we studied whether the recognition of the three emotions can be reduced to a smaller channel combination, thereby significantly improving performance. We evaluated the recognition accuracy of each channel individually with the RF method and, based on these results, designed four channel combinations, which are listed in Table III. We then extracted the DE features of these four channel combinations and compared their performance with that of all 62 channels. Table IV gives the average accuracy of the three classification methods for the different channel combinations. For the 4-channel combination, the DE feature over the total frequency band reached a relatively high and stable accuracy of 89.62%. Using only 4 channels, we achieved a best average accuracy of 88.21%, slightly lower than the 89.62% accuracy obtained with all 62 channels. This is similar to the experimental findings of B. Lu et al. [9]. More importantly, these four channels are located along the temporal border of the head, where electrodes are very easy to place. These findings provide an important basis for developing wearable EEG devices for emotion recognition in the real world.
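The per-channel traversal described above (score every channel on its own, rank the channels, keep the best 4, 6, 9 or 12) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the paper used RF (MATLAB TreeBagger) as the per-channel scorer, whereas here a nearest-centroid classifier is swapped in so the example needs only NumPy; the 60/40 chronological split is borrowed from the SVM setup; the synthetic data and the names `rank_channels` and `channel_combinations` are hypothetical.

```python
import numpy as np


def nearest_centroid_accuracy(x_train, y_train, x_test, y_test):
    """Test accuracy of a nearest-centroid classifier.

    Assumption: stand-in for the RF used in the paper, chosen only to keep
    this sketch dependency-free.
    """
    classes = np.unique(y_train)
    # One mean feature vector per emotion class.
    centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in classes])
    # Euclidean distance from every test sample to every class centroid.
    dists = np.linalg.norm(x_test[:, None, :] - centroids[None, :, :], axis=2)
    y_pred = classes[np.argmin(dists, axis=1)]
    return float(np.mean(y_pred == y_test))


def rank_channels(de, labels, train_frac=0.6, scorer=nearest_centroid_accuracy):
    """Score each channel on its own and return channel indices best-first.

    de:     DE features, shape (n_samples, n_channels, n_bands).
    labels: emotion labels, shape (n_samples,), in chronological order.
    The first `train_frac` of the samples is used for training (assumed
    60/40 split, mirroring the SVM setup).
    """
    n_train = int(len(labels) * train_frac)
    scores = np.empty(de.shape[1])
    for ch in range(de.shape[1]):
        x = de[:, ch, :]  # (n_samples, n_bands) features of one channel
        scores[ch] = scorer(x[:n_train], labels[:n_train],
                            x[n_train:], labels[n_train:])
    order = np.argsort(-scores, kind="stable")
    return order, scores


def channel_combinations(order, sizes=(4, 6, 9, 12)):
    """Top-k channel index sets for each combination size."""
    return {k: sorted(order[:k].tolist()) for k in sizes}


if __name__ == "__main__":
    # Hypothetical synthetic data: 62 channels x 5 bands; only channels
    # 10, 20, 30 and 40 carry label information.
    rng = np.random.default_rng(0)
    labels = np.tile(np.arange(3), 100)
    de = rng.normal(size=(labels.size, 62, 5))
    for ch in (10, 20, 30, 40):
        de[:, ch, :] += 2.0 * labels[:, None]
    order, scores = rank_channels(de, labels)
    print(channel_combinations(order)[4])  # expected: [10, 20, 30, 40]
```

Any function with the same `(x_train, y_train, x_test, y_test) -> accuracy` signature can be passed as `scorer`, so an RF-based scorer could replace the nearest-centroid stand-in without changing the ranking logic.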
TABLE IV. THE AVERAGE ACCURACY AND STANDARD DEVIATION (%) FOR DIFFERENT CLASSIFIERS IN FIVE DIFFERENT CHANNEL COMBINATIONS IN TOTAL FREQUENCY BANDS
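A Gaussian signal with variance σ² has DE equal to ½·log(2πeσ²), so the per-band DE feature described earlier can be estimated from the variance of the band-limited segment. The sketch below assumes a 200 Hz sampling rate (the value given in the CNN description) and uses a crude FFT mask as the band-pass filter; the filtering actually used by the authors is not described in this section, and the function names are hypothetical.

```python
import numpy as np

# Frequency bands (Hz) as listed in the text.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 14.0),
    "beta": (14.0, 30.0),
    "gamma": (31.0, 50.0),
}


def gaussian_de(x):
    """DE of x assuming x ~ N(mu, sigma^2): 0.5 * log(2 * pi * e * sigma^2)."""
    return 0.5 * np.log(2.0 * np.pi * np.e * np.var(x))


def band_limit(x, fs, lo, hi):
    """Crude FFT band-pass (assumption): zero every bin outside [lo, hi) Hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=x.size)


def de_features(x, fs=200.0):
    """Map band name -> DE of one channel's 1-D segment x."""
    return {name: gaussian_de(band_limit(x, fs, lo, hi))
            for name, (lo, hi) in BANDS.items()}


if __name__ == "__main__":
    fs = 200.0
    t = np.arange(800) / fs  # hypothetical 4 s segment
    rng = np.random.default_rng(1)
    x = np.sin(2 * np.pi * 10.0 * t) + 0.1 * rng.normal(size=t.size)
    feats = de_features(x, fs)
    print(max(feats, key=feats.get))  # a 10 Hz rhythm dominates: alpha
```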
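The asymmetry features described earlier reduce to arithmetic on per-channel DE values: DASM is left minus right, RASM is left divided by right, and DCAU is frontal minus posterior. The sketch below takes the DE values of one frequency band as a dictionary keyed by electrode name; the DE numbers are hypothetical, and only a few of the electrode pairs listed in the text are used.

```python
def asymmetry_features(de, lr_pairs, fp_pairs):
    """DASM, RASM and DCAU from per-channel DE values of one frequency band.

    de:       dict mapping electrode name -> DE value
    lr_pairs: (left, right) electrode pairs, e.g. ("FP1", "FP2")
    fp_pairs: (frontal, posterior) electrode pairs, e.g. ("FZ", "PZ")
    """
    dasm = [de[left] - de[right] for left, right in lr_pairs]  # difference
    rasm = [de[left] / de[right] for left, right in lr_pairs]  # ratio
    dcau = [de[front] - de[back] for front, back in fp_pairs]  # frontal - posterior
    return dasm, rasm, dcau


if __name__ == "__main__":
    # Hypothetical DE values for a few electrodes named in the text.
    de = {"FP1": 2.0, "FP2": 1.0, "F7": 3.0, "F8": 1.5, "FZ": 4.0, "PZ": 2.5}
    print(asymmetry_features(de, [("FP1", "FP2"), ("F7", "F8")], [("FZ", "PZ")]))
    # ([1.0, 1.5], [2.0, 2.0], [1.5])
```

Since DE is negative whenever σ² < 1/(2πe) and can be close to zero, RASM values can become very large or change sign; the difference-based DASM and DCAU do not have this issue.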
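The SVM's RBF kernel, K(x_i, x_j) = exp(−γ‖x_i − x_j‖²) in the form LIBSVM uses, can be computed directly as a Gram matrix, and the chronological 60/40 split used for SVM is a simple slice. This sketch does not call LIBSVM; the γ value in the example is arbitrary, since the γ used by the authors is not reported here, and the helper names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||a[i] - b[j]||^2).

    a: (n, d) array, b: (m, d) array, gamma > 0.
    """
    sq_dist = (np.sum(a ** 2, axis=1)[:, None]
               + np.sum(b ** 2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def chronological_split(x, y, train_frac=0.6):
    """First 60% of an experiment's samples for training, the rest for testing."""
    n_train = int(len(y) * train_frac)
    return x[:n_train], y[:n_train], x[n_train:], y[n_train:]


if __name__ == "__main__":
    pts = np.array([[0.0, 0.0], [3.0, 4.0]])  # points at distance 5
    print(rbf_kernel(pts, pts, gamma=0.1))    # off-diagonal = exp(-2.5)
```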
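The RF steps (a) and (b) and the probability combination of Eq. (6) can be illustrated in isolation. This is not a full random forest (no trees are grown) and not MATLAB's TreeBagger; it only shows bootstrap sampling, the random choice of m candidate features at a split, and averaging per-tree class probabilities before taking the maximum. The helper names and example probabilities are hypothetical.

```python
import numpy as np


def bootstrap_indices(n_total, n_draw, rng):
    """Step (a): draw n_draw sample indices out of n_total, with replacement."""
    return rng.integers(0, n_total, size=n_draw)


def split_candidates(n_features, m, rng):
    """Step (b): the m << M distinct candidate features examined at one split."""
    return rng.choice(n_features, size=m, replace=False)


def forest_probability(tree_probs):
    """Eq. (6): average per-tree class probabilities, then keep the largest.

    tree_probs: (k, I) array; row j holds tree j's probability for each class.
    Returns (winning class index c, p_c).
    """
    mean_probs = np.asarray(tree_probs, dtype=float).mean(axis=0)
    c = int(np.argmax(mean_probs))
    return c, float(mean_probs[c])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(bootstrap_indices(10, 10, rng))  # indices may repeat
    print(split_candidates(5, 2, rng))     # 2 distinct features out of 5
    probs = [[0.6, 0.3, 0.1],              # tree 1
             [0.2, 0.7, 0.1],              # tree 2
             [0.5, 0.4, 0.1]]              # tree 3
    print(forest_probability(probs))       # class 1, p_c = 1.4 / 3
```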
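The overlapping sliding-window segmentation used to enlarge the CNN dataset (6 s window, 2 s step) can be sketched as below. The 200 Hz sampling rate is taken from the CNN description; the (channels, samples) array layout and the 20 s trial length in the example are assumptions.

```python
import numpy as np


def sliding_windows(x, fs=200, win_s=6.0, step_s=2.0):
    """Cut a (channels, samples) recording into overlapping windows.

    Returns an array of shape (n_windows, channels, win_len); with the
    defaults, consecutive windows overlap by 4 s.
    """
    win = int(round(win_s * fs))
    step = int(round(step_s * fs))
    n = x.shape[-1]
    if n < win:
        # Recording shorter than one window: no segments.
        return np.empty((0,) + x.shape[:-1] + (win,), dtype=x.dtype)
    starts = range(0, n - win + 1, step)
    return np.stack([x[..., s:s + win] for s in starts])


if __name__ == "__main__":
    eeg = np.zeros((62, 20 * 200))     # hypothetical 20 s, 62-channel trial
    print(sliding_windows(eeg).shape)  # (8, 62, 1200)
```

A 20 s trial yields (4000 − 1200) / 400 + 1 = 8 windows, so the overlap multiplies the number of training segments roughly threefold compared with non-overlapping 6 s windows.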