


Facial Emotion Recognition of Students using Convolutional Neural Network

Imane Lasri, Anouar Riad Solh, Mourad El Belkacemi
Laboratory of Conception and Systems, Faculty of Sciences Rabat, Mohammed V University, Rabat, Morocco
[email protected], [email protected], [email protected]

Abstract— Nowadays, deep learning techniques enjoy great success in various fields, including computer vision. Indeed, a convolutional neural network (CNN) model can be trained to analyze images and identify facial emotion. In this paper, we build a system that recognizes students' emotions from their faces. Our system consists of three phases: face detection using Haar Cascades, normalization, and emotion recognition using a CNN trained on the FER 2013 database with seven types of expressions. The obtained results show that facial emotion recognition is feasible in education; consequently, it can help teachers adapt their presentation to the students' emotions.

Keywords— Student facial expression, Emotion recognition, Convolutional neural networks (CNN), Deep learning, Intelligent classroom management system

I. INTRODUCTION

The face is the most expressive and communicative part of a human being [1]. It is able to transmit many emotions without saying a word. Facial expression recognition identifies emotion from a face image; it is a manifestation of a person's activity and personality. In the 20th century, the American psychologists Ekman and Friesen [2] defined six basic emotions (anger, fear, disgust, sadness, surprise and happiness) that are the same across cultures.

Facial expression recognition has attracted much attention in recent years due to its impact on clinical practice, sociable robotics and education. According to diverse research, emotion plays an important role in education. Currently, a teacher uses exams, questionnaires and observations as sources of feedback, but these classical methods often come with low efficiency. Using students' facial expressions, a teacher can adjust both strategy and instructional materials to help foster student learning.

The purpose of this article is to bring emotion recognition into education by building an automatic system that analyzes students' facial expressions using a Convolutional Neural Network (CNN), a deep learning algorithm widely used in image classification. It consists of a multistage image processing pipeline that extracts feature representations. Our system includes three phases: face detection, normalization, and emotion recognition, where the recognized emotion is one of seven: neutral, anger, fear, sadness, happiness, surprise or disgust.

The rest of this paper is structured as follows: Section 2 reviews the related work. Section 3 describes the proposed system. The implementation details are presented in Section 4, followed by the experimental results and discussion in Section 5. In the last section we conclude this paper with the future extensions of our work.

II. RELATED WORK

Many researchers are interested in improving the learning environment with Face Emotion Recognition (FER). Tang et al. [3] proposed a system able to analyze students' facial expressions in order to evaluate classroom teaching effectiveness. The system is composed of five phases: data acquisition, face detection, face recognition, facial expression recognition and post-processing. The approach uses K-nearest neighbor (KNN) for classification and Uniform Local Gabor Binary Pattern Histogram Sequence (ULGBPHS) for pattern analysis. Savva et al. [4] proposed a web application that analyzes the emotions of students participating in active face-to-face classroom instruction. The application uses webcams installed in classrooms to collect live recordings, to which machine learning algorithms are then applied.

In [5], Whitehill et al. proposed an approach that recognizes engagement from students' facial expressions. The approach uses Gabor features and an SVM classifier to identify engagement as students interact with cognitive skills training software; the labels were obtained from videos annotated by human judges. The authors in [6] then used computer vision and machine learning techniques to identify the affect of students in a school computer laboratory, where the students were interacting with an educational game designed to teach fundamental concepts of classical mechanics.

In [7], the authors proposed a system that identifies and monitors students' emotions and gives feedback in real time in order to improve the e-learning environment and content delivery. The system uses the moving patterns of the eyes and head to deduce relevant information about students' mood in an e-learning environment. Ayvaz et al. [8] developed a Facial Emotion Recognition System (FERS) that recognizes the emotional states and motivation of students in videoconference-type e-learning. The system uses four machine learning algorithms (SVM, KNN, Random Forest and Classification & Regression Trees), and the best accuracy rates were obtained with KNN and SVM. Kim et al. [9] proposed a system capable of producing real-time recommendations to teachers, enhancing the memorability and quality of their lectures by letting them adjust their non-verbal behavior, such as body language and facial expressions, on the fly. The authors in [10] proposed a model that recognizes emotion in a virtual learning environment based on facial emotion recognition, using the Haar Cascades method [14] to locate the mouth and eyes in the JAFFE database images in order to detect emotions. In [11], Chiou et al. used wireless sensor network technology to create an intelligent classroom management system that helps teachers modify instruction modes rapidly to avoid wasting time.
III. PROPOSED APPROACH

In this section, we describe our proposed system for analyzing students' facial expressions with a Convolutional Neural Network (CNN) architecture. First, the system detects faces in the input image; the detected faces are cropped and normalized to a size of 48×48. Then, these face images are used as input to the CNN. Finally, the output is the facial expression recognition result (anger, happiness, sadness, fear, disgust, surprise or neutral). Figure 1 presents the structure of our proposed approach.

Fig. 1. The structure of our facial expression recognition system.

A Convolutional Neural Network (CNN) is a deep artificial neural network that can identify visual patterns in an input image with minimal pre-processing compared to other image classification algorithms. This means that the network learns the filters that were hand-engineered in traditional algorithms [19]. The essential unit inside a CNN layer is the neuron. Neurons are connected so that the outputs of the neurons of one layer become the inputs of the neurons of the next layer. The backpropagation algorithm is used to compute the partial derivatives of the cost function. The term convolution refers to the application of a filter, or kernel, to the input image to produce a feature map. A CNN model contains three types of layers, as shown in Figure 2:

Fig. 2. CNN architecture.

Convolution Layer: the first layer of the network, whose primary purpose is to extract features from the input image. Convolution preserves the spatial relationship between pixels by learning image features over small squares of input data [21]. It performs a dot product between two matrices, where one is the image and the other is a kernel. The convolution formula is given in Equation 1:

$\mathrm{net}(t,f) = (x * w)[t,f] = \sum_{m}\sum_{n} x[m,n]\, w[t-m,\, f-n]$ (1)

where net(t, f) is the output in the next layer, x is the input image, w is the filter matrix and $*$ is the convolution operation. Figure 3 shows how the convolution works.

Fig. 3. Details on Convolution layer [20].
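To make Equation 1 concrete, the following is a minimal NumPy sketch (our own illustration, not the authors' code) of a single-channel valid convolution; the function name conv2d and the toy inputs are our assumptions, and real CNN layers vectorize this computation rather than looping.

```python
import numpy as np

def conv2d(x, w):
    """Minimal 2-D valid convolution in the spirit of Equation 1.

    x is the 2-D input image and w the 2-D kernel. The kernel is
    flipped, per the convolution definition w[t - m, f - n];
    cross-correlation (common in CNN libraries) would skip the flip.
    """
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    w_flipped = w[::-1, ::-1]
    out = np.zeros((oh, ow))
    for t in range(oh):
        for f in range(ow):
            # Dot product between the kernel and one image patch.
            out[t, f] = np.sum(x[t:t + kh, f:f + kw] * w_flipped)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0., -1.]] * 3)   # simple edge-like filter
print(conv2d(image, kernel).shape)       # (3, 3) feature map
```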

Pooling Layer: reduces the dimensionality of each feature map while retaining the most important information [21]. Pooling can be of different types: Max Pooling, Average Pooling and Sum Pooling. The function of pooling is to progressively reduce the spatial size of the input representation and to make the network invariant to small transformations, distortions and translations in the input image [21]. In our work, we took the maximum value of each block as the single output of the pooling layer, as shown in Figure 4.

Fig. 4. Details on Pooling layer [20].
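As a concrete illustration (ours, assuming even input dimensions), the 2×2, stride-2 max pooling used in this work can be written in NumPy as:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D feature map.

    Assumes both dimensions of x are even; each output value is the
    maximum of one non-overlapping 2x2 block.
    """
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 4],
                        [5, 7, 6, 8],
                        [9, 2, 1, 0],
                        [3, 4, 5, 6]])
print(max_pool_2x2(feature_map))
# [[7 8]
#  [9 6]]
```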
Fully Connected Layer: a traditional Multi-Layer Perceptron that uses an activation function in the output layer. The term "fully connected" implies that every neuron in the previous layer is connected to every neuron in the next layer. The purpose of the fully connected layer is to use the output of the convolutional and pooling layers to classify the input image into one of the classes defined by the training dataset. In short, the convolution and pooling layers act as feature extractors from the input image, while the fully connected layer acts as a classifier [21].
IV. IMPLEMENTATION DETAILS

Figure 5 shows our CNN model. It contains 4 convolutional layers and 4 pooling layers to extract features, followed by 2 fully connected layers and a softmax layer with 7 emotion classes. The input is a grayscale face image of size 48×48. For each convolutional layer we used 3×3 filters with stride 2. For the pooling layers, we used max pooling with 2×2 kernels and stride 2. To introduce non-linearity into our model we used the Rectified Linear Unit (ReLU), defined in Equation 2, which is currently the most widely used activation function:

$R(z) = \max(0, z)$ (2)

As shown in Figure 6, R(z) is zero when z is less than zero and R(z) is equal to z when z is greater than or equal to zero. Table I presents the network configuration of our model.

Fig. 5. Our convolutional neural network model.

Fig. 6. ReLU function.

TABLE I. CNN CONFIGURATION

Layer type       Size    Stride
Data             48×48   -
Convolution 1    3×3     2
Max Pooling 1    2×2     2
Convolution 2    3×3     2
Max Pooling 2    2×2     2
Convolution 3    3×3     2
Max Pooling 3    2×2     2
Convolution 4    3×3     2
Max Pooling 4    2×2     2
Fully Connected  -       -
Fully Connected  -       -
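The paper does not list the layer widths, so the following Keras sketch reproduces only the structure of Table I; the filter counts (32 to 256), the dense width (512) and the regularization strength are illustrative assumptions, not the authors' exact values.

```python
from tensorflow.keras import layers, models, regularizers

def build_model(num_classes=7):
    """CNN following Table I: four 3x3 convolutions with stride 2,
    four 2x2 max-pooling layers with stride 2, two fully connected
    layers and a 7-way softmax. Filter counts and the dense width
    are illustrative choices, not values given in the paper."""
    model = models.Sequential()
    model.add(layers.InputLayer(input_shape=(48, 48, 1)))  # grayscale faces
    for filters in (32, 64, 128, 256):                     # assumed widths
        model.add(layers.Conv2D(filters, (3, 3), strides=2, padding='same',
                                kernel_regularizer=regularizers.l2(1e-4)))
        model.add(layers.BatchNormalization())  # used per Section IV.B
        model.add(layers.Activation('relu'))    # Equation 2
        model.add(layers.MaxPooling2D((2, 2), strides=2, padding='same'))
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation='relu'))             # FC layer 1
    model.add(layers.Dense(num_classes, activation='softmax'))  # FC layer 2
    return model

model = build_model()
model.summary()
```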

A. Data acquisition

To train our CNN architecture, we used the FER2013 database [12], samples of which are shown in Figure 7. It was generated using the Google image search API and was presented during the ICML 2013 Challenges. Faces in the database have been automatically normalized to 48×48 pixels. The FER2013 database contains 35887 images (28709 training images, 3589 validation images and 3589 test images) with 7 expression labels. The number of images for each emotion is given in Table II.

Fig. 7. Samples from FER 2013 database.

TABLE II. THE NUMBER OF IMAGES FOR EACH EMOTION OF THE FER 2013 DATABASE

Emotion label   Emotion    Number of images
0               Angry      4953
1               Disgust    547
2               Fear       5121
3               Happy      8989
4               Sad        6077
5               Surprise   4002
6               Neutral    6198

B. CNN Implementation

We used the OpenCV library [16] to capture live frames from a web camera and to detect students' faces with the Haar Cascades method [14], as shown in Figure 8. Haar Cascades uses the AdaBoost learning algorithm invented by Freund and Schapire [15], who won the 2003 Gödel Prize for their work. The AdaBoost learning algorithm selects a small number of significant features from a large set in order to produce an effective cascade of classifiers. We built our Convolutional Neural Network model using the TensorFlow [18] Keras [17] high-level API.

Fig. 8. Face Detection using Haar Cascades.
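As a rough sketch of this detection step (our own reconstruction, not the authors' code), the cascade file below is the stock frontal-face model shipped with OpenCV, and the scaleFactor and minNeighbors values are typical defaults rather than the paper's settings:

```python
import cv2

# Load OpenCV's pre-trained frontal-face Haar cascade.
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)

cap = cv2.VideoCapture(0)  # live frames from the web camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3,
                                          minNeighbors=5)
    for (x, y, w, h) in faces:
        # Crop and normalize each face to 48x48 grayscale for the CNN.
        face = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
    cv2.imshow('faces', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```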

In Keras, we used the ImageDataGenerator class to perform image augmentation, as shown in Figure 9. This class allowed us to transform the training images by rotation, shifts, shear, zoom and flip. The configuration used is: rotation_range=10, width_shift_range=0.1, zoom_range=0.1, height_shift_range=0.1 and horizontal_flip=True.

Fig. 9. Image augmentation using Keras (original and transformed samples).
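The stated configuration maps one-to-one onto the ImageDataGenerator constructor; a minimal sketch of how it is typically wired up, where the variable names and batch size are our assumptions:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation configuration exactly as listed above.
datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1,
                             horizontal_flip=True)

# x_train: float array of shape (num_samples, 48, 48, 1), y_train: one-hot
# labels. datagen.flow() then yields randomly transformed batches on the fly:
#   model.fit(datagen.flow(x_train, y_train, batch_size=64), ...)
```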

Then we defined our CNN model with 4 convolutional layers, 4 pooling layers and 2 fully connected layers. To provide non-linearity in our CNN model we applied the ReLU function; we used batch normalization to normalize the activations of the preceding layer at each batch, and L2 regularization to apply penalties to the model parameters. Finally, we chose softmax as the last activation function: it takes as input a vector z of K real numbers and normalizes it into a probability distribution, $\sigma(z)_j = e^{z_j} / \sum_{k=1}^{K} e^{z_k}$, as shown in Figure 10.

Fig. 10. Softmax function.
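Since Figure 10 is not reproduced here, a short NumPy sketch of that softmax normalization (our own illustration; subtracting the maximum is the standard numerical-stability trick, not something the paper specifies):

```python
import numpy as np

def softmax(z):
    """Normalize a vector of K scores into a probability distribution."""
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1, -1.2, 0.5, 0.0, 1.5])  # 7 emotion logits
probs = softmax(scores)
print(probs.round(3), probs.sum())  # probabilities summing to 1.0
```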


To train our CNN model we split the database into 80% training data and 20% test data, then compiled the model using the Stochastic Gradient Descent (SGD) optimizer. At each epoch, Keras checks whether our model performed better than in the previous epochs; if so, the new best model weights are saved to a file. This allows us to load the weights directly, without retraining, whenever we want to use the model in another situation.
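This checkpointing behavior corresponds to Keras's ModelCheckpoint callback with save_best_only; the sketch below reuses model, datagen, x_train and the test split from the earlier snippets, and the learning rate, batch size, monitored metric and file name are our assumptions rather than the paper's values:

```python
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.optimizers import SGD

# Compile with SGD; the learning rate here is an assumption.
model.compile(optimizer=SGD(learning_rate=0.01),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Save the model only when validation accuracy beats the best epoch so far.
checkpoint = ModelCheckpoint('best_model.h5', monitor='val_accuracy',
                             save_best_only=True)

model.fit(datagen.flow(x_train, y_train, batch_size=64),
          validation_data=(x_test, y_test),
          epochs=110,                 # the paper reports results at epoch 106
          callbacks=[checkpoint])
```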
V. EXPERIMENTAL RESULTS

We trained our Convolutional Neural Network model on the FER 2013 database, which includes seven emotions (happiness, anger, sadness, disgust, neutral, fear and surprise). The detected face images were resized to 48×48 pixels and converted to grayscale before being fed to the CNN model. Nine young master's students from our faculty participated in the experiment, two of them wearing glasses. Figure 11 shows the emotion recognition results for the 9 students. The predicted emotion label is shown in red text, and the red bar represents the probability of the emotion.

We achieved an accuracy rate of 70% at epoch 106. To evaluate the efficiency and quality of our proposed method we computed the confusion matrix, precision, recall and F1-score, shown in Figure 12 and Figure 13, respectively. Our model is very good at predicting happy and surprised faces; however, it predicts fearful faces rather poorly, because it confuses them with sad faces.

Fig. 11. Students' facial emotion recognition results.

Fig. 12. Confusion matrix of the proposed method on FER 2013 database.

Fig. 13. Classification report of the proposed method on FER 2013 database.

VI. CONCLUSION AND FUTURE WORK

In this paper, we presented a Convolutional Neural Network model for students' facial expression recognition. The proposed model includes 4 convolutional layers, 4 max pooling layers and 2 fully connected layers. The system detects faces in students' input images using a Haar-like detector and classifies them into seven facial expressions: surprise, fear, disgust, sad, happy, angry and neutral. The proposed model achieved an accuracy rate of 70% on the FER 2013 database. Our facial expression recognition system can help a teacher gauge students' comprehension of the presentation. In future work we will focus on applying a Convolutional Neural Network model to 3D images of students' faces in order to extract their emotions.

ACKNOWLEDGMENT

We wish to thank the 9 students for their participation in the experiment.
REFERENCES

[1] R. G. Harper, A. N. Wiens, and J. D. Matarazzo, Nonverbal Communication: The State of the Art. New York: Wiley, 1978.
[2] P. Ekman and W. V. Friesen, "Constants across cultures in the face and emotion," Journal of Personality and Social Psychology, vol. 17, no. 2, pp. 124-129, 1971.
[3] C. Tang, P. Xu, Z. Luo, G. Zhao, and T. Zou, "Automatic Facial Expression Analysis of Students in Teaching Environments," in Biometric Recognition, vol. 9428, J. Yang, J. Yang, Z. Sun, S. Shan, W. Zheng, and J. Feng, Eds. Cham: Springer International Publishing, 2015, pp. 439-447.
[4] A. Savva, V. Stylianou, K. Kyriacou, and F. Domenach, "Recognizing student facial expressions: A web application," in 2018 IEEE Global Engineering Education Conference (EDUCON), Tenerife, 2018, pp. 1459-1462.
[5] J. Whitehill, Z. Serpell, Y.-C. Lin, A. Foster, and J. R. Movellan, "The Faces of Engagement: Automatic Recognition of Student Engagement from Facial Expressions," IEEE Transactions on Affective Computing, vol. 5, no. 1, pp. 86-98, Jan. 2014.
[6] N. Bosch, S. D'Mello, R. Baker, J. Ocumpaugh, V. Shute, M. Ventura, L. Wang, and W. Zhao, "Automatic Detection of Learning-Centered Affective States in the Wild," in Proceedings of the 20th International Conference on Intelligent User Interfaces (IUI '15), Atlanta, Georgia, USA, 2015, pp. 379-388.
[7] L. B. Krithika and G. G. Lakshmi Priya, "Student Emotion Recognition System (SERS) for e-learning Improvement Based on Learner Concentration Metric," Procedia Computer Science, vol. 85, pp. 767-776, 2016.
[8] U. Ayvaz, H. Gürüler, and M. O. Devrim, "Use of facial emotion recognition in e-learning systems," Information Technologies and Learning Tools, vol. 60, no. 4, p. 95, Sept. 2017.
[9] Y. Kim, T. Soyata, and R. F. Behnagh, "Towards Emotionally Aware AI Smart Classroom: Current Issues and Directions for Engineering and Education," IEEE Access, vol. 6, pp. 5308-5331, 2018.
[10] D. Yang, A. Alsadoon, P. W. C. Prasad, A. K. Singh, and A. Elchouemi, "An Emotion Recognition Model Based on Facial Recognition in Virtual Learning Environment," Procedia Computer Science, vol. 125, pp. 2-10, 2018.
[11] C.-K. Chiou and J. C. R. Tseng, "An intelligent classroom management system based on wireless sensor networks," in 2015 8th International Conference on Ubi-Media Computing (UMEDIA), Colombo, Sri Lanka, 2015, pp. 44-48.
[12] I. J. Goodfellow et al., "Challenges in Representation Learning: A report on three machine learning contests," arXiv:1307.0414 [cs, stat], July 2013.
[13] A. Fathallah, L. Abdi, and A. Douik, "Facial Expression Recognition via Deep Learning," in 2017 IEEE/ACS 14th International Conference on Computer Systems and Applications (AICCSA), Hammamet, 2017, pp. 745-750.
[14] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Kauai, HI, USA, 2001, vol. 1, pp. I-511-I-518.
[15] Y. Freund and R. E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119-139, Aug. 1997.
[16] OpenCV. opencv.org.
[17] Keras. keras.io.
[18] TensorFlow. tensorflow.org.
[19] aionlinecourse.com/tutorial/machine-learning/convolution-neural-network. Accessed 20 June 2019.
[20] S. Albawi, T. A. Mohammed, and S. Al-Zawi, "Understanding of a convolutional neural network," in 2017 International Conference on Engineering and Technology (ICET), Antalya, 2017, pp. 1-6.
[21] ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/. Accessed 05 July 2019.
