Procedia Computer Science 192 (2021) 3423–3431
Abstract
Online learning is growing in various forms, including fully online, hybrid, HyFlex, blended, synchronous, and asynchronous.
Assessing students’ engagement without real contact between teachers and students is becoming a challenge for teachers.
Therefore, this paper focuses on analyzing online lecture videos to detect students’ engagement without relying on data
produced by learning management systems. In this regard, an intelligent application for teachers is developed to understand students’
emotions and detect students’ engagement levels while a lecture is in progress. Real-time and offline lecture videos are analyzed
using computer vision-based methods to extract students’ emotions, as emotions play essential roles in the learning process. Six
basic emotions, namely ‘angry’, ‘disgust’, ‘fear’, ‘happy’, ‘sad’, and ‘surprise’, together with ‘neutral’, are extracted using a pre-trained
Convolutional Neural Network (CNN), and these are used to detect ‘Highly-engaged’, ‘Engaged’, and ‘Disengaged’ students in a
virtual classroom. This educational application is tested on a 28-second lecture video, taken from YouTube, featuring 11
students. The engagement detection results are presented using several visualization methods. Furthermore, this intelligent application,
in real time, is capable of recognizing multiple faces when multiple students share a single camera. In addition, this educational
application could be used to support collaborative learning, problem-based learning, and emotion-based grouping.
© 2021 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of KES International.
Keywords: emotion-aware learning analytics; engagement; intelligent learning system; lecture video analysis; multimodal learning analytics
* Corresponding author: Mohammad Nehal Hasnine. Tel.: +81-(0)42-387-6070; fax: +81-(0)42-387-6085.
E-mail address: [email protected]
1. Introduction
1.1. Link between student emotion and engagement in higher educational contexts
Student engagement can be defined [1] as: “the interaction between the time, effort and other relevant resources
invested by both students and their institutions intended to optimize the student experience and enhance the learning
outcomes and development of students and the performance, and reputation of the institution”. According to Kuh [2],
“engagement premise is straightforward and easily understood: the more students study a subject, the more they know
about it, and the more students practice and get feedback from faculty and staff members on their writing and
collaborative problem solving, the deeper they come to understand what they are learning”.
In higher education, engagement does not represent a single variable. Instead, it can manifest in many distinct ways.
According to Jennifer Fredricks, students’ engagement happens in three ways: emotionally, behaviorally, and cognitively.
Emotional engagement distinguishes interest from boredom; behavioral engagement distinguishes positive behaviors from
those we might associate with acting out; and cognitive engagement distinguishes rote learning from what we might
associate with more profound learning. Besides behavioral, emotional, and cognitive engagement, agentic engagement is
vital in learning. Agentic engagement refers to students’ proactive, constructive contributions to learning activities, such
as making suggestions and offering input. As learning analytics matures, the community continuously monitors
students’ engagement from their digital learning trails, as engagement plays an essential role in the learning process.
In the learning process, engagement is known to be one of the qualitative indicators [3]. It is said that students with
higher engagement often learn more than those whose engagement level is low.
Emotions are also important factors that influence our learning process. Emotion also has a substantial influence
on our memory, reasoning capability, perception, and logical thinking. In education, emotion is essential as it is highly
associated with students’ attention. Scholars are continuously researching different types of emotions. These include
basic emotions such as anger, disgust, fear, happiness, sadness, and surprise [4], and academic emotions such as
happiness, boredom, enjoyment, laziness, sadness, excitement, pride, tiredness, and anger [5]. Emotions may influence students’
engagement. Emotion-based models therefore aim to establish a link between academic emotions and engagement levels
so that a student’s engagement level can be identified from their emotions [6]. It is thus necessary to understand students’
emotions and associate them with various engagement levels for better educational outcomes.
This research aims to understand students’ emotions and engagement while they attend classes online without real
contact with their teachers. We aim to develop an educational application for teachers who conduct lectures using
online tools. Therefore, the key contributions of this paper include:
• The use of emotion recognition in the e-learning platform.
• The design of a model architecture that can understand students' emotions and detect various engagement levels
from those emotions.
• The development of an intelligent educational application based on the model.
• The visualization of results to teachers as an intervention.
2. Literature review
In the multimodal learning analytics literature, there are several applications of automatic emotion extraction and
analysis. As education transitions online, emotions are much discussed in the research community for improving
students' learning. Many scholars have combined multimodal data, including emotions, poses, and gestures, with data
generated by learning management systems to predict students' engagement.
To date, several methods have been employed to detect student engagement, including engagement tracing, log
file analysis, sensor data analysis, and computer vision-based methods. Computer vision-based methods are based on
three sub-categories of modalities, namely facial expression analysis, gesture-and-posture detection, and eye
movement detection [7]. As teaching over web conferencing technologies is becoming popular, computer vision-based
methods could be applied to analyze lecture videos.
A recent study analyzed lecture videos to identify ‘happy human’, ‘happy agent’, ‘bored human’, and ‘bored agent’
[8]. The findings of this study suggest that students are happier and more motivated when they learn from
happy instructors than from bored instructors. A lecture video analysis tool [9] was developed that is capable of analyzing
and digitalizing students' facial expressions. This tool can recognize disgusted, sad, happy, fearful, contemptuous, angry,
and surprised students from their faces. It also provides instant feedback to the lecturer by monitoring the students'
emotions. Another study [10] used eye and head movement data and combined them with facial emotions to generate
a concentration index. This concentration index was then used to classify three types of engagement, namely ‘very
engaged’, ‘nominally engaged’, and ‘not engaged at all’. Another example is the Wits Intelligent Teaching System
[11], which analyzes online video lectures to understand students' engagement. This system considers students sitting in
large lecture venues, where the authors used behavior and posture detection with an AlexNet-based convolutional
neural network.
3. The proposed system
The stakeholders of this educational application are teachers who teach using web conferencing tools such
as Zoom or WebEx. While teaching over web conferencing tools, it is difficult for instructors to understand their
students' attention to and engagement with the content. Therefore, this system is designed to assist instructors in gaining a
better understanding of their students. Figure 1 shows the overall architecture of the model.
Fig. 1. Overall architecture of the model: emotion detection and eye-gaze detection outputs are weighted and combined to classify students as Highly-engaged, Engaged, or Disengaged.
In computer vision and image processing, face recognition is a method used to identify and verify people
through their facial features. In general, a face recognition algorithm captures, analyzes, and compares patterns
based on a person's facial properties such as color, shape, and size. The face detection process detects and locates
human faces in images and videos. Face capture and face matching are, respectively, the processes of
transforming analog information (a face) into a set of digital information (data) based on the person's facial features,
and of verifying whether two faces belong to the same person [12].
In Artificial Intelligence, several Python-based libraries and frameworks could be used to recognize faces, including YOLO and OpenCV.
For this study, OpenCV is used for face recognition as it is an open-source library and is free of cost. OpenCV also
provides ready-to-use methods with advanced capabilities such as face detection, face tracking, and face recognition
[13]. Figure 2 shows the design of the face recognition system.
The face recognition system comprises four steps: initial dataset construction, facial feature
extraction, face embedding, and face matching.
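The paper does not reproduce its detection code. As a minimal sketch, locating faces in a single frame could be done with OpenCV's bundled Haar cascade (the authors do not specify which OpenCV detector they used); the file names below are illustrative.

import cv2

# Load OpenCV's bundled Haar cascade for frontal faces
# (the exact detector used by the authors is not specified in the paper).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    """Return (x, y, w, h) bounding boxes for faces in a BGR frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

if __name__ == "__main__":
    image = cv2.imread("classroom_frame.jpg")  # hypothetical input frame
    for (x, y, w, h) in detect_faces(image):
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("faces_detected.jpg", image)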
Fig. 2. Design of the face recognition system: initial dataset construction, feature extraction, face embedding, and face matching.
• Initial dataset construction: The first step is to construct a database of the known students. This can be done
by inputting each student's name, ID, and photo into the database. The initial dataset is necessary to know the
number of students in the class, to keep a facial description (that is, a photo) of each student, and to identify each student in
the database. (A combined sketch of the recognition steps appears after this list.)
|--Dataset
|----- student_name.jpg
|--------huyen.jpg
|--------thuy.jpg
|--------ho.jpg
• Facial feature extraction and face embedding: The next step is face embedding. This part of the model
detects and locates the face and then embeds the located face. This task is done with deep metric learning using dlib, a
toolkit for face embedding. The dlib library provides the pre-trained model used to generate 128 measurements
for each input face. The network generates nearly the same numbers when looking at two different pictures of the
same person, and different numbers when the two pictures show different people.
[Diagram: the embedding network maps each input face to a vector of 128 measurements.]
• Face matching: When a new student’s information is tested, the matching step checks whether each face in the
input image matches a face in the database. The output of the matching step is the identity of the matched student,
or an indication that the face is unknown.
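The paper states that embedding is done with dlib's deep metric learning but does not include code. The following minimal sketch illustrates the three steps above (building the known-student database from the Dataset folder, computing a 128-measurement embedding per face, and matching new faces) using the face_recognition wrapper around dlib; the tolerance value and helper names are assumptions for illustration.

import os
import face_recognition  # wrapper around dlib's 128-d face embeddings

DATASET_DIR = "Dataset"  # folder of student_name.jpg files, as in the listing above

def build_database(dataset_dir=DATASET_DIR):
    """Steps 1-2: construct the initial dataset and embed each known face."""
    names, encodings = [], []
    for filename in os.listdir(dataset_dir):
        if not filename.lower().endswith(".jpg"):
            continue
        image = face_recognition.load_image_file(os.path.join(dataset_dir, filename))
        faces = face_recognition.face_encodings(image)  # one 128-d vector per face
        if faces:
            names.append(os.path.splitext(filename)[0])  # e.g. "huyen"
            encodings.append(faces[0])
    return names, encodings

def match_faces(rgb_frame, names, known_encodings, tolerance=0.6):
    """Step 3: match every face found in an RGB frame against the database."""
    results = []
    for encoding in face_recognition.face_encodings(rgb_frame):
        hits = face_recognition.compare_faces(known_encodings, encoding, tolerance)
        results.append(names[hits.index(True)] if True in hits else "unknown")
    return results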
At present, our emotion detection model is based on a 2017 research paper by Arriaga et al. [14]. This model
classifies a human face into the following classes: angry, disgust, fear, happy, sad, surprise, and neutral. For this
classification, a 21-layer architecture was trained on the FER-2013 dataset, obtaining an accuracy of 68%.
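The trained network itself is not distributed with this paper. A minimal sketch of applying such a pre-trained FER-2013 model is given below, assuming Keras weights saved as emotion_model.hdf5 (for example, the publicly released weights accompanying [14]) and 48x48 grayscale input; the path and preprocessing details are assumptions.

import cv2
import numpy as np
from tensorflow.keras.models import load_model

# Class order as listed in the paper for the FER-2013-trained CNN.
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

# Hypothetical path to a pre-trained FER-2013 model such as the one from [14].
model = load_model("emotion_model.hdf5", compile=False)

def classify_emotion(face_bgr):
    """Return (label, probability) for a cropped BGR face image."""
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, (48, 48)).astype("float32") / 255.0
    probs = model.predict(gray.reshape(1, 48, 48, 1), verbose=0)[0]
    return EMOTIONS[int(np.argmax(probs))], float(np.max(probs))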
Eye positions and their movements are also detected and analyzed. For this step, we used the eye detection guidelines
provided by Sergio Canu [15] and detected the two eyes separately using the measurements described below.
After that, using the face landmark detection approach [16], our model detects 68 specific face landmarks, and we
assign a specific index to each of the points. The next step is to create two lines: one crossing the eye horizontally and
one crossing the eye vertically. Finally, to detect whether the eyes are open, we calculate the left eye ratio, right eye
ratio, and blinking ratio based on the horizontal and vertical lines. To determine the eye gaze, we calculate
left_gaze_ratio_lr, left_gaze_ratio_up, right_gaze_ratio_lr, and right_gaze_ratio_ud.
Table 1 shows the eye size scores.
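The exact ratio formulas and the values of Table 1 are not reproduced in the extracted text. A minimal sketch of the blinking-ratio part of this step, following the landmark-based approach of the cited tutorial [15], is shown below; the landmark indices and helper names follow common dlib conventions and are illustrative.

import math
import dlib

detector = dlib.get_frontal_face_detector()
# Pre-trained 68-point landmark model distributed with dlib (downloaded separately).
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

LEFT_EYE = [36, 37, 38, 39, 40, 41]   # landmark indices, as in the cited tutorial
RIGHT_EYE = [42, 43, 44, 45, 46, 47]

def eye_blinking_ratio(landmarks, eye_points):
    """Horizontal eye length divided by vertical eye opening; large => eye closed."""
    p = lambda i: (landmarks.part(i).x, landmarks.part(i).y)
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    horizontal = dist(p(eye_points[0]), p(eye_points[3]))
    # Vertical line between midpoints of the upper and lower eyelid landmarks.
    top = ((p(eye_points[1])[0] + p(eye_points[2])[0]) / 2,
           (p(eye_points[1])[1] + p(eye_points[2])[1]) / 2)
    bottom = ((p(eye_points[4])[0] + p(eye_points[5])[0]) / 2,
              (p(eye_points[4])[1] + p(eye_points[5])[1]) / 2)
    vertical = dist(top, bottom) or 1e-6
    return horizontal / vertical

def blinking_ratio(gray_frame, face_rect):
    """Average the two per-eye ratios for one detected face."""
    landmarks = predictor(gray_frame, face_rect)
    left = eye_blinking_ratio(landmarks, LEFT_EYE)
    right = eye_blinking_ratio(landmarks, RIGHT_EYE)
    return (left + right) / 2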
In this paper, engagement detection is based on the concentration index (CI) calculation method proposed by
Sharma et al. in 2019 [10]. For this study, the concentration index is computed from eye gaze and emotion weights.
The emotion weight describes a person's emotional state at a given time; its value varies from 0 to 1. Table 2
shows the emotion weight assigned to each emotion.
The eye gaze weight is calculated based on the weights presented in Table 3.
A student’s engagement is calculated based on the CI. The following table maps CI values to engagement levels.
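Because the values of Tables 2-4 are not reproduced here, the weights and thresholds in the following sketch are placeholders that only illustrate the structure of the CI-based mapping; the actual values come from the paper's tables and from Sharma et al. [10].

# Illustrative placeholder weights; the real values come from Tables 2-4 and [10].
EMOTION_WEIGHT = {"happy": 0.9, "neutral": 0.8, "surprise": 0.6,
                  "sad": 0.4, "fear": 0.3, "angry": 0.25, "disgust": 0.2}

def concentration_index(emotion, gaze_weight):
    """CI is the product of the emotion weight (0-1) and the eye-gaze weight (0-1)."""
    return EMOTION_WEIGHT.get(emotion, 0.0) * gaze_weight

def engagement_level(ci, high=0.65, low=0.25):
    """Map a CI value to one of the three engagement levels (thresholds are assumed)."""
    if ci >= high:
        return "Highly-engaged"
    if ci >= low:
        return "Engaged"
    return "Disengaged"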
4. Result visualization
Figure 4 displays the way our application detects students, recognizes their emotions, and detects their engagement.
To visualize the result, we used a YouTube video of 11 students engaged in an interactive lecture. The
video is 28 seconds long. We input the video file to our application to understand those students’ behaviors while
attending an online lecture. To test the result, our application processed 25 frames per second.
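A minimal sketch of this frame-by-frame processing loop is given below; it reuses the hypothetical helper functions from the earlier sketches (detect_faces, classify_emotion, concentration_index, engagement_level) and simplifies the gaze term.

import cv2

def analyze_lecture_video(path):
    """Iterate over a lecture recording and log engagement per detected face."""
    capture = cv2.VideoCapture(path)           # e.g. the 28-second lecture clip
    fps = capture.get(cv2.CAP_PROP_FPS) or 25.0
    timeline = []                               # (timestamp_seconds, engagement_level)
    frame_index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        for (x, y, w, h) in detect_faces(frame):                  # sketch above
            emotion, _ = classify_emotion(frame[y:y + h, x:x + w])
            ci = concentration_index(emotion, gaze_weight=1.0)    # gaze term simplified
            timeline.append((frame_index / fps, engagement_level(ci)))
        frame_index += 1
    capture.release()
    return timeline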
Once the system finishes processing the video, the application generates three types of engagement visualizations
to support a teacher.
• The first type of visualization is the time series of engagement detection (refer to Fig. 5).
• The second type of visualization is a summarization of the engagement levels (refer to Fig. 6(a)).
• The third type of visualization shows each student’s engagement in the class (refer to Fig. 6(b)).
Fig. 5. Time series of engagement detection for (a) Highly-engaged; (b) Engaged; (c) Disengaged students.
Fig. 6. (a) Summarization of engagement level; (b) Visualization for each student’s engagement in the class.
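The plotting code behind Figures 5 and 6 is not included in the paper. A minimal matplotlib sketch that produces a comparable time series and summary chart from the timeline of the previous sketch might look as follows.

from collections import Counter
import matplotlib.pyplot as plt

def plot_engagement(timeline):
    """Plot detections per engagement level over time plus an overall summary."""
    levels = ["Highly-engaged", "Engaged", "Disengaged"]
    fig, (ax_ts, ax_sum) = plt.subplots(1, 2, figsize=(10, 4))

    # Time series: number of detections of each level per (rounded) second.
    seconds = sorted({round(t) for t, _ in timeline})
    for level in levels:
        counts = [sum(1 for t, lv in timeline if round(t) == s and lv == level)
                  for s in seconds]
        ax_ts.plot(seconds, counts, label=level)
    ax_ts.set_xlabel("Time (s)")
    ax_ts.set_ylabel("Detections")
    ax_ts.legend()

    # Summary: overall share of each engagement level across the whole video.
    totals = Counter(lv for _, lv in timeline)
    ax_sum.bar(levels, [totals.get(lv, 0) for lv in levels])
    ax_sum.set_ylabel("Total detections")

    fig.tight_layout()
    plt.show()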
5. Discussion
Engagement plays an essential role in students’ learning. Since it is dynamic and continually changing, detecting
multiple students’ engagement during a lecture is not an easy task for instructors, especially in online
learning scenarios. Thanks to advances in artificial intelligence and related fields (e.g., deep learning), there
are more possibilities for automatically detecting engagement. There are two main approaches in the literature for
automatic engagement detection: the sensor-data-analysis-based approach and the computer-vision-based approach [7].
This study focused especially on emotional engagement and followed a computer-vision-based
approach. Specifically, students’ facial expressions captured from a web camera were used as a proxy to measure their
approach. Specifically, students’ facial expressions captured from a web camera were used as a proxy to measure their
engagement continually across the lecture. As learning analytics matures, it has shown significant advances in several
areas, from pedagogy to artificial intelligence in education. Without relying on data produced by learning management
systems, it also explores sophisticated technologies, including eye tracking, sentiment analysis, emotion
analysis, and video analysis. As virtual classrooms are becoming popular, lecture video analysis is getting attention in
the community as lecture videos could reveal learners' emotions, sentiments, and engagement. Visual cues such as
eye movements, facial expressions, gestures, and posture can be used to measure one’s engagement. However, facial
expression is the most potent and effective way to assess emotional engagement [18].
As student engagement and emotions play essential roles in the learning process, lecture videos could be analyzed
using computer vision-based methods. This paper proposed an educational application for understanding students'
emotions and engagement by analyzing lecture videos. In order to create an easy-to-interpret metric for course instructors,
we related students’ emotions with their level of engagement. To link students’ emotions to their engagement,
we used the results of the previous study conducted by [10]. Teachers could use this application to understand students'
behavior while teaching classes using online tools such as Zoom or WebEx. The contributions of this work are
designing a model that understands students' emotions and detects various engagement levels, developing an educational
application based on the model, and visualizing results for better educational insights. The developed model
successfully identified multiple students’ emotions from the lecture recording. However, to use the developed model
in real-life situations, external validation is required using sensor data (e.g., EEG and eye-tracker data) or ground
truth data labeled by an expert.
This study has some limitations, criticisms, and several challenges that will be addressed in future work. For
example, the challenges we faced include engagement detection from profile views; occlusion when objects
cover the face; engagement detection under poor lighting; the varied emotions that appear on students’ faces when they
are speaking or engaged in off-task behavior; and the orientation of the camera and of the face, which may affect the
performance of emotion analysis. The proposed model is not validated with actual data; therefore, we aim
to collect students’ data and validate it. Moreover, the result presented in this paper is based on a publicly available
lecture video taken from YouTube. Hence, at this point, we could not ensure that the three types of engagement
detected by the application match the three types of engagement determined by a human observer. To address these limitations,
we aim to conduct a user study in the future.
References
[1] V. Trowler, “Student engagement literature review,” The higher education academy, vol. 11, no. 1, pp. 1–15, 2010.
[2] G. D. Kuh, “The national survey of student engagement: Conceptual and empirical foundations,” New directions for institutional research,
vol. 2009, no. 141, pp. 5–20, 2009.
[3] F. D’Errico, M. Paciello, and L. Cerniglia, “When emotions enhance students’ engagement in e-learning processes,” Journal of e-Learning
and Knowledge Society, vol. 12, no. 4, 2016.
[4] P. E. Griffiths, “Basic emotions, complex emotions, Machiavellian emotions,” 2002.
[5] A. B. Bernardo, J. A. Ouano, and M. G. C. Salanga, “What is an academic emotion? Insights from Filipino bilingual students’ emotion
words associated with learning,” Psychological Studies, vol. 54, no. 1, pp. 28–37, 2009.
[6] K. Altuwairqi, S. K. Jarraya, A. Allinjawi, and M. Hammami, “A new emotion–based affective model to detect student’s engagement,”
Journal of King Saud University-Computer and Information Sciences, 2018.
[7] M. A. A. Dewan, M. Murshed, and F. Lin, “Engagement detection in online learning: a review,” Smart Learning Environments, vol. 6, no.
1, pp. 1–20, 2019.
[8] T. Horovitz and R. E. Mayer, “Learning with human and virtual instructors who display happy or bored emotions in video lectures,”
Computers in Human Behavior, vol. 119, p. 106724, Jun. 2021, doi: 10.1016/j.chb.2021.106724.
[9] G. Tonguç and B. Ozaydın Ozkara, “Automatic recognition of student emotions from facial expressions during a lecture,” Computers &
Education, vol. 148, p. 103797, Apr. 2020, doi: 10.1016/j.compedu.2019.103797.
[10] P. Sharma, S. Joshi, S. Gautam, S. Maharjan, V. Filipe, and M. J. C. S. Reis, “Student Engagement Detection Using Emotion Analysis, Eye
Tracking and Head Movement with Machine Learning,” arXiv:1909.12913 [cs], Dec. 2020, Accessed: Apr. 15, 2021. [Online].
Available: http://arxiv.org/abs/1909.12913
[11] R. Klein and T. Celik, “The Wits Intelligent Teaching System: Detecting student engagement during lectures using convolutional neural
networks,” in 2017 IEEE International Conference on Image Processing (ICIP), Sep. 2017, pp. 2856–2860. doi:
10.1109/ICIP.2017.8296804.
[12] “Facial recognition: top 7 trends (tech, vendors, markets, use cases & latest news),” Thales Group.
http://www.thalesgroup.com/en/markets/digital-identity-and-security/government/biometrics/facial-recognition (accessed Apr. 22, 2021).
[13] M. Khan, S. Chakraborty, R. Astya, and S. Khepra, “Face Detection and Recognition Using OpenCV,” in 2019 International Conference
on Computing, Communication, and Intelligent Systems (ICCCIS), Oct. 2019, pp. 116–119. doi: 10.1109/ICCCIS48478.2019.8974493.
[14] O. Arriaga, M. Valdenegro-Toro, and P. Plöger, “Real-time Convolutional Neural Networks for Emotion and Gender Classification,”
arXiv:1710.07557 [cs], Oct. 2017, Accessed: Apr. 22, 2021. [Online]. Available: http://arxiv.org/abs/1710.07557
[15] S. Canu, “Eye detection - Gaze controlled keyboard with Python and Opencv p.1,” Pysource, Jan. 07, 2019.
https://pysource.com/2019/01/07/eye-detection-gaze-controlled-keyboard-with-python-and-opencv-p-1/ (accessed Apr. 22, 2021).
[16] V. Kazemi and J. Sullivan, “One millisecond face alignment with an ensemble of regression trees,” 2014 IEEE Conference on Computer
Vision and Pattern Recognition, 2014, doi: 10.1109/CVPR.2014.241.
[17] “GitHub - CaedenZ/distractionModel.” https://github.com/CaedenZ/distractionModel (accessed Apr. 30, 2021).
[18] M. Mukhopadhyay, S. Pal, A. Nayyar, P. K. D. Pramanik, N. Dasgupta, and P. Choudhury, “Facial Emotion Detection to Assess Learner’s
State of Mind in an Online Learning System,” in Proceedings of the 2020 5th International Conference on Intelligent Information
Technology, 2020, pp. 107–115.