Journal of Integrated Science & Technology

Machine Learning Model for Emotion Detection and Recognition Using an Enhanced Convolutional Neural Network
Sowmya BJ1*, Meeradevi2, Sini Anna Alex3, Anita Kanavalli1, Supreeth S4, Shruthi G4, Rohith S5
1
Department of Artificial Intelligence and Data Science, M S Ramaiah Institute of Technology, 560054, India. 2Department of
Artificial Intelligence and Machine Learning, M S Ramaiah Institute of Technology, 560054, India. 3Department of CSE (AI &
ML), M S Ramaiah Institute of Technology, 560054, India. 4School of Computer Science and Engineering, REVA University,
560064, India. 5Department of ECE, Nagarjuna College of Engineering & Technology, 562110, India.
Received on: 29-Sep-2023, Accepted and Published on: 27-Dec-2023
ABSTRACT
Emotion expression recognition has been a challenging task in recent years due to large intra-class variation and persistent difficulty. Most studies fail on datasets with image variations and partial faces but work best on controlled datasets. Recent work using deep learning models has improved emotion recognition by developing mini-Xception based on Xception and Convolutional Neural Networks (CNN). This system can focus on important parts like the face, performing face recognition and emotion classification simultaneously. A visualization method is used to distinguish between different emotions based on the classifier results. An experimental study on the FER-2013 dataset demonstrated that the mini-Xception algorithm successfully performed all tasks, including emotion recognition and classification, with an accuracy of approximately 95.60%.
Keywords: Emotion detection, Face recognition, Electric signal, Artificial Intelligence System
Journal of Integrated Science and Technology J. Integr. Sci. Technol., 2024, 12(4), 786 Pg 1
Sowmya BJ et. al.
One of the most widely used deep learning techniques for image segmentation, recognition, and classification is the Convolutional Neural Network (CNN).3–8

Among the best-known ways of determining an individual's emotion with deep learning-based algorithms is Mini-Xception, the deep learning-based algorithm developed in this work. The designed model's main objective is to predict emotional states accurately and detect emotions automatically. In this method, tagged facial-expression pictures from the FER file are used to analyze the experimental outcomes. The designed model receives the pictures as input, is trained on them, and then decides which facial expression is present.

Business promotion is a key area where emotion recognition is important. Most businesses rely on customer feedback for all their services and products to stay in business. An artificial intelligence algorithm can determine whether a user likes or dislikes a product or offer by identifying real feelings in a photo or video. This research has shown that the most frequent reason to identify someone is for their safety. It is feasible to use passwords, voice recognition, retina scanning, fingerprint matching, and other methods. To reduce risk, it is essential to ascertain a person's intent. This is useful in high-risk locations where security breaches have recently occurred, such as airlines, concerts, and large public gatherings. The three main components of emotion detection are depicted in Figure 1. The steps are:
• Image pre-processing
• Feature extraction
• Feature grading

Figure 1. Process of identifying human emotion

Fear, disgust, rejection, rage, surprise, sadness, happiness, and neutrality are all human emotions, and they are subtle. It is difficult to identify them because even slight variations in facial muscle contortions produce distinct expressions.9 Furthermore, different individuals may express the same emotion in different ways because emotions are highly context-dependent. Even though attention may be drawn to features other than the face that express emotions, such as the lips and eyes, the question of how to extract and categorize these gestures remains important. These goals were achieved with the help of neural networks and expert systems.

The computational classification and categorization methods have proven very useful. Attributes are the key segments of any expert-system strategy. This work explores how attributes are extracted and updated for algorithms such as Support Vector Machines.1 Comparisons are made between the algorithms and feature extractors used in various articles. Human emotion data may be used to evaluate the strengths and nature of categorization algorithms, and how well they function across various sources of data. Facial recognition techniques are often applied to the video or image frame before extracting features for emotion identification. The emotion detection steps are listed below:
• Pre-processing of data sets
• Face recognition
• Extraction of features
• Classification based on characteristics

Facial Emotion Recognition

FER is typically categorized into four stages. The first stage is identifying a face in an image and drawing a rectangle around it. Finding landmarks inside the face area comes next. The third stage is to extract the spatial and temporal characteristics of the face components. Finally, a Feature Extraction (FE) classifier and attribute extraction are used to generate a recognition result. Facial markers comprise the tip of the nose, the corners of the lips, and the ends of the brows. Features comprise the local texture of a landmark and the alignment of two visible landmarks. The spatial and temporal attributes of a face are extracted using pattern classifiers, and one of the facial classifications is used to determine the expression.

Because they can perform end-to-end learning directly from face images without physics-based face modeling and other preprocessing techniques, DL-based FER solutions significantly reduce the amount of training effort required.10 In a CNN, a feature map is produced by convolutional filtering of the input picture. Following that, fully connected layers are fed the output of the feature extractor, and the classifier assigns the face to a class. The Facial Emotion Recognition 2013 (FER 2013) dataset was utilized to train this design (Fatima et al., 2021). This open-source file was produced for a Kaggle competition and made accessible to the general public. About 35,000 48 × 48 face images with unique mood tags are used. There are five emotions used, including dread, sadness, anger, and neutrality.

Organization of CNN

The basic CNN blueprint has several elements that are simple to comprehend and that relate to the designed CNN model. A basic CNN is made up of input, hidden, and output layers. The input layer is where the data enters the CNN. The data then passes through various hidden layers before reaching the last layer. The output layer represents the prediction made by the network. The output of the network is compared to the real labels to measure the loss or error.

The network's hidden layers are the fundamental building blocks for data transformation. Each layer can be broken into four actions: layer function, pooling, normalization, and activation. The following layers make up the architecture of the CNN, as considered from the viewpoint of this work:
• Convolution layer
• ReLU layer
• Pooling layer
• Fully connected layer
• Softmax
• Batch normalization

The main objectives of the work are:
The objective of emotion recognition is to identify the emotions of a human.
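The detection steps listed above (pre-processing, face recognition, feature extraction, classification) can be sketched as a minimal Python pipeline. This is an illustrative sketch, not the paper's code: `preprocess_face` and `classify_emotion` are hypothetical helpers, and `predict_fn` stands in for a trained CNN's prediction function returning seven class probabilities.

```python
import numpy as np

# Hypothetical label set matching the emotions discussed in the paper.
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def preprocess_face(face, size=(48, 48)):
    """Pre-processing step: normalize pixel intensities to [0, 1]
    and add the batch and channel axes a CNN expects (4D input)."""
    face = face.astype("float32") / 255.0
    return face.reshape(1, size[0], size[1], 1)

def classify_emotion(face, predict_fn):
    """Feature extraction + classification: `predict_fn` stands in for a
    trained CNN and must return 7 class probabilities."""
    probs = predict_fn(preprocess_face(face))
    return EMOTIONS[int(np.argmax(probs))]

# Demo with a stub classifier that always favours class 3 ("happy").
stub = lambda x: np.eye(7)[3]
fake_face = np.zeros((48, 48), dtype="uint8")
print(classify_emotion(fake_face, stub))  # → happy
```

In a full system the face crop would come from a face detector applied to the camera frame before this step.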
The purpose of carrying out this exploration is to accurately classify seven main emotions: happiness, sadness, surprise, anger, disgust, neutrality, and fear.
The purpose of this research is to analyze the outcome of the models in terms of precision for each class.
An emotion can be captured either from a face or from a .csv file.
ML may be used to deliver FER solutions that are cheap, reliable, and computationally efficient.

People often express their emotions on their faces during social encounters to demonstrate their characteristics and feelings. This research's primary objective is to derive the feelings to which pictures with a single facial expression relate. Recognized feelings are specifically split into the categorization of fundamental feelings and the classification of composite feelings because of the difficulty of reading a human face. The key challenge and scope of the work is to classify the seven fundamental feelings: happiness, sadness, surprise, neutrality, disgust, anger, and fear.

LITERATURE REVIEW
Visualization is a technique used to enhance an objective image or extract useful data from it. Nonverbal communication takes the form of facial expressions, and it is important to recognize these emotions on the face. A technology-based monitoring system for elderly people that detects emotions from video images has been proposed, which includes video analysis technology to enable real-time monitoring of elders' living conditions. In the case of an emergency, the system will send a message to their relatives and children. It explains that face identification has been around for centuries, but emotion detection is essential for modern AI systems. It requires a range of algorithms for feature extraction, and expert-system techniques can be used to complete the task. The study examined several machine learning algorithms and feature extraction methods to aid in the more accurate detection of human emotion.

Deep learning is used to recognize human emotions through facial expressions, using the Kaggle FER2013 dataset to experiment with and train a deep convolutional network. This work has been implemented in a real-time system with great success.11 This research used the Haar-Cascade Classifier and CNNs to classify facial emotion. Results showed that the CNN architecture's MSE and accuracy values depend on the number of epochs: as the epoch count increases, the MSE rate decreases and the accuracy value increases. It has been demonstrated that the CNN algorithm is effective in recognizing facial expressions. Mehta et.al.12 discussed automated face recognition due to the growing appeal of biometric frameworks and human–machine interaction.

This study used Gabor filters, histogram-oriented gradients, a local binary pattern, a support vector machine, a random forest, and the nearest-neighbour method to extract features, categorize emotions, and estimate the depth of such feelings in a directory. The results showed that this study can be used for real-time behavioral analysis of eye contact and depth of attention.

The research community has been interested in the capability of human facial emotion recognition (FER). Deep Neural Networks, especially Convolutional Neural Networks, are utilized to collect emotional information from high-resolution images. The application makes use of a Deep CNN model for developing a highly accurate FER system that uses transfer learning techniques. A novel pipeline method was introduced to increase FER accuracy.

Emotion recognition is a hot topic in science, used in robotic vision and correlated with robotic interaction. This paper proposes a real-time approach for implementing emotion detection in the robotic vision function using the MediaPipe face mesh method and Principal Component Analysis.13 Mellouk and Handouzi (2020)22 describe automated emotion recognition based on facial expression, which is used in various areas such as human interaction, wellness, and security. This paper provides an overview of previous efforts on deep learning-based automatic facial emotion recognition. Jaiswal et.al.14 projected a deep architecture based on CNNs for emotion recognition from snapshots, which was evaluated using two datasets: the Facial Emotion Recognition challenge and the Japanese Female Facial Expression dataset. The designed system produced 70.14 and 98.65 percent accuracy, respectively.

Liliana et.al.10 used a deep Convolutional Neural Network to detect facial expressions. They use a regularization technique called "dropout" in the CNN fully connected layers to reduce overfitting. The Extended Cohn-Kanade dataset is used in the research, and the mean confidence rate of the system increased by up to 2%. The basic emotion set has been successfully categorized by the system, showing that it is effective in emotion recognition. Fatima et.al.2 have reported a mini-Xception based on Xception and a CNN to improve emotion expression recognition. They developed a vision system that uses the mini-Xception platform to perform face analysis and emotion categorization. The method can effectively complete all tasks, including emotion detection and classification, with an accuracy of about 95.60%, according to the exploratory investigation on the FER-2013 dataset.

Human emotions are spontaneous mental states of sentiment, and research on facial recognition is challenging. Machine learning and neural networks are used to identify emotions, and features are extracted from images using a CNN. The CNN model is 80% accurate for the four emotions and 72% accurate for the five emotions; CNN Pattern 2 is 79% accurate for the four and 72% accurate for the five.15 Khare et.al.16 used deep learning with Convolutional Neural Networks (CNNs) to provide input images of human facial expressions for pre-trained models trained on datasets. The traditional approach can become corrupted due to changes in lighting and object location, making feature engineering difficult.

Sentiment analysis (SA) is used to assess an author's sentiment17 in a variety of fields, such as forecasting, agriculture, psychology, the judiciary, social media, and the stock market. This work looks into various neural network-based methods for sentiment analysis, including lexicon- and ontology-based SA and machine learning. A customized SVM for machine learning is used to detect 6 distinct feelings using 68-point facial landmarks, video, and machine learning.18 Alhalaseh & Alasasfeh (2020)17 studied how to build an automated system to identify emotions using brain signals. Four algorithms were used to classify emotional states: naïve Bayes, K-Nearest Neighbour (K-NN), CNN, and
Decision Tree (DT). The effectiveness of the suggested methods was evaluated using metrics like accuracy, specificity, and sensitivity.

Badrulhisham et.al.20 reviewed how emotions are expressed through body language, vocal inflection, and facial expression. This study created a mobile sentiment-recognition application that can identify feelings from facial expressions in real time using a Convolutional Neural Network (CNN) and the MobileNet algorithm. The authors in 21 projected a real-time emotion detection approach using CNN and pre-processing techniques to improve model performance. The authors in 22 provide a novel method for detecting facial emotions using convolutional neural networks (FERC) to extract face feature vectors and identify five varieties of normal facial reactions. Supervisory information is obtained using a 10,000-image database, resulting in 96% accuracy. Face emotion detection is used to analyze facial expressions of sadness, happiness, surprise, anger, and fear to determine a woman's emotional state in 23. Machine learning algorithms are used to estimate sentiment using transformed photos.

DESIGN AND IMPLEMENTATION
The design step for developing software involves creating the specifications, goals to achieve, and so on, which are subject to constraints. In the design process, the structure has been set for the units that are integrated together to form the final system. The design description allows programmers to easily program modules and integrate them by following the design as depicted in Figure 2.

Figure 2. The architecture of Designed Model A

The designed architecture consists of four residual depth-wise separable convolutions, with a batch normalization operation and ReLU activation function after each convolution. To provide a prediction, the last layer applies a soft-max activation function and global average pooling.

Figure 3. The architecture of designed Model B

There are above 60,000 parameters in this architecture. Figures 2 and 3 depict the entire architecture, also referred to as mini-Xception. Residual modules modify the desired mapping between two subsequent layers so that the learned features become the difference between the original feature map and the desired features. Consequently, the desired features H(x) are modified to solve an easier learning problem F(x) such that:

H(x) = F(x) + x    (1)

The following is a description of each component:

• Input:
The input layer is the source for the whole CNN. In neural networks used for image processing, it normally constitutes the picture's pixel matrix. The CNN must be provided a 4D array as input. Thus, the input is of the form (batch size, height, width, depth), where the first dimension denotes the batch size and the other three denote the height, breadth, and depth of the image, respectively.

• Pooling:
The pooling layer reduces the spatial size of a convolved feature. The amount of work necessary to examine the data and retrieve the key, rotation- and position-invariant structures is reduced as a result. Pooling can be divided into two types: maximum pooling and average pooling. Average pooling outputs the mean of the matching values, while max pooling outputs the maximum value over the area of the image covered by the kernel.15

• Max Pooling:
The term "max pooling" refers to a pooling operation that selects the maximum element from the feature-map region that the filter covers. The output of the max-pooling layer would thus be a feature map that includes the most noticeable features of the previous feature map. Max pooling downsamples the input along its spatial dimensions by taking the maximum value for each input channel over an input window of the pool size. Each dimension of the window is moved one step at a time. The following is the Keras signature:

tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding="valid", data_format=None, **kwargs)

• Sep-conv2D:
This layer creates a convolution kernel that is convolved with the layer input to produce an output matrix. Convolution matrices or masks are used in image processing to blur, sharpen, emboss, detect edges, and execute other tasks by convolving a kernel with an image. Convolutional Neural Networks (CNNs) may be used to classify data using image frames and learn features. There are several types of CNNs; depth-wise separable CNNs are one type. The effectiveness of depth-wise separable convolutional networks over simple CNNs is discussed here, along with the design and operation of these networks. Suppose that the input data has the
dimensions Df × Df × M, where M is the number of channels (3 for an RGB image) and Df × Df is the spatial size of the image. Suppose there are N filters or kernels of size Dk × Dk × M. If a standard convolution operation is carried out, the output size will be Dp × Dp × N. One convolution operation requires Dk × Dk × M multiplications (the size of one filter). Since there are N filters and each filter slides Dp times vertically and Dp times horizontally, the total number of multiplications for a standard convolution operation is N × Dp² × Dk² × M. Separable convolution consists of two different convolution layers:

• Depth-wise convolutions:
Unlike standard CNNs, where convolution is applied to all M channels at once, the depth-wise operation applies convolution to only one channel at a time. The filters/kernels used here are thus Dk × Dk × 1. Since the input has M channels, M such filters are required. The output will be Dp × Dp × M in size.

• Point-wise convolutions:
Point-wise processing employs a 1 × 1 convolution operation on the M channels; hence the filter size for this operation is 1 × 1 × M. The output size is Dp × Dp × N when N of these filters are used.

• Region Proposal Network (RPN):
An RPN is a fully convolutional network that simultaneously predicts object boundaries and objectness scores at each location. Fast R-CNN leverages the fully trained RPN to produce high-quality region proposals for detection.

• Mini-Xception:
This pre-trained convolution model is known for its cutting-edge performance in several applications, such as object detection and image classification. It was trained on the ImageNet dataset. Following the suggested architecture's training, the trained model was assessed in real time using the pseudocode below:

Step 1: Load the FER-2013 dataset
Step 2: Partition the dataset into training and testing sets
Step 3: Apply the pre-processing techniques
Step 4: Build the model using the Mini-Xception algorithm
Step 5: Give the test data to Mini-Xception for classification
Step 6: Calculate the accuracy of the classification of emotions

• Output:
The receptive field of a convolution combines all of its pixels into a single value. The output of the CNN is also a 4D array. The three picture dimensions may change depending on the filter, kernel size, and padding settings used, but the sample dimension remains the same as the batch size of the input images.

Users can submit the input data to a front end and then click the predict button. Mini-Xception, a deep learning-based algorithm, is used. The machine learning model gets the training data as a csv file. The user's data is processed before it is sent to the model as input. The model presents the output with the different emotions shown to the user. The modules used in the models are as follows:
• Face capturing module
• Pre-processing module
• Training module
• Face recognition module
• Expression recognition module

Face Capturing Module: Users capture individual faces for subsequent processing, using a built-in or an external webcam 10. The procedure cannot be completed without taking the image first; without it, it is impossible to determine the emotions.

Pre-processing Module: This module processes the captured images after the photos are taken. The colour photos are converted to grayscale.

Training Module: A dataset made up of a binary array of the captured images must be prepared in this step. The images will be saved in a .yml file that contains the collected face data. Since the YML file is compressed, the collected photos can be processed more quickly.

Face Recognition Module: Training the host system on the collected facial data is the last stage of the face detection procedure. The subject's face is photographed using the computer system's web camera, which records 60 different images of the subject's face 24. This section examines how to use the LBP algorithm to identify faces; the term is an acronym for "local binary pattern histogram". Using the previously saved NAME and face ID, the faces in the database will be placed.

Figure 4. Designed methodology for facial emotion detection
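The shape and cost arithmetic of the depth-wise separable convolution described earlier in this section can be checked with a small NumPy sketch. This is an illustrative implementation, not the paper's code: `depthwise` applies one Dk × Dk kernel per channel and `pointwise` mixes channels with N kernels of size 1 × 1 × M, with stride 1 and no padding, so Dp = Df − Dk + 1.

```python
import numpy as np

def depthwise(x, filters):
    """x: (Df, Df, M); filters: (Dk, Dk, M), one kernel per channel.
    Returns (Dp, Dp, M) with Dp = Df - Dk + 1 (stride 1, no padding)."""
    Df, _, M = x.shape
    Dk = filters.shape[0]
    Dp = Df - Dk + 1
    out = np.zeros((Dp, Dp, M))
    for i in range(Dp):
        for j in range(Dp):
            patch = x[i:i + Dk, j:j + Dk, :]          # (Dk, Dk, M)
            out[i, j, :] = (patch * filters).sum(axis=(0, 1))
    return out

def pointwise(x, filters):
    """x: (Dp, Dp, M); filters: (N, M), i.e. N kernels of size 1x1xM.
    Returns (Dp, Dp, N)."""
    return np.tensordot(x, filters, axes=([2], [1]))

Df, Dk, M, N = 8, 3, 3, 16
Dp = Df - Dk + 1
x = np.random.rand(Df, Df, M)
y = pointwise(depthwise(x, np.random.rand(Dk, Dk, M)), np.random.rand(N, M))
print(y.shape)  # (6, 6, 16), i.e. (Dp, Dp, N)

# Multiplication counts from the text:
standard  = N * Dp**2 * Dk**2 * M                # standard convolution
separable = Dp**2 * Dk**2 * M + Dp**2 * M * N    # depth-wise + point-wise
print(standard, separable)  # the separable variant is far cheaper
```

The two printed counts make the efficiency argument concrete: the separable cost is roughly 1/N + 1/Dk² of the standard cost.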
Face Expression Recognition Module: Software for recognizing facial expressions uses biometric data to recognize emotions in human faces 14. It has the potential to offer an unfiltered, impartial emotional reaction because it collects and analyses information from images.

Using this software, a webcam captures, recognizes, and records a person's facial expressions. A rectangular frame around the face area in the camera view can be acquired by using the Viola-Jones method, the LBPH Face Recognizer algorithm, and the Haar cascade frontal-face dataset. The face region is distinguished from the non-facial region in this manner. Before being saved in a folder labelled with the subject's ID and name, captured faces are pre-processed. The trained dataset for these pictures is saved as Trainer.yml in the Trainer folder after being trained using the LBPH method. During the Face Detection process, the face on the video camera is matched against the faces in the trained dataset. A person's ID and name will be displayed on the screen if their face matches one in the trained dataset.

The outcome of the training phase is a set of weights that performs well on the training data. When the grayscale image of a face is fed in during testing, the system generates the predicted expression using the final network weights discovered during training. The program generates a single number that represents one of the seven fundamental expressions.

Figure 5 depicts the training phase, in which the system learned a set of weights for the network by receiving training data comprising grayscale images of faces labelled with the respective emotions. An image with a face was used as input in this phase and then subjected to intensity normalization.

Figure 5. Flow of the Training Model

Figure 6 depicts that the convolutional network is trained using the normalized images. A validation dataset is used to select the final best set of weights from a set of training runs performed with examples presented in a different order, ensuring that the training performance is not dependent on the order in which the examples are presented.

Figure 6. Flow of the model of CNN

Convolutional neural networks and the FER2013 database are used to perform the classification on the obtained face 2. Based on the individual's characteristics, facial expression shows the possibility of achieving the highest expression. A facial expression from a possible seven is shown with the subject's identifiable picture. The complete process is depicted in Figure 4.

RESULTS AND DISCUSSIONS
The designed model consists of a machine learning architecture that is operated using a graphical user interface associated with a Windows application. The application is launched using Visual Studio Code with Anaconda Navigator (Anaconda 3). The FER2013 data from the Kaggle competition on FER2013 was the data source used for the application. The framework for detecting facial expressions is integrated using the database. The database comprises 35,887 images in total, which are split into 28,709 training pictures and 3589 test pictures. For the final test, the dataset also includes 3589 more private test images. The class distribution of the FER2013 data is represented in Figure 7.

Figure 7. Expression distribution in the FER 2013 data (class counts for angry, disgust, fear, happy, sad, surprise, and neutral)
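A class distribution like the one in Figure 7 can be computed directly from the FER2013 csv, which stores one integer emotion label per row. The sketch below is illustrative: the label order follows the commonly used FER2013 convention (0 = angry … 6 = neutral) and should be treated as an assumption, and the sample list is a toy stand-in for the 35,887 real rows.

```python
from collections import Counter

# Assumed FER2013 label order (0..6); verify against the actual csv.
LABELS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def class_distribution(emotion_ids):
    """Count how many images fall into each of the seven classes."""
    counts = Counter(emotion_ids)
    return {name: counts.get(i, 0) for i, name in enumerate(LABELS)}

# Toy stand-in for the `emotion` column of fer2013.csv:
sample = [3, 3, 0, 6, 4, 3, 5]
print(class_distribution(sample))  # happy is the most frequent class here
```

With the real csv, the `emotion` column would be read row by row (e.g. with the `csv` module) and passed to `class_distribution` to reproduce the bar heights in Figure 7.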
Open the command prompt from the search tab, activate TensorFlow (tf), then run the main file as shown in Figure 8. All the Python libraries were installed.

Once the main file is run, the user interface window opens as shown in Figure 9. Using the graphical user window, the user can upload the csv dataset.

Figure 9. GUI window

Once the graphical user window appears, the dataset can be uploaded. To upload the dataset, click on the browse button, then browse the dataset from the system and click on the open button as shown in Figure 10.

Figure 10. Upload csv dataset

After clicking on the open button, the user needs to click on the train button; the model then trains on the dataset as shown in Figure 11. There are a total of 110 epochs, and each epoch consists of 897 steps. In this case, some parameters are trainable and some parameters are non-trainable.

For real-time image processing, the user should run the main file, and the web camera will open. Once the camera is opened, the face must be shown to the camera and the predict button clicked, as shown in Figure 12. The accuracy of the happy emotion is shown as probabilities.

For real-time image processing, the user should run the main file, and the web camera will open. Once the camera is opened, the face must be shown to the camera and the predict button clicked, as shown in Figure 13. The accuracy of the surprised emotion is shown as probabilities.

Figure 14. Angry emotion
For real-time image processing, should run the main file, the web 𝐴𝐴𝐴𝐴 = 𝑇𝑇𝑇𝑇 + 𝑇𝑇𝑇𝑇 ÷ 𝑇𝑇𝑇𝑇 + 𝑇𝑇𝑇𝑇 + 𝐹𝐹𝐹𝐹 + 𝐹𝐹𝐹𝐹
camera will open. Once the camera is opened, the face must be Recall is given by
shown to the camera and click on the predict button as shown in 𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅 = 𝑇𝑇𝑇𝑇 ÷ 𝑇𝑇𝑇𝑇 + 𝐹𝐹𝐹𝐹
Figure 14. The accuracy of the angry emotion is shown in Precision is given by
probabilities. 𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃 = 𝑇𝑇𝑇𝑇 ÷ 𝑇𝑇𝑇𝑇 + 𝐹𝐹𝐹𝐹
Figure 15. Neutral emotion Table 1. Accuracy results of neural networks retrained with FER2013
Neural network Layers Accuracy %
For real-time image processing, should run the main file, the web name
camera will open. Once the camera is opened, the face must be GoogLeNet26 22 63.21
shown to the camera and click on the predict button as shown in CaffeNet27 8 68
Figure 15. The accuracy of the neutral emotion is shown in VGG1628 16 71.4
probabilities. ResNet5028 152 73.8
Comparison and Evaluation of Results
Designed Model 36 95.60
This research presents a precisely trained model for identifying
driving distractions. The number of fatal accidents caused by driver
mistakes or negligence has reached an all-time high in recent years. Table 2. Comparison of the different techniques
Drivers might be warned if they tend to become distracted to avoid Neural network Accuracy Precision Recall (%)
name (%) (%)
accidents. Images of distracted drivers, such as those who are
texting, changing radio stations, drinking, and/or engaging in other GoogLeNet26 63.21 62 62
similar activities, are used as input to train the system. The best CaffeNet27 68 67 66.2
model for this job is chosen after this dataset has been used to train VGG1628 71.4 81.9 79.4
a variety of deep CNN algorithms.25 The model proportionately ResNet5028 73.8 83.3 80.7
detects a wide range of distracted drivers while eliminating the non- Designed Model 95.60 90 93
distracted drivers as distraction levels rise.
The deep CNN algorithms can detect spatial and temporal dependencies in images with a minimal amount of preprocessing. Basic preprocessing techniques are still required to ensure that the dataset excludes irrelevant data. The RGB images are converted to grayscale, so that a two-dimensional matrix serves as the representation of each image. Because of background noise from the car seats, thresholding of the images is required: it ensures that only the necessary portion(s) of the image, describing distracted driving, is extracted. These main image-processing methods ensure the suitability of the final image and add to the diversity of the dataset. The deep CNN architecture offers several image-categorization models and methods. Three models were used: ResNet, Xception, and VGG16. The distracted-driver dataset was used to train these models separately, and a variety of evaluation metrics was employed to measure their effectiveness and choose the optimal model. On this basis, the ResNet model was shown to be the most effective for the successful classification of driver distraction. Different assessment measures, including precision, recall, and F1-score, are used to assess the performance of the system models.

A confusion matrix is used to calculate accuracy. Quantitative analysis of the experimental outcomes was done by considering performance indicators including accuracy (AC), which makes use of the variables True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). The confusion matrix in Table 3 makes it possible to assess the effectiveness of the algorithm.

Table 3. Confusion matrix of the designed algorithm (%)
Expression    Anger    Disgust    Fear     Happy    Sad     Surprise    Neutral
Anger         63.85    0.06       0.0      0.05     6.21    1.03        8.30
Disgust       0.0      55.5       0.5      0.7      10.9    20.9        0.0
Fear          12.9     0.01       65.78    0.5      0.12    17.8        0.0
Happy         0.01     0.00       0.02     98.38    0.01    0.04        1.5
Sad           25.0     11.9       0.12     0.08     85.0    0.05        7.5
Surprise      18.7     0.08       17       0.18     8.10    28.8        2.41
Neutral       12.06    0.02       0.0      4.8      14.7    1.75        54.45

Overall accuracy is determined by using the formula below:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Accuracy and loss graph
The authors recognize that determining model fit involves assessing the training and validation accuracy. If there is a wide difference between the two, the algorithm is overfitting; for a better design, the validation accuracy should be equal to, or slightly less than, the training accuracy. The study is shown in Figure 16, where additional convolutional and similar layers were added, making the network wider and more broadly connected. As the epochs advance, the training accuracy is only slightly higher than the validation accuracy, and the validation loss may be lower than the training loss.

Figure 16. Graphical depiction of training & validation accuracy per epoch

Figure 17 compares the training loss and validation loss. It shows that as the epochs grow, the validation loss rises while the training loss reduces. As the weights are updated, the validation loss is generally expected to decrease; in this case, a validation loss lower than the training loss may be anticipated as the epoch count grows, as the authors had observed in the previous stage. The model therefore fits the training data well.

Figure 17. Graphical depiction of training and validation loss per epoch

CONCLUSIONS
Without personal contact with the subject or suspect, real-time detection of any suspicious activity is difficult, and it can be hard to read someone's face in real time. Small, portable gadgets make it easy for most firms to understand employee behaviour and address minor and major difficulties at an earlier phase. To that end, a framework that can be used in any firm to understand employee behaviour has been tested and recommended. Facial expression representation is crucial for assessing facial expressions: it provides important features for characterizing the appearance and movements of facial emotions. This research implemented the Mini-Xception model, an enhanced version of the Xception architecture, for emotion detection and recognition; its performance on the FER-2013 dataset is better than that of existing techniques. An effective system was created to recognize the seven ways in which emotions can be conveyed: disgust, fear, anger, happiness, sadness, surprise, and neutral. In recent studies, deep learning models have been widely applied to propose end-to-end methods for expression identification. Even so, emotion recognition remains a difficult task and still needs considerable development. Using Mini-Xception, accuracy for emotion expression and recognition was 95.60%, while recall and precision were 93% and 90%, respectively.

ABBREVIATIONS
CNN: Convolutional Neural Network
FER: Facial Emotion Recognition
FE: Feature Extraction
SA: Sentiment Analysis
KNN: K-Nearest Neighbour
DT: Decision Tree
RPN: Region Proposal Network
GUI: Graphical User Interface
AC: Accuracy
TP: True Positive
TN: True Negative
FP: False Positive
FN: False Negative

AUTHOR CONTRIBUTIONS
Sowmya B J and Meeradevi: implemented the work and wrote part of the manuscript; Sini Anna Alex and Anita Kanavalli: reviewed the article; Supreeth S and Shruthi G: were responsible for analyzing and building the algorithm and framework for the study.

FUNDING
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

ACKNOWLEDGMENTS
The authors acknowledge the support from MSRIT and REVA University for the facilities provided to carry out the research.

CONFLICTS OF INTEREST
All authors declare that they have no conflicts of interest.

REFERENCES
1. P. Angusamy, S. Inba, K.S. Pavithra, M. Ameer Shathali, M. Athiparasakthi. Human Emotion Detection using Machine Learning Techniques. SSRN Electron. J. 2020, 3591060.
2. S.A. Fatima, A. Kumar, S.S. Raoof. Real Time Emotion Detection of Humans Using Mini-Xception Algorithm. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1042 (1), 012027.
3. J. Bao, M. Ye. Head pose estimation based on robust convolutional neural network. Cybern. Inf. Technol. 2016, 16 (Special Issue 6), 133–145.
4. B.P. Babu, S.J. Narayanan. One-vs-All Convolutional Neural Networks for Synthetic Aperture Radar Target Recognition. Cybern. Inf. Technol. 2022, 22 (3), 179–197.
5. A. Lazarov, C. Minchev. ISAR image recognition algorithm and neural network implementation. Cybern. Inf. Technol. 2017, 17 (4), 183–189.
6. M.A. Ansari, D.K. Singh. ESAR, An Expert Shoplifting Activity Recognition System. Cybern. Inf. Technol. 2022, 22 (1), 190–200.
7. H.V. Ramachandra, P. Chavan, S. Supreeth, et al. Secured Wireless Network Based on a Novel Dual Integrated Neural Network Architecture. J. Electr. Comput. Eng. 2023, 2023, 1–11.
8. M. Pujar, M.R. Mundada, B.J. Sowmya, S. Supreeth, G. Shruthi. An Efficient Framework for Web Content Mining Systems Using Improved CD-PAM Clustering and the A-CNN Technique. SN Comput. Sci. 2023, 4 (5), 692.
9. P.A. Riyantoko, Sugiarto, K.M. Hindrayani. Facial Emotion Detection Using Haar-Cascade Classifier and Convolutional Neural Networks. J. Phys. Conf. Ser. 2021, 1844 (1).
10. D.Y. Liliana. Emotion recognition from facial expression using deep convolutional neural network. J. Phys. Conf. Ser. 2019, 1193 (1).
11. I. Talegaonkar, K. Joshi, S. Valunj, R. Kohok, A. Kulkarni. Real Time Facial Expression Recognition using Deep Learning. SSRN Electron. J. 2019, 1, 1–5.
12. D. Mehta, M.F.H. Siddiqui, A.Y. Javaid. Recognition of emotion intensities using machine learning algorithms: A comparative study. Sensors (Switzerland) 2019, 19 (8), 1–24.
13. A.I. Siam, N.F. Soliman, A.D. Algarni, F.E. Abd El-Samie, A. Sedik. Deploying Machine Learning Techniques for Human Emotion Detection. Comput. Intell. Neurosci. 2022, 2022.
14. A. Jaiswal, A. Krishnama Raju, S. Deb. Facial Emotion Detection Using Deep Learning. In 2020 International Conference for Emerging Technology (INCET); 2020; pp 1–5.
15. P. Bagane, S. Vishal, R. Raj, T. Ganorkar. Facial Emotion Detection using Convolutional Neural Network. Int. J. Adv. Comput. Sci. Appl. 2022, 13 (11), 168–173.
16. S.K. Khare, V. Bajaj. Time–Frequency Representation and Convolutional Neural Network-Based Emotion Recognition. IEEE Trans. Neural Networks Learn. Syst. 2021, 32 (7), 2901–2909.
17. P. Chakriswaran, D.R. Vincent, K. Srinivasan, et al. Emotion AI-Driven Sentiment Analysis: A Survey, Future Research Directions, and Open Issues. Appl. Sci. 2019, 9 (24), 5462.
18. M. Healy, R. Donovan, P. Walsh, H. Zheng. A Machine Learning Emotion Detection Platform to Support Affective Well Being. In 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 2018; pp 2694–2700.
19. R. Alhalaseh, S. Alasasfeh. Machine-learning-based emotion recognition system using EEG signals. Computers 2020, 9 (4), 1–15.
20. N.A.S. Badrulhisham, N.N.A. Mangshor. Emotion Recognition Using Convolutional Neural Network (CNN). J. Phys. Conf. Ser. 2021, 1962 (1).
21. T. Gilligan, B. Akis. Emotion AI, Real-Time Emotion Detection using CNN. 2016.
22. N. Mehendale. Facial emotion recognition using convolutional neural networks (FERC). SN Appl. Sci. 2020, 2 (3), 1–8.
23. B. Hdioud, M. El, H. Tirari. Facial expression recognition of masked faces using deep learning. 2023, 12 (2), 11591.
24. W. Mellouk, W. Handouzi. Facial emotion recognition using deep learning: Review and insights. Procedia Comput. Sci. 2020, 175, 689–694.
25. B. Kabra, C. Nagar. Attention-Emotion-Embedding BiLSTM-GRU network based sentiment analysis. J. Integr. Sci. Technol. 2023, 11 (4), 563.
26. P. Giannopoulos, I. Perikos, I. Hatzilygeroudis. Deep learning approaches for facial emotion recognition: A case study on FER-2013. Smart Innov. Syst. Technol. 2018, 85, 1–16.
27. W. Mohamed Yassin, M. Faizal Abdollah, Z. Muslim, R. Ahmad, A. Ismail. An Emotion and Gender Detection Using Hybridized Convolutional 2D and Batch Norm Residual Network Learning. ACM Int. Conf. Proceeding Ser. 2021, 79–84.
28. X. Zhao, X. Shi, S. Zhang. Facial Expression Recognition via Deep Learning. IETE Tech. Rev. 2015, 32 (5), 347–355.