SPECIAL SECTION ON ADVANCES IN MACHINE LEARNING AND COGNITIVE COMPUTING FOR INDUSTRY APPLICATIONS

Received April 28, 2020, accepted May 7, 2020, date of publication May 11, 2020, date of current version May 26, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2993803

Facial Sentiment Analysis Using AI Techniques:


State-of-the-Art, Taxonomies, and Challenges
KEYUR PATEL1, DEV MEHTA1, CHINMAY MISTRY1, RAJESH GUPTA1 (Student Member, IEEE),
SUDEEP TANWAR1 (Member, IEEE), NEERAJ KUMAR2,3,4 (Senior Member, IEEE), AND
MAMOUN ALAZAB5 (Senior Member, IEEE)
1 Department of Computer Science and Engineering, Institute of Technology, Nirma University, Ahmedabad 382481, India
2 Department of Computer Science Engineering, Thapar Institute of Engineering and Technology, Patiala 147004, India
3 Department of Computer Science and Information Engineering, Asia University, Taichung 41354, Taiwan
4 Department of Information Technology, King Abdul Aziz University, Jeddah 21589, Saudi Arabia
5 College of Engineering, IT and Environment, Charles Darwin University, Casuarina, NT 0810, Australia

Corresponding authors: Neeraj Kumar ([Link]@[Link]) and Mamoun Alazab ([Link]@[Link])


This work was supported by the Department of Corporate and Information Services, NTG of Australia.

ABSTRACT With the advancements in machine and deep learning algorithms, various critical real-life
applications in computer vision have become possible. One of these applications is facial sentiment analysis.
Deep learning has made facial expression recognition (FER) one of the most trending research fields in the
computer vision area. Recently, deep learning-based FER models have suffered from various technological
issues, like under-fitting or over-fitting, due to insufficient training and expression data. Motivated by
the above facts, this paper presents a systematic and comprehensive survey on current state-of-the-art Artificial
Intelligence techniques (datasets and algorithms) that provide a solution to the aforementioned issues. It also
briefly presents a taxonomy of existing facial sentiment analysis strategies. Then, this paper reviews the
novel machine and deep learning networks proposed by researchers that are specifically designed
for facial expression recognition based on static images, presents their merits and demerits, and summarizes
their approaches. Finally, this paper presents the open issues and research challenges for the design of a
robust facial expression recognition system.

INDEX TERMS Facial sentiment analysis, machine learning, deep learning, convolutional neural network,
deep belief network, artificial intelligence.

I. INTRODUCTION
Emotions are efficacious and self-explanatory in normal day-to-day human interactions. The most noticeable expression of human emotion is through facial expressions. Facial Expression Recognition (FER) is quite complex and tedious, but it helps in various application areas such as healthcare [1]–[3], emotionally driven robots, and human-computer interaction. Although advancements in FER increase its effectiveness, achieving high accuracy is still a challenging task [4]. The six most generic emotions of a human are anger, happiness, sadness, disgust, fear, and surprise. Moreover, the emotion called contempt was later added as one of the basic emotions [5].

FER is a baffling task, and its accuracy is completely dependent on the parameters selected, such as illumination factors, occlusion (i.e., obstruction on the face, like a hand), age, and sunglasses. Researchers in the field take these parameters into consideration while building their FER models so that considerable accuracy can be achieved. The description of some important factors for FER is as follows.
• Illumination factor: The light intensity falling on the object affects the classification of the model. The textural values increase the false acceptance rate, either by minimizing the distance between classes or by increasing the contrast [6].
• Expression Intensity: Expression recognition is highly dependent on the intensity of the expression. An expression is recognized more accurately when it is less subtle. Intensity highly affects the accuracy of the model.

The associate editor coordinating the review of this manuscript and approving it for publication was Min Xia.
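As a simple illustration of how the illumination factor can be partially compensated for in practice (a common pre-processing trick, not part of the surveyed paper; the face crop here is a random stand-in), min-max rescaling stretches the intensity range of a dim, low-contrast grayscale image before feature extraction:

```python
import numpy as np

# Toy stand-in for a dim, low-contrast grayscale face crop (values 40-89).
rng = np.random.default_rng(0)
dim_face = rng.integers(40, 90, size=(48, 48)).astype(np.float64)

# Min-max rescaling stretches the intensity range to [0, 255], reducing
# the effect of globally weak illumination on downstream classification.
lo, hi = dim_face.min(), dim_face.max()
normalized = (dim_face - lo) / (hi - lo) * 255.0

print(int(normalized.min()), int(normalized.max()))  # 0 255
```

Histogram equalization is a stronger alternative, but the same idea applies: bring differently lit images onto a comparable intensity scale.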

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see [Link]
VOLUME 8, 2020 90495
TABLE 1. Nomenclature.

• Occlusion: If occlusion is present in an image, then it becomes difficult for the model to extract features from the occluded part due to inaccurate face alignment and imprecise feature location. It also introduces noise into the outliers and the extracted features.

FER systems can be either static or dynamic, based on the image. Static FER considers only the face point location information from the feature representation of a single image, whereas dynamic-image FER considers the temporal information across continuous frames [7], [8]. The static FER process is exhibited in FIGURE 1, with a description of the steps as follows.
• Dataset: To avoid over-fitting, the FER algorithms discussed below need extensive training data. A dataset must have well-defined emotion tags of facial expressions, which are essential for testing, training, and validating the algorithms developed for FER. These datasets contain sequences of images with the distinct emotions mentioned above. We have reviewed many datasets used to train different models for real-world benefit. Table 4 provides an overview of the different datasets available for FER.
• Pre-Processing: This step pre-processes the dataset by removing noise and compressing the data. The various steps involved in data pre-processing are: (i) face detection, the ability to detect the location of the face in any image or frame; it is often considered a special case of object-class detection, which determines whether a face is present in an image or not; (ii) dimension reduction, which reduces the variables to a set of principal variables; if the number of features is large, it becomes harder to visualize the training set and to work with it, and PCA and LDA can be used to handle this situation; (iii) normalization, also known as feature scaling; after the dimension reduction step, the reduced features are normalized without distorting the differences in the ranges of feature values. There are various normalization methods, namely Z-normalization, Min-Max normalization, and unit vector normalization, which improve numerical stability and speed up the training of the model.
• Feature Extraction: This is the process of extracting features that are important for FER. It results in smaller and richer sets of attributes that contain features like face edges, corners, and diagonals, and other important information such as the distance between lips and eyes and the distance between the two eyes, which helps in speedy learning from the training data.
• Emotion Classification: This involves the algorithms that classify emotions based on the extracted features. There are various classification methods, which classify the images into various classes. The classification of a FER image is carried out after it passes through the pre-processing steps of face detection and feature extraction. Various classification techniques are discussed later in the proposed survey.

FIGURE 1. The general pipeline of FER systems.

The FER system has various applications, such as computer-human interaction, healthcare systems [9]–[14], and social marketing. In the proposed survey, we analyze the existing surveys pertaining to the different approaches of FER proposed by authors globally. We compare the surveys and develop a taxonomy of the various pre-processing, feature extraction, and emotion classification steps. We also discuss the various open issues and future research challenges related to FER.

A. MOTIVATION
Paul Ekman first coined the term FER in the mid-1980s. Since then, various machine learning techniques like random


forest classifiers and artificial neural networks have been used by researchers to recognize the seven basic emotions, and they have claimed good and effective results. Automated human emotion detection is all-important in security and surveillance applications these days. To further improve its performance, researchers are trying hard to explore this field further. Various challenges, like occlusion in datasets and over-fitting of models, have to be taken care of while implementing FER. As per the literature explored and the knowledge of the authors, no survey is available that exhaustively compares FER approaches from the perspective of AI. Motivated by the aforementioned fact, we present a comprehensive survey on FER using Artificial Intelligence (AI) techniques, in which we have explored the state-of-the-art machine learning and deep learning (DL) approaches with their merits and demerits.

B. SCOPE OF THE SURVEY
Facial sentiment analysis is one of the most trending topics in the computer vision area. A lot of literature has already been published by researchers across the globe in this field, but still, many researchers are trying to solve the challenges and issues in FER. Various surveys have been published in recent years [7], [15], [26], [27] on sentiment analysis. These surveys have mainly focused on traditional methods like the support vector machine (SVM), decision tree classifiers, and artificial neural networks (ANN). The DL methods [28], [29] have rarely been explored by researchers working in this field. So, in this paper, we analyzed the surveys on facial sentiment analysis and present a comparative analysis. For example, Hemalatha and Sumathi [15] surveyed various methods for facial detection, facial feature extraction, and classification for FER, but did not present a proper comparison of the methods considered and the datasets used. Later, the authors in [16] also presented a survey on FER, but they did not mention anything about datasets useful for emotion recognition. Another survey, by Chengeta and Viriri [19], covered various traditional feature extraction techniques like principal component analysis (PCA), Linear Discriminant Analysis (LDA), and Locally Linear Embedding (LLE), and thereafter proposed an ensemble classifier. They failed to compare the advanced DL approaches, which are currently the most novel approaches in FER. Again, Baskar and Kumar [18] also lack an explanation of the various DL approaches.
Recently, DL-based FER approaches have been explored in [7], [27], which are detailed surveys. Therefore, in the proposed survey, we make a systematic survey of the various databases used for FER; the various methods for face detection, facial feature extraction, and emotion classification; and the future challenges and current issues in facial sentiment analysis. Our aim is for this survey to be quite beneficial for those who want to explore this field, giving them a complete overview of all the advanced systematic approaches in facial sentiment analysis. Table 2 presents the relative differences between the existing surveys and the proposed survey.

C. RESEARCH CONTRIBUTIONS
In this paper, we surveyed the various existing literature on Facial Sentiment Analysis, focusing on the DL techniques, datasets, and methodologies used to classify emotions. Following are the crisp contributions of the paper.
• We present an in-depth survey on FER methods and the datasets used. Then, we highlight the advanced methods used for FER and their comparative analysis.
• We present a taxonomy of FER methods based on face detection, feature extraction, and emotion classification.
• Finally, we present the open issues and research challenges in Facial Sentiment Analysis.

D. ORGANIZATION
The structure of the survey is shown in FIGURE 2. Section II focuses on the evolution of the facial recognition techniques presented by authors across the globe and the datasets used. It also describes the need for facial detection, dimension reduction, normalization, feature extraction, and emotion classification. In Section III, we highlight the bibliometric analysis and methodology used for conducting the proposed survey. In Section IV, we discuss the various facial expression databases available for analysis. Section V discusses the proposed taxonomy (the facial sentiment analysis taxonomy). In Section VI, we discuss the open issues and research challenges of FER, and finally, Section VII concludes the survey. Table 1 lists all the acronyms used in the paper.

II. BACKGROUND
This section focuses on the background and importance of facial expressions for sentiment analysis. It is bifurcated into four subsections. Firstly, we discuss the evolution time-line of facial recognition methods. Secondly, we discuss the need for facial detection, dimension reduction, and normalization for sentiment analysis. In the third subsection, we focus on the need for feature extraction from the face image. Finally, we highlight the need for emotion classification.

A. EVOLUTION TIMELINE
Figure 3 gives a brief overview of the evolutionary time-line of the facial sentiment recognition methods given by researchers across the globe, along with the datasets. There exist various algorithms for FER, such as traditional state-of-the-art algorithms and DL-based algorithms, proposed by various researchers up to 2020. Emotion recognition was first stated in the paper by Bassili [30] in 1978, where the authors classified the emotions into six basic gestures: happiness, sadness, fear, surprise, anger, and disgust. Different algorithms (traditional and DL) have since been used for FER. For example, Padgett and Cottrell [31] used an ANN for the first time in 1996, followed by SVM [32] in 2000, CNN [33] in 2003, Multi-SVM [34] in 2006, boosted DBN [34] in 2014, RNN [35] in 2015, and PHRNN and MSCNN [36] in 2017. Also, many datasets have been created for training and testing


TABLE 2. A relative comparison of the proposed survey with the existing FER surveys.

these FER models. The time-line shows the list of datasets per creation year. The datasets are: JAFFE [37] in 1998, CK+ [38] in 2000, MMI [39] in 2002, Oulu-CASIA [40] in 2008, Multi-PIE [41] in 2009, RaFD [42], MUG [43], and TFD [44] in 2010, and FER-2013 [45] in 2013. New datasets keep becoming available, supported by DL algorithms, to solve the challenges in FER.

B. NEED FOR FACIAL DETECTION, DIMENSION REDUCTION AND NORMALIZATION
In the FER process, the first prerequisite step is face detection, which involves detecting a face in the image or frame and removing the insignificant pixels. The face detection algorithm gives its output in the form of the coordinates of a bounding box, which is put over the face. Detecting a face is quite complex, as human faces come in different sizes and shapes, so the face detection algorithm plays a vital role in this situation. Various algorithms for face detection are available, such as Viola-Jones [46], PCA, LDA, and genetic algorithms. The Viola-Jones algorithm is one of the most widely used algorithms for face detection; it differentiates faces from non-faces. PCA is the other most widely used face detection method. It is used to reduce the image dimensions and has four main parts: feature covariance, eigen decomposition, principal component transformation, and choosing components [47]. Reducing the dimensions from m dimensions to n dimensions, ∀ m > n, does not mean we lose the properties of the image; rather, it


FIGURE 2. Roadmap of the survey.

preserves them [48]. After the dimension reduction, normalization can be used to scale up the image.

C. NEED FOR FEATURE EXTRACTION
Facial expression analysis comprises various methods, such as facial landmark identification and feature extraction, and different feature extraction databases. Facial landmarks are drawn from the facial key points, which are derived from the geometry of the face [16]. Feature extraction is done after the preprocessing phase [49]. The two methods available for feature extraction are appearance-based extraction and geometric-based extraction. The geometric-based method extracts features like edge features and corner features. Neha et al. [50] analyzed the performance of the Gabor filter as a feature extraction technique. They also tested the average Gabor filter and compared both filtering techniques to enhance the recognition rate.
• Corners: The corners of an image are a significant property, which can be inferred from the complex objects of the image. Cho et al. [51] developed a corner detection technique that measures the distance and angle between two straight lines.
• Edges: These are one-dimensional features that represent the boundary of an image region.
The second method, the appearance-based method, takes care of the states of different points of the face, such as the position of the eyes and the shape of important points such as the mouth and eyebrows, using salient point features. The majority of the traditional methods have used the Local Binary Pattern (LBP) as the feature extraction technique, which is a generic framework for the extraction of features from a static image. It converts the most important features of the input image, as mentioned above, into a histogram [52].

D. NEED FOR EMOTION CLASSIFICATION
The third step in FER is emotion classification. There are various methods that are used for the classification of emotions after applying the face detection and feature extraction algorithms. The various classification algorithms include the convolutional neural network (CNN) [53], SVM, and the restricted Boltzmann machine (RBM). The most widely used method for classification is the CNN. It is the most efficient algorithm, as it can be applied directly to the input image, without applying any feature extraction or face detection algorithms, and still achieve better accuracy on the input data [54]. The number of images in the training data set also has a huge impact


on classification results. The CNN faces a huge challenge when training on a limited-image dataset, so models built on a limited dataset can use the SVM algorithm for feature extraction and face detection. The emotions of a human are not static; they vary from time to time. So, the classification of situation-based emotions is challenging.

TABLE 3. Research questions and their objectives.

FIGURE 3. Evolution of facial recognition methods and datasets.

FIGURE 4. Possible strings used to search the literature.

III. SURVEY METHODOLOGY
In this section, we present the methodology followed to conduct the proposed survey, covering the search strings used, the research questions, and the authentic data sources.

A. RESEARCH PLANNING
The proposed survey was initiated with the discussion and identification of various quality research questions, data sources, and search criteria. We identified the relevant surveys proposed by various researchers and, where the data was found relevant, extracted data from them [55].

B. RESEARCH QUESTIONS
The proposed survey sought out the existing literature on Facial Sentiment Analysis. The identified research questions are specified in Table 3.

C. DATA SOURCES
Various literature has been studied for a thorough and complete survey. We followed genuine digital databases such as Springer, IEEE Xplore (early access, magazine, and transaction articles), ScienceDirect (Elsevier), the ACM Digital Library, and Google Scholar for accessing the existing literature surveys on Facial Sentiment Analysis [56].

D. SEARCH CRITERIA
The search was performed using some standard keywords, like ‘‘Facial Sentiment Analysis’’ and ‘‘Facial Emotion Recognition’’, and other matching keywords, as mentioned in FIGURE 4. There exist many articles in different digital libraries where the search string is present neither in the title nor in the abstract [57]; a manual search process was done for such research papers.

IV. FACIAL EXPRESSION DATABASES
In this section, we discuss the various datasets that are currently used for training and testing in FER. We also present a comparative analysis of these datasets based on the articles


TABLE 4. Different facial expression databases.

published till date. The comparative analysis of the datasets is presented in Table 4.

A. EXTENDED COHN-KANADE (CK+)
CK+ is the most widely used dataset for FER systems [7], [38]; it contains almost 593 different sequences captured from 123 different subjects. Sequences may vary between 10 and 50 frames, where the frames show the shift from a neutral face to a specific expression [7].

B. THE JAPANESE FEMALE FACIAL EXPRESSION (JAFFE)
The JAFFE dataset includes 219 different images with 7 facial expressions, captured from 10 Japanese female models [73]. Images of each model were captured while she looked through a camera with a semi-reflective plastic sheet. To reduce the problem of low illumination on the face, tungsten lights were used.

C. RADBOUD FACES DATABASE (RAFD)
The RaFD database is a database of portrait images of 49 subjects: 39 Dutch adults and 10 Dutch children [42]. All models show 8 facial emotions (neutral, sadness, happiness, disgust, anger, contempt, surprise, and fear) with three gaze directions. Every emotion was shown with the eyes directed straight ahead, deflected to the left side, and turned away to the right side. Pictures were captured against a white background from five different camera angles at the same time, spaced from left to right at 45° intervals. There are a total of 120 images for each model.
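The RaFD counts above can be sanity-checked with a line of arithmetic (an illustrative calculation based only on the figures quoted in this section):

```python
# RaFD as described above: 8 emotions, 3 gaze directions,
# 5 simultaneous camera angles -> images per model.
emotions = 8
gaze_directions = 3
camera_angles = 5
per_model = emotions * gaze_directions * camera_angles
print(per_model)  # 120

# With 49 models (39 Dutch adults + 10 Dutch children):
models = 39 + 10
print(models * per_model)  # 5880
```

The 8 × 3 × 5 product reproduces the stated 120 images per model.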


D. DELIBERATE EXPRESSION DATASET (MMI)
The MMI database is a laboratory-controlled database which contains 326 sequences from 32 subjects [39], [74]. There are a total of 213 sequences labeled with 6 expressions, and the main advantage of this database is that 205 sequences were captured in frontal view. The difference between CK+ and this database is that this database contains sequences which start with a neutral expression, reach the specific expression in the middle of the sequence, and then return to the neutral expression. It is a complex database because there exist large relational variations, due to images having the same expressions, and non-uniformity (many of the images have glasses, long hair, and mustaches). Researchers widely use the first frame and the last three frames of the final expression to perform the emotion recognition task [7].

E. FER2013
This database was first established during the ICML 2013 challenges on Kaggle [7], [45]. It includes a large number of images and important characteristics. It also includes unconstrained images collected automatically by the Google image search API. It contains 28,709 images for the training set, 3,589 images for the validation set, and 3,589 images for the test set, which sums to a total of 35,887 images with 7 expression labels [7]. All these images have been reduced to the size of (48 × 48) pixels. This is one of the most challenging datasets in FER.

F. TORONTO FACE DATABASE (TFD)
It is a combination of different FER datasets [44]. It includes 1,12,234 images, 4,178 of which are labeled with one of the seven expression labels: sadness, surprise, fear, happiness, anger, disgust, and neutral [7]. The main advantage of this dataset is that it contains images in which the faces have already been detected and reduced to the size of (48 × 48). There are 5 folds in this database, where each fold has 70% of the images for the training set, 10% of the images for the validation set, and the remainder for the test set [7].

G. MULTI-PIE
This database contains 755,370 images with 337 subjects [41]. The main advantage of this database is that it includes images with 15 different viewpoints and 19 diverse illumination conditions, and each image is labeled as one of 6 expressions. It is mainly used for multi-view 3D FER [7].

H. STATIC FACIAL EXPRESSIONS IN THE WILD (SFEW)
It was designed by selecting (static) frames from Acted Facial Expressions in the Wild (AFEW) [67]. The advanced SFEW 2.0 was the benchmarking data used for the EmotiW 2015 challenge. It includes 958 images for the training set, 436 images for the validation set, and 372 for the test set, labeled with 7 expression labels [7].

I. OULU-CASIA
Oulu-CASIA is an NIR (near-infrared) and VIS (visible light) facial expression database with six diverse expressions (surprise, happiness, sadness, anger, fear, and disgust), having 80 subjects between the ages of 23 and 58 years [40]. 73.8% of the subjects are male and the rest are female. The image resolution is 320 × 240 pixels.

J. MULTIMEDIA UNDERSTANDING GROUP (MUG)
Also known as the MUG dataset [43], this is a laboratory-controlled dataset with the 6 basic emotions (anger, disgust, fear, happiness, sadness, and surprise) plus neutral. This database is divided into two segments. In the initial segment, the subjects were asked to act out the six basic emotions. The subsequent part contains laboratory-induced emotions. There is a total of 86 subjects and 1,462 image sequences. The camera can capture pictures at a pace of 19 frames/second. Each picture is in jpg format, with (896 × 896) pixels and a size of 240 to 340 KB. This dataset was made to overcome the restrictions of other comparable FER datasets by offering high resolution, uniform lighting, numerous subjects, and numerous shots per subject.

K. AFFECTNET
This is a dataset for the identification of human expressions in the wild. It has above 1 million different images collected from internet sources by using over 1,200 keywords related to human emotions.

L. EMOTIC
This is a facial emotion dataset with EMOTions In Context. It collected images of people in real environments with apparent emotions. It has the widest range, with 26 emotion categories.

V. FACIAL SENTIMENT ANALYSIS: THE PROPOSED TAXONOMY
This section presents the taxonomy for Facial Sentiment Analysis, which is split into three subsections: pre-processing (face detection, dimension reduction, and normalization), feature extraction, and emotion classification. The detailed taxonomy for FER is shown in FIGURE 5.

A. FACIAL DETECTION, DIMENSION REDUCTION, AND NORMALIZATION
In this section, we discuss the various face detection, dimension reduction, and normalization techniques that are widely used in FER models. We also highlight and compare the various face detection techniques proposed by different researchers. Table 5 shows the comparison of the various state-of-the-art detection techniques available in the existing literature.

1) VIOLA-JONES FACE DETECTION ALGORITHM
Viola-Jones is extensively used to perceive the face in an image. The training time of this algorithm is quite long, but face identification is fast. It needs the full front view of the face as


FIGURE 5. The proposed taxonomy for facial sentiment analysis.

TABLE 5. A relative comparison of various state-of-the-art facial detection techniques.

an input image. The algorithm has four stages: Haar-like features, integral graphs, AdaBoost training, and cascading classifiers.
• Haar-Like Features: The Viola-Jones algorithm uses Haar-like features, i.e., a scalar product between the image and some Haar-like templates [82]. As shown in FIGURE 6, edge features, linear features, center features, and diagonal features are the four Haar features used in the Viola-Jones face detection algorithm [83]. There are two regions, as shown in the figure: black-shaded and white-shaded regions. The eigenvalue is calculated using the difference between those two regions; for linear features, the black sum is doubled [83]:

eigenvalue(v) = Σ_white − Σ_black        (1)
eigenvalue(v) = Σ_white − 2 × Σ_black    (2)

where Σ_white and Σ_black denote the sums of the pixel values in the white-shaded and black-shaded regions, respectively.
• Integral Graph: As the dimension of the generated Haar features is large, a technique called the integral map can be used to isolate the picture cells, i.e., the 2D coordinates of the gray-scale picture and the values of every pixel point [83]. The procedure to make an integral graph is that each pixel is made equal to the sum of all the pixels above and to the left of the concerned pixel. Hence, the sum of all the pixels in any rectangular region can be determined.
• AdaBoost Training: This training algorithm combines weak classifiers, which are made to learn multiple times to become good. There are two things that we need to consider for the classification of an image: first, the region of the eyes is darker than the region of the nose and the cheeks; second, the eyes should be darker than the bridge of the nose [75].
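The integral-graph construction and the two-region difference of Eq. (1) can be sketched in a few lines (a minimal illustration, not the paper's implementation; the image and the feature placement are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(24, 24)).astype(np.int64)

# Integral graph: entry (y, x) holds the sum of all pixels above and to
# the left of (y, x), inclusive. A leading row/column of zeros makes the
# rectangle-sum formula uniform at the image borders.
ii = np.zeros((25, 25), dtype=np.int64)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(y, x, h, w):
    """Sum of the h x w rectangle with top-left corner (y, x), in O(1)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

# Two-rectangle edge feature, as in Eq. (1): white half minus black half.
white = rect_sum(0, 0, 8, 4)
black = rect_sum(0, 4, 8, 4)
eigenvalue = white - black

print(eigenvalue == img[0:8, 0:4].sum() - img[0:8, 4:8].sum())  # True
```

Each rectangle sum costs four lookups regardless of its size, which is what makes evaluating thousands of Haar features per window affordable.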


FIGURE 6. Four different types of Haar-like feature representations [83].

• Cascading Classifiers: We can classify an image as a face or not a face using a single classifier, but the result might not reach our expectations. So, in Viola-Jones, cascading classifiers are used to make this classification accurate. The stages of the cascading classifier are shown in FIGURE 7. If one of the stages fails, the image is not a face; if it successfully reaches the last classifier, the image is classified as a face and stored in the corresponding database.

Viola-Jones is not capable of handling occlusion and rigid objects, so the algorithm might generate false face detections. To overcome this issue, the authors in [83] proposed a modified Viola-Jones algorithm using composite features. The procedure to detect a face using the modified Viola-Jones algorithm is as follows.
• A rectangular frame of the face is determined using Viola-Jones.
• The face in the rectangular frame is then calibrated and divided into four sorts of sub-images.
• The features are extracted from the acquired face and the four sub-images using Zero/Null Space Linear Discriminant Analysis (NLDA).
• Then, the extracted features are evaluated using a discriminant distance.
• New composite feature vectors are generated from the discriminant values, and they are fed to a classifier for face recognition.

2) PRINCIPAL COMPONENT ANALYSIS
The fundamental idea behind PCA is that multi-attribute data is projected onto a linear lower-dimensional space, known as the principal subspace. The human face can be recognized using eigenfaces. The eigenspace (the basis of faces), a set of eigenvectors of the covariance matrix of the face space, is used to classify faces according to their basis representation. The steps to create the eigenfaces are described as follows.
• Provide a training set of faces with pixel resolution w × h.
• Calculate the mean and subtract it from each image in the matrix. The weight of the kth eigenface is w_k = V_k^T (U − M), where the input image vector U ∈ R^n (training set) and M is the mean. Then W = [w_1, w_2, ..., w_k, ..., w_n].
• Calculate the Euclidean distance (D) between W_x and W.
• After calculating the Euclidean distance, the image is classified as a face or not a face.
Islam et al. [84] used PCA to reduce the redundant features. They used downsampling to eliminate a number of redundant features. Consider the size of an image as (m × n). The size becomes (40 × m × n) after filtering, but after downsampling its size is reduced to (10 × m × n), i.e., the dimension is reduced by a factor of 4. Later, Luo et al. [85] used PCA to extract global features from an image that are important but can be environment-sensitive for facial expression. To overcome this issue, some local features are also selected using LBP.

3) LINEAR DISCRIMINANT ANALYSIS
LDA, like PCA, is used to reduce the dimensions of given data. Like the eigenface in PCA, LDA

FIGURE 7. Cascading classifier.


TABLE 6. Comparison of various state-of-art feature extraction techniques.

uses the fisher face (an enhancement of the eigenface) for reducing the dimensions of the features and for identifying the face in an image. The fisher face is usually used when the images have a contrast in illumination. The steps to create the fisher face are the same as for PCA, and it performs better than PCA [86].

B. FEATURE EXTRACTION
This section discusses the various feature extraction techniques and explains how they can be used in FER models. We also compare the various feature extraction techniques and analyze various works done using them, as shown in Table 6.

1) LOCAL BINARY PATTERN
LBP is a recent texture descriptor that replaces the value of the original pixel of the image with a decimal value, converting it into codes known as LBP codes [97]–[99]. Labels are formed by thresholding the 3 × 3 neighborhood with the central value and considering the result as a binary number. The basic LBP had limitations with large-scale structures and was highly sensitive to noise [100], [101]. It is also invariant to the rotations and to the size of the features, which increase exponentially with the number of neighbors.
A Uniform LBP was proposed that considers the U pattern, which has at most 2 bitwise transitions from 0 to 1. Various extensions of LBP were then proposed for neighborhoods of any size. A circular neighborhood was proposed with any number of pixels and any radius, represented by the notation (P, R), where P is the number of sampling points on a circle of radius R. LBP is used in various applications, such as texture analysis, face analysis, and classification. It codifies the local primitives, including edges, corners, different spots, and flat areas. Nowadays, LBP converts the important pixels of an image into a histogram, as in the Histogram of Oriented Gradients (HoG) approach, which stores the information of the local micro-patterns of faces.
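The 3 × 3 thresholding that produces an LBP code can be sketched as follows. This is a minimal illustration; the clockwise sampling order of the eight neighbours is a convention chosen here, not mandated by [97]–[99].

```python
def lbp_code(patch):
    """Basic LBP for a 3x3 patch: threshold the 8 neighbours against the
    centre pixel and read the resulting bits as one binary number."""
    center = patch[1][1]
    # Clockwise neighbour order starting at the top-left (a chosen convention).
    neighbours = [patch[0][0], patch[0][1], patch[0][2],
                  patch[1][2], patch[2][2], patch[2][1],
                  patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:            # neighbour >= centre contributes a '1'
            code |= 1 << (7 - bit)     # most significant bit first
    return code

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # binary 10001111 -> 143
```

The resulting code 10001111 has exactly two circular 0/1 transitions, so it would count as a uniform pattern under the Uniform LBP extension described above.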


A modified algorithm for LBP was proposed by the authors of [89]. The steps for generating the threshold are as follows.
• The input preprocessed image is divided into 3 × 3 blocks.
• For each block, calculate the minimum and maximum pixel intensity values of the block.
• Now, calculate the threshold value of block B by taking the average of the minimum and maximum values.
• If any element of the block is greater than the threshold, write ‘1’ to it, else write ‘0’.
• The eight-bit pattern is converted to a decimal number, representing the transformed block B.

2) GABOR FILTER
The Gabor filter extracts information in both the time and frequency domains [102] of the image; that is, it analyzes whether there is any particular frequency content in the image in a particular direction around the point of analysis. The 2D Gabor filter is used in the spatial domain, and Gabor filters have been quite successful in FER models. Multi-resolution structures, consisting of multiple frequencies and multiple orientations, are applied to images; these structures relate Gabor filters to wavelets [103].
The filters have real and imaginary components representing orthogonal directions. The equations are shown below [102]:

Ψ_{ω,θ}(a, b) = (1 / (2π σ_a σ_b)) exp[−(1/2)(a′²/σ_a² + b′²/σ_b²)] exp(jωa′)  (3)
a′ = a cos θ + b sin θ  (4)
b′ = −a sin θ + b cos θ  (5)

where (a, b) is the pixel position in the spatial domain, θ is the orientation of the Gabor filter, ω is the radial central frequency, and σ is the standard deviation of the Gaussian filter, which controls the size of the Gabor envelope.
Liu et al. [87] proposed the local Gabor filter bank LG (m × n), which spreads all over. It contains the multi-scale feature information of a global filter or the image, and it also reduces the redundancy in eigenvalues. This reduces the time for extracting the features. [84] used a Gabor filter bank with eight orientations and five scales. The formed bank is used to filter each of the generated divided images 40 times. This created a computational burden, so they reduced the number of features by dimension-reduction techniques; dimension reduction using PCA is explained under the PCA section.

3) DISCRETE COSINE TRANSFORM (DCT)
DCT represents a finite sequence of data or feature points as a sum of cosine functions oscillating at different frequencies. It is a way of compressing the data/2D-image without losing its original meaning [104]. In such types of applications (data compression), an input of 8 × 8 size is used for DCT [105] for feature extraction. It has two stages:
• The first stage is to apply DCT on the image.
• The second step is the selection of coefficients [106]. By applying DCT on a U × V image, a 2D U × V coefficient matrix is formed.

G_x(0) = (√2 / M) Σ_{m=0}^{M−1} X(m)  (6)
G_x(k) = (2 / M) Σ_{m=0}^{M−1} X(m) cos((2m + 1)kπ / 2M),  (7)
where k = 1, 2, ..., (M − 1)  (8)

where G_x(k) is the kth DCT coefficient [107].
Jayalekshmi et al. [75] in their work used DCT over fixed discrete sequences to convert the data into elementary frequency components.

4) SCALE INVARIANT FEATURE TRANSFORM (SIFT)
SIFT transforms the input image data into scale-invariant coordinates relative to the local features, which are stored in a database [108]. SIFT features are highly distinctive, i.e., a single feature can be matched with a large probability against the database. SIFT features are also scale- and rotation-invariant, which means that even if we scale or rotate the image, the features remain preserved. This is useful in FER when a rotated image comes as an input: the features are maintained and can be obtained efficiently [109]. The stages of the SIFT procedure are as follows.
• Scale-Space Extrema Detection: The first stage searches all image locations and scales using the Gaussian method.
• Keypoint Localization: It determines the location and scale at each point of the image.
• Orientation Assignment: Rotation invariance is achieved by assigning one or more dominant orientations to each keypoint.
• Keypoint Descriptor: A descriptor is made to represent each keypoint, which supports the orientation assigned in the preceding stage. It supports the histogram of the gradient within the image. The changes in illumination are scaled back by the descriptor to the keypoint. When the keypoint descriptors are obtained, they are often used as features or keypoints to solve various problems. More detailed information on SIFT computation can be found in [110], [111].
Kravets et al. [90] proposed the P-SIFT (Parallel SIFT) algorithm, which reduces the computation time and increases the processing speed. In P-SIFT, the problem is divided into sub-tasks, and multiple processors are used for feature extraction. The program reads the input image and generates the key points. After that, it matches the key points with respect to each image in the database. Matching is done using Euclidean distance. The images for which the ratio of the first-least distance to the second-least distance is less than 0.8 are taken into


consideration. Then the image is given to the classifier that classifies the emotion [112].
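The Euclidean-distance matching with the 0.8 ratio threshold used in P-SIFT can be sketched as follows. The descriptors here are toy 2-D vectors rather than real 128-dimensional SIFT descriptors, and the names and sample data are illustrative assumptions, not from [90].

```python
import math

def ratio_test_match(query, candidates, ratio=0.8):
    """Return the index of the best candidate descriptor if the ratio of the
    first-least to the second-least Euclidean distance is below `ratio`,
    otherwise None (the match is considered ambiguous)."""
    dists = sorted((math.dist(query, c), i) for i, c in enumerate(candidates))
    (d1, best), (d2, _) = dists[0], dists[1]
    if d2 > 0 and d1 / d2 < ratio:
        return best
    return None

db = [(0.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
print(ratio_test_match((0.5, 0.0), db))    # clearly nearest to index 0
print(ratio_test_match((10.0, 10.5), db))  # two near-equal matches -> None
```

Keeping a match only when the nearest neighbour is clearly closer than the runner-up (ratio < 0.8) discards ambiguous keypoints rather than mis-assigning them.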
A newly adapted SIFT-based method to extract features from an image, called IntraFace, was proposed in [91]. It uses multi-directional warping of an active visualization model and a supervised descent model [113], which uses the SIFT feature extraction technique for feature mapping and trains a method to extract 49 points from the image. These points are used for registering an average face, which is then termed the face region.

5) SPEEDED UP ROBUST FEATURES (SURF)
Speeded-Up Robust Features (SURF) is a type of local feature detector as well as descriptor. It is inspired by SIFT, and its authors claim that it is faster than SIFT. Around the interest point, a certain reproducible orientation is fixed based on information from a circular region. From this selected orientation, a square region is constructed, and the SURF descriptor is extracted [114]. SURF has two parts: (i) a detector and (ii) a descriptor. The locations of the key points of the image are provided by the detector, and the descriptor expresses the features of those key points. SURF uses the Hessian matrix for the fast assessment of box filters. The integral image is expressed as

J(p, q) = Σ_{l=0}^{p} Σ_{m=0}^{q} I(l, m)  (9)

The Hessian matrix is represented as:

H(X, σ) = [ O_aa(M, σ)  O_ab(M, σ) ;  O_ab(M, σ)  O_bb(M, σ) ]  (10)

The pyramid scale space is introduced in SURF because of the box filters. The descriptor makes use of the sum of Haar wavelet features, which increases the robustness and decreases the computation time. For extraction, it constructs a square region around the keypoint, oriented along the orientation decided in the previous step. Each region is split into 4 × 4 sub-regions, which keeps the important spatial features. The Haar wavelet is computed at 5 × 5 sampled points [88].

6) HISTOGRAM OF ORIENTED GRADIENTS (HOG)
The features extracted by HoG are robust against photometric and geometric deviations. Many applications use HoG as the feature extraction technique, such as human detection [110]. The steps to calculate these features are:
• In the first step, the entire image is divided into small cells.
• Then, the gradient direction and magnitude of each cell are calculated.
• A histogram bin is calculated for each direction and magnitude.
• Blocks are formed from adjacent cells, and block normalization is applied to calculate the feature vector.
During implementation, 9 histogram bins were calculated using cells of 8 × 8 pixels and blocks of 2 × 2 cells with unsigned orientation. Reference [115] used a fusion of HoG and LBP features. Firstly, the HoG and LBP features were extracted from the segmented parts. The final feature vector consists of both the HoG and the LBP features; it had 1892 features, out of which 1656 came from HoG while the rest came from LBP.

FIGURE 8. Traditional CNN architecture.

C. EMOTION CLASSIFICATION
In this section, we discuss the various classification techniques that are used to classify human face emotions. We also present a comparative analysis of various emotion classification techniques, as shown in Table 7.

1) CONVOLUTIONAL NEURAL NETWORK (CNN)
CNN is one of the most widely used architectures in computer vision as well as in machine learning [130]. Massive data is required for training in order to harness its complex function-solving ability to the fullest [131]. CNN uses convolution, min/max pooling, and fully connected layers rather than the conventional fully connected deep neural network [53], [132], [133]. When all these layers are stacked together, the complete architecture is formed, as shown in FIGURE 8.
• The input layer of CNN contains the image pixel values.
• The convolutional layer convolves l × l kernels with the n feature maps of its preceding layer. If the next layer has m feature maps, then n × m convolutions are performed, and n × m × (w × h × l × l) multiply-accumulate (MAC) operations are needed, where h and w represent the feature map height and width of the next layer [132]. The important function of the convolutional layer is to calculate the output of all the neurons connected to the input layer. Activation functions such as ReLU, sigmoid, and tanh apply an element-wise activation and add non-linearity to the output of the neurons.
• The pooling layer has the responsibility of achieving spatial invariance by reducing the resolution of the feature maps. One feature map of the preceding CNN layer corresponds to one pooling-layer map.
1) Max Pooling: It applies a function u(x, y) (i.e., a window function) to the input data, and only picks the most


TABLE 7. Comparison of various state-of-art emotion classification techniques.

active feature in a pooling region [134]. The max pooling function is as follows:

a_j = max_{N×N} (a_i^{n×n} u(n, n))  (11)

2) Average Pooling: It applies a function u(x, y) (i.e., a window function) to the input data, and selects the average value for each input on the preceding layer's feature map [135], [136]:

act_i = (1 / (M × M)) Σ_{j=1}^{M×M} x_j  (12)

Mostly, 2 × 2 pooling is used without overlapping, which means that M in the above equation is always 2. Large pooling has M values of 4, 8, or 16, which always depend on the input image size. So [136], in their paper, proposed a multi-activation pooling method in order to satisfy the need for a large pooling region. This method allows the top-p activations to pass through at the pooling rate. Here p indicates the total number of picked activations. If p = M × M, it means that each and every activation in the computation contributes to the final output of the neuron [136]. For a random pooling region X_i, we denote the nth-picked


activation as act_n:

act_n = max( X_i ⊖ Σ_{j=1}^{n−1} act_j )  (13)

where the value of n ∈ [1, p]. The pooling region can be expressed as above, where the symbol ⊖ represents the removal of elements from the assemblage. The summation character in Eq. 13 represents the set of elements that contains the top (n − 1) activations, not a numerical addition of the activation values. After obtaining the top-p activation values, we simply compute the average of those values. Then, a hyper-parameter σ is taken as a constraint factor which performs the multiplication of the top-p activations [136]. The final output refers to

output = σ * Σ_{j=1}^{p} act_j  (14)

Here, the summation symbol represents the addition operation, where σ ∈ (0, 1). In particular, if σ = 1/p, the output is the average value. The constraint factor σ can be used to adjust the output values [136].
• The fully connected (FC) layer is the last layer of the CNN architecture. It is the most fundamental layer and is widely used in traditional CNN models [137], [138]. As it is the last layer, each node in it is directly connected to each and every node on both sides. As shown in FIGURE 8, it can be noted that all the nodes in the last frame of the pooling layer are converted into a vector and then connected to the first layer of the fully-connected layer. There are many parameters used with CNN, and more time is needed for training [139], [140]. The major limitation of the FC layer is that it contains a large number of parameters that need complex computational power for training. Due to this, we try to reduce the number of connections and nodes in the FC layer. The nodes and connections which are removed can be retrieved again by adding a new technique named the dropout technique.
In the past few years, CNN has emerged in computer vision, including the field of facial sentiment analysis, and researchers have modified the traditional CNN for better performance. A modified CNN was proposed by Mollahosseini et al. [91] in which an inception layer was introduced. Their network architecture consisted of two elements. Firstly, it had two traditional CNN modules containing the convolutional layer followed by ReLU. Following these modules, they added two inception layers, which consist of 1 × 1, 3 × 3, and 5 × 5 convolutional layers with ReLU in parallel.
By slightly modifying the traditional CNN, Khorrami et al. [120] developed a model based on a classic feed-forward CNN. They introduced a modification in their model by ignoring the biases of the convolutional layers. The network contained three convolutional layers having 64, 128, and 256 filters, respectively, each of size 5 × 5, which were then followed by the activation function ReLU. They placed max-pooling layers after each of the first two convolutional layers, and after the 3rd convolutional layer, they placed a quadrant pooling layer. After the convolutional layers, an FC layer containing 300 hidden units followed by quadrant pooling was used. At last, the softmax layer was used for classification.
The idea of using a zero-bias model was first introduced in [141] for the fully-connected layers in the CNN model and later was extended in [142]. They implemented this model on the CK+ and TFD datasets. Their model was successful in recognizing the emotions with a rate of 88.6% ± 1.5% on TFD with 7 classes and 95.1% ± 3.1% on CK+ with 8 classes. They used data augmentation combined with dropout to boost the performance [120].
Further, to increase the accuracy and recognition performance of computer vision algorithms, researchers proposed advanced and deeper CNN architectures. Along this line, Ding et al. [77] designed a new technique named FaceNet2ExpNet to train the model. They used a fine-tuned face net and proposed a unique distribution function to train the neurons of the expression net. To improve the discriminativeness of the learned features, they used the convolutional network to design the expression net. The training process was executed in two stages. In the 1st stage, the convolutional layers were trained using a loss function, and the output of the last pooling layer was used for supervision. In the 2nd stage, they added a randomly initialized FC layer and then trained the network using labeled training data. The testing was done on constrained as well as unconstrained datasets and achieved better results than the previous approaches [77].
In 2018, Jadhav et al. [54] investigated the previous approaches and applied modifications to increase performance. They investigated three different popular networks that were quite successful in classifying emotions. The first network was proposed by Krizhevsky and Hinton [143]. The second network for investigation was inspired by the AlexNet [138] convolutional network, and the last one was proposed from work done by Gudi [144]. It consisted of an input layer of 48 × 48, one convolutional layer, a normalization layer to normalize dimensions, and then a max-pooling layer. Then again, 2 convolutional layers, and finally 1 FC layer linked with the softmax layer. To decrease the number of parameters, [54] applied one more max-pooling layer. They trained and tested the model on FER2013 and RaFD, respectively, and got better results than [144].
Cai et al. [123] introduced a new island loss layer in CNN to minimize intra-class variations. Their network includes 3 convolutional layers, each followed by a PReLU and a batch normalization (BN) layer. Pooling was used with the two initial BN layers. After the third convolutional layer come 2 FC layers, and the island loss was calculated at the second FC layer. At last, the softmax layer was used. Their architecture was named IL-CNN. They also employed the VGG-16 [145] network as their backbone network. This approach


FIGURE 9. A conventional VGG16 architecture.

achieved good performance in comparison to the state-of-the-art methods.
In 2014, Simonyan et al. [145] proposed the VGG16 architecture for object recognition and classification tasks. VGG16 has a replicative structure of convolution, ReLU, and pooling layers. The network architecture of VGG16 is shown in FIGURE 9.
The invention of residual networks created a revolution in the image recognition field. Residual Networks (ResNet) [146] is a classic network used as a backbone for many computer vision tasks. ResNet50 is a state-of-the-art convolutional neural network which has the identity mapping capability; it introduced the concept of skip connections.
Traditional CNN was successful in FER to some extent, but there were various limitations too. Those limitations were:
• It requires considerable skill and high experience in selecting the appropriate hyper-parameter values for CNN [121].
• It used a stochastic gradient descent method, which caused trouble in scaling to enormous data and also in learning NNs with multiple layers, because the gradients tended to decrease, the problem of ‘‘vanishing gradients’’ [121], [147].
• The last one concerns real-time environments: CNNs for human facial appearance are easily affected by various parameters like age, gender, face length, hair, mustache, and ethnicity [148]. Due to this, facial expressions have overlapping features, which makes the implementation difficult and complex [121], [149].
To overcome these barriers, ensemble learning was applied, because it helps in more accurate classification and prediction by concatenating the outputs of the base learning approaches. So, Dhankhar [124], [150] developed an ensemble model by combining state-of-the-art DL approaches such as VGG16 and ResNet50. He obtained a vector of weights from the second-to-last layers, which can be treated as feature vectors representing the latent representations of the input image. He then combined the above-mentioned representations by joining the feature vectors, which can be taken as input to logistic regression models to calculate the final emotion prediction [124]. After applying transfer learning, he trained and tested his model on the Karolinska Directed Emotional Faces (KDEF) dataset and got better accuracy compared to the individual models.
Another ensemble network was proposed by Wen et al. [121], where they ensembled CNNs with probability-based fusion for FER [151]. For each ensemble method, the multiplicity and diversity of the classifiers is considered a major concern in achieving comparable performance [152]. The CNN architecture was implemented using ReLU and multiple hidden layers (maxout layers) with random values to overcome the problem of calculating the stochastic gradient descents. They used a softmax classifier at the end in order to roughly calculate the probability of a test sample belonging to each class [121]. They trained and tested the model on the FER2013 and CK+ datasets, respectively, and the accuracy was 76.05%, which was better than other methods. Thus it can be concluded that ECNN consistently outperformed traditional CNN.
Alessandro Renda et al. [125] proposed a feed-forward CNN, inspired by Kim et al. [153], in which three convolutional and max-pooling layers with 32, 32, and 64 feature maps, respectively, were placed after the input layer. They used an ensemble learning approach to increase performance. The max-pooling layers consist of an overlapping kernel with size 3 × 3 and a stride of 2 × 2, which results in size halving. They added a dropout layer after the FC hidden layer, with 0.15 as the drop probability. An FC layer with almost 1024 neurons was proposed to yield the 7 classes of emotion in the FER2013 dataset. Their model has a network depth of 5 with 2,436,007 trainable parameters. They used ReLU (a non-linear function) as the activation function for both the convolutional and FC layers to add non-linearity, and the


softmax function at the output layer. They used the batch The aim of SVM is to identify the maximum margin plane
normalization [154] with each convolutional layer as well as between the classes. The maximum margin plane can be
the FC layers. To preserve the data, they used zero-padding obtained from the maximum distance between the positive
in the convolutional layers. They achieved an accuracy of and the negative margin plane, respectively of the two classes.
approx. 72.249 % with the ensemble of 9 networks. The distance between the separating plane and the positive
To solve the problem of poor performance in real appli- margin plane should be equal on both sides.
cations caused because of the stored facial images that most To solve the problem of recognizing emotions from facial
of the time show expression not as a single emotion but expressions in a simple and speeded manner, Datta et al. [122]
represents a multiple emotions, Gan et al. [126] designed an presented a classification system that used the concatenation
approach using CNN and soft label which associates with of geometric as well as texture-based features to classify the
multiple emotions and expressions. They obtained the soft emotions using SVMs. They have used the hierarchical SVM
labels using constructor involving 2 step scheme: architecture to leverage the benefits of multi-class binary
1) The initial step is to prepare a CNN model with hard classification. CK+ dataset was used for classification. They
data labels for supervision and the softmax function for have achieved the significant enhancements in the accuracy
optimization. using hybrid SVM features compared to LBP features.
2) The second step is to fuse the possibility of prediction Nuno Lopes et al. [78] in 2018 given a classification model
to get soft labels from the pre-trained models [126]. for FER in the elderly and also present the differences of FER
Their architecture is similar to VGG16, however, the last FC in the elderly and other age people. They used the Support
layer is adjusted as C-way yields, where quantity C is the Vector Machine with a multi-class classification for classi-
number of emotion classes. fying the emotions [159]. They proposed two architectures,
Zadeh et al. [127] proposed a DL model having a CNN the first approach removes the wrinkles, nasolabial fold, and
layer and 2 Gabor Filters to classify different human sen- other facial features, using edge-preserving smoothing tech-
timent. This model uses a feature selection method called niques. While in the second architecture, they introduced an
Gabor Filter, which is commonly applied for texture outline. algorithm from API Microsoft, which detects the age of the
It returns where there is any texture change in the image. person. The lifespan dataset was used to train and test the
Then these features are fed to a CNN (Convolutional Neural multi-class SVM. They used 80% images to test the accuracy
Network) for the classification of human sentiment. Their of the SVM and 20% to test the accuracy of the application.
model has the following stages- Input Images, resize, 1st They got an accuracy of 95.24% in the young age group and
Gabor Filter, 2nd Gabor Filter, CNN layer, and classification accuracy of 90.32% in the elderly age group.
of sentiments. They tested their model on the JAFFE dataset. SVM is a linear classifier that can be applied for linearly
They also compared the dataset classification using simple separable data. But SVM can take high dimensional data as
CNN and its model (CNN with 2 Gabor filters). They trained them for 30 epochs and obtained an accuracy of 91.16% with the plain CNN and 97.16% with their model [127].

2) SUPPORT VECTOR MACHINE (SVM)
SVM is a classifier [155] that was originally designed to separate two classes. If there are more than two classes, more than one SVM has to be implemented. There are three methods by which SVM can be extended to more than two classes.
• One versus all: It was proposed in [156]. It constructs k SVM models for training data having k classes; if there are three classes, an SVM is trained three times, once for every class [157].
• One versus one: It was introduced in [156]. This method constructs k(k − 1)/2 classifiers, where two classes at a time are taken to train the model, so an SVM is trained between every pair of classes that are to be classified [157].
• Directed Acyclic Graph SVM (DAGSVM): It was proposed in [158]. Its training phase is the same as in the one-versus-one method. The testing phase uses a rooted binary DAG having at most k(k − 1)/2 internal nodes and at most k leaves. An advantage of using a DAGSVM is that its generalization performance can be analyzed [158].

SVM often receives input that is non-linear, i.e., not linearly separable. In that case a mapping function is applied during SVM training that converts the data into a linearly separable form in a higher dimension; this function is called a kernel function. There are various kernel functions; [85] used the Radial Basis Function (RBF) kernel together with the one-versus-one approach.

Ibrahim Adeyanju et al. [118] proposed a method in which four SVM kernels are used to classify different facial emotions: Radial Basis, Polynomial, Linear, and Quadratic functions. They tested their model on 467 training and 238 test images to classify 7 emotions and obtained maximum average accuracies of 86.4% with the RBF kernel, 99.33% with the Quadratic function, 97.65% with the Polynomial, and 97.86% with the Linear kernel.

3) ARTIFICIAL NEURAL NETWORK (ANN)
ANN is inspired by the biological neural networks that constitute the brain [160]. The brain consists of neurons that form a neural network; these neurons are interconnected and process the signals travelling to and from the brain to the other parts of the body [161]. Such a link is called a synapse. There are approximately 100 billion neurons, interconnected by thousands or more synapses. In an ANN, the signal is a real or binary number and the output of these
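To make the multi-class SVM strategies above concrete, the following is a minimal scikit-learn sketch on synthetic 2-D features (an illustration only, not the setup of [85] or [118]): an RBF-kernel SVM trained one-versus-one on k = 3 classes exposes k(k − 1)/2 = 3 pairwise decision scores.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Three synthetic "emotion" classes as 2-D feature clusters
# (stand-ins for real facial features such as LBP or Gabor responses).
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(40, 2))
               for c in ([0.0, 0.0], [3.0, 0.0], [0.0, 3.0])])
y = np.repeat([0, 1, 2], 40)

# RBF kernel; decision_function_shape='ovo' exposes the k(k-1)/2
# pairwise classifiers that scikit-learn trains internally.
clf = SVC(kernel="rbf", decision_function_shape="ovo").fit(X, y)

print(clf.decision_function(X[:1]).shape[1])  # 3 pairwise scores for k = 3
print(clf.score(X, y))                        # training accuracy
```

With `decision_function_shape='ovr'` the same pairwise scores are instead aggregated into k per-class values, which corresponds to the one-versus-all view of the classifier's output.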

VOLUME 8, 2020 90511


K. Patel et al.: Facial Sentiment Analysis Using AI Techniques: State-of-the-Art, Taxonomies, and Challenges

neurons is obtained by applying an activation function to the sum of all inputs to that neuron [160]. The connection between two neurons/nodes is called an edge. This edge has a weight, which describes how strong the signal is. Normally, neurons are aggregated into layers. The first and last layer are the input and output layer, respectively; the in-between layers are called the hidden layers. The simple perceptron model given by W. S. McCulloch and W. Pitts is shown in FIGURE 10.

FIGURE 10. W. S. McCulloch and W. Pitts proposed a single neuron model as a mathematical model for an artificial neuron [160].

net = Σ_{j=1}^{n} w_j x_j − u   (15)
y = θ(net)   (16)
θ(x) = 1 / (1 + exp(−x))   (17)

where the n inputs x_1, x_2, . . . , x_n are given to the neuron, the weights w_1, w_2, . . . , w_n are assigned to the edges, u is a threshold, and θ is the activation function applied before producing the output. Positive weights correspond to excitatory synapses and negative weights to inhibitory synapses; here the activation function is the sigmoid of (17). This model does not reproduce the actual behavior of biological neurons.

Talele et al. [119] proposed a model using the General Regression Neural Network (GRNN), based on ANN, to classify emotions from an image. The proposed model has the following features: the input layer acts as feed to the subsequent layers; the pattern layer computes the Euclidean distance and the activation function; and the summation layer comprises the numerator and denominator parts, which are handled by the output layer. The fundamental principle on which the system works is the joint likelihood estimate of the input and the output, given below:

f(x, y) = [1 / ((2π)^{(d+1)/2} σ^{d+1})] · (1/n) Σ_{i=1}^{n} exp(−[(x − x_i)^T (x − x_i) + (y − y_i)^2] / (2σ^2))   (18)

where n is the number of observed samples, σ is the spread parameter, x_i is the i-th training vector, and y_i is the corresponding output value. The physical interpretation of the likelihood estimate is that it assigns a sample probability of width σ to each input and output sample.

4) DEEP BELIEF NETWORK (DBN)
A DBN is a probabilistic, unsupervised DL algorithm which comprises numerous stochastic latent variables; these latent variables are also called feature detectors. It is a hybrid graphical model whose two upper layers have undirected connections, while the lower layers have directed connections [162]. A DBN consists of a stack of Restricted Boltzmann Machines (RBMs) or auto-encoders, and it represents a data vector. The two most important properties of a DBN are:
• It utilizes a layer-by-layer learning approach, in which the weights of a layer depend on the layer above it (a top-down approach).
• A single bottom-up pass begins with an observed data vector and uses the weights of the separate layers to estimate the latent variables [163].

A DBN is pre-trained using a greedy algorithm: each layer is trained in turn, in an unsupervised manner. The multi-layer DBN is divided into various RBMs, which are learned sequentially. Pre-training is done for better optimization; fine-tuning is done because the features are modified so that the category boundaries come out right [164].

Lui et al. [116] proposed a novel model named Boosted Deep Belief Network to implement FER with enhanced performance. Their framework makes three main contributions. First, they build a model that consists of three stages: feature learning, feature selection, and classifier construction. Second, their proposed work uses a part-based representation rather than the whole facial region as input, which is highly suitable for expression analysis. Finally, they proposed a discriminative DL framework in which multiple DBNs are integrated and a boosting technique is applied. They used an experimental dataset named CK-DB, prepared from the first and last three frames of the well-known CK+ dataset, with a total of 1308 images. The accuracy of the model was found to be 96.7%.

Kurup [128] used five layers. The input layer has two nodes, and all classes are represented as 4-bit codes; the other layers have 3, 3, and 4 nodes with a sigmoid activation function. An unsupervised approach is used to train the first layer using contrastive divergence (CD), and then a softmax activation function is applied. At the end of the DBN, a fine-tuning backpropagation procedure was applied. The RBMs are trained layer by layer, and each RBM was trained individually five times. 5-fold cross-validation was used, with the training data divided into 5 groups. With this model, they obtained an accuracy of 98.57% on the CK+ dataset and 98.75% on the MMI database.

For FER, Yadan Lv et al. [117] proposed an approach via DL. Unconventional training of component detectors was done with a DBN and adjusted by logistic regression. After that, the parsed component features, including the eyes and mouth, are concentrated on for expression recognition. The main contributions of their work are that they were the first
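The single-neuron computation of (15)–(17) above can be written directly in NumPy; the weights and threshold below are illustrative values, not taken from the paper:

```python
import numpy as np

def neuron(x, w, u):
    """Single artificial neuron: net = sum_j w_j*x_j - u (Eq. 15),
    output y = theta(net) with sigmoid theta (Eqs. 16-17)."""
    net = np.dot(w, x) - u             # Eq. (15)
    return 1.0 / (1.0 + np.exp(-net))  # Eqs. (16)-(17)

x = np.array([1.0, 0.5, -0.25])  # inputs x_1..x_n (hypothetical)
w = np.array([0.4, -0.6, 0.2])   # positive = excitatory, negative = inhibitory
u = 0.1                          # threshold

print(neuron(x, w, u))             # a value in (0, 1)
print(neuron(np.zeros(3), w, u))   # net = -u -> sigmoid(-0.1) ~ 0.475
```

With all inputs zero and zero threshold the neuron outputs exactly 0.5, the midpoint of the sigmoid, which is a quick sanity check on the implementation.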


to use only facial components to recognize emotion, treated every single feature of the parsed components equally, and parsed the face via a DBN so that the images need not be pre-processed before extracting features. In simple words, their approach first detects the face, and then the nose, eyes, and mouth are used for expression recognition. Emotion classification was done by a stacked autoencoder classifier.

5) RECURRENT NEURAL NETWORK (RNN)
RNNs are an exciting twist on basic neural networks. They can take a series of inputs with no preset limit on its length, and they remember the past and make decisions based on past learning: RNNs remember the prior inputs while generating outputs. RNNs take one or more input vectors and produce output vectors that are influenced by hidden state vectors, which in turn depend on earlier inputs and outputs, as shown in FIGURE 11. They provide a smart method for managing sequential data that exhibits correlations between data points that are close in the sequence [165]. The information captured by an RNN depends upon the structure and training algorithm it implements. [166] in his work used an RNN, assuming the Euclidean metric, which records the distance between two frame sequences. They used an RNN with one hidden layer consisting of 150 unidirectional, fully interconnected Long Short-Term Memory cells, fed 1500 frame vectors to the input layer, and trained the RNN with the Adam optimizer with learning rate = 0.0001.

FIGURE 11. A typical RNN architecture.

Zhang et al. [36] proposed a PHRNN (Part-based Hierarchical bidirectional Recurrent Neural Network) to classify the different facial emotions. Their model extracts temporal facial features using the PHRNN and uses a multi-signal CNN (MSCNN) to extract facial emotion features from the still frames. Their model, a Spatial-Temporal Network, mainly consists of the PHRNN and the MSCNN, with the following stages of interest: PHRNN, MSCNN, and model fusion. They tested their model on Oulu-CASIA, MMI, and CK+; it performs well on all three and reduces the error rates of earlier trials. They also compared different models (CNN, MSCNN, PHRNN-MSCNN without sorting item, and PHRNN-MSCNN with sorting item), with assessed accuracies of 93.4%, 95.7%, 96.7%, and 98.5%, respectively. The results of their model were better than most of the other strategies.

FIGURE 12 shows the comparative analysis of the accuracies of various state-of-the-art approaches on different datasets.

VI. OPEN ISSUES AND RESEARCH CHALLENGES
FER has been an active research area in recent years. Various works have shown tremendous results and classified the emotions accurately. Yet there are several challenges and issues that are faced during facial sentiment analysis. In this section, we discuss the various issues and challenges faced by FER, based on our analysis of various survey papers.

A. OCCLUSION AND OCCLUDED DATA COLLECTION
Occlusion is the major obstacle in the way of automatic facial expression recognition. Most of the current works use the JAFFE and CK+ datasets, either without occlusion or with artificially occluded faces; there is a lack of datasets that include natural facial occlusion. Databases with occlusion therefore need to be created, which is a time-consuming and difficult task. Datasets should be prepared by deciding what or where the face should be occluded, and certain crucial parts of the image should not be occluded. The effective training and testing of occluded datasets still remains a big challenge [4].

It is still a challenging task to collect spontaneous expressions under occlusion. Everyday human emotions such as happiness, surprise, and sadness can be easily evoked, but emotions such as curiosity and attentiveness are still difficult to evoke, particularly under occlusion. Therefore, several strategies should be considered which induce emotions that are precise and context-dependent [26]. These strategies might bring implementation and selection challenges and limit the types of occluded data collected.

After the collection of an occluded dataset, its effective training and testing remains a big issue. The occluded region, the level of occlusion, the type of occlusion, and the components and materials present in the occluded region pose a challenge for the FER system. One way to counter this challenge is to use raw pixels of the occluded region, but then enough information on the specific features of the image may not be recorded. Reliably determining such parameters as type, materials, components, and location is a critical component of FER.

There are many ways to detect occlusion in an image. Still, most current feature extraction techniques extract features from the face directly without passing through a pre-processing layer of occlusion detection. Huang et al. [167] showed that accuracy is improved if a pre-processing layer for occlusion detection is included.

B. DATASETS IN FER
The other challenge in FER is the lack of proper training datasets in terms of both quality and quantity. The dataset


FIGURE 12. Accuracy of proposed models on different datasets vs year.

should include images of people of all age groups, as different age groups exhibit emotions differently. There are datasets that have images of a particular age group, but no dataset has a mixture of all the age groups [7]. Such a dataset, if developed, would assist research on cross-age, cross-culture, and cross-gender FER.

C. FER ON 3D DATA
Current research mainly focuses on 2D FER data, which faces challenges from illumination factors and pose variations [168]. 3D face shape models are naturally robust to pose variations and illumination factors. [169] in his work proposed a CNN without facial landmark detection, which estimates expression coefficients from image intensities. Recently, many works have been proposed that combine both 2D and 3D data to improve accuracy.

D. DIFFERENT MODALITIES IN FER
Facial expression is only one modality that can be used to recognize human behavior. Its combination with other patterns, such as infrared images, information captured from 3D models, and physiological data, is a trending research area because of their large complementarity with expressions. Reference [170] employed various multi-modal affect recognition techniques.

E. FER ON INFRARED DATA
At present, gray-scale and RGB images are the trend in deep FER, but they are vulnerable to lighting effects. Infrared images, however, record the emotions produced by the skin distribution, which is not sensitive to illumination variations. In 2017, Wu et al. [171] gave a 3D CNN architecture to fuse spatial and temporal features in FER images.

F. VISUALIZATION TECHNIQUES
Adding visualization techniques [172] over the CNN model results in a quantitative analysis of how it contributes to the visualization-based rules of FER and also figures out which part of the face carries more discerning information. The results indicate activations of filters with strong correlation to the facial mark regions that correspond to a particular Action Unit. In 2016, Mousavi et al. [173] used the concept of visualization techniques and proposed a new visualization technique, LIPNet.

G. OTHER ISSUES
Various other issues have arisen around the prototypical expression categories, namely the real-versus-fake emotion recognition challenge and the complementary emotion recognition problem. Also, apps for real-time FER are still a challenging task [174]. Many DL techniques have been applied to the above problems.

VII. CONCLUSION
This paper presents a detailed systematic survey of current state-of-the-art approaches for facial emotion recognition in static images and of the various parameters that influence the results of these approaches. We have developed a taxonomy based on the different methods used for face detection, feature extraction, and emotion classification. The various facial expression databases used as input for FER are discussed. We have reviewed previous works in this field and found that much work has already been done. We have compared various detection, extraction, and classification approaches and identified which approaches are more prominent in achieving better performance with the available computation power. By discussing current issues and research challenges, we conclude that much research is still needed in this field, such as FER on 3D face shape models and recognizing emotion in images under occlusion. Real-time FER is still a challenging task. In the future, we would like to survey the FER problem in videos using more advanced DL techniques.


REFERENCES
[1] A. Kumari, S. Tanwar, S. Tyagi, and N. Kumar, ‘‘Fog computing for healthcare 4.0 environment: Opportunities and challenges,’’ Comput. Electr. Eng., vol. 72, pp. 1–13, Nov. 2018.
[2] J. Hathaliya, P. Sharma, S. Tanwar, and R. Gupta, ‘‘Blockchain-based remote patient monitoring in healthcare 4.0,’’ in Proc. IEEE 9th Int. Conf. Adv. Comput. (IACC), Dec. 2019, pp. 87–91.
[3] J. Vora, P. DevMurari, S. Tanwar, S. Tyagi, N. Kumar, and M. S. Obaidat, ‘‘Blind signatures based secured E-Healthcare system,’’ in Proc. Int. Conf. Comput., Inf. Telecommun. Syst. (CITS), Jul. 2018, pp. 1–5.
[4] L. Zhang, B. Verma, D. Tjondronegoro, and V. Chandran, ‘‘Facial expression analysis under partial occlusion: A survey,’’ ACM Comput. Surv., vol. 51, Apr. 2018.
[5] D. Matsumoto, ‘‘More evidence for the universality of a contempt expression,’’ Motivat. Emotion, vol. 16, no. 4, pp. 363–368, Dec. 1992.
[6] T. Amano, ‘‘Coded facial expression,’’ in Proc. SIGGRAPH ASIA Emerg. Technol., New York, NY, USA, 2016, pp. 1–2.
[7] S. Li and W. Deng, ‘‘Deep facial expression recognition: A survey,’’ 2018, arXiv:1804.08348. [Online]. Available: [Link]
[8] T. M. Abhishree, J. Latha, K. Manikantan, and S. Ramachandran, ‘‘Face recognition using Gabor filter based feature extraction with anisotropic diffusion as a pre-processing technique,’’ Procedia Comput. Sci., vol. 45, pp. 312–321, Jan. 2015.
[9] R. Gupta, S. Tanwar, S. Tyagi, and N. Kumar, ‘‘Tactile Internet and its applications in 5G era: A comprehensive review,’’ Int. J. Commun. Syst., vol. 32, no. 14, Sep. 2019, Art. no. e3981.
[10] R. Gupta, S. Tanwar, S. Tyagi, N. Kumar, M. S. Obaidat, and B. Sadoun, ‘‘HaBiTs: Blockchain-based telesurgery framework for healthcare 4.0,’’ in Proc. Int. Conf. Comput., Inf. Telecommun. Syst. (CITS), Aug. 2019, pp. 1–5.
[11] J. Vora, A. Nayyar, S. Tanwar, S. Tyagi, N. Kumar, M. S. Obaidat, and J. J. P. C. Rodrigues, ‘‘BHEEM: A blockchain-based framework for securing electronic health records,’’ in Proc. IEEE Globecom Workshops (GC Wkshps), Dec. 2018, pp. 1–6.
[12] J. J. Hathaliya, S. Tanwar, S. Tyagi, and N. Kumar, ‘‘Securing electronics healthcare records in healthcare 4.0: A biometric-based approach,’’ Comput. Electr. Eng., vol. 76, pp. 398–410, Jun. 2019.
[13] S. Tanwar, K. Parekh, and R. Evans, ‘‘Blockchain-based electronic healthcare record system for healthcare 4.0 applications,’’ J. Inf. Secur. Appl., vol. 50, Feb. 2020, Art. no. 102407.
[14] R. Gupta, S. Tanwar, F. Al-Turjman, P. Italiya, A. Nauman, and S. W. Kim, ‘‘Smart contract privacy protection using AI in cyber-physical systems: Tools, techniques and challenges,’’ IEEE Access, vol. 8, pp. 24746–24772, 2020.
[15] G. Hemalatha and C. P. Sumathi, ‘‘A study of techniques for facial detection and expression classification,’’ Int. J. Comput. Sci. Eng. Surv., vol. 5, no. 2, pp. 27–37, Apr. 2014.
[16] D. Deodhare, Facial Expressions to Emotions: A Study of Computational Paradigms for Facial Emotion Recognition. New Delhi, India: Springer, 2015, pp. 173–198.
[17] U. Asad, N. Kashyap, and S. N. Singh, ‘‘Recent advancements in facial expression recognition systems: A survey,’’ in Proc. Int. Conf. Comput., Commun. Autom. (ICCCA), May 2017, pp. 1203–1208.
[18] A. Baskar and T. G. Kumar, ‘‘Facial expression classification using machine learning approach: A review,’’ in Data Engineering and Intelligent Computing. Singapore: Springer, 2018, pp. 337–345.
[19] K. Chengeta and S. Viriri, ‘‘Facial expression recognition: A survey on local binary and local directional patterns,’’ in Proc. Int. Conf. Comput. Collective Intell. Cham, Switzerland: Springer, 2018, pp. 513–522.
[20] G. Rajeswari and P. IthayaRani, ‘‘Literature survey on facial expression recognition techniques,’’ in Proc. 3rd Int. Conf. Commun. Electron. Syst. (ICCES), Oct. 2018, pp. 137–142.
[21] B. Martinez, M. F. Valstar, B. Jiang, and M. Pantic, ‘‘Automatic analysis of facial actions: A survey,’’ IEEE Trans. Affect. Comput., vol. 10, no. 3, pp. 325–347, Jul. 2019.
[22] S. Bhattacharya and M. Gupta, ‘‘A survey on: Facial emotion recognition invariant to pose, illumination and age,’’ in Proc. 2nd Int. Conf. Adv. Comput. Commun. Paradigms (ICACCP), Feb. 2019, pp. 1–6.
[23] A. S. Vyas, H. B. Prajapati, and V. K. Dabhi, ‘‘Survey on face expression recognition using CNN,’’ in Proc. 5th Int. Conf. Adv. Comput. Commun. Syst. (ICACCS), Mar. 2019, pp. 102–106.
[24] S. Li and W. Deng, ‘‘Deep facial expression recognition: A survey,’’ IEEE Trans. Affect. Comput., early access, Mar. 17, 2020, doi: 10.1109/TAFFC.2020.2981446.
[25] A. Fathima and K. Vaidehi, ‘‘Review on facial expression recognition system using machine learning techniques,’’ in Advances in Decision Sciences, Image Processing, Security and Computer Vision. Berlin, Germany: Springer, 2020, pp. 608–618.
[26] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, ‘‘A survey of affect recognition methods: Audio, visual, and spontaneous expressions,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 1, pp. 39–58, Jan. 2009.
[27] E. Sariyanidi, H. Gunes, and A. Cavallaro, ‘‘Automatic analysis of facial affect: A survey of registration, representation, and recognition,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 6, pp. 1113–1133, Jun. 2015.
[28] P. Bhattacharya, S. Tanwar, U. Bodke, S. Tyagi, and N. Kumar, ‘‘BinDaaS: Blockchain-based deep-learning as-a-Service in healthcare 4.0 applications,’’ IEEE Trans. Netw. Sci. Eng., early access, Dec. 25, 2019, doi: 10.1109/TNSE.2019.2961932.
[29] J. Vora, D. Vekaria, S. Tanwar, and S. Tyagi, ‘‘Machine learning-based voltage dip measurement of smart energy meter,’’ in Proc. 5th Int. Conf. Parallel, Distrib. Grid Comput. (PDGC), Dec. 2018, pp. 828–832.
[30] J. N. Bassili, ‘‘Facial motion in the perception of faces and of emotional expression,’’ J. Experim. Psychol., Hum. Perception Perform., vol. 4, no. 3, pp. 373–379, 1978.
[31] C. Padgett and G. W. Cottrell, ‘‘Representing face images for emotion classification,’’ in Proc. NIPS, 1996, pp. 894–900.
[32] G. Guo, S. Z. Li, and K. Chan, ‘‘Face recognition by support vector machines,’’ in Proc. 4th IEEE Int. Conf. Autom. Face Gesture Recognit., Mar. 2000, pp. 196–201.
[33] M. Matsugu, K. Mori, Y. Mitari, and Y. Kaneda, ‘‘Subject independent facial expression recognition with robust face detection using a convolutional neural network,’’ Neural Netw., vol. 16, nos. 5–6, pp. 555–559, Jun. 2003.
[34] I. Kotsia and I. Pitas, ‘‘Facial expression recognition in image sequences using geometric deformation features and support vector machines,’’ IEEE Trans. Image Process., vol. 16, no. 1, pp. 172–187, Jan. 2007.
[35] S. Ebrahimi Kahou, V. Michalski, K. Konda, R. Memisevic, and C. Pal, ‘‘Recurrent neural networks for emotion recognition in video,’’ in Proc. ACM Int. Conf. Multimodal Interact. (ICMI), 2015, pp. 467–474.
[36] K. Zhang, Y. Huang, Y. Du, and L. Wang, ‘‘Facial expression recognition based on deep evolutional spatial-temporal networks,’’ IEEE Trans. Image Process., vol. 26, no. 9, pp. 4193–4203, Sep. 2017.
[37] W. Zheng, X. Zhou, C. Zou, and L. Zhao, ‘‘Facial expression recognition using kernel canonical correlation analysis (KCCA),’’ IEEE Trans. Neural Netw., vol. 17, no. 1, pp. 233–238, Jan. 2006.
[38] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, ‘‘The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression,’’ in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Workshops, Jun. 2010, pp. 94–101.
[39] M. Pantic, M. Valstar, R. Rademaker, and L. Maat, ‘‘Web-based database for facial expression analysis,’’ in Proc. IEEE Int. Conf. Multimedia Expo, Jul. 2005, p. 5.
[40] G. Zhao, X. Huang, M. Taini, S. Z. Li, and M. Pietikäinen, ‘‘Facial expression recognition from near-infrared videos,’’ Image Vis. Comput., vol. 29, no. 9, pp. 607–619, Aug. 2011.
[41] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker, ‘‘Multi-PIE,’’ Image Vis. Comput., vol. 28, no. 5, pp. 807–813, May 2010.
[42] O. Langner, R. Dotsch, G. Bijlstra, D. H. J. Wigboldus, S. T. Hawk, and A. van Knippenberg, ‘‘Presentation and validation of the Radboud faces database,’’ Cognition Emotion, vol. 24, no. 8, pp. 1377–1388, Dec. 2010.
[43] N. Aifanti, C. Papachristou, and A. Delopoulos, ‘‘The MUG facial expression database,’’ in Proc. 11th Int. Workshop Image Anal. Multimedia Interact. Services (WIAMIS), Apr. 2010, pp. 1–4.
[44] J. M. Susskind, A. K. Anderson, and G. E. Hinton, ‘‘The Toronto face database,’’ Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada, Tech. Rep., 2010, vol. 3.
[45] I. J. Goodfellow et al., ‘‘Challenges in representation learning: A report on three machine learning contests,’’ Neural Netw., vol. 64, pp. 59–63, Apr. 2015.
[46] P. Viola and M. Jones, ‘‘Rapid object detection using a boosted cascade of simple features,’’ in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Dec. 2001, p. 1.
[47] B. Xiao, ‘‘Principal component analysis for feature extraction of image sequence,’’ in Proc. Int. Conf. Comput. Commun. Technol. Agricult. Eng., Jun. 2010, pp. 250–253.


[48] S. Tanwar, T. Ramani, and S. Tyagi, ‘‘Dimensionality reduction using PCA and SVD in big data: A comparative case study,’’ in Future Internet Technologies and Trends, Z. Patel and S. Gupta, Eds. Cham, Switzerland: Springer, 2018, pp. 116–125.
[49] G. Kumar and P. K. Bhatia, ‘‘A detailed review of feature extraction in image processing systems,’’ in Proc. 4th Int. Conf. Adv. Comput. Commun. Technol., Feb. 2014, pp. 5–12.
[50] N. Janu, S. Kumar, and P. Mathur, ‘‘Performance analysis of feature extraction techniques for facial expression recognition,’’ Int. J. Comput. Appl., vol. 166, no. 1, pp. 1–3, 2017.
[51] K. Cho and S. M. Dunn, ‘‘Learning shape classes,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 9, pp. 882–888, Sep. 1994.
[52] K.-C. Song, Y.-H. Yan, W.-H. Chen, and X. Zhang, ‘‘Research and perspective on local binary pattern,’’ Acta Automatica Sinica, vol. 39, no. 6, pp. 730–744, Mar. 2014.
[53] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, ‘‘Gradient-based learning applied to document recognition,’’ Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[54] R. S. Jadhav and P. Ghadekar, ‘‘Content based facial emotion recognition model using machine learning algorithm,’’ in Proc. Int. Conf. Adv. Comput. Telecommun. (ICACAT), Dec. 2018, pp. 1–5.
[55] B. Kitchenham, O. Pearl Brereton, D. Budgen, M. Turner, J. Bailey, and S. Linkman, ‘‘Systematic literature reviews in software engineering—A systematic literature review,’’ Inf. Softw. Technol., vol. 51, no. 1, pp. 7–15, Jan. 2009.
[56] P. Mehta, R. Gupta, and S. Tanwar, ‘‘Blockchain envisioned UAV networks: Challenges, solutions, and comparisons,’’ Comput. Commun., vol. 151, pp. 518–538, Feb. 2020.
[57] B. Kitchenham and S. Charters, ‘‘Guidelines for performing systematic literature reviews in software engineering,’’ School Comput. Sci. Math., Keele Univ., Keele, U.K., Tech. Rep. EBSE-2007-01, 2007.
[58] Japanese Female Facial Expressions (JAFFE). Accessed: 1998. [Online]. Available: [Link]
[59] Extended Cohn-Kanade (CK+). Accessed: 2008. [Online]. Available: [Link]
[60] MMI. Accessed: 2005. [Online]. Available: [Link]
[61] Oulu-CASIA. Accessed: Nov. 17, 2011. [Online]. Available: [Link]/CMV/Downloads/Oulu-CASIA/
[62] Multi-PIE. Accessed: Oct. 2009. [Online]. Available: [Link]edu/afs/cs/project/PIE/MultiPie/Multi-Pie/[Link]
[63] Multimedia Understanding Group (MUG). Accessed: Apr. 2010. [Online]. Available: [Link]
[64] Toronto Faces Dataset (TFD). Accessed: Apr. 2005. [Online]. Available: [Link]
[65] Radboud Faces Database (RaFD). Accessed: 2011. [Online]. Available: [Link]
[66] FER-2013. Accessed: 2013. [Online]. Available: [Link]c/challenges-in-representation-learning-facial-expression-recognition-challenge/data
[67] A. Dhall, R. Goecke, S. Lucey, and T. Gedeon, ‘‘Static facial expression analysis in tough conditions: Data, evaluation protocol and benchmark,’’ in Proc. IEEE Int. Conf. Comput. Vis. Workshops (ICCV Workshops), Nov. 2011, pp. 2106–2112.
[68] SFEW (EmotiW). Accessed: 2012. [Online]. Available: [Link] [Link]
[69] A. Mollahosseini, B. Hasani, and M. H. Mahoor, ‘‘AffectNet: A database for facial expression, valence, and arousal computing in the wild,’’ IEEE Trans. Affect. Comput., vol. 10, no. 1, pp. 18–31, Jan. 2019.
[70] AffectNet. Accessed: 2017. [Online]. Available: [Link]com/affectnet/
[71] R. Kosti, J. Alvarez, A. Recasens, and A. Lapedriza, ‘‘Context based emotion recognition using EMOTIC dataset,’’ IEEE Trans. Pattern Anal. Mach. Intell., early access, May 14, 2019, doi: 10.1109/TPAMI.2019.2916866.
[72] Context is Important to Recognize Emotions. Accessed: 2020. [Online]. Available: [Link]
[73] M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, ‘‘Coding facial expressions with Gabor wavelets,’’ in Proc. 3rd IEEE Int. Conf. Autom. Face Gesture Recognit., Apr. 1998, pp. 200–205.
[74] M. F. Valstar and M. Pantic, ‘‘Induced disgust, happiness and surprise: An addition to the MMI facial expression database,’’ Tech. Rep., 2010.
[75] J. Jayalekshmi and T. Mathew, ‘‘Facial expression recognition and emotion classification system for sentiment analysis,’’ in Proc. Int. Conf. Netw. Adv. Comput. Technol. (NetACT), Jul. 2017, pp. 1–8.
[76] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, ‘‘A convolutional neural network cascade for face detection,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 5325–5334.
[77] H. Ding, S. K. Zhou, and R. Chellappa, ‘‘FaceNet2ExpNet: Regularizing a deep face recognition net for expression recognition,’’ in Proc. 12th IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), May 2017, pp. 118–126.
[78] N. Lopes, A. Silva, S. R. Khanal, A. Reis, J. Barroso, V. Filipe, and J. Sampaio, ‘‘Facial emotion recognition in the elderly using a SVM classifier,’’ in Proc. 2nd Int. Conf. Technol. Innov. Sports, Health Wellbeing (TISHW), Thessaloniki, Greece, Jun. 2018, pp. 1–5.
[79] M. N. Chaudhari, M. Deshmukh, G. Ramrakhiani, and R. Parvatikar, ‘‘Face detection using Viola Jones algorithm and neural networks,’’ in Proc. 4th Int. Conf. Comput. Commun. Control Autom. (ICCUBEA), Aug. 2018, pp. 1–6.
[80] N. B. Kar, K. S. Babu, A. K. Sangaiah, and S. Bakshi, ‘‘Face expression recognition system based on ripplet transform type II and least square SVM,’’ Multimedia Tools Appl., vol. 78, no. 4, pp. 4789–4812, Feb. 2019.
[81] H. M. Shah, A. Dinesh, and T. S. Sharmila, ‘‘Analysis of facial landmark features to determine the best subset for finding face orientation,’’ in Proc. Int. Conf. Comput. Intell. Data Sci. (ICCIDS), Feb. 2019, pp. 1–4.
[82] Y.-Q. Wang, ‘‘An analysis of the Viola-Jones face detection algorithm,’’ Image Process. Line, vol. 4, pp. 128–148, Jun. 2014.
[83] W.-Y. Lu and M. Yang, ‘‘Face detection based on Viola-Jones algorithm applying composite features,’’ in Proc. Int. Conf. Robots Intell. Syst. (ICRIS), Jun. 2019, pp. 82–85.
[84] B. Islam, F. Mahmud, and A. Hossain, ‘‘Facial expression region segmentation based approach to emotion recognition using 2D Gabor filter and multiclass support vector machine,’’ in Proc. 21st Int. Conf. Comput. Inf. Technol. (ICCIT), Dec. 2018, pp. 1–6.
[85] Y. Luo, C.-M. Wu, and Y. Zhang, ‘‘Facial expression recognition based on fusion feature of PCA and LBP with SVM,’’ Optik Int. J. Light Electron Opt., vol. 124, no. 17, pp. 2767–2770, Sep. 2013.
[86] Carnap, Hilbert, Ackermann, Russell, and Whitehead, ‘‘A logical calculus of the ideas immanent in nervous activity,’’ Tech. Rep., Jan. 1970.
[87] S.-S. Liu and Y.-T. Tian, ‘‘Facial expression recognition method based on Gabor wavelet features and fractional power polynomial kernel PCA,’’ in Advances in Neural Networks—ISNN, L. Zhang, B.-L. Lu, and J. Kwok, Eds. Berlin, Germany: Springer, 2010, pp. 144–151.
[88] H.-F. Huang and S.-C. Tai, ‘‘Facial expression recognition using new feature extraction algorithm,’’ ELCVIA Electron. Lett. Comput. Vis. Image Anal., vol. 11, no. 1, p. 41, 2012.
[89] S. Biswas and J. Sil, ‘‘Facial expression recognition using modified local binary pattern,’’ in Computational Intelligence in Data Mining, vol. 2, L. C. Jain, H. S. Behera, J. K. Mandal, and D. P. Mohapatra, Eds. New Delhi, India: Springer, 2015, pp. 595–604.
[90] S. Chickerur, T. Reddy, and O. Shabalina, ‘‘Parallel scale invariant feature transform based approach for facial expression recognition,’’ in Creativity in Intelligent Technologies and Data Science, A. Kravets, M. Shcherbakov, M. Kultsova, and O. Shabalina, Eds. Cham, Switzerland: Springer, 2015, pp. 621–636.
[91] A. Mollahosseini, D. Chan, and M. H. Mahoor, ‘‘Going deeper in facial expression recognition using deep neural networks,’’ in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Lake Placid, NY, USA, Mar. 2016, pp. 1–10.
[92] N. Mehta and S. Jadhav, ‘‘Facial emotion recognition using log Gabor filter and PCA,’’ in Proc. Int. Conf. Comput. Commun. Control Autom. (ICCUBEA), Aug. 2016, pp. 1–5.
[93] M. Sajjad, A. Shah, Z. Jan, S. I. Shah, S. W. Baik, and I. Mehmood, ‘‘Facial appearance and texture feature-based robust facial expression recognition framework for sentiment knowledge discovery,’’ Cluster Comput., vol. 21, no. 1, pp. 549–567, Mar. 2018.
[94] A. Srivastava, S. Mane, A. Shah, N. Shrivastava, and B. Thakare, ‘‘A survey of face detection algorithms,’’ in Proc. Int. Conf. Inventive Syst. Control (ICISC), Jan. 2017, pp. 1–4.
[95] R. Ravi, S. V. Yadhukrishna, and R. Prithviraj, ‘‘A face expression recognition using CNN & LBP,’’ in Proc. 4th Int. Conf. Comput. Methodologies Commun. (ICCMC), Mar. 2020, pp. 684–689.
[96] A. L. A. Ramos, B. G. Dadiz, and A. B. G. Santos, ‘‘Classifying emotion based on facial expression analysis using Gabor filter: A basis for adaptive effective teaching strategy,’’ in Computational Science and Technology. Singapore: Springer, 2020, pp. 469–479.
[97] T. Ojala, M. Pietikainen, and T. Maenpaa, ‘‘Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987, Jul. 2002.

90516 VOLUME 8, 2020


K. Patel et al.: Facial Sentiment Analysis Using AI Techniques: State-of-the-Art, Taxonomies, and Challenges

[98] V. Takala, T. Ahonen, and M. Pietikäinen, "Block-based methods for image retrieval using local binary patterns," in Image Analysis, H. Kälviäinen, J. Parkkinen, and A. Kaarna, Eds. Berlin, Germany: Springer, 2005, pp. 882–891.
[99] T. Ahonen, A. Hadid, and M. Pietikainen, "Face description with local binary patterns: Application to face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, pp. 2037–2041, Dec. 2006.
[100] M. Guo, X. Hou, Y. Ma, and X. Wu, "Facial expression recognition using ELBP based on covariance matrix transform in KLT," Multimedia Tools Appl., vol. 76, no. 2, pp. 2995–3010, Jan. 2017.
[101] D. Huang, C. Shan, M. Ardabilian, Y. Wang, and L. Chen, "Local binary patterns and its application to facial image analysis: A survey," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 41, no. 6, pp. 765–781, Nov. 2011.
[102] K. Verma and A. Khunteta, "Facial expression recognition using Gabor filter and multi-layer artificial neural network," in Proc. Int. Conf. Inf., Commun., Instrum. Control (ICICIC), Aug. 2017, pp. 1–5.
[103] J. Ilonen, J. Kämäräinen, and H. Kälviäinen, "Efficient computation of Gabor features," Dept. Inf. Technol., Lappeenranta Univ. Technol., Lappeenranta, Finland, Res. Rep. 100.
[104] A. B. Watson, "Image compression using the discrete cosine transform," Math. J., vol. 4, no. 1, p. 81, 1994.
[105] E. Feig and S. Winograd, "Fast algorithms for the discrete cosine transform," IEEE Trans. Signal Process., vol. 40, no. 9, pp. 2174–2193, Sep. 1992.
[106] S. Dabbaghchian, A. Aghagolzadeh, and M. S. Moin, "Feature extraction using discrete cosine transform for face recognition," in Proc. 9th Int. Symp. Signal Process. Appl., Feb. 2007, pp. 1–4.
[107] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Trans. Comput., vol. 100, no. 1, pp. 90–93, 1974.
[108] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, Nov. 2004.
[109] D. G. Lowe, "Object recognition from local scale-invariant features," in Proc. 7th IEEE Int. Conf. Comput. Vis., Sep. 1999, pp. 1150–1157.
[110] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2005, pp. 886–893.
[111] T. Nguyen, E.-A. Park, J. Han, D.-C. Park, and S.-Y. Min, "Object detection using scale invariant feature transform," in Genetic and Evolutionary Computing. Cham, Switzerland: Springer, 2014, pp. 65–72.
[112] S. Tanwar, J. Vora, S. Kaneriya, S. Tyagi, N. Kumar, V. Sharma, and I. You, "Human arthritis analysis in fog computing environment using Bayesian network classifier and thread protocol," IEEE Consum. Electron. Mag., vol. 9, no. 1, pp. 88–94, Jan. 2020.
[113] X. Xiong and F. De la Torre, "Supervised descent method and its applications to face alignment," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 532–539.
[114] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded up robust features," in Proc. Eur. Conf. Comput. Vis. Berlin, Germany: Springer, 2006, pp. 404–417.
[115] B. Islam, F. Mahmud, A. Hossain, P. B. Goala, and M. S. Mia, "A facial region segmentation based approach to recognize human emotion using fusion of HOG & LBP features and artificial neural network," in Proc. 4th Int. Conf. Electr. Eng. Inf. Commun. Technol. (iCEEiCT), Sep. 2018, pp. 642–646.
[116] P. Liu, S. Han, Z. Meng, and Y. Tong, "Facial expression recognition via a boosted deep belief network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 1805–1812.
[117] Y. Lv, Z. Feng, and C. Xu, "Facial expression recognition via deep learning," in Proc. Int. Conf. Smart Comput., Nov. 2014, pp. 303–308.
[118] I. A. Adeyanju, E. O. Omidiora, and O. F. Oyedokun, "Performance evaluation of different support vector machine kernels for face emotion recognition," in Proc. SAI Intell. Syst. Conf. (IntelliSys), Nov. 2015, pp. 804–806.
[119] K. Talele, A. Shirsat, T. Uplenchwar, and K. Tuckley, "Facial expression recognition using general regression neural network," in Proc. IEEE Bombay Sect. Symp. (IBSS), Dec. 2016, pp. 1–6.
[120] P. Khorrami, T. L. Paine, and T. S. Huang, "Do deep neural networks learn facial action units when doing expression recognition?" in Proc. IEEE Int. Conf. Comput. Vis. Workshop (ICCVW), Dec. 2015, pp. 19–27.
[121] G. Wen, Z. Hou, H. Li, D. Li, L. Jiang, and E. Xun, "Ensemble of deep neural networks with probability-based fusion for facial expression recognition," Cognit. Comput., vol. 9, no. 5, pp. 597–610, Oct. 2017.
[122] S. Datta, D. Sen, and R. Balasubramanian, "Integrating geometric and textural features for facial emotion classification using SVM frameworks," in Proc. CVIP, 2016, pp. 619–628.
[123] J. Cai, Z. Meng, A. S. Khan, Z. Li, J. O'Reilly, and Y. Tong, "Island loss for learning discriminative features in facial expression recognition," in Proc. 13th IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), May 2018, pp. 302–309.
[124] P. Dhankhar, "ResNet-50 and VGG-16 for recognizing facial emotions," Int. J. Innov. Eng. Technol., vol. 13, no. 4, pp. 126–130, 2019.
[125] A. Renda, M. Barsacchi, A. Bechini, and F. Marcelloni, "Comparing ensemble strategies for deep learning: An application to facial expression recognition," Expert Syst. Appl., vol. 136, pp. 1–11, Dec. 2019.
[126] Y. Gan, J. Chen, and L. Xu, "Facial expression recognition boosted by soft label with a diverse ensemble," Pattern Recognit. Lett., vol. 125, pp. 105–112, Jul. 2019.
[127] M. M. Taghi Zadeh, M. Imani, and B. Majidi, "Fast facial emotion recognition using convolutional neural networks and Gabor filters," in Proc. 5th Conf. Knowl. Based Eng. Innov. (KBEI), Feb. 2019, pp. 577–581.
[128] A. Rajendra Kurup, M. Ajith, and M. Martínez Ramón, "Semi-supervised facial expression recognition using reduced spatial features and deep belief networks," Neurocomputing, vol. 367, pp. 188–197, Nov. 2019.
[129] E. Pranav, S. Kamal, C. Satheesh Chandran, and M. H. Supriya, "Facial emotion recognition using deep convolutional neural network," in Proc. 6th Int. Conf. Adv. Comput. Commun. Syst. (ICACCS), Mar. 2020, pp. 317–320.
[130] R. Gupta, S. Tanwar, S. Tyagi, and N. Kumar, "Machine learning models for secure data analytics: A taxonomy and threat model," Comput. Commun., vol. 153, pp. 406–440, Mar. 2020.
[131] M. Mathieu, M. Henaff, and Y. LeCun, "Fast training of convolutional networks through FFTs," 2013, arXiv:1312.5851. [Online]. Available: [Link]
[132] S. Anwar, K. Hwang, and W. Sung, "Structured pruning of deep convolutional neural networks," ACM J. Emerg. Technol. Comput. Syst., vol. 13, no. 3, pp. 1–18, May 2017.
[133] D. Mungra, A. Agrawal, P. Sharma, S. Tanwar, and M. S. Obaidat, "PRATIT: A CNN-based emotion recognition system using histogram equalization and data augmentation," Multimedia Tools Appl., vol. 79, nos. 3–4, pp. 2285–2307, Jan. 2020.
[134] D. Scherer, A. C. Müller, and S. Behnke, "Evaluation of pooling operations in convolutional architectures for object recognition," in Proc. ICANN, 2010, pp. 92–101.
[135] M. D. Zeiler and R. Fergus, "Stochastic pooling for regularization of deep convolutional neural networks," 2013, arXiv:1301.3557. [Online]. Available: [Link]
[136] Q. Zhao, S. Lyu, B. Zhang, and W. Feng, "Multiactivation pooling method in convolutional neural networks for image recognition," Wireless Commun. Mobile Comput., vol. 2018, Jun. 2018, Art. no. 8196906.
[137] C.-L. Zhang, J.-H. Luo, X.-S. Wei, and J. Wu, "In defense of fully connected layers in visual representation transfer," in Proc. PCM, 2017, pp. 807–817.
[138] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. NIPS, 2012, pp. 1097–1105.
[139] S. Albawi, T. A. Mohammed, and S. Al-Zawi, "Understanding of a convolutional neural network," in Proc. Int. Conf. Eng. Technol. (ICET), Aug. 2017, pp. 1–6.
[140] Y. Sun, W. Zhang, H. Gu, C. Liu, S. Hong, W. Xu, J. Yang, and G. Gui, "Convolutional neural network based models for improving super-resolution imaging," IEEE Access, vol. 7, pp. 43042–43051, 2019.
[141] R. Memisevic, K. R. Konda, and D. Krueger, "Zero-bias autoencoders and the benefits of co-adapting features," 2014, arXiv:1402.3337. [Online]. Available: [Link]
[142] T. L. Paine, P. Khorrami, W. Han, and T. S. Huang, "An analysis of unsupervised pre-training in light of recent advances," 2014, arXiv:1412.6597. [Online]. Available: [Link]
[143] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017.
[144] A. Gudi, "Recognizing semantic features in faces using deep learning," 2015, arXiv:1512.00743. [Online]. Available: [Link]1512.00743
[145] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, arXiv:1409.1556. [Online]. Available: [Link]
[146] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.


[147] D. Eigen, J. T. Rolfe, R. Fergus, and Y. LeCun, "Understanding deep architectures using a recursive convolutional network," 2013, arXiv:1312.1847. [Online]. Available: [Link]
[148] G. Sandbach, S. Zafeiriou, M. Pantic, and L. Yin, "Static and dynamic 3D facial expression recognition: A comprehensive survey," Image Vis. Comput., vol. 30, no. 10, pp. 683–697, Oct. 2012.
[149] S. Wan and J. K. Aggarwal, "Spontaneous facial expression recognition: A robust metric learning approach," Pattern Recognit., vol. 47, no. 5, pp. 1859–1868, May 2014.
[150] P. Thakkar, K. Varma, V. Ukani, S. Mankad, and S. Tanwar, "Combining user-based and item-based collaborative filtering using machine learning," in Information and Communication Technology for Intelligent Systems, S. C. Satapathy and A. Joshi, Eds. Singapore: Springer, 2019, pp. 173–180.
[151] S. Kaneriya, S. Tanwar, S. Buddhadev, J. P. Verma, S. Tyagi, N. Kumar, and S. Misra, "A range-based approach for long-term forecast of weather using probabilistic Markov model," in Proc. IEEE Int. Conf. Commun. Workshops (ICC Workshops), May 2018, pp. 1–6.
[152] R. Lysiak, M. Kurzynski, and T. Woloszynski, "Optimal selection of ensemble classifiers using measures of competence and diversity of base classifiers," Neurocomputing, vol. 126, pp. 29–35, Feb. 2014.
[153] B.-K. Kim, S.-Y. Dong, J. Roh, G. Kim, and S.-Y. Lee, "Fusing aligned and non-aligned face information for automatic affect recognition in the wild: A deep learning approach," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2016, pp. 1499–1508.
[154] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," 2015, arXiv:1502.03167. [Online]. Available: [Link]03167
[155] V. Vapnik, The Nature of Statistical Learning Theory. New York, NY, USA: Springer, 2013.
[156] S. Knerr, L. Personnaz, and G. Dreyfus, "Single-layer learning revisited: A stepwise procedure for building and training a neural network," in Neurocomputing (NATO ASI Series), vol. F68, F. F. Soulié and J. Hérault, Eds. Berlin, Germany: Springer-Verlag, 1990, pp. 41–50.
[157] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 415–425, Mar. 2002.
[158] J. C. Platt, N. Cristianini, and J. Shawe-Taylor, "Large margin DAGs for multiclass classification," in Proc. NIPS, 1999, pp. 547–553.
[159] S. Tanwar, Q. Bhatia, P. Patel, A. Kumari, P. K. Singh, and W.-C. Hong, "Machine learning adoption in blockchain-based smart applications: The challenges, and a way forward," IEEE Access, vol. 8, pp. 474–488, 2020.
[160] W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," Bull. Math. Biol., vol. 52, nos. 1–2, pp. 99–115, Jan. 1990.
[161] R. Gupta, S. Tanwar, S. Tyagi, and N. Kumar, "Tactile-Internet-based telesurgery system for healthcare 4.0: An architecture, research challenges, and future directions," IEEE Netw., vol. 33, no. 6, pp. 22–29, Nov. 2019.
[162] H. Vachhani, M. S. Obiadat, A. Thakkar, V. Shah, R. Sojitra, J. Bhatia, and S. Tanwar, "Machine learning based stock market analysis: A short survey," in Innovative Data Communication Technologies and Application, J. S. Raj, A. Bashar, and S. R. J. Ramson, Eds. Cham, Switzerland: Springer, 2020, pp. 12–26.
[163] G. Hinton, "Deep belief networks," Scholarpedia, vol. 4, no. 5, p. 5947, 2009.
[164] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527–1554, Jul. 2006.
[165] M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2673–2681, Nov. 1997.
[166] A. Mostafa, M. I. Khalil, and H. Abbas, "Emotion recognition by facial features using recurrent neural networks," in Proc. 13th Int. Conf. Comput. Eng. Syst. (ICCES), Dec. 2018, pp. 417–422.
[167] X. Huang, G. Zhao, W. Zheng, and M. Pietikäinen, "Towards a dynamic expression recognition system under facial occlusion," Pattern Recognit. Lett., vol. 33, no. 16, pp. 2181–2191, Dec. 2012.
[168] M. Pantic and L. J. M. Rothkrantz, "Automatic analysis of facial expressions: The state of the art," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 12, pp. 1424–1445, Dec. 2000.
[169] F.-J. Chang, A. Tuan Tran, T. Hassner, I. Masi, R. Nevatia, and G. Medioni, "ExpNet: Landmark-free, deep, 3D facial expressions," in Proc. 13th IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), May 2018, pp. 122–129.
[170] F. Ringeval, M. Pantic, B. Schuller, M. Valstar, J. Gratch, R. Cowie, S. Scherer, S. Mozgai, N. Cummins, and M. Schmitt, "AVEC 2017: Real-life depression, and affect recognition workshop and challenge," in Proc. 7th Annu. Workshop Audio/Visual Emotion Challenge (AVEC), 2017, pp. 3–9.
[171] Z. Wu, T. Chen, Y. Chen, Z. Zhang, and G. Liu, "NIRExpNet: Three-stream 3D convolutional neural network for near infrared facial expression recognition," Appl. Sci., vol. 7, no. 11, p. 1184, 2017.
[172] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2014, pp. 818–833.
[173] N. Mousavi, H. Siqueira, P. Barros, B. Fernandes, and S. Wermter, "Understanding how deep neural networks learn face expressions," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2016, pp. 227–234.
[174] I. Song, H.-J. Kim, and P. B. Jeon, "Deep learning for real-time robust facial expression recognition on a smartphone," in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), Jan. 2014, pp. 564–567.

KEYUR PATEL is currently pursuing the bachelor's degree with Nirma University, Ahmedabad, India. His research interests include computer vision, natural language processing, energy-based models, and reinforcement learning.

DEV MEHTA is currently pursuing the bachelor's degree with Nirma University, Ahmedabad, India. His research interests are machine learning, computer vision, and natural language processing.

CHINMAY MISTRY is currently pursuing the bachelor's degree with Nirma University, Ahmedabad, India. His research interests are machine learning, computer vision, and natural language processing.


RAJESH GUPTA (Student Member, IEEE) received the B.E. degree from the University of Jammu, India, in 2008, and the [Link]. degree from Shri Mata Vaishno Devi University, Jammu, India, in 2013. He is a full-time Ph.D. Research Scholar with the Computer Science and Engineering Department, Nirma University, Ahmedabad, India. He has authored/coauthored 13 publications (including seven articles in SCI-indexed journals and six articles in IEEE ComSoc-sponsored international conferences). Some of his research findings are published in top-cited journals, such as the IEEE Network, Computer Communications, Computers and Electrical Engineering (Elsevier), and the International Journal of Communication Systems (Wiley). His research interests include network security, blockchain technology, 5G communication networks, and machine learning. He is a recipient of the Doctoral Scholarship from the Ministry of Electronics and Information Technology, Govt. of India, under the Visvesvaraya Ph.D. Scheme.

SUDEEP TANWAR (Member, IEEE) received the [Link]. degree from Kurukshetra University, India, in 2002, the [Link]. degree (Hons.) from Guru Gobind Singh Indraprastha University, Delhi, India, in 2009, and the Ph.D. degree with specialization in wireless sensor networks, in 2016. He is an Associate Professor with the Computer Science and Engineering Department, Institute of Technology, Nirma University, Ahmedabad, India. He is a Visiting Professor with Jan Wyzykowski University, Polkowice, Poland, and the University of Pitesti, Pitesti, Romania. He has authored or coauthored more than 130 technical research articles published in leading journals and conferences from IEEE, Elsevier, Springer, Wiley, and so on. Some of his research findings are published in top-cited journals, such as the IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, the IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, the IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, Computer Communications, Applied Soft Computing, the Journal of Network and Computer Applications, Pervasive and Mobile Computing, the International Journal of Communication Systems, Telecommunication Systems, Computers and Electrical Engineering, and the IEEE SYSTEMS JOURNAL. He has also published six edited/authored books with international/national publishers, such as IET and Springer. He has guided many students leading to M.E./[Link]. and is guiding students leading to Ph.D. His current interests include wireless sensor networks, fog computing, smart grid, the IoT, and blockchain technology. He has been invited as a Guest Editor/Editorial Board Member of many international journals, as a keynote speaker in many international conferences held in Asia, and as the program chair, the publications chair, the publicity chair, and the session chair in many international conferences held in North America, Europe, Asia, and Africa. He has been awarded the Best Research Paper Awards from IEEE GLOBECOM 2018, IEEE ICC 2019, and Springer ICRIC-2019. He is an Associate Editor of IJCS (Wiley) and Security and Privacy (Wiley).

NEERAJ KUMAR (Senior Member, IEEE) received the Ph.D. degree in CSE from Shri Mata Vaishno Devi University, Katra, India. He was a Postdoctoral Research Fellow with Coventry University, Coventry, U.K. He is currently a Full Professor with the Department of Computer Science and Engineering, Thapar University, Patiala, India. He is also a Visiting Professor at Coventry University. He has published more than 300 technical research articles in leading journals and conferences from IEEE, Elsevier, Springer, John Wiley, and so on. Some of his research findings are published in top-cited journals, such as the IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, the IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, the IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, the IEEE TRANSACTIONS ON CLOUD COMPUTING, the IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, the IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, the IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, the IEEE Network, the IEEE Communications Magazine, the IEEE WIRELESS COMMUNICATIONS, the IEEE INTERNET OF THINGS JOURNAL, the IEEE SYSTEMS JOURNAL, Future Generation Computing Systems, the Journal of Network and Computer Applications, and Computer Communications. He has guided many Ph.D. and M.E./[Link]. students. His research was supported by funding from Tata Consultancy Services, the Council of Scientific and Industrial Research (CSIR), and the Department of Science and Technology. He was awarded the Best Research Paper Awards from IEEE ICC 2018 and the IEEE SYSTEMS JOURNAL 2018. He is leading the research group Sustainable Practices for Internet of Energy and Security (SPINES), whose members are working on the latest cutting-edge technologies. He is a TPC member and a reviewer of many international conferences across the globe.

MAMOUN ALAZAB (Senior Member, IEEE) received the Ph.D. degree in computer science from the School of Science, Information Technology and Engineering, Federation University of Australia. He is a cyber security researcher and practitioner with industry and academic experience. He is an Associate Professor with the College of Engineering, IT and Environment, Charles Darwin University, Australia. His research is multidisciplinary, focusing on cyber security and digital forensics of computer systems, particularly cybercrime detection and prevention. He has more than 150 research articles in many international journals and conferences, such as the IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, the IEEE TRANSACTIONS ON INDUSTRY APPLICATIONS, the IEEE TRANSACTIONS ON BIG DATA, the IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, Computers & Security, and Future Generation Computing Systems. He has delivered many invited and keynote speeches, at 24 events in 2019 alone, and has convened and chaired more than 50 conferences and workshops. He works closely with government and industry on many projects, including with the Northern Territory (NT) Department of Information and Corporate Services, IBM, Trend Micro, the Australian Federal Police (AFP), Westpac, and the Attorney-General's Department. He is the Founding Chair of the IEEE Northern Territory (NT) Subsection.
