FYP Technical Paper Hafiz

This document discusses a study investigating facial expression recognition under noisy environments using 2D-Empirical Mode Decomposition. The study aims to minimize the effects of noise when extracting facial expressions. It involves detecting faces, applying Radon transform to convert images to 1D signals, decomposing the signals into intrinsic mode functions using EMD to remove noise, reducing the dimension of the data using PCA, and classifying facial expressions using k-NN classifier. The method is tested on the JAFFE database and results show it enables classifying facial expressions with less noise impact compared to other methods.
INVESTIGATION ON FACIAL EXPRESSION RECOGNITION UNDER NOISY ENVIRONMENT BY USING 2D-EMPIRICAL MODE DECOMPOSITION
Hafizuddin Tarmizi
School of Mechatronic Engineering
UniMAP, Perlis, Malaysia

[email protected]

Abstract—Removing noise from an image is one of the major tasks facing researchers. Noise introduced into an image during capture or transmission can degrade its quality, so noise removal is performed at the start of the process, before any other image processing tools are applied. Existing algorithms rest on their authors' assumptions, and their performance ranges from excellent to poor. The noise filtering algorithm is chosen according to the type of noise present in the image, such as Gaussian, Poisson, salt and pepper, or speckle noise. To address this problem, this paper examines facial emotion recognition in a noisy environment using empirical mode decomposition (EMD). EMD is a multiresolution technique that adaptively decomposes nonstationary and nonlinear data into a small set of frequency components known as intrinsic mode functions (IMFs). First, the image is converted by the Radon transform into a 1D projection signal, to which the EMD algorithm is applied. The extracted IMF features are then passed to a dimensionality reduction technique, namely Principal Component Analysis (PCA). The reduced feature vector is used as input to a k-Nearest Neighbor (k-NN) classifier that recognizes seven facial emotions. A series of experiments was conducted on the JAFFE database. The experimental results show that, in a noisy environment, the EMD technique minimizes the effect of the noise when classifying facial emotions, and thus demonstrates promising results.

Keywords—Facial emotion recognition; noisy environment; empirical mode decomposition; PCA; k-Nearest Neighbor

I. INTRODUCTION

In this continuously developing era, face recognition still struggles in the presence of facial expressions. Despite the success of some systems in constrained scenarios, the general task of facial expression recognition still poses a number of challenges owing to contaminating noise, facial expression, occlusion and pose, which vary in complexity, intensity and meaning. Darwin described facial expressions of emotion as universal, with a common evolutionary origin. Later, Tomkins linked facial muscular movements to experienced affect for eight primary affect states. Ekman et al. continued Tomkins's work by classifying the expressions of six primary affect states in literate and preliterate cultures, and found similarities across cultures [1].

Many approaches to analysing facial expressions have been proposed in the literature. Some of them make use of human amygdala activity, namely the difference in amygdala activity during the perception of neutral faces between stimulated and normal subjects. Others use the Facial Action Coding System (FACS), a framework intended to recognize subtle changes in facial components by comparison against FACS criteria, as in the CMU-Pittsburgh AU-Coded Face Expression Database [2]. Yet others rely on the observation that images of a subject's facial expressions define a smooth manifold in the high-dimensional image space; such a manifold representation can provide a unified framework for facial expression analysis, as in graph-preserving sparse nonnegative matrix factorization (GSNMF).

Even though facial expression recognition has achieved a significant level of success, extracting the features of facial emotions is still a challenging task. Large variations in facial expression, unpredictable environmental conditions such as heavy noise contamination, partial occlusion, and subtle changes in muscular movement in the face region produce results that vary and are very complex to analyse.

This project therefore investigates facial expression recognition in a highly corrupted environment using Empirical Mode Decomposition (EMD). EMD, used here to detect seven facial emotions, decomposes non-stationary and non-linear data into a small set of frequency components known as intrinsic mode functions (IMFs) while minimizing the effect of noise when extracting facial expressions. Principal Component Analysis (PCA) is applied to reduce the dimension of the data space, which shortens the feature extraction process. The last step uses a k-Nearest Neighbors (k-NN) classifier to classify facial expressions with the acquired training data. These two tools have each given good results in previous work and are expected to do the same in this project.
II. MATERIALS AND METHODS

Fig. 1 shows the block diagram of the proposed method. It consists of face region detection, Radon transform, empirical mode decomposition, feature reduction and facial emotion classification. The working of each block is described in the following subsections.

Figure 1: Scope of this project (block diagram: Noisy Image → Facial Expression Features Extraction Using 2D-Empirical Mode Decomposition → Feature Reduction Using Principal Component Analysis (PCA) → Facial Emotion Classification Using k-Nearest Neighbor)

A. Noisy Image
In this preliminary step, the captured images are full of noise, which is a problem for industries that rely heavily on image processing. To analyse the problem, images from the JAFFE database are corrupted with noise of different intensities to imitate noise-corrupted images. The resulting data are then used for training and for evaluating classification accuracy [3].

B. Facial Expression Database
A popular, free-to-use dataset, the Japanese Female Facial Expression (JAFFE) database [2], is used in the experiments to evaluate the effectiveness of the proposed method. The JAFFE dataset contains 213 images of seven prototypical facial expressions (six basic facial expressions plus neutral) posed by 10 Japanese women. The dataset was planned and assembled by Miyuki Kamachi, Michael Lyons and Jiro Gyoba [13]. The photos were taken at the Psychology Department of Kyushu University. The expressions are surprise (30), angry (30), disgust (29), fear (32), happy (31), neutral (30) and sad (31). Two to four images of each expression were collected from each subject. All the faces are posed and were taken in a controlled environment. Fig. 2 shows some sample faces from the JAFFE dataset.

Figure 2: Example of images from JAFFE database (Angry, Disgust, Fear, Happy, Neutral, Sad, Surprise)

C. Image Preprocessing
This step localizes the face region and removes the unnecessary background. The original 256×256-pixel image is cropped to 128×96 pixels. The cropped image is then corrupted with noise and passed on to the feature extraction process, in which the information from the corrupted image is extracted in the form of intrinsic mode functions.

D. Empirical Mode Decomposition: EMD
Empirical Mode Decomposition (EMD) [4] was originally developed by Huang et al. for analysing data from nonstationary and nonlinear processes. The major advantage of EMD is that the basis functions are derived from the signal itself. The analysis is therefore adaptive, unlike conventional techniques in which the basis functions are fixed. EMD is based on the successive extraction of the energy associated with the various intrinsic time scales of the signal, one mode at a time, from the highest-frequency mode to the lowest. The sum of all the intrinsic mode functions (IMFs) and the final residue reproduces the original signal.

Let the given signal $s(t)$ be the input to the sifting process. The signal $s_{i,k}(t)$ denotes the $k$-th iterate of the sifting process for the $i$-th IMF; for the very first iteration, $s_{1,1}(t) = s(t)$. The sifting process can be summarized as follows:

1. Determine the local maxima and minima of $s_{i,k}(t)$.
2. Interpolate between the local maxima and between the local minima to obtain an upper envelope $E_u(t)$ and a lower envelope $E_l(t)$, respectively.
3. Compute the mean envelope $m_{i,k}(t) = \big(E_u(t) + E_l(t)\big)/2$.
4. Subtract the mean envelope, so that the next iterate of the sifting process is $s_{i,k+1}(t) = s_{i,k}(t) - m_{i,k}(t)$.
5. Repeat steps 1 to 4 until the detail signal $s_{i,k}(t)$ can be regarded as an IMF (i.e., it fulfils the two IMF conditions), and take it as the $i$-th IMF, $c_i(t)$; the first one, $c_1(t)$, is IMF1.
6. Repeat steps 1 to 5 on the residual $r_i(t) = r_{i-1}(t) - c_i(t)$, with $r_0(t) = s(t)$ (i.e., start the next sifting from $s_{i+1,1}(t) = r_i(t)$), to obtain all the IMFs $c_1(t), c_2(t), \dots, c_n(t)$ of the signal.

The procedure ends when the residual $r_n(t)$ is either a constant, a monotonic slope, or a function with only one extremum. The sifting process thus produces $n$ IMFs $c_1(t), c_2(t), \dots, c_n(t)$ and a residue signal $r_n(t)$. The original signal can be reconstructed by superposition of all the IMFs and the residue:

$$ s(t) = \sum_{i=1}^{n} c_i(t) + r_n(t) \qquad (1) $$

In this work, the standard deviation (SD) criterion is used to stop the sifting process, with the default value of the maximum standard deviation ($SD_{MAX}$).
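The sifting steps above can be sketched in Python for a 1D signal, for example one Radon projection of a cropped face. This is a minimal illustration under stated assumptions, not the author's implementation: the envelopes are cubic splines (SciPy's `CubicSpline`) pinned to the signal at both ends, sifting stops when Huang's SD between successive iterates falls below a hypothetical `sd_max = 0.3`, at most four IMFs plus the residue are extracted (matching the five components reported in Section III.A), and the helper names `sift_once` and `emd` are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def _extrema(x):
    """Indices of the interior local maxima and minima of a 1D array (step 1)."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def _envelope(t, x, idx):
    """Cubic-spline envelope through x[idx]; both end points are pinned (assumption)."""
    knots = np.concatenate(([0], idx, [len(x) - 1]))
    return CubicSpline(t[knots], x[knots])(t)


def sift_once(t, h):
    """Steps 2-4: subtract the mean of the upper and lower envelopes."""
    maxima, minima = _extrema(h)
    upper = _envelope(t, h, maxima)       # E_u(t)
    lower = _envelope(t, h, minima)       # E_l(t)
    return h - (upper + lower) / 2.0      # s_{i,k+1} = s_{i,k} - m_{i,k}


def emd(s, max_imfs=4, sd_max=0.3, max_sifts=50):
    """Decompose s into IMFs c_1..c_n and a residue r_n, so that Eq. (1) holds."""
    t = np.arange(len(s), dtype=float)
    residue = np.asarray(s, dtype=float).copy()   # r_0 = s
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = _extrema(residue)
        if len(maxima) + len(minima) < 2:         # at most one extremum: stop
            break
        h = residue
        for _ in range(max_sifts):
            h_new = sift_once(t, h)
            # Huang's SD between consecutive sifting iterates
            sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_new
            if sd < sd_max:
                break
        imfs.append(h)                            # c_i
        residue = residue - h                     # r_i = r_{i-1} - c_i
    return np.array(imfs), residue


if __name__ == "__main__":
    t = np.arange(512, dtype=float)
    s = np.sin(2 * np.pi * t / 8) + 0.5 * np.sin(2 * np.pi * t / 64)
    imfs, res = emd(s)
    print(imfs.shape, res.shape)
```

Because each residual is obtained by subtracting the extracted IMF, the IMFs plus the final residue always add back up to the input, which is exactly the superposition in Eq. (1).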

E. Principal Component Analysis
Principal Component Analysis (PCA) is a common technique for extracting features for face recognition [5]. PCA is an eigenvector method designed to model linear variation in high-dimensional data. It performs dimensionality reduction by projecting the original $n$-dimensional data onto the $k$-dimensional ($k \ll n$) linear subspace spanned by the leading eigenvectors of the data's covariance matrix.

Let the samples $x_1, x_2, \dots, x_N \in \mathbb{R}^n$ belong to a number of emotion classes. PCA finds a transformation matrix $W$ that maps all the samples, after subtraction of their mean $m$, to the reduced space:

1. First, find the mean $m$ of all samples.
2. Obtain the within-class scatter matrix $S_w$ using
$$ S_w = \sum_{c=1}^{C} \sum_{x_i \in X_c} (x_i - m_c)(x_i - m_c)^{T} \qquad (2) $$
where $X_c$ is the set of samples of class $c$ and $m_c$ is its mean.
3. Because the matrix $S_w$ is singular, it must first be projected onto the PCA subspace.
4. The resulting leading eigenvectors are selected to represent the PCA subspace.

The output set of principal vectors $w_1, w_2, \dots, w_k$ is an orthonormal set of vectors corresponding to the eigenvectors of the sample covariance matrix associated with the $k < n$ largest eigenvalues.

F. K-Nearest Neighbor
As a nonparametric, sample-based method [6], k-Nearest Neighbors finds the closest match between two sets without requiring any prior information. Recently, Wang et al. proposed an unsupervised alignment method without correspondence, which learns a mapping of instances from two sets of spaces to a reduced space while simultaneously matching their local geometry structures through the k nearest neighbors.

The classification can be summarised in the following steps:

1. Set the k-NN parameters using the simulated images from the JAFFE dataset, which contains 213 images of seven facial expressions.
2. Train the k-NN emotion detector to assign the images to the seven discrete emotion classes.
3. Set a suitable range of nearest neighbours for the training data (note that reducing the number of components boosts the accuracy).
4. Organize the local-geometry matrix and compute the neighbouring points with the k-NN method: by comparing possible matches between two local patterns, the distance from the best possible match can be obtained.
5. Obtain the decision rule from the training vectors of each facial expression label.

To characterize the local geometric structure of the neighbourhood of each point on the manifold, we assume that each $x_i$ can be approximately represented by an affine combination of its neighbouring points:
$$ x_i = \sum_{x_j \in \mathcal{N}(x_i)} W_{ij}\, x_j \qquad (3) $$
where $W = [W_{ij}]$ is the reconstruction weight matrix for all points, and the $i$-th row of $W$ stores all the reconstruction coefficients for the $i$-th point $x_i$, with $\sum_j W_{ij} = 1$. Specifically, least squares is used to describe the local linear properties of each point:
$$ \Big\| x_i - \sum_{x_j \in \mathcal{N}(x_i)} W_{ij}\, x_j \Big\| = 0 \qquad (4) $$
Eq. (4) is approximately affine invariant. Thus, we can further formulate $E_g$ as the following objective function of the weight matrix $W_x$:
$$ E_g = \sum_{x_i \in X} \Big\| f(x_i) - \sum_{x_j \in \mathcal{N}(x_i)} (W_x)_{ij}\, f(x_j) \Big\|^2 \qquad (5) $$
where $W_x$ is the reconstruction weight matrix of the image set $X$. If the mapping relation of each point is written as a vector, the function $f$ can be represented by a $\{0,1\}$ binary matrix $F_{m \times n}$. Eq. (5) can then be rewritten in matrix form as
$$ E_g = \big\| (I - W_x)\, F\, Y^{T} \big\|^2 = \big\| L_x\, F\, Y^{T} \big\|^2 \qquad (6) $$
Because each row of $W_x$ sums to 1, $L_x = I - W_x$ can be treated as the Laplacian matrix of a graph whose edges are constructed from $W_x$.

III. EXPERIMENTAL RESULT AND DISCUSSION

A. EMD on Noisy Image
The developed algorithm is tested using 256×256, 8-bit/pixel images from the JAFFE database, cropped to 128×96 around the face region (see Fig. 3). As no real noisy images are captured, the JAFFE images are corrupted with four simulated types of noise: Gaussian, Poisson, salt and pepper, and speckle. The test images are corrupted with each type of noise at density levels ranging from 10 to 90 in increments of 20, except for Poisson noise, which uses the default mean, and Gaussian noise, whose density ranges from 10 to 30 in increments of 10, as shown in Fig. 4(a)-(d). Each corrupted image is then decomposed by EMD into five intrinsic mode functions (IMFs), including the residue.
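The four kinds of simulated corruption can be sketched with NumPy. The paper does not name its noise tool, so the parameter conventions below are assumptions modelled on MATLAB's `imnoise`: Gaussian noise is zero-mean with variance `level` on intensities scaled to [0, 1], Poisson noise uses the 8-bit intensities themselves as the Poisson means, salt and pepper noise corrupts roughly a fraction `level` of the pixels, and speckle noise is multiplicative, `I + n·I`, with uniform zero-mean `n` of variance `level`. The function name `add_noise` is hypothetical.

```python
import numpy as np


def add_noise(img, kind, level=0.01, rng=None):
    """Corrupt an 8-bit grayscale image; returns a float image clipped to [0, 1].

    kind  : "gaussian", "poisson", "salt_pepper" or "speckle"
    level : variance (gaussian, speckle) or pixel fraction (salt_pepper);
            ignored for poisson, where the intensities are the means.
    """
    rng = np.random.default_rng() if rng is None else rng
    img = np.asarray(img, dtype=np.uint8)
    x = img.astype(float) / 255.0
    if kind == "gaussian":
        out = x + rng.normal(0.0, np.sqrt(level), x.shape)
    elif kind == "poisson":
        out = rng.poisson(img.astype(float)) / 255.0
    elif kind == "salt_pepper":
        out = x.copy()
        u = rng.random(x.shape)
        out[u < level / 2] = 0.0                    # pepper
        out[(u >= level / 2) & (u < level)] = 1.0   # salt
    elif kind == "speckle":
        a = np.sqrt(3.0 * level)                    # uniform on [-a, a] has variance `level`
        out = x + x * rng.uniform(-a, a, x.shape)
    else:
        raise ValueError(f"unknown noise type: {kind}")
    return np.clip(out, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # stand-in for a 128x96 cropped JAFFE face
    face = rng.integers(0, 256, size=(128, 96), dtype=np.uint8)
    for kind in ("gaussian", "poisson", "salt_pepper", "speckle"):
        noisy = add_noise(face, kind, 0.01, rng)
        print(kind, noisy.shape, round(float(noisy.mean()), 3))
```

Passing an explicit `rng` keeps each corrupted image reproducible, which matters when the same noisy set is reused across the cross-validation folds.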
Figure 3: Cropped images from original images

Figure 4: Noisy images: a. Gaussian noise, b. Poisson noise, c. Salt and Pepper noise and d. Speckle noise

The difference between the IMF decompositions of the original image and of the noise-corrupted image is clearly visible in the first IMF, which contains the largest noise variation for each type of noise. This agrees with the earlier finding by Qing et al. that noise information is concentrated in the first modes, and confirms that high-frequency noise appears in the first IMF. Therefore, IMF2 is selected as the reference feature for classifying the corrupted facial emotion images, as illustrated in Fig. 5.

Figure 5: Noisy images after decomposition into modes using EMD: a. first mode IMF1, b. second mode IMF2, c. third mode IMF3 and d. residue

B. Classification of IMF2 using PCA and k-NN
This part discusses the reduction of the IMF2 feature space before each image is classified into its emotion class. The IMF2 matrix of size 213×12288 is reduced to 213×200. The first 200 eigenvalues are chosen because they retain as much of the information in the original data as possible (see Fig. 6). However, with 200 eigenvectors the remaining variance, in the range of 0.8 to 1, is already limited. For better classification, it is advisable to choose the number of principal components (PCs) in the range of 150 to 200. The reduced feature matrix is then evaluated using 10-fold cross-validation, in which the reduced matrix is divided into 10 subsets of equal size. In each round, nine subsets are used for training and the remaining one for testing. This procedure is repeated 10 times so that every subset is tested and trained on, and finally the average is computed.

Figure 6: Choosing number of PC.

a. Classification accuracy for Gaussian noise.

As Table 1 shows, facial emotion recognition performance on noise-corrupted images using EMD decreases slightly compared with noise-free images. The k-NN classifier gives recognition rates of 76.06%, 69.01% and 64.32% for the noise levels v = 0.01, v = 0.02 and v = 0.03, respectively.

Table 1: Recognition rate of IMF2 + PCA using k-NN classifier under Gaussian noise
Noise density (%)    IMF2+PCA+kNN (%)
Noise free           87.79
10                   76.06
20                   69.01
30                   64.32

As shown in Table 2, k-NN gives 76.06% classification accuracy. This is a promising result. However, a number of misrecognized emotions come from disgust, fear, happy and sad. An average of 11 images are misrecognized as another class.

Table 2: Confusion matrix of IMF2+PCA with Gaussian noise, variance (v=0.01)
      AN  DI  FE  HA  NE  SA  SU
AN    27   5   0   1   0   1   0
DI     1  22   1   2   0   1   0
FE     0   1  21   0   0   5   1
HA     0   0   0  18   0   1   0
NE     2   0   5   7  27   5   0
SA     0   1   3   2   1  18   0
SU     0   0   2   1   2   0  29
Correctly classified: 162 (76.06%)

b. Classification accuracy for Poisson noise.
As shown in Table 3, the facial emotion recognition rate for IMF2 falls to 78.87% under Poisson noise with mean m = 0. Table 4 shows that only 168 images are correctly classified into their respective emotion classes. The other 45 images are misrecognized, and most of them belong to the classes disgust, fear, happy and sad.

Table 3: Recognition rate of IMF2 + PCA using k-NN classifier under Poisson noise
Noise density (%)    IMF2+PCA+kNN (%)
- (default mean)     78.87

Table 4: Confusion matrix of IMF2+PCA with Poisson noise, default mean
      AN  DI  FE  HA  NE  SA  SU
AN    28   3   0   0   0   4   0
DI     0  24   1   0   0   0   0
FE     2   0  23   0   0   2   0
HA     0   0   0  19   0   1   0
NE     0   0   4   6  29   9   0
SA     0   2   3   3   1  15   0
SU     0   0   1   3   0   0  30
Correctly classified: 168 (78.87%)

c. Classification accuracy for Salt and Pepper noise.

Table 5: Recognition rate of IMF2 + PCA using k-NN classifier under Salt & Pepper noise
Noise density (%)    IMF2+PCA+kNN (%)
10                   44.13
30                   24.41
50                   17.37
70                   16.90
90                   17.84

The results in Table 5 show low recognition rates that do not reach even 50%. The k-NN classifier gives recognition rates as low as 44.13%, 24.41%, 17.37%, 16.90% and 17.84% for the different noise levels (m = 0.01, 0.03, 0.05, 0.07 and 0.09), respectively. The unusual behaviour of this type of noise can be seen in the confusion matrix in Table 6: most of the emotion classes are misinterpreted, with an average of 17 false classifications for each emotion.

Table 6: Confusion matrix of IMF2+PCA with Salt and Pepper noise, variance (v=0.01)
      AN  DI  FE  HA  NE  SA  SU
AN     9   2   3   0   0   1   0
DI     2  11   1   1   0   0   0
FE     1   3  11   2   2   6   2
HA     1   2   2  17   4   3   2
NE     4   2   9   6  17   6  10
SA    13   9   5   4   4  14   1
SU     0   0   1   1   3   1  15
Correctly classified: 94 (44.13%)

d. Classification accuracy for Speckle noise.

Table 7: Recognition rate of IMF2 + PCA using k-NN classifier under Speckle noise
Noise density (%)    IMF2+PCA+kNN (%)
10                   81.22
30                   75.12
50                   73.71
70                   74.18
90                   75.59

Table 7 shows the facial emotion recognition rates under speckle noise, which reach 81.22%, 75.12%, 73.71%, 74.18% and 75.59% for the different noise levels (m = 0.01, 0.03, 0.05, 0.07 and 0.09), respectively.

Table 8: Confusion matrix of IMF2+PCA with Speckle noise, variance (v=0.01)
      AN  DI  FE  HA  NE  SA  SU
AN    29   2   1   0   0   4   1
DI     1  23   1   0   0   0   0
FE     0   2  24   0   0   1   0
HA     0   0   1  19   0   1   0
NE     0   1   2   8  29   3   1
SA     0   1   3   3   1  21   0
SU     0   0   0   1   0   1  28
Correctly classified: 173 (81.22%)

Table 8 shows the confusion matrix for speckle noise at v = 0.01, because at this level the recognition rate is the most promising of all the variance levels. An average of 9 images are misidentified as another class, and these images belong to the classes disgust, fear, happy and sad.

It can be observed that when images are corrupted, facial emotion recognition is affected, and the recognition rates decrease compared with those on the original database. This has implications for real-time image processing. Despite the increasing noise intensity added to the images, the results do not vary much. However, across the different types of noise, the recognition rates differ slightly, and they broadly agree with the outcomes of previous research. Although EMD is strongly recommended for feature extraction, it is not sufficient to completely remove the noise that corrupts the image pixels.
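The classification stage of Section III.B (IMF2 features of size 213×12288 reduced by PCA to 200 components, then k-NN scored with 10-fold cross-validation) can be sketched with scikit-learn. This is a hedged sketch, not the author's code: k = 1, the stratified and shuffled fold split, the helper names `accuracy_from_confusion` and `evaluate_pca_knn`, and the random stand-in features are all assumptions. Following the text, PCA is fitted once on the whole matrix before cross-validation; fitting it inside each training fold would avoid information leakage but would cap the components at about 191, the size of a training fold. The recognition rates in the confusion tables are the diagonal (correct) count divided by the 213 images, e.g. 162/213 = 76.06% for Table 2.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier


def accuracy_from_confusion(cm):
    """Recognition rate (%) = correctly classified (diagonal) / all images."""
    cm = np.asarray(cm)
    return 100.0 * np.trace(cm) / cm.sum()


def evaluate_pca_knn(features, labels, n_components=200, k=1, n_splits=10, seed=0):
    """PCA on the whole feature matrix, then k-NN scored by k-fold CV.

    Returns the confusion matrix pooled over all folds (rows: true class).
    """
    reduced = PCA(n_components=n_components).fit_transform(features)  # e.g. 213 x 200
    classes = np.unique(labels)
    total = np.zeros((len(classes), len(classes)), dtype=int)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train, test in folds.split(reduced, labels):
        # nine folds train the classifier, the remaining fold is tested
        knn = KNeighborsClassifier(n_neighbors=k).fit(reduced[train], labels[train])
        total += confusion_matrix(labels[test], knn.predict(reduced[test]), labels=classes)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # JAFFE class sizes; random vectors stand in for the IMF2 features
    labels = np.repeat(np.arange(7), [30, 29, 32, 31, 30, 31, 30])
    features = rng.normal(0, 5, size=(7, 500))[labels] + rng.normal(0, 1, size=(213, 500))
    cm = evaluate_pca_knn(features, labels)
    print(cm)
    print(f"{accuracy_from_confusion(cm):.2f}%")
```

Pooling the per-fold confusion matrices means every image is tested exactly once, so the pooled trace divided by 213 equals the average fold accuracy weighted by fold size.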

IV. CONCLUSION

This paper has presented facial emotion recognition in a noisy environment using empirical mode decomposition. To investigate facial emotion recognition under noise, we generated noise-corrupted images using different types of noise, namely Gaussian, Poisson, salt and pepper, and speckle, at different noise levels. The noise-corrupted images were then processed by the EMD algorithm, which decomposes the image signal into a number of intrinsic mode functions and a residue. Overall, the EMD results show that the presence of each listed type of noise in the facial images increased the number of modes by one compared with the original images.

During the EMD extraction process, IMF2 was chosen as the main feature for classifying corrupted facial expressions, because the first mode (IMF1) of the decomposition contains most of the frequency variation caused by the noise. However, the recognition rates of IMF2 + PCA at the different noise levels conflict with the expected result; for example, the k-NN classifier gives only 44.13% accuracy on facial images corrupted with salt and pepper noise (v = 0.01). The results also degrade as the noise level increases, with k-NN recognition rates falling to 16.90% and 17.84% at the two highest salt and pepper noise levels.

Therefore, based on the results acquired, the EMD technique did not meet the expected result and cannot effectively remove the effect of noise when classifying facial emotions.

REFERENCES

[1] Hans C. Breiter, W. A. (November 1996). Response and Habituation of the Human Amygdala during Visual Processing of Facial Expression. Neuron (Cell Press), 17, 875-887.
[2] Takeo Kanade, J. F. (n.d.). Comprehensive Database for Facial Expression Analysis.
[3] Kamachi, M., Lyons, M., & Gyoba, J. (1998). The Japanese female facial expression
[4] Esteve Gallego-Jutglà, K. L.-d.-I.-P.-C. (2008). Empirical Mode Decomposition-Based Face Recognition System. System Engineering and Automation Department.
[5] Xiaofei He, S. Y.-J. (March 2005). Face Recognition Using Laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3), 328-340.
[6] Tae-Kyun Kim, J. K. (June 2007). Discriminative Learning and Recognition of Image Set Classes Using Canonical Correlations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6), 1005-1018.
