EE_475_Report (3)

This study presents a comprehensive approach for gender classification using geometric facial features and various machine learning classifiers, including SVM, Random Forest, KNN, Logistic Regression, and Neural Network. The research utilizes the UTKface and CelebA datasets, demonstrating that enhanced feature vectors incorporating gender-specific information can significantly improve classification accuracy, achieving rates of approximately 98%. The methodology includes preprocessing techniques, feature extraction, and performance evaluation through confusion matrices and F-scores to compare the effectiveness of different classifiers.
Comparative Study of Machine Learning Models for Gender Classification Using Geometric Facial Features
Ozcan, Utku
Electrical and Electronics Engineering
Bogazici University
Istanbul, Turkey
[email protected]

Celikkaya, Musa Furkan
Electrical and Electronics Engineering
Bogazici University
Istanbul, Turkey
[email protected]

Abstract—This study presents a comprehensive approach that uses feature vectors and different classifiers for gender classification. We utilized the UTKFace and CelebA datasets to compare the robustness of the gender recognition algorithm across datasets. For each dataset, five classifiers (SVM, Random Forest, K-Nearest Neighbors, Logistic Regression, and Neural Network) were used to compare their efficiency and performance on the same facial images. We used 68-point facial landmarks and computed novel geometric ratios to train the models and increase the performance of our code. After training the models with feature vectors, we compare the test images with their corresponding gender predictions, confusion matrices, and F-scores of each model to compare the results of the different classifiers. The results suggest that our model achieves strong performance across different classifiers while providing an effective solution to the gender recognition problem.

Index Terms—face detection, gender recognition, feature vectors, machine learning

I. INTRODUCTION

Facial feature-based gender classification has become a prominent application in computer vision and machine learning, with significant impact on demographic studies, human-computer interaction, and various real-world applications. This work proposes a comprehensive approach for gender classification based on feature vectors consisting of facial landmarks and proportions, using machine learning models on two different datasets: UTKFace and CelebA. To prepare the classifier input, we extracted face regions and 68 facial landmarks using the dlib library. Different data preprocessing techniques, including affine transformation and normalization, were applied to the landmark point coordinates. The original feature vector algorithm was extended by incorporating facial ratios. To evaluate effectiveness, we compared the performance of feature vectors consisting of the original 68-point facial landmarks against an extended version incorporating facial ratios across our datasets. These feature vectors were used to train different classifiers (SVM, Random Forest, KNN, Logistic Regression, and Neural Network), and their results were compared. In this paper, we demonstrate that feature vectors provide effective gender classification, and that enhanced feature vectors incorporating additional gender-specific information can improve test accuracy across different classifiers.

II. RELATED WORKS

Gender classification from facial images has been an active research area for more than two decades, and it has evolved significantly from early traditional approaches to modern deep learning methods. Much of the early work in this field was based on geometric facial features and traditional machine learning algorithms.

Khan et al. [1] published a comprehensive survey on gender classification techniques using image processing, showing that the first steps of researchers in this area were to extract and analyze specific facial features and their geometric relationships to determine gender. These early methods usually achieved 75-80% accuracy on controlled datasets. Starting from this foundation, many efforts have been directed toward improving classification accuracy by using different feature extraction techniques.

Guttikonda and Kalam [2] demonstrated the effectiveness of using carefully selected geometric facial features such as interocular distance, lip-to-nose ratio, and several other facial measures. Their study showed that traditional feature extraction can achieve accuracy levels of up to 85% while maintaining a clear understanding of the classification process. The emergence of deep learning has been a turning point in the performance of gender classification.

Lapuschkin et al. [3] compared different deep neural network architectures for age and gender classification. They showed that classifiers based on deep learning models could be trained to achieve more than 90% accuracy. One of the biggest benefits of deep learning methods is that they can automatically find important features from input images without the need for manual work. Recent studies have focused on making gender classification systems more stable and efficient.

Lin and Xie [4], whose ideas we adopted most in this project, developed a new method for gender classification using face recognition feature vectors. It has proven to be
stronger against changes in image quality and facial expressions, while achieving similarly impressive accuracy rates of approximately 98%.

In this paper, we propose an approach for gender classification that uses effective preprocessing techniques and geometric facial proportions to create a feature vector, and then combines the feature vector with modern machine learning and neural network architectures. Next, we describe the proposed method in detail.

III. METHOD

Our facial gender detection algorithm consists of two main steps: training and testing the model. In the training phase, we begin with raw input images from two datasets: UTKFace and CelebA. These images undergo preprocessing where face detection and 68-point feature extraction are performed using the dlib library. After obtaining the facial landmarks and their coordinates in the images, we apply an affine transformation to the coordinates, which includes rotation and translation operations.

The transformed coordinates then undergo a normalization step to remove position dependence and create meaningful patterns for the classifiers. For the feature vector generation, we implement two approaches: a base version that flattens the normalized (68,2) facial landmark locations into a (136,1) vector, and an extended version that incorporates additional facial ratios. These feature vectors serve as inputs to five different classifiers: SVM, Random Forest, KNN, Logistic Regression, and Neural Network.

In the testing phase, test data undergoes the same feature extraction process. The resulting feature vector is input to the trained models, and gender classification predictions are made. We evaluate performance by comparing predictions against ground truth labels using various performance metrics. Our experimental design enables comparison across different classifiers, datasets, and feature vector algorithms to determine the optimal approach for gender classification.

A. Face Detection and Landmark Detection

Face detection is usually the first step of gender recognition algorithms. It finds the location of the face region in an image. In our code, we extract the face regions of the raw images taken from the different datasets using the dlib library. Detecting the face regions lets us obtain the 68 point facial landmarks and the area of the face, which is used in the normalization step. After detecting the faces, dlib is used again to find the landmark points of the faces. These landmarks constitute the basis of our gender recognition algorithm, since our feature vectors are mainly obtained from transformed and normalized versions of the facial landmarks.

B. Preprocessing

After obtaining the face regions and 68 point facial landmarks, we preprocess the landmarks in order to have a better feature vector that will help us to create a more robust classifier structure in our algorithm. The landmarks corresponding to the different regions of the face are as follows:
• Jawline: 1–17
• Eyebrows:
– Left eyebrow: 18–22
– Right eyebrow: 23–27
• Nose:
– Nose bridge: 28–31
– Nose tip: 32–36
• Eyes:
– Left eye: 37–42
– Right eye: 43–48
• Mouth:
– Outer contour: 49–60
– Inner contour: 61–68

Fig. 1. 68 point facial landmarks

1) Affine Transform: We apply an affine transform to the images by multiplying the facial landmarks with a transformation matrix. The transform we use consists only of a rotation and a translation, so it does not distort facial features or expressions and it maintains relative distances and proportions in the face. Before calculating the coefficients of the transformation matrix, we extract the coordinates of the left eye (points 37–42) and the right eye (points 43–48). We then take the average of the x and y coordinates of the left eye points and of the right eye points, so that we get one averaged point for the left eye and one for the right eye. Using these two points, we find dx (the difference of their x coordinates) and dy (the difference of their y coordinates). Using dx and dy, we calculate the arc tangent of dy/dx, which is the angle of the line that connects the averaged left eye and the averaged right eye. In addition, we find the midpoint of the averaged eye points. After these steps, we use the angle obtained from the arc tangent, θ, and the midpoint of the eyes to create a transformation matrix which rotates and translates the facial landmarks. Our transformation matrix rotates the whole image relative to the center point of
the eyes by the angle θ clockwise so that the two eyes become parallel to the ground. To show how the coordinates of the 68 facial landmarks are affected by this transform, consider a point (x, y) in the original image; the transformed point (x′, y′) is calculated using:

$$\text{left\_eye}_x = \frac{1}{6}\sum_{i=37}^{42} x_i, \qquad \text{left\_eye}_y = \frac{1}{6}\sum_{i=37}^{42} y_i$$

$$\text{right\_eye}_x = \frac{1}{6}\sum_{i=43}^{48} x_i, \qquad \text{right\_eye}_y = \frac{1}{6}\sum_{i=43}^{48} y_i$$

$$dx = \text{right\_eye}_x - \text{left\_eye}_x$$

$$dy = \text{right\_eye}_y - \text{left\_eye}_y$$

$$c_x = \frac{\text{left\_eye}_x + \text{right\_eye}_x}{2}, \qquad c_y = \frac{\text{left\_eye}_y + \text{right\_eye}_y}{2}$$

$$\theta = \operatorname{arctan2}(dy, dx)$$

$$t_x = (1 - \cos\theta)\,c_x + \sin\theta\,c_y$$

$$t_y = -\sin\theta\,c_x + (1 - \cos\theta)\,c_y$$

$$M = \begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \end{bmatrix}$$

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

$$x' = \cos\theta\,x - \sin\theta\,y + t_x$$

$$y' = \sin\theta\,x + \cos\theta\,y + t_y$$

2) Normalization: The affine transform is followed by a normalization process for the landmark points. Since it is important to remove the position dependence of the landmark points and to have a standardized feature vector regardless of photo distance, normalization is a necessary part of our algorithm. In addition, normalization makes it easier for the classifier models to learn the patterns in the feature vectors. In our code, the normalization process is performed by taking the average of the x and y coordinates of the facial landmarks obtained after the affine transform. These average x and y coordinates are set as the center of the landmarks. We also take the area of the face from the face region detected in the first part of our algorithm and compute the square root of this value. Finally, our normalized landmark locations are found by subtracting the center of the landmarks from the landmark locations and dividing the result by the square root of the face area. The normalized coordinates (x′, y′) of a point (x, y) are calculated as:

$$\text{face\_center}_x = \frac{x_1 + x_2 + \dots + x_{68}}{68}$$

$$\text{face\_center}_y = \frac{y_1 + y_2 + \dots + y_{68}}{68}$$

$$\text{sqrt\_face\_size} = \sqrt{\text{width} \cdot \text{height}}$$

$$x'_i = \frac{x_i - \text{face\_center}_x}{\text{sqrt\_face\_size}}$$

$$y'_i = \frac{y_i - \text{face\_center}_y}{\text{sqrt\_face\_size}}$$

$$(x'_i, y'_i) = \left( \frac{x_i - \text{face\_center}_x}{\text{sqrt\_face\_size}},\ \frac{y_i - \text{face\_center}_y}{\text{sqrt\_face\_size}} \right)$$

C. Feature Extraction

Feature extraction is the most important part of our implementation, since the robustness and accuracy of our classifier models depend directly on the feature vectors we create. That is why obtaining a correct feature vector is our main concern. After the preprocessing step, in which we apply the affine transform and normalization to the 68 point facial landmarks, we are ready to convert these locations into the feature vector. Our first approach is to use the 68 facial landmarks directly by flattening the transformed and normalized coordinates into a (136,1) vector. We also tried a second approach, which takes into account not only the facial landmarks but also some facial proportions such as the left eye aspect ratio, the right jaw angle and the face width-to-height ratio. The formulas for these ratios are:

• Basic Width and Height Measurements
– Jaw width: $\text{jaw\_width} = |P_1 - P_{17}|$
– Face height: $\text{face\_height} = |P_9 - P_{28}|$
– Face width-to-height ratio: $\text{width\_to\_height} = \frac{\text{jaw\_width}}{\text{face\_height}}$
• Width Ratios
– Cheek width: $\text{cheek\_width} = |P_2 - P_{16}|$
– Temple width: $\text{temple\_width} = |P_3 - P_{15}|$
– Cheek-to-jaw ratio: $\text{cheek\_to\_jaw} = \frac{\text{cheek\_width}}{\text{jaw\_width}}$
– Temple-to-jaw ratio: $\text{temple\_to\_jaw} = \frac{\text{temple\_width}}{\text{jaw\_width}}$
• Mouth Proportions
– Mouth width: $\text{mouth\_width} = |P_{49} - P_{55}|$
– Mouth height: $\text{mouth\_height} = \left| \frac{P_{52} + P_{58}}{2} - \frac{P_{49} + P_{55}}{2} \right|$
– Mouth-to-face width ratio: $\text{mouth\_to\_face\_width} = \frac{\text{mouth\_width}}{\text{jaw\_width}}$
– Mouth-to-face height ratio: $\text{mouth\_to\_face\_height} = \frac{\text{mouth\_height}}{\text{face\_height}}$
• Nose Proportions
– Nose width: $\text{nose\_width} = |P_{32} - P_{36}|$
– Nose length: $\text{nose\_length} = |P_{28} - P_{34}|$
– Nose-to-face width ratio: $\text{nose\_to\_face\_width} = \frac{\text{nose\_width}}{\text{jaw\_width}}$
– Nose-to-face height ratio: $\text{nose\_to\_face\_length} = \frac{\text{nose\_length}}{\text{face\_height}}$
• Face Height Ratios
– Upper face height: $\text{upper\_face} = |P_{28} - P_{22}|$
– Lower face height: $\text{lower\_face} = |P_9 - P_{34}|$
– Upper-to-lower face ratio: $\text{upper\_to\_lower} = \frac{\text{upper\_face}}{\text{lower\_face}}$
• Eye Aspect Ratio (EAR)
– Left Eye Aspect Ratio (LEAR): $\text{LEAR} = \frac{|P_{38} - P_{42}| + |P_{39} - P_{41}|}{2\,|P_{37} - P_{40}|}$
– Right Eye Aspect Ratio (REAR): $\text{REAR} = \frac{|P_{44} - P_{48}| + |P_{45} - P_{47}|}{2\,|P_{43} - P_{46}|}$
• Jaw Angles
– Left jaw angle: $\text{jaw\_left} = \operatorname{arctan2}(y_9 - y_1,\ x_9 - x_1)$
– Right jaw angle: $\text{jaw\_right} = \operatorname{arctan2}(y_9 - y_{17},\ x_9 - x_{17})$

Where $(x_i, y_i)$ are the coordinates of point $P_i$.

In total, we calculate 12 different facial ratios that contribute to the gender classification problem. As our second feature vector method, we concatenate these ratios to the (136,1) flattened feature vector and obtain a (148,1) vector. These two feature vector algorithms are used in our implementation as inputs for training and testing the different classifiers (SVM, Random Forest, KNN, Logistic Regression, and Neural Network) in the next step of our gender classification algorithm.

D. Classification

After extracting the feature vector of the detected face, the next question is how the vector is classified. In this paper, we use the following machine learning algorithms: support vector machine (SVM), random forest (RF), K nearest neighbors (KNN), logistic regression (LR), and neural network (NN).

1) Support Vector Machine (SVM): Support Vector Machine (SVM) is a powerful learning algorithm designed to categorize data by identifying the hyperplane that best separates the different classes in the feature space. SVM works very well for handling high-dimensional data and resolving challenging classification problems.

In our project, in order to manage non-linear correlations in the data, we used an RBF (Radial Basis Function) kernel when implementing SVM. To keep the complexity and generalization of the model in check, we set the regularization parameter C to 1. To account for any possible class imbalance in the classification task, the classifier was set up with balanced class weights. We also enabled probability estimates so that the classifier outputs a probability along with each prediction.

2) Random Forest (RF): Random forest (RF) is a model based on multiple decision trees whose predictions are combined by majority vote or averaging. It is a capable model for handling complex datasets.

In our project, we created the model with 100 trees (n_estimators = 100) and limited the depth of each tree (max_depth = 5). The minimum number of samples per leaf (min_samples_leaf = 10) and the minimum number of samples required to split a node (min_samples_split = 20) were chosen to preserve the generalization ability of the model. In addition, balanced class weights were applied and the bootstrap sampling was set to 80% of the samples to maintain the stability of the model.

3) K Nearest Neighbors (KNN): K nearest neighbors (KNN) is a simple but effective learning algorithm that classifies a sample by looking at its nearest neighbors. It estimates the class according to the majority among the closest K examples in the feature space. KNN is a good baseline classifier thanks to its intuitiveness.

In this paper, we set KNN to look at the five nearest neighbors (n_neighbors = 5) as a compromise between overfitting and underfitting. Because KNN is not as complex as the other models, we wanted to compare it with the more detailed methods used in this project.

4) Logistic Regression (LR): Logistic regression (LR), although it has regression in its name, is a basic classification algorithm that models the probability of a binary outcome. Thanks to the sigmoid function, this model maps each sample to a value between 0 and 1, which can easily be separated into two classes with a threshold value.

In this project, we tried to focus on certain details in the data set and to prevent class imbalance so that the model is more stable. Logistic regression is a suitable model to compare with the other classifiers because it is a simple classifier, just like KNN.

5) Fully Connected Neural Network: Neural networks are powerful deep learning models that can learn complex data through interconnected nodes in multiple layers, inspired by how the human brain processes information. Being effective for high-dimensional data and being able to automatically learn relevant features from the input allows them to be used in various studies.

In this paper, the neural network has 1024 neurons in the first layer and 512, 256 and 128 neurons in the three hidden layers. For stable training, we use ReLU (Rectified Linear Unit) activation functions in each layer and the sigmoid function in the output layer, which maps the samples to the range between 0 and 1 for binary classification. The dropout rate starts at 0.5 to prevent overfitting and decreases gradually in deeper layers. The network uses the Adam optimizer with an initial learning rate of 0.0001 (initial rate = 0.0001, decay rate = 0.9). In addition, it is trained for 30 epochs with a batch size of 32. As the most complex model in this project, the neural network shows what can be done using deep learning for gender classification.
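Looking back at the preprocessing of Section III-B, the alignment and normalization steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the function names are hypothetical, the landmarks are assumed to arrive as a list of 68 (x, y) tuples with index 0 holding point 1, and the face box width and height are passed in by the caller. The rotation is applied with angle −θ, which is what makes the eye line horizontal for θ = arctan2(dy, dx); the sign convention depends on how the rotation direction is defined in image coordinates and should be treated as an assumption of this sketch.

```python
import math


def mean_point(points):
    """Average of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def align_landmarks(landmarks):
    """Rotate 68 (x, y) landmarks about the eye midpoint so the eyes are level."""
    left_eye = mean_point(landmarks[36:42])    # points 37-42 in 1-based numbering
    right_eye = mean_point(landmarks[42:48])   # points 43-48 in 1-based numbering
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    theta = math.atan2(dy, dx)                 # tilt of the line joining the eyes
    cx = (left_eye[0] + right_eye[0]) / 2.0    # eye midpoint = rotation center
    cy = (left_eye[1] + right_eye[1]) / 2.0
    # Assumption of this sketch: rotate by -theta to cancel the measured tilt.
    c, s = math.cos(-theta), math.sin(-theta)
    aligned = []
    for x, y in landmarks:
        xr = c * (x - cx) - s * (y - cy) + cx
        yr = s * (x - cx) + c * (y - cy) + cy
        aligned.append((xr, yr))
    return aligned


def normalize_landmarks(landmarks, box_width, box_height):
    """Center the landmarks on their mean and divide by sqrt(face box area)."""
    center_x, center_y = mean_point(landmarks)
    scale = math.sqrt(box_width * box_height)
    return [((x - center_x) / scale, (y - center_y) / scale) for x, y in landmarks]


def base_feature_vector(landmarks, box_width, box_height):
    """Flatten the aligned, normalized (68, 2) landmarks into 136 values."""
    normalized = normalize_landmarks(align_landmarks(landmarks), box_width, box_height)
    return [coordinate for point in normalized for coordinate in point]
```

Because the alignment is a pure rotation about a fixed point, distances between landmarks are unchanged by it; only the normalization step rescales them.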
Fig. 2. Our Neural Network Architecture
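The 12 facial ratios of Section III-C can likewise be written down directly from their definitions. The sketch below is illustrative only: the helper names are hypothetical, |Pi − Pj| is taken to be the Euclidean distance, the mouth height is taken as the distance between the two midpoints in its definition, and the landmarks are assumed to be already aligned (the ratios are unaffected by uniform scaling, but the two jaw angles depend on the rotation). Degenerate inputs with a zero denominator are not handled.

```python
import math


def dist(p, q):
    """Euclidean distance |p - q| between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def midpoint(p, q):
    """Midpoint of two (x, y) points."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def facial_ratios(landmarks):
    """Return the 12 ratios, in a fixed order, for 68 (x, y) landmarks."""
    def P(i):
        return landmarks[i - 1]  # the text numbers the points from 1

    jaw_width = dist(P(1), P(17))
    face_height = dist(P(9), P(28))
    mouth_height = dist(midpoint(P(52), P(58)), midpoint(P(49), P(55)))
    return [
        jaw_width / face_height,                   # width_to_height
        dist(P(2), P(16)) / jaw_width,             # cheek_to_jaw
        dist(P(3), P(15)) / jaw_width,             # temple_to_jaw
        dist(P(49), P(55)) / jaw_width,            # mouth_to_face_width
        mouth_height / face_height,                # mouth_to_face_height
        dist(P(32), P(36)) / jaw_width,            # nose_to_face_width
        dist(P(28), P(34)) / face_height,          # nose_to_face_length
        dist(P(28), P(22)) / dist(P(9), P(34)),    # upper_to_lower
        (dist(P(38), P(42)) + dist(P(39), P(41))) / (2 * dist(P(37), P(40))),  # LEAR
        (dist(P(44), P(48)) + dist(P(45), P(47))) / (2 * dist(P(43), P(46))),  # REAR
        math.atan2(P(9)[1] - P(1)[1], P(9)[0] - P(1)[0]),      # jaw_left
        math.atan2(P(9)[1] - P(17)[1], P(9)[0] - P(17)[0]),    # jaw_right
    ]


def extended_feature_vector(base_vector, landmarks):
    """Concatenate the 136 flattened coordinates with the 12 ratios (148 values)."""
    return list(base_vector) + facial_ratios(landmarks)
```

Keeping the ratios in a fixed list order means every sample places the same ratio at the same position of the (148,1) vector, which the classifiers rely on.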

E. Evaluation

In order to make better comparisons between the different classifiers and methods, this paper uses the confusion matrix and metrics such as precision, recall, and F1 score.

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Support}_c = |\{x_i \mid x_i \in \text{class}_c\}|$$

$$\text{Macro Average} = \frac{1}{|C|} \sum_{c \in C} \text{metric}_c$$

$$\text{Weighted Average} = \frac{\sum_{c \in C} \text{metric}_c \cdot \text{Support}_c}{\sum_{c \in C} \text{Support}_c}$$

TP is True Positive; TN is True Negative; FP is False Positive; FN is False Negative; Support_c is the number of samples that belong to class c.

IV. DATASET AND RESULT

A. Dataset

1) UTKFace: UTKFace is a comprehensive facial image collection created at the University of Tennessee, Knoxville. It includes more than 20,000 face photos annotated by ethnicity (White, Black, Asian, Indian, and Others), gender (male/female), and age (0-116 years). As our first dataset, UTKFace was utilized by randomly selecting over 16000 raw images from the dataset. Some sample images from the UTKFace dataset are shown in Fig. 3.

Fig. 3. Samples of UTKFace dataset

2) CelebA: The CelebFaces Attributes Dataset (CelebA) is a celebrity face database with more than 200000 images and 10000 identities. The images in the dataset include background clutter and a large range of poses. In this paper, we have collected over 16000 face images at random. Some samples of the CelebA dataset are shown in Fig. 4.

Fig. 4. Samples of CelebA dataset
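Since the results that follow are reported through confusion matrices and classification reports, it may help to see how the metrics of Section III-E follow from a confusion matrix. The sketch below is a plain-Python illustration under stated assumptions: the function names are hypothetical, the matrix is indexed as confusion[true class][predicted class], a metric with a zero denominator is reported as 0, and the counts in the example are made up for illustration and are not results from this study.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall, F1 and support of one class from its counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}


def classification_summary(confusion):
    """confusion[i][j] counts samples of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    per_class = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # rest of the true-class row
        fp = sum(confusion[r][c] for r in range(n)) - tp   # rest of the predicted column
        per_class.append(class_metrics(tp, fp, fn))
    keys = ("precision", "recall", "f1")
    macro = {k: sum(m[k] for m in per_class) / n for k in keys}
    weighted = {k: sum(m[k] * m["support"] for m in per_class) / total for k in keys}
    accuracy = sum(confusion[c][c] for c in range(n)) / total
    return {"per_class": per_class, "accuracy": accuracy,
            "macro": macro, "weighted": weighted}


# Made-up counts for illustration only (rows: true male, true female).
summary = classification_summary([[50, 10], [10, 30]])
```

The macro average treats both classes equally, while the weighted average gives more influence to the class with the larger support, so the two differ whenever the classes are imbalanced.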
B. Result

In this paper, we implement our gender classification algorithm with three different architectures:
1) The UTKFace dataset with the original 68-point facial landmarks as the feature vector
2) The UTKFace dataset with a feature vector consisting of the 68-point facial landmarks and 12 facial ratios
3) The CelebA dataset with a feature vector consisting of the 68-point facial landmarks and 12 facial ratios

In this way, we compare the classification results across both datasets and both feature vector algorithms, with 5 different classifiers used for every architecture. For every architecture, we divide the dataset into 3 main categories: train, validation and test data with a 70%, 15% and 15% configuration, respectively. After each architecture, we compare the results obtained from the 5 different classifiers. In addition, we compare the performances of the different architectures.

1) UTKFace dataset was utilized with the original 68-point facial landmarks as feature vector: To start with, raw images are taken from the UTKFace dataset. In this architecture, 15413 raw images are used for our algorithm: 10789 of them for training, 2312 for validation and 2312 for testing. As the first part of our algorithm, face regions and 68 facial landmarks are extracted from the images using the dlib library. Five sample images and the results of the face detection and landmark detection algorithm are shown in Fig. 5.

Fig. 5. Sample images taken from UTKFace dataset, with detected face and landmarks

In the next step, we apply the affine transform to the images (Fig. 6) and get the new coordinates of the facial landmarks.

Fig. 6. Sample aligned face images taken from UTKFace dataset

After the affine transform, normalization is applied to the 68 point facial landmarks in order to remove the position dependency of the points and to create a meaningful pattern for the classifiers to learn. These preprocessing steps, namely the affine transform and normalization, give us the coordinates of the 68 point facial landmarks that form the basis of the feature vector. After the preprocessing step, we have a (68,2) vector consisting of the landmark coordinates. The feature vector we use in this architecture is obtained by simply flattening the (68,2) vector into a (136,1) vector. An example feature vector for sample image 1 is shown in Fig. 7.

Fig. 7. Feature Vector for Sample Image 1

The obtained feature vectors become the input to our classifiers for training, and we obtain classification results for 5 different classifiers. For the evaluation, we obtain feature vectors from the test images and compare the resulting classification results, confusion matrices and classification reports.

Support Vector Machine (SVM) Classifier:
In the classification result for SVM, we show the 5 images with the highest confidence score. This score is derived from the classifier output, which measures the probability of the predicted gender. The output lies between 0 and 1, and for display reasons we map male to 0 and female to 1. A displayed value close to 0 is therefore a better result for male, and a value close to 1 is a better result for female. Since a classifier probability close to 1 indicates female, we use it directly as the confidence score for a female classification. For a male classification, however, we calculate 1 - classifier probability to get the confidence score. In general, the lower the confidence score, the less reliable the prediction of our classifier model.
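The mapping from classifier probability to confidence score described above can be sketched as follows. This is an illustration under assumptions rather than the code used for the figures: the function names are hypothetical, the input is taken to be the probability of the female class (label 1), and the 0.5 decision threshold is an assumption, since the threshold is not stated in the text.

```python
def confidence(p_female, threshold=0.5):
    """Return (predicted label, confidence score) from the female-class probability."""
    if not 0.0 <= p_female <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_female >= threshold:
        return "female", p_female        # probability is used directly
    return "male", 1.0 - p_female        # male confidence is 1 - probability


def top_confident_correct(probabilities, true_labels, k=5):
    """Indices of the k correctly classified samples with the highest confidence."""
    scored = []
    for index, (p, truth) in enumerate(zip(probabilities, true_labels)):
        label, score = confidence(p)
        if label == truth:
            scored.append((score, index))
    scored.sort(reverse=True)            # highest confidence first
    return [index for _, index in scored[:k]]
```

With this convention a probability of 0.10 yields a male prediction with confidence 0.90, so male and female predictions can be ranked on the same scale when selecting the most confident correct examples.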
Fig. 8. Classification result for SVM with top 5 most confident correct predictions

After testing the SVM classifier with the test images and comparing the results with the ground truth gender labels, we obtain the confusion matrix, classification report and overall accuracies. A 78.72% test accuracy and an 80.94% training accuracy were obtained for the architecture designed with the UTKFace dataset and the classical feature vector algorithm, predicted with the SVM classifier.

Fig. 9. Confusion Matrix for SVM with images taken from UTKFace dataset and classical feature vector algorithm

Fig. 10. Classification report for SVM with images taken from UTKFace dataset and classical feature vector algorithm

Fig. 11. Classification final results for SVM with images taken from UTKFace dataset and classical feature vector algorithm

Random Forest (RF) Classifier:
A 73.31% test accuracy and a 74.64% training accuracy were obtained for the architecture designed with the UTKFace dataset and the classical feature vector algorithm, predicted with the Random Forest classifier.

Fig. 12. Classification result for RF with top 5 most confident correct predictions

Fig. 13. Confusion Matrix for RF with images taken from UTKFace dataset and classical feature vector algorithm

Fig. 14. Classification report for RF with images taken from UTKFace dataset and classical feature vector algorithm

Fig. 15. Classification final results for RF with images taken from UTKFace dataset and classical feature vector algorithm
K Nearest Neighbors (KNN) Classifier:
A 74.09% test accuracy and an 82.38% training accuracy were obtained for the architecture designed with the UTKFace dataset and the classical feature vector algorithm, predicted with the K Nearest Neighbors classifier.

Fig. 16. Classification result for KNN with top 5 most confident correct predictions

Fig. 17. Confusion Matrix for KNN with images taken from UTKFace dataset and classical feature vector algorithm

Fig. 18. Classification report for KNN with images taken from UTKFace dataset and classical feature vector algorithm

Fig. 19. Classification final results for KNN with images taken from UTKFace dataset and classical feature vector algorithm

Logistic Regression (LR) Classifier:
A 77.03% test accuracy and a 77.82% training accuracy were obtained for the architecture designed with the UTKFace dataset and the classical feature vector algorithm, predicted with the Logistic Regression classifier.

Fig. 20. Classification result for LR with top 5 most confident correct predictions

Fig. 21. Confusion Matrix for LR with images taken from UTKFace dataset and classical feature vector algorithm

Fig. 22. Classification report for LR with images taken from UTKFace dataset and classical feature vector algorithm

Fig. 23. Classification final results for LR with images taken from UTKFace dataset and classical feature vector algorithm
Neural Network (NN) Classifier:
An 80.28% test accuracy and an 84.26% training accuracy were obtained for the architecture designed with the UTKFace dataset and the classical feature vector algorithm, predicted with the Neural Network classifier.

Fig. 24. Classification result for neural network with top 5 most confident correct predictions

Fig. 25. Confusion Matrix for neural network with images taken from UTKFace dataset and classical feature vector algorithm

Fig. 26. Classification report for neural network with images taken from UTKFace dataset and classical feature vector algorithm

Fig. 27. Classification final results for neural network with images taken from UTKFace dataset and classical feature vector algorithm

In the training and validation loss graph for our neural network classifier, we observe that both losses decrease and converge. In the training and validation accuracy graph, both values increase and converge. These graphs indicate that our neural network model trained well.

Fig. 28. The training-validation loss and accuracy graphs for neural network with images taken from UTKFace dataset and classical feature vector algorithm

When comparing the classifier models for the UTKFace dataset and the 68 point facial landmark feature vector algorithm, we observe that the best test accuracy was obtained from the neural network classifier, with a test accuracy of 80.28%. The worst test accuracy of all 5 classifiers was obtained from the random forest method, with a test accuracy of 73.31%.

Fig. 29. Final model comparison result with images taken from UTKFace dataset and classical feature vector algorithm
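The comparison for this first architecture can be reproduced from the accuracies quoted in the preceding paragraphs. The short sketch below only rearranges those reported numbers; the variable names are hypothetical, and the gap between training and test accuracy is an additional, informal indicator of overfitting that is not part of the reported evaluation.

```python
# (test accuracy %, training accuracy %) as reported above for the UTKFace
# dataset with the 68-point landmark feature vector.
results = {
    "SVM": (78.72, 80.94),
    "Random Forest": (73.31, 74.64),
    "KNN": (74.09, 82.38),
    "Logistic Regression": (77.03, 77.82),
    "Neural Network": (80.28, 84.26),
}

# Rank the classifiers by test accuracy, best first.
ranked = sorted(results, key=lambda name: results[name][0], reverse=True)
best, worst = ranked[0], ranked[-1]

# Training-minus-test gap in percentage points (larger = more overfitting).
gaps = {name: round(train - test, 2) for name, (test, train) in results.items()}
largest_gap = max(gaps, key=gaps.get)

for name in ranked:
    test, train = results[name]
    print(f"{name:20s} test={test:.2f}%  train={train:.2f}%  gap={gaps[name]:.2f}")
```

Read this way, the neural network and SVM lead on test accuracy, while KNN shows by far the largest difference between training and test accuracy among the five classifiers.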
2) UTKFace dataset was utilized, feature vector consisting of 68-point facial landmarks and 12 facial ratios were used: For the 2nd architecture, we again start by taking raw input images from the UTKFace dataset. In this architecture, 15656 raw images are used for our algorithm, 12906 of them are for training, 2766 of them are for validation and 2349 of them are for testing. As the first part of our algorithm, face regions and 68 facial landmarks are again extracted from the images using the dlib library. The preprocessing steps (affine transform and normalization) are the same as for the previous architecture. The only difference is that, when designing the feature vector, 12 facial ratios are concatenated to the 68 point facial landmarks, so that (148,1) feature vectors are created in this algorithm.

Fig. 30. Sample aligned face images taken from UTKFace dataset

Fig. 31. Feature Vector for Image 1

The obtained combined feature vectors become the input to our classifiers for training, and we obtain classification results for the different classifier models. For the evaluation, we obtain feature vectors from the test images and compare the resulting classification results, confusion matrices and classification reports. In the end, the 1st and 2nd architectures are also compared.

Support Vector Machine (SVM) Classifier:
A 77.60% test accuracy and an 81.33% training accuracy were obtained for the architecture designed with the UTKFace dataset and the combined feature vector algorithm, predicted with the SVM classifier.

Fig. 32. Classification result for SVM with top 5 most confident correct predictions

Fig. 33. Confusion Matrix for SVM with images taken from UTKFace dataset and combined feature vector algorithm

Fig. 34. Classification report for SVM with images taken from UTKFace dataset and combined feature vector algorithm

Fig. 35. Classification final results for SVM with images taken from UTKFace dataset and combined feature vector algorithm
Random Forest (RF) Classifier:
A 72.62% test accuracy and a 76.15% training accuracy were obtained for the architecture designed with the UTKFace dataset and the combined feature vector algorithm, predicted with the Random Forest classifier.

Fig. 36. Classification result for RF with top 5 most confident correct predictions

Fig. 37. Confusion Matrix for RF with images taken from UTKFace dataset and combined feature vector algorithm

Fig. 38. Classification report for RF with images taken from UTKFace dataset and combined feature vector algorithm

Fig. 39. Classification final results for RF with images taken from UTKFace dataset and combined feature vector algorithm

K Nearest Neighbors (KNN) Classifier:
A 73.75% test accuracy and an 83.15% training accuracy were obtained for the architecture designed with the UTKFace dataset and the combined feature vector algorithm, predicted with the KNN classifier.

Fig. 40. Classification result for KNN with top 5 most confident correct predictions

Fig. 41. Confusion Matrix for KNN with images taken from UTKFace dataset and combined feature vector algorithm

Fig. 42. Classification report for KNN with images taken from UTKFace dataset and combined feature vector algorithm

Fig. 43. Classification final results for KNN with images taken from UTKFace dataset and combined feature vector algorithm
Logistic Regression (LR) Classifier:
A 77.21% test accuracy and a 77.82% training accuracy were obtained for the architecture designed with the UTKFace dataset and the combined feature vector algorithm, predicted with the LR classifier.

Fig. 44. Classification result for LR with top 5 most confident correct predictions

Fig. 45. Confusion Matrix for LR with images taken from UTKFace dataset and combined feature vector algorithm

Fig. 46. Classification report for LR with images taken from UTKFace dataset and combined feature vector algorithm

Fig. 47. Classification final results for LR with images taken from UTKFace dataset and combined feature vector algorithm

Neural Network (NN) Classifier:
A 79.76% test accuracy and an 85.34% training accuracy were obtained for the architecture designed with the UTKFace dataset and the combined feature vector algorithm, predicted with the Neural Network classifier.

Fig. 48. Classification result for neural network with top 5 most confident correct predictions

Fig. 49. Confusion Matrix for neural network with images taken from UTKFace dataset and combined feature vector algorithm

Fig. 50. Classification report for neural network with images taken from UTKFace dataset and combined feature vector algorithm

Fig. 51. Classification final results for neural network with images taken from UTKFace dataset and combined feature vector algorithm
In the training and validation loss graph of our neural network classifier, we observe that both losses decrease and converge. In the training and validation accuracy graph, both values increase and converge. These graphs indicate that our neural network model also converges in this second architecture.

Fig. 52. The training-validation loss and accuracy graphs for neural network with images taken from UTKFace dataset and combined feature vector algorithm

After applying the new feature vector algorithm, we observe no obvious improvement in the test accuracies. The maximum test accuracy was again obtained from the Neural Network classifier, with 79.76%, and the lowest from the RF classifier, with 72.62%.

3) CelebA dataset was utilized, feature vector consisting of 68-point facial landmarks and 12 facial ratios were used: In the last architecture, we take raw input images from the CelebA dataset. In this architecture, 15656 raw images are used for our algorithm: 12906 of them for training, 2766 for validation and 2349 for testing. Sample images from the CelebA dataset are shown in Fig. 54.

Fig. 54. Sample images taken from CelebA dataset, with detected face and landmarks

The preprocessing steps (affine transformation and normalization) and the feature vector algorithm that combines the original 68-point facial landmarks with 12 facial ratios are the same as in the 2nd architecture. This (148,1) feature vector is larger than the (136,1) feature vector obtained by the classical algorithm. The only difference is that we now utilize a different dataset for gender classification.
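As a rough illustration of how a (148,1) combined feature vector can be assembled from 68 landmarks and 12 ratios, the sketch below flattens the landmark coordinates into 136 values and appends 12 distance ratios. The function name, the coordinate ordering (x0, y0, x1, y1, ...) and the landmark index pairs in PLACEHOLDER_RATIO_PAIRS are assumptions chosen only to make the example runnable; they are not the ratios used in this study.

```python
import math

NUM_LANDMARKS = 68   # 68-point facial landmarks
NUM_RATIOS = 12      # 12 facial ratios appended to the landmark coordinates

# Hypothetical index pairs: each ratio is dist(a, b) / dist(c, d).
# These pairs are arbitrary placeholders, not the ratios used in the paper.
PLACEHOLDER_RATIO_PAIRS = [((i, i + 1), (0, 16)) for i in range(0, 24, 2)]


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def combined_feature_vector(landmarks, ratio_pairs):
    """Return 136 landmark coordinates followed by 12 distance ratios."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError("expected 68 landmarks")
    if len(ratio_pairs) != NUM_RATIOS:
        raise ValueError("expected 12 ratio definitions")

    # Flatten (x, y) pairs: x0, y0, x1, y1, ... (ordering is an assumption).
    coords = [float(c) for point in landmarks for c in point]

    ratios = []
    for (a, b), (c, d) in ratio_pairs:
        denom = distance(landmarks[c], landmarks[d])
        # Guard against a degenerate (zero-length) reference distance.
        ratios.append(distance(landmarks[a], landmarks[b]) / denom if denom else 0.0)

    return coords + ratios
```

Calling combined_feature_vector on 68 normalized landmark points yields a list of 148 values, whereas the classical algorithm would stop at the first 136.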

Fig. 55. Sample aligned face images taken from CelebA dataset

Fig. 56. Feature Vector for Image 1

The obtained combined feature vectors (148,1) become the input to our classifiers (SVM, RF, KNN, LR, NN) for the training part, and we obtain classification results for the five different classifier models. For the evaluation, we obtain feature vectors from the test images and compare the resulting classification results, confusion matrices and classification reports. In the end, we compare the 2nd and 3rd architectures using the corresponding test accuracies of the different classifiers.
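The evaluation quantities mentioned above (confusion matrix, per-class precision, recall and F-score, and overall accuracy) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the evaluation code used in this study: the 0/1 label encoding and both function names are hypothetical, and rows are taken to be true classes while columns are predicted classes.

```python
def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Count predictions; rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def classification_report(matrix):
    """Return (per-class precision/recall/F1, overall accuracy)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    report = {}
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[r][k] for r in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[k] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(matrix[k][k] for k in range(n)) / total if total else 0.0
    return report, accuracy
```

Applying these two functions to the test-set predictions of each of the five classifiers gives the figures that are compared across models.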
Fig. 53. Final model comparison result with images taken from UTKFace dataset and combined feature vector algorithm

Support Vector Machine (SVM) Classifier:
A 91.61% test accuracy and a 94.54% training accuracy were obtained for the architecture designed with the CelebA dataset and the combined feature vector algorithm, predicted with the SVM classifier.
Fig. 57. Classification result for SVM with top 5 most confident correct predictions

Fig. 58. Confusion Matrix for SVM with images taken from CelebA dataset and combined feature vector algorithm

Fig. 59. Classification report for SVM with images taken from CelebA dataset and combined feature vector algorithm

Fig. 60. Classification final results for SVM with images taken from CelebA dataset and combined feature vector algorithm

Random Forest (RF) Classifier:
An 84.72% test accuracy and an 86.37% training accuracy were obtained for the architecture designed with the CelebA dataset and the combined feature vector algorithm, predicted with the RF classifier.

Fig. 61. Classification result for RF with top 5 most confident correct predictions

Fig. 62. Confusion Matrix for RF with images taken from CelebA dataset and combined feature vector algorithm

Fig. 63. Classification report for RF with images taken from CelebA dataset and combined feature vector algorithm

Fig. 64. Classification final results for RF with images taken from CelebA dataset and combined feature vector algorithm

K Nearest Neighbors (KNN) Classifier:
An 84.76% test accuracy and a 91.59% training accuracy were obtained for the architecture designed with the CelebA dataset and the combined feature vector algorithm, predicted with the KNN classifier.
Fig. 65. Classification result for KNN with top 5 most confident correct predictions

Fig. 66. Confusion Matrix for KNN with images taken from CelebA dataset and combined feature vector algorithm

Fig. 67. Classification report for KNN with images taken from CelebA dataset and combined feature vector algorithm

Fig. 68. Classification final results for KNN with images taken from CelebA dataset and combined feature vector algorithm

Logistic Regression (LR) Classifier:
A 92.04% test accuracy and a 92.92% training accuracy were obtained for the architecture designed with the CelebA dataset and the combined feature vector algorithm, predicted with the LR classifier.

Fig. 69. Classification result for LR with top 5 most confident correct predictions

Fig. 70. Confusion Matrix for LR with images taken from CelebA dataset and combined feature vector algorithm

Fig. 71. Classification report for LR with images taken from CelebA dataset and combined feature vector algorithm

Fig. 72. Classification final results for LR with images taken from CelebA dataset and combined feature vector algorithm

Neural Network (NN) Classifier:
A 92.55% test accuracy and a 95.83% training accuracy were obtained for the architecture designed with the CelebA dataset and the combined feature vector algorithm, predicted with the Neural Network classifier.
Fig. 73. Classification result for neural network with top 5 most confident
correct predictions

Fig. 77. The training-validation loss and accuracy graphs for neural network with images taken from CelebA dataset and combined feature vector algorithm

For the final architecture, we observe an increase of roughly 11 to 15 percentage points in the test accuracy of every classifier compared with the second architecture. As before, the best test accuracy was obtained from the Neural Network classifier, with 92.55%, and the worst test accuracy was obtained from the RF classifier, with 84.72%.
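The comparison between the second and third architectures can be checked directly from the test accuracies reported for each classifier. The short sketch below only redoes that arithmetic; the dictionary and function names are illustrative choices, while the accuracy values are the ones reported in this section.

```python
# Test accuracies (%) reported for the 2nd (UTKFace) and 3rd (CelebA) architectures.
UTKFACE_TEST_ACC = {"SVM": 77.60, "RF": 72.62, "KNN": 73.75, "LR": 77.21, "NN": 79.76}
CELEBA_TEST_ACC = {"SVM": 91.61, "RF": 84.72, "KNN": 84.76, "LR": 92.04, "NN": 92.55}


def accuracy_gains(before, after):
    """Per-classifier difference in test accuracy, in percentage points."""
    return {name: round(after[name] - before[name], 2) for name in before}


gains = accuracy_gains(UTKFACE_TEST_ACC, CELEBA_TEST_ACC)
best = max(CELEBA_TEST_ACC, key=CELEBA_TEST_ACC.get)
worst = min(CELEBA_TEST_ACC, key=CELEBA_TEST_ACC.get)

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} percentage points")
print(f"best on CelebA: {best}, worst on CelebA: {worst}")
```

The differences are expressed in percentage points (absolute differences between accuracies), not as relative changes.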

Fig. 74. Confusion Matrix for neural network with images taken from CelebA dataset and combined feature vector algorithm

Fig. 75. Classification report for neural network with images taken from CelebA dataset and combined feature vector algorithm

Fig. 76. Classification final results for neural network with images taken from CelebA dataset and combined feature vector algorithm

The loss and accuracy graphs for both training and validation over the epochs resemble the neural network classifier graphs of the previous 2 architectures. We observe that both the loss and accuracy graphs converge.

Fig. 78. Final model comparison result with images taken from CelebA dataset and combined feature vector algorithm
V. CONCLUSION
This study compared the conventional feature vector consisting of 68-point facial landmarks (a feature vector of dimension (136,1)) with an alternative 148-dimensional feature vector (the (148,1) vector formed by the conventional landmark features plus 12 geometric ratios) for facial gender classification. Experiments with multiple classifiers showed that the revised feature vector increased the test accuracy of our gender classification algorithm to some extent. In addition, the face alignment and normalization steps increased the robustness of classification. Even though the neural network performed best in all 3 architectures, conventional classifiers such as SVM displayed competitive outcomes, demonstrating how well the constructed geometric features captured gender-dimorphic traits.
R EFERENCES
[1] S. A. Khan, M. Nazir, S. Akram, and N. Riaz, ”Gender classification
using image processing techniques: A survey,” in 2011 IEEE 14th
International Multitopic Conference. IEEE, 2011, pp. 25–30. G. T. Rado
and H. Suhl, Eds. New York: Academic, 1963, pp. 271–350.
[2] S. Kalam and G. Guttikonda, ”Gender Classification using Geometric
Facial Features,” International Journal of Computer Applications, vol.
85, no. 7, pp. 32–37, Jan. 2014.
[3] Lapuschkin, Sebastian, et al. ”Understanding and comparing deep neural
networks for age and gender classification.” Proceedings of the IEEE
international conference on computer vision workshops. 2017.
[4] Y. Lin and H. Xie, ”Face gender recognition based on face recognition
feature vectors,” in 2020 IEEE 3rd International Conference on Infor-
mation Systems and Computer Aided Education (ICISCAE), pp. 162 –
166, 2020.
