EE_475_Report (3)
Abstract—This study presents a comprehensive approach that uses feature vectors and different classifiers for gender classification. We used the UTKFace and CelebA datasets to compare the robustness of the gender recognition algorithm across datasets. For each dataset, five classifiers (SVM, Random Forest, K-Nearest Neighbors, Logistic Regression, and Neural Network) were trained on the same facial images to compare their efficiency and performance. We used 68-point facial landmarks and computed novel geometric ratios to train the models and improve performance. After training the models with the feature vectors, we compare the test images with their corresponding gender predictions, the confusion matrices, and the F-scores of each model. The results suggest that our model achieves strong performance across different classifiers while providing an effective solution to the gender recognition problem.

Index Terms—face detection, gender recognition, feature vectors, machine learning

I. INTRODUCTION

Facial feature-based gender classification has become a prominent application in computer vision and machine learning, with significant impact on demographic studies, human-computer interaction, and various real-world applications. This work proposes a comprehensive approach for gender classification based on feature vectors consisting of facial landmarks and proportions, using machine learning models on two different datasets: UTKFace and CelebA. To prepare the classifier input, we extracted face regions and 68 facial landmarks using the dlib library. Data preprocessing techniques, including affine transformation and normalization, were applied to the landmark point coordinates. The original feature vector algorithm was extended by incorporating facial ratios. To evaluate effectiveness, we compared the performance of feature vectors consisting of the original 68-point facial landmarks against an extended version incorporating facial ratios across our datasets. These feature vectors were used to train different classifiers (SVM, Random Forest, KNN, Logistic Regression, and Neural Network), and their results were compared. In this paper, we demonstrate that feature vectors provide effective gender classification, and that enhanced feature vectors incorporating additional gender-specific information can improve test accuracy across different classifiers.

II. RELATED WORKS

Gender classification from facial images has been an active research area for more than two decades, and it has evolved significantly from early traditional approaches to modern deep learning methods. Much of the early work in this field was based on geometric facial features and traditional machine learning algorithms.

Khan et al. [1] published a comprehensive survey on gender classification techniques using image processing, showing that the first steps of researchers in this area were to extract and analyze specific facial features and their geometric relationships to determine gender. These early methods usually achieved 75-80% accuracy on controlled datasets. Starting from this foundation, many efforts have been directed toward improving classification accuracy by using different feature extraction techniques.

Kalam and Guttikonda [2] demonstrated the effectiveness of using carefully selected geometric facial features such as interocular distance, lip-to-nose ratio, and several other facial measures. Their study showed that traditional feature extraction can achieve accuracy levels of up to 85% while keeping the classification process interpretable. The emergence of deep learning has been a turning point in the performance of gender classification.

Lapuschkin et al. [3] compared different deep neural network architectures for age and gender classification. They showed that deep learning classifiers could be trained to achieve more than 90% accuracy. One of the biggest benefits of deep learning methods is that they can automatically find important features in input images without manual feature engineering. Recent studies have focused on making gender classification systems more stable and efficient.

Lin and Xie [4], whose ideas we adopted most in this project, developed a new method for gender classification using face recognition feature vectors. It has proven to be more robust against changes in image quality and facial expressions, while achieving accuracy rates of approximately 98%.

In this paper, we propose an approach for gender classification that uses effective preprocessing techniques and geometric facial proportions to create a feature vector, and then combines the feature vector with modern machine learning and neural network architectures. Next, we describe the proposed method in detail.

III. METHOD

structure in our algorithm. The landmarks corresponding to the different regions of the face are as follows:
• Jawline: 1–17
• Eyebrows:
– Left eyebrow: 18–22
– Right eyebrow: 23–27
• Nose:
– Nose bridge: 28–31
– Nose tip: 32–36
• Eyes:
E. Evaluation
To compare the different classifiers and methods, this paper uses the confusion matrix and metrics such as precision, recall, and F1-score.

\[ \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \]

\[ \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \]

\[ \text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

\[ \text{Support}_c = |\{x_i \mid x_i \in \text{class}_c\}| \]

\[ \text{Macro Average} = \frac{1}{|C|} \sum_{c \in C} \text{metric}_c \]

\[ \text{Weighted Average} = \frac{\sum_{c \in C} \text{metric}_c \cdot \text{Support}_c}{\sum_{c \in C} \text{Support}_c} \]

Fig. 3. Samples of UTKFace dataset

2) CelebA: The CelebFaces Attributes Dataset (CelebA) is a celebrity face dataset with more than 200,000 images and 10,000 identities. The images in the dataset include background clutter and a wide range of poses. In this paper, we collected over 16,000 face images at random. Some samples of the CelebA dataset are shown in Fig. 4.
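The metrics defined in the Evaluation subsection can be sketched in plain Python as follows. This is a minimal illustration, not the code used in the study: the function name `classification_metrics`, the 0 = male / 1 = female label encoding, and the small example label lists are assumptions made only for this sketch.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Per-class precision/recall/F1/support plus accuracy, macro and weighted averages."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # Support_c = number of true samples in class c
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[c] = {"precision": precision, "recall": recall,
                        "f1": f1, "support": support[c]}

    names = ("precision", "recall", "f1")
    total = sum(support[c] for c in classes)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Macro average: unweighted mean over classes.
    macro = {m: sum(per_class[c][m] for c in classes) / len(classes) for m in names}
    # Weighted average: mean over classes weighted by support.
    weighted = {m: sum(per_class[c][m] * support[c] for c in classes) / total
                for m in names}
    return {"per_class": per_class, "accuracy": accuracy,
            "macro": macro, "weighted": weighted}


# Hypothetical labels for illustration only (0 = male, 1 = female).
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 0, 1]
result = classification_metrics(y_true, y_pred)
print(result["accuracy"], result["macro"]["f1"], result["weighted"]["f1"])
```

With unequal class sizes the macro and weighted averages differ, which is why a classification report typically lists both.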
Fig. 5. Sample images taken from the UTKFace dataset, with detected face and landmarks

In the next step, we apply an affine transform to the images (Fig. 6) and obtain the new coordinates of the facial landmarks.

Fig. 6. Sample aligned face images taken from the UTKFace dataset

After the affine transform, normalization is applied to the 68-point facial landmarks in order to remove the position dependency of the points and to create a meaningful pattern for the classifiers to learn. These preprocessing steps, namely the affine transform and normalization, give us the coordinates of the 68-point facial landmarks that form the basis of the feature vector. After the preprocessing step, we have a (68,2)

The obtained feature vectors become the input to our classifiers for training, and we obtain classification results for five different classifiers. For the evaluation, we obtain feature vectors from the test images and compare the resulting classification results, confusion matrices, and classification reports.

Support Vector Machine (SVM) Classifier:
In the classification result for SVM, we show the 5 images with the highest confidence score. This score is derived from the classifier's output probability, which lies between 0 and 1. For display, we encode male as 0 and female as 1, so the raw probability indicates a more confident male prediction the closer it is to 0 and a more confident female prediction the closer it is to 1. We therefore use the probability directly as the confidence score for female predictions, and 1 - probability as the confidence score for male predictions. In general, the lower this confidence score, the less reliable the prediction.
Fig. 8. Classification result for SVM with top 5 most confident correct predictions
Fig. 9. Confusion matrix for SVM with images taken from the UTKFace dataset and the classical feature vector algorithm
Fig. 10. Classification report for SVM with images taken from the UTKFace dataset and the classical feature vector algorithm
Fig. 11. Classification final results for SVM with images taken from the UTKFace dataset and the classical feature vector algorithm

Random Forest (RF) Classifier:
A test accuracy of 73.31% and a training accuracy of 74.64% were obtained with the Random Forest classifier for the architecture using the UTKFace dataset and the classical feature vector algorithm.

Fig. 13. Confusion matrix for RF with images taken from the UTKFace dataset and the classical feature vector algorithm
Fig. 14. Classification report for RF with images taken from the UTKFace dataset and the classical feature vector algorithm
Fig. 15. Classification final results for RF with images taken from the UTKFace dataset and the classical feature vector algorithm
K-Nearest Neighbors (KNN) Classifier:
A test accuracy of 74.09% and a training accuracy of 82.38% were obtained with the K-Nearest Neighbors classifier for the architecture using the UTKFace dataset and the classical feature vector algorithm.

Fig. 16. Classification result for KNN with top 5 most confident correct predictions
Fig. 17. Confusion matrix for KNN with images taken from the UTKFace dataset and the classical feature vector algorithm
Fig. 18. Classification report for KNN with images taken from the UTKFace dataset and the classical feature vector algorithm
Fig. 19. Classification final results for KNN with images taken from the UTKFace dataset and the classical feature vector algorithm

Logistic Regression (LR) Classifier:
A test accuracy of 77.03% and a training accuracy of 77.82% were obtained with the Logistic Regression classifier for the architecture using the UTKFace dataset and the classical feature vector algorithm.

Fig. 20. Classification result for LR with top 5 most confident correct predictions
Fig. 21. Confusion matrix for LR with images taken from the UTKFace dataset and the classical feature vector algorithm
Fig. 22. Classification report for LR with images taken from the UTKFace dataset and the classical feature vector algorithm
Fig. 23. Classification final results for LR with images taken from the UTKFace dataset and the classical feature vector algorithm
Neural Network (NN) Classifier:
A test accuracy of 80.28% and a training accuracy of 84.26% were obtained with the Neural Network classifier for the architecture using the UTKFace dataset and the classical feature vector algorithm.

Fig. 24. Classification result for neural network with top 5 most confident correct predictions
Fig. 25. Confusion matrix for neural network with images taken from the UTKFace dataset and the classical feature vector algorithm
Fig. 26. Classification report for neural network with images taken from the UTKFace dataset and the classical feature vector algorithm
Fig. 27. Classification final results for neural network with images taken from the UTKFace dataset and the classical feature vector algorithm

In the training and validation loss graph for our neural network classifier, we observe that both losses decrease and converge. In the training and validation accuracy graph, both values increase and converge. These graphs indicate that our neural network model trained well.

Fig. 28. The training-validation loss and accuracy graphs for neural network with images taken from the UTKFace dataset and the classical feature vector algorithm
Fig. 29. Final model comparison result with images taken from the UTKFace dataset and the classical feature vector algorithm
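One way to read the training and test accuracies reported for this first architecture is through the gap between them, which gives a rough indication of how much each classifier overfits. The short sketch below only restates the figures quoted in the text for RF, KNN, LR, and NN; the variable names are illustrative, and classifiers whose accuracies are not quoted in the text are left out.

```python
# (training accuracy %, test accuracy %) as quoted in the text for the
# UTKFace dataset with the classical feature vector algorithm.
reported = {
    "Random Forest": (74.64, 73.31),
    "KNN": (82.38, 74.09),
    "Logistic Regression": (77.82, 77.03),
    "Neural Network": (84.26, 80.28),
}

# Gap in percentage points; a larger gap suggests more overfitting.
gaps = {name: round(train - test, 2) for name, (train, test) in reported.items()}

for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {gap:.2f} percentage points")
```

Under this reading, KNN shows the largest gap and Logistic Regression the smallest, while the Neural Network keeps the highest test accuracy.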
2) UTKFace dataset with a feature vector consisting of 68-point facial landmarks and 12 facial ratios: For the second architecture, we again start from raw input images from the UTKFace dataset. In this architecture, 15656 raw images are used for our algorithm, 12906 of them are for training, 2766 of them are for validation and 2349 of them are for testing. As the first part of our algorithm, face regions and 68 facial landmarks are again extracted from the images using the dlib library. The preprocessing steps (affine transform and normalization) are the same as in the previous architecture. The only difference is that, when building the feature vector, 12 facial ratios are concatenated to the 68-point facial landmarks, so that (148,1) feature vectors are created in this algorithm.

Fig. 30. Sample aligned face images taken from the UTKFace dataset

Support Vector Machine (SVM) Classifier:
A test accuracy of 77.60% and a training accuracy of 81.33% were obtained with the SVM classifier for the architecture using the UTKFace dataset and the combined feature vector algorithm.

Fig. 32. Classification result for SVM with top 5 most confident correct predictions
Fig. 33. Confusion matrix for SVM with images taken from the UTKFace dataset and the combined feature vector algorithm
Fig. 34. Classification report for SVM with images taken from the UTKFace dataset and the combined feature vector algorithm
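The combined feature vector described for this architecture (68 normalized landmarks flattened to 136 values, plus 12 ratios, giving 148 values) can be sketched as follows. Several details here are assumptions and not taken from the study: normalization is shown as centering on the centroid and scaling by the largest radius, the 12 ratio definitions in `ratio_defs` are placeholders, and the landmark coordinates are synthetic. Note that the code uses 0-based indices, whereas the landmark list in the text is 1-based.

```python
import math


def normalize_landmarks(points):
    """Center (x, y) landmarks on their centroid and scale to unit max radius.

    Assumed normalization scheme; it removes position and scale dependency.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    scale = max(math.hypot(x, y) for x, y in centered) or 1.0
    return [(x / scale, y / scale) for x, y in centered]


def distance(points, i, j):
    """Euclidean distance between landmarks i and j (0-based indices)."""
    return math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])


def build_feature_vector(points, ratio_defs):
    """Flatten the normalized landmarks and append one distance ratio per definition."""
    norm = normalize_landmarks(points)
    flat = [value for point in norm for value in point]  # 68 * 2 = 136 values
    ratios = []
    for (a, b), (c, d) in ratio_defs:
        denom = distance(norm, c, d)
        ratios.append(distance(norm, a, b) / denom if denom else 0.0)
    return flat + ratios


# Synthetic landmarks and placeholder ratio definitions, for illustration only.
points = [(100 + 3 * i, 50 + (i * 7) % 40) for i in range(68)]
ratio_defs = [((0, 16), (i, i + 20)) for i in range(12)]  # hypothetical pairs
vector = build_feature_vector(points, ratio_defs)
print(len(vector))
```

Because each ratio divides one distance by another, the ratios are unaffected by the scaling step and add information that is independent of face size.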
Fig. 36. Classification result for RF with top 5 most confident correct predictions
Fig. 37. Confusion matrix for RF with images taken from the UTKFace dataset and the combined feature vector algorithm
Fig. 38. Classification report for RF with images taken from the UTKFace dataset and the combined feature vector algorithm
Fig. 39. Classification final results for RF with images taken from the UTKFace dataset and the combined feature vector algorithm
Fig. 40. Classification result for KNN with top 5 most confident correct predictions
Fig. 41. Confusion matrix for KNN with images taken from the UTKFace dataset and the combined feature vector algorithm
Fig. 42. Classification report for KNN with images taken from the UTKFace dataset and the combined feature vector algorithm
Fig. 43. Classification final results for KNN with images taken from the UTKFace dataset and the combined feature vector algorithm
Logistic Regression (LR) Classifier:
A test accuracy of 77.21% and a training accuracy of 77.82% were obtained with the LR classifier for the architecture using the UTKFace dataset and the combined feature vector algorithm.

Fig. 44. Classification result for LR with top 5 most confident correct predictions
Fig. 45. Confusion matrix for LR with images taken from the UTKFace dataset and the combined feature vector algorithm
Fig. 46. Classification report for LR with images taken from the UTKFace dataset and the combined feature vector algorithm
Fig. 47. Classification final results for LR with images taken from the UTKFace dataset and the combined feature vector algorithm

Neural Network (NN) Classifier:
A test accuracy of 79.76% and a training accuracy of 85.34% were obtained with the Neural Network classifier for the architecture using the UTKFace dataset and the combined feature vector algorithm.

Fig. 48. Classification result for neural network with top 5 most confident correct predictions
Fig. 49. Confusion matrix for neural network with images taken from the UTKFace dataset and the combined feature vector algorithm
Fig. 50. Classification report for neural network with images taken from the UTKFace dataset and the combined feature vector algorithm
Fig. 51. Classification final results for neural network with images taken from the UTKFace dataset and the combined feature vector algorithm
In the training and validation loss graph for our neural network classifier, we observe that both losses decrease and converge. In the training and validation accuracy graph, both values increase and converge. These graphs indicate that our neural network model also converges in this second architecture.

Fig. 52. The training-validation loss and accuracy graphs for neural network with images taken from the UTKFace dataset and the combined feature vector algorithm

After applying the new feature vector algorithm, we observe no obvious improvement in test accuracy. The maximum test accuracy was again obtained with the Neural Network classifier, at 79.76%, and the lowest with the RF classifier, at 72.62%.

3) CelebA dataset with a feature vector consisting of 68-point facial landmarks and 12 facial ratios: In the last architecture, we take raw input images from the CelebA dataset. In this architecture, 15656 raw images are used for our algorithm, 12906 of them are for training, 2766 of them are for validation and 2349 of them are for testing. Sample images of the CelebA dataset are shown in Fig. 54.

Fig. 54. Sample images taken from the CelebA dataset, with detected face and landmarks

The preprocessing steps (affine transformation and normalization) and the feature vector algorithm that combines the original 68-point facial landmarks with 12 facial ratios are the same as in the second architecture. This (148,1) feature vector is larger than the (136,1) feature vector obtained by the classical algorithm. The only difference is that we now use a different dataset for gender classification.

Fig. 55. Sample aligned face images taken from the CelebA dataset
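The affine alignment step used in preprocessing can be illustrated with a simple special case: rotating the landmark coordinates about the midpoint between the eyes so that the eye line becomes horizontal. The text does not specify the exact transform, so this rotation-only version, the helper name `align_by_eyes`, and the example coordinates are assumptions for the sake of the sketch.

```python
import math


def align_by_eyes(points, left_eye, right_eye):
    """Rotate (x, y) points about the eye midpoint so the eye line is horizontal.

    A rotation is one special case of an affine transform; the transform used
    in the study may also include scaling or translation.
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    angle = math.atan2(ry - ly, rx - lx)  # current tilt of the eye line
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    mx, my = (lx + rx) / 2.0, (ly + ry) / 2.0
    aligned = []
    for x, y in points:
        dx, dy = x - mx, y - my
        aligned.append((mx + dx * cos_a - dy * sin_a,
                        my + dx * sin_a + dy * cos_a))
    return aligned


# Hypothetical eye centers and one extra landmark, for illustration only.
left_eye, right_eye = (30.0, 40.0), (70.0, 60.0)
landmarks = [left_eye, right_eye, (50.0, 90.0)]
aligned = align_by_eyes(landmarks, left_eye, right_eye)
print(aligned[0], aligned[1])
```

After alignment both eye centers share the same vertical coordinate, and distances between landmarks are preserved, so ratios computed afterwards are unaffected by head tilt.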
Fig. 58. Confusion matrix for SVM with images taken from the CelebA dataset and the combined feature vector algorithm
Fig. 59. Classification report for SVM with images taken from the CelebA dataset and the combined feature vector algorithm
Fig. 60. Classification final results for SVM with images taken from the CelebA dataset and the combined feature vector algorithm
Fig. 62. Confusion matrix for RF with images taken from the CelebA dataset and the combined feature vector algorithm
Fig. 63. Classification report for RF with images taken from the CelebA dataset and the combined feature vector algorithm
Fig. 64. Classification final results for RF with images taken from the CelebA dataset and the combined feature vector algorithm
Fig. 66. Confusion matrix for KNN with images taken from the CelebA dataset and the combined feature vector algorithm
Fig. 67. Classification report for KNN with images taken from the CelebA dataset and the combined feature vector algorithm
Fig. 68. Classification final results for KNN with images taken from the CelebA dataset and the combined feature vector algorithm
Fig. 70. Confusion matrix for LR with images taken from the CelebA dataset and the combined feature vector algorithm
Fig. 71. Classification report for LR with images taken from the CelebA dataset and the combined feature vector algorithm
Fig. 72. Classification final results for LR with images taken from the CelebA dataset and the combined feature vector algorithm
Fig. 74. Confusion matrix for neural network with images taken from the CelebA dataset and the combined feature vector algorithm
Fig. 75. Classification report for neural network with images taken from the CelebA dataset and the combined feature vector algorithm
Fig. 76. Classification final results for neural network with images taken from the CelebA dataset and the combined feature vector algorithm

The training and validation loss and accuracy graphs over the epochs resemble the neural network classifier graphs of the previous two architectures. We observe that both the loss and the accuracy graphs converge.

Fig. 77. The training-validation loss and accuracy graphs for neural network with images taken from the CelebA dataset and the combined feature vector algorithm
Fig. 78. Final model comparison result with images taken from the CelebA dataset and the combined feature vector algorithm
V. CONCLUSION
This study compared the conventional feature vector consisting of 68-point facial landmarks (dimension (136,1)) with an extended 148-dimensional feature vector (dimension (148,1): the conventional vector plus 12 geometric ratios) for facial gender classification. Experiments with multiple classifiers showed that the revised feature vector increased the test accuracy of our gender classification algorithm to some extent. In addition, the face alignment and normalization steps increased the robustness of classification. Even though the neural network performed best in all three architectures, conventional classifiers such as SVM produced competitive results, demonstrating how well the constructed geometric features captured gender-dimorphic traits.
REFERENCES
[1] S. A. Khan, M. Nazir, S. Akram, and N. Riaz, "Gender classification using image processing techniques: A survey," in 2011 IEEE 14th International Multitopic Conference. IEEE, 2011, pp. 25–30.
[2] S. Kalam and G. Guttikonda, "Gender classification using geometric facial features," International Journal of Computer Applications, vol. 85, no. 7, pp. 32–37, Jan. 2014.
[3] S. Lapuschkin et al., "Understanding and comparing deep neural networks for age and gender classification," in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2017.
[4] Y. Lin and H. Xie, "Face gender recognition based on face recognition feature vectors," in 2020 IEEE 3rd International Conference on Information Systems and Computer Aided Education (ICISCAE), 2020, pp. 162–166.