
88 Submission-1

fyp projects for electrical students

Uploaded by

Luna Tic
Copyright © All Rights Reserved

Mental Health Recognition Using Facial Recognition Technology
A. B. Desai, Dept. of Computer Science & Engineering, Graphic Era Hill University, Dehradun, India
Neeraj Kumar Pandey, Computer Science & Engg., Graphic Era University, Dehradun, India
Aman Negi, Computer Science and Engineering, Graphic Era Hill University, Dehradun, India
Surabhi Chaubey, Computer Science and Engineering, Graphic Era Hill University, Dehradun, India
Amit Kumar Mishra, Dept. of Computer Science & Engineering, Graphic Era Hill University, Dehradun, India
Anmol Gusain, Computer Science and Engineering, Graphic Era Hill University, Dehradun, India

Abstract— Mental illness has a significant impact on a person's accomplishments, subjective well-being, and physical health. Numerous studies have shown that the physiological and behavioral signs of people with mental illnesses differ from those of healthy people; examples include variations in brain activity, galvanic skin response, eye contact, voice, and facial movements. Facial expressions are the most continuous and accessible nonverbal indicators of mental health. We propose a model that analyzes facial expressions in still pictures and videos to capture data on a person's emotional state. Neural networks are used to extract facial-expression features and identify seven distinct expressions: anger, disgust, fear, neutrality, happiness, surprise, and sadness. The expression recognition rates obtained are more satisfactory in accuracy and performance than those reported in earlier work by other researchers.

Keywords- Emotion Recognition, Convolutional Neural Networks, Depression, Mental Health

I. Introduction
Mental disorders, which in severe cases involve psychotic conditions, can cause individuals to think, act, and behave abnormally, making it difficult to maintain a connection with reality and to function effectively in daily life. Examples include depression, schizophrenia, bipolar disorder, dementia, psychoses, and disorders associated with chemical imbalances. Depression, a prevalent mental disorder and a leading cause of disability worldwide, poses significant challenges in diagnosis and treatment because of the current shortage of psychiatrists. Objective physiological markers for diagnosing depression are currently unavailable, and its underlying causes remain unclear.

Deep learning has attracted growing interest among researchers because of its rapid development in recent years. Deep learning, a subset of machine learning, uses algorithms based on artificial neural networks to learn representations of data. It outperforms shallow models in feature extraction and model fitting and excels at capturing abstract, distributed representations that generalize well. Deep learning has shown the potential to address previously intractable problems [1].

In the domain of user behavior and emotional state classification, various approaches have been proposed. Kyogu Lee focused on facial muscle movements, while Minsu Cho aimed to recognize Action Units (AUs) from facial features [2]. Convolutional Neural Networks (CNNs) have become popular in emotion recognition, and the methodologies continue to evolve. Numerous imaging studies have investigated differences in brain activation between natural viewing (without explicit emotion regulation) and re-appraisal, as well as between natural viewing and suppression. Other investigations have compared emotion regulation strategies, specifically re-appraisal versus suppression, to understand their impact on brain activity and subjective emotional experience [3].

Emotions are short-lived physical responses linked to an individual's mood or behavior, and they offer potential avenues for research. Kassam KS assessed the effectiveness of multiple approaches on the same data and found that a DBN integrating two-dimensional and three-dimensional features yielded generalizable and efficient
results [4]. Several studies have explored the use of facial expressions for depression detection. Notably, Benyoussef Abdellaoui conducted a significant study [5][6] that collected facial expressions from 56 depressed and 56 healthy participants and classified them with machine learning algorithms. The results showed high accuracy in distinguishing depressed from healthy individuals based on facial expressions [7].

II. Methodology
The primary emphasis of this work is how convolutional neural networks (CNNs) can be used effectively for emotion identification within deep learning and machine learning. In machine learning, the objective is to enable machines to make accurate predictions by leveraging various techniques and approaches.

Deep learning trains systems to perform tasks in a manner akin to how humans learn from experience. It is widely applied to classification tasks involving images, text, and sound, in some cases surpassing human accuracy. Training uses labeled data and multi-layer neural network architectures.

Another technique used in computer vision and image processing is the Histogram of Oriented Gradients (HOG), a feature descriptor that aids object detection. Like the Canny edge detector, HOG is built on image gradients: it analyzes the distribution of gradient orientations within localized image regions and generates histograms of edge orientation. Because these histograms are computed from both the magnitude and the orientation of the gradient, the HOG descriptor captures an object's shape and appearance, which distinguishes it from other edge descriptors [8].

In image processing, a region of interest (ROI) is a specific area that one may wish to modify or filter, often represented as a binary mask image in which pixels inside the ROI are set to 1 and those outside are set to 0. ROIs play a significant role in facial recognition systems, for example in detecting 68 key facial landmarks. With such landmark detectors, facial attributes like the mouth, eyebrows, and eyes can be extracted, so that distances can be measured and expressions such as smiles or surprise identified. The system can use FER+ measurements or its own heuristics to predict emotions such as happiness or sadness [8].

The Long Short-Term Memory (LSTM) network, a type of recurrent neural network (RNN), is designed to handle long-term dependencies in sequential data. An RNN is built as a chain of repeating neural network modules that process sequential input; in a standard RNN each module typically contains a single tanh layer, and the output of one module serves as input to the next. However, RNNs struggle to learn long-term dependencies because of the vanishing gradient problem. To overcome this, the LSTM architecture adds memory cells and gating mechanisms that let the network selectively retain and propagate information over extended time intervals [2].

Convolutional Neural Networks (CNNs) are a widely used and powerful deep learning technique for image processing and many other deep learning tasks [9]. They are composed of convolution layers, pooling layers, and fully connected layers [9]. Trained with the backpropagation algorithm, CNNs automatically learn hierarchical spatial features, loosely emulating how the human brain analyzes images [10]. Because CNNs are computationally demanding and complex, optimizing the network for efficient computation is important. One notable application of CNNs is emotion classification, where well-constructed CNN models can classify emotional states from image inputs.

Fully Convolutional Networks (FCNs), by contrast, are an architecture used primarily for semantic segmentation [11]. FCNs use only locally connected layers (convolution, pooling, and up-sampling), which makes training more parameter-efficient [12]. Because they have no dense layers, FCNs can also handle inputs of varying size. To recover fine-grained spatial information lost during down-sampling, FCNs use skip connections. The network consists of a down-sampling path for feature extraction and an up-sampling path for localization and interpretation [13]. An advanced attention mechanism, comprising an activation layer, a feature input layer, a convolution layer, and a fully connected layer, has been integrated into this FCN model to enhance its performance.

A Deep Neural Network (DNN), often called a deep net, is a neural network with a stacked structure of multiple layers, including hidden layers between the input and output layers. DNNs are widely used on unstructured data such as images and are a de facto standard for many computer vision tasks.

VGG (Visual Geometry Group) refers to a deep CNN architecture known for its depth [14]. "Deep" here refers to the number of weight layers, as in VGG-16 (16 weight layers, 13 of them convolutional) or VGG-19 (19 weight layers, 16 of them convolutional). The VGG architecture serves as a foundation for many object recognition models and performs well on datasets and tasks beyond ImageNet [15]. VGGNet has consistently performed strongly on standard benchmarks, which has secured its significance in the field.

The Inception v3 image recognition model is one of the most widely used architectures among researchers and practitioners. It has been shown to achieve more than 78.1% accuracy on the ImageNet dataset [16]. The model combines ideas from multiple researchers and is based primarily on the paper "Rethinking the Inception Architecture for Computer Vision" by Szegedy et al. Its design combines symmetric and asymmetric building blocks, including convolutions, average pooling, max pooling, concatenations, dropout, and fully connected layers [17]. Batch normalization is used extensively throughout the model and applied to activation inputs, and the loss is computed with softmax [14]. ResNet-50, on the other hand, is a 50-layer deep CNN that classifies images into 1000 object categories. Its strong performance comes from pre-training on more than a million images from the ImageNet database, which gives it rich feature representations for a wide range of images. The network takes input images of size 224 × 224 [18].

A. Dataset Description:
The model is trained on the FER2013 dataset from the Kaggle Facial Expression Recognition Challenge. This dataset consists of 48 × 48-pixel grayscale face images, each assigned to one of seven emotion categories: anger, surprise, disgust, happiness, fear, sadness, and neutrality. The dataset contains a total of 26,217 images, the majority of which are labeled happy, sad, or neutral [19]. The FER+ annotations provide new labels assigned by ten crowd-sourced taggers, giving higher-quality ground truth for still-image emotions than the original FER labels. With these annotations, researchers can compute an emotion probability distribution for each face, which enables outputs based on statistical distributions or multiple labels [20].

B. Model Description
A multi-layer convolutional neural network (CNN) is used to analyze the distinctive features of the input image. The network consists of an input layer followed by several convolutional layers, pooling layers, ReLU activation layers, fully connected layers, and an output layer, arranged in a linear stack.

Facial expression identification with CNNs is a widely used approach in computer vision. The primary aim of the CNN model is to recognize the emotional state conveyed by a person's face [21]. For this purpose, the FER dataset, containing grayscale facial images labeled with one of seven emotions (happiness, disgust, anger, fear, sadness, surprise, and neutral), is commonly used [22]. The CNN for facial expression recognition combines convolutional, pooling, and fully connected layers.

The model takes a 48 × 48-pixel grayscale image as input, and its first layer is a convolutional layer that extracts features from that image. The number of filters in a convolutional layer can be adjusted to the complexity of the problem [23]. To
increase the model's translation invariance and reduce the size of the feature maps, a pooling layer is added after the convolutional layer [24], [25]. More convolutional and pooling layers are then added to extract more complex features.

The first layer of the model is the input layer, which has a fixed size. Before an image is fed into the model, a preprocessing step detects the face with OpenCV: Haar cascades, which rely on AdaBoost, quickly locate and resize the face, which is then converted to grayscale and resized to 48 × 48 pixels. This preprocessing considerably reduces the input size, from RGB format (3, 48, 48) to grayscale format (1, 48, 48), and makes it easy to pass the image to the input layer as a NumPy array.

Convolutional Layers
The Convolution2D layer contains a set of distinct kernels whose weights are randomly initialized; the number of kernels is one of its hyperparameters [10]. Each feature detector has a (3, 3) receptive field and slides across the input image to generate a corresponding feature map.

Fig. 1. System Design Flowchart

The convolution operation produces multiple feature maps by applying different filters that transform pixel values, for example to detect edges or to blur. Each filter is applied across the entire image, so the feature maps are generated as a set. In our network, each convolutional layer produces 256 feature maps [2].

A ReLU (Rectified Linear Unit) activation function is applied between the convolutional layers. The dimensionality of each feature map is reduced with MaxPooling, a popular pooling method. MaxPooling uses (2, 2) windows and keeps only the maximum pixel value inside each window. This retains important information while reducing dimensionality: after pooling, each feature map has half the width and half the height, i.e., a quarter of the original number of pixels.

In the deeper layers of the network, the convolutional and pooling layers extract meaningful features from the input image, and the dense layers use these features to classify the image into distinct categories. The dense layers consist of trainable weights that transform the extracted features. Training involves forward propagation of the training data and backward propagation of the errors to adjust the model. The proposed model has two fully connected layers in sequence. It is able to generalize to new images, and its parameters are adjusted iteratively until the error is minimized. To prevent overfitting, a dropout rate of 20% was applied, reducing the model's sensitivity to noise during training while retaining an appropriate level of model complexity [26].

The implemented model is a deep neural network built from convolutional layers, with a softmax activation function in the output layer. OpenCV is used for image preprocessing. The model was trained, experimenting with different hyperparameter combinations, using the NVIDIA CUDA® Deep Neural Network library (cuDNN) on an NVIDIA 840M graphics card with 4 GiB of DDR3 memory. The final architecture comprised 9 convolutional layers, with max pooling after every three convolutional layers, followed by 2 dense layers.

Experiments were also carried out with other deep CNN models, including VGG19, ResNet50, VGG11, and Inception v3, and with a fully connected convolutional neural network (FCN) combined with an attention mechanism. The datasets, containing images of both depressed patients and healthy participants, were split into training, testing, and validation sets in a 7:2:1 ratio [27].

Five deep CNN models were created, including the fully connected
convolutional neural network (FCN), VGG19, ResNet50, VGG11, and Inception v3. The FCN model was combined with a novel attention mechanism consisting of a convolution layer, an activation layer, a feature input layer, and a fully connected layer. The samples of depressed and healthy individuals were divided into training, test, and validation sets in a 7:2:1 ratio [28].

A CNN was then used to build a fully connected convolutional network incorporating the new attention mechanism; the model was named the Fully Connected Convolutional Layer (FCN) model. The FCN model consisted of 11 layers: (a) convolutional layer (7, 7, 64); (b) convolutional layer (3, 3, 64) × 2; (c) convolutional layer (3, 3, 64) × 2; (d) convolutional layer (3, 3, 128) × 2; (e) convolutional layer (3, 3, 128) × 2; (f) convolutional layer (3, 3, 256) × 2; (g) convolutional layer (3, 3, 256) × 2; (h) convolutional block attention module; (i) convolutional layer (3, 3, 512) × 2; followed by a final (512, 2) layer. For binary classification, a softmax function is used, with a batch size of 16 and a learning rate of 10e-3. The model was trained for 30 epochs and then tested; the validation set is not shown separately in the flowchart [29].

The application layer is the final layer of the system. The model was integrated into an Android application with a fluid interface. The application includes several screens, among them screens for capturing images of the user's facial expressions and for presenting the results. The application takes multiple user images as input, which pass through a series of mathematical operations to generate the intended output. The application was built with Flutter.

1- The system analyzes facial expressions from still pictures and videos given as input.
2- The system converts the input to grayscale.
3- In the pre-processing stage, the system's analysis process builds on four stages: a multi-label learning approach, cross-entropy loss, majority voting, and probabilistic label drawing.
4- Pre-processing performs face detection and locates key facial parts such as the eyes, mouth, and jaw line; the system then uses a neural network to extract facial-expression features and identify seven expressions: anger, disgust, fear, neutrality, happiness, surprise, and sadness [30].
5- Finally, the system calculates the expression rate and substitutes it into the specified formula to obtain the result.

The distance feature is calculated with the geometric (Euclidean) distance formula in equation (1). Distance features represent the shape structure of facial components such as the eyes, nose, and mouth.

$$ d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \tag{1} $$

The face is described by 68 landmarks. Let $d_i^{j}$ denote the distance feature between landmark $i$ and landmark $j$, computed with equation (1); the distances from landmark 1 to every other landmark are then given by equation (2).

$$ d_1^{2},\ d_1^{3},\ \ldots,\ d_1^{68} \tag{2} $$

To characterize a facial expression, the relative distances between all pairs of the 68 landmarks are calculated. If the face has $n$ landmark points, the total number of distance features is given by equation (3); for $n = 68$ this is $68 \times 67 / 2 = 2278$.

$$ 1 + 2 + \cdots + (n-1) = \frac{n(n-1)}{2} \tag{3} $$

The landmark distance vector can then be written as in equation (4):

$$ X = \left(d_1^{2}, \ldots, d_1^{n},\ d_2^{3}, \ldots, d_2^{n},\ \ldots,\ d_{n-1}^{n}\right) \tag{4} $$

The ROC (receiver operating characteristic) curve, also called the sensitivity curve, is a graphical representation that plots the false-alarm probability on the horizontal axis and the hit probability on the vertical axis. The curve is generated from the results obtained under different stimulus conditions and decision criteria.

To create the curve, the samples were sorted by the prediction score of the machine learning classifier, and the false positive rate (FPR) and true positive rate (TPR) were then computed as shown in equations (5) and (6):

$$ \mathrm{FPR} = \frac{FP}{FP + TN} \tag{5} $$

$$ \mathrm{TPR} = \frac{TP}{TP + FN} \tag{6} $$
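The ROC construction described above (sort the samples by classifier score, then lower the decision threshold step by step and record FPR and TPR) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: labels are assumed binary (1 = depressed, 0 = healthy), scores are any real-valued classifier output, and the helper names `roc_points` and `roc_auc` are hypothetical. It uses the standard definitions FPR = FP / (FP + TN) and TPR = TP / (TP + FN).

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return ROC points (FPR, TPR), lowering the decision threshold step by step.

    Hypothetical helper for illustration: labels are 1 for the positive class
    (e.g. depressed) and 0 for the negative class (e.g. healthy).
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    # Sort samples by classifier score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together, giving one ROC point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fn = pos - tp  # positives still below the threshold
        tn = neg - fp  # negatives still below the threshold
        points.append((fp / (fp + tn), tp / (tp + fn)))  # (FPR, TPR)
    return points


def roc_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Toy scores and labels, invented purely for illustration.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0, 1, 0, 0]
    curve = roc_points(scores, labels)
    print(curve)
    print(roc_auc(curve))
```

Grouping tied scores avoids artificial staircase steps between samples that share a score. In practice, `sklearn.metrics.roc_curve` and `sklearn.metrics.roc_auc_score` compute the same quantities and are the safer choice for real evaluations.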
Loss Function:
In machine learning, a loss function, also called a cost function, maps the outcome of a random event, or the values of its associated random variables, to a non-negative real number that represents the "risk" or "loss" associated with that event.

The classifier's performance is commonly assessed using accuracy, computed as the number of correctly classified samples divided by the total number of samples in each test dataset, i.e., the proportion of correct predictions relative to the entire sample size, as shown in equation (7).

$$ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7} $$

Precision assesses the classifier's performance in identifying true depression samples among the samples classified as depression. It is computed as the ratio of true depression samples correctly predicted by the classifier to the total number of samples classified as depression, as shown in equation (8):

$$ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{8} $$

Recall, also known as the recall rate, measures the proportion of positive samples that were correctly identified by the classifier [31]. It is calculated by dividing the number of positive samples correctly classified (TP) by the total number of actual positive samples, which includes both TP and FN cases [32], as shown in equation (9).

$$ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{9} $$

The F1 score is the harmonic mean of precision and recall, combining the two measures into a single score, as shown in equation (10).

$$ F_1 = \frac{2\,TP}{2\,TP + FP + FN} \tag{10} $$

Weight Calculation
Different expressions are assigned weights according to their recognition accuracy rates, so that landmarks with higher recognition rates carry greater weight. For the same expression, Fig. 2 (a) and (b) show the proportion of recognition rates for different landmarks, where m ∈ {Gabor, LBP, UG, MC}.

Fig. 2 (a) Proportion of landmark recognition rates
Fig. 2 (b) Proportion of landmark recognition rates

III. Results and Discussion
The proposed methodology comprises two major parts. The first part uses Convolutional Neural Networks (CNNs) to analyze the collected facial expression data in detail. In combination with a second neural network, feature extraction techniques are used to identify and categorize seven primary facial expressions: disgust, anger, fear, neutrality, surprise, happiness, and sadness [2].

In the second part of the methodology, a Fully Convolutional Network (FCN) is employed to segment video recording
data into multiple images. The data is split into training, testing, and validation sets in a 7:2:1 ratio. This segmentation puts the data into a format compatible with state-of-the-art models such as VGG19, VGG11, Inception v3, and ResNet50. After data preparation, the equations described above are applied to produce the predictions, and combining the two major steps yields the F1 score. F1 is a widely used metric that accounts for both precision and recall, and its value plays a pivotal role in determining whether an individual is likely to be suffering from depressive disorder.

Overall, the proposed methodology is a systematic approach to analyzing facial expressions and predicting potential depressive disorders.

Fig. 3 shows the expressions of multiple subjects from the FER2013 dataset.

Fig. 3. Expressions using multiple subjects

"Journal of Affective Disorders," "Journal of Medical Internet Research," and "PLOS ONE" are all prominent scientific journals in their respective fields. While they share a focus on publishing research, there are notable differences among them.

The Journal of Affective Disorders focuses primarily on research related to mood disorders, including depression, bipolar disorder, and anxiety, and explores the causes, diagnosis, treatment, and prevention of these conditions.

The Journal of Medical Internet Research specializes in eHealth and digital health. It examines the intersection of technology, healthcare, and medicine, including topics such as telemedicine, health informatics, and online interventions.

PLOS ONE is a multidisciplinary journal covering a wide range of scientific research. It aims to publish rigorous studies from diverse disciplines, including biology, medicine, physics, and the social sciences.

The results of studies published in these journals are compared in Table 1.

Table 1. Comparison of previous work
Study | Participants | Mental Health Condition | Technology Used | Results
Journal of Affective Disorders | 105 | Major Depressive Disorder (MDD) | Machine Learning | The algorithm distinguished participants with and without MDD with an accuracy of 80%.
Journal of Medical Internet Research | 27 children | Anxiety | Facial Expression Recognition technology | The algorithm detected anxiety in children with an accuracy of 81%.
PLOS ONE | 50 elderly residents in a care home | Mental Health | Facial Expression Recognition technology | Changes in facial expressions were significantly correlated with changes in mental health status.

With the proposed model, an accuracy of 82.5% was obtained on the available test datasets.

IV. Conclusion
The findings of these investigations underscore the promise of facial expression analysis as a valuable instrument for identifying and continuously evaluating mental health disorders, including Major Depressive Disorder (MDD),
anxiety disorders, and fluctuations in mental well-being. Structural Features," Frontiers in Neuroscience,
The amalgamation of a comprehensive methodology in the vol. 16, 2022.
proposed model endeavors to yield meaningful insights 2. B. Abdellaoui, A. Moumen, Y. E. B. El Idrissi,
and enrich the domain of mental health evaluation through and A. Remaida, "The emotional state through
the utilization of facial expression analysis in conjunction visual expression, auditory expression and
with sophisticated machine learning methodologies. physiological representation," SHS Web of
Notably, the attained expression rates demonstrate a Conferences, 3rd International Conference on
heightened level of accuracy and performance in Quantitative and Qualitative Methods for Social
comparison to preceding scholarly endeavors, boasting an Sciences (QQR’21), vol. 119,2021.
impressive accuracy rate of 82.5%. 3. Y. Y. Ghadi, A. A. Rafique, T. al Shloul, S. A.
Alsuhibany, A. Jalal, and J. Park, "Robust Object
IV. Future Scope
Facial expression detection systems could substantially change mental health care as the underlying technology matures, and their potential to improve mental health outcomes is considerable. Advancing real-time emotion recognition requires focused work in two areas. The first is fine-tuning the Convolutional Neural Network (CNN) architecture by carefully adjusting its hyperparameters, such as the learning rate, dropout rate, and stride size. The second is adapting the datasets so that they faithfully emulate real-time scenarios, including challenging conditions such as low lighting and noisy backgrounds. A key consideration is aligning the distribution of the training data with the characteristics of the subjects the system will encounter in real time, because this alignment is essential for the system's reliability. Further effort is also needed to make the system robust in uncontrolled settings, and better calibration of the CNN architecture may yield additional gains in performance.
For mental health practitioners, facial expression detection offers objective and unobtrusive tracking, support for diagnosis, ongoing monitoring of progress, and a basis for tailoring personalized treatment strategies.

Acknowledgements
The authors would like to express their deep gratitude to Graphic Era Hill University for its support, for extending its infrastructure and resources, and for its constant encouragement in carrying out this work.

References
1. X. Tan, J. Wu, X. Ma, S. Kang, et al., "Convolutional Neural Networks for Classification of T2DM Cognitive Impairment Based on Whole Brain [...]
[...] Categorization and Scene Classification over Remote Sensing Images via Features Fusion and Fully Convolutional Network," Remote Sensing, vol. 14, no. 4, 2022.
4. P. Bobade and M. Vani, "Stress Detection with Machine Learning and Deep Learning using Multimodal Physiological Data," in Proceedings of the 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, pp. 51-57, 2020, doi: 10.1109/ICIRCA48905.2020.9183244.
5. U. Kose, O. Deperlioglu, J. Alzubi, and B. Patrut, "Deep Learning for Medical Decision Support Systems," Springer Science and Business Media LLC, 2021.
6. T. Gorasiya, A. Gore, D. Ingale, and M. Trivedi, "Music Recommendation based on Facial Expression using Deep Learning," in Proceedings of the 2022 7th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 2022.
7. K. N. Devaiah and H. B. Anita, "Classification of Architectural Designs using Deep Learning," International Journal of Engineering and Advanced Technology (IJEAT), vol. 9, no. 3, 2020.
8. K. Sahib, A. Melouah, F. Touré, and A. Slim, "W-net and inception residual network for skin lesion segmentation and classification," Applied Intelligence, vol. 51, no. 9, pp. 1-19, Sep. 2021.
9. S. Gilda, H. Zafar, C. Soni, and K. Waghurdekar, "Smart music player integrating facial emotion recognition and music mood recommendation," in Proceedings of the 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai, India, pp. 154-158, 2017.
10. R. Nijhawan, N. Sule, M. Verma, B. Sharma, and I. Bansal, "Analysis of Coloboma Defected Eyes using Automated Pre-trained CNN Models," in Proceedings of the 2022 3rd International Conference on Computation, Automation and Knowledge Management (ICCAKM), Pune, India, pp. 1-5, 2022.
11. A. B. Desai, D. R. Gangodkar, B. Pant, and K. Pant, "Comparative Analysis using Transfer Learning Models VGG16, Resnet 50 and Xception to Predict Pneumonia," in Proceedings of the 2022 2nd International Conference on Innovative Sustainable Computational Technologies (CISCT), Dehradun, India, pp. 1-6, 2022, doi: 10.1109/CISCT55310.2022.10046507.
12. A. Demir, F. Yilmaz, and O. Kose, "Early detection of skin cancer using deep learning architectures: Resnet-101 and Inception-v3," in Proceedings of the 2019 Medical Technologies Congress (TIPTEKNO), Istanbul, Turkey, pp. 1-4, 2019.
13. N. Kausar, A. Hameed, M. Sattar, R. Ashraf, A. S. Imran, M. Z. ul Abidin, and A. Ali, "Multiclass Skin Cancer Classification Using Ensemble of Fine-Tuned Deep Learning Models," Applied Sciences, vol. 11, no. 17, 2021.
14. Y. T. Jo, S. W. Joo, S. H. Shon, H. Kim, Y. Kim, and J. Lee, "Diagnosing schizophrenia with network analysis and a machine learning method," International Journal of Methods in Psychiatric Research, vol. 29, no. 1, 2020.
15. J. Zhang, X. Yang, W. Li, S. Zhang, and Y. Jia, "Automatic detection of moisture damages in asphalt pavements from GPR data with deep CNN and IRS method," Automation in Construction, vol. 113, Art. no. 103119, 2020.
16. "Artificial Neural Networks and Machine Learning," ICANN 2016, Springer Nature, 2016.
17. S. Srinivasagopalan, J. Barry, V. Gurupur, and S. Thankachan, "A deep learning approach for diagnosing schizophrenic patients," Journal of Experimental & Theoretical Artificial Intelligence, vol. 31, no. 6, pp. 803-816, 2019.
18. V. Atliha, "Improving image captioning methods using machine learning approaches," M.S. thesis, Vilnius Gediminas Technical University, 2023.
19. M. Niu, Z. Zhao, J. Tao, Y. Li, and B. W. Schuller, "Selective Element and Two Orders Vectorization Networks for Automatic Depression Severity Diagnosis via Facial Changes," IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 11, pp. 8065-8077, Nov. 2022, doi: 10.1109/TCSVT.2022.3182658.
20. S. Kang and S. K. Kim, "Game Outlier Behavior Detection System Based on Dynamic," CMES-Computer Modeling in Engineering & Sciences, 2021.
21. R. C. Borges, "Audio-based coldstart in music recommendation systems," M.S. thesis, Universidade de Sao Paulo, Agencia USP de Gestao da Informacao Academica (AGUIA), 2022.
22. R. Szeliski, "Computer Vision: Algorithms and Applications," Springer Science and Business Media LLC, 2017.
23. Y. Yang, "A Recursive Least Squares Training Approach for Convolutional Neural Networks," M.S. thesis, Colorado State University, 2022.
24. A. E. Tate, R. C. McCabe, H. Larsson, S. Lundström, P. Lichtenstein, and R. Kuja-Halkola, "Predicting mental health problems in adolescence using machine learning techniques," PLoS One, vol. 15, no. 4, Art. no. e0230389, 2020.
25. J. Chung and J. Teo, "Mental Health Prediction Using Machine Learning: Taxonomy, Applications, and Challenges," 2022.
