Received 25 March 2024, accepted 9 April 2024, date of publication 12 April 2024, date of current version 22 April 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3387695

Novel Transformer Based Contextualized Embedding and Probabilistic Features for Depression Detection From Social Media

MUHAMMAD ASAD ABBAS 1, KASHIF MUNIR 1, ALI RAZA 2, NAGWAN ABDEL SAMEE 3, MONA M. JAMJOOM 4, AND ZAHID ULLAH 5

1 Institute of Information Technology, Khwaja Fareed University of Engineering & Information Technology, Rahim Yar Khan 64200, Pakistan
2 Department of Software Engineering, University of Lahore, Lahore 54000, Pakistan
3 Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P. O. Box 84428, Riyadh 11671, Saudi Arabia
4 Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
5 Department of Information System, King Abdulaziz University, Jeddah 21589, Saudi Arabia

Corresponding authors: Kashif Munir ([email protected]) and Nagwan Abdel Samee ([email protected])
This research was funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R104),
Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

ABSTRACT Depression constitutes a significant mental health condition, impacting an individual’s
emotional state, thought processes, and ability to carry out everyday tasks. Depression is defined by
ongoing feelings of sadness, diminished interest in previously enjoyed activities, alterations in hunger, sleep
disturbances, decreased vitality, and challenges with focus. The impact of depression extends beyond the
individual, affecting society at large through decreased productivity and higher healthcare costs. In the realm
of social media, users often express their thoughts and emotions through posts, which can provide insightful
data for identifying patterns of depression. This research aims to detect depression early by analyzing
social media user content with machine learning techniques. We have built advanced machine learning
models using a benchmark depression database containing 20,000 tagged tweets from user profiles identified
as depressed or non-depressed. We are introducing an innovative BERT-RF feature engineering method
that extracts Contextualized Embeddings and Probabilistic Features from textual input. The Bidirectional
Encoder Representations from Transformers (BERT) model, based on the Transformer architecture, is used
to extract Contextualized Embedding features. These features are then fed into a random forest model
to generate class probabilistic features. These prominent features aid in enhancing the identification of
depression from social media. In order to classify tweets using the features derived from the BERT-RF
features selection step, we have used five popular classifiers: Random Forest (RF), Multilayer Perceptron
(MLP), K-Neighbors Classifier (KNC), Logistic Regression (LR), and Long Short-Term Memory (LSTM).
Evaluation experiments show that our approach, using BERT-RF for feature engineering, enables the Logistic
Regression model to outperform state-of-the-art methods with a high accuracy score of 99%. We have
validated the results through k-fold cross-validation and statistical T-tests. We achieved 99% k-fold accuracy
during the validation of the proposed approach. This research contributes significantly to computational
linguistics and mental health analytics by providing a robust approach to the early detection of user
depression from social media content.

INDEX TERMS Depression detection, machine learning, deep learning, text mining, BERT, transformer.

The associate editor coordinating the review of this manuscript and approving it for publication was Ikramullah Lali.

© 2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

VOLUME 12, 2024, page 54087
M. A. Abbas et al.: Novel Transformer Based Contextualized Embedding and Probabilistic Features

I. INTRODUCTION
Depression is a complex mental health condition that affects an individual’s emotions, cognition, and daily functioning [1]. Characterized by more than transient sadness or a challenging period, it manifests as a persistent sense of melancholy, diminished interest or pleasure in previously enjoyed activities, and a variety of intense and enduring physical and emotional symptoms. These symptoms can differ among individuals, typically encompassing persistent sadness, a marked loss of interest or pleasure, and notable changes in diet or weight [2], among others. The etiology of depression is multifaceted, involving genetic, biological, environmental, and psychological factors. Additionally, trauma, stress, significant life alterations, certain medications, and pre-existing medical conditions can precipitate its development [3]. Effective management of depression often requires a comprehensive approach, including therapy, pharmacological treatment, lifestyle modifications, and support networks, which can significantly mitigate symptoms and improve quality of life [4].

Treating depression often involves a multifaceted approach tailored to the individual. Preventing depression necessitates a proactive stance towards mental health. Regular exercise, healthy nutrition, and adequate sleep form the cornerstone of maintaining a sound and resilient mind [5]. Depression is a mental illness that can cause significant disruptions across all facets of life. It extends beyond sadness, impairing daily functioning through diminished cognitive abilities and decision-making capacities. In severe cases, it can escalate the risk of suicidal thoughts or behaviors [6]. Addressing depression effectively requires a comprehensive strategy that includes professional support, therapy, medication, lifestyle adjustments, and motivational encouragement.

The prevalence of depression can vary by population and region, with data subject to change over time due to numerous factors, including shifts in diagnostic criteria, advances in knowledge, and societal changes [7]. Despite these variations, depression remains a significant global mental health challenge. Recent updates, as of 2022, indicate that the incidence of depression has been on the rise over the years. The World Health Organization (WHO) identifies depression as the leading cause of disability worldwide, impacting over 264 million individuals across various age groups as reported in 2020 [8]. The significance of early intervention cannot be overstated, highlighting the need to diminish stigma, enhance access to mental health services, and increase awareness about this issue. However, while these traditional approaches are cost-effective, there is an emerging need for advanced machine-learning approaches for the early detection of depression through social media analysis.

Social media platforms have presented an unparalleled chance to examine mental health disorders, such as depression, on a significant scale. Twitter has become a valuable resource for gaining insights into people’s thoughts and emotions. Twitter users frequently disclose their personal experiences, sentiments, and emotions, enabling the examination and detection of indications of depression in their public messages.

Advanced machine learning-based early detection of depression from social media is essential for studying depression in medicine [9]. Machine learning offers a variety of models that can be trained using accurate data to generate precise predictions. This work introduces a novel BERT-RF (Bidirectional Encoder Representations from Transformers-Random Forest) based depression detection model that improves efficiency during training and enhances the accuracy of predictions. Our research makes significant contributions in the following areas:
• We propose a novel BERT-RF feature engineering approach that extracts Contextualized Embeddings and Probabilistic Features from textual data. First, the Transformer architecture-based BERT extracts Contextualized Embedding features, which are input into a random forest model for generating class probabilistic features. These salient features help to improve depression detection from social media.
• We employed four advanced machine-learning models and a deep-learning model for results comparison. We improved the performance by optimizing the hyperparameters. To validate performance, we applied k-fold cross-validation and statistical T-test analysis.

The remainder of the manuscript is organized as follows. Section II reviews the existing literature and examines its limitations. Section III describes in depth our new approach to depression detection, which uses BERT-based contextualized embedding features extracted from social media data. Section IV compares the results obtained from the various machine-learning methods applied in our study. Finally, Section V details the outcomes of our research.

II. LITERATURE REVIEW
The global impact of depression, problems with early diagnosis, and widespread stigma in Arab culture require a new approach. Social media is increasingly studied as a source of mental health information and has recently attracted growing interest in depression research, especially in English-language studies.

This study [10] conducted Arab-centered research by analyzing Twitter data from the Gulf region to detect depression. It used supervised learning algorithms such as Random Forest and Naive Bayes to build predictive models based on online depression behaviors rather than symptoms. Notably, the model, specifically the Liblinear classifier, achieved 87.5% accuracy in detecting depression tweets, demonstrating the effectiveness of this approach in capturing messages related to mental illness from Arab users. This study shows the potential of digital media to promote early detection of depression and improve cultural awareness and depression intervention in Arabic-speaking communities.

This study [11] combines deep learning with traditional machine learning techniques to distinguish normal users from abnormal users on social media profiles. This study explored extensive literature to identify mental health indicators using different media and behavioral approaches. Integrating deep

TABLE 1. The summary analysis of reviewed literature studies.

learning makes it more useful, especially in distinguishing normal users from abnormal users. Importantly, this method achieves a lower error rate than traditional methods. This study achieved an accuracy rate of 89%, demonstrating the effectiveness of deep learning in the psychological analysis of social media data. This study highlights the important role of deep learning-based inference in improving psychological analysis in social media and suggests a significant increase in accuracy and dispersion.

This study [12] achieved an accuracy rate of 90%. The system has been validated by tests showing its superiority over existing systems. The proposed model showed more than a 30% reduction in errors, demonstrating its effectiveness in detecting depression in users. Various experiments and examples have confirmed the effectiveness of this model in analyzing the level of emphasis in user texts. Additionally, the model has been shown to perform well in real-world situations, confirming its effectiveness. This study highlights the importance of multi-modal collaboration in advancing depression research on social media platforms, reporting significant improvements in accuracy and robustness in capturing content expressed in user messages.

This study [13] notes that diagnostic criteria are based on patient-reported symptoms and have implications for patient management; therefore, alternative methods should be investigated. Social media platforms like Facebook, Twitter, Reddit, and Tumblr offer new ways to collect behavioral data that can reveal insights into a user’s emotional state. This research focused on creating a machine learning framework to examine linguistic trends in Twitter user data for detecting depression indicators. The performance of support vector machine and random forest algorithms was evaluated and contrasted, with the random forest algorithm demonstrating superior effectiveness. This study achieved an accuracy of 77%. Machine learning models, specifically random forests, have been shown to best detect depressive symptoms based on message patterns in Twitter data.

This study [14] analyzed users’ Twitter posts to determine the likelihood of depressive symptoms among online users. This study focused on using machine learning and language processing techniques to train on the data and evaluate the effectiveness of the approach. This study reports the numerical score from tweet sentiment and achieved 78% accuracy in detecting depression using the XGBoost classifier. Additionally, by combining features such as TF-IDF, n-grams, and LDA, 89% accuracy is achieved with the support vector machine classifier. Correct feature selection and combination are important factors contributing to improved performance. The findings highlight the importance of integrating emotion and speech in identifying depressive symptoms from Twitter data. This study demonstrated the effectiveness of machine learning techniques in identifying potential signs of depression in online users.

This study [15] used machine learning techniques that focus on detecting depression in Twitter users by extracting tweet features. A classification technique was used that aims to distinguish depressed users from other users by analyzing features extracted from tweets. A machine learning algorithm is used to classify the collected tweet data to detect whether the user is depressed. This study achieved an accuracy rate of 87.5%. This prediction is intended for early detection of depression or other mental health conditions.

This study [16] evaluated depression and suicidal thoughts according to depression level. Data collection includes a survey similar to the PHQ-9 that also collects demographic information, including current age, gender, and school attendance [20]. Based on the collected data, a classification algorithm was used to classify depression severity into five levels. The XGBoost classifier achieved an accuracy of 83.87% on this data, demonstrating the model’s effectiveness. Additionally, the information collected through tweets is classified to determine whether the user is depressed. The logistic regression classifier achieved the highest accuracy for detecting depression in tweets, 86.45%.

This study [17] aims to predict users’ psychological states by using deep learning models to classify depressed and non-depressed tweets. Leveraging text content and deep learning architectures, specifically CNNs and LSTMs, a hybrid model was created that achieved 94.28% accuracy on a Twitter depression dataset. Compared to RNN, CNN,


and the baseline method, the combined CNN and LSTM model achieves the best prediction performance across different parameters. Statistical and visual methods highlight the importance of distinguishing between depressed and non-depressed subjects. This study applied deep learning techniques to Twitter depression data to recognize language patterns and predict users’ emotional states. The findings highlight the effectiveness of the CNN and LSTM model in classifying depressed and non-depressed tweets and demonstrate its potential to improve predictive performance for mental health assessment.

This article [18] introduces a method based on self-reports in tweets and proposes a new multi-modal framework for predicting depression symptoms based on user data. This study uses the n-gram language model, LIWC dictionary, automatic image tagging, and bag-of-words and adopts relationship-based selection and nine categories to measure performance. The analysis showed that tweets and texts were 91% and 83% accurate in predicting depression symptoms, respectively, yielding positive results. This study suggests that efficiency can be improved by reducing the number of users using or participating in medical records. The data was collected from the social platform, focusing on self-expression and user data in tweets, to predict depression symptoms using various techniques and classifications. The findings highlight the effectiveness of using user-generated content, particularly tweets, to predict symptoms of depression and highlight the potential of multiple baselines in mental health assessment.

This study [19] begins with a comprehensive review of Bengali language literature on depression-related reporting and comments on social media. Machine learning methods, including support vector machines, decision trees, random forests, multinomial naive Bayes, K-nearest neighbors, and logistic regression, are used to predict depression in many ways. This study achieved an accuracy of 90.3% with TF-IDF features and a Random Forest model. Data collection focused on depression-related content in tweets and responses in Bangladesh to facilitate algorithmic prediction. Different machine learning algorithms have different predictive values; however, the accuracy of each algorithm applied to the dataset remains the same.

A. RESEARCH QUESTIONS AND GAPS
In order to demonstrate the uniqueness of our work, we present a summary in Table 1 of the most significant distinctions between the approaches applied in the current literature and the methodology that we have proposed, as well as the accuracy that has been achieved. During our literature analysis, we identified several gaps that are addressed in our proposed research approach. We propose a novel transformer-based feature engineering method, BERT-RF, which extracts contextualized embeddings and probabilistic features from textual data. This contrasts with the classical approaches predominantly used in current literature.
• Previously, researchers relied solely on the BERT model for feature representation and also deployed classical tree-based models. How can advanced feature engineering and models be built for depression detection?
• The performance scores of these methods were moderate when compared to the state-of-the-art. How can high-performance scores using applied models be achieved for depression detection?

III. PROPOSED METHODOLOGY
As shown in Figure 1, our proposed methodology begins with the user’s text message dataset. These raw data undergo a preprocessing step to enhance their suitability for analysis. Preprocessing includes methods such as tokenization, normalization, and stemming. Tokenization divides the text into individual units, normalization standardizes the representation, and stemming simplifies the text by reducing words to their base forms. Subsequently, the feature extraction step isolates significant information from the preprocessed data. This step involves selecting key features using the proposed novel BERT-RF model. Following feature extraction, the dataset is split into two parts: a training set and a test set, with 80% of the data allocated to training and 20% to testing. This division allows the model to learn patterns from the training set and assess its performance on the test set, which contains data not seen during the training phase. We applied several advanced machine learning models. The final step involves analyzing the prediction model to draw conclusions. The results typically highlight conditions under which the model predicts depression and those under which it predicts the absence of depression. This final analysis provides crucial information regarding the user’s overall mental health status, based on their posts.

A. PHASE 1: DEPRESSION TEXTUAL DATA
This study used a benchmark depression database [21] containing 20,000 tagged English tweets from user profiles of depressed and non-depressed users. The data includes important details about users such as post text, friends, followers, and social media activities. This information is stored in a data set and used as a reference for the subsequent experiments. The data set is processed according to its main features, retaining features that may indicate depressive tendencies and ignoring others, as represented by the content of the depressed or non-depressed text. By taking into account the user’s history of social media posts and statements, the aim is to increase the accuracy of the results and contribute to better results in the research of depression by psychologists.

B. PHASE 2: TEXT PREPROCESSING AND DATA ANALYSIS
Figure 2 shows the text preprocessing workflow; data preprocessing is a major part of data mining. The first method removes the user’s custom text format. The objective of this technique is to eliminate elements such as “usernames (@usernames),” “hashtags (hashtags),” “URLs,” “non-alphabetic characters, symbols, and digits,” “blank strings,” “rows containing NaN values,” and “Black-line,” among others. This approach ensures the


FIGURE 1. Proposed methodology workflow.

purification of each tweet in the collection by excluding all URLs present within the tweets. Subsequently, it focuses on discarding dates, times, numbers, and hashtags. The process then advances to eliminating emojis and redundant spaces within sentences. Following this, stop words are removed; these are words like “if,” “of,” and “else” that do not add significant meaning to sentences. The NLTK library provides a collection of stop words to filter out these non-contributory words from the text. Stemming is employed to reduce words to their base or root form. The ultimate aim is to transform each tweet into a sequence of digits by substituting each word with its respective value found in a dictionary index.

In deep learning, the word cloud is not used directly but has implications in the broader field of natural language processing (NLP) and text analysis, as shown in Figure 3. It works as a visual method rather than a deep learning method. A word cloud visually represents the frequency of words in text; frequently used words appear in larger letters. While not as complex as the deep learning models often used in sentiment analysis or translation tasks, word clouds provide a simple and intuitive way to explore and communicate the prominent terms in a text corpus.
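The cleaning steps described in this phase can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the small `STOP_WORDS` set, the `naive_stem` suffix stripper, and all function names are hypothetical stand-ins (the paper relies on the NLTK stop-word collection and does not name its stemmer), and the sample tweets are invented for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (assumption); the paper uses NLTK's list.
STOP_WORDS = {"if", "of", "else", "the", "a", "an", "and", "is", "to", "in", "i", "my"}


def clean_tweet(text):
    """Strip Twitter-specific markup and non-alphabetic characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)                # @usernames and #hashtags
    text = re.sub(r"[^a-z\s]", " ", text)               # digits, symbols, emojis
    return re.sub(r"\s+", " ", text).strip()            # redundant spaces


def naive_stem(word):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "s"):
        if (word.endswith(suffix) and not word.endswith("ss")
                and len(word) - len(suffix) >= 3):
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Clean a tweet, drop stop words, and stem the remaining tokens."""
    tokens = clean_tweet(text).split()
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(token_lists):
    """Assign each distinct token an integer id (0 is left free for padding)."""
    index = {}
    for tokens in token_lists:
        for token in tokens:
            index.setdefault(token, len(index) + 1)
    return index


# Invented sample data; the last entry mimics a blank row that must be dropped.
tweets = [
    "@friend I feel so tired and hopeless today... #sad http://t.co/xyz",
    "Loving the sunny weather!!! 😀 2024",
    "   ",
]
token_lists = [toks for toks in (preprocess(t) for t in tweets) if toks]
word_index = build_index(token_lists)
encoded = [[word_index[t] for t in toks] for toks in token_lists]
frequencies = Counter(t for toks in token_lists for t in toks)
```

Here `encoded` holds one list of dictionary indices per non-empty tweet, and `frequencies` gives the word counts that a word cloud such as the one in Figure 3 would visualize. URLs are removed before the non-alphabetic filter on purpose, since stripping punctuation first would leave URL fragments behind as ordinary words.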

FIGURE 2. The user post text data workflow.

FIGURE 3. Visualizing textual patterns: A word cloud analysis.

C. PHASE 3: NOVEL PROPOSED BERT-RF TEXTUAL FEATURE EXTRACTION
In this section, we describe our novel approach to salient feature engineering. The workflow for extracting features from textual data is illustrated in Figure 4. The use of


BERT-based contextual embeddings has become a popular method in researching social anxiety. Initially, we input the preprocessed textual data into the pre-trained BERT transformer model, which generates embeddings for each token in the text while preserving context information. Features extracted from these embeddings can be selected for specific tokens or aggregated across the entire sentence [22]. To enhance the model’s capabilities, we feed the extracted embedding features into a random forest model to generate probabilistic features. These probabilistic features are then utilized for training the models applied in detecting depression from social media. Careful interpretation and validation of predictive models are crucial, especially when addressing mental health issues.

Algorithm 1 outlines the sequential process for extracting novel features.

Algorithm 1 BERT-RF Algorithm
Input: Depression Post Textual Content.
Output: Novel Feature Set.
initiate;
1- BERT_ce ← E_BERT(D_t) // here BERT_ce are the Contextualized Embedding features and D_t is the input Textual Content.
2- RF_pf ← P_RF(BERT_ce) // here RF_pf are the Probabilistic features and BERT_ce is the input Contextualized Embedding feature set.
3- F_t ← RF_pf // here F_t is the Novel feature set.
end;

D. PHASE 4: FEATURES DATA SPLITTING
In this study, we used an 80:20 data split, with 80% of the dataset used to train the machine learning model and 20% to evaluate the model’s performance. To achieve this, the data is split using the train_test_split() function in the scikit-learn module. This separation method is chosen to reduce the risk of overfitting and improve the model’s overall performance [23]. This study also includes k-fold cross-validation to validate the results obtained from the data split. This data splitting helps ensure that the model’s performance is measured consistently across different parts of the data set, thus increasing the reliability of the results.

E. PHASE 5: APPLIED ARTIFICIAL INTELLIGENCE MODELS
In artificial intelligence, many well-known algorithms [24] have become the backbone of solving various problems, such as depression detection from social media. Together, these AI-based models create comprehensive tools that address numerous real-world situations and allow clinicians to derive valuable insights from the data.
• Random Forest Classifier (RF) is a classification and regression technique that uses bootstrapping and multiple decision trees to create forests [25]. RF represents the union of prediction trees; each tree depends on the values of a randomly selected vector. For new input data, each decision tree in the forest produces its own prediction, and these predictions are combined. The RF uses the feature vectors $\{V_1, V_2, \ldots, V_n\}$ as input to classify each post into categories such as depressed or not depressed. The prediction for a post $P_i$ can be represented as:

\[ \hat{y}_i = \mathrm{RFC}(V_i) \tag{1} \]

where the predicted classification for each post, denoted as $P_i$, is represented by $\hat{y}_i$. Here, a value of $\hat{y}_i = 1$ signifies that the post is classified as depressed, whereas $\hat{y}_i = 0$ denotes classification as not depressed.
The Random Forest Classifier decision function for a given post $P_i$ can be further detailed as:

\[ \hat{y}_i = \frac{1}{N} \sum_{j=1}^{N} DT_j(V_i) \tag{2} \]

where $N$ is the number of decision trees in the forest, and $DT_j$ is the prediction of the $j$-th decision tree. A majority vote among all trees typically determines the final classification.
• Multilayer Perceptron (MLP) is a feed-forward neural network model commonly used for classification tasks [26]. It consists of multiple layers of connected nodes, or neurons, and the model works in a feed-forward manner, with information passing from each layer to the next through the connections between layers. Let’s denote the input layer by $x \in \mathbb{R}^d$, where $d$ is the number of features extracted from social media data. The MLP consists of $L$ layers, each with its own set of weights $W^{(l)}$ and biases $b^{(l)}$, where $l \in \{1, 2, \ldots, L\}$ represents the layer index.
The output of each layer $l$ is calculated as:

\[ h^{(l)} = f\left(W^{(l)} h^{(l-1)} + b^{(l)}\right), \tag{3} \]

where $f$ represents a non-linear activation function, like the ReLU function, which is denoted by $f(z) = \max(0, z)$, and $h^{(0)}$ equals the input layer $x$.
The final output layer (assuming binary classification for depression detection, with 1 indicating depression and 0 indicating no depression) is given by:

\[ \hat{y} = \sigma\left(W^{(L)} h^{(L-1)} + b^{(L)}\right), \tag{4} \]

where $\sigma(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid activation function, and $\hat{y}$ is the predicted probability of depression.
• K-Neighbors Classifier (KNC) classifies new data according to the most common class among its nearest neighbors [27]. The distance between data points is calculated to determine proximity. Both K, the number of neighbors, and the distance measure are important. Given a set of $n$ social media posts $P = \{p_1, p_2, \ldots, p_n\}$, where each post $p_i$ is represented


FIGURE 4. The architecture analysis of the novel proposed BERT-RF textual feature extraction method.

by a feature vector $x_i \in \mathbb{R}^d$ extracted from the text, and a corresponding label $y_i \in \{0, 1\}$ indicating the absence (0) or presence (1) of depressive indicators. The K-Neighbors Classifier (KNC) method for depression detection can be described as follows:
1) For a given unlabelled post $p_u$ with feature vector $x_u$, compute the distance $D(x_u, x_i)$ between $x_u$ and each $x_i$ in the training set $P$. Common distance metrics include Euclidean distance:

\[ D(x_u, x_i) = \sqrt{\sum_{j=1}^{d} \left(x_u^{(j)} - x_i^{(j)}\right)^2} \]

2) Identify the $k$ nearest neighbors of $x_u$, denoted as $N_k(x_u)$, based on the smallest distances $D(x_u, x_i)$.
3) Determine the majority label among the $k$ nearest neighbors:

\[ \hat{y}_u = \arg\max_{y \in \{0,1\}} \sum_{x_i \in N_k(x_u)} I(y_i = y) \]

where $I$ is an indicator function that is 1 if $y_i = y$ and 0 otherwise.
The predicted label $\hat{y}_u$ indicates whether the post $p_u$ is likely to exhibit depressive indicators (1) or not (0), based on the content similarity to the labeled posts in the training set.
• Logistic Regression (LR) uses the sigmoid function to model the probability that inputs belong to a particular class [28]. This model involves a linear combination of input features transformed into a value between 0 and 1. The basic mathematical equation of the LR model is expressed as follows:

\[ P(Y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n)}} \tag{5} \]

where:
- $P(Y = 1)$ is the probability of an individual being detected as depressed ($Y = 1$) based on their social media activity.
- $e$ is the base of the natural logarithm.
- $\beta_0$ is the intercept term of the model.
- $\beta_1, \beta_2, \ldots, \beta_n$ are the coefficients of the predictor variables $X_1, X_2, \ldots, X_n$, which represent different features extracted from social media activity, such as the frequency of posts, sentiment analysis scores, or the use of specific words related to depression.
- $X_1, X_2, \ldots, X_n$ are the predictor variables (features) derived from social media data.
• Long Short-Term Memory (LSTM) networks [29] are a special type of recurrent neural network (RNN) designed to solve the vanishing gradient problem in traditional RNNs. LSTM is particularly useful for tasks that deal with sequential data, such as time-series forecasting and natural language processing. The key to their success is the integration of memory cells, which are equipped with gates (input, forget, and output) that control the network’s information flow. The input gate decides what data to store in memory, the forget gate decides what to discard, and the output gate calculates the next hidden state based on the input and the current state of memory. The basic equations governing an LSTM unit are as follows:
- Forget gate:

\[ f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{6} \]

- Input gate:

\[ i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{7} \]


- Cell state update:

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \quad (8)$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \quad (9)$$

- Output gate:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad (10)$$

- Hidden state update:

$$h_t = o_t * \tanh(C_t) \quad (11)$$

where:
- $f_t$ is the forget gate's activation vector,
- $i_t$ is the input gate's activation vector,
- $\tilde{C}_t$ is the cell input activation vector,
- $C_t$ is the cell state vector,
- $o_t$ is the output gate's activation vector,
- $h_t$ is the hidden state vector (also the output vector of the LSTM unit),
- $x_t$ is the input vector at time step $t$,
- $h_{t-1}$ is the hidden state vector at time step $t-1$,
- $C_{t-1}$ is the cell state vector at time step $t-1$,
- $W$ and $b$ are the weights and biases of their respective gates,
- $\sigma$ is the sigmoid function, and
- $*$ denotes element-wise multiplication.

F. PHASE 6: HYPERPARAMETER SETTING
Table 2 presents the hyperparameter fine-tuning conducted. We tuned the key hyperparameters of each applied model to enhance performance in predicting depression-related information [30]. Through iterative training and validation, we identified the optimal hyperparameters, which help improve efficiency and increase accuracy in depression analysis.

TABLE 2. Optimizing hyperparameters in used deep learning models.

IV. EXPERIMENTS AND OBSERVATIONS
This section presents and discusses the results of applying deep learning models to depression recognition. The study used a standard dataset covering different depression cases to test the model's effectiveness. The results and analyses show that deep learning models are effective in identifying depression.

A. EXPERIMENTAL SETUP
The deep learning and machine learning models were implemented in Python 3.6. The Pandas module is used to import and analyze the dataset. The evaluation is carried out on Google Colab, utilizing a configuration that includes a GPU backend, 13 GB of RAM, and 90 GB of storage. To assess the performance of the machine learning models, recall, accuracy, precision, and the F1 score are employed. These metrics are built from the following counts:
• TP = True Positives: The number of correctly identified depressed posts.
• TN = True Negatives: The number of correctly identified not-depressed posts.
• FP = False Positives: The number of not-depressed posts incorrectly identified as depressed.
• FN = False Negatives: The number of depressed posts incorrectly identified as not depressed.

1) RECALL
Recall, also known as sensitivity, measures the proportion of actual depressed posts that were correctly identified:

$$\text{Recall} = \frac{TP}{TP + FN} \quad (12)$$

2) ACCURACY
Accuracy is the ratio of correctly predicted observations to the total observations:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (13)$$

3) PRECISION
Precision, also known as positive predictive value, is the ratio of true positive predictions to the total positive predictions:

$$\text{Precision} = \frac{TP}{TP + FP} \quad (14)$$

4) F1
The F1 score, the harmonic mean of precision and recall, takes both into account and provides a more balanced view of the model's performance:

$$F1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \quad (15)$$

B. RESULTS WITH BERT EMBEDDING FEATURES
The performance results of machine learning models applying BERT features for depression detection are analyzed in this section. The performance metrics for each method are assessed using unseen test data. Performance results from applying BERT embedding features are presented in Table 3. The analysis reveals that the Logistic Regression (LR) classifier achieved an accuracy, precision, recall, and F1 score


of 0.56. The K-Neighbors classifier followed with a recall, accuracy, precision, and F1 score of 0.61. The Random Forest (RF) classifier achieved the highest performance with accuracy, precision, recall, and F1 score of 0.71. The Multi-Layer Perceptron (MLP) classifier recorded an accuracy of 0.51, a precision of 0.55, a recall of 0.51, and an F1 score of 0.42. This analysis concludes that while moderate performance scores are achieved using BERT embedding features, there is still a need for improvement.

TABLE 3. Results with BERT embedding features.

TABLE 4. Class-wise performance analysis with the BERT features.

In addition, we have determined the class-wise performance results of the models applied with BERT embedding features, as described in Table 4. The analysis reveals that the RF model achieved a 72% precision score for the depressed class. This analysis indicates that the results for each class are low.

C. RESULTS WITH NOVEL PROPOSED BERT-RF FEATURES
In this section, the performance results of the proposed BERT-RF features with deep learning models applied to depression detection are analyzed. The performance metrics of each method are evaluated using test data. Table 5 describes the performance of the applied methods on the test data with the BERT-RF features. Results show that the LR (Logistic Regression) classifier outperformed the others with a recall of 0.99, precision of 0.99, and an F1 score of 0.99. The RF (Random Forest) classifier has an accuracy of 0.98, recall of 0.99, precision of 0.99, and an F1 score of 0.99. The K-Neighbors classifier achieves an accuracy of 0.99, precision of 0.99, recall of 0.99, and an F1 score of 0.99. The multilayer perceptron (MLP) classifier has an accuracy of 0.99, recall of 0.98, precision of 0.98, and an F1 score of 0.98. This analysis shows the superiority of the proposed BERT-RF features in achieving high accuracy scores for depression recognition from social media.

TABLE 5. Results with novel proposed BERT-RF features.

Table 6 presents a class-wise performance analysis using the BERT-RF-based model. Our analysis aims to provide an overall assessment of the performance of the various models on the given data, with a particular emphasis on class-wise accuracy. The results indicate that the Multilayer Perceptron (MLP) approach achieved a score of 0.97 for the depression class, which is lower than that of the other models. This lower score shows that the MLP approach has more difficulty in accurately identifying the depressed class. In contrast, the other models perform well, with class-wise scores consistently above 0.98 in our analysis. Overall, this analysis indicates that all models achieved high performance scores using the novel proposed approach for depression recognition from social media.

TABLE 6. Class-wise performance analysis after the BERT.

The epoch-wise training performance of the applied LSTM model is illustrated in Figure 5. The analysis shows that the LSTM network had high error rates when training started; however, after the second epoch, the results improved. In addition, we have performed a radar chart-based performance mapping, as shown in Figure 6. This analysis demonstrates the superiority of the proposed LR model for detecting depression from social media.

D. RESULTS OF PROPOSED METHOD
In this section, we examine the performance metric results of our proposed logistic regression (LR) model. Table 7 presents an overview of the performance metrics per category. Our model achieves an accuracy, recall, precision, and F1 score of 0.99. This underscores the effectiveness of the LR method in accurately identifying various levels of depression, including depressed, non-depressed, and moderate cases. Performance evaluations


indicate that the LR model can differentiate between these categories of depression with high accuracy and confidence.

FIGURE 5. Impact of learning rate on convergence: Training loss over epoch.

FIGURE 6. Multivariate analysis of performance metrics using radar graphs: Deep learning models.

TABLE 7. Performance results of proposed LR method.

E. CONFUSION MATRIX RESULTS AND HISTOGRAM ANALYSIS
Figure 7 presents the outcomes of the confusion matrix analysis conducted to assess the performance of various machine learning models. The results indicate that the Multilayer Perceptron (MLP) model generated 65 incorrect predictions, the LSTM model produced 228 inaccurate predictions, and the K-Nearest Neighbors (KNN) algorithm accounted for 53 errors. In addition, the Logistic Regression (LR) model yielded 43 incorrect predictions, the fewest among these models. These findings validate the effectiveness of the error evaluation method when applied to the dataset used. Overall, the analysis suggests that the performance of these machine learning techniques is suboptimal.
The comparison between the BERT and the proposed BERT-RF approach, as illustrated in Figures 8 and 9, demonstrates that the BERT-RF model significantly outperforms other methods. This superiority is clearly illustrated by the higher accuracy metrics that the BERT-RF model achieves on the chart. The graphical representation, in particular, underscores the superior performance of the BERT-RF approach, further supported by consistently high scores across various performance metrics.

F. KFOLD CROSS-VALIDATION RESULTS
In this section, we employ k-fold validation to assess the performance of the newly developed BERT-based logistic regression model, with results given in Table 8. K-fold validation gives a more reliable evaluation of the model's predictive capability by mitigating the effects of variance across different data splits. We opted for a 10-fold validation approach to analyze the results. The model demonstrated a remarkable average k-fold accuracy of 0.99. Ultimately, the analysis revealed that the logistic regression (LR) model achieved a score of 0.99 with a minimal deviation of (+/−) 0.0131. These findings underscore the reliability and consistency of the proposed BERT-RF-based logistic regression model in delivering accurate and stable outcomes.

TABLE 8. The 10-fold-based performance validations Bert base LR model.

In addition, we also performed a k-fold cross-validation analysis of the other applied methods, as shown in Table 9. This analysis further demonstrates that, by utilizing the proposed BERT-RF features, all applied machine learning models generalize well in detecting depression from social media.

TABLE 9. Kfold results of all applied methods.

G. STATISTICAL T-TEST ANALYSIS
We have also conducted a statistical T-Test analysis to compare the proposed Logistic Regression (LR) approach against other methodologies, as illustrated in Table 10. This analysis reveals that our proposed approach significantly


outperforms the alternatives in terms of performance scores, leading to the rejection of the null hypothesis in each instance for the detection of depression.

FIGURE 7. The performance evaluation: Confusion matrix analysis.

FIGURE 8. Histogram results with only BERT features.

FIGURE 9. Histogram results with proposed BERT-RF features.

H. FEATURE SPACE ANALYSIS WITH PROPOSED BERT-RF
We also visualized the features using a 3D scatter plot. This analysis aims to explore and represent the data in three-dimensional space, as shown in Figure 10. Data points are plotted in 3D space, where closeness in the diagram indicates the similarity of the data points. This analysis shows that the features newly created with the BERT-RF approach are highly linearly separable, which helps us achieve high performance results for the applied machine learning models.
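As a rough illustration of what "linearly separable" means for a feature space like the one described above, the sketch below runs the classic perceptron rule on a handful of 3D points. The points, labels, and the function name `perceptron_separates` are hypothetical and are not taken from the study's dataset or code; this is a simple stand-in check, not the authors' visualization procedure. If the perceptron completes an epoch without a mistake, a separating plane exists; failing to do so within the epoch budget only suggests, and does not prove, that none exists.

```python
# Hypothetical 3D feature points (for illustration only, not the paper's data).
SEPARABLE_POINTS = [
    (0.9, 0.8, 0.9), (0.8, 0.9, 0.7), (1.0, 0.7, 0.8),  # class 1
    (0.1, 0.2, 0.1), (0.2, 0.1, 0.3), (0.0, 0.3, 0.2),  # class 0
]
SEPARABLE_LABELS = [1, 1, 1, 0, 0, 0]

# XOR-style layout: a standard example that no single plane can split.
XOR_POINTS = [(0, 0, 0), (1, 1, 0), (1, 0, 0), (0, 1, 0)]
XOR_LABELS = [0, 0, 1, 1]


def perceptron_separates(points, labels, max_epochs=100):
    """Return True if the perceptron rule finds a plane splitting the two classes."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for point, label in zip(points, labels):
            target = 1 if label == 1 else -1  # map {0, 1} labels to {-1, +1}
            activation = sum(w * v for w, v in zip(weights, point)) + bias
            if target * activation <= 0:  # point is on the wrong side of (or on) the plane
                weights = [w + target * v for w, v in zip(weights, point)]
                bias += target
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: the plane separates the classes
            return True
    return False  # no separating plane found within the epoch budget


if __name__ == "__main__":
    print(perceptron_separates(SEPARABLE_POINTS, SEPARABLE_LABELS))
    print(perceptron_separates(XOR_POINTS, XOR_LABELS))
```

On the two toy sets, the first call finds a separating plane and the second does not. Well-separated clusters of this kind are the situation in which a linear model such as LR can be expected to score highly.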


TABLE 10. Results of statistical T-Test analysis.

FIGURE 10. Feature reduction for improved insight: PCA in text.

I. STATE OF THE ART RESULTS COMPARISON
This section compares our approach with the state of the art. We compare our scheme with previous studies examining depression data, as shown in Table 11. The comparison shows that our proposed LR method stands out for depression detection, with an accuracy of 0.99. It helps bridge the performance gap observed in previous studies. By achieving high accuracy, our method not only proves its effectiveness in this study but also contributes to the advancement of depression research and addresses limitations identified in previous studies.

TABLE 11. State of the art results comparisons of the proposed approach.

J. DISCUSSIONS AND LIMITATIONS
Our research on detecting depression from social data using BERT-based contextualized embeddings and advanced probabilistic features provides a new way to understand negative emotions. Although our method significantly improves accuracy to 0.99, some limitations are worth noting.
• First, the generalizability of our model across many social networks, demographic groups, and cultural contexts remains an area for investigation. Examining these different settings and the model's long-term performance will increase the validity of our approach.
• Additionally, considering the use of BERT-based embeddings, the interpretability of the prediction model needs to be further evaluated for clarity and reliability.
• Ethical considerations regarding mental health profiling and responsible use of information on social media should be addressed, including issues of user consent and privacy.
• Additionally, our model is based on specific language patterns that express melancholy. This is a possible limitation, because individuals may express melancholy differently in less formal settings.

V. CONCLUSION AND FUTURE DIRECTIONS
This article presents advanced machine learning models for depression analysis, including Random Forest Classifier (RFC), Multilayer Perceptron (MLP), K-Nearest Neighbors Classifier (KNC), and Logistic Regression (LR), with a focus on BERT-based deep learning techniques. The results demonstrate that the proposed scheme is highly effective, achieving an accuracy of 99% and outperforming existing models in detecting depression. This study aims to contribute to the field by addressing the limitations of previous research and showcasing the effectiveness of deep learning in identifying patterns of depression. These findings are promising for both the research and treatment of depression. The accuracy achieved in this study underscores the potential of novel machine learning approaches in mental health. Overall, this study underscores the critical role of deep learning models in accurately identifying and understanding depression, offering insights into future developments in this field.

A. FUTURE WORK
In future work, we will build a web-based API framework that detects depression from user posts in real time on social media platforms.

ACKNOWLEDGMENT
The authors would like to express their gratitude to Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R104), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

REFERENCES
[1] D. William and D. Suhartono, "Text-based depression detection on social media posts: A systematic literature review," Proc. Comput. Sci., vol. 179, pp. 582–589, Jan. 2021.
[2] J. S. L. Figuerêdo, A. L. L. M. Maia, and R. T. Calumby, "Early depression detection in social media based on deep learning and underlying emotions," Online Social Netw. Media, vol. 31, Sep. 2022, Art. no. 100225.


[3] H. Zogan, I. Razzak, S. Jameel, and G. Xu, "DepressionNet: A novel summarization boosted deep framework for depression detection on social media," 2021, arXiv:2105.10878.
[4] S. S and J. S. Raj, "Analysis of deep learning techniques for early detection of depression on social media Network–A comparative study," J. Trends Comput. Sci. Smart Technol., vol. 3, no. 1, pp. 24–39, May 2021.
[5] J. D. J. Titla-Tlatelpa, R. M. Ortega-Mendoza, M. Montes-y-Gómez, and L. Villaseñor-Pineda, "A profile-based sentiment-aware approach for depression detection in social media," EPJ Data Sci., vol. 10, no. 1, p. 54, Dec. 2021.
[6] V. Adarsh, P. A. Kumar, V. Lavanya, and G. R. Gangadharan, "Fair and explainable depression detection in social media," Inf. Process. Manage., vol. 60, no. 1, Jan. 2023, Art. no. 103168.
[7] E. A. Ríssola, S. A. Bahrainian, and F. Crestani, "A dataset for research on depression in social media," in Proc. 28th ACM Conf. User Modeling, Adaptation Personalization, Jul. 2020, pp. 338–342.
[8] H. Senra and S. McPherson, "Depression in disabling medical conditions – Current perspectives," Int. Rev. Psychiatry, vol. 33, no. 3, pp. 312–325, Apr. 2021.
[9] S. C. Guntuku, D. B. Yaden, M. L. Kern, L. H. Ungar, and J. C. Eichstaedt, "Detecting depression and mental illness on social media: An integrative review," Current Opinion Behav. Sci., vol. 18, pp. 43–49, Dec. 2017.
[10] S. Almouzini, M. Khemakhem, and A. Alageel, "Detecting Arabic depressed users from Twitter data," Proc. Comput. Sci., vol. 163, pp. 257–265, Jan. 2019.
[11] J. Deepali, J. Makhija, Y. Nabar, and N. Nehet, "Mental health analysis using deep learning for feature extraction," Tech. Rep., 2018.
[12] T. Gui, "Cooperative multimodal approach to depression detection in Twitter," in Proc. AAAI Conf. Artif. Intell., vol. 33, no. 1, Jul. 2019, pp. 110–117.
[13] F. Azam, M. Agro, M. Sami, M. H. Abro, and A. Dewani, "Identifying depression among Twitter users using sentiment analysis," in Proc. Int. Conf. Artif. Intell. (ICAI), Apr. 2021, pp. 44–49.
[14] P. Kumar, P. Samanta, S. Dutta, M. Chatterjee, and D. Sarkar, "Feature based depression detection from Twitter data using machine learning techniques," J. Sci. Res., vol. 66, no. 2, pp. 220–228, 2022.
[15] S. Pachouly, G. Raut, K. Bute, R. Tambe, and S. Bhavsar, "Depression detection on social media network (Twitter) using sentiment analysis," Int. Res. J. Eng. Technol., vol. 8, pp. 1834–1839, Jan. 2021.
[16] S. Jain, S. P. Narayan, R. K. Dewang, U. Bhartiya, N. Meena, and V. Kumar, "A machine learning based depression analysis and suicidal ideation detection system using questionnaires and Twitter," in Proc. IEEE Students Conf. Eng. Syst. (SCES), May 2019, pp. 1–6.
[17] H. Kour and M. K. Gupta, "An hybrid deep learning approach for depression prediction from user tweets using feature-rich CNN and bi-directional LSTM," Multimedia Tools Appl., vol. 81, no. 17, pp. 23649–23685, Jul. 2022.
[18] R. Safa, P. Bayat, and L. Moghtader, "Automatic detection of depression symptoms in Twitter using multimodal analysis," J. Supercomput., vol. 78, no. 4, pp. 4709–4744, Mar. 2022.
[19] D. B. Victor, J. Kawsher, M. S. Labib, and S. Latif, "Machine learning techniques for depression analysis on social media-case study on Bengali community," in Proc. 4th Int. Conf. Electron., Commun. Aerosp. Technol. (ICECA), Nov. 2020, pp. 1118–1126.
[20] S. R. Kamite and V. B. Kamble, "Detection of depression in social media via Twitter using machine learning approach," in Proc. Int. Conf. Smart Innov. Design, Environ., Manage., Planning Comput. (ICSIDEMPC), Oct. 2020, pp. 122–125.
[21] INFAMOUSCODER. Depression: Twitter Dataset + Feature Extraction. Accessed: Nov. 2, 2024. [Online]. Available: https://www.kaggle.com/datasets/infamouscoder/mental-health-social-media
[22] M. I. Mobin, M. F. Mridha, and S. H. Mahmud, "Exploratory analysis of suicidal tendency in depression investigation social media post," Tech. Rep., 2024.
[23] A. Raza, K. Munir, and M. Almutairi, "A novel deep learning approach for deepfake image detection," Appl. Sci., vol. 12, no. 19, p. 9820, Sep. 2022.
[24] A. Raza, I. Akhtar, L. Abualigah, R. A. Zitar, M. Sharaf, M. S. Daoud, and H. Jia, "Preventing road accidents through early detection of driver behavior using smartphone motion sensor data: An ensemble feature engineering approach," IEEE Access, vol. 11, pp. 138457–138471, 2023.
[25] A. Raza, A. M. Qadri, I. Akhtar, N. A. Samee, and M. Alabdulhafith, "LogRF: An approach to human pose estimation using skeleton landmarks for physiotherapy fitness exercise correction," IEEE Access, vol. 11, pp. 107930–107939, 2023.
[26] A. Raza, F. Rustam, H. U. R. Siddiqui, I. D. L. T. Diez, B. Garcia-Zapirain, E. Lee, and I. Ashraf, "Predicting genetic disorder and types of disorder using chain classifier approach," Genes, vol. 14, no. 1, p. 71, Dec. 2022.
[27] A. Raza, K. Munir, M. S. Almutairi, and R. Sehar, "Novel class probability features for optimizing network attack detection with machine learning," IEEE Access, vol. 11, pp. 98685–98694, 2023.
[28] A. Raza, F. Rustam, H. U. R. Siddiqui, I. D. L. T. Diez, and I. Ashraf, "Predicting microbe organisms using data of living micro forms of life and hybrid microbes classifier," PLoS ONE, vol. 18, no. 4, Apr. 2023, Art. no. e0284522.
[29] A. Raza, K. Munir, M. S. Almutairi, and R. Sehar, "Novel transfer learning based deep features for diagnosis of down syndrome in children using facial images," IEEE Access, vol. 12, pp. 16386–16396, 2024.
[30] A. Alsaeedi and M. Z. Khan, "A study on sentiment analysis techniques of Twitter data," Int. J. Adv. Comput. Sci. Appl., vol. 10, no. 2, pp. 361–374, 2019.
[31] M. Rizwan, M. F. Mushtaq, U. Akram, A. Mehmood, I. Ashraf, and B. Sahelices, "Depression classification from tweets using small deep transfer learning language models," IEEE Access, vol. 10, pp. 129176–129189, 2022.
[32] A. A. Baale, O. R. Olasunkanmi, F. E. Adelodun, and A. A. Adigun, "Opinion analysis and machine learning modeling for depression detection," Tech. Rep.

MUHAMMAD ASAD ABBAS received the B.S. degree in information technology from the Department of Information Technology, Bahauddin Zakariya University, Sub Campus Lodhran, Pakistan, in 2021. He is currently pursuing the M.S. degree in information technology with the Khwaja Fareed University of Engineering & Information Technology, Rahim Yar Khan, Pakistan. His current research interests include data science, artificial intelligence, data mining, natural language processing, and machine learning.

KASHIF MUNIR received the B.Sc. degree in mathematics and physics from Islamia University Bahawalpur, Pakistan, in 1999, the M.Sc. degree in information technology from University Sains Malaysia, in 2001, the M.S. degree in software engineering from the University of Malaya, Malaysia, in 2005, and the Ph.D. degree in informatics from Malaysia University of Science and Technology, in 2015. Engaged in higher education since 2002, he taught initially with the Binary College, Malaysia, for a semester, followed by approximately four years with the Stamford College, Malaysia. Later, he moved to Saudi Arabia and worked with the King Fahd University of Petroleum and Minerals, from September 2006 to December 2014. In January 2015, he transitioned to the University of Hafr Al-Batin, Saudi Arabia, and in July 2021, he joined the IT Department, Khwaja Fareed University of Engineering & Information Technology, Rahim Yar Khan, as an Assistant Professor. With a substantial publication record, including journal articles, conference papers, books, and book chapters, he has served on technical program committees for numerous peer-reviewed conferences and journals, contributing to the review of numerous research papers. His research interests include cloud computing security, software engineering, and project management.


ALI RAZA received the Bachelor of Science and M.S. degrees in computer science from the Department of Computer Science, Khwaja Fareed University of Engineering & Information Technology (KFUEIT), Rahim Yar Khan, Pakistan, in 2021 and 2023, respectively. He is currently a Lecturer with the Faculty of Information Technology, Department of Software Engineering, University of Lahore, Pakistan. He has published several articles in reputed journals. His current research interests include data science, artificial intelligence, data mining, natural language processing, machine learning, deep learning, and image processing.

NAGWAN ABDEL SAMEE received the B.S. degree in computer engineering from Ein Shams University, Egypt, in 2000, and the M.S. degree in computer engineering and the Ph.D. degree in systems and biomedical engineering from Cairo University, Egypt, in 2008 and 2012, respectively. Since 2013, she has been an Assistant Professor with the Information Technology Department, CCIS, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. Her research interests include data science, machine learning, bioinformatics, and parallel computing. Her awards and honors include the Takafull Prize (Innovation Project Track), Princess Nourah Award in innovation, Mastery Award in predictive analytics (IBM), Mastery Award in Big Data (IBM), and Mastery Award in Cloud Computing (IBM).

MONA M. JAMJOOM received the Ph.D. degree in computer science from King Saud University. She is currently an Associate Professor with the Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. Her research interests include artificial intelligence, machine learning, deep learning, medical imaging, and data science. She has published several research articles in her field.

ZAHID ULLAH received the Ph.D. degree from the University of Kuala Lumpur, Malaysia. He is currently an experienced Educator and a Researcher in computer science and information systems. He is also an Assistant Professor with King Abdulaziz University, Jeddah, Saudi Arabia. His research interests include machine learning, deep learning, medical imaging, and data science. He has published various articles in his field of specialization.

247 pages
Explore FreedomGPT - The AI Tool That Respects Your Privacy
No ratings yet
Explore FreedomGPT - The AI Tool That Respects Your Privacy
4 pages
0450 Business Unit 2.4 Communication
No ratings yet
0450 Business Unit 2.4 Communication
4 pages
Pidato Aisyah
No ratings yet
Pidato Aisyah
2 pages
Analyzing Customer Sentiment and Brand Perception A Comparative Study of Nike
No ratings yet
Analyzing Customer Sentiment and Brand Perception A Comparative Study of Nike
6 pages
Evaluating Relevance and Truthfulness
100% (1)
Evaluating Relevance and Truthfulness
12 pages
Post-Truth Era's Impact on Politics
No ratings yet
Post-Truth Era's Impact on Politics
11 pages
Thesis Topic For Hospitality Management
100% (3)
Thesis Topic For Hospitality Management
8 pages
Peach Mango Pie Business Plan
No ratings yet
Peach Mango Pie Business Plan
35 pages
Social Media Character Count Cheat Sheet: 71-100 CHARACTERS 40 Characters
No ratings yet
Social Media Character Count Cheat Sheet: 71-100 CHARACTERS 40 Characters
2 pages
Social Media in ESL Teaching
No ratings yet
Social Media in ESL Teaching
3 pages
Café Coffee Day's Digital Strategy
No ratings yet
Café Coffee Day's Digital Strategy
9 pages
MTA Symposium Insights on Technical Analysis
100% (2)
MTA Symposium Insights on Technical Analysis
12 pages
Digital Marketing Agency and Corporate Social Media Banner or Instagram Post Template - Premium PSD
No ratings yet
Digital Marketing Agency and Corporate Social Media Banner or Instagram Post Template - Premium PSD
3 pages
Empowerment Technologies: Web 2.0, Web 3.0, Convergent Technologies, Social, Mobile and Assistive Media
No ratings yet
Empowerment Technologies: Web 2.0, Web 3.0, Convergent Technologies, Social, Mobile and Assistive Media
28 pages
TO Kls 11 SMT 1
No ratings yet
TO Kls 11 SMT 1
9 pages
Reading Comprehension
No ratings yet
Reading Comprehension
3 pages
Recruiter Job Description for Syrah Marie
No ratings yet
Recruiter Job Description for Syrah Marie
1 page
Aff Yfs 3732
No ratings yet
Aff Yfs 3732
3 pages
Syahmi's Barbershop Marketing Plan
100% (1)
Syahmi's Barbershop Marketing Plan
11 pages
BMI - Business Model Canvas
No ratings yet
BMI - Business Model Canvas
1 page
Customer Service For Food and Beverages Industry
No ratings yet
Customer Service For Food and Beverages Industry
27 pages
Gordon C. Bruner II - Marketing Scales Handbook, Volume 10 - Multi-Item Measures For Consumer Insight Research-GCBII Productions (2019)
No ratings yet
Gordon C. Bruner II - Marketing Scales Handbook, Volume 10 - Multi-Item Measures For Consumer Insight Research-GCBII Productions (2019)
552 pages
Strategic Public Relations
No ratings yet
Strategic Public Relations
4 pages
Impact of Digital Marketing On Small Scale Business
100% (1)
Impact of Digital Marketing On Small Scale Business
55 pages
The Effectiveness of Social Media As Marketing Tool For Business
83% (6)
The Effectiveness of Social Media As Marketing Tool For Business
20 pages
Detailed Lesson Plan DLP Format DATE Lea
No ratings yet
Detailed Lesson Plan DLP Format DATE Lea
5 pages
Magundayao - That's A Swap: Why Cavitenos Are Restoring The Barter Trade System Amidst Pandemic
100% (1)
Magundayao - That's A Swap: Why Cavitenos Are Restoring The Barter Trade System Amidst Pandemic
10 pages

Novel Transformer Based Contextualized Embedding and Probabilistic Features For Depression Detection From Social Media

Received 25 March 2024, accepted 9 April 2024, date of publication 12 April 2024, date of current version 22 April 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3387695

MONA M. JAMJOOM 4 , AND ZAHID ULLAH 5

Riyadh 11671, Saudi Arabia

ABSTRACT Depression constitutes a significant mental health condition, impacting an individual’s

TABLE 1. The summary analysis of reviewed literature studies.

VOLUME 12, 2024 54089

FIGURE 1. Proposed methodology workflow.

FIGURE 3. Visualizing textual patterns: A word cloud analysis.

C. PHASE 3: NOVEL PROPOSED BERT-RF TEXTUAL

by a feature vector $x_i \in \mathbb{R}^d$ extracted from the text, where:

B. RESULTS WITH BERT EMBEDDING FEATURES

TABLE 6. Class-wise performance analysis after the BERT.

In addition, we have determined the class-wise perfor-

by the higher accuracy metrics using the BERT-RF model

F. K-FOLD CROSS-VALIDATION RESULTS

TABLE 8. 10-fold cross-validation performance of the BERT-based LR model.

FIGURE 6. Multivariate analysis of performance metrics using radar

indicate that the LR model can differentiate between these

TABLE 7. Performance results of proposed LR method.

In addition, we also performed a k-fold cross-validation

FIGURE 7. The performance evaluation: Confusion matrix analysis.

NAGWAN ABDEL SAMEE received the
