Exploring the Potential of Large Language Models for Assisting with Mental Health Diagnostic Assessments

The Depression and Anxiety Case

KAUSHIK ROY, Artificial Intelligence Institute University of South Carolina, USA
HARSHUL SURANA, Indian Institute of Research and Science, Bhopal, India
DARSSAN ESWARAMOORTHI, Artificial Intelligence Institute University of South Carolina, USA
YUXIN ZI, Artificial Intelligence Institute University of South Carolina, USA
VEDANT PALIT, Indian Institute of Technology, Kharagpur, India
RITVIK GARIMELLA, Artificial Intelligence Institute University of South Carolina, USA
AMIT SHETH, Artificial Intelligence Institute University of South Carolina, USA

arXiv:2501.01305v1 [cs.CL] 2 Jan 2025

ABSTRACT
Large language models (LLMs) are increasingly attracting the attention of healthcare professionals for their potential to
assist in diagnostic assessments, which could alleviate the strain on the healthcare system caused by a high patient
load and a shortage of providers. For LLMs to be effective in supporting diagnostic assessments, it is essential that
they closely replicate the standard diagnostic procedures used by clinicians. In this paper, we specifically examine the
diagnostic assessment processes described in the Patient Health Questionnaire-9 (PHQ-9) for major depressive disorder
(MDD) and the Generalized Anxiety Disorder-7 (GAD-7) questionnaire for generalized anxiety disorder (GAD). We
investigate various prompting and fine-tuning techniques to guide both proprietary and open-source LLMs in adhering
to these processes, and we evaluate the agreement between LLM-generated diagnostic outcomes and expert-validated
ground truth. For fine-tuning, we utilize the MentaLLaMA and Llama models, while for prompting, we experiment with
proprietary models like GPT-3.5 and GPT-4o, as well as open-source models such as llama-3.1-8b and mixtral-8x7b.

Software Availability. We make all software artifacts available at this GitHub link¹.

Institutional Review Board (IRB). This study does not require approval from the Institutional Review Board (IRB).
It uses clinician-annotated social media posts, authorized for research purposes. The primary objective is
to evaluate the effectiveness of LLMs that incorporate diagnostic criteria for major depressive disorder and generalized
anxiety disorder in assisting with mental health assessments.

1 https://github.com/kauroy1994/Large-Language-Models-for-Assisting-with-Mental-Health-Diagnostic-Assessments

Authors’ addresses: Kaushik Roy, [email protected], Artificial Intelligence Institute University of South Carolina, USA;
Harshul Surana, harshul19@iiserb.ac.in, Indian Institute of Research and Science, Bhopal, India; Darssan Eswaramoorthi,
[email protected], Artificial Intelligence Institute University of South Carolina, USA; Yuxin Zi, [email protected],
Artificial Intelligence Institute University of South Carolina, USA; Vedant Palit, ledarssan@gmail.com, Indian Institute
of Technology, Kharagpur, India; Ritvik Garimella, [email protected], Artificial Intelligence Institute University of South
Carolina, USA; Amit Sheth, [email protected], Artificial Intelligence Institute University of South Carolina, USA.

1 INTRODUCTION
LLMs are large neural networks (roughly 7 billion or more weights and biases) that encode complex language patterns
learned by training on massive language-based datasets [1]. Their remarkable success in a wide array of natural
language processing tasks has led to the proliferation of LLM-based tools and applications across various industries
[2]. In healthcare, particularly in contexts involving natural language conversations such as interactions between
patients and clinicians, LLMs have piqued the interest of stakeholders as a potentially valuable tool for
alleviating some of the burden on clinicians and the overall healthcare system [3]. During patient-clinician
interactions, clinicians employ standard diagnostic assessment processes for capturing a patient’s state, such as the
PHQ-9 for depression assessment and the GAD-7 for anxiety assessment [4, 5]. Figure 1 shows the PHQ-9 and GAD-7
questionnaires.

Fig. 1. Mental Health Diagnostic Assessment Questionnaires. The Patient Health Questionnaire (PHQ)-9 for depression
assessment and the Generalized Anxiety Disorder (GAD)-7 for anxiety assessment.

To gainfully leverage LLMs for diagnostic assistance, it is necessary to provide mechanisms for guiding
LLMs in closely following standardized clinical assessment procedures. There are two categories of methods available
to enable this behavior:
(i) Prompting LLMs - Modern LLMs stand out for their capacity to tailor responses based on user instructions
or prompts [6]. However, LLMs are highly sensitive to the specific prompts used [7]. Prompting techniques have
continuously evolved to enhance the robustness of LLM responses, for example, by using Chain-of-Thought (CoT)
prompting [8]. Prompting methods are broadly classified under three categories: (a) Naive prompting - providing direct
instructions to the LLM in a prompt, (b) Exemplar-based prompting - providing direct instructions along with a few
examples of the expected output, and (c) Guidance-based prompting - exemplar-based prompting along with
specific guidance on reasoning steps (for example, by prompting the LLM to “think” step-by-step).
(ii) Fine-tuning LLMs - Fine-tuning adapts the model’s behavior to closely align with the diagnostic
procedures that clinicians follow, using fine-tuning algorithms such as supervised fine-tuning (SFT), reinforcement
learning from human feedback (RLHF), and direct preference optimization (DPO) [9]. Fine-tuning LLMs is relatively
more complex than prompting due to the need to curate high-quality data and appropriately formulate task-specific
prompts or instructions during the fine-tuning process.
In this work, we explore both approaches using a variety of proprietary and open-source models, namely the
MentaLLaMA and Llama models for fine-tuning, and GPT-3.5, GPT-4o, llama-3.1-8b, and mixtral-8x7b for
prompting [7, 10–12].
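
The three prompting categories described above can be illustrated with a minimal sketch. The function name `build_prompt`, the instruction wording, and the example posts are hypothetical and are not the prompts used in this work (those are given in Appendix A.2); the sketch only shows how exemplars and a reasoning instruction are layered on top of a direct instruction.

```python
from typing import Sequence, Tuple


def build_prompt(post: str, symptom: str,
                 exemplars: Sequence[Tuple[str, str]] = (),
                 guidance: bool = False) -> str:
    """Assemble a naive, exemplar-based, or guidance-based prompt (illustrative only)."""
    # Naive prompting: a direct instruction and nothing else.
    parts = [f"Identify the text spans in the post that indicate the symptom: {symptom}."]
    # Exemplar-based prompting: add a few (post, expected output) pairs.
    for example_post, example_output in exemplars:
        parts.append(f"Example post: {example_post}\nExpected output: {example_output}")
    # Guidance-based prompting: add explicit guidance on the reasoning steps.
    if guidance:
        parts.append("Think step-by-step before giving the final answer.")
    parts.append(f"Post: {post}\nOutput:")
    return "\n\n".join(parts)


shots = [("I lie awake until 4am every night.", "lie awake until 4am")]
naive = build_prompt("I can't sleep at night.", "trouble sleeping")
few_shot = build_prompt("I can't sleep at night.", "trouble sleeping", exemplars=shots)
guided = build_prompt("I can't sleep at night.", "trouble sleeping", exemplars=shots, guidance=True)
```

Keeping the three variants in one function makes it easy to hold the instruction fixed while varying only the amount of guidance.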

Related Work and Main Contribution

Related work leveraging the PHQ-9 and GAD-7 questionnaires for diagnostic assistance for MDD and GAD can be
broadly categorized into: (i) Scoring-based methods - scoring or ranking excerpts from text data (considered representative
of patient-clinician interactions) based on relevance to symptoms presented in the PHQ-9 and GAD-7 questionnaires
[13], (ii) Explainable AI (xAI)-based methods - clinically grounding BERT-based model outputs against the PHQ-9 and
GAD-7 symptoms through surrogate modeling such as LIME and SHAP [14], and (iii) Text span identification and evidence
summarization methods - predicting and summarizing text spans over the data, and comparing against human-annotated
samples of highlighted text spans [15]. Our work most closely resembles the text span identification and evidence
summarization category. However, it differs in its highly specialized steering of model outputs to provide
information relevant to specific diagnostic criteria in the PHQ-9 and GAD-7 questionnaires across the prompting and
fine-tuning methods employed in our experimentation. Additionally, we provide two significant contributions: (1) a
first-of-its-kind fine-tuned model specialized for diagnostic criteria assessment based on the Llama model architecture,
which we refer to as DiagnosticLlama, and (2) a comprehensive set of language-model-annotated synthetic data,
evaluated for quality by human experts, for facilitating further research on LLM-powered diagnostic assessment.

2 METHODOLOGY
2.1 MDD Diagnostic Assistance based on the PHQ-9
Ground Truth Dataset Creation. We start with the publicly available PRIMATE dataset, which consists of a
collection of social media posts annotated for PHQ-9 relevant criteria [16]. Appendix A.1 shows an example post and its
annotation, specifically the post title, the post text, and the annotations indicating whether specific PHQ-9 symptoms are
present in the post (using yes/no values). We chose this dataset as the authors provide preliminary experimental evidence
on the effectiveness of using this dataset for guiding language models toward questionnaire-specific determination of
diagnostic criteria. We first prompt GPT-4o to identify text spans in the posts corresponding to the PHQ-9 symptoms
by providing an example of the expected output. Appendix A.2 shows an example of a prompt to GPT-4o. It is evident
from this example how we are attempting to steer the model toward providing PHQ-9-specific diagnostic criteria. We
then pass the model outputs to expert clinicians who provide us with a subset of GPT-4o annotated outputs that the
clinicians agree with. This subset is available here². The clinicians are three anonymized experts from a non-profit
institution run by a retired professional from the National Institute of Mental Health and Neurosciences
(NIMHANS), India³. An agreement score of 0.74 was recorded among the annotators (measured using Cohen’s Kappa).
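
The annotation record described above (post title, post text, yes/no flags per PHQ-9 symptom, plus the text spans identified for each symptom) could be represented as follows. This is a sketch under stated assumptions: the class name `AnnotatedPost`, its field names, and the shortened symptom labels are illustrative paraphrases of the nine PHQ-9 items, not the actual schema of the PRIMATE dataset or of the released datasets.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Shortened paraphrases of the nine PHQ-9 items (assumed labels, not dataset field names).
PHQ9_SYMPTOMS = [
    "little interest or pleasure in doing things",
    "feeling down, depressed, or hopeless",
    "trouble sleeping or sleeping too much",
    "feeling tired or having little energy",
    "poor appetite or overeating",
    "feeling bad about yourself",
    "trouble concentrating",
    "moving or speaking slowly, or being restless",
    "thoughts of self-harm",
]


@dataclass
class AnnotatedPost:
    title: str
    text: str
    # yes/no flag per symptom; symptoms that are not listed count as "no"
    symptoms: Dict[str, bool]
    # supporting text spans per symptom, as produced by the span-identification prompt
    spans: Dict[str, List[str]] = field(default_factory=dict)

    def present(self) -> List[str]:
        """Return the symptoms flagged as present, in questionnaire order."""
        return [s for s in PHQ9_SYMPTOMS if self.symptoms.get(s, False)]


post = AnnotatedPost(
    title="Can't get out of bed",
    text="I am exhausted all the time and nothing feels fun anymore.",
    symptoms={PHQ9_SYMPTOMS[0]: True, PHQ9_SYMPTOMS[3]: True},
    spans={PHQ9_SYMPTOMS[3]: ["exhausted all the time"]},
)
```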

2.1.1 Prompting-based Methods.

2 https://huggingface.co/datasets/darssanle/GPT-4o-eval
3 https://www.justdial.com/Bangalore/Dr-C-R-Chandrashekar-Samadhana-Counselling-Trust-Centre-Near-Subramanya-Temple-Mico-Layout-Bus-Stand-Bannerghatta-Road/080PXX80-XX80-170124231410-U5U4_BZDET

Obtaining Proprietary Model Outputs for MDD Diagnostic Assistance based on the PHQ-9. Maintaining
exactly the same prompt structure as shown in Appendix A.2, we prompt the models GPT-3.5-Turbo and GPT-4o-mini
to obtain annotations for a subset of the posts in the PRIMATE dataset. Our subset selection is randomized and limited
by request costs and our available budget (see the Acknowledgements for funding information).
For evaluation of the outputs, we employ two methods: (i) hits@k-based ranking - we rank-order the text spans
identified in the model output based on cosine similarity with the symptom, and then check if the identified text span
occurs within the top k positions in the ground truth output, and (ii) standard classification metrics - we evaluate
the accuracy, precision, recall, and F1-score of the model outputs against the ground truth. Tables 1 and 2 show the
evaluation results.
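
The hits@k evaluation described above can be sketched as follows. This is one plausible reading of the procedure, not the evaluation code used in this work: the bag-of-words vectors stand in for whatever text embeddings were actually used, and the helper names (`bow`, `cosine`, `hit_at_k`, `hits_at_k`) are hypothetical.

```python
import math
from collections import Counter
from typing import List, Set, Tuple


def bow(text: str) -> Counter:
    """Toy bag-of-words vector; a real setup would likely use sentence embeddings."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hit_at_k(symptom: str, predicted: List[str], gold: Set[str], k: int) -> bool:
    """Rank predicted spans by similarity to the symptom; hit if a gold span is in the top k."""
    ranked = sorted(predicted, key=lambda s: cosine(bow(s), bow(symptom)), reverse=True)
    return any(span in gold for span in ranked[:k])


def hits_at_k(examples: List[Tuple[str, List[str], Set[str]]], k: int) -> float:
    """Fraction of (symptom, predicted spans, gold spans) examples with a hit in the top k."""
    hits = [hit_at_k(symptom, predicted, gold, k) for symptom, predicted, gold in examples]
    return sum(hits) / len(hits) if hits else 0.0


examples = [
    ("trouble sleeping",
     ["I have trouble sleeping most nights", "work is busy"],
     {"I have trouble sleeping most nights"}),
    ("feeling tired",
     ["my job is tiring", "feeling tired all day"],
     {"my job is tiring"}),
]
```

In the second toy example the gold span shares no tokens with the symptom description, so it only counts as a hit once k is at least 2.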

Table 1. Evaluation of Proprietary LLMs for PHQ-9 Symptom Annotation of PRIMATE Posts Using hits@k.

Evaluation Metric    GPT-3.5-Turbo    GPT-4o-mini
hits@1               87%              89%
hits@<5              98%              99%

Table 2. Evaluation of Proprietary LLMs for PHQ-9 Symptom Annotation of PRIMATE Posts Using Standard Classification Metrics.

Method           Accuracy    Precision    Recall    F1-score
GPT-3.5-Turbo    0.93        0.89         0.96      0.92
GPT-4o-mini      0.94        0.96         0.98      0.92

2.1.2 Obtaining Open-source Model Outputs for MDD Diagnostic Assistance based on the PHQ-9. Similar to
the proprietary model case, we use the same prompt structure shown in Appendix A.2 and prompt the models
llama3.1-8b and mixtral-8x7b to obtain annotations. As in the proprietary model case, the subset selection is randomized and
limited only by rate-limit costs.
For evaluation, we use the same two methods defined in Section 2.1.1 using the ground truth dataset introduced in
Section 2.1 (the hits@k and standard classification metrics). Tables 3 and 4 show the results.
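
The standard classification metrics referred to above follow the usual definitions for binary labels. The sketch below assumes each prediction is a yes/no decision about whether a symptom is present; how scores are averaged across symptoms and posts is not specified here, so the function name `classification_metrics` and the single flat label list are illustrative assumptions.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary (0/1) labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Ground-truth vs. model decisions for five hypothetical (post, symptom) pairs.
metrics = classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```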

Table 3. Evaluation of Open-source LLMs for PHQ-9 Symptom Annotation of PRIMATE Posts Using hits@k.

Evaluation Metric    llama3.1-8b    mixtral-8x7b
hits@1               83%            92%
hits@<5              88%            99%

Table 4. Evaluation of Open-source LLMs for PHQ-9 Symptom Annotation of PRIMATE Posts Using Standard Classification Metrics.

Method          Accuracy    Precision    Recall    F1-score
llama3.1-8b     0.84        0.86         0.78      0.82
mixtral-8x7b    0.92        0.96         0.95      0.93

2.1.3 Fine-tuning-based Methods.

The MentaLLaMA model. MentaLLaMA is a model trained on 105K data samples of mental health instructions on
social media posts. The samples are collected from 10 existing sources covering eight mental health analysis tasks,
making MentaLLaMA a suitable foundation model for the tasks covered in this study. The instructions used for training
are a combination of expert-written and few-shot ChatGPT prompt outputs, which further supports MentaLLaMA as a viable
candidate for testing adherence to diagnostic criteria by language models [10].
We perform experiments using MentaLLaMA on the ground truth dataset we introduce in Section 2.1 and report the
results. The prompt is provided in Appendix B.1.

The DiagnosticLlama model - Fine-tuning MentaLLaMA on the PRIMATE dataset using Hugging Face
AutoTrain. AutoTrain is a no-code platform designed to simplify the process of training and fine-tuning language
models on custom data⁴. The full training specifications for this model are available in Appendix C. We
refer to this model as DiagnosticLlama. Appendix C.1 shows an example of an input (prompt) and output pair
obtained using the DiagnosticLlama model. The model space is available here⁵.
For evaluation of the outputs, we employ the same two methods as in Section 2.1.1, i.e., (i) hits@k-based ranking, and
(ii) standard classification metrics - the accuracy, precision, recall, and F1-score of the model outputs against the ground
truth. Tables 5 and 6 show the evaluation results.

Table 5. Evaluation of MentaLLaMA and DiagnosticLlama for PHQ-9 Symptom Annotation of PRIMATE Posts Using hits@k.

Evaluation Metric    MentaLLaMA    DiagnosticLlama
hits@1               -             68.3%
hits@<5              -             76.2%

Table 6. Evaluation of MentaLLaMA and DiagnosticLlama for PHQ-9 Symptom Annotation of PRIMATE Posts Using Standard
Classification Metrics.

Method             Accuracy    Precision    Recall    F1-score
MentaLLaMA         0.82        0.83         0.63      0.75
DiagnosticLlama    -           -            -         -

2.2 GAD Diagnostic Assistance based on the GAD-7

Ground Truth Dataset Creation. Once again, we start with the publicly available PRIMATE dataset. We then
prompt GPT-4o to identify text spans in the posts corresponding to the GAD-7 symptoms by providing an example of
the expected output. Appendix A.2 shows an example of a prompt to GPT-4o. This example clarifies how we attempt to
steer the model toward providing GAD-7-specific diagnostic criteria. Similar to the PHQ-9 case, we then pass the model
outputs to expert clinicians who provide us with a subset of GPT-4o annotated outputs that the clinicians agree with.
This subset is available here⁶. The clinicians are the same three anonymized experts from the non-profit mentioned in
Section 2.1. An agreement score of 0.72 was recorded among the annotators (measured using Cohen’s Kappa).
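
Cohen's Kappa, used above to report annotator agreement, is defined for a pair of raters as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The sketch below computes it for two raters and, as an assumption, averages it over all rater pairs to handle three annotators; how the reported scores were aggregated across the three clinicians is not stated here, and the function names and toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's Kappa for two raters labelling the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of the two raters' marginal label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(ratings: List[Sequence[int]]) -> float:
    """Average Cohen's Kappa over all rater pairs (one possible aggregation)."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(x, y) for x, y in pairs) / len(pairs)


# Toy agree(1)/disagree(0) judgements from three hypothetical annotators on four outputs.
rater_1 = [1, 1, 0, 0]
rater_2 = [1, 1, 0, 0]
rater_3 = [1, 0, 0, 0]
agreement = mean_pairwise_kappa([rater_1, rater_2, rater_3])
```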
4 https://huggingface.co/docs/autotrain/index
5 https://huggingface.co/barca-boy/primate_autotrain_mental_llama
6 https://huggingface.co/datasets/darssanle/GPT-4o-GAD-7

2.2.1 Prompting-based Methods.

Obtaining Proprietary Model Outputs for GAD Diagnostic Assistance based on the GAD-7. Maintaining
exactly the same prompt structure as shown in Appendix A.2, we prompt the models GPT-3.5-Turbo and GPT-4o-mini
to obtain annotations for a subset of the posts in the PRIMATE dataset, but this time geared towards responses to the
GAD-7 symptoms. Our subset selection is randomized and limited by request costs and our available budget (see the
Acknowledgements for funding information).
For evaluation of the outputs, we employ the same two methods as in the PHQ-9 case, i.e., (i) hits@k-based ranking,
and (ii) standard classification metrics - the accuracy, precision, recall, and F1-score of the model outputs against the
ground truth. Tables 7 and 8 show the evaluation results.

Table 7. Evaluation of Proprietary LLMs for GAD-7 Symptom Annotation of PRIMATE Posts Using hits@k.

Evaluation Metric    GPT-3.5-Turbo    GPT-4o-mini
hits@1               88%              89%
hits@<5              98%              98%

Table 8. Evaluation of Proprietary LLMs for GAD-7 Symptom Annotation of PRIMATE Posts Using Standard Classification Metrics.

Method           Accuracy    Precision    Recall    F1-score
GPT-3.5-Turbo    0.95        0.9          0.95      0.91
GPT-4o-mini      0.93        0.97         0.91      0.92

2.2.2 Obtaining Open-source Model Outputs for GAD Diagnostic Assistance based on the GAD-7. As in the
PHQ-9 case, we use the same prompt structure shown in Appendix A.2 and prompt the models llama3.1-8b and mixtral-8x7b
to obtain annotations for the GAD-7 symptoms. As before, the subset selection is randomized and limited only by
rate-limit costs.
For evaluation, we use the same two methods defined in Section 2.1.1 using the ground truth dataset introduced in
Section 2.2 (the hits@k and standard classification metrics). Tables 9 and 10 show the results.

Table 9. Evaluation of Open-source LLMs for GAD-7 Symptom Annotation of PRIMATE Posts Using hits@k.

Evaluation Metric    llama3.1-8b    mixtral-8x7b
hits@1               83%            92%
hits@<5              88%            99%

2.3 A Note on older LLMs and Classification-based Approaches

Older Autoregressive LLMs. The previous sections have covered the best-performing LLMs. However, we have
performed experiments on older LLMs such as Llama2 and Mistral, and we provide these results in Table 11 [11, 17].

Table 10. Evaluation of Open-source LLMs for GAD-7 Symptom Annotation of PRIMATE Posts Using Standard Classification Metrics.

Method          Accuracy    Precision    Recall    F1-score
llama3.1-8b     0.84        0.86         0.78      0.82
mixtral-8x7b    0.92        0.96         0.95      0.93

Table 11. Evaluation of Llama2-7b-chat and Mistral-Instruct for PHQ-9 Symptom Annotations of the PRIMATE Posts Using F1 scores.

Method              F1-score
llama2-7b-chat      0.663
mistral-instruct    0.655

Older pretrained language models. Several classification-based approaches have been used to classify posts into labels
corresponding to diagnostic criteria on questionnaires as an alternative to generative models [16, 18]. Although this
work focuses on modern LLMs, we also perform experiments in the classification setting using the older pretrained
models BERT, MentalBERT, and MentalRoBERTa [19, 20]. Table 12 shows the results⁷.

Table 12. Evaluation of BERT, MentalBERT, and MentalRoBERTa for PHQ-9 Symptom Annotations of the PRIMATE Posts Using F1
scores.

Method           F1-score
BERT             0.69
MentalBERT       0.71
MentalRoBERTa    0.48

2.4 Model and Data Artifact Details

As part of this study, we release several software artifacts, including one model - the DiagnosticLlama model (available
here⁸), and multiple annotated datasets that contain diagnostic symptom predictions along with text-span highlights,
categorized into:
(a) PHQ-9-based annotations, namely - (i) GPT-3.5-PHQ-9 annotations, (ii) GPT-4o_mini-PHQ-9 annotations, (iii)
GPT-4o-PHQ-9 annotations, (iv) llama3.1_8b-PHQ-9 annotations, (v) mixtral-8x7b-PHQ-9 annotations, and
(b) GAD-7-based annotations, namely - (i) GPT-3.5-GAD-7 annotations, (ii) GPT-4o_mini-GAD-7 annotations,
(iii) GPT-4o-GAD-7 annotations, (iv) llama3.1_8b-GAD-7 annotations, (v) mixtral-8x7b-GAD-7 annotations.
The datasets are all available at this link⁹. The dataset statistics are available as part of the dataset cards in the links
provided. The dataset cards also show the details of the prompting structure used to generate the LLM outputs. We have
also consolidated all the links to the model and data artifacts in this GitHub repository¹⁰. Table 13 provides a summary.

7 For completeness, we also show results of traditional machine learning-based classification methods in Appendix D.
8 https://huggingface.co/barca-boy/primate_autotrain_mental_llama
9 https://huggingface.co/collections/darssanle/mhd-datasets-669628ee2d25bd04e99dc3bf
10 https://github.com/kauroy1994/Large-Language-Models-for-Assisting-with-Mental-Health-Diagnostic-Assessments

Table 13. Dataset Statistics (number of posts) for All the Datasets in Section 2.4

Dataset               Number of Posts
GPT-3.5-PHQ-9         339
GPT-4o_mini-PHQ-9     102
GPT-4o-PHQ-9          40
llama3.1_8b-PHQ-9     155
mixtral-8x7b-PHQ-9    97
GPT-4o_mini-GAD-7     51
GPT-4o-GAD-7          17
llama3.1_8b-GAD-7     124
mixtral-8x7b-GAD-7    109
Total                 1034

3 RESULTS
3.1 PHQ-9 Results
From Tables 1, 2, 3, and 4, we see that both the proprietary and open-source LLMs approach human annotation quality,
and Tables 5 and 6 show that fine-tuning LLMs for diagnostic assistance yields promising results. However, fine-tuning
LLMs has turned out to be highly challenging and needs considerable resources and hyperparameter tuning to get right.
The entries for MentaLLaMA are blank in the tables because the MentaLLaMA model reiterates the input verbatim, as
seen in Appendix B.1. This further shows the difficulty of adequately leveraging fine-tuned models to achieve good results
in highly specialized tasks such as diagnostic assistance. Still, the preliminary results on the PHQ-9 task demonstrate
that this can be done with considerable trial and error on the fine-tuning configurations. Being able to
deploy specialized models fine-tuned or trained on custom data is essential in safety-constrained and privacy-critical settings.
Interestingly, Table 11 shows large performance gaps between the older and newer LLMs (open-source and
proprietary models). We also find from Table 12 that older pretrained language models (that are not autoregressive)
perform as well as older LLMs. We also see again that fine-tuning in the case of pretrained language models does not lead to
much change in performance and sometimes leads to poor performance (e.g., MentalRoBERTa), further evidencing the
significant challenge of fine-tuning language models for specialized tasks such as mental health diagnostic assistance.

3.2 GAD-7 Results

For the GAD-7 results, from Tables 7, 8, 9, and 10 we see a similar trend as in the PHQ-9 case, i.e., both proprietary and
open-source LLMs approach human annotation quality.
Among the proprietary models, we find GPT-4o-mini to be the best performing, and mixtral-8x7b among
the open-source models. However, the differences between the LLMs, both proprietary and open-source,
are small.

4 CONCLUSION AND FUTURE WORK

Previous efforts in utilizing Large Language Models (LLMs) for mental health assistance have primarily focused on
conversational data or diagnostic assessment as a classification problem. However, these initiatives lack the precision
and guidance necessary for effective assessment with robust explanations (reasoning) and response generation based
on established questionnaires. This gap is significant because standardized assessment tools, such as the PHQ-9 for
major depressive disorder and the GAD-7 for generalized anxiety disorder, are indispensable for accurate and effective
treatment planning in mental healthcare. Our research addresses this gap by specifically targeting these assessment
procedures and developing prompting strategies to guide LLMs toward crafting clinician-friendly responses with
explanations using assessment and reasoning prompts.
Our findings reveal that while LLMs struggle to effectively utilize questionnaire information in prompts to provide
assessments resembling those of clinicians in the zero-shot setting, their performance significantly improves in the
few-shot setting (in both the fine-tuning and few-shot prompting regimes), nearly matching the assessments of expert
clinicians. However, despite this improvement, LLMs still do not reason in the same manner as clinicians when arriving
at assessments, matching clinician reasoning only a fraction of the time, as evidenced by the sizes of the ground truth
datasets for which a high expert agreement score is obtained. This underscores the need for further scrutiny in the
integration of LLMs, along with prompting methods incorporating diagnostic assessment criteria, before they can be
reliably utilized in mental healthcare assistance. Moreover, our work introduces a novel assessment LLM and several
instruction-tuning datasets, offering a valuable resource for future research aimed at understanding and enhancing the
effectiveness of LLMs in assisting with assessments within mental healthcare settings. This contribution holds promise
for advancing the capabilities of LLMs in mental health support, potentially alleviating the strain on healthcare systems
caused by a shortage of care providers and an increasing number of patients.

Future Work. We are working on integrating the models studied in this work into a clinician-facing app,
extending the DiagnosticLlama model to include GAD-7, and expanding all datasets in Section 2.4 to match the
original PRIMATE dataset. We are also expanding our datasets and results to include more GAD-7-based results and
non-linearly structured questionnaires (for example, flowcharts) such as the CSSRS [21]¹¹. Finally, we are also working to
incorporate additional constraints, such as restricted terminology (e.g., non-toxic terminology), by paraphrasing the
LLM outputs [22, 23]. All future updates will be released on the GitHub repository.

ACKNOWLEDGEMENTS
This research is partially supported by NSF Award 2335967 EAGER: Knowledge-guided neurosymbolic AI with guardrails
for safe virtual health assistants¹² [24–29]. The views expressed here are those of the authors, not those of the sponsors.

REFERENCES
[1] Arun James Thirunavukarasu, Darren Shu Jeng Ting, Kabilan Elangovan, Laura Gutierrez, Ting Fang Tan, and Daniel Shu Wei Ting. Large language
models in medicine. Nature medicine, 29(8):1930–1940, 2023.
[2] Nitin Rane. Chatgpt and similar generative artificial intelligence (ai) for building and construction industry: Contribution, opportunities and
challenges of large language models for industry 4.0, industry 5.0, and society 5.0. Opportunities and Challenges of Large Language Models for
Industry, 4, 2023.
[3] Peter Lee, Sebastien Bubeck, and Joseph Petro. Benefits, limits, and risks of gpt-4 as an ai chatbot for medicine. New England Journal of Medicine,
388(13):1233–1239, 2023.
[4] Joseph Ford, Felicity Thomas, Richard Byng, and Rose McCabe. Use of the patient health questionnaire (phq-9) in practice: Interactions between
patients and physicians. Qualitative Health Research, 30(13):2146–2159, 2020.
[5] Sverre Urnes Johnson, Pål Gunnar Ulvenes, Tuva Øktedalen, and Asle Hoffart. Psychometric properties of the general anxiety disorder 7-item
(gad-7) scale in a heterogeneous psychiatric sample. Frontiers in psychology, 10:1713, 2019.
[6] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex
Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744,
2022.

11 app demo link: https://www.youtube.com/watch?v=VpJYyb7brRs&list=PLqJzTtkUiq577Rc1HpX4iE1_ntNeuppzA&index=22

12 https://www.nsf.gov/awardsearch/showAward?AWD_ID=2335967

[7] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry,
Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
[8] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits
reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.
[9] Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your
language model is secretly a reward model. Advances in Neural Information Processing Systems, 36, 2024.
[10] Kailai Yang, Tianlin Zhang, Ziyan Kuang, Qianqian Xie, Jimin Huang, and Sophia Ananiadou. Mentallama: interpretable mental health analysis on
social media with large language models. In Proceedings of the ACM on Web Conference 2024, pages 4489–4500, 2024.
[11] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric
Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
[12] Albert Q Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas,
Emma Bou Hanna, Florian Bressand, et al. Mixtral of experts. arXiv preprint arXiv:2401.04088, 2024.
[13] Anxo Pérez, Marcos Fernández-Pichel, Javier Parapar, and David E Losada. Depresym: A depression symptom annotated corpus and the role of llms
as assessors of psychological markers. arXiv preprint arXiv:2308.10758, 2023.
[14] Ayah Zirikly and Mark Dredze. Explaining models of mental health via clinically grounded auxiliary tasks. In Proceedings of the Eighth Workshop on
Computational Linguistics and Clinical Psychology, pages 30–39, 2022.
[15] Andrew Yates, Bart Desmet, Emily Prud’Hommeaux, Ayah Zirikly, Steven Bedrick, Sean MacAvaney, Kfir Bar, Molly Ireland, and Yaakov Ophir.
Proceedings of the 9th workshop on computational linguistics and clinical psychology (clpsych 2024). In Proceedings of the 9th Workshop on
Computational Linguistics and Clinical Psychology (CLPsych 2024), 2024.
[16] Shrey Gupta, Anmol Agarwal, Manas Gaur, Kaushik Roy, Vignesh Narayanan, Ponnurangam Kumaraguru, and Amit Sheth. Learning to automate
follow-up question generation using process knowledge for depression triage on reddit posts. In Proceedings of the Eighth Workshop on Computational
Linguistics and Clinical Psychology, pages 137–147, 2022.
[17] Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna
Lengyel, Guillaume Lample, Lucile Saulnier, et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023.
[18] Sumit Dalal, Deepa Tilwani, Manas Gaur, Sarika Jain, Valerie Shalin, and Amit Seth. A cross attention approach to diagnostic explainability using
clinical practice guidelines for depression. arXiv preprint arXiv:2311.13852, 2023.
[19] Jacob Devlin. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[20] Shaoxiong Ji, Tianlin Zhang, Luna Ansari, Jie Fu, Prayag Tiwari, and Erik Cambria. Mentalbert: Publicly available pretrained language models for
mental healthcare. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 7184–7190, 2022.
[21] Kaushik Roy, Yuxin Zi, Manas Gaur, Jinendra Malekar, Qi Zhang, Vignesh Narayanan, and Amit Sheth. Process knowledge-infused learning for
clinician-friendly explanations. In Proceedings of the AAAI Symposium Series, volume 1, pages 154–160, 2023.
[22] Adam Tsakalidis, Jenny Chim, Iman Munire Bilal, Ayah Zirikly, Dana Atzil-Slonim, Federico Nanni, Philip Resnik, Manas Gaur, Kaushik Roy, Becky
Inkster, et al. Overview of the clpsych 2022 shared task: Capturing moments of change in longitudinal user posts. In Proceedings of the Eighth
Workshop on Computational Linguistics and Clinical Psychology, pages 184–198, 2022.
[23] Kaushik Roy, Manas Gaur, Misagh Soltani, Vipula Rawte, Ashwin Kalyan, and Amit Sheth. Proknow: Process knowledge for safety constrained and
explainable question generation for mental health diagnostic assistance. Frontiers in big Data, 5:1056728, 2023.
[24] Amit Sheth, Manas Gaur, Kaushik Roy, and Keyur Faldu. Knowledge-intensive language understanding for explainable ai. IEEE Internet Computing,
25(5):19–24, 2021.
[25] Amit Sheth, Manas Gaur, Kaushik Roy, Revathy Venkataraman, and Vedant Khandelwal. Process knowledge-infused ai: Toward user-level
explainability, interpretability, and safety. IEEE Internet Computing, 26(5):76–84, 2022.
[26] Amit Sheth, Kaushik Roy, and Manas Gaur. Neurosymbolic artificial intelligence (why, what, and how). IEEE Intelligent Systems, 38(3):56–62, 2023.
[27] Amit Sheth and Kaushik Roy. Neurosymbolic value-inspired artificial intelligence (why, what, and how). IEEE Intelligent Systems, 39(1):5–11, 2024.
[28] Amit Sheth, Kaushik Roy, Hemant Purohit, and Amitava Das. Civilizing and humanizing artificial intelligence in the age of large language models.
IEEE Internet Computing, 28(5):5–10, 2024.
[29] Amit Sheth, Vishal Pallagani, and Kaushik Roy. Neurosymbolic ai for enhancing instructability in generative ai. IEEE Intelligent Systems, 39(5):5–11,
2024.

APPENDIX
A DATASET EXAMPLES
A.1 Primate Data Example

{
  "post_title": "I don't feel original anymore.",
  "post_text": "When I was in high school a few years back, I was one of the highest competitors in my school. I joined the high school band in freshman year and by senior year I became one of the best in my section. My academics were always straight and I exercised daily. Senior year I enlisted in the military and now I believe it was one of my worst decisions in life. Before I went to boot camp I was motivated, a patriot and believed that the elite joined the military. In senior year I never applied for any scholarships and I was offered one but turned it down because I already signed the papers. I thought I set myself up for success. Now I believe I was dead wrong for joining. The only benefit I see so far after a year and a half of service is that I'm trying to set myself up financially before I get out and hopefully attend college. It sounds like a plan but I feel no happiness from what I do at all. I convinced myself there's no honor in it anymore, it's just another job. I don't exercise by myself anymore. I feel like I'm not progressing anywhere in life being in service. I'm just a body and if I wasn't here doing what I'm doing, there'd just be somebody else doing the exact same. I'm replaceable. That's the mindset the military gave me. I look forward to going back home in 6 months for vacation and that's the only thing I've been looking forward to since I've been stationed. After that, the only thing I have my eyes on are getting out of service, going home, being closer to my family again. There's nothing here that satisfies me and I hate it. I feel like I've tried everything to be happy here but it seems impossible. I wish somebody could help.",
  "annotations": [
    ["Feeling-bad-about-yourself-or-that-you-are-a-failure-or-have-let-yourself-or-your-family-down", "yes"],
    ["Feeling-down-depressed-or-hopeless", "no"],
    ["Feeling-tired-or-having-little-energy", "yes"],
    ["Little-interest-or-pleasure-in-doing", "yes"],
    ["Moving-or-speaking-so-slowly-that-other-people-could-have-noticed-Or-the-opposite-being-so-fidgety-or-restless-that-you-have-been-moving-around-a-lot-more-than-usual", "no"],
    ["Poor-appetite-or-overeating", "no"],
    ["Thoughts-that-you-would-be-better-off-dead-or-of-hurting-yourself-in-some-way", "no"],
    ["Trouble-concentrating-on-things-such-as-reading-the-newspaper-or-watching-television", "no"],
    ["Trouble-falling-or-staying-asleep-or-sleeping-too-much", "no"]
  ]
}
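As an illustration of how a record in the A.1 layout might be consumed, the following minimal sketch converts the list of [symptom, answer] pairs into a symptom-to-boolean mapping. The helper name annotations_to_labels and the shortened record are assumptions made for this example; they are not part of the PRIMATE release or of the authors' code.

```python
import json

# Hypothetical, abbreviated record in the A.1 layout (post text shortened).
record_json = """
{
  "post_title": "I don't feel original anymore.",
  "post_text": "I feel like I'm not progressing anywhere in life being in service.",
  "annotations": [
    ["Feeling-down-depressed-or-hopeless", "no"],
    ["Feeling-tired-or-having-little-energy", "yes"],
    ["Little-interest-or-pleasure-in-doing", "yes"]
  ]
}
"""


def annotations_to_labels(record):
    """Map each symptom name to True ("yes") or False ("no")."""
    labels = {}
    for symptom, answer in record["annotations"]:
        normalized = answer.strip().lower()
        if normalized not in ("yes", "no"):
            raise ValueError(f"unexpected answer {answer!r} for {symptom!r}")
        labels[symptom] = normalized == "yes"
    return labels


record = json.loads(record_json)
labels = annotations_to_labels(record)

# Symptoms annotated as present, in alphabetical order.
present = sorted(name for name, flag in labels.items() if flag)
print(present)
```

A mapping of this kind keeps the symptom names as keys, so downstream comparisons do not depend on the order in which annotations appear in a record.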

A.2 GPT-4o Prompt Example

"""When given the below JSON formatted file content, I need you to give me the specific sentences from the text that exhibit a set of symptoms. Below is an example of INPUT and OUTPUT. Keep JSON Formatting for output:

{
  "post_title": "I don't feel original anymore.",
  "post_text": "When I was in high school a few years back, I was one of the highest competitors in my school. I joined the high school band in freshman year and by senior year I became one of the best in my section. My academics were always straight and I exercised daily. Senior year I enlisted in the military and now I believe it was one of my worst decisions in life. Before I went to boot camp I was motivated, a patriot and believed that the elite joined the military. In senior year I never applied for any scholarships and I was offered one but turned it down because I already signed the papers. I thought I set myself up for success. Now I believe I was dead wrong for joining. The only benefit I see so far after a year and a half of service is that I'm trying to set myself up financially before I get out and hopefully attend college. It sounds like a plan but I feel no happiness from what I do at all. I convinced myself there's no honor in it anymore, it's just another job. I don't exercise by myself anymore. I feel like I'm not progressing anywhere in life being in service. I'm just a body and if I wasn't here doing what I'm doing, there'd just be somebody else doing the exact same. I'm replaceable. That's the mindset the military gave me. I look forward to going back home in 6 months for vacation and that's the only thing I've been looking forward to since I've been stationed. After that, the only thing I have my eyes on are getting out of service, going home, being closer to my family again. There's nothing here that satisfies me and I hate it. I feel like I've tried everything to be happy here but it seems impossible. I wish somebody could help.",
  "annotations": [
    ["Feeling-bad-about-yourself-or-that-you-are-a-failure-or-have-let-yourself-or-your-family-down", "yes"],
    ["Feeling-down-depressed-or-hopeless", "no"],
    ["Feeling-tired-or-having-little-energy", "yes"],
    ["Little-interest-or-pleasure-in-doing", "yes"],
    ["Moving-or-speaking-so-slowly-that-other-people-could-have-noticed-Or-the-opposite-being-so-fidgety-or-restless-that-you-have-been-moving-around-a-lot-more-than-usual", "no"],
    ["Poor-appetite-or-overeating", "no"],
    ["Thoughts-that-you-would-be-better-off-dead-or-of-hurting-yourself-in-some-way", "no"],
    ["Trouble-concentrating-on-things-such-as-reading-the-newspaper-or-watching-television", "no"],
    ["Trouble-falling-or-staying-asleep-or-sleeping-too-much", "no"]
  ]
}

And this is an example expected output format:

{
  "post_title": "I don't feel original anymore.",
  "post_text": "When I was in high school a few years back, I was one of the highest competitors in my school. I joined the high school band in freshman year and by senior year I became one of the best in my section. My academics were always straight, and I exercised daily. Senior year I enlisted in the military, and now I believe it was one of my worst decisions in life. Before I went to boot camp I was motivated, a patriot and believed that the elite joined the military. In senior year I never applied for any scholarships and I was offered one but turned it down because I already signed the papers. I thought I set myself up for success. Now I believe I was dead wrong for joining. The only benefit I see so far after a year and a half of service is that I'm trying to set myself up financially before I get out and hopefully attend college. It sounds like a plan but I feel no happiness from what I do at all. I convinced myself there's no honor in it anymore; it's just another job. I don't exercise by myself anymore. I feel like I'm not progressing anywhere in life being in service. I'm just a body, and if I wasn't here doing what I'm doing, there'd just be somebody else doing the exact same. I'm replaceable. That's the mindset the military gave me. I look forward to going back home in 6 months for vacation, and that's the only thing I've been looking forward to since I've been stationed. After that, the only thing I have my eyes on is getting out of service, going home, being closer to my family again. There's nothing here that satisfies me, and I hate it. I feel like I've tried everything to be happy here but it seems impossible. I wish somebody could help.",
  "annotations": {
    "Feeling-bad-about-yourself-or-that-you-are-a-failure-or-have-let-yourself-or-your-family-down": [
      "I thought I set myself up for success. Now I believe I was dead wrong for joining."
    ],
    "Feeling-down-depressed-or-hopeless": [],
    "Feeling-tired-or-having-little-energy": [
      "I feel like I'm not progressing anywhere in life being in service."
    ],
    "Little-interest-or-pleasure-in-doing": [
      "There's nothing here that satisfies me, and I hate it."
    ],
    "Moving-or-speaking-so-slowly-that-other-people-could-have-noticed-Or-the-opposite-being-so-fidgety-or-restless-that-you-have-been-moving-around-a-lot-more-than-usual": [],
    "Poor-appetite-or-overeating": [],
    "Thoughts-that-you-would-be-better-off-dead-or-of-hurting-yourself-in-some-way": [],
    "Trouble-concentrating-on-things-such-as-reading-the-newspaper-or-watching-television": [],
    "Trouble-falling-or-staying-asleep-or-sleeping-too-much": []
  }
},

May I proceed with the rest of the INPUTS?"""
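Because the prompt in A.2 asks for sentences taken from the post itself, one plausible sanity check on a response is to confirm that every returned sentence occurs verbatim in the returned post_text. The sketch below is an assumption about how such a check could look, not a description of the authors' pipeline; the function names and the shortened response are hypothetical.

```python
import re


def normalize(text):
    """Collapse runs of whitespace so line wrapping does not affect matching."""
    return re.sub(r"\s+", " ", text).strip()


def unsupported_spans(response):
    """Return (symptom, sentence) pairs whose sentence is absent from post_text."""
    post = normalize(response["post_text"])
    missing = []
    for symptom, sentences in response["annotations"].items():
        for sentence in sentences:
            if normalize(sentence) not in post:
                missing.append((symptom, sentence))
    return missing


# Hypothetical, shortened response in the expected output layout of A.2.
response = {
    "post_title": "I don't feel original anymore.",
    "post_text": "I thought I set myself up for success. Now I believe I was "
                 "dead wrong for joining. I don't exercise by myself anymore.",
    "annotations": {
        "Feeling-bad-about-yourself-or-that-you-are-a-failure-or-have-let-yourself-or-your-family-down": [
            "I thought I set myself up for success. Now I believe I was dead wrong for joining."
        ],
        "Feeling-down-depressed-or-hopeless": [],
        # Deliberately unsupported sentence, to show what the check reports.
        "Poor-appetite-or-overeating": ["I have not been eating much."],
    },
}

problems = unsupported_spans(response)
print(problems)
```

The check compares against the post_text in the response rather than the original input, since the expected output in A.2 shows small punctuation changes relative to the input post.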

B MENTALLLAMA
B.1 Example Input and Output
Input

### INSTRUCTION:

For a given user post sentence, does it show signs of the symptom. Answer in binary "yes" or
"no", for every symptom. The symptoms are as follows:
[Little interest or pleasure in doing things,

Feeling down, depressed, or hopeless,

Trouble falling or staying asleep, or sleeping too much,
Feeling tired or having little energy,

Poor appetite or overeating,

Feeling bad about yourself or that you are a failure or
have let yourself or your family down,

Trouble concentrating on things, such as reading the
newspaper or watching television,

Moving or speaking so slowly that other people could
have noticed. Or the opposite being so figety or
restless that you have been moving around a lot more
than usual,

Thoughts that you would be better off dead, or of
hurting yourself].

Give the output as a python list of lists, with the list containing the symptom and the
answer, separated by a comma.

### INPUT:
Hello. I (23F) have tried to make a post to explain what's going on with me, but I just end up crying, and everything I write is incoherent, so. Sorry if there's not enough context?

I'm having a really difficult time right now. I can't really focus on work, and I don't get in as many hours as I should. I feel like I am letting people down.

Existing is exhausting, and all I can do is waste time on my phone, because if I'm on my phone I don't have to think and time passes more quickly.
I'm tired of spending hours on my phone instead of doing functional-people things, I'm just putting everything off and setting myself up for failure. But doing anything productive feels too daunting, and I don't know how to start?

I want to develop healthy habits - spend less time on my phone, maybe go outside more often, go on walks, stick to my work schedule, find hobbies. Simple things that I know would improve my mental health.
But even though I want those things, I feel like I'm not motivated enough to make a change. It's always easier to stay in my comfort zone, to distract myself. My brain says I need to change, but I feel like I don't want it bad enough to actually try.

I can't keep living like this, though.

I've tried looking for advice online, but everything I see involves:
- Therapy, which I don't have money for.
- Reaching out, but I have very few people I'm close with, and I don't feel comfortable talking about my mental health to any of them.
- Self-help books and stuff, which just... does not work on me.
- Hobbies, but I don't really have any of those anymore? I've been depressed for so long, I've lost interest in the ones I used to have as a kid, and never developed new ones.

So I guess I'm wondering/hoping you all might have some advice on how to get a bit better, something that doesn't involve the things I just mentioned ?

I don't know. I wish wanting to get better was enough to spur me into action. Sorry if what I'm asking for doesn't exist, or if I wasn't very clear.
Thanks for reading

Output

### INSTRUCTION:

For a given user post sentence, does it show signs of the symptom. Answer in binary "yes" or
"no", for every symptom. The symptoms are as follows:
[Little interest or pleasure in doing things,

Feeling down, depressed, or hopeless,

Trouble falling or staying asleep, or sleeping too much,
Feeling tired or having little energy,

Poor appetite or overeating,

Feeling bad about yourself or that you are a failure or
have let yourself or your family down,

Trouble concentrating on things, such as reading the
newspaper or watching television,

Moving or speaking so slowly that other people could
have noticed. Or the opposite being so figety or
restless that you have been moving around a lot more
than usual,

Thoughts that you would be better off dead, or of
hurting yourself].

Give the output as a python list of lists, with the list containing the symptom and the
answer, separated by a comma.

### INPUT:
Hello. I (23F) have tried to make a post to explain what's going on with me,

I'm having a really difficult time right now. I can't really focus on work, and I don't get in as many hours as I should. I feel like I am letting people down.

Existing is exhausting, and all I can do is waste time on my phone, because if I'm on my phone I don't have to think and time passes more quickly.
I'm tired of spending hours on my phone instead of doing functional-people

I want to develop healthy habits - spend less time on my phone, maybe go outside more often, go on walks, stick to my work schedule, find hobbies. Simple things that I know would improve my mental health.
But even though I want those things, I feel like I'm not motivated enough to make a change. It's always easier to stay in my comfort zone, to distract myself. My brain says I need to change, but I feel like I don't want it bad enough to actually try.

I can't keep living like this, though.

I've tried looking for advice online, but everything I see involves:
- Therapy, which I don't have money for.
- Reaching out, but I have very few people I'm close with, and I don't feel comfortable talking about my mental health to any of them.
- Self-help books and stuff, which just... does not work on me.
- Hobbies, but I don't really have any of those anymore? I've been depressed for so long, I've lost interest in the ones I used to have as a kid, and never developed new ones.

So I guess I'm wondering/hoping you all might have some advice on how to get a bit better, something that doesn't involve the things I just mentioned ?

I don't know. I wish wanting to get better was enough to spur me into action. Sorry if what I'm asking for doesn't exist, or if I wasn't very clear.
Thanks for reading

C AUTOTRAINING DiagnosticLlama
Model details, input format, sample inferences

C.1 Example Input and Output

Input

### INSTRUCTION:

For a given user post sentence, does it show signs of the symptom. Answer in binary "yes" or
"no", for every symptom. The symptoms are as follows:
[Little interest or pleasure in doing things,

Feeling down, depressed, or hopeless,

Trouble falling or staying asleep, or sleeping too much,
Feeling tired or having little energy,

Poor appetite or overeating,

Feeling bad about yourself or that you are a failure or
have let yourself or your family down,

Trouble concentrating on things, such as reading the
newspaper or watching television,

Moving or speaking so slowly that other people could
have noticed. Or the opposite being so figety or
restless that you have been moving around a lot more
than usual,

Thoughts that you would be better off dead, or of
hurting yourself].

Give the output as a python list of lists, with the list containing the symptom and the
answer, separated by a comma.

### INPUT:
Hello. I (23F) have tried to make a post to explain what's going on with me, but I just end up crying, and everything I write is incoherent, so. Sorry if there's not enough context?

I'm having a really difficult time right now. I can't really focus on work, and I don't get in as many hours as I should. I feel like I am letting people down.

Existing is exhausting, and all I can do is waste time on my phone, because if I'm on my phone I don't have to think and time passes more quickly.
I'm tired of spending hours on my phone instead of doing functional-people things, I'm just putting everything off and setting myself up for failure. But doing anything productive feels too daunting, and I don't know how to start?

I want to develop healthy habits - spend less time on my phone, maybe go outside more often, go on walks, stick to my work schedule, find hobbies. Simple things that I know would improve my mental health.
But even though I want those things, I feel like I'm not motivated enough to make a change. It's always easier to stay in my comfort zone, to distract myself. My brain says I need to change, but I feel like I don't want it bad enough to actually try.

I can't keep living like this, though.

I've tried looking for advice online, but everything I see involves:
- Therapy, which I don't have money for.
- Reaching out, but I have very few people I'm close with, and I don't feel comfortable talking about my mental health to any of them.
- Self-help books and stuff, which just... does not work on me.
- Hobbies, but I don't really have any of those anymore? I've been depressed for so long, I've lost interest in the ones I used to have as a kid, and never developed new ones.

So I guess I'm wondering/hoping you all might have some advice on how to get a bit better, something that doesn't involve the things I just mentioned ?

I don't know. I wish wanting to get better was enough to spur me into action. Sorry if what I'm asking for doesn't exist, or if I wasn't very clear.
Thanks for reading

Output

[['Feeling-bad-about-yourself-or-that-you-are-a-failure-or-have-let-yourself-or-your-family-down', 'yes'],
 ['Feeling-down-depressed-or-hopeless', 'yes'],
 ['Feeling-tired-or-having-little-energy', 'yes'],
 ['Little-interest-or-pleasure-in-doing ', 'yes'],
 ['Moving-or-speaking-so-slowly-that-other-people-could-have-noticed-Or-the-opposite-being-so-fidgety-or-restless-that-you-have-been-moving-around-a-lot-more-than-usual', 'no'],
 ['Poor-appetite-or-overeating', 'no'],
 ['Thoughts-that-you-would-be-better-off-dead-or-of-hurting-yourself-in-some-way', 'no'],
 ['Trouble-concentrating-on-things-such-as-reading-the-newspaper-or-watching-television', 'yes'],
 ['Trouble-falling-or-staying-asleep-or-sleeping-too-much', 'no']]
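The model output above is a textual Python list of lists, so it has to be parsed before it can be compared with gold annotations. A minimal sketch of one way to do this safely is given below, using ast.literal_eval from the standard library. The function name parse_symptom_output and the shortened output string are assumptions for illustration, and stripping whitespace from symptom names is a design choice of this sketch, motivated by the trailing space that appears in one symptom name in the sample output.

```python
import ast


def parse_symptom_output(raw):
    """Parse a "python list of lists" string into a {symptom: answer} dict."""
    # literal_eval only accepts Python literals, so arbitrary code is not run.
    parsed = ast.literal_eval(raw.strip())
    result = {}
    for item in parsed:
        if len(item) != 2:
            raise ValueError(f"expected [symptom, answer], got {item!r}")
        symptom = str(item[0]).strip()
        answer = str(item[1]).strip().lower()
        if answer not in ("yes", "no"):
            raise ValueError(f"unexpected answer {item[1]!r} for {symptom!r}")
        result[symptom] = answer
    return result


# Hypothetical, shortened output string; note the trailing space in one name.
raw_output = (
    "[['Feeling-down-depressed-or-hopeless', 'yes'], "
    "['Little-interest-or-pleasure-in-doing ', 'yes'], "
    "['Poor-appetite-or-overeating', 'no']]"
)

answers = parse_symptom_output(raw_output)
print(answers)
```

Malformed generations (for example, free text instead of a list) raise an exception in literal_eval, which a caller could catch and count as a formatting failure.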

D TRADITIONAL MACHINE LEARNING-BASED APPROACHES

Table 14. Evaluation of Traditional ML-based methods for PHQ-9 Symptom Annotations of the PRIMATE Posts Using F1 scores.

Method                 F1-score
Logistic Regression    0.49
Random Forest          0.38
XGBoost                0.65
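
For readers unfamiliar with the metric in Table 14, the sketch below computes an F1 score for binary yes/no symptom labels in plain Python. It is only an illustration of the standard definition; the table does not state how scores were averaged across symptoms or posts, so the single-vector setup, the function name, and the toy labels here are assumptions.

```python
def f1_score(gold, predicted):
    """F1 for binary labels, where True means the symptom is present."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for the nine PHQ-9 symptoms of one post (hypothetical values).
gold = [True, False, True, True, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, True, False]

score = f1_score(gold, predicted)
print(round(score, 4))
```

In this toy case there are 2 true positives, 2 false positives, and 1 false negative, giving precision 0.5 and recall 2/3.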

No ratings yet
Survey On ML and DL in Health
6 pages
Exploring Large-Scale Language Models To Evaluate EEG-Based Multimodal Data For Mental Health
No ratings yet
Exploring Large-Scale Language Models To Evaluate EEG-Based Multimodal Data For Mental Health
6 pages
PHQ-V GAD-V Assessments To Identify Signals of Depression
No ratings yet
PHQ-V GAD-V Assessments To Identify Signals of Depression
15 pages
Can AI Relate: Testing Large Language Model Response For Mental Health Support
No ratings yet
Can AI Relate: Testing Large Language Model Response For Mental Health Support
15 pages
A Computational Framework For Behavioral Assessment of LLM Therapists
No ratings yet
A Computational Framework For Behavioral Assessment of LLM Therapists
52 pages
Prompt Engineering in LLMs for Psychiatry
No ratings yet
Prompt Engineering in LLMs for Psychiatry
5 pages
LLM4psych Multimodalities
No ratings yet
LLM4psych Multimodalities
31 pages
AI Will Change The Future of Psychotherapy
No ratings yet
AI Will Change The Future of Psychotherapy
30 pages
2025 Clpsych-1 16
No ratings yet
2025 Clpsych-1 16
25 pages
LLM State Anxiety
No ratings yet
LLM State Anxiety
6 pages
The Role of Natural Language Processing (NLP) in Mental Health Diagnostics
No ratings yet
The Role of Natural Language Processing (NLP) in Mental Health Diagnostics
1 page
Machine Learning for Mental Health Diagnostics
No ratings yet
Machine Learning for Mental Health Diagnostics
21 pages
Epics Documentation
No ratings yet
Epics Documentation
24 pages
Updated References
No ratings yet
Updated References
4 pages
Ijerph 21 00910
No ratings yet
Ijerph 21 00910
12 pages
LLMs in Mental Health: A Review
No ratings yet
LLMs in Mental Health: A Review
40 pages
Batch 16 2nd
No ratings yet
Batch 16 2nd
43 pages
Early Detection of Mental Health Disorders Using Machine Learning Models Using Behavioral and Voice Data Analysis
No ratings yet
Early Detection of Mental Health Disorders Using Machine Learning Models Using Behavioral and Voice Data Analysis
19 pages
Detecting Mental Disorders in Social Media Through Emotional Patterns The Case of Depression
No ratings yet
Detecting Mental Disorders in Social Media Through Emotional Patterns The Case of Depression
39 pages
Combinando 4LLMS
No ratings yet
Combinando 4LLMS
5 pages
Shi - Etal - 2024 - Enhancing Depression Diagnosis With Chain of Thought Prompting
No ratings yet
Shi - Etal - 2024 - Enhancing Depression Diagnosis With Chain of Thought Prompting
6 pages
Deep Learning in Mental Health Outcome Research
No ratings yet
Deep Learning in Mental Health Outcome Research
26 pages
Second Review
No ratings yet
Second Review
20 pages
Deep Learning in Mental Health Outcome Research: A Scoping Review
No ratings yet
Deep Learning in Mental Health Outcome Research: A Scoping Review
26 pages
Diagnostic Reasoning Prompts Reveal The Potential For Large Language Model Interpretability in Medicine
No ratings yet
Diagnostic Reasoning Prompts Reveal The Potential For Large Language Model Interpretability in Medicine
5 pages
Depression Detection via CNN & GRU
No ratings yet
Depression Detection via CNN & GRU
12 pages
Towards Accurate Differential Diagnosis With Large Language Models
No ratings yet
Towards Accurate Differential Diagnosis With Large Language Models
17 pages
LLMs4Psych Arabic
No ratings yet
LLMs4Psych Arabic
35 pages
Rep 1
No ratings yet
Rep 1
10 pages
5.1. AI in Psychology Research
No ratings yet
5.1. AI in Psychology Research
20 pages
1 s2.0 S0167865523000430 Main
No ratings yet
1 s2.0 S0167865523000430 Main
8 pages
Predicting Stress, Anxiety, and Depression From Social Media Comments: A Holistic Multi-Modal Deep Learning and NLP Framework
No ratings yet
Predicting Stress, Anxiety, and Depression From Social Media Comments: A Holistic Multi-Modal Deep Learning and NLP Framework
6 pages
Mini
No ratings yet
Mini
30 pages
Mental Health Equity in LLMS: Leveraging Multi-Hop Question Answering To Detect Amplified and Silenced Perspectives
No ratings yet
Mental Health Equity in LLMS: Leveraging Multi-Hop Question Answering To Detect Amplified and Silenced Perspectives
19 pages
Viraj Research
No ratings yet
Viraj Research
6 pages
Large Language Models For Literature Reviews - An Exemplary Comparison of LLM-based Approaches With Manual Methods
No ratings yet
Large Language Models For Literature Reviews - An Exemplary Comparison of LLM-based Approaches With Manual Methods
7 pages
Conference
No ratings yet
Conference
7 pages
IJISAE 11 Mukesh+Tripathi 3 5268
No ratings yet
IJISAE 11 Mukesh+Tripathi 3 5268
7 pages
Sse 24 170-3
No ratings yet
Sse 24 170-3
12 pages
Large-Scale Digital Phenotyping Identifying Depression and Anxiety Indicators in A General UK Popula
No ratings yet
Large-Scale Digital Phenotyping Identifying Depression and Anxiety Indicators in A General UK Popula
22 pages
Machine Learning To Detect Anxiety Disorders From Error-Related Negativity and EEG Signals
No ratings yet
Machine Learning To Detect Anxiety Disorders From Error-Related Negativity and EEG Signals
15 pages
Supportive Psychotherapy On Insomnia Induced by COVID-19 Evaluation of Patients and Hospital Staff
No ratings yet
Supportive Psychotherapy On Insomnia Induced by COVID-19 Evaluation of Patients and Hospital Staff
36 pages
Measuring Anxiety Levels With Head Motion Patterns in Severe Depression Population
No ratings yet
Measuring Anxiety Levels With Head Motion Patterns in Severe Depression Population
10 pages
WorryWords Norms of Anxiety Association For Over 44k English Words
No ratings yet
WorryWords Norms of Anxiety Association For Over 44k English Words
18 pages
Self-Guided Virtual Reality Therapy For Anxiety A Systematic Review
No ratings yet
Self-Guided Virtual Reality Therapy For Anxiety A Systematic Review
40 pages
What Is Ego State Therapy and What Are Ego States
No ratings yet
What Is Ego State Therapy and What Are Ego States
5 pages
Internet Banking Account Statement
100% (1)
Internet Banking Account Statement
1 page
Allyson Resume
No ratings yet
Allyson Resume
2 pages
E-Personal Claim Submission Guide
No ratings yet
E-Personal Claim Submission Guide
20 pages
Need Scope of School Counseling
No ratings yet
Need Scope of School Counseling
24 pages
Marketing Mix Analysis for Enterprise Cars
No ratings yet
Marketing Mix Analysis for Enterprise Cars
4 pages
Biometric Access Control Systems for Sale
No ratings yet
Biometric Access Control Systems for Sale
3 pages
ITIL Framework for IT Service Management
100% (1)
ITIL Framework for IT Service Management
29 pages
SAP GoingLive Check For VAR Service Desk
No ratings yet
SAP GoingLive Check For VAR Service Desk
3 pages
Paper I
No ratings yet
Paper I
9 pages
Requirements For Tshwane University of Technology
0% (1)
Requirements For Tshwane University of Technology
7 pages
TI-Nspire Shortcuts and Tips: Handheld Computer
No ratings yet
TI-Nspire Shortcuts and Tips: Handheld Computer
3 pages
Financial Prompt Caching in RAG Apps
No ratings yet
Financial Prompt Caching in RAG Apps
18 pages
Grade 11 Term 1 Summative Assessment
No ratings yet
Grade 11 Term 1 Summative Assessment
2 pages
Lefdef WN
No ratings yet
Lefdef WN
16 pages
Annex-IV - List of Colouring Agents Allowed For Use in Cosmetic Products
No ratings yet
Annex-IV - List of Colouring Agents Allowed For Use in Cosmetic Products
8 pages
Out, Up, and Through - Fault Tolerant Tennis
No ratings yet
Out, Up, and Through - Fault Tolerant Tennis
7 pages
Shristi SAP Resume
No ratings yet
Shristi SAP Resume
2 pages
The Body in The Mind The Bodily Basis of
No ratings yet
The Body in The Mind The Bodily Basis of
5 pages
Henry Purcell S Dido and Aeneas Second Edition Ellen T. Harris No Waiting Time
100% (6)
Henry Purcell S Dido and Aeneas Second Edition Ellen T. Harris No Waiting Time
76 pages
Whirling of Shaft Demonstrator
No ratings yet
Whirling of Shaft Demonstrator
8 pages
RG-EW7200BE PRO Datasheet 20250122
No ratings yet
RG-EW7200BE PRO Datasheet 20250122
9 pages
Consumer Buying Behaviuor of Yamaha Bike
86% (7)
Consumer Buying Behaviuor of Yamaha Bike
112 pages
Industrial Butterfly Valve Specs
No ratings yet
Industrial Butterfly Valve Specs
4 pages
Calibrating A Trip Distribution Gravity Model Stratified by The Trip Purposes For The City of Alexandria
No ratings yet
Calibrating A Trip Distribution Gravity Model Stratified by The Trip Purposes For The City of Alexandria
13 pages
Gis Lab 4 (05220073)
No ratings yet
Gis Lab 4 (05220073)
18 pages
Ejemplos de Resume en Ingles
100% (2)
Ejemplos de Resume en Ingles
5 pages
Computer Files Extensions 2
No ratings yet
Computer Files Extensions 2
4 pages
Developing Fluent Readers - Reading Rockets
No ratings yet
Developing Fluent Readers - Reading Rockets
8 pages