Exploring Large-Scale Language Models To Evaluate EEG-Based Multimodal Data For Mental Health
Aaron J. Quigley
[email protected]
CSIRO’s Data61
Sydney, NSW, Australia
UbiComp Companion ’24, October 5–9, 2024, Melbourne, VIC, Australia Yongquan Hu et al.
single modality data such as Mental-LLM [44] and EEG-GPT [21], and the exploration of LLMs in evaluating multimodal sensing data for mental health remains limited. Moreover, existing multimodal LLMs have been developed primarily using audio and video modalities. They may lack the capability to handle other types of data, such as EEG and other physiological signals, which play a crucial role [12] in mental health assessment. Among various physiological signals, EEG is particularly valuable, providing high-frequency data that accurately assesses conditions such as depression, mood, and stress levels [5]. Therefore, how these LLMs process EEG data, and how to effectively combine EEG with existing modalities, remain open questions.

This paper introduces MultiEEG-GPT, a method for assessing mental health using multiple modalities centred on EEG, i.e., EEG combined with facial expression or audio. The latest GPT-4o API is adopted for processing the multimodal inputs to recognize health conditions. Unlike its predecessors such as GPT-4 and GPT-4V, which require separate interface calls, GPT-4o integrates multimodal data processing into a single interface, which simplifies the development of this method [30]. This work aims to understand the capabilities of multimodal LLMs in categorising various mental health conditions, to compare their ability to model different modalities alongside EEG, and to design optimal prompt engineering that facilitates reliable prediction.

1 https://openai.com/index/hello-GPT-4o/, accessed on June 11, 2024

The contributions of this paper include: i) a prompt engineering design using both zero-shot and few-shot approaches to examine the predictive capability of MultiEEG-GPT with multimodal inputs in recognizing different health conditions; ii) experiments across three different databases to validate the effectiveness of MultiEEG-GPT; and iii) an in-depth analysis of how multiple modalities enhance health condition predictions compared to single modalities. We aim to open up further developments, such as health-supportive social robots [4, 19, 23], within the context of ubiquitous computing, human-computer interaction, and affective computing.

2 Related Work

EEG-based physiological signal analysis has long been essential for monitoring mental health, evolving alongside AI advancements. Early work focused on traditional machine learning algorithms such as k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM) for EEG data; for example, Hou et al. demonstrated the potential of EEG for stress level recognition, with an accuracy of 67.07% [17]. Later, the field shifted towards integrating deep learning and multimodal data. Zhongjie et al. developed a fusion algorithm leveraging deep neural networks that combines Convolutional Neural Networks (CNNs) and Bidirectional Long Short-Term Memory (BiLSTM) networks for emotion classification, achieving impressive accuracies of 93.20±2.55% and 93.18±2.71% in valence and arousal classification, respectively [26].

Recently, the advent of general LLMs capable of processing multimodal data has further pivoted the focus towards using LLMs for evaluating mental health data, anticipating their role as future evaluation agents. For example, Xuhai et al. tested various LLMs, including GPT-3.5 and GPT-4, across multiple datasets using methods like zero-shot and few-shot prompting [44]. Jonathan et al. introduced EEG-GPT, using GPT models to classify and interpret EEG data [21]. However, these studies still focus on a single modality, such as text or EEG. Given that different modalities can provide rich and complementary information for inferring health conditions, automatic systems should also consider multiple modalities, especially EEG, in many mental health applications. However, research on LLMs for multimodal data with EEG is still limited for mental health prediction. Our proposed MultiEEG-GPT pioneers the examination of multimodal data including EEG to infer health conditions, aiming to bridge this gap by enhancing the processing of multimodal signals, with a particular focus on EEG data.

3 Methodology

3.1 Dataset Selection

Various mental health datasets exist, of which several contain an EEG modality. Applying the criterion that a dataset must contain at least the EEG modality, we selected the three most commonly used datasets. (1) MODMA [5], developed by Hanshu et al., is a multimodal dataset designed for analyzing depression disorders; it includes oral records (audio) of both patients and controls, and EEG data (convertible to images) from both groups. This dataset has binary labels indicating whether the participant was diagnosed with Major Depressive Disorder (MDD). (2) PME4 [7] is a multimodal emotion dataset featuring four modalities: audio, video (not publicly available), EEG, and electromyography (EMG). It was collected from 11 acting students (five female and six male) who provided informed consent, and focuses on identifying seven emotions: anger, fear, disgust, sadness, happiness, surprise, and a neutral state. (3) LUMED-2 [9] was collected by Loughborough University and Hacettepe University and was designed to analyze facial expression, EEG, and galvanic skin response (GSR) data to recognize and classify three categories of human emotion (neutral, happy, sad) under various stimuli, advancing the understanding in affective computing.

For MODMA and PME4, we used the audio and EEG modalities, while for LUMED-2, we used the facial expression and EEG modalities. We chose audio and facial expression because they are among the most prevalent modalities in mental health analysis [28, 37]. Besides, the focus of this paper is to explore the ability of GPT to analyze multimodal data, particularly with the important EEG modality [16]; thus, we did not include the remaining physiological modalities (e.g., GSR, respiration).

3.2 Prompt Design

For our MultiEEG-GPT method, we use prompt engineering strategies (including zero-shot prompting and few-shot prompting) for prediction tasks on multiple datasets. These prompts are model-agnostic; we present the details of the language models and settings employed in our experiments in the next section.

For the prompting strategies, we built upon the designs in [44] and [45]. We designed the prompt to account for different modalities and incorporated flexibility in altering the number of modalities for evaluation. Additionally, we verified and compared few-shot and zero-shot prompts for evaluation.
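As a concrete illustration of this modality-flexible design, the prompt assembly could be sketched as follows. This is a minimal sketch, not the authors' released code: the helper and field names are illustrative assumptions, while the template wording follows the {role-play prompt} + {task specification} + {rules} structure described in Table 1.

```python
# Sketch of the modality-flexible prompt assembly (Section 3.2 / Table 1).
# Helper and field names are illustrative assumptions; the template wording
# follows Table 1: {role-play prompt} + {task specification} + {rules}.

ROLE_PLAY = ("Imagine you are a mental health expert at analyzing "
             "the emotion and mental health status.")
RULES = "[Rule]: Do not output other text."

def build_prompt(modalities, symptom, labels, few_shot=()):
    """Assemble the zero-shot prompt; appending labelled `few_shot`
    samples after the same template yields the few-shot variant."""
    names = ", ".join(m["name"] for m in modalities)
    spec = [f"The below is {names} data."]
    for m in modalities:  # one collection/visualization sentence per modality
        spec.append(f"{m['name']} data is collected through {m['collected']} "
                    f"and visualized in {m['visualized']} form.")
    spec.append(f"Analyze the {symptom} status of the person.")
    spec.append(" ".join(f"{i} denotes {d}." for i, d in sorted(labels.items())))
    prompt = " ".join([ROLE_PLAY, *spec, RULES])
    for sample in few_shot:  # few-shot samples follow the zero-shot template
        prompt += f" Example: {sample['data']} Label: {sample['label']}."
    return prompt

print(build_prompt(
    modalities=[{"name": "EEG", "collected": "a 128-channel EEG cap",
                 "visualized": "topology map"},
                {"name": "audio", "collected": "interview recordings",
                 "visualized": "text feature"}],
    symptom="depression",
    labels={0: "healthy", 1: "MDD"}))
```

Adding or removing an entry in `modalities` changes the number of modalities described in the prompt, which matches the flexibility requirement stated above.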
Zero-shot prompting. As shown in Table 1, the zero-shot prompting strategy consists of a role-play prompt, a specific task description, and an additional rule that avoids unnecessary output and restricts the model to the current task. The role-play prompt informs the LLM of the general task, while the specific task description provides the information for the different modalities. This description also provides the flexibility to add or remove modalities. Therefore, the final prompt for the model consists of: {role-play prompt} + {task specification} + {rules}.

Few-shot prompting. The few-shot prompt appends the few-shot samples after the same zero-shot prompt template. Specifically, we include the task-specific prompt after the zero-shot prompt, but provide the correct class labels instead of offering different candidate class labels for prediction, similar to Xuhai et al.'s setting [44].

Table 1: The zero-shot and few-shot prompting strategies. <MOD1>, <MOD2> and <MOD3> are placeholders denoting three different modalities. XXX is the description of the collection and visualization process. <SYM> is a placeholder denoting the symptom to be diagnosed; for depression analysis, for example, <SYM> should be replaced with depression. The example is for mental health diagnosis with three classes. The label description "0 denotes XXX" of the classes could be added or removed to accommodate more or fewer classes.

Role-play prompt: Imagine you are a mental health expert at analyzing the emotion and mental health status.
Task specification: The below is <MOD1>, <MOD2> and <MOD3> data. <MOD1> data is collected through XXX and visualized in XXX form. <MOD2> data is collected through XXX and visualized in XXX form. <MOD3> data is collected through XXX and visualized in XXX form. Analyze the <SYM> status of the person. 0 denotes XXX, 1 denotes XXX, 2 denotes XXX.
Rules: [Rule]: Do not output other text.

4 Experiment

4.1 Settings

4.1.1 Dataset Settings. As all the datasets used the standard 10-20 electrode layout, we set the electrodes following this layout. The MNE library is used for processing the EEG signals. We processed the datasets using the raw data instead of their pre-processed data (e.g., for PME4), because the pre-processed data only contained features instead of the original signals, which made plotting topology maps infeasible. We used a bandpass filter (low-frequency cutoff 0.1 Hz, high-frequency cutoff 45 Hz, Hamming window) [34] with firwin window design. Afterward, the filtered data were re-referenced to an average reference [34]. Since the elicitation lengths differ between datasets, we chose 530s, 5s and 1.65-4.15s for the LUMED-2, PME4 and MODMA datasets, respectively, to account for randomly set elicitation times, with 10 equidistantly sampled timestamps to create topology maps. For the facial expression, we chose the middle frame of the video (e.g., if the video's length is 10s, we chose the frame at exactly the 5s timestamp) or the image. For the audio, because GPT-4o had not yet released audio input support, we used both the audio features and the transcribed text as inputs. For the audio features, we used the librosa library to extract features from the audio and represent them in text format (similar to EEG-GPT's representation [21]), including MFCCs, Mel spectrogram, chroma STFT, etc. For the text, we transcribed the audio using automatic speech recognition (ASR) systems. We chose the open-sourced vosk library with vosk-model-cn-0.15 (Chinese) or vosk-model-en-0.22 (English) according to the need. These models were the largest and most advanced ASR systems in the vosk library, which ensured the accuracy of recognition; they have also been used in health care tasks [13, 32].

2 https://community.openai.com/t/when-the-new-voice-model-for-chatgpt-4o-will-be-released/789928, accessed on June 16, 2024
3 https://alphacephei.com/vosk/, accessed on June 16, 2024

4.1.2 Model Settings. For all datasets and all tasks, we transformed the tasks into multi-class classification problems as in previous work [21, 44]. For MODMA, the binary class labels are 'MDD' or 'healthy'. For PME4, we followed the labels in the original dataset to classify the emotion into seven classes: anger, fear, disgust, sadness, happiness, surprise and neutral. For LUMED-2, we set the three class labels as in the original paper: neutral, happy and sad.

Previous work showed that GPT-4 generally performed better than GPT-3.5 [44]. Given that GPT-4o is the most recent GPT-4 series model that natively supports multimodal capabilities, we adopted GPT-4o as the tested LLM. Specifically, we used "GPT-4o-2024-05-13" as the targeted model through OpenAI Azure's API. For the few-shot experiment, we tested the 1-shot learning scenario to examine the capability of multimodal LLMs with limited information provided. In each repeated trial, we randomly selected one sample from the corresponding dataset to act as the 1-shot sample; this strategy mitigates sample-selection bias. For all zero-shot and few-shot experiments, we tested across each dataset (for the few-shot experiment, we excluded the selected sample) five times and reported the average accuracy and the standard deviation.

We use the image input interface of GPT-4o but no additional techniques (e.g., Chain-of-Thought [41]), so that this serves as a preliminary study in understanding how multimodal LLMs process multimodal information. This approach ensures the results reflect the basic capability of the models, consistent with previous work [21, 44].

4 https://openai.com/index/hello-GPT-4o/, accessed on June 11, 2024
5 https://azure.microsoft.com/en-us/products/ai-services/openai-service, accessed on June 11, 2024

4.2 Results and Discussions

4.2.1 Multimodal analysis. We show two examples of zero-shot cases using the LUMED-2 and PME4 datasets in Figure 1. The first person, in the LUMED-2 video, is in a neutral mood. MultiEEG-GPT aims to recognize the participant's mental state through the facial expression and the EEG topology map. As seen in Figure 1(a), MultiEEG-GPT first processed the image, then analyzed the EEG topology map, and subsequently aggregated the results from both modalities.
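Such per-modality analysis can be issued in a single request because GPT-4o accepts mixed text-and-image content in one message. A minimal sketch of building such a request is shown below; it is a sketch under assumptions (the OpenAI chat-completions image-message format, the model name from Section 4.1.2), and the actual network call is commented out since it needs credentials.

```python
import base64

def build_multimodal_messages(prompt_text, image_paths):
    """Pack the text prompt plus images (e.g., an EEG topology map and a
    facial-expression frame) into one chat message, using the OpenAI
    chat-completions base64 image format."""
    content = [{"type": "text", "text": prompt_text}]
    for path in image_paths:
        with open(path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return [{"role": "user", "content": content}]

# Example call (requires credentials; the model name follows Section 4.1.2):
# client = openai.OpenAI(api_key="...")  # or the Azure OpenAI client
# reply = client.chat.completions.create(
#     model="gpt-4o-2024-05-13",
#     messages=build_multimodal_messages(prompt, ["eeg_topomap.png",
#                                                 "face_frame.png"]))
```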
Jointly considering the image and the EEG, MultiEEG-GPT predicted the participant's emotional state as neutral.

For Figure 1(b), the participant is in a sad mood. MultiEEG-GPT first analyzed the person's audio features, and then analyzed the EEG features in the topology map through the colors of the map. It finally combined the different features and predicted that the participant is in a sad mood. These cases show the capability of MultiEEG-GPT to (1) analyze each modality separately and (2) aggregate the outputs of the different modalities jointly. It is also evident that a single modality is not adequate to identify the mood correctly. For example, MultiEEG-GPT identified the status of Figure 1(b) as "an emotional reaction", but it could not accurately determine from the EEG features alone that the participant was sad. By combining the audio features and the EEG features, MultiEEG-GPT achieved the accurate prediction.

Figure 1: Case analysis for the LUMED-2 and PME4 datasets (the person's face has been blurred for ethical reasons). Figure (a) illustrates one subject's input EEG topology map and facial expression, as well as the prediction result and the text explanation, from the LUMED-2 dataset. Figure (b) illustrates one subject's input EEG topology map, audio features, and input audio transcription "The sky is green.", as well as the prediction result and the explanation, from the PME4 dataset. In both cases, the model makes accurate predictions when processing both modalities.

4.2.2 Performance of MultiEEG-GPT. Table 2 presents the zero-shot and few-shot prompting performance for all three databases. The modalities used for MultiEEG-GPT depend on their availability in the datasets. For zero-shot prompting, our proposed model, utilizing both modalities (either EEG + facial expression or EEG + audio), achieved the best performance compared to models using a single modality. The proposed model demonstrated relative improvements of 4.19%, 7.52% and 7.67% over the best single-modality performance for the three databases, respectively. This also highlights the importance of including EEG data in addition to the modalities commonly used with LLMs, such as audio and video. It should be noted that the cases with all modalities removed (the first row) used majority voting similar to Xuhai et al.'s setting [44], serving as the baseline for model performance.

Table 2: Ablation experiment on three different multimodal data streams (EEG image, facial expression, audio). The row with no EEG image, facial expression, or audio was determined through majority voting. For few-shot prompting, we chose M=1, meaning one few-shot sample was added to the prompt.

For the few-shot prompting, we observed a similar trend, with multimodal models outperforming single-modality models. Additionally, 1-shot prompting achieved higher performance than zero-shot prompting, with relative improvements of 5.45%, 8.43% and 6.60% over zero-shot prompting for the MODMA, PME4 and LUMED-2 databases, respectively. This suggests that additional examples enhance recognition, consistent with previous findings [27, 38]. The extra example likely serves as a benchmark for feature comparison, allowing LLMs to assess the user's mental health status more effectively by comparing the features of the few-shot and test samples. The results indicate the general benefit of an additional example, as no specific sample was intentionally selected in the 1-shot prompting setting. In summary, LLMs leveraging multiple modalities including EEG can significantly benefit depression and emotion recognition.

5 Conclusion and Future Work

This paper proposes MultiEEG-GPT to explore multimodal data, specifically with EEG, for mental health recognition. We designed zero-shot and few-shot prompting strategies to enhance prediction accuracy, leveraging the most recent GPT-4o as the base LLM. Three datasets, MODMA, PME4 and LUMED-2, were adopted for evaluation. Our study showed that predictions using multimodal data significantly outperform those using single-modal data. While the current prediction accuracy approaches that of traditional machine learning methods even without tuning the LLMs, there is significant potential for improvement with strategies such as instruction fine-tuning or multi-strategy hierarchical prediction in future research on mental health with multimodal LLMs.

Moreover, the use of LLMs as health agents raises important ethical considerations. LLMs may exhibit value alignment problems, leading to racial and gender disparities [46] or producing outcomes misaligned with health assessment standards [20]. LLMs also pose privacy risks [3, 33] due to data memorization and extraction [6], and fine-tuning with mental health data can lead to data leakage. These issues necessitate careful attention to ensure ethical compliance and accuracy. For example, input data should be anonymized beforehand.
Anonymization can build on recent LLM-based anonymization techniques [36], and un-learning and alignment should be integrated into the training process to protect privacy and avoid harm [22].

References

[1] Rohizah Abd Rahman, Khairuddin Omar, Shahrul Azman Mohd Noah, Mohd Shahrul Nizam Mohd Danuri, and Mohammed Ali Al-Garadi. 2020. Application of machine learning methods in mental health detection: a systematic review. IEEE Access 8 (2020), 183952–183964.
[2] Usman Arshad, Cecilia Mascolo, and Marcus Mellor. 2003. Exploiting mobile computing in health-care. In Proceedings of demo session of the 3rd international workshop on smart appliances, ICDCS03. Citeseer.
[3] Hannah Brown, Katherine Lee, Fatemehsadat Mireshghallah, Reza Shokri, and Florian Tramèr. 2022. What does it mean for a language model to preserve privacy?. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 2280–2292.
[4] Johana Cabrera, M Soledad Loyola, Irene Magaña, and Rodrigo Rojas. 2023. Ethical dilemmas, mental health, artificial intelligence, and LLM-based chatbots. In International Work-Conference on Bioinformatics and Biomedical Engineering. Springer, 313–326.
[5] Hanshu Cai, Zhenqin Yuan, Yiwen Gao, Shuting Sun, Na Li, Fuze Tian, Han Xiao, Jianxiu Li, Zhengwu Yang, Xiaowei Li, et al. 2022. A multi-modal open dataset for mental-disorder analysis. Scientific Data 9, 1 (2022), 178.
[6] Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, et al. 2021. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21). 2633–2650.
[7] Jin Chen, Tony Ro, and Zhigang Zhu. 2022. Emotion recognition with audio, video, EEG, and EMG: a dataset and baseline approaches. IEEE Access 10 (2022), 13229–13242.
[8] Dan Chisholm, Kim Sweeny, Peter Sheehan, Bruce Rasmussen, Filip Smit, Pim Cuijpers, and Shekhar Saxena. 2016. Scaling-up treatment of depression and anxiety: a global return on investment analysis. The Lancet Psychiatry 3, 5 (2016), 415–424.
[9] Yucel Cimtay, Erhan Ekmekcioglu, and Seyma Caglar-Ozhan. 2020. Cross-subject multimodal emotion recognition based on hybrid fusion. IEEE Access 8 (2020), 168865–168878.
[10] Ting Dang, Dimitris Spathis, Abhirup Ghosh, and Cecilia Mascolo. 2023. Human-centred artificial intelligence for mobile health sensing: challenges and opportunities. Royal Society Open Science 10, 11 (2023), 230806.
[11] Nan Gao, Soundariya Ananthan, Chun Yu, Yuntao Wang, and Flora D Salim. 2023. Critiquing Self-report Practices for Human Mental and Wellbeing Computing at Ubicomp. arXiv preprint arXiv:2311.15496 (2023).
[12] Ela Gore and Sheetal Rathi. 2019. Surveying machine learning algorithms on EEG signals data for mental health assessment. In 2019 IEEE Pune Section International Conference (PuneCon). IEEE, 1–6.
[13] Lukas Grasse, Sylvain J Boutros, and Matthew S Tata. 2021. Speech interaction to control a hands-free delivery robot for high-risk health care scenarios. Frontiers in Robotics and AI 8 (2021), 612750.
[14] Alberto Greco, Gaetano Valenza, and Enzo Pasquale Scilingo. 2016. Advances in Electrodermal Activity Processing with Applications for Mental Health. Springer.
[15] Unsoo Ha, Yongsu Lee, Hyunki Kim, Taehwan Roh, Joonsung Bae, Changhyeon Kim, and Hoi-Jun Yoo. 2015. A wearable EEG-HEG-HRV multimodal system with simultaneous monitoring of tES for mental health management. IEEE Transactions on Biomedical Circuits and Systems 9, 6 (2015), 758–766.
[16] Blake Anthony Hickey, Taryn Chalmers, Phillip Newton, Chin-Teng Lin, David Sibbritt, Craig S McLachlan, Roderick Clifton-Bligh, John Morley, and Sara Lal. 2021. Smart devices and wearable technologies to detect and monitor mental health conditions and stress: A systematic review. Sensors 21, 10 (2021), 3461.
[17] Xiyuan Hou, Yisi Liu, Olga Sourina, Yun Rui Eileen Tan, Lipo Wang, and Wolfgang Mueller-Wittig. 2015. EEG based stress monitoring. In 2015 IEEE International Conference on Systems, Man, and Cybernetics. IEEE, 3110–3115.
[18] Xiaozhu Hu, Yanwen Huang, Bo Liu, Ruolan Wu, Yongquan Hu, Aaron J Quigley, Mingming Fan, Chun Yu, and Yuanchun Shi. 2023. SmartRecorder: An IMU-based Video Tutorial Creation by Demonstration System for Smartphone Interaction Tasks. In Proceedings of the 28th International Conference on Intelligent User Interfaces. 278–293.
[19] Yongquan Hu, Hui-Shyong Yeo, Mingyue Yuan, Haoran Fan, Don Samitha Elvitigala, Wen Hu, and Aaron Quigley. 2023. MicroCam: Leveraging smartphone microscope camera for context-aware contact surface sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 3 (2023), 1–28.
[20] Inthrani Raja Indran, Priya Paranthaman, Neelima Gupta, and Nurulhuda Mustafa. 2024. Twelve tips to leverage AI for efficient and effective medical question generation: a guide for educators using ChatGPT. Medical Teacher (2024), 1–6.
[21] Jonathan W Kim, Ahmed Alaa, and Danilo Bernardo. 2024. EEG-GPT: Exploring Capabilities of Large Language Models for EEG Classification and Interpretation. arXiv preprint arXiv:2401.18006 (2024).
[22] Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, and Scott A Hale. 2024. The benefits, risks and bounds of personalizing the alignment of large language models to individuals. Nature Machine Intelligence (2024), 1–10.
[23] Tin Lai, Yukun Shi, Zicong Du, Jiajie Wu, Ken Fu, Yichao Dou, and Ziqi Wang. 2023. Psy-LLM: Scaling up global mental health psychological services with AI-based large language models. arXiv preprint arXiv:2307.11991 (2023).
[24] Bishal Lamichhane. 2023. Evaluation of ChatGPT for NLP-based mental health applications. arXiv preprint arXiv:2303.15727 (2023).
[25] Jiahao Nick Li, Yan Xu, Tovi Grossman, Stephanie Santosa, and Michelle Li. 2024. OmniActions: Predicting Digital Actions in Response to Real-World Multimodal Sensory Inputs with LLMs. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 1–22.
[26] Zhongjie Li, Gaoyan Zhang, Jianwu Dang, Longbiao Wang, and Jianguo Wei. 2021. Multi-modal emotion recognition based on deep learning of EEG and audio signals. In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–6.
[27] Liangliang Liu, Zhihong Liu, Jing Chang, and Xue Xu. 2024. A multi-modal extraction integrated model for neuropsychiatric disorders classification. Pattern Recognition (2024), 110646.
[28] Daniel M Low, Kate H Bentley, and Satrajit S Ghosh. 2020. Automated assessment of psychiatric disorders using speech: A systematic review. Laryngoscope Investigative Otolaryngology 5, 1 (2020), 96–116.
[29] Lakmal Meegahapola, William Droz, Peter Kun, Amalia De Götzen, Chaitanya Nutakki, Shyam Diwakar, Salvador Ruiz Correa, Donglei Song, Hao Xu, Miriam Bidoglia, et al. 2023. Generalization and personalization of mobile sensing-based mood inference models: an analysis of college students in eight countries. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 4 (2023), 1–32.
[30] OpenAI. 2024. Hello GPT-4o. https://openai.com/index/hello-gpt-4o/ [Accessed: June 2024].
[31] World Health Organization et al. 2022. World mental health report: Transforming mental health for all. (2022).
[32] Tiago F Pereira, Arthur Matta, Carlos M Mayea, Frederico Pereira, Nelson Monroy, João Jorge, Tiago Rosa, Carlos E Salgado, Ana Lima, Ricardo J Machado, et al. 2022. A web-based Voice Interaction framework proposal for enhancing Information Systems user experience. Procedia Computer Science 196 (2022), 235–244.
[33] Charith Peris, Christophe Dupuy, Jimit Majmudar, Rahil Parikh, Sami Smaili, Richard Zemel, and Rahul Gupta. 2023. Privacy in the time of language models. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining. 1291–1292.
[34] Kerstin Pieper, Robert P Spang, Pablo Prietz, Sebastian Möller, Erkki Paajanen, Markus Vaalgamaa, and Jan-Niklas Voigt-Antons. 2021. Working with environmental noise and noise-cancelation: a workload assessment with EEG and subjective measures. Frontiers in Neuroscience 15 (2021), 771533.
[35] Dimitris Spathis, Sandra Servia-Rodriguez, Katayoun Farrahi, Cecilia Mascolo, and Jason Rentfrow. 2019. Passive mobile sensing and psychological traits for large scale mood prediction. In Proceedings of the 13th EAI International Conference on Pervasive Computing Technologies for Healthcare. 272–281.
[36] Robin Staab, Mark Vero, Mislav Balunovic, and Martin Vechev. 2024. Large Language Models are Anonymizers. In ICLR 2024 Workshop on Reliable and Responsible Foundation Models.
[37] Chang Su, Zhenxing Xu, Jyotishman Pathak, and Fei Wang. 2020. Deep learning in mental health outcome research: a scoping review. Translational Psychiatry 10, 1 (2020), 116.
[38] Hao Sun, Jiaqing Liu, Shurong Chai, Zhaolin Qiu, Lanfen Lin, Xinyin Huang, and Yenwei Chen. 2021. Multi-Modal Adaptive Fusion Transformer Network for the Estimation of Depression Level. Sensors 21, 14 (2021). https://doi.org/10.3390/s21144764
[39] Teo Susnjak, Peter Hwang, Napoleon H Reyes, Andre LC Barczak, Timothy R McIntosh, and Surangika Ranathunga. 2024. Automating research synthesis with domain-specific large language model fine-tuning. arXiv preprint arXiv:2404.08680 (2024).
[40] Zhiyuan Wang, Maria A Larrazabal, Mark Rucker, Emma R Toner, Katharine E Daniel, Shashwat Kumar, Mehdi Boukhechba, Bethany A Teachman, and Laura E Barnes. 2023. Detecting social contexts from mobile sensing indicators in virtual interactions with socially anxious individuals. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 3 (2023), 1–26.
[41] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.
[42] Ruolan Wu, Chun Yu, Xiaole Pan, Yujia Liu, Ningning Zhang, Yue Fu, Yuhan Wang, Zhi Zheng, Li Chen, Qiaolei Jiang, et al. 2024. MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 1–24.
[43] Xuhai Xu, Xin Liu, Han Zhang, Weichen Wang, Subigya Nepal, Yasaman Sefidgar, Woosuk Seo, Kevin S Kuehn, Jeremy F Huckins, Margaret E Morris, et al. 2023. GLOBEM: cross-dataset generalization of longitudinal human behavior modeling. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 4 (2023), 1–34.
[44] Xuhai Xu, Bingsheng Yao, Yuanzhe Dong, Saadia Gabriel, Hong Yu, James Hendler, Marzyeh Ghassemi, Anind K Dey, and Dakuo Wang. 2024. Mental-LLM: Leveraging large language models for mental health prediction via online text data. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 1 (2024), 1–32.
[45] Hao Xue and Flora D Salim. 2023. PromptCast: A new prompt-based learning paradigm for time series forecasting. IEEE Transactions on Knowledge and Data Engineering (2023).
[46] Travis Zack, Eric Lehman, Mirac Suzgun, Jorge A Rodriguez, Leo Anthony Celi, Judy Gichoya, Dan Jurafsky, Peter Szolovits, David W Bates, Raja-Elie E Abdulnour, et al. 2024. Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: a model evaluation study. The Lancet Digital Health 6, 1 (2024), e12–e22.