Paper - Psychological Stress Using Social Media
Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract
A body of literature has demonstrated that users’ mental health conditions, such as depression and anxiety, can be predicted from their social media language. There is still a gap in the scientific understanding of how psychological stress is expressed on social media. Stress is one of the primary underlying causes and correlates of chronic physical illnesses and mental health conditions. In this paper, we explore the language of psychological stress with a dataset of 601 social media users, who answered the Perceived Stress Scale questionnaire and also consented to share their Facebook and Twitter data. Firstly, we find that stressed users post about exhaustion, losing control, increased self-focus and physical pain as compared to posts about breakfast, family-time, and travel by users who are not stressed. Secondly, we find that Facebook language is more predictive of stress than Twitter language. Thirdly, we demonstrate how the language based models thus developed can be adapted and be scaled to measure county-level trends. Since county-level language is easily available on Twitter using the Streaming API, we explore multiple domain adaptation algorithms to adapt user-level Facebook models to Twitter language. We find that domain-adapted and scaled social media-based measurements of stress outperform sociodemographic variables (age, gender, race, education, and income), against ground-truth survey-based stress measurements, both at the user- and the county-level in the U.S. Twitter language that scores higher in stress is also predictive of poorer health, less access to facilities and lower socioeconomic status in counties. We conclude with a discussion of the implications of using social media as a new tool for monitoring stress levels of both individuals and counties.

Figure 1: Overview of the approach taken to build language based stress prediction model.

Introduction
Stress is defined as perceived distress caused by an interaction between a person and their environment (Cohen, Miller, and Rabin 2001). While people can handle stress better or worse depending on their general coping skills, experiencing stress too frequently is known to affect well-being and physical and mental health negatively. Stress is seen to be a single pervasive trait influencing health, through a broad range of negative affective states and somatic pathways (McEwen and Stellar 1993). Considering that the symptoms associated with depression and other severe mental health conditions are very severe, alleviating psychological stress and promoting a healthy lifestyle is more efficient compared to treating a more acute and chronic condition (Wilkinson 2005).

People are increasingly using social media platforms in order to inform others about their mental states, solicit social support, as well as keep records of their daily activities, preferences, and interests. Notwithstanding the challenge of working with a non-random, non-representative sample of social media users, studies have identified the markers of self-disclosure which concern depression (Guntuku et al. 2017c), schizophrenia (Ernala et al. 2017), ADHD (Guntuku et al. 2017b), alcohol consumption (Liu, Weitzman, and Chunara 2017), and personality (Guntuku et al. 2017a). With respect to stress, the linguistic features of event-related stress have been predicted from social media posts about experiences such as travel and work (Lin et al. 2014); however, these findings cannot be applied to improve the psychological understanding of stress, because people suffering from chronic stress do so irrespective of stressful events. For instance, preparing for an exam is a stressful event, while chronically feeling overwhelmed with responsibilities is trait-related stress. Another research gap is that the previous work has focused on known stressors collected using search keywords (Thelwall 2017). However the labels thus acquired likely have personality confounds (Preotiuc-Pietro et al. 2015), emphasizing the need for using stronger ground truth. Instead, we anticipate that insights into psychological stress could help in (a) designing social-media-based interventions to enable a low-stress lifestyle, and (b) developing a better understanding of regional variations in stress.
The ubiquitous nature of smart devices and Internet access in almost all parts of the world means that social media is a potentially powerful tool to measure the psychological states and behaviors of people at both micro- (individual) and macro- (county) levels. However, except for a few studies, little has been done to explore how to scale language models to study regions, and no work yet has attempted to do this for stress. Although some studies have analyzed the geographic variation in social media language corresponding to chronic illnesses (Culotta 2014), depression (Bagroy, Kumaraguru, and De Choudhury 2017), well-being (Schwartz et al. 2013a), heart-disease (Eichstaedt et al. 2015), and happiness (Quercia et al. 2012), they are often challenged by (a) a limited understanding of how to scale user-level models to measure counties, and (b) the lack of ground truth about a large population. Recent work has shown the need to use weighting and scaling techniques to transform user-level language estimates from Twitter to county-level estimates (Rieman et al. 2017), in order to avoid the ecological fallacies reported in the studies mentioned above. However, there are challenges associated with transferring predictive models from one social media platform to another, because of differences in self-disclosure on Facebook vs. Twitter (Jaidka et al. 2018). This study contributes (a) effective off-the-shelf methods for cross-domain adaptation of user-level stress models to measure county-level stress, and (b) validation against ground-truth survey data built on over two million responses.

To summarize, the research gaps in previous work are the lack of language models to predict psychological stress, a limited understanding of scaling and transforming Facebook language models to work on Twitter, and the lack of validation against the region-level ground truth. We show in Figure 1 how our study uses transfer learning to adapt user-level models trained on users' Facebook language to predict county-level stress from county-level Twitter language. This is necessary because it is easier to train predictive models on the language of a small population of social media users, but it is expensive to survey entire counties for training purposes. Furthermore, county-level social media language is only available for Twitter, where approximately 20% of all public posts are geo-tagged with their location information, and they can be easily mined by using Twitter’s Streaming API¹. In contrast, Facebook requires user authentication to access posts, which makes them resource intensive to collect from a large number of people for this research study. Adapting models trained at user-level to predict stress in counties has multiple applications in monitoring health and well-being in counties, especially where survey data is hard to collect.

Thus motivated, in this paper we address the following research questions:

RQ1 How does social media language of users who are stressed differ from those who are not?

RQ2 How do Facebook and Twitter language differ in predicting user-level stress? Since county-level language is easily available on Twitter, can off-the-shelf domain adaptation algorithms be used to improve prediction on Twitter language?

RQ3 How do domain-adapted social media-based measurements of stress at the county level correlate with health behaviors and socioeconomic characteristics at the county level?

RQ1: Differential Language Analysis of Stressed Users
Methods
User-level social media data: We deployed a survey on Qualtrics² (a platform similar to Amazon Mechanical Turk), comprising several demographic questions (age, gender, race, education, and income) and the Cohen’s 10-item Stress scale (Cohen, Kessler, and Gordon 1997) (Table 1), and invited users to share access to their Facebook status updates and/or Twitter usernames. Users received an incentive for their participation, and we obtained their informed consent to access their Facebook and Twitter posts. All users were based in the US. This study received approval from the IRB of our institution.

Table 1: Items on the Cohen’s stress scale: Each question is assessed on a Likert Scale. (-) indicates reverse coded items.
In the last month, how often have you:
- been upset because of something that happened unexpectedly?
- felt that you were unable to control the important things in your life?
- felt nervous and “stressed”?
- felt confident about your ability to handle your personal problems? (-)
- felt that things were going your way? (-)
- found that you could not cope with all the things that you had to do?
- been able to control irritations in your life? (-)
- felt that you were on top of things? (-)
- been angered because of things that were outside of your control?
- felt difficulties were piling up so high that you could not overcome them?

Out of all users who took the survey, 601 users completed the survey and had active accounts with more than 900 words on both Facebook and Twitter. We collected their Facebook posts by using the Facebook Graph API and downloaded their Twitter posts using the Twitter API. Of these 601 users, 265 self-identified as female. The mean age of the sample was 38. The stress scores range from 6 to 39 (mean 30). Each item in the scale is scored from 0 to 4, so the ten items sum to an absolute maximum of 40.

Features: We process all our social media posts using the HappierFunTokenizer available with the DLATK package (Schwartz et al. 2017), which is emoticon- and social media-aware. We then represent the language of each user and county as a set of features. In the dictionary-based method, we transform social media language into numerical features representing percentage proportions of lexical categories in an existing dictionary. In the data-driven method, we transform language into numerical features which represent the proportions of word clusters which are statistically similar according to their frequency distributions.

¹ https://developer.twitter.com/en
² www.qualtrics.com/Survey-Software

LIWC: We use Linguistic Inquiry and Word Count (LIWC) (Pennebaker, Booth, and Francis 2007), a dictionary
comprising 73 different psycholinguistic categories (e.g., topical categories, emotions, parts-of-speech) to represent the language of each user and county as normalized frequency distributions.

Topics: In the data-driven method, we represent the language of each user and county as normalized frequency distributions for a set of topics derived using Latent Dirichlet Allocation. These topics are an open-source resource available through the DLATK package (Schwartz et al. 2017) and were trained on a corpus of over 20 million Facebook statuses (Schwartz et al. 2013b).

TensiStrength: To detect direct and indirect expressions of stress or relaxation, a stress lexicon (Thelwall 2017) is used. We obtain stress scores at the sentence level from each post on Facebook and Twitter, and aggregate them to users by calculating mean stress scores.

User engagement: We extracted features such as the number of posts made between 12am-6am, mean message length, and number of URLs & hashtags. These features were shown to be predictive of stress by Lin et al. (2014).

Results
We identify the linguistic characteristics indicative of high and low psychological stress based on the social media language of individual users on Facebook. We conducted a similar analysis on Twitter and present the results in the appendix. Since we explore several features simultaneously, we consider a correlation significant only if its two-tailed p-value is below a Bonferroni-corrected threshold of 0.01 (i.e., when examining 1000 features, in the case of words and phrases, a passing p-value is less than $1 \times 10^{-5}$; when examining 2000 topics, less than $5 \times 10^{-6}$; and when examining 73 LIWC categories, less than $1.3 \times 10^{-4}$).

Ngrams: We extract 1-, 2-, and 3-grams from all posts to analyze significant associations between words & phrases and stress. Figure 2 provides a visualization of the Pearson correlations of words and phrases with stress on Facebook. In general, the language of stress has a prominent self-focus. Given that psychologists have found stress to be a state resulting from an individual assessing the demands of the stressful stimulus and determining own coping abilities (Staal 2004), this result is very intuitive. It should be noted that increased self-focus in stressful situations is likely adaptive, but a prolonged self-focus in one’s thoughts, especially in the context of negative affect, is known to psychologists as rumination and linked to negative effects for health and well-being (Moberly and Watkins 2008). Hate could signify aggressive or angry affect, which also tends to be maladaptive and could signify the experience of frustration while perceiving resources fall short of perceived demands.

Figure 2: Words and phrases associated with a) high-stress users (red) and b) low-stress users (blue). The size of the word indicates the correlation strength and the color indicates frequency (darker is more frequent). Correlations are controlled for age and gender, and are significant at p < .01, two-tailed t-test Bonferroni p-correction.

Words and phrases such as “me”, “I had”, “feel like”, “I don’t” and “I hate” are significantly correlated with high stress. Furthermore, the language of high stress appears to be marked by expressions of perceived lack of control and expressions of a need state or lack of resources (“struggling with”, “tired of”, “I need”), as well as negative-angry affect (“hate”, “I hate”). Further, high stress language seems to be comorbid with mental health conditions (“depression”, “depressed”, “anxiety”, “bipolar”). It is interesting that language reflects the adverse effects that stress can have on health (Watson and Pennebaker 1989). The language of low stress has prominent positive affect (“excited to”), discussions of meals (“breakfast”), as well as feelings of social inclusion (“joined the”). The discussion of meals, specifically “breakfast”, may indicate relaxation and enjoyment inasmuch as meals are often taken in the company of others.

LIWC: LIWC categories from Facebook language that significantly correlate with stress are shown in Table 2. The top correlated categories in Facebook comprise first person singular pronouns (“I”), indicating increased self-focus, which corroborates previous findings (Pennebaker and Lay 2002) that self-references by individuals increase in emotionally vulnerable situations. Users on Facebook are more likely to use adverbs such as “very” and “really” to emphasize their point, and more likely to explicitly use words denoting negative emotions, such as “hurt” and “anger”. Mentions of negative emotion confirm our expectations, because stress is an aversive state (Cohen, Kessler, and Gordon 1997). Filler
word associations may indicate a lack of self-esteem or feeling down on oneself, and hedging as a result (Abouserie 1994). Negation further suggests a ‘lack of’ things, mentioned and experienced by those who are also prone to high stress. On the other hand, the first person plural pronouns (“We”), and the LIWC “Affiliation” categories are negatively correlated with stress, which implies that a high-stress individual often depicts themselves as isolated, with a certain disassociation from their social circles. Negative correlations of stress with positive emotion reflect that those individuals reporting lower stress are significantly more likely to express positive emotion.

Table 2: Pearson correlations (top 5) between stress score and LIWC features extracted on Facebook, when controlling for age and gender. All features are significant at p < .01, two-tailed t-test and Bonferroni corrected.

Positively correlated    Most frequent words     r
1st Person Singular      I                       .22
Adverbs                  very, really            .17
Negation                 no, not, never          .16
Negative Emotion         hurt, sadness, anger    .17
Fillers                  I mean, you know        .15

Negatively correlated    Most frequent words     r
1st Person Plural        we                      -.18
Affiliation              friend, social          -.17
Positive Emotion         love, nice, sweet       -.12

Table 3: Pearson correlations between stress score and Topics extracted from Facebook, when controlling for age and gender. Topic labels are manually created. All features are significant at p < .01, two tailed t-test and Bonferroni corrected. The top 5 topics are shown for both positive and negative correlations.

Positively correlated    r      Most frequent words used by high-stress people
Exhaustion               .21    i’m, tired, hungry, bored, exhausted, freaking, sleepy, sooooooooo, stressed, grumpy
Feeling hurt             .21    i’m, sick, tired, feeling, hearing, tire, fed, bullshit, assuming, hurting, numb
Physical Pain            .20    don’t, feel, anymore, beg, begging, honestly, knees, clue, hollow, creeping, sympathy
Feeling sick             .19    feel, sick, crap, feeling, ugh, hate, feels, sucks, crappy, bleh, worse, miserable, sickness, icky, =(
Losing control           .19    i’m, it’s, i’ve, don’t, lost, mind, quarter, wouldn’t, anymore, control, reason

Negatively correlated    r      Most frequent words used by low-stress people
Family and eating        .13    great, lunch, nice, dinner, family, enjoyed, church, wonderful, afternoon, sunday, kids, evening, shopping, meeting, hubby
Travel                   .12    kings, delhi, leon, mumbai, rocks, queens, reached, rains, phew, bak, travelling, royal, metro, indians, mahal
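The correlations in Tables 2 and 3 are reported after controlling for age and gender and after Bonferroni correction. The text does not spell out the exact computation, so the following is only a minimal sketch under assumptions: it uses the textbook recursive formula for partial correlation, and the function names and the toy vectors (including the ±1 coding of `gender`) are hypothetical illustrations rather than the authors' pipeline.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def partial_r_two(x, y, z1, z2):
    """Second-order partial correlation: control for z1, then for z2."""
    rxy = partial_r(x, y, z1)
    rx2 = partial_r(x, z2, z1)
    ry2 = partial_r(y, z2, z1)
    return (rxy - rx2 * ry2) / math.sqrt((1 - rx2 ** 2) * (1 - ry2 ** 2))

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff for a family-wise alpha."""
    return alpha / n_tests

# Toy data (hypothetical): a language feature and a stress score that are
# related only through the shared covariates age and gender.
age     = [1, 1, 1, 1, -1, -1, -1, -1]
gender  = [1, 1, -1, -1, -1, -1, 1, 1]
noise_a = [1, 1, -1, -1, 1, 1, -1, -1]
noise_b = [1, -1, 1, -1, 1, -1, 1, -1]
feature = [a + g + e for a, g, e in zip(age, gender, noise_a)]
stress  = [a + g + e for a, g, e in zip(age, gender, noise_b)]

print("raw r:", pearson(feature, stress))
print("r controlling for age and gender:", partial_r_two(feature, stress, age, gender))
print("cutoff for 73 LIWC categories:", bonferroni_threshold(0.01, 73))
```

In this toy case the raw correlation is clearly positive, while the controlled correlation is numerically zero, which is the situation that controlling for age and gender is meant to guard against.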
Topics: Topics which significantly correlate with stress are provided in Table 3. Exhaustion is typical of prolonged stress (McManus, Winder, and Gordon 2002). Feeling hurt, physical pain, and feeling sick are known correlates of stress (Gil et al. 2004). Lack of control also signifies the concept of resources falling short of demands, and possibly reducing their ability to cope with stressors (Gray, Waytz, and Young 2012).

Our findings also reiterate previous research on language use in mental health (De Choudhury et al. 2013), as they describe symptoms of physical pain and sickness, besides expressing a lack of control and negative emotions. On the other hand, the mention of family meals on Facebook has a negative correlation with stress, suggesting a good social support network and taking time to spend with loved ones as well as travel, are ways people relax.

RQ2: Predicting User-Level Psychological Stress using Facebook and Twitter
Methods
We utilize the Facebook and Twitter data from the same users, described in the previous section, to evaluate the performance of supervised models trained on Facebook and Twitter language at predicting users’ stress for a held-out set. We stratify our set of users into five folds with a uniform distribution of age and gender traits in each fold. We conduct a cross-validated weighted linear regression for stress, training on (a) LIWC features and (b) Topic features (c) TensiStrength scores, and (d) engagement features (such as time of posts, number of posts, number of posts between 12am-6am) for users in four folds, and testing on the users in the held out fold.

In the five-fold cross-validation setting, we perform linear regression with several regularization methods such as ridge, elastic-net, LASSO and L2 penalized SVMs and find that elastic-net showed marginally superior performance over the others. Accordingly, we report results only using elastic-net. The performance was measured by calculating Pearson’s r over the aggregated predictions from the five folds.

We first evaluate the above features to predict stress within domain - i.e., we train and test on the same platform. Engagement features are part of the feature set used in (Lin et al. 2014) for predicting user-level stress. We also examine how the models perform when compared to sociodemographic variables, namely age, gender, race, income, and education.

Then, we evaluate how models trained on Facebook perform at predicting stress from Twitter language in a cross-domain setting. Previous studies showed that predictive performance changes in cross-domain applications (Jaidka et al. 2018; Zhong et al. 2017). Therefore, we then attempt to improve the cross-domain prediction performance by applying domain adaptation. The motivation for cross-domain and domain adaptation experiments is to build an accurate predictive model on Twitter language to then scale it to county level Twitter language.
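The evaluation protocol described in these Methods (five folds, elastic-net regression, Pearson's r over the pooled held-out predictions) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: it uses a small pure-Python coordinate-descent elastic net, unweighted samples, folds assigned by index rather than stratified by age and gender, and synthetic data; all function names and hyperparameter values are hypothetical.

```python
import math
import random

def soft_threshold(value, limit):
    """Shrink value toward zero by limit (the L1 proximal step)."""
    if value > limit:
        return value - limit
    if value < -limit:
        return value + limit
    return 0.0

def fit_elastic_net(X, y, alpha=0.1, l1_ratio=0.5, sweeps=200):
    """Coordinate descent for (1/2n)||y - Xw - b||^2
    + alpha*l1_ratio*|w|_1 + 0.5*alpha*(1 - l1_ratio)*|w|_2^2."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[row[j] - means[j] for row in X] for j in range(p)]  # centered
    resid = [v - y_mean for v in y]
    w = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            col = cols[j]
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (e + c * w[j]) for c, e in zip(col, resid)) / n
            denom = sum(c * c for c in col) / n + alpha * (1 - l1_ratio)
            new_w = soft_threshold(rho, alpha * l1_ratio) / denom
            step = new_w - w[j]
            if step != 0.0:
                resid = [e - c * step for c, e in zip(col, resid)]
                w[j] = new_w
    intercept = y_mean - sum(wj * m for wj, m in zip(w, means))
    return w, intercept

def predict(model, X):
    w, b = model
    return [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) *
                           sum((b - my) ** 2 for b in y))

def cross_validated_r(X, y, k=5):
    """Train on k-1 folds, predict the held-out fold, then compute one
    Pearson r over the pooled out-of-fold predictions."""
    preds = [0.0] * len(y)
    for fold in range(k):
        test_idx = [i for i in range(len(y)) if i % k == fold]
        train_idx = [i for i in range(len(y)) if i % k != fold]
        model = fit_elastic_net([X[i] for i in train_idx],
                                [y[i] for i in train_idx])
        for i, p in zip(test_idx, predict(model, [X[i] for i in test_idx])):
            preds[i] = p
    return pearson(preds, y)

# Synthetic stand-in for user-level features and stress scores.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(100)]
y = [2 * row[0] - row[1] + 0.1 * rng.gauss(0, 1) for row in X]
print("pooled out-of-fold Pearson r:", cross_validated_r(X, y))
```

Pooling the held-out predictions before computing r, rather than averaging five per-fold correlations, matches the description of "Pearson's r over the aggregated predictions from the five folds".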
Table 4: Within Domain: Stress Prediction Performance (Pearson’s r) based on 5-fold cross-validation using different features and models trained and tested on the same domain. Social media language adds to and outperforms sociodemographic variables at predicting stress.

D: Sociodemographic variables
Feature                                  Pearson r
Age, Gender, Race, Income & Education    .25

SM: Social media language
Feature                              Facebook    Twitter
User Engagement (Lin et al. 2014)    .11         .05
TensiStrength (Thelwall 2017)        .17         .11
LIWC 2015                            .29         .22
Topics                               .31         .18

Language + Sociodemographic
D + SM                               .33         .26

Results for Within Domain Predictions
Table 4 shows the performance of predicting stress using sociodemographic variables and social media language. Within the domain, Facebook does better than Twitter by a slight margin. On Facebook, both LIWC and Topics outperform sociodemographic variables (age, gender, race, income, and education). Topics outperform LIWC on Facebook (r=.305), and LIWC outperforms Topics on Twitter (r=.218). These correlations, which are considered a high correlation in measuring internal traits (Meyer, Finn, and others 2001), show that linguistic features perform reliably well at predicting stress. User engagement features performed rather poorly when compared to linguistic features, indicating that trait-prediction is a different task when compared to state-prediction. A similar observation has been made while using user engagement features for predicting other traits (Preotiuc-Pietro et al. 2016).

The correlation between stress scores collected from survey and user-aggregated TensiStrength scores was .17 for Facebook and .11 for Twitter. TensiStrength was developed by annotating keyword-selected Twitter posts, which makes it inapplicable for measuring psychological stress. Consequently, we dropped engagement features and TensiStrength from subsequent analysis. We also limited the posting period to the previous one month (consistent with the time period in the Cohen’s survey questionnaire) and observed that the performance shows a non-significant drop by 0.02 in r.

Results for Cross-Domain Predictions
Since our goal is to predict stress in counties using Twitter, we examined how models trained on Facebook perform on Twitter (shown in Table 5). Performance drops (by 5% compared to within domain performance) when Facebook models are used to predict stress from the Twitter language. Specifically, topics see a larger drop (by 50%) possibly due to not being able to generalize across platforms. We have seen that expressions of stress in Facebook language vs. Twitter language are significantly different in terms of the vocabulary used. Consequently, a standardized theory-driven dictionary such as LIWC is stable across both platforms. We also combined both Facebook and Twitter corpora, and the model trained on both platforms together gives a marginal improvement in prediction performance.

Table 5: Cross Domain: Stress Prediction Performance (Pearson’s r) based on 5-fold cross-validation. FB: Facebook; Tw: Twitter.

Feature      Trained on FB, tested on Tw    Trained on FB+Tw, tested on Tw
LIWC 2015    .23                            .24
Topics       .15                            .17

Given the marked distinction between expressions of stress in Facebook and Twitter language, we investigate several approaches to improve the predictive performance of models on Twitter. This problem can be viewed as a domain adaptation task, where we are adapting from a source domain (users’ language on one platform) to a target domain (the same users’ language on another platform).

Results with Domain adaptation
Most prior work on domain adaptation has focused on the case where some labels are available on both the source and target domains and is usually done by combining (often in some weighted fashion) training data sets or, less commonly, trained models from the source and target domains. A simple and effective supervised method was proposed by Daumé (Daumé III, Kumar, and Saha 2010), which applies a supervised heuristic mapping from labeled data in the source and target domains to a higher dimensional feature space, which is then used to train standard classifiers or linear regression models. This approach has demonstrated the best performance in several comparative evaluations conducted for image-, text- and sentiment-classification (Pan and Yang 2010).

We test two standard domain adaptation techniques for improving the cross-platform performance of predictive models: one supervised approach, Easy Adapt (Daumé III, Kumar, and Saha 2010), which uses labeled data from both the source and the target, and one unsupervised approach, Transfer Component Analysis (TCA) (Pan et al. 2011), which requires no labels on the target domain. While there are several candidate algorithms within both supervised and unsupervised approaches, we chose Easy Adapt and TCA for their simplicity in application.

The standard notation used throughout this section is that $X_S$ refers to labeled observations in the source domain and $X_T$ refers to the test set in the target domain.

EasyAdapt: We define our problem according to the implementation described by Daumé (Daumé III, Kumar, and Saha 2010). Let $X$ denote the original feature space for the user in the source domain, $X = \mathbb{R}^F$. We construct an augmented feature space $\tilde{X} = \mathbb{R}^{3F}$, by creating a platform-specific and a user-specific version of each feature in $X$. For this, we define $\Phi^s : X \to \tilde{X}$ to transform feature vectors corresponding to the platform-specific and user-specific
feature spaces. The mappings are defined by the following equation:

$$\Phi^s(x) = \langle x, x, \mathbf{0} \rangle, \quad \Phi^t(x) = \langle x, \mathbf{0}, x \rangle \qquad (1)$$

Here, $\Phi^s(x)$ is the feature vector for the source domain, $\Phi^t(x)$ is the corresponding vector for the target domain. $\mathbf{0} = \langle 0, 0, \ldots, 0 \rangle \in \mathbb{R}^F$ is the zero vector.

We thus augment the original feature space with the labeled data points (for the same set of individuals) from the target domain, excluding the held out sample for testing. Next, we train a regression model between this augmented feature space and stress scores of the users. We similarly transform the feature space of the held out sample before prediction.

Transfer Component Analysis (TCA): TCA exploits the Maximum Mean Discrepancy Embedding (MMDE) for comparing the distributions between the source and target domain, based on the Reproducing Kernel Hilbert Space (RKHS). The empirical estimate of the distance between $D_S$ and $D_T$, $\mathrm{Dist}(D_S, D_T)$, is

$$\left\| \frac{1}{n_1} \sum_{i=1}^{n_1} \phi(s_i) - \frac{1}{n_2} \sum_{i=1}^{n_2} \phi(u_i) \right\|_{\mathcal{H}} \qquad (2)$$

where $s_i$ and $u_i$ are individual observations in $D_S$ and $D_T$ respectively, $\mathcal{H}$ is a universal RKHS and $\phi : X \to \mathcal{H}$.

After applying domain adaptation (results in Table 6), performance increases by 16% when compared to using Facebook models alone to predict stress from Twitter language without any domain adaptation. TCA is seen to outperform EasyAdapt.

is used in order to ensure that any word-to-outcome correlations observed were stable, resulting in a dataset of 2710 counties. On average, each county had 8,892,568 words.

County Health outcomes: We use two sets of county health outcomes in our work: 1. Gallup-Sharecare Well-Being data and 2. County health statistics for the United States.

From the Gallup-Sharecare Well-Being Index, we use the stress outcome to validate the stress predictions made by our language model. Gallup data is collected as a part of 1,000 telephone interviews conducted every day across the US (Huppert and So 2010). The questions on the interview range from topics such as health behaviors, work environment to social and county factors, and financial security. We specifically used the stress fields from the Gallup data aggregated to counties.

From the US County Health Rankings and Roadmaps portal³, which provides access to county-level health factors from a wide range of sources, including the Behavioral Risk Factor Surveillance System, the American Community Survey, and the National Center for Health Statistics, we obtained county socioeconomic characteristics and health behaviors.

Results
After analyzing the performance of different models at predicting stress in the cross-domain setting, we use the domain-adapted model trained on LIWC features to predict stress using Twitter language from US counties (shown in Figure 3). We validate our language-predicted stress with county-level stress reported by Gallup, and further with county-level estimates of health behaviors and socioeconomic characteristics.
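The two domain adaptation ingredients used for the domain-adapted model, the EasyAdapt mapping of Equation 1 and the mean-embedding distance of Equation 2, can be illustrated with a short sketch. It rests on assumptions: the function names and toy vectors are hypothetical, and the distance is computed with the identity feature map (a linear kernel) rather than a universal RKHS. Full TCA additionally learns a projection that reduces this distance, which is not shown here.

```python
import math

def augment_source(x):
    """Phi^s(x) = <x, x, 0>: shared copy, source-specific copy, zero block."""
    return list(x) + list(x) + [0.0] * len(x)

def augment_target(x):
    """Phi^t(x) = <x, 0, x>: shared copy, zero block, target-specific copy."""
    return list(x) + [0.0] * len(x) + list(x)

def mean_embedding_distance(source, target):
    """Equation 2 with phi taken as the identity map (an assumption):
    the Euclidean distance between the two domain means."""
    dim = len(source[0])
    mean_s = [sum(row[j] for row in source) / len(source) for j in range(dim)]
    mean_t = [sum(row[j] for row in target) / len(target) for j in range(dim)]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mean_s, mean_t)))

# Hypothetical two-feature vectors for the same users on each platform.
facebook = [[0.0, 0.0], [2.0, 0.0]]
twitter = [[1.0, 3.0], [1.0, 5.0]]

# EasyAdapt training matrix: source rows and target rows in the 3F space.
train = [augment_source(x) for x in facebook] + [augment_target(x) for x in twitter]
for row in train:
    print(row)
print("distance between domain means:", mean_embedding_distance(facebook, twitter))
```

Any standard regression model can then be trained on the augmented rows; held-out Twitter users are mapped with `augment_target` before prediction, mirroring the description of transforming the held-out sample.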