Moffitt Et Al 2021 Hunting Conspiracy Theories During The Covid 19 Pandemic
Social Media + Society. DOI: 10.1177/20563051211043212. Moffitt et al.
Article
Abstract
The fear of the unknown combined with the isolation generated by COVID-19 has created a fertile environment for strong disinformation, otherwise known as conspiracy theories, to flourish. Because conspiracy theories often contain a kernel of truth and feature a strong adversarial “other,” they serve as the perfect vehicle for malign actors to use in influence campaigns. To explore the importance of conspiracies in the spread of dis-/mis-information, we propose the use of state-of-the-art, tuned language models to classify tweets as conspiratorial or not. This model is based on the Bidirectional Encoder Representations from Transformers (BERT) model developed by Google researchers. The classification method expedites analysis by automating a process that is currently done manually (identifying tweets that promote conspiracy theories). We identified COVID-19 origin conspiracy theory tweets using this method and then used social cybersecurity methods to analyze communities, spreaders, and characteristics of the different origin-related conspiracy theory narratives. We found that tweets about conspiracy theories were supported by news sites with low fact-checking scores and amplified by bots that were more likely to link to prominent Twitter users than in non-conspiracy tweets. We also found different patterns in conspiracy vs. non-conspiracy conversations in terms of hashtag usage, identity, and country of origin. This analysis shows how we can better understand who spreads conspiracy theories and how they are spreading them.

Keywords
natural language processing, disinformation, conspiracy theories, COVID-19, social media
Creative Commons Non Commercial CC BY-NC: This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
ways, these theories act as a defense mechanism against people’s natural fear of the unknown.

While conspiracy theories may help people feel like they have regained some control, they can often have dangerous consequences. COVID-related conspiratorial beliefs are associated with problematic health behavior, such as reduced levels of self-reported handwashing and social distancing (Imhoff & Lamberty, 2020). Conspiracy theorists are less likely to trust experts, which is particularly problematic in the case of a pandemic (Douglas et al., 2019; Imhoff & Lamberty, 2020). In addition to science denial, conspiratorial beliefs have been associated with higher intentions for everyday crime, increased prejudice, extremism, and an increased tendency toward violence (Douglas et al., 2019; Sternisko, Cichocka, Cislak, & Bavel, 2020; Sternisko, Cichocka, & Van Bavel, 2020). Previous research by Sternisko, Cichocka, and Van Bavel (2020) shows that conspiracy theories can also foment dangerous anti-democratic movements. A prime example of this problem has been the recent surge in support of QAnon and various election-related conspiracy theories. These theories motivated many of the individuals involved in the violent insurrection at the US Capitol on 6 January 2021 (Brittain et al., 2021; Seitz, 2021).

The internet and social media platforms help facilitate the spread of conspiracy theories faster than ever before (Douglas et al., 2019). QAnon has gone from a fringe movement to having large amounts of social media activity. Because conspiracy theories can impact behavior in a public health crisis and social media is one of the vehicles through which they spread, it is essential to study who is spreading these conspiracy theories on social media and how.

Related Work
Conspiracy theories are attempts to explain the causes of significant social and political events with claims of secret plots involving two or more powerful actors (Aaronovitch, 2010; Byford, 2011; Dentith & Orr, 2018). There has been extensive research conducted on who falls for conspiracy theories, why they believe these theories, and what effects these beliefs may have on real-world behavior (Douglas et al., 2019; Imhoff & Lamberty, 2020; Sternisko, Cichocka, & Van Bavel, 2020). Prior work on conspiracy theories has spanned several fields, including psychology, history, political science, and sociology (Uscinski & Parent, 2014).

Belief in Conspiracy Theories
Individuals who believe in one conspiracy theory often believe in others, even other theories that may seem logically incompatible (e.g., simultaneously believing that Princess Diana faked her death, but also that she was murdered) (Georgiou et al., 2020; Goertzel, 1994; Wood et al., 2012). Most Americans believe in at least one well-known conspiracy theory. The idea that Lee Harvey Oswald was not the only person involved in John F. Kennedy’s assassination is one of the most popularly believed conspiracy theories, even decades after the event occurred (Douglas et al., 2017).

Belief in conspiracy theories is primarily motivated by a desire for information, a sense of control, and maintaining a positive view of oneself and one’s identity groups. When information surrounding an event is unavailable, conflicting, or incomplete, belief in a conspiracy theory can lower an individual’s feelings of uncertainty and quench their curiosity (Douglas et al., 2017). In addition, conspiracy theories appeal to those who feel that they or their group is being threatened in some way (Douglas et al., 2017). Researchers who analyzed letters from readers of the New York Times over the last century in the United States found that the popularity of certain types of conspiracy theories in these letters tended to track with which political groups were out of power at the moment. Conspiracy theories about leftists or communists were more commonly discussed in letters to the newspaper when a Democrat was president, while conspiracy theories about the right or big corporations were more talked about when a Republican was president (Uscinski & Parent, 2014). Conspiracy theories have been an instrumental part of the political conversation throughout US history (Fenster, 1999).

Communication of Conspiracy Theories
According to Franks et al., there are three primary dimensions of a successful conspiracy theory: the stick, the spread, and the action. Effective communication of a conspiracy theory involves these three aspects (Franks et al., 2013).

The “stickiness” of a theory involves how that theory appeals to individuals and how passionate those individuals become about it. Theories that sound completely outlandish, such as “lizard people” theories, are therefore less likely to stick with a large number of individuals because they seem too bizarre. The “spread” refers to how individuals share and convince others of this theory. A successful spread involves targeting the right people and anticipating possible critiques so that they can be rebutted. Finally, the “action” refers to the degree to which believers take collective action against those they believe are conspirators.

Theories framed as a group conflict over a societal value, such as our sacred value of democracy, are more likely to inspire action. For example, those who stormed the US Capitol believed in false allegations of voter fraud and took specific action as a result. On the other hand, believers of conspiracy theories surrounding the J.F.K. assassination tend to be more casual in their belief and have also not taken any action, perhaps because there are no obvious possible actions to take.

Successful conspiracy theories are typically communicated in specific ways to maximize stick, spread, and action, and actors can exploit these theories to attain their desired goals (Franks et al., 2013; Nefes, 2017). In 2013, Erdogan,
the Prime Minister of Turkey at the time, spread a conspiracy theory among his supporters to discredit anti-government protesters. The protesters opposed a government-planned demolition of a park to build a shopping mall. Erdogan claimed that the protesters were associated with malicious foreign agents, including the “interest rate lobby,” who were conspiring against the Turkish economy. Erdogan likely had two primary goals: discredit political opponents by labeling them as funded by foreigners and pressure the central bank to lower interest rates faster (Nefes, 2017). A study conducted on a popular Turkish forum found that pre-existing political belief strongly predicted whether an individual would believe in the conspiracy theory, with Erdogan supporters being more likely to believe it than his opponents (Nefes, 2017).

Looking at COVID-related conspiracy theories, we have seen that they have spread to large numbers of individuals, often without being challenged. A study of Spanish, French, and German social media users found that state-backed reporting from adversarial nations like China, Russia, and Iran received more engagement on average than mainstream news sources (Rebello et al., 2020). In a large study on both online and traditional media from January to May 2020, Evanega et al. (2020) found that only 16% of mentions of COVID-19 misinformation included some level of fact-checking, indicating that most of these posts were not being disputed. A different analysis of a specific conspiracy theory, the 5G/COVID-19 link, found more fact-checking. The authors collected sample tweets containing the #5GCoronavirus hashtag from 27 March to 4 April 2020, which was the week the hashtag was trending in the United Kingdom. From the sample, 65% of tweets were either countering the 5G/COVID-19 theory or were general tweets with no opinion on the matter (Ahmed et al., 2020). However, the authors found no authority figure in the network combating the misinformation, indicating that more coordination among public health officials may be needed (Ahmed et al., 2020).

Aspects of social media, including hashtags, bots, and URL links, can contribute to the spread of conspiracy theories. Prior research shows that simple and concrete messages tend to be memorable, and messages that fit in with our prior beliefs, seem credentialed, and trigger our emotions are more likely to spread (Heath & Heath, 2007). Both malicious and unwitting actors take advantage of these factors and the structure of social media to spread conspiracy theories. Since Twitter restricts the number of characters in a message, many tweets spreading conspiracy theories will have a message with a URL link to a seemingly credentialed “source.” Hashtags also facilitate the spread of content to particular audiences. In July 2020, Twitter blocked QAnon-related URL links and changed its algorithms to no longer highlight QAnon activity and hashtags in search results or recommendations (Conger, 2020). Finally, prior research on early COVID-19 Twitter data shows that the more conspiratorial a tweet, the more likely a bot was tweeting or retweeting it. There was a much higher percentage of bots originating or retweeting fake news when compared to real news (Huang, 2020). Given that conspiracy theories can be so quickly communicated online, it is crucial to understand how they propagate on social media.

Impact of Conspiracy Theories
Belief in conspiracy theories can have real-world consequences. With the rise of social media and the internet, more people are exposed to these false stories, with many believing and acting on them. Employees who distrust their workplace and believe in organizational conspiracy theories tend to have higher turnover, lower commitment, and lower job satisfaction than other employees (Douglas & Leite, 2017). Many conspiracy theories demonize the enemy and delegitimize dissenting voices, and acceptance of these theories may encourage believers to act violently (Bartlett & Miller, 2010). While many conspiracy theories do not lead to violence (JFK assassination, 9/11 Truthers, etc.), there are several cases where belief in a conspiracy theory acts as a “radicalization multiplier,” compounding with other factors that encourage extremism and terrorism (Bartlett & Miller, 2010; Douglas et al., 2019). However, it is difficult to disentangle whether conspiracy theories lead to violence or are just more prevalent in individuals who are pre-disposed to violence (Uscinski & Parent, 2014).

Research has shown that belief in COVID-related conspiracies is often associated with taking fewer preventive measures, such as social distancing and frequent handwashing (Imhoff & Lamberty, 2020; Oleksy et al., 2020). A British survey found that belief in the 5G coronavirus conspiracy was positively correlated with anger against the state and a willingness to engage in violence (Jolley & Paterson, 2020). Because belief in conspiracy theories can negatively impact offline behavior in a pandemic, it is essential to understand the spread and reach of these dis-/mis-information stories.

Detection of Conspiracy Theories
Most of the prior research on conspiracy theories has focused on the psychology behind why people believe them, the way they are communicated, and their real-world impact. While a fair amount is understood about conspiracy theories, less is known about how to detect them. Tangherlini et al. (2020) employed automated machine learning techniques to understand the narrative structures of both actual conspiracies (e.g., Bridgegate) and conspiracy theories (e.g., Pizzagate), and then later applied similar techniques to try to detect coronavirus misinformation stories (Shahsavari et al., 2020). Understanding what narratives are spreading and how they are being structured and placed into pre-existing conspiracy
theories is the first step in developing ways to disrupt their spread (Shahsavari et al., 2020).

Other misinformation researchers have used clustering techniques to group hashtags together as a way to analyze communities and discussions surrounding the pandemic (Cruickshank & Carley, 2020). Zinoviev (2017) applied network science to quantify the relationship between conspiratorial and pseudo-science topics and between conspiratorial and non-conspiratorial topics using the title and co-purchasing information from Amazon. In some cases, research to detect disinformation or “fake news” simply groups conspiracy theories with other forms of disinformation (Aphiwongsophon & Chongstitvatana, 2018). In addition, these models rely either on network connections (Zinoviev, 2017) or on handcrafted templates (Tangherlini et al., 2020). Networks and network inference do not scale well, which is an important factor to consider when analyzing big data. Handcrafted templates take time to create and deploy, and they are typically only useful for a limited time period before a new template needs to be created. Therefore, more work is needed to develop useful methods for quickly detecting new conspiracy theory topics on social media as they arise.

This article uses the Bidirectional Encoder Representations from Transformers (BERT) model, trained and tuned with labeled training data about COVID-related conspiracies. This model significantly changed the natural language processing (NLP) landscape in 2018 and is partly responsible for the popularity of transformer models. We chose to use a BERT-based model for this study for several reasons. First, it can detect conspiracies through text alone, does not require templates or network information, and therefore runs relatively quickly and scales linearly. The BERT language model also has a good track record for improving downstream NLP tasks after domain-specific pre-training and tuning. These applications include BioBERT (biomedical text (Lee et al., 2020)), SciBERT (scientific applications (Beltagy et al., 2019)), DocBERT (document classification (Devlin et al., 2018)), and COVID-Twitter-BERT (Müller et al., 2020).

We used this model to address three research questions:

RQ1. Can we rapidly and accurately identify conspiracy theory tweets related to COVID-19?

RQ2. How do users behind conspiracy theory tweets differ from non-conspiracy users?

RQ3. How are the tweets that carry conspiracy theories propagating through the extensive COVID-19 discussion?

We show that this state-of-the-art model can aid in the fast detection of conspiracy theories. Given the rising threat of conspiracy theories, it is essential to understand who is spreading these theories and how they are being spread so quickly on social media to develop effective counter-measures.

Data

Classifier Training Data
Our training dataset consists of 8,781 hand-labeled tweets. The dataset was collected from the Twitter application programming interface (API) using keywords found in Table 1 with a collection window between February 2020 and July 2020. Memon et al. developed a labeling taxonomy to classify 4,573 of the tweets in a study to characterize COVID-19 misinformation communities (Memon & Carley, 2020). Table 2 presents the labels created to characterize the misinformation communities.

Table 1. Keywords Used in Twitter’s Application Programming Interface (API) to Collect Tweets That May Capture the Different Types of COVID-19 Dis-/Mis-Information (Memon & Carley, 2020).

COVID-19 collection terms for tweet labeling
#coronavirus, #covid, #nCoV20199, #CoronaOutbreak, #CoronaVirus, #CoronavirusCoverup, #CoronavirusOutbreak, #COVID19, #Coronavirus, #WuhanCoronavirus, #Wuhan, bleach, vaccine, acetic acid, steroids, essential oil, saltwater, ethanol, children, kids, garlic, alcohol, chlorine, seasame oil, conspiracy, 5G, cure, colloidal silver, dryer, bioweapon, cocaine, hydroxychloroquine, chloroquine, gates, immune, poison, fake, treat, doctor, senna makki, senna tea

The remaining 4,208 tweets were hand-labeled by university student volunteers participating in a summer studies course. The course leads provided instruction and labeling guidance to follow the same procedures as the Memon et al. study. We took all 8,781 previously labeled tweets and collapsed the original 16 labels in a binary fashion for our study. First, we coded all tweets classified as conspiracy in the original dataset as “1” conspiracy tweets. We then took all remaining original labels and re-coded them to “0” for non-conspiracy tweets. Thus, tweets labeled as “0” non-conspiracy include tweets that may contain truthful information but also might include dis-/mis-information in the form of a “fake cure,” “fake treatment,” or “false fact.” It is important to note that our classification task is not to label dis-/mis-information but to label a particular kind of “strong” dis-/mis-information known as conspiracy theories. In making this training choice, we believe that our model will distinguish between other forms of dis-/mis-information and conspiracy. The result is a final training dataset that consists of 8,781 labeled tweets, which is the most extensive labeled COVID-19 conspiracy theory dataset of which we are aware.

Data for Analysis
Our study focuses on the analysis of conspiracy tweets related to the origin of COVID-19. To facilitate this work, we set our collection period to encompass time before and after the beginning of the US lockdown. Our research group collected 243.6 million tweets from Twitter’s v1 streaming API between
16 February 2020 and 31 May 2020, using the “collection” terms in Table 3. The “collection” terms remained unchanged, and our collection did not miss any days during the period of our study. Figure 1 displays the number of tweets collected by day. It is important to note that the v1 streaming API may produce at most 1% of the data available on Twitter’s Firehose API. Work by Morstatter et al. (2013) shows that the realized coverage is variable and at times biased.

Table 2. List of Labels Developed to Characterize COVID-19 Dis-/Mis-Information Themes on Twitter (Memon & Carley, 2020).

ID   Category           ID   Category
0    Irrelevant          9   True Public Health Response
1    Conspiracy         10   False Public Health Response
2    True Treatment     11   Politics
3    True Prevention    12   Ambiguous/Difficult to Classify
4    Fake Cure          13   Commercial Activity
5    Fake Treatment     14   Emergency Response
6    False Fact         15   News
7    Calling Out        16   Panic Buying
8    Sarcasm/Satire

Table 3. This Table Represents the List of Terms Used to Collect COVID-19-Related Tweets and the List of Terms Used by This Study to Find Conspiracy Theory-Related Tweets.

Type         Terms
Collection   NcoV2019, coronavirus, covid-19, covid 19, covid19, NCoV, wuhanvirus, wuhan virus, 2019nCoV
Filtering    bat, bioweapon, bio-weapon, lab, labs, 5G

Note. Type “Collection” refers to the terms used with Twitter’s API to collect COVID-19 tweets. Type “Filtering” refers to terms used in a regular expressions search query to find conspiracy-related tweets from the COVID-19 tweets collected by Twitter’s API.

Next, we developed a set of filter terms to find tweets that might contain conspiracy theories related to the origin of the COVID-19 virus. The “Filtering” section of Table 3 provides the complete list of terms we used to search through our massive collection of COVID-19 tweets to find possible origin conspiracy content. 5G may seem like an odd addition to the list, but we found several origin theories that argued that the virus was created to take advantage of 5G or that 5G causes COVID-19.

Our final dataset for analysis consists of 1,508,765 English language tweets, with 953,696 unique user accounts. Table 4 provides a breakdown of the number of tweets, retweets, mentions, and replies found in our dataset.

Bots, Identity, and Location
We augmented the data usually provided by a tweet object with a prediction on whether an account might be a bot, an account categorization, and a prediction of the account location. The augmented data were made possible by social cybersecurity forensic tools that include BotHunter developed by Beskow and Carley (2018) and identity and location labeling techniques developed by Huang and Carley (2019, 2020a). We added these additional features to our data with the intuition that they may provide markers or trends in the spread of conspiracy theories on Twitter.

Bots are automated agents used on social media platforms. Not all bots are used for nefarious purposes, but in recent years, they have been used widely to spread dis-/mis-information (Beskow & Carley, 2018). BotHunter is a random forest regressor trained on labeled tweets from known information operation attacks on the Atlantic Council’s Digital Forensic Lab and NATO collected by Beskow and the suspended Russian bot dataset released by Twitter in October 2018. The model leverages tweet content and user metadata to provide a probability (between 0 and 1) that an account is a bot; see Table 5 for a more comprehensive list of BotHunter features. The developers of BotHunter, in a separate study, calculated the precision/recall scores on multiple sets of Twitter data, and they recommend in their article a threshold between 0.6 and 0.8. A threshold closer to 0.6 would include more false positives, while a higher threshold would have more false negatives (Beskow & Carley, 2020). For this study, 0.75 is the threshold for an account to be labeled as a bot. We chose a value on the upper end of the developers’ suggested threshold range to get a more conservative estimate of the number of bots in the data.

Social media platforms host a diverse set of actor types. Actors can be regular users, government entities, or celebrities. The availability of account profile data is inconsistent; some accounts have descriptive information, and some do not. It is often hard to determine what kind of actor an account is based on profile data alone. Huang and Carley developed a hierarchical self-attention neural network model to classify Twitter user actor types. The model uses account metadata and tweets to classify a user as one of seven types: regular user, marketing agency, news reporter, government official, celebrity, company, or sports figure (Huang & Carley, 2020a). Unlike BotHunter, this algorithm is a neural network model and does not output a direct probability score. The final layer of the model is the Softmax layer, and the model assigns labels based on the highest Softmax score. When these researchers applied this algorithm to COVID-19 Twitter data, collected from 29 January to 4 March 2020, it achieved 94.5% accuracy (Huang, 2020; Huang & Carley, 2020b). Considering the high level of accuracy and the similarity in topic and time frame to our dataset, we used this algorithm to augment our data. These labels will be useful in determining the types of accounts that spread or counter conspiracy theories.

In social cybersecurity, forensic analysis of the location of tweets can be a valuable aspect when analyzing a dis-/mis-information campaign. Determining the origin of a campaign may help determine intent, sources, and targets. A tweet object can contain geo-tag information and user-declared location data on an account’s profile. This kind of information is typically sparse and unreliable (Graham
et al., 2014; Hecht et al., 2011). To solve this issue, Huang and Carley (2019) present a hierarchical location prediction neural network (HLPNN) to predict a user’s location, given tweet text and metadata features. Like the identity prediction model, the location prediction model’s final layer is the Softmax layer, and the model assigns labels based on the highest Softmax score. When these researchers applied this algorithm to their COVID-19 Twitter data, they had a 92.96% accuracy (Huang, 2020). We add this feature to aid our analysis to determine if state actors are using conspiracy theory narratives to shift blame from their handling of COVID-19.

Figure 1. This figure provides the longitudinal distribution of COVID-19-related tweets collected from Twitter’s API using the collection terms found in Table 3 from 16 February 2020 through 31 May 2020.

Table 4. This Table Provides the Number of Tweets by Type (Tweet, Retweet, Reply, and Mention) Found in Our Data.
Tweet type      Tweet count      Percentage
Retweet         1,214,127        80.5
Tweet             172,327        11.4
Mention            66,840         4.4
Reply              55,462         3.7

Table 5. This Table Provides the List of Features Used by BotHunter to Classify Potential Bots (Beskow & Carley, 2018).
User features             Content features
Account age               Is last tweet a retweet?
Avg tweets per day        Same language?
Screen name               Hashtags in tweet
Default profile image?    Mentions in tweet
Has location?             Last status sensitive?
Total tweet count         Bot reference?
Number of friends
Number of followers
Number of favorites

Methodology

Language Models

NLP, a subfield of artificial intelligence, provides a robust set of tools to aid social cybersecurity analysis. These tools include but are not limited to sequence classification, part-of-speech labeling, summarization, and knowledge extraction. The effectiveness of these tools is typically limited by the numerical representation of the text being analyzed. Pre-trained language representations have been shown to improve the effectiveness of these NLP tasks (Devlin et al., 2018). Current state-of-the-art language representations take transformer architecture (Vaswani et al., 2017) and word (token) context into account when forming numerical representations. A major strength of such models in our research is that they employ a general architecture and use weights tuned to specific downstream tasks.

BERT was the first language representational model to create bidirectional representations based on jointly conditioning on both left and right context of input sequences (Devlin et al., 2018). The original BERT model consisted of two versions: BERT-Base, which has 12 layers and 110 million parameters, and BERT-Large, which has 24 layers with 340 million parameters. Both versions were domain agnostic and were pre-trained using unsupervised learning on a corpus of text consisting of 800 million words from BooksCorpus and 2.5 billion words from Wikipedia (Devlin et al., 2018). Recent work shows the versatility and utility of the general-purpose language model when domain-specific data are applied to fine-tune the model. This is evidenced by the popularity of BioBERT (Lee et al., 2020), SciBERT (Beltagy et al.,
2019), and COVID-Twitter-BERT (CT-BERT) (Müller et al., 2020).

Table 6. This Table Provides a Summary of the Data Used for Analysis.

Because of its proven success in improving downstream NLP tasks and its proven adaptability to different domains, we selected BERT to serve as our language representation model to train our conspiracy theory tweet classifier. We chose to test the BERT-Large pre-trained model and the COVID-Twitter-BERT v2 (CT-BERT) model. CT-BERT is based on the BERT-Large model but is pre-trained with 97 million unique COVID-19-related tweets collected between 12 January 2020 and 5 July 2020 (Müller et al., 2020). The tweets used for this pre-training were collected with similar search terms and in a similar manner as the data used in this study.

Model Training

We used the Transformers library (Wolf et al., 2020) with TensorFlow 2.4 on CPU with 64 GB of RAM for classifier training. Individual tweets served as input sequences and were tokenized using their respective pre-trained language models. A Twitter message can contain up to 280 characters, but we were able to reduce the max token length needed for training down to 64 tokens. We used a batch size of 16, which is the minimum recommended batch size for training. We set a constant learning rate of 2e−5 with Adam as the optimizer. The conspiracy theory classifier was trained on 8,781 tweets labeled as “conspiracy theory” or “not conspiracy theory” for binary classification. The data were split 80% for training and 20% for testing. We trained our classifiers for 10 epochs, which took approximately 6 hr per model.

Training Results

We used the area under the curve (AUC) and F1 scores as performance metrics to determine the best model. AUC provides an aggregate measure of performance across all possible classification thresholds. The F1 score summarizes the balance between the precision and recall of the model as their harmonic mean. The F1 scores for BERT-Large and CT-BERT are essentially the same at 0.950 and 0.948, respectively, while the AUC score for CT-BERT is slightly higher (0.971 vs. 0.966). Based on the results, we feel confident that either model would aid our analysis. In the end, we chose the CT-BERT model because of the domain knowledge captured by the model.

Results

Table 6 shows a summary of the data and their predicted conspiracy, bot, and identity labels. In the following subsections, we address our three research questions.
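
As an illustration of the two metrics named under Training Results, the following is a minimal sketch in plain Python. It is not the evaluation code used in this study; the function names `f1_score` and `roc_auc` and the toy labels are assumptions chosen for the example. F1 is computed as the harmonic mean of precision and recall, and AUC is computed through its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as half a win.
    The pairwise loop is quadratic, which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical labels and scores, 1 = "conspiracy theory").
labels = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
softmax_scores = [0.9, 0.4, 0.5, 0.1]
f1 = f1_score(labels, predicted)
auc = roc_auc(labels, softmax_scores)
```

Because AUC is computed from scores rather than hard labels, it does not depend on any single classification threshold, which is why it complements F1 when two models are compared.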
RQ1: Can We Rapidly and Accurately Identify Conspiracy Theory Tweets Related to COVID-19?

In this section, we address whether we can rapidly and accurately identify conspiracy tweets related to COVID-19. Our classifier labeled 826,367 of 1,508,756 tweets, approximately 55%, as conspiracy tweets. The number may seem high, but we collected tweets with conspiracy theory-related search terms, so we believe this to be a reasonable percentage. Table 7 provides examples of tweets classified as conspiracy and non-conspiracy by our model. Figure 2 provides tweet classification behavior by month between February and May 2020. We can observe that conspiracy-labeled tweets follow an up-and-down pattern; this may represent a decline in coverage of one conspiracy theory and the rise in coverage of a new conspiracy, say from “covid is a bioweapon” to “5G is causing covid.” In contrast, the number of non-conspiracy labeled tweets gradually grows as the pandemic continues.

Table 7. This Table Shows Example Text From Tweets Labeled as Conspiracy and Non-Conspiracy.
Label: Conspiracy
Text: They cannot contain the truth that #CoronarvirusOutbreak originated in a lab, so their excuse is that a bat peed on a scientist who didn’t wash their hands. Dumb. Why does a deadly disease have a PATENT IN THE FIRST PLACE? #GatesFoundation
Label: Non-conspiracy
Text: In an interview on Fox News, Senator (name removed) raised the unsubstantiated rumor that the new coronavirus originated in a high-security biochemical lab in China. The theory lacks evidence and has been dismissed by scientists.
Note. The conspiracy text was generated by an account associated with QAnon that Twitter has since suspended. The non-conspiracy text was generated by a reputable news source’s Twitter account.

Figure 2. This figure provides the number of labeled conspiracy and non-conspiracy tweets by month.

To approximate the accuracy of the conspiracy classifier when applied to unlabeled data, we selected 200 tweets and provided ground truth labels. We first selected the top 10 most retweeted conspiracy and non-conspiracy tweets per month of our data; in doing so, we wanted to provide ground truth labels to the most frequently occurring text in our data. We randomly sampled an additional 120 tweets from our data, excluding the tweets we previously provided ground truth labels for and their copies. The process produced 200 unique tweets, representing 311,346 total tweets or approximately 20% of our dataset with ground truth labels.

We calculated classifier metrics using only the unique tweets (un-weighted) and the total volume of those tweets (weighted). The classifier achieved accuracy and F1 scores approximately 6%–10% less than the score achieved during model training and validation. Table 8 provides a more comprehensive breakdown of model performance. We achieved weighted and un-weighted accuracy scores of 0.91 and 0.88 and weighted and un-weighted F1 scores of 0.91 and 0.87, respectively, and processed 1,508,756 tweets in under 3 hr. Our results represent a 6% increase in accuracy compared to other tools.

RQ2: How Do Users Behind Conspiracy Theory Tweets Differ From Non-Conspiracy Users?

This section compares the two groups of users on their social identities, country, and bot-like behavior.
Table 8. This Table Provides the Confusion Matrices and Metrics to Approximate the Conspiracy Tweet Classifier Performance on the “Data for Analysis” Dataset.

Social Identities. For social identity labels, we were interested in seeing which types of identities participate in conspiracies the most. To determine if identity labels are independent of conspiracy labels, we conducted Pearson’s chi-square test. The relationship between these variables was statistically significant. In the following analysis, we analyze tweets for which we have social identity predictions; 2% of the data do not have social identity predictions. Figure 3 provides a contingency table of social identity labels by classified tweet labels; the values are row-normalized.

We find that normal and celebrity identities are more prevalent in tweets classified as conspiracy-related. The model classified 56% of tweets from predicted normal users and 62% of tweets from predicted celebrities as a conspiracy. On the other hand, tweets from companies (56%), government entities (70%), news agencies (64%), reporters (59%), and sports figures (80%) were more often classified as non-conspiracy.

Figure 3. This figure displays the row-normalized contingency table for social identity labels by predicted conspiracy label.

Table 9. This Table Shows the Top 10 Predicted Countries by Volume of Tweets and Associated % of Conspiracy and Non-Conspiracy Tweets.
Country             No. of tweets    % Non-conspiracy    % Conspiracy
United States       859,565          37                  63
United Kingdom      121,269          56                  44
India                77,998          63                  37
Nigeria              59,655          70                  30
Canada               57,842          47                  53
Australia            32,882          48                  52
Hong Kong            29,908          50                  50
China                14,664          34                  66
South Africa         14,262          52                  48
The Philippines      12,157          68                  32

Countries of Origin. We sought to determine if there was any unusual country representation between conspiracy and non-conspiracy tweets. Again, we apply Pearson’s chi-square test to determine independence between predicted country labels and conspiracy labels. Here, we find that a relationship between conspiracy label and predicted country label likely exists, χ²(207, N = 1,465,653) = 69,127.74, p < .001. Table 9 provides a list of the top 10 predicted countries with the volume
of tweets produced by that country and the percentage of those tweets classified as non-conspiracy and conspiracy. Figure 4 provides a heat-map view of the same data. Only 2% of the data did not have a country prediction. We find that the United States and China had the largest shares of their tweets classified as a conspiracy. In contrast, Canada and Australia had just over 50% of their tweets classified as a conspiracy.

Figure 4. This figure displays the row-normalized contingency table for location labels by predicted conspiracy label.

Most of the predicted conspiracy tweets have their origin in the United States. Previous work presents a similar trend in the spread of dis-/mis-information via URLs on Twitter during the COVID-19 pandemic (Huang, 2020; Huang & Carley, 2020b). This result may suggest that the United States continues to be a significant source of disinformation, or that many Americans were using conspiracy stories as a mechanism to make sense of the pandemic (Douglas et al., 2017). A limitation of these results is that our dataset is English language only. In addition, while dis-/mis-information may often originate in the United States, many of those accounts could be controlled or influenced by actors outside of the country.

Figure 5. This figure provides bot activity as the percentage of tweets sent by predicted bots by month.

Bot Activity. Predicted bots produced approximately 35% of all the tweets in our data, 33% of all tweets labeled non-conspiracy, and 36% of all tweets labeled conspiracy. Figure 5 shows that the percentage of tweets sent by bots in conspiracy tweets is higher than the percentage of tweets sent by bots in non-conspiracy tweets for each month of our data. We tested the independence of bot labels and conspiracy labels using Pearson’s chi-square
test and found them not independent of each other; χ²(1, N = 1,508,756) = 895.58, p < .001. Figure 6 provides a view of the bot prediction probability distribution for conspiracy and non-conspiracy tweets.

Figure 6. This histogram displays the bot prediction probability distribution for both conspiracy and non-conspiracy tweets.

We observe an interesting pattern when comparing the BotHunter scores between conspiracy and non-conspiracy tweets labeled as a bot and not a bot. BotHunter scores of non-conspiracy tweets labeled as a bot (Mdn = 0.859) were slightly higher than those of conspiracy tweets labeled as a bot (Mdn = 0.854). We computed the one-sided Mann–Whitney test statistic and reject the null hypothesis that these bot scores come from the same underlying distribution; see Table 10 for test statistics. For tweets labeled as coming from a bot, the distribution underlying non-conspiracy tweet BotHunter scores is stochastically greater than the distribution underlying conspiracy BotHunter scores. In contrast, BotHunter scores of non-conspiracy tweets labeled non-bot (Mdn = 0.477) were much lower than those of conspiracy tweets labeled non-bot (Mdn = 0.528). Again, a one-sided Mann–Whitney test was statistically significant; see Table 10 for test statistics. For tweets labeled as coming from a non-bot, the distribution underlying non-conspiracy tweet BotHunter scores is stochastically less than the distribution underlying conspiracy BotHunter scores.

Table 10. This Table Provides the Mann–Whitney Test Statistics for the BotHunter Score Comparison.

When running these statistical tests, we find that conspiracy bot scores are lower than non-conspiracy bot scores for the populations above the 0.75 threshold, and conspiracy bot scores are higher than non-conspiracy bot scores for the populations below the 0.75 threshold. A possible explanation for this finding is that non-conspiracy tweets exhibit characteristics that mark them as clearly either bot or not bot. In contrast, conspiracy tweets present characteristics that make it harder to distinguish between bot and not bot, and they may have more cyborg accounts (mix of human and bot-like features). Figure 6 displays the previously described pattern, where the non-conspiracy bot scores are more numerous at the low end of the probability scale and more tightly skewed toward .9 on the higher end of the probability scale. In contrast, the conspiracy bot scores are more bunched around the threshold.

One of the significant functions of bots is amplification; they can send information at the speed of an algorithm, and they scale (Beskow & Carley, 2018). We find that bots generated 39% of conspiracy retweets and 36% of non-conspiracy retweets. While bots amplify messages more in conspiracy tweets, non-bot accounts conduct most of the amplification in the conspiracy and non-conspiracy tweets. Comparing all bot activity between conspiracy and non-conspiracy, bots found in conspiracy tweets send 3% fewer retweets than bots
found in non-conspiracy tweets. Bots found in conspiracy tweets use mentions more often than bots found in non-conspiracy tweets.

RQ3: How Are the Tweets That Carry Conspiracy Theories Propagating Through the Extensive COVID-19 Discussion?

This section compares how the individual tweets differ in the conspiracy vs. non-conspiracy groups in their usage of hashtags and URLs. These tweet attributes affect a tweet’s spread and impact.

Hashtag Analysis. For hashtags found in conspiracy and non-conspiracy tweets, we were most interested in determining differences in usage behavior. We wanted to see if any distinct indicators may help identify conspiracy tweets more easily or provide insights into user behavior within a conspiracy theory topic group. Figure 7 shows a line plot of the number of hashtags per tweet for conspiracy tweets vs. non-conspiracy tweets. As shown in the figure, conspiracy tweets have a higher hashtag usage rate per tweet than non-conspiracy tweets across all 4 months of our data. This result may suggest that users perpetuating conspiracy theories or spreading conspiracy tweets rely on hashtag use to establish topic groups and attract like-minded users to their message more than users not perpetuating conspiracy tweets.

Figure 7. This figure provides the average number of hashtags found in conspiracy and non-conspiracy tweets by month.

The top hashtags for conspiracy and non-conspiracy tweets are variants of “#coronavirus,” “#covid19,” and “#wuhanvirus,” which might be expected based on our data collection terms. To better understand the hashtag topics in our data, we remove all variants of “virus” and then analyze the remaining hashtags. Table 11 presents the top 15 resulting hashtags used for conspiracy and non-conspiracy tweets. We see that the total usage count for the top 15 hashtags found in conspiracy tweets is 78,439, which is over twice the count for the top 15 hashtags in non-conspiracy tweets at 29,950. In the conspiracy theory hashtags, we see three strong QAnon-related hashtags, #QAnon, #WWG1WGA (“where we go one, we go all”), and #DeepStateCabal. There are also hashtags with strong ties to President Trump and Bill Gates. The non-conspiracy tweet hashtags do not appear to carry the same topics.

URL Analysis. We wanted to discover if URL usage differed between conspiracy and non-conspiracy tweets. In doing so, we also wanted to explore the top domains shared and analyze the factuality ratings of those domains. We converted shortened URLs, mapped mobile versions to full versions, cleaned out query terms, and found fewer unique URLs present in conspiracy tweets than non-conspiracy tweets. Figure 8a shows the number of unique URLs per month by conspiracy label; we see that the number for non-conspiracy surpasses conspiracy as the pandemic progresses. We tested to see if the number of times each URL is shared is independent of conspiracy labels using Pearson’s chi-square test. We compared all URLs shared and also considered only URLs found in both conspiracy and non-conspiracy tweets. In both cases, we found that we can reject the null hypothesis that the number of times a URL is shared is independent of conspiracy labels; total URL case: χ²(100,662, N = 104,566) = 1,058,136.41, p < .001; shared URL case: χ²(3,902, N = 7,806) = 349,614.70, p < .001. The conspiracy propagators may be using fewer unique sources as evidence of a conspiracy; in contrast, normal or non-conspiracy propagators share more diverse external content.
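
The URL cleaning steps described under URL Analysis (expanding shortened URLs, mapping mobile versions to full versions, and removing query terms) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names, the `MOBILE_PREFIXES` tuple, and the `expanded` lookup table are hypothetical, and a real shortener would be resolved by following HTTP redirects rather than by a dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple
from urllib.parse import urlsplit, urlunsplit

# Assumption: mobile sites are recognized only by these host prefixes.
MOBILE_PREFIXES = ("m.", "mobile.")


def normalize_url(url: str, expanded: Optional[Dict[str, str]] = None) -> str:
    """Return a canonical form of `url` for counting unique URLs.

    `expanded` is a hypothetical lookup from shortened URL to its target.
    """
    if expanded:
        url = expanded.get(url, url)
    parts = urlsplit(url)
    host = parts.netloc.lower()
    for prefix in MOBILE_PREFIXES:
        if host.startswith(prefix):
            host = host[len(prefix):]  # map the mobile host to the full site
            break
    # Drop the query string and fragment; keep scheme, host, and path.
    return urlunsplit((parts.scheme.lower(), host, parts.path, "", ""))


def unique_urls_by_label(
    shares: Iterable[Tuple[str, str]],
    expanded: Optional[Dict[str, str]] = None,
) -> Dict[str, Set[str]]:
    """Group normalized URLs by tweet label, given (label, url) pairs."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for label, url in shares:
        result[label].add(normalize_url(url, expanded))
    return dict(result)


# Toy data (hypothetical URLs): three shares that collapse to one unique URL.
lookup = {"https://sho.rt/abc": "https://mobile.example.com/story?ref=tw"}
shares = [
    ("conspiracy", "https://sho.rt/abc"),
    ("conspiracy", "https://m.example.com/story?utm_source=x"),
    ("conspiracy", "https://example.com/story"),
    ("non-conspiracy", "https://example.org/report"),
]
unique = unique_urls_by_label(shares, lookup)
```

Normalizing before counting matters for the comparison above: without it, the same article shared through a shortener, a mobile link, and a tracked link would be counted as three distinct sources.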
Table 11. This Table Shows the Top 15 Hashtags Found in Conspiracy and Non-Conspiracy Tweets That Are Not a Variant of the Pandemic’s Name.

Figure 8b shows the trend for the average number of URLs per tweet by conspiracy label. The average number of URLs per non-conspiracy tweet steadily increases as the pandemic progresses. Over the same period, we find a sharp decline and then a slight rise in the average number of URLs per conspiracy tweet. The increase in URL usage for non-conspiracy tweets may result from more credible information about COVID-19 reaching news sources.

Comparing domains, we find 5,485 unique domains in conspiracy tweets compared to 10,178 unique domains in non-conspiracy tweets. There appears to be a more concentrated group of outside sources shared in conspiracy tweets. The Carnegie Mellon University Center for Computational Analysis of Social and Organizational Systems (CASOS) research group maintains a thesaurus of media sources that provides a label (real news, fake news, etc.) and a factual rating (1: very low to 6: very high). We calculated the domain factual rating weighted average for domains found in conspiracy and non-conspiracy tweets and find that the conspiracy average is 3.27 and the non-conspiracy average is 4.66. Table 12 displays the top 10 domains found in each tweet label category. In addition, we found 168 domains considered conspiracy or fake news in conspiracy tweets and found 80 fake news/conspiracy domains in non-conspiracy tweets.

Limitations and Future Work

One limitation of our study is that in the training data, the ratio of class labels is 7 (non-conspiracy): 3 (conspiracy), representing a slight imbalance in the training data. The imbalance could lead to non-optimized results for the under-represented class because the model never gets a good look at that class. In addition, due to the timing of our data collection, our data primarily focus on conspiracy theories related to the origins of COVID-19. Future work using late 2020–2021 data could be used to analyze conspiracy theories related to the COVID-19 vaccines.

A limitation related to the model itself is that because the underlying pre-trained language model for text embedding and the training data are linked explicitly to COVID-19 text, our model may not generalize to other topics like the 2020 US election. The final limitation is related to the data collection of tweets for analysis. By collecting only English language tweets, we could have potentially introduced some bias in our downstream tasks, such as our location analysis.

Future work should continue analyzing COVID-19 data because strategies for disseminating conspiracy theories online may change over the course of the pandemic. This work could include building a more balanced training dataset to include tweets discovered during this study. In addition, creating more diverse training datasets to help new models generalize beyond COVID-19 would be useful. Another avenue for future analysis would be to apply our approach to a more comprehensive collection of tweets beyond the English language only.

Conclusion

Our usage of the BERT model (Devlin et al., 2018; Müller et al., 2020) tuned to classify the COVID-19 tweets as conspiratorial or not helped us quickly analyze large amounts of data. When comparing conspiracy and non-conspiracy labeled tweets, we found several significant differences in hashtag and URL usage, bot behavior, and the user types in each group.

Overall, we have four main findings:

   1. Language model success: The BERT-based model (Devlin et al., 2018; Müller et al., 2020) tuned with
      COVID-19 discourse was able to rapidly and accurately classify conspiracy-related tweets.
   2. User identities: Celebrities and normal users were the most prevalent identity group in the conspiracy group, and the United States originated and spread a disproportionate amount of pandemic-related misinformation.
   3. Bot strategies: Bots were more prevalent in the conspiracy group and were more likely to link to prominent Twitter users compared with bots in the non-conspiracy group. This strategy is also a way to build their community.
   4. Credibility differences: Users in the conspiracy group linked to less credible sources and used fewer unique URLs and hashtags, perhaps as a way to more effectively build groups and spread their message.

A long-term goal of our research is to develop an NLP-based system capable of solving a wide range of dis-/mis-information classification tasks (not just conspiracy theories). We have shown that the BERT-based model is scalable, fast, and effective in classifying conspiracy theories specifically, separate from other types of dis-/mis-information. The success of the BERT-based model in this context gives further evidence to language models’ effectiveness in various applications and datasets. Future work should apply language models to other important dis-/mis-information classification tasks.

Figure 8. This figure provides two views of URL usage for conspiracy and non-conspiracy tweets. (a) Unique URLs by month and conspiracy label. (b) Average number of URLs per tweet by month.
Table 12. This Table Shows the Top 10 URL Domains Found in Conspiracy and Non-Conspiracy Tweets.
   We also found that those in the conspiracy group were more likely to be bot accounts and originate from the United States. Disingenuous actors may be employing bot accounts to help spread their message more effectively in an automated or semi-automated fashion. Social media companies often target unauthorized bots for removal, and this study shows that this approach may be effective. Understanding where these conspiracy theories originate can help social media companies and public health officials appropriately target their prevention or response efforts.
   Compared with non-conspiracy tweets, we found fewer unique URLs and domains in the conspiracy-labeled tweets. Conspiracy-related tweets also contained fewer unique hashtags than non-conspiracy tweets, but the average hashtag usage per tweet was higher. Using a smaller set of unique hashtags can help consolidate content and make it easier to find for interested users. The presence of fewer unique URLs in the conspiracy dataset may be because either more real-news domains currently exist or conspiracy theorists consolidate on certain web domains to, again, make their content easier to find. These results show the importance of community building when propagating conspiracy theories and that conspiracy theory promoters are effective communicators. Therefore, de-emphasizing or countering their commonly used hashtags and URLs in social media search results could be an effective policy response by social media companies.
   Leveraging language models was crucial for quickly analyzing conspiracy theories in the COVID-19 pandemic. We found that the user types and communication strategies of those in the conspiracy group differed noticeably from regular users. This type of analysis can be used going forward for responding to real-time events where it is vital to know who is promoting conspiracy theories, what strategies they are using, and how we can best apply potential counter-measures.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research for this article was supported in part by the Office of Naval Research (ONR) under grants N00014182106 and N000141812108, the Knight Foundation, the US Army, and by the Center for Informed Democracy and Social-cybersecurity (IDeaS). The views and conclusions are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Knight Foundation, the ONR, the US Army, or the US Government.

ORCID iDs
J. D. Moffitt https://orcid.org/0000-0001-6477-8338
Catherine King https://orcid.org/0000-0002-1636-9887

Author Biographies
J. D. Moffitt is a Societal Computing PhD student in the Institute for Software Research at Carnegie Mellon University. At the Naval Postgraduate School, he earned his MS in Operations Research. His research interests include applications of natural language processing to identify and understand disinformation and applications of dynamic network analysis and machine learning to identify and mitigate influence operations.

Catherine King is a Societal Computing PhD student in the Institute for Software Research at Carnegie Mellon University. At the College of William & Mary, she earned both her MS in Computational Operations Research and her BS in Mathematics with a minor in Computer Science. Her research focuses on the societal impact of misinformation and polarization, including their impact on elections and public policy.

Kathleen M. Carley (PhD Harvard, HD University of Zurich) is a Professor of Computer Science in the Institute for Software Research, IEEE Fellow, Director of the Center for Computational Analysis of Social and Organizational Systems (CASOS), and Director of the Center for Informed Democracy And Social-cybersecurity (IDeaS) at Carnegie Mellon University, and CEO of Netanomics. She is the recipient of the USGA Academic Award at GEOINT 2018 for her work on geo-spatially enabled dynamic network analytics, the Allen Newell award for research excellence, the Lifetime Achievement Award from the Sociology and Computers Section of the ASA (2001), and the Simmel Award for advances in social networks from INSNA (2011). Her research combines cognitive science, sociology, and computer science to address complex social and organizational issues. Her pioneering research led to the areas of computational social science, dynamic network analysis, and social cybersecurity. She has over 400 publications and has served on multiple National Academies panels.