
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 6149–6157

Marseille, 11–16 May 2020


© European Language Resources Association (ELRA), licensed under CC-BY-NC

r/Fakeddit:
A New Multimodal Benchmark Dataset for
Fine-grained Fake News Detection
Kai Nakamura*¶, Sharon Levy*§, William Yang Wang§
¶Laguna Blanca School
§University of California, Santa Barbara
kai.nakamura42@[Link], {sharonlevy, william}@[Link]

Abstract
Fake news has altered society in negative ways, in politics and culture. It has adversely affected both online social
network systems and offline communities and conversations. Using automatic machine learning classification models is an efficient way to combat
the widespread dissemination of fake news. However, a lack of effective, comprehensive datasets has been a problem for fake news
research and detection model development. Prior fake news datasets do not provide multimodal text and image data, metadata, comment
data, and fine-grained fake news categorization at the scale and breadth of our dataset. We present Fakeddit, a novel multimodal dataset
consisting of over 1 million samples from multiple categories of fake news. After being processed through several stages of review, the
samples are labeled according to 2-way, 3-way, and 6-way classification categories through distant supervision. We construct hybrid
text+image models and perform extensive experiments for multiple variations of classification, demonstrating the importance of the
novel aspect of multimodality and fine-grained classification unique to Fakeddit.

Keywords: fake news, machine learning, multimodal

1. Introduction

Within our progressively digitized society, the spread of fake news and disinformation has grown in journalism, news reporting, social media, and other forms of online information consumption. False information from these sources has, in turn, caused many problems, such as spurring irrational fears during medical outbreaks like Ebola¹. The dissemination and consequences of fake news are worsening due to the rise of popular social media applications and other online sources with inadequate fact-checking or third-party filtering, which enable any individual to broadcast fake news easily and at a large scale (Allcott and Gentzkow, 2017). Though steps have been taken to detect and eliminate fake news, it still poses a dire threat to society (Dreyfuss and Lapowsky, 2019). According to a Pew Research Center report², 50% of Americans view fake news as a critical problem, placing it above violent crime. In addition, the report found that 68% of Americans view fake news as having a significant impact on their confidence in the government, and 54% viewed it as having a large impact on their trust in one another. As such, research in fake news detection is of high importance for society.

To build a fake news detection model, one must obtain sizable and diverse training data. Several datasets have been published within this area of research, but they have many constraints: limited size, modality, and granularity. Most conventional fake news research and datasets, such as LIAR (Wang, 2017) and Some-Like-It-Hoax (Tacchini et al., 2017), focus solely on text data. However, online information today is also consumed through multimedia sources, including images, which often supplement the text. In addition, many datasets are small in size and variation. For example, Abu Salem et al. (2019) aim to increase the diversity of fake news by covering news beyond the scope of conventional American political news, but their dataset consists of fewer than 1,000 samples, limiting the extent to which it can contribute to fake news research. Moreover, many conventional datasets label their data binarily (true and false), although fake news can be categorized into many different types (Wardle, 2017). These problems significantly affect the quality of fake news research and detection.

We overcome these limitations with the dataset we propose: Fakeddit³, a novel multimodal fake news detection dataset consisting of over 1 million samples with 2-way, 3-way, and 6-way classification labels, along with comment data and metadata. We sourced our data from multiple subreddits on Reddit⁴. Our dataset will expand fake news research into the multimodal space and allow researchers to develop stronger, more generalized, fine-grained fake news detection systems. We provide examples from our dataset in Figure 1. Our contributions to the study of fake news detection are:

• We create a large-scale multimodal fake news dataset consisting of over 1 million samples containing text, image, metadata, and comment data from a highly diverse set of resources.

• Each data sample carries multiple labels, allowing users to utilize the dataset for 2-way, 3-way, and 6-way classification. This enables both high-level and fine-grained fake news classification. Samples are also thoroughly refined through multiple steps of quality assurance.

• We evaluate our dataset through text, image, and text+image modes with neural network architectures that integrate both the image and text data. We run experiments for several types of baseline models, providing a comprehensive overview of classification results and demonstrating the significance of the multimodality present in Fakeddit.

* Equal Contribution.
¹ [Link]
² [Link]
³ [Link]
⁴ [Link]

Figure 1: Dataset examples with 6-way classification labels.

2. Related Work

A variety of datasets for fake news detection have been published in recent years. These are listed in Table 1, along with their specific characteristics.

2.1. Text Datasets

When comparing fake news datasets, a few trends can be seen. Most of the datasets are small, which can be ineffective for current machine learning models that require large quantities of training data. Only four datasets contain over half a million samples, with CREDBANK (Mitra and Gilbert, 2015) and FakeNewsCorpus⁵ being the largest, both containing millions of samples. In addition, many of the datasets separate their data into a small number of classes, such as fake vs. true. Datasets such as NELA-GT-2018 (Nørregaard et al., 2019), LIAR (Wang, 2017), and FakeNewsCorpus provide more fine-grained labels. While some datasets include data from a variety of categories (Zubiaga et al., 2016), many contain data from specific areas, such as politics and celebrity gossip (Tacchini et al., 2017; Pathak and Srihari, 2019; Shu et al., 2018; Abu Salem et al., 2019; Santia and Williams, 2018)⁶. These data samples may contain limited scopes of context and styles of writing due to their limited number of categories.

2.2. Image Datasets

Most of the existing fake news datasets collect only text data. However, fake news can also come in the form of images. Existing fake image datasets are limited in size and diversity, making dataset research in this area important. Image features supply models with more data that can help immensely in identifying fake images and news that have image data. We analyze three traditional fake image datasets that have been published. The Image Manipulation dataset (Christlein et al., 2012) contains self-taken manipulated images for image manipulation detection. The PS-Battles dataset (Heller et al., 2018) is an image dataset containing manipulated image derivatives from one subreddit. We expand upon the size and scope of the data provided from the same subreddit in the PS-Battles dataset by extending the size and time range as well as including text data and other metadata. This expanded data makes up only two of the 22 sources of data present in our research. The image-verification-corpus (Boididou et al., 2018), like ours, contains both text and image data. While it does contain a larger number of samples than other conventional datasets, it still pales in comparison to Fakeddit.

2.3. Fact-Checking

Due to its unique multimodality, Fakeddit can also be applied to the realm of implicit fact-checking. Other existing datasets utilized for fact-checking include FEVER (Thorne et al., 2018) and Fauxtography (Zlatkova et al., 2019). The former consists of altered claims utilized for textual verification. The latter utilizes both text and image data in order to fact-check claims about images. Using both text and image data, researchers can use Fakeddit for verifying truth and proof: utilizing image data as evidence for text truthfulness, or using text data as evidence for image truthfulness.

Compared to other existing datasets, Fakeddit provides a larger breadth of novel features that can be applied in a number of applications: fake news text, image, and text+image classification, as well as implicit fact-checking. Other data provided, such as comment data, enable further applications.

⁵ [Link]
⁶ [Link]

Dataset Size (# of samples) # of Classes Modality Source Data Category
LIAR 12,836 6 text Politifact political
FEVER 185,445 3 text Wikipedia variety
BUZZFEEDNEWS 2,282 4 text Facebook political
BUZZFACE 2,263 4 text Facebook political
some-like-it-hoax 15,500 2 text Facebook scientific/conspiracy
PHEME 330 2 text Twitter variety
CREDBANK 60,000,000 5 text Twitter variety
Breaking! 700 2,3 text BS Detector political
NELA-GT-2018 713,000 8 IA text 194 news outlets variety
FAKENEWSNET 602,659 2 text Twitter political/celebrity
FakeNewsCorpus 9,400,000 10 text [Link] variety
FA-KES 804 2 text 15 news outlets Syrian war
Image Manipulation 48 2 image self-taken variety
Fauxtography 1,233 2 text, image Snopes, Reuters variety
image-verification-corpus 17,806 2 text, image Twitter variety
The PS-Battles Dataset 102,028 2 image Reddit manipulated content
Fakeddit (ours) 1,063,106 2,3,6 text, image Reddit variety

Table 1: Comparison of various fake news detection datasets. IA: Individual assessments.

3. Fakeddit

3.1. Data Collection

We sourced our dataset from Reddit, a social news and discussion website where users can post submissions on various subreddits. Reddit is one of the top 20 websites in the world by traffic⁷. Each subreddit has its own theme. For example, 'nottheonion' is a subreddit where people post seemingly false stories that are surprisingly true. Active Reddit users are able to upvote, downvote, and submit comments on the submissions.

Fakeddit consists of over 1 million submissions from 22 different subreddits; the specific subreddits can be found in the Appendix. As depicted in Table 2, the samples span almost a decade and were posted on highly active and popular pages by over 300,000 unique users, allowing us to capture a wide variety of perspectives. Having a decade's worth of recent data allows machine learning models to stay attuned to contemporary cultural-linguistic patterns and current events. Our data also varies in content because of the array of chosen subreddits, ranging from political news stories to simple everyday posts by Reddit users.

Submissions were collected with the [Link] API⁸, the earliest submission being from March 19, 2008, and the most recent from October 24, 2019. We gathered the submission title and image, comments made by users who engaged with the submission, and other submission metadata, including the score, the username of the author, the subreddit source, the sourced domain, the number of comments, and the upvote-to-downvote ratio.

From the photoshopbattles subreddit, we treated both submission and comment data as submission data. In the photoshopbattles subreddit, users post submissions that contain true images. Other users then manipulate these submission images and post the doctored images as comments on the submission's page. These comments also contain text data that relates to or describes the image. We harvest these comments from the photoshopbattles subreddit and treat them as submission data to incorporate in our submission dataset, significantly contributing to the total number of multimodal samples. Approximately 64% of the samples in our dataset contain both text and images. These multimodal samples are used for our baseline experiments and error analysis.

⁷ [Link]
⁸ [Link]

Table 2: Fakeddit dataset statistics

Total samples: 1,063,106
Fake samples: 628,501
True samples: 527,049
Multimodal samples: 682,996
Subreddits: 22
Unique users: 358,504
Unique domains: 24,203
Timespan: 3/19/2008 - 10/24/2019
Mean words per submission: 8.27
Mean comments per submission: 17.94
Vocabulary size: 175,566
Training set size: 878,218
Validation set size: 92,444
Released test set size: 92,444
Unreleased set size: 92,444

3.2. Quality Assurance

Because our dataset contains over one million samples, it is crucial to make sure that it contains reliable data. To do so, we have several levels of data processing. The first is provided through the subreddit pages. Each subreddit has moderators who ensure that submissions pertain to the subreddit theme; their job is to remove posts that violate any rules. As a result, the data goes through its first round of refinement. The next stage occurs when we start collecting the data. In this phase, we utilize Reddit's upvote/downvote score feature. This feature is intended not only to signify other users' approval of a post, but also to indicate that a post does not contribute to the subreddit's theme or is off-topic if it has a low score⁹. As such, we filtered out any submissions with a score of less than 1 to further ensure that our data is credible. We assume that invalid or irrelevant posts within a subreddit would be either removed or down-voted to a score of less than 1. The high popularity of the Reddit website makes this step particularly effective, as thousands of individual users can give their opinion of the quality of various submissions.
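The score-based filtering step is straightforward. The sketch below assumes a hypothetical list-of-dicts layout for collected submissions; the field names are illustrative, not the actual schema of the collection API.

```python
# Minimal sketch of the score-based quality filter described above.
# `submissions` uses hypothetical field names; the real collection
# schema is not reproduced here.

def filter_by_score(submissions, min_score=1):
    """Keep only submissions whose community score is at least min_score."""
    return [s for s in submissions if s.get("score", 0) >= min_score]

submissions = [
    {"id": "a1", "title": "volcanic eruption last night", "score": 57},
    {"id": "a2", "title": "low-effort spam", "score": 0},
    {"id": "a3", "title": "off-topic post", "score": -4},
]

kept = filter_by_score(submissions)
print([s["id"] for s in kept])  # ['a1']
```

Posts removed by moderators never reach the collection step, so this filter only has to enforce the score criterion.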
Figure 2: Distributions of word length in the Fakeddit and FEVER datasets. We exclude samples that have more than 100 words.

Our final degree of quality assurance is done manually and occurs after the previous two stages. We randomly sampled 10 posts from each subreddit to determine whether the submissions really do pertain to the subreddit's theme. If any of the 10 samples deviated from it, we removed the subreddit from our list. After this filtering, we ended up with 22 subreddits in our processed data. When labeling our dataset, we labeled each sample according to its subreddit's theme. These labels were determined during the last processing phase, as we were able to look through many samples for each subreddit. Each subreddit is labeled with one 2-way, 3-way, and 6-way label. Lastly, we cleaned the submission title text: we removed all punctuation, numbers, and revealing words such as "PsBattle" and "colorized" that automatically reveal the subreddit source. For the savedyouaclick subreddit, we removed the text following the "|" character and classified the submission as misleading content. We also converted all the text to lowercase.
As mentioned above, we do not manually label each sample and instead label our samples based on their respective subreddit's theme. In doing so, we employ distant supervision, a commonly used technique, to create our final labels. While this may create some noise within the dataset, we aim to remove it from our pseudo-labeled data. By going through these stages of quality assurance, we can determine that our final dataset is credible and that each subreddit's label accurately identifies the posts it contains. We test this by randomly sampling 150 text-image pairs from our dataset and having two of our researchers individually and manually label them for 6-way classification. It is difficult to narrow down each sample to exactly one subcategory, especially for those not working in the journalism industry. We achieve a Cohen's Kappa coefficient (Cohen, 1960) of 0.54, showing moderate agreement and indicating that some samples may represent more than one label. While we only provide each sample with one 6-way label, future work can help identify multiple labels for each text-image pair.

Figure 3: Type-caption curve of Fakeddit vs. FEVER with 4-gram type.

3.3. Labeling

We provide three labels for each sample, allowing us to train for 2-way, 3-way, and 6-way classification. Having this hierarchy of labels will enable researchers to train for fake news detection at a high level or at a more fine-grained one. The 2-way classification determines whether a sample is fake or true. The 3-way classification determines whether a sample is completely true, fake but with text that is true (e.g., direct quotes from propaganda posters), or fake with false text. Our final 6-way classification was created to categorize different types of fake news rather than performing a simple binary or ternary classification. This can help in pinpointing the degree and variation of fake news for applications that require this type of fine-grained detection. In addition, it will enable researchers to focus on specific types of fake news classification if they desire; for example, focusing on satire only. For the 6-way classification, the first label is true and the other five are defined within the seven types of fake news (Wardle, 2017). Only five types of fake news were chosen, as we did not find subreddits with posts aligning with the remaining two types. We provide examples from each class for 6-way classification in Figure 1. The 6-way classification labels are explained below:

True: True content is accurate in accordance with fact. Eight of the subreddits fall into this category, such as usnews and mildlyinteresting. The former consists of posts from various news sites. The latter encompasses real photos with accurate captions.

Satire/Parody: This category consists of content that spins true contemporary content with a satirical tone or with information that makes it false. One of the four subreddits that make up this label is theonion, with headlines such as "Man Lowers Carbon Footprint By Bringing Reusable Bags Every Time He Buys Gas".

Misleading Content: This category consists of information that is intentionally manipulated to fool the audience. Our dataset contains three subreddits in this category.

Imposter Content: This category contains two subreddits of bot-generated content, trained on a large number of other subreddits.

False Connection: Submission images in this category do not accurately support their text descriptions. We have four subreddits with this label, containing posts of images with captions that do not relate to the true meaning of the image.

Manipulated Content: Content that has been purposely manipulated through manual photo editing or other forms of alteration. The photoshopbattles subreddit comments (not submissions) make up the entirety of this category. Samples contain doctored derivatives of images from the submissions.

⁹ [Link]

3.4. Dataset Analysis

In Table 2, we provide an overview of specific statistics pertaining to our dataset, such as vocabulary size and number of unique users. We also provide a more in-depth analysis in comparison to another sizable dataset, FEVER.

First, we examine the word lengths of our text data. Figure 2 shows the proportion of samples per text length for both Fakeddit and FEVER. Our dataset contains a higher proportion of longer texts starting from word lengths of around 17, while FEVER's captions peak at around 10 words. In addition, while FEVER's peak is very sharp, Fakeddit has a much smaller and more gradual slope. Fakeddit also provides a broader diversity of text lengths, with samples containing almost 100 words, whereas FEVER's longest texts stop at fewer than 70 words.

Secondly, we examine the linguistic variety of our dataset by computing the type-caption curve, as defined in (Wang et al., 2019). Figure 3 shows these results. Fakeddit provides significantly more lexical diversity: even though Fakeddit contains more samples than FEVER, the number of unique n-grams contained in similarly sized samples is still much higher than within FEVER. These effects will be magnified, as Fakeddit contains more than 5 times as many total samples as FEVER. In Table 3, we show the number of unique n-grams for both datasets when sampling n samples, where n is equal to FEVER's dataset size. This demonstrates that for all n-gram sizes, our dataset is more lexically diverse than FEVER's for equal sample sizes.

Table 3: Unique n-grams for FEVER and Fakeddit for equal sample size (FEVER's total dataset size).

Dataset   1-gram  2-gram   3-gram   4-gram
FEVER     40,874  179,525  315,025  387,093
Fakeddit  61,141  507,512  767,281  755,929

These salient text features (longer text lengths, a broad array of text lengths, and higher linguistic variety) highlight Fakeddit's diversity. This diversity can strengthen fake news detection systems by increasing their lexical scope.

4. Experiments

4.1. Fake News Detection

Multiple methods were employed for text and image feature extraction. We used InferSent (Conneau et al., 2017) and BERT (Devlin et al., 2019) to generate text embeddings for the titles of the Reddit submissions. VGG16 (Simonyan and Zisserman, 2015), EfficientNet (Tan and Le, 2019), and ResNet50 (He et al., 2016) were utilized to extract features from the Reddit submission thumbnails.

We used the InferSent model because it performs very well as a universal sentence embedding generator. For this model, we loaded a vocabulary of the 1 million most common words in English and used fastText embeddings (Joulin et al., 2017). We obtained encoded sentence features of length 4096 for each submission title using InferSent.

In addition, we used the BERT model. BERT achieves state-of-the-art results on many classification tasks, including question answering and named entity recognition. To obtain fixed-length BERT embedding vectors, we used the bert-as-service (Xiao, 2018) tool to map variable-length text into a 768-element array for each Reddit submission title. For our experiments, we utilized the pretrained BERT-Large, Uncased model.

We employed the VGG16, ResNet50, and EfficientNet models for encoding images. VGG16 and ResNet50 are widely used by many researchers, while EfficientNet is a relatively newer model. For EfficientNet, we used the B4 variant, chosen because it is comparable to ResNet50 in FLOP count. For the image models, we preloaded weights of models trained on ImageNet, included the top layer, and used the penultimate layer for feature extraction.

Figure 4: Multimodal model for integrating text and image data for 2-, 3-, and 6-way classification. n, the hidden layer size, is tuned for each model instance through hyperparameter optimization.

4.2. Experiment Settings

As mentioned in Section 3.2, the text was cleaned thoroughly through a series of steps. We also prepared the images by constraining their sizes to match the input sizes of the image models and applied the image preprocessing required by each model.

6153
2-way 3-way 6-way
Type Text Image Validation Test Validation Test Validation Test
Text BERT – 0.8654 0.8644 0.8582 0.8580 0.7696 0.7677
InferSent – 0.8634 0.8631 0.8569 0.8570 0.7652 0.7666
Image – VGG16 0.7355 0.7376 0.7264 0.7293 0.6462 0.6516
– EfficientNet 0.6115 0.6087 0.5877 0.5828 0.4152 0.4153
– ResNet50 0.8043 0.8070 0.7966 0.7988 0.7529 0.7549
Text+Image InferSent VGG16 0.8655 0.8658 0.8618 0.8624 0.8130 0.8130
InferSent EfficientNet 0.8328 0.8339 0.8259 0.8256 0.7266 0.7280
InferSent ResNet50 0.8888 0.8891 0.8855 0.8863 0.8546 0.8526
BERT VGG16 0.8694 0.8699 0.8644 0.8655 0.8177 0.8208
BERT EfficientNet 0.8334 0.8318 0.8265 0.8255 0.7258 0.7272
BERT ResNet50 0.8929 0.8909 0.8905 0.8890 0.8600 0.8588

Table 4: Results on fake news detection for 2, 3, and 6-way classification with combination method of maximum.

2-way 3-way 6-way
Combination Methods Validation Test Validation Test Validation Test
Add 0.8551 0.8551 0.8509 0.8505 0.8206 0.8235
Concatenate 0.8564 0.8568 0.8531 0.8530 0.8237 0.8249
Maximum 0.8929 0.8909 0.8905 0.8890 0.8600 0.8588
Average 0.8554 0.8561 0.8512 0.8518 0.8229 0.8242

Table 5: Results on different multi-modal combinations for BERT + ResNet50
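As a minimal NumPy sketch of the fusion scheme behind Tables 4 and 5: both feature vectors are projected to n elements by a dense layer, merged element-wise (the "maximum" method shown here performed best), and passed to a softmax classifier. The 2048-dimensional ResNet50 feature size is the standard penultimate-layer width and is assumed here; the weights are random stand-ins for trained parameters.

```python
import numpy as np

# Untrained NumPy sketch of the text+image fusion classifier:
# project each modality to n units, merge element-wise, classify.
# Weight matrices are random placeholders for trained parameters.

rng = np.random.default_rng(0)
n, num_classes = 224, 6                  # best 6-way run used a 224-unit hidden layer
W_text = rng.normal(size=(768, n))       # BERT title features are 768-d
W_img = rng.normal(size=(2048, n))       # ResNet50 penultimate features (assumed 2048-d)
W_out = rng.normal(size=(n, num_classes))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict(text_feat, img_feat):
    # "maximum" merge of the two projected feature vectors
    merged = np.maximum(text_feat @ W_text, img_feat @ W_img)
    return softmax(merged @ W_out)

probs = predict(rng.normal(size=(1, 768)), rng.normal(size=(1, 2048)))
print(probs.shape)  # (1, 6); each row sums to 1
```

Swapping `np.maximum` for addition, concatenation (with a wider output weight matrix), or averaging reproduces the other merge methods compared in Table 5.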

For our experiments, we excluded submissions that have by text-only, and image-only. Thus, image and text multi-
either text or image data missing. We performed 2-way, modality present in our dataset significantly improves fake
3-way, and 6-way classification for each of the three types news detection. The “maximum” method to merge image
of inputs: image only, text only, and multimodal (text and and text features yielded the highest accuracy. Overall, the
image). As in Figure 4, when combining the features in multimodal model that combined BERT text features and
multimodal classification, we first condensed them into n- ResNet50 image features through the maximum method
element vectors through a trainable dense layer and then performed most optimally. The best 6-way classification
merged them through four different methods: add, concate- model parameters were: hidden layer sizes of 224 units,
nate, maximum, average. These features were then passed 1e-4 learning rate, trained over 20 epochs.
through a fully connected softmax predictor. For all ex-
periments, we tuned the hyperparameters on the validation 5. Error Analysis
dataset using the keras-tuner tool10 . Specifically, we em-
We conduct an error analysis on our 6-way detection model
ployed the Hyperband tuner (Li and Jamieson, 2018) to
by examining samples from the test set that the model pre-
find optimal hyperparameters for the hidden layer size and
dicted incorrectly. A subset of these samples is shown in
learning rates. The hyperparameters are tuned on the val-
Table 6. Firstly, the model had the most difficult time iden-
idation set. We varied the number of units in the hidden
tifying imposter content. This category contains subreddits
layer from 32 to 256 with increments of 32. For the opti-
that contain machine-generated samples. Recent advances
mizer, we used Adam (Kingma and Ba, 2014) and tested
in machine learning such as Grover (Zellers et al., 2019),
three learning rate values: 1e-2, 1e-3, 1e-4. For the mul-
a model that produces realistic-looking machine-generated
timodal model, the unit size hyperparameter affected the
news articles, have allowed machines to automatically gen-
sizes of the 3 layers simultaneously: the 2 layers that are
eration human-like material. Our model has a relatively
combined and the layer that is the result of the combina-
difficult time identifying these samples. The second cate-
tion. For non-multimodal models, we utilized a single size-
gory the model had the poorest performance on was satire
tunable hidden layer, followed by a softmax predictor. For
samples. The model may have a difficult time identifying
each model, we specified a maximum of 20 epochs and an
satire because creators of satire tend to focus on creating
early stopping callback to halt training if the validation ac-
content that seems similar to real news if one does not have
curacy decreased.
a sufficient level of contextual knowledge. Classifying the
4.3. Results data into these two categories (imposter content and satire)
The results are shown in Tables 4 and 5. For image and mul- are complex challenges, and our baseline results show that
timodal classification, ResNet50 performed the best fol- there is significant room for improvement in these areas.
lowed by VGG16 and EfficientNet. In addition, BERT On the other hand, the model was able to correctly classify
achieved better results than InferSent for multimodal classi- almost all manipulated content samples. We also found that
fication. Multimodal features performed the best, followed misclassified samples were frequently categorized as being
true. This can be attributed to the relative size of true sam-
10
[Link] ples in the 6-way classification. While we have compara-

6154
Text Image Predicted Label Gold Label PM(%)

volcanic eruption in
False Connection True 17.9
bali last night

nascar race stops to


wait for family of ducks True Satire 32.8
to pass

cars race towards nu-


True False Connection 17.8
clear explosion

bear experiences get-


ting hit in the cinema Satire Imposter Content 55.7
rule, your child again

three corgis larping at


True Manipulated Content 3.3
the beach

mighty britain getting


tied down in south
False Connection Misleading Content 16.9
africa during boer bar
circa

Table 6: Classification errors on the BERT+ResNet50 model for 6-way classification. PM: Proportion of samples misclas-
sified within each Gold label.

ble sizes of fake and true samples for 2-way classification, 6-way breaks down the fake news into more fine-grained classes. As a result, the model trains on a higher number of true samples and may be inclined to predict this label.

6. Conclusion

In this paper, we presented Fakeddit, a novel dataset for fake news research. Compared to previous datasets, Fakeddit provides a large number of multimodal samples with multiple labels for various levels of fine-grained classification. We conducted several experiments with multiple baseline models and performed an error analysis on our results, highlighting the importance of the large-scale multimodality unique to Fakeddit and demonstrating that there is still significant room for improvement in fine-grained fake news detection. Our dataset has wide-ranging applications in fake news research and other research areas. Although we do not utilize submission metadata and the comments made by users on the submissions, we anticipate that these additional multimodal features will be useful for further fake news research. For example, future research can track a user's credibility using the provided metadata and comment data, and can incorporate video data as another multimedia source. Implicit fact-checking research with an emphasis on image-caption verification can also be conducted using our dataset's unique multimodality. We hope that our dataset can be used to advance efforts to combat the ever-growing spread of disinformation in today's society.

Acknowledgments

We would like to acknowledge Facebook for the Online Safety Benchmark Award. The authors are solely responsible for the contents of the paper, and the opinions expressed in this publication do not reflect those of the funding agencies.

7. Bibliographical References

Abu Salem, F. K., Al Feel, R., Elbassuoni, S., Jaber, M., and Farah, M. (2019). FA-KES: A fake news dataset around the Syrian war. Proceedings of the International AAAI Conference on Web and Social Media, 13(01):573–582, Jul.

Allcott, H. and Gentzkow, M. (2017). Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2):211–36, May.

Boididou, C., Papadopoulos, S., Zampoglou, M., Apostolidis, L., Papadopoulou, O., and Kompatsiaris, Y. (2018). Detection and visualization of misleading content on Twitter. International Journal of Multimedia Information Retrieval, 7(1):71–86.

Christlein, V., Riess, C., Jordan, J., Riess, C., and Angelopoulou, E. (2012). An evaluation of popular copy-move forgery detection approaches. IEEE Transactions on Information Forensics and Security, 7(6):1841–1854.

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46.

Conneau, A., Kiela, D., Schwenk, H., Barrault, L., and Bordes, A. (2017). Supervised learning of universal sentence representations from natural language inference data. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 670–680, Copenhagen, Denmark, September. Association for Computational Linguistics.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June. Association for Computational Linguistics.

Dreyfuss, E. and Lapowsky, I. (2019). Facebook is changing news feed (again) to stop fake news. Wired.

He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.

Heller, S., Rossetto, L., and Schuldt, H. (2018). The PS-Battles Dataset – an Image Collection for Image Manipulation Detection. CoRR, abs/1804.04866.

Joulin, A., Grave, E., Bojanowski, P., and Mikolov, T. (2017). Bag of tricks for efficient text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 427–431, Valencia, Spain, April. Association for Computational Linguistics.

Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Li, L. and Jamieson, K. (2018). Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research, 18:1–52.

Mitra, T. and Gilbert, E. (2015). CREDBANK: A large-scale social media corpus with associated credibility annotations. In Ninth International AAAI Conference on Web and Social Media.

Nørregaard, J., Horne, B. D., and Adali, S. (2019). NELA-GT-2018: A large multi-labelled news dataset for the study of misinformation in news articles. Proceedings of the International AAAI Conference on Web and Social Media, 13(01):630–638, Jul.

Pathak, A. and Srihari, R. (2019). BREAKING! Presenting fake news corpus for automated fact checking. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 357–362, Florence, Italy, July. Association for Computational Linguistics.

Santia, G. C. and Williams, J. R. (2018). BuzzFace: A news veracity dataset with Facebook user commentary and egos. In Twelfth International AAAI Conference on Web and Social Media.

Shu, K., Mahudeswaran, D., Wang, S., Lee, D., and Liu, H. (2018). FakeNewsNet: A data repository with news content, social context and dynamic information for studying fake news on social media. arXiv preprint arXiv:1809.01286.

Simonyan, K. and Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations.

Tacchini, E., Ballarin, G., Vedova, M. L. D., Moret, S., and de Alfaro, L. (2017). Some like it hoax: Automated fake news detection in social networks. CoRR, abs/1704.07506.

Tan, M. and Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks.

Thorne, J., Vlachos, A., Christodoulopoulos, C., and Mittal, A. (2018). FEVER: A large-scale dataset for fact extraction and VERification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 809–819, New Orleans, Louisiana, June. Association for Computational Linguistics.

Wang, X., Wu, J., Chen, J., Li, L., Wang, Y.-F., and Wang, W. Y. (2019). VaTeX: A large-scale, high-quality multilingual dataset for video-and-language research. In The IEEE International Conference on Computer Vision (ICCV), October.

Wang, W. Y. (2017). "Liar, liar pants on fire": A new benchmark dataset for fake news detection. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 422–426, Vancouver, Canada, July. Association for Computational Linguistics.

Wardle, C. (2017). Fake news. It's complicated. First Draft.

Xiao, H. (2018). bert-as-service. https://github.com/hanxiao/bert-as-service.

Zellers, R., Holtzman, A., Rashkin, H., Bisk, Y., Farhadi, A., Roesner, F., and Choi, Y. (2019). Defending against neural fake news. In Advances in Neural Information Processing Systems 32.

Zlatkova, D., Nakov, P., and Koychev, I. (2019). Fact-checking meets fauxtography: Verifying claims about images. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2099–2108, Hong Kong, China, November. Association for Computational Linguistics.

Zubiaga, A., Liakata, M., Procter, R., Hoi, G. W. S., and Tolmie, P. (2016). Analysing how people orient to and spread rumours in social media by looking at conversational threads. PLoS ONE, 11(3):e0150989.

Appendix

We show the list of subreddits used to construct Fakeddit in Table 7.

Subreddit | 6-Way Label | URL
photoshopbattles submissions | True | [Link]
nottheonion | True | [Link]
neutralnews | True | [Link]
pic | True | [Link]
usanews | True | [Link]
upliftingnews | True | [Link]
mildlyinteresting | True | [Link]
usnews | True | [Link]
fakealbumcovers | Satire | [Link]
satire | Satire | [Link]
waterfordwhispersnews | Satire | [Link]
theonion | Satire | [Link]
propagandaposters | Misleading Content | [Link]
fakefacts | Misleading Content | [Link]
savedyouaclick | Misleading Content | [Link]
misleadingthumbnails | False Connection | [Link]
confusing_perspective | False Connection | [Link]
pareidolia | False Connection | [Link]
fakehistoryporn | False Connection | [Link]
subredditsimulator | Imposter Content | [Link]
subsimulatorgpt2 | Imposter Content | [Link]
photoshopbattles comments | Manipulated Content | [Link]

Table 7: List of subreddits used in Fakeddit.
