FakeNews - NLP'24

The document discusses the concepts of misinformation, disinformation, and fake news, emphasizing their definitions, historical examples, and the motivations behind their spread. It outlines various methods for detecting fake news, including manual verification and technology-based solutions, while highlighting the challenges faced in the verification process. Additionally, it presents a structured approach to claim detection and verification, utilizing advanced models and datasets to combat misinformation on social media.

Uploaded by

sonu23144

Detecting Fake News and Misinformation in Social Media
Information Disorders
● Misinformation
● Disinformation
● Malinformation
● Rumor
● Fake News
● Propaganda
Information Disorders
● Two key ingredients:
○ Veracity: Falseness or Truthfulness
○ Intent: Malicious (intend to harm) or non-malicious
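The two ingredients above can be read as a small decision table. The sketch below is only an illustration: the function name is hypothetical, and the mapping follows the commonly used distinction between misinformation (false, not malicious), disinformation (false, malicious) and malinformation (true, malicious). It does not try to place rumor, propaganda or fake news, which need more context than two flags.

```python
def classify_information_disorder(is_false: bool, is_malicious: bool) -> str:
    """Map the two ingredients (veracity, intent) to a coarse category.

    Hypothetical helper for illustration; the category names follow the
    common misinformation / disinformation / malinformation distinction.
    """
    if is_false and is_malicious:
        return "disinformation"
    if is_false:
        return "misinformation"
    if is_malicious:
        return "malinformation"
    return "not an information disorder"


if __name__ == "__main__":
    for veracity_is_false in (True, False):
        for intent_is_malicious in (True, False):
            label = classify_information_disorder(veracity_is_false, intent_is_malicious)
            print(veracity_is_false, intent_is_malicious, label)
```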

[Figure: Venn diagram of two sets, Falseness and Malicious-Intent, positioning Disinformation, Propaganda, Rumor, Misinformation, Malinformation, and Fake News.]
What is Fake News?

● The term “fake news” refers to disinformation or false information reported as legitimate news.
● The spread of fake news is often motivated by various factors such as political
agendas, financial gain, or simply to provoke reactions and create confusion.
Fake News in History

● Mid-1700s: King on his deathbed. During the Jacobite rebellion, some individuals disloyal to King George II sought to undermine his rule by printing fake stories that his health was failing.
● 1835: The Moon hoax. The New York Sun newspaper runs a series of articles claiming that life has been discovered on the Moon. They attributed the discovery to a renowned astronomer, before later admitting the story to be a hoax.
● 1924: The Zinoviev Letter. Days before the 1924 UK General Election, the Daily Mail publishes a forged letter, purportedly from the Soviet government, suggesting a Labour victory would hasten the radicalisation of the working classes. Labour went on to lose the election.
● 1989: Hillsborough. In the aftermath of the disaster at Hillsborough football stadium, where 96 supporters died, The Sun newspaper published a series of misleading and fabricated stories, placing much of the blame on Liverpool fans.
● 2016: US presidential election. Stories proliferated on social media, targeting specific candidates or spreading sensationalist and misleading information to influence public opinion, including claims that Pope Francis had endorsed Donald Trump and false claims about Hillary Clinton's health.
Fighting Fake News

● Manual Verification
● Technology-Based Solutions
Manual Fact-Checking Organizations
Challenges in Manual Verification

1. High volume of data
2. Relevance to the world
3. Limited manpower
4. Not enough expertise on the topic
5. Language barrier
6. Repetitive claims
and so on…
Solution?

Fake News Detection System
Fake News Detection Datasets and competitions
Corpus | Labels | #Examples
BuzzfeedNews | mostly true, mostly false, mixture of true and false | 2,282
PHEME | true or false | 330
LIAR | pants-fire, false, barely-true, half-true, mostly-true, and true | 12,836
FakeNewsNet | fake or real | 23,921


Fake News Detection Methods

Four families of methods: Content Based, User Based, Network Based, and Hybrid.

● Content Based: Tweet/Article → Model → Fake or real
● Network Based: propagation network (nodes N1 … N5) → Model → Fake or real
● User Based: users (U1, U2, U3) linked to the network (nodes N1 … N5) → Model → Fake or real
● Hybrid: combines the content, user, and network views
Transparency of Fake News Detection Models

● Initially, we used sophisticated ML/DL models to detect fake news, and they often achieved good performance scores.
● However, predicting only the Fake or Real label may not always be sufficient.
● Why should we trust a model's prediction when it is prone to mispredicting labels?
● Arguably, even when the prediction is correct, we still look for evidence or justification for something being real or fake.

● Fake News Detection with Evidence

https://proceedings.neurips.cc/paper_files/paper/2022/file/3d57795f0e263aa69577f1bbceade46b-Paper-Conference.pdf

Evidence-based fake news detection.

Public Wisdom Matters! Discourse-Aware Hyperbolic Fourier Co-Attention for Social-Text Classification. Grover et al., NeurIPS 2022.
Concepts related to Fake News

Concept | Authenticity | Intention
Fake news | Non-factual | Mislead
Deceptive news | Non-factual | Mislead
Misinformation | Non-factual | Undefined
Claims | Commonly factual | Not bad
Rumors | Undefined | Undefined
Clickbait | Undefined | Mislead


Claims on Social Media
What is a “claim”?

“State or assert that something is the case, typically without providing evidence or proof.”
- Oxford Dictionary [33]

“A claim is an assertion that deserves our attention”
- Toulmin (2003) [34]

“A claim is a disputed statement that we try to support with reasons.”
- Govier (2013) [35]
Fake news and Claims

● Fake news is closely related to claims.
○ A news item that contains a claim is a worthy candidate for fake news.
● Fake News: ill-intentioned and always aimed at causing harm.
● Claim: may or may not be ill-intentioned.

Text | Claim | Fake
Alcohol cures corona. | Yes | Yes
Wearing mask can prevent corona. | Yes | No
Lord, please protect my family & the Philippines from the coronavirus. | No | No
If this corona scare doesn’t end soon imma have to intervene | No | No


Motivation
Why is detecting claims “so” difficult?

● Relatively unexplored domain
● Absence of unstructured datasets
● Lack of sufficient amount of data
● Highly difficult to differentiate from fake news and opinions.
1. Claim Detection: Does a post contain a claim or not? [Gupta et al., EACL-2021; Sundriyal et al., CIKM-2021]
2. Claim Span Identification: What are the claims in the post? [Sundriyal et al., EMNLP-2022]
3. Claim Check-worthiness: Is the claim worth verifying?
4. Claim Verification: Find evidence for the claim and decide its veracity.


Problem Statement: Claim Detection

Claim
RT @PirateAtLaw: No no no. Corona beer is the cure
not the disease. Please try this cure. I drink Corona
beer a lot!
https://t.co/fnba2fr2m2

Non-claim
Lord, please protect my family & the Philippines
from the coronavirus.

LESA: Linguistic Encapsulation and Semantic Amalgamation Based Generalised Claim Detection from Online Content (EACL'2021)
Linguistic Encapsulation and Semantic Amalgamation
1. Part of Speech (POS) Encoding

● Text sequence: w = w_1 w_2 w_3 w_4 … w_n
● Text to POS sequence: p = p_1 p_2 p_3 p_4 … p_n
● Extract sequence tri-grams: (p_1 p_2 p_3) (p_2 p_3 p_4) … (p_{n-2} p_{n-1} p_n)
● Tri-gram as an individual token: p_i p_{i+1} p_{i+2} = t_i
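The tri-gram step above can be sketched in a few lines. This is a minimal illustration, not the LESA implementation: the function names are hypothetical, the POS tags are assigned by hand rather than by a tagger, and joining the three tags with an underscore is just one assumed way to treat a tri-gram as a single token t_i.

```python
from typing import List, Tuple


def pos_trigrams(pos_tags: List[str]) -> List[Tuple[str, ...]]:
    """Slide a window of size 3 over the POS sequence p_1 ... p_n."""
    return [tuple(pos_tags[i:i + 3]) for i in range(len(pos_tags) - 2)]


def trigram_tokens(pos_tags: List[str]) -> List[str]:
    """Treat each tri-gram (p_i, p_{i+1}, p_{i+2}) as one token t_i."""
    return ["_".join(trigram) for trigram in pos_trigrams(pos_tags)]


if __name__ == "__main__":
    # Hand-assigned tags for "Alcohol cures corona ." (an assumption, no tagger is run).
    tags = ["NOUN", "VERB", "NOUN", "PUNCT"]
    print(trigram_tokens(tags))  # ['NOUN_VERB_NOUN', 'VERB_NOUN_PUNCT']
```

A sequence of n tags yields n-2 tri-gram tokens, and sequences shorter than three tags yield none.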
2. Dependency (DEP) Tree Encoding

Dep-tag sequence: d_1 d_2 d_3 d_4 … d_n
Dep-tag and parent position sequence: {(d_1, pp_1), …, (d_n, pp_n)}

Final Representation

d_1 d_2 d_3 d_4 … d_n
pp_1 pp_2 pp_3 pp_4 … pp_n

d = incoming dependency edge of child token
pp = index of associated parent token
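As a rough illustration of this representation, the sketch below builds the two parallel sequences (d and pp) from a dependency parse. The parse is written by hand for a four-token example, the tag names follow a common convention, and letting the root point to its own index is an assumption; none of this comes from the LESA code.

```python
from typing import List, Tuple

# Each entry: (token, dependency tag of the incoming edge, index of the parent token).
# Hand-written parse for illustration; the root is assumed to point to itself.
ParsedToken = Tuple[str, str, int]

EXAMPLE_PARSE: List[ParsedToken] = [
    ("Alcohol", "nsubj", 1),
    ("cures", "ROOT", 1),
    ("corona", "dobj", 1),
    (".", "punct", 1),
]


def dep_encoding(parse: List[ParsedToken]) -> Tuple[List[str], List[int], List[Tuple[str, int]]]:
    """Return the dep-tag sequence d, the parent-index sequence pp, and their pairs."""
    dep_tags = [dep for _, dep, _ in parse]
    parent_idx = [head for _, _, head in parse]
    return dep_tags, parent_idx, list(zip(dep_tags, parent_idx))


if __name__ == "__main__":
    d, pp, pairs = dep_encoding(EXAMPLE_PARSE)
    print(d)      # ['nsubj', 'ROOT', 'dobj', 'punct']
    print(pp)     # [1, 1, 1, 1]
    print(pairs)  # [('nsubj', 1), ('ROOT', 1), ('dobj', 1), ('punct', 1)]
```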
Evaluation

What is the claim?

● Patanjali did not claim?
● Cannot cure COVID-19 in 7 days?
Problem Statement: Claim Span Identification

RT @PirateAtLaw: No no no. Corona beer is the cure
not the disease. Please try this cure. I drink Corona
beer a lot!
https://t.co/fnba2fr2m2

“ Corona beer is the cure not the disease. ”

Empowering the Fact-checkers! Automatic Identification of Claim Spans on Twitter (EMNLP’2022)


BIO encoding for Span

Tweet | Span | BIO Encoding
Alcohol cures corona . | [0,2] | BIIO
Lord , please protect my family & the Philippines from the coronavirus . | - | OOOOOOOOOOOOOOOOO
Heyyaa , it’s true . Wearing mask can prevent corona . | [3, 7] | OOOOOBIIIIO
If this corona scare doesn’t end soon imma have to intervene . | - | OOOOOOOOOOOOOO

● Other Encoding schemes
○ IO (I-inside and O-outside)
○ BIOE (B-beginning, I-inside, E-end, and O-outside)
○ BIESO (B-beginning, I-inside, E-end, S-single word entity and O-outside)
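The encoding schemes above can be sketched as a single function that turns a token-level span into a tag sequence. This is an illustrative helper, not code from the paper: the function name is hypothetical, spans are assumed to be inclusive token indices, and a one-token span under BIOE is tagged B here, which is one of several conventions.

```python
from typing import List, Optional, Tuple


def encode_span(n_tokens: int, span: Optional[Tuple[int, int]], scheme: str = "BIO") -> List[str]:
    """Tag n_tokens tokens given an inclusive (start, end) claim span, or None for no claim."""
    if scheme not in ("IO", "BIO", "BIOE", "BIESO"):
        raise ValueError(f"unknown scheme: {scheme}")
    tags = ["O"] * n_tokens
    if span is None:
        return tags
    start, end = span
    for i in range(start, end + 1):
        tags[i] = "I"                      # everything inside the span
    if scheme == "IO":
        return tags
    if scheme == "BIESO" and start == end:
        tags[start] = "S"                  # single-token span
        return tags
    tags[start] = "B"                      # span beginning
    if scheme in ("BIOE", "BIESO") and end > start:
        tags[end] = "E"                    # span end
    return tags


if __name__ == "__main__":
    # "Alcohol cures corona ." with span [0, 2], as in the first table row.
    print("".join(encode_span(4, (0, 2))))           # BIIO
    print("".join(encode_span(4, (0, 2), "IO")))     # IIIO
    print("".join(encode_span(4, (0, 2), "BIOE")))   # BIEO
    print("".join(encode_span(4, None)))             # OOOO
```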
Description Aware RoBERTa (DABERTa)
Claim Descriptions
● Taken from the annotation guidelines (a document created manually by domain experts).
○ Definitions of what can and cannot be a claim.

Claim Description | Example
Texts in the tweet mentioning statistics, dates or numbers | Another case for more testing for #coronavirus! Blood tests show 14% of people are now immune to covid-19 in one town in Germany.
Texts in the tweet that negate a possibly false claim. | No! #Bleach won’t cure #COVID19. Disinfectants can’t kill the #coronavirus in your body. In fact, they will hurt you. If you or someone you know has been exposed to bleach, call Poison Control for help (1-800-222-1***).
Texts in the tweet containing opinions that have societal implications. | @username @username I think it’s a bio weapon made by China so I’m not surprised it has a lot of carriers.
Evaluation Metrics
● Consider the following example:
○ Gold Standard: …. Anthony Edward Stark ….
○ Predicted Answer: …. Edward Stark….

● Evaluation:
○ Free Match: Match the labels (BIO) token by token
■ TP: 2, FP: 0, FN: 1 ⇒ Precision: 1, Recall: 0.67
○ Entity Matching
■ Exact Match (Strict): The predicted span must match the gold span completely.
● EM = 0
■ Partial Match (Relaxed): Check whether the overlap reaches a certain threshold.
● Threshold (60%): PM: 1
● Threshold (75%): PM: 0
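The numbers in this example can be reproduced with a short sketch. Spans are represented as sets of token positions (the positions 10 to 12 are hypothetical), and the partial-match overlap is assumed to be the fraction of gold tokens covered by the prediction, which is the reading consistent with the 60% and 75% results above.

```python
from typing import Set, Tuple


def token_prf(gold: Set[int], pred: Set[int]) -> Tuple[int, int, int, float, float]:
    """Free match: token-level TP, FP, FN, precision and recall."""
    tp = len(gold & pred)
    fp = len(pred - gold)
    fn = len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return tp, fp, fn, precision, recall


def exact_match(gold: Set[int], pred: Set[int]) -> int:
    """Strict entity match: 1 only if the spans are identical."""
    return int(gold == pred)


def partial_match(gold: Set[int], pred: Set[int], threshold: float) -> int:
    """Relaxed entity match: 1 if the covered share of the gold span reaches the threshold."""
    overlap = len(gold & pred) / len(gold) if gold else 0.0
    return int(overlap >= threshold)


if __name__ == "__main__":
    gold = {10, 11, 12}   # "Anthony Edward Stark" (hypothetical token positions)
    pred = {11, 12}       # "Edward Stark"
    print(token_prf(gold, pred))            # TP=2, FP=0, FN=1, precision 1.0, recall about 0.67
    print(exact_match(gold, pred))          # 0
    print(partial_match(gold, pred, 0.60))  # 1
    print(partial_match(gold, pred, 0.75))  # 0
```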
Humans-in-the-loop Analysis

Error analysis of the outputs. Bold text (green) highlights the correct claim span, whereas text in italics (red)
marks the mistakes made by our model, DABERTa, and by the vanilla RoBERTa baseline.
Problem Statement: Claim Verification

False claim

But why? Can you justify?

Evidence for false claim from some trustworthy source:
As the Covid-19 pandemic continues its destructive course, two theories are being
widely aired...The lab is one of 20 such facilities under the Chinese Academy of
Sciences, but is the only one dealing with virology. Fully compliant with ISO
standards, the Wuhan facility interacts regularly with a host of outside experts. Like
other labs, its aim is to protect populations against new viruses..
Justified by the evidence!!

Wikipedia/Internet Documents → Relevant Evidence → Veracity (Support / Refute / NEI)

If no evidence can be found, the label is Not-Enough-Information (NEI).
Document Retrieval and Claim Verification to Mitigate COVID-19 Misinformation

Document retrieval module: Uses one of the given datasets to retrieve top-k relevant documents for the corresponding input claim.

Veracity prediction module: Seeks to establish the retrieved documents’ credibility against the input claim.
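The document retrieval step can be sketched with a plain bag-of-words cosine similarity. This is a stand-in for whatever retriever the system actually uses: the function names, the scoring method, and the three toy documents are all assumptions made for illustration, and the veracity prediction module (a trained model) is not shown.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(claim: str, documents: List[str], k: int = 2) -> List[Tuple[int, float]]:
    """Return (document index, score) for the k documents most similar to the claim."""
    query = Counter(tokenize(claim))
    scored = [(cosine(query, Counter(tokenize(doc))), i) for i, doc in enumerate(documents)]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, ties by index
    return [(i, score) for score, i in scored[:k]]


if __name__ == "__main__":
    # Toy placeholder documents, not taken from any real dataset.
    docs = [
        "Bleach and disinfectants cannot cure COVID-19 and can hurt you.",
        "The stadium hosted a football match on Saturday.",
        "Wearing a mask can reduce the spread of the coronavirus.",
    ]
    for index, score in retrieve_top_k("Bleach can cure COVID-19", docs, k=2):
        print(index, round(score, 3), docs[index])
```

The retrieved documents would then be passed, together with the claim, to the veracity prediction module.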
DTCA: Decision Tree-based Co-Attention Networks for Explainable Claim Verification (Wu et al., 2020)

● Decision Tree-based Evidence model (DTE) to select comments with high credibility as evidence in a transparent and interpretable way.
● Co-attention Self-attention networks (CaSa) to make the selected evidence interact with claims.
CheckThat! Lab
Shared Task
