FakeNews - NLP'24
[Figure: information disorders on social media, covering misinformation, rumor, disinformation, malinformation, fake news, and propaganda.]
Information Disorders
● Two key ingredients:
○ Veracity: Falseness or Truthfulness
○ Intent: Malicious (intended to harm) or non-malicious
[Figure: diagram relating Falseness and Malicious-Intent to disinformation, propaganda, rumor, misinformation, malinformation, and fake news.]
What is Fake News?
● During the Jacobite rebellion, some individuals disloyal to King George II sought to undermine his rule by printing fake stories that his health was failing.
● Days before the 1924 UK General Election, the Daily Mail published a forged letter, purportedly from the Soviet government, suggesting a Labour victory would hasten the radicalisation of the working classes. Labour went on to lose the election.
● Stories proliferated on social media, targeting specific candidates or spreading sensationalist and misleading information to influence public opinion, including the claim that Pope Francis had endorsed Donald Trump and false claims about Hillary Clinton's health.
1835 1989
● Manual Verification
● Technology-Based Solutions
Manual Fact-Checking Organizations
Challenges in Manual Verification
Fake News Detection System
Fake News Detection Datasets and competitions
Corpus | Labels | #Examples
BuzzfeedNews | mostly true, mostly false, mixture of true and false | 2,282
[Figure: a tweet or article is passed to a model that outputs fake or real; the model can be user-based, network-based, or hybrid.]
Fake News Detection Methods
[Figure: user-based, network-based, and hybrid models predicting fake or real, shown with nodes N1 to N5.]
[Figure: user-based, network-based, and hybrid models predicting fake or real, shown with nodes N1 to N5 and U1 to U3.]
Transparency of Fake News Detection Models
● Initially, we used sophisticated ML/DL models to detect fake news, and they often achieved good performance scores.
● However, predicting only the Fake or Real label is not always sufficient.
● Why should we trust a model's prediction when it is prone to mispredicting labels?
● Arguably, even when the predictions are correct, we still look for evidence or justification that something is real or fake.
Public Wisdom Matters! Discourse-Aware Hyperbolic Fourier Co-Attention for Social-Text Classification. Grover et al., NeurIPS 2022.
Concepts related to Fake News
“State or assert that something is the case, typically without providing evidence
or proof.”
- Oxford Dictionary [33]
Lord, please protect my family & the Philippines from the coronavirus. No No
● Lack of sufficient amount of data
● Highly difficult to differentiate from fake news and opinions.
Claim Verification: Find evidence for a claim and decide its veracity.
Claim
RT @PirateAtLaw: No no no. Corona beer is the cure not the disease. Please try this cure. I drink Corona beer a lot! https://t.co/fnba2fr2m2
Non-claim
Lord, please protect my family & the Philippines from the coronavirus.
LESA: Linguistic Encapsulation and Semantic Amalgamation Based Generalised Claim Detection from Online Content (EACL'2021)
Linguistic Encapsulation and Semantic Amalgamation
1. Part of Speech (POS) Encoding
● Text sequence: w = w1 w2 w3 w4 … wn
● POS-tag sequence: p = p1 p2 p3 p4 … pn
● Extract sequence tri-grams: (p1 p2 p3) (p2 p3 p4) … (pn-2 pn-1 pn)
Dep-tag sequence: d1 d2 d3 d4 … dn
Dep-tag and parent-POS sequence: {(d1, pp1), …, (dn, ppn)}
Final Representation
d1 d2 d3 d4 … dn
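The POS tri-gram and (dep-tag, parent-POS) sequences described above can be sketched in a few lines of Python. This is a minimal illustration, not the LESA implementation: the helper names `ngrams` and `dep_parent_pairs` are hypothetical, and the POS tags, dependency tags, and head indices are hand-assigned for the example rather than produced by a parser.

```python
def ngrams(seq, n=3):
    """Return all contiguous n-grams of seq as tuples (n=3 gives tri-grams)."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def dep_parent_pairs(dep_tags, heads, pos_tags):
    """Pair each token's dep-tag with the POS tag of its parent token.

    heads[i] is the index of token i's parent, or None for the root.
    The placeholder "ROOT" for the root's parent POS is an assumption.
    """
    pairs = []
    for dep, head in zip(dep_tags, heads):
        parent_pos = "ROOT" if head is None else pos_tags[head]
        pairs.append((dep, parent_pos))
    return pairs


# Hand-assigned, illustrative annotations (assumptions, not parser output).
tokens = ["Wearing", "mask", "can", "prevent", "corona"]
pos_tags = ["VERB", "NOUN", "AUX", "VERB", "NOUN"]
dep_tags = ["csubj", "dobj", "aux", "ROOT", "dobj"]
heads = [3, 0, 3, None, 3]

pos_trigrams = ngrams(pos_tags, n=3)
dep_pairs = dep_parent_pairs(dep_tags, heads, pos_tags)

print(pos_trigrams)
print(dep_pairs)
```

A sequence of n tags yields n-2 tri-grams, so the five-token example produces three.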
Lord , please protect my family & the Philippines from the coronavirus . | - | OOOOOOOOOOOOOOOOO
Heyyaa , it’s true . Wearing mask can prevent corona . | [3, 7] | OOOOOBIIIIO
If this corona scare doesn’t end soon imma have to intervene . | - | OOOOOOOOOOOOOO
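The BIO tag strings in the examples above can be produced from a span with a small helper. The sketch below is an assumption-laden illustration: `span_to_bio` is a hypothetical name, spans are taken as inclusive, zero-based indices over all tokens including punctuation, and the tokenization is done by hand. Under that convention the claim "Wearing mask can prevent corona" is span (5, 9); the slide's [3, 7] appears to use a different indexing, possibly one that skips punctuation.

```python
def span_to_bio(num_tokens, spans):
    """Convert inclusive (start, end) token spans into a list of B/I/O tags."""
    tags = ["O"] * num_tokens
    for start, end in spans:
        if not 0 <= start <= end < num_tokens:
            raise ValueError(f"span {(start, end)} is out of range")
        tags[start] = "B"                      # first token of the claim span
        for i in range(start + 1, end + 1):
            tags[i] = "I"                      # remaining tokens inside the span
    return tags


# Hand tokenization of the second example (an assumption).
tokens = ["Heyyaa", ",", "it's", "true", ".",
          "Wearing", "mask", "can", "prevent", "corona", "."]

claim_tags = span_to_bio(len(tokens), [(5, 9)])
non_claim_tags = span_to_bio(len(tokens), [])   # no span: every token is O

print("".join(claim_tags))      # OOOOOBIIIIO
print("".join(non_claim_tags))  # OOOOOOOOOOO
```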
Texts in the tweet mentioning statistics, dates or numbers | Another case for more testing for #coronavirus! Blood tests show 14% of people are now immune to covid-19 in one town in Germany.
Texts in the tweet that negate a possibly false claim. | No! #Bleach won’t cure #COVID19. Disinfectants can’t kill the #coronavirus in your body. In fact, they will hurt you. If you or someone you know has been exposed to bleach, call Poison Control for help (1-800-222-1***).
Texts in the tweet containing opinions that have societal implications. | @username @username I think it’s a bio weapon made by China so I’m not surprised it has a lot of carriers.
Evaluation Metrics
● Consider the following example:
○ Gold Standard: …. Anthony Edward Stark ….
○ Predicted Answer: …. Edward Stark….
● Evaluation:
○ Free Match: Match the labels (BIO)
■ TP: 2, FP: 0, FN: 1 ⇒ Precision: 1, Recall: 2/3 ≈ 0.67
○ Entity Matching
■ Exact Match (Strict): The predicted span must match the gold span completely.
● EM = 0
■ Partial Match (Relax): Check whether the overlap reaches a given threshold (here 2 of 3 gold tokens, about 67%).
● Threshold (60%): PM: 1
● Threshold (75%): PM: 0
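The numbers in this example can be reproduced with a short sketch. Two assumptions are made here that the slide does not spell out: free match is computed on token membership in the span (which is what gives TP = 2), and partial-match overlap is the fraction of gold tokens covered by the prediction. The function names are hypothetical.

```python
def token_level_counts(gold, pred):
    """Count TP/FP/FN over token positions and derive precision and recall."""
    tp = len(gold & pred)   # tokens in both gold and predicted spans
    fp = len(pred - gold)   # predicted tokens outside the gold span
    fn = len(gold - pred)   # gold tokens the prediction missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return tp, fp, fn, precision, recall


def exact_match(gold, pred):
    """Strict match: 1 only if the spans cover exactly the same tokens."""
    return int(gold == pred)


def partial_match(gold, pred, threshold):
    """Relaxed match: 1 if the covered fraction of gold tokens meets the threshold."""
    overlap = len(gold & pred) / len(gold) if gold else 0.0
    return int(overlap >= threshold)


# Token positions: 0 = Anthony, 1 = Edward, 2 = Stark.
gold_span = {0, 1, 2}
pred_span = {1, 2}

tp, fp, fn, precision, recall = token_level_counts(gold_span, pred_span)
em = exact_match(gold_span, pred_span)
pm_60 = partial_match(gold_span, pred_span, 0.60)
pm_75 = partial_match(gold_span, pred_span, 0.75)

print(tp, fp, fn, precision, round(recall, 2))  # 2 0 1 1.0 0.67
print(em, pm_60, pm_75)                         # 0 1 0
```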
Humans-in-the-loop Analysis
Error analysis of the outputs. Bold text (green) highlights the correct claim span, whereas text in italics (red) marks the mistakes made by our model, DABERTa, and by vanilla RoBERTa as the baseline.
Problem Statement: Claim Verification
False claim
Evidence for the false claim from a trustworthy source:
As the Covid-19 pandemic continues its destructive course, two theories are being widely aired...The lab is one of 20 such facilities under the Chinese Academy of Sciences, but is the only one dealing with virology. Fully compliant with ISO standards, the Wuhan facility interacts regularly with a host of outside experts. Like other labs, its aim is to protect populations against new viruses..
Justified by the evidence!!
Support
Refute
NEI