Bad Actor, Good Advisor: Exploring the Role of Large Language Models in Fake News Detection
Abstract

Detecting fake news requires both a delicate sense of diverse clues and a profound understanding of the real-world background, which remains challenging for detectors based on small language models (SLMs) due to their knowledge and capability limitations. Recent advances in large language models (LLMs) have shown remarkable performance in various tasks, but whether and how LLMs could help with fake news detection remains underexplored. In this paper, we investigate the potential of LLMs in fake news detection. First, we conduct an empirical study and find that a sophisticated LLM such as GPT-3.5 could generally expose fake news and provide desirable multi-perspective rationales but still underperforms the basic SLM, fine-tuned BERT. Our subsequent analysis attributes such a gap to the LLM's inability to select and integrate rationales properly to conclude. Based on these findings, we propose that current LLMs may not substitute fine-tuned SLMs in fake news detection but can be a good advisor for SLMs by providing multi-perspective instructive rationales. To instantiate this proposal, we design an adaptive rationale guidance network for fake news detection (ARG), in which SLMs selectively acquire insights on news analysis from the LLMs' rationales. We further derive a rationale-free version of ARG by distillation, namely ARG-D, which serves cost-sensitive scenarios without querying LLMs. Experiments on two real-world datasets demonstrate that ARG and ARG-D outperform three types of baseline methods, including SLM-based, LLM-based, and combinations of small and large language models.

Figure 1: Illustration of the role of large language models (LLMs) in fake news detection. In this case, (a) the LLM fails to output correct judgment of news veracity but (b) helps the small language model (SLM) judge correctly by providing informative rationales. (The example news item in the figure, labeled FAKE, reads: "Detailed photos of Xiang Liu's tendon surgery exposed. Stop complaints and please show sympathy and blessings!")

Introduction

The wide and fast spread of fake news online has posed real-world threats in critical domains like politics (Fisher, Cox, and Hermann 2016), economy (CHEQ 2019), and public health (Naeem and Bhatti 2020). Among the countermeasures to combat this issue, automatic fake news detection, which aims at distinguishing inaccurate and intentionally misleading news items from others automatically, has been a promising solution in practice (Shu et al. 2017; Roth 2022). Though much progress has been made (Hu et al. 2022a), understanding and characterizing fake news is still challenging for current models. This is caused by the complexity of the news-faking process: Fake news creators might manipulate any part of the news, using diverse writing strategies and being driven by inscrutable underlying aims. Therefore, to maintain both effectiveness and universality for fake news detection, an ideal method is required to have: 1) a delicate sense of diverse clues (e.g., style, facts, commonsense); and 2) a profound understanding of the real-world background.

Recent methods (Zhang et al. 2021; Kaliyar, Goswami, and Narang 2021; Mosallanezhad et al. 2022; Hu et al. 2023) generally exploit pre-trained small language models (SLMs)[1] like BERT (Devlin et al. 2019) and RoBERTa (Liu et al. 2019) to understand news content and provide fundamental representations, plus optional social contexts (Shu et al. 2019; Cui et al. 2022), knowledge bases (Popat et al. 2018; Hu et al. 2022b), or news environments (Sheng et al. 2022) as supplements. SLMs do bring improvements, but their knowledge and capability limitations also compromise further enhancement of fake news detectors. For example, BERT was pre-trained on text corpora like Wikipedia (Devlin et al. 2019) and thus struggled to handle news items that require knowledge not included there (Sheng et al. 2021).

[1] The academia lacks a consensus regarding the size boundary between small and large language models at present, but it is widely accepted that BERT (Devlin et al. 2019) and the GPT-3 family (Brown et al. 2020) are respectively small and large ones (Zhao et al. 2023).
Model            Usage                  Chinese   English
GPT-3.5-turbo    Zero-Shot CoT          0.677     0.666
                 from Perspective TD    0.667     0.611
                 from Perspective CS    0.678     0.698
BERT             Fine-tuning            0.753     0.765
Ensemble         Majority Voting        0.735     0.724
                 Oracle Voting          0.908     0.878

Table 4: Performance of the LLM using zero-shot CoT with perspective specified and other compared models. TD: Textual description; CS: Commonsense.
We further investigate the LLM's performance when asked to perform analysis from a specific perspective on the full testing set (i.e., 100% coverage).[4] From the first group in Table 4, we see that the LLM's judgment with single-perspective analysis elicited is still promising. Compared with the comprehensive zero-shot CoT setting, the single-perspective-based LLM performs comparably on the Chinese dataset and better on the English dataset (for the commonsense perspective case). The results showcase that the internal mechanism of the LLM for integrating rationales from diverse perspectives is ineffective for fake news detection, limiting the full use of rationales. In this case, combining the small and large LMs to complement each other is a promising solution: the former could benefit from the analytical capability of the latter, while the latter could be enhanced by task-specific knowledge from the former.
To exhibit the advantages of this solution, we apply majority voting and oracle voting (assuming the most ideal situation where we trust the correctly judged model for each sample, if any) among the two single-perspective-based LLMs and the BERT. Results show that we are likely to gain a performance better than any of the LLM-only or SLM-only methods mentioned before if we could adaptively combine their advantages, i.e., the flexible task-specific learning of the SLM and the informative rationales generated by the LLM. That is, the LLM could possibly be a good advisor for the SLM by providing rationales, ultimately improving the performance of fake news detection.
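To make the two ensemble settings in Table 4 concrete, the following is a minimal sketch of how majority voting and oracle voting can be computed from per-sample predictions. It is an illustration under our own assumptions (binary 0/1 labels, three voters, NumPy arrays), not the authors' evaluation code.

    import numpy as np

    def majority_vote(preds: np.ndarray) -> np.ndarray:
        """preds: (n_models, n_samples) array of 0/1 veracity predictions."""
        # A sample is labeled fake (1) when more than half of the voters say so.
        return (preds.sum(axis=0) > preds.shape[0] / 2).astype(int)

    def oracle_vote(preds: np.ndarray, labels: np.ndarray) -> np.ndarray:
        """Ideal upper bound: trust any voter that is correct on a sample, if one exists."""
        correct_any = (preds == labels).any(axis=0)
        # If at least one voter is right, the oracle outputs the true label;
        # otherwise every voter is wrong and the oracle inherits a wrong prediction.
        return np.where(correct_any, labels, preds[0])

    # Hypothetical example with three voters: two single-perspective LLM judgments and BERT.
    llm_td = np.array([1, 0, 1, 0])
    llm_cs = np.array([1, 1, 0, 0])
    bert   = np.array([0, 1, 1, 0])
    labels = np.array([1, 1, 1, 0])
    preds = np.stack([llm_td, llm_cs, bert])
    print(majority_vote(preds), oracle_vote(preds, labels))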
ARG: Adaptive Rationale Guidance Network for Fake News Detection

Based on the above findings and discussion, we propose the adaptive rationale guidance (ARG) network for fake news detection. Figure 3 overviews the ARG and its rationale-free version ARG-D for cost-sensitive scenarios. The objective of ARG is to empower small fake news detectors with the ability to adaptively select useful rationales as references for final judgments. Given a news item x and its corresponding LLM-generated rationales rt (textual description) and rc (commonsense), the ARG encodes the inputs using the SLM at first (Figure 3(a)). Subsequently, it builds news-rationale collaboration via predicting the LLM's judgment through the rationale, enriching news-rationale feature interaction, and evaluating rationale usefulness (Figure 3(b)). The interactive features are finally aggregated with the news feature x for the final judgment of x being fake or not (Figure 3(c)). ARG-D is derived from the ARG via distillation for scenarios where the LLM is unavailable (Figure 3(d)).

Representation

We employ two BERT models separately as the news and rationale encoders to obtain semantic representations. For the given news item x and two corresponding rationales rt and rc, the representations are X, Rt, and Rc, respectively.
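Below is a minimal sketch of this dual-encoder step with the HuggingFace transformers library. The checkpoint name, sequence length, and the example texts (taken from Figure 1) are illustrative assumptions; the paper does not prescribe these details here.

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")       # assumed checkpoint
    news_encoder = AutoModel.from_pretrained("bert-base-chinese")        # encodes the news item
    rationale_encoder = AutoModel.from_pretrained("bert-base-chinese")   # shared by both rationales

    def encode(encoder, text: str) -> torch.Tensor:
        """Return token-level representations of shape (1, seq_len, hidden_dim)."""
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
        return encoder(**inputs).last_hidden_state

    news_text = "Detailed photos of Xiang Liu's tendon surgery exposed. Stop complaints..."
    rationale_td = "The language is emotional and tries to attract the audience..."   # textual description
    rationale_cs = "Real surgery photos generally won't be exposed..."                # commonsense

    X = encode(news_encoder, news_text)            # news representation X
    Rt = encode(rationale_encoder, rationale_td)   # rationale representation Rt
    Rc = encode(rationale_encoder, rationale_cs)   # rationale representation Rc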
News-Rationale Collaboration

The step of news-rationale collaboration aims at providing a rich interaction between news and rationales and learning to adaptively select useful rationales as references, which is at the core of our design. To achieve such an aim, the ARG includes three modules, as detailed and exemplified using the textual description rationale branch below.

News-Rationale Interaction  To enable comprehensive information exchange between news and rationales, we introduce a news-rationale interactor with a dual cross-attention mechanism to encourage feature interactions. The cross-attention can be described as:

CA(Q, K, V) = softmax(Q′ · K′⊤ / √d) V′,  (1)

where Q′ = WQ Q, K′ = WK K, V′ = WV V, and d is the dimensionality. Given the representations of the news X and the rationale Rt, the process is:

ft→x = AvgPool(CA(Rt, X, X)),  (2)
fx→t = AvgPool(CA(X, Rt, Rt)),  (3)

where AvgPool(·) is average pooling over the token representations output by the cross-attention to obtain a one-vector text representation f.

LLM Judgment Prediction  Understanding the judgment hinted by the given rationale is a prerequisite for fully exploiting the information behind the rationale. To this end, we construct the LLM judgment prediction task, which requires predicting the LLM's judgment of the news veracity according to the given rationale. We expect this to deepen the understanding of the rationale texts. For the textual description rationale branch, we feed its representation Rt into the LLM judgment predictor, which is parametrized using a multi-layer perceptron (MLP)[5]:

m̂t = sigmoid(MLP(Rt)),  Lpt = CE(m̂t, mt),  (4)

where mt and m̂t are respectively the LLM's claimed judgment and its prediction. The loss Lpt is a cross-entropy loss: CE(ŷ, y) = −y log ŷ − (1 − y) log(1 − ŷ). The case is similar for the commonsense rationale Rc.
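A minimal PyTorch sketch of the dual cross-attention in Eqs. (1)-(3) and the judgment predictor of Eq. (4) follows. The use of nn.MultiheadAttention (which internally applies the learned projections WQ, WK, WV), the hidden sizes, and the mean pooling inside the predictor are our own illustrative choices, not the released ARG implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NewsRationaleInteractor(nn.Module):
        """Dual cross-attention between news tokens X and rationale tokens R (Eqs. 1-3)."""
        def __init__(self, dim: int = 768, heads: int = 8):
            super().__init__()
            self.r2x = nn.MultiheadAttention(dim, heads, batch_first=True)  # CA(R, X, X)
            self.x2r = nn.MultiheadAttention(dim, heads, batch_first=True)  # CA(X, R, R)

        def forward(self, X: torch.Tensor, R: torch.Tensor):
            f_r2x, _ = self.r2x(R, X, X)   # query = rationale, key/value = news
            f_x2r, _ = self.x2r(X, R, R)   # query = news, key/value = rationale
            # AvgPool over tokens yields the one-vector representations of Eqs. (2)-(3).
            return f_r2x.mean(dim=1), f_x2r.mean(dim=1)

    class LLMJudgmentPredictor(nn.Module):
        """Predict the LLM's claimed judgment m from the rationale representation (Eq. 4)."""
        def __init__(self, dim: int = 768):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(), nn.Linear(dim // 2, 1))

        def forward(self, R: torch.Tensor, m: torch.Tensor):
            m_hat = torch.sigmoid(self.mlp(R.mean(dim=1))).squeeze(-1)
            loss_p = F.binary_cross_entropy(m_hat, m.float())  # CE(m_hat, m)
            return m_hat, loss_p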
[4] We exclude the factuality perspective here to avoid the impacts of hallucination. The eliciting sentence is "Let's think from the perspective of [textual description/commonsense]."
[5] For brevity, we omit the subscripts of all independently parametrized MLPs.
Figure 3: Overall architecture of our proposed adaptive rationale guidance (ARG) network and its rationale-free version ARG-D. In the ARG, the news item and LLM rationales are (a) respectively encoded into X and R∗ (∗ ∈ {t, c}). Then the small and large LMs collaborate with each other via news-rationale feature interaction, LLM judgment prediction, and rationale usefulness evaluation, and the obtained interactive features f′∗→x (∗ ∈ {t, c}) are finally aggregated with the attentively pooled news feature x for the final judgment. In the ARG-D, the news encoder and the attention module are preserved, and the output of the rationale-aware feature simulator is supervised by the aggregated feature fcls for knowledge distillation.
Rationale Usefulness Evaluation  The usefulness of rationales from different perspectives varies across different news items, and improper integration may lead to performance degradation. To enable the model to adaptively select appropriate rationales, we devise a rationale usefulness evaluation process, in which we assess the contributions of different rationales and adjust their weights for subsequent veracity prediction. The process comprises two phases, i.e., evaluation and reweighting. For evaluation, we input the news-aware rationale vector fx→t into the rationale usefulness evaluator (parameterized by an MLP) to predict its usefulness ut. Following the assumption that rationales leading to correct judgments are more useful, we use the judgment correctness as the rationale usefulness labels:

ût = sigmoid(MLP(fx→t)),  Let = CE(ût, ut).  (5)

In the reweighting phase, we input the vector fx→t into an MLP to obtain a weight number wt, which is then used to reweight the rationale-aware news vector ft→x. The procedure is as follows:

f′t→x = wt · ft→x.  (6)

We also use attentive pooling to transform the representation matrix X into a vector x.

Prediction

Based on the outputs from the last step, we now aggregate the news vector x and the rationale-aware news vectors f′t→x and f′c→x for the final judgment. For a news item x with label y ∈ {0, 1}, we aggregate these vectors with different weights:

fcls = wxcls · x + wtcls · f′t→x + wccls · f′c→x,  (7)

where wxcls, wtcls, and wccls are learnable parameters ranging from 0 to 1. fcls is the fusion vector, which is then fed into the MLP classifier for the final prediction of news veracity:

Lce = CE(MLP(fcls), y).  (8)

The total loss function is the weighted sum of the loss terms mentioned above:

L = Lce + β1(Let + Lec) + β2(Lpt + Lpc),  (9)

where β1 and β2 are hyperparameters.
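Continuing the sketch, the rationale usefulness evaluation of Eqs. (5)-(6) and the weighted aggregation and losses of Eqs. (7)-(9) can be outlined as below. Squashing the learnable aggregation weights with a sigmoid to keep them in [0, 1] and the particular MLP shapes are assumptions for illustration only.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RationaleUsefulnessEvaluator(nn.Module):
        """Eqs. (5)-(6): predict usefulness from f_{x->t}, then reweight f_{t->x}."""
        def __init__(self, dim: int = 768):
            super().__init__()
            self.eval_mlp = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(), nn.Linear(dim // 2, 1))
            self.weight_mlp = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(), nn.Linear(dim // 2, 1))

        def forward(self, f_x2r: torch.Tensor, f_r2x: torch.Tensor, u: torch.Tensor):
            u_hat = torch.sigmoid(self.eval_mlp(f_x2r)).squeeze(-1)   # Eq. (5)
            loss_e = F.binary_cross_entropy(u_hat, u.float())
            w = torch.sigmoid(self.weight_mlp(f_x2r))                 # weight number w_t
            return w * f_r2x, loss_e                                  # Eq. (6): f'_{t->x}

    class Aggregator(nn.Module):
        """Eqs. (7)-(8): weighted fusion of x, f'_{t->x}, f'_{c->x} and classification."""
        def __init__(self, dim: int = 768):
            super().__init__()
            self.raw_w = nn.Parameter(torch.zeros(3))   # squashed into [0, 1] by sigmoid
            self.classifier = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(), nn.Linear(dim // 2, 2))

        def forward(self, x: torch.Tensor, f_t: torch.Tensor, f_c: torch.Tensor, y: torch.Tensor):
            w = torch.sigmoid(self.raw_w)
            f_cls = w[0] * x + w[1] * f_t + w[2] * f_c                # Eq. (7)
            loss_ce = F.cross_entropy(self.classifier(f_cls), y)      # Eq. (8)
            return f_cls, loss_ce

    # Eq. (9): total loss, with beta1 and beta2 as hyperparameters.
    # loss = loss_ce + beta1 * (loss_et + loss_ec) + beta2 * (loss_pt + loss_pc)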
Distillation for Rationale-Free Model

The ARG requires sending requests to the LLM for every prediction, which might not be affordable in cost-sensitive scenarios. Therefore, we attempt to build a rationale-free model, namely ARG-D, based on the trained ARG model via knowledge distillation (Hinton, Vinyals, and Dean 2015). The basic idea is to simulate and internalize the knowledge from rationales into a parametric module. As shown in Figure 3(d), we initialize the news encoder and classifier with the corresponding modules in the ARG and train a rationale-aware feature simulator (implemented with a multi-head transformer block) and an attention module to internalize knowledge. Besides the cross-entropy loss Lce, we let the feature fᵈcls imitate fcls in the ARG, using the mean squared error loss:

Lkd = MSE(fᵈcls, fcls).  (10)
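The distillation objective of Eq. (10) amounts to the following; treating the teacher feature as detached (frozen) and the 1:1 weighting between the two terms are assumptions, since the exact training recipe is not specified here.

    import torch
    import torch.nn.functional as F

    def arg_d_loss(f_cls_student: torch.Tensor,
                   f_cls_teacher: torch.Tensor,
                   logits: torch.Tensor,
                   y: torch.Tensor,
                   kd_weight: float = 1.0) -> torch.Tensor:
        """Cross-entropy on the rationale-free student's prediction plus MSE imitation
        of the teacher ARG's aggregated feature f_cls (Eq. 10)."""
        loss_ce = F.cross_entropy(logits, y)
        loss_kd = F.mse_loss(f_cls_student, f_cls_teacher.detach())
        return loss_ce + kd_weight * loss_kd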
Group           Model                                Chinese                               English
                                                     macF1   Acc.    F1real  F1fake        macF1   Acc.    F1real  F1fake
G1: LLM-Only    GPT-3.5-turbo                        0.725   0.734   0.774   0.676         0.702   0.813   0.884   0.519
G2: SLM-Only    Baseline                             0.753   0.754   0.769   0.737         0.765   0.862   0.916   0.615
                EANN_T                               0.754   0.756   0.773   0.736         0.763   0.864   0.918   0.608
                Publisher-Emo                        0.761   0.763   0.784   0.738         0.766   0.868   0.920   0.611
                ENDEF                                0.765   0.766   0.779   0.751         0.768   0.865   0.918   0.618
G3: LLM+SLM     Baseline + Rationale                 0.767   0.769   0.787   0.748         0.777   0.870   0.921   0.633
                SuperICL                             0.757   0.759   0.779   0.734         0.736   0.864   0.920   0.551
                ARG                                  0.784   0.786   0.804   0.764         0.790   0.878   0.926   0.653
                (Relative Impr. over Baseline)       (+4.2%) (+4.3%) (+4.6%) (+3.8%)       (+3.2%) (+1.8%) (+1.1%) (+6.3%)
                w/o LLM Judgment Predictor           0.773   0.774   0.789   0.756         0.786   0.880   0.928   0.645
                w/o Rationale Usefulness Evaluator   0.781   0.783   0.801   0.761         0.782   0.873   0.923   0.641
                w/o Predictor & Evaluator            0.769   0.770   0.782   0.756         0.780   0.874   0.923   0.637
                ARG-D                                0.771   0.772   0.785   0.756         0.778   0.870   0.921   0.634
                (Relative Impr. over Baseline)       (+2.4%) (+2.3%) (+2.1%) (+2.6%)       (+1.6%) (+0.9%) (+0.6%) (+3.2%)

Table 5: Performance of the ARG and its variants and the LLM-only, SLM-only, and LLM+SLM methods. The best two results in macro F1 and accuracy are respectively bolded and underlined. For GPT-3.5-turbo, the best results in Table 2 are reported.
Figure 4: Statistics of additional correctly judged samples of (a) ARG and (b) ARG-D over the BERT baseline. right(·) denotes samples correctly judged by the method (·). TD/CS: Textual description/commonsense perspective.

...require extra assistance from knowledge bases (Popat et al. 2018) and news environments (Sheng et al. 2022). Both groups of methods obtain textual representations from pre-trained models like BERT as a convention but rarely consider their potential for fake news detection. We conducted an exploration in this paper by combining large and small LMs and obtained good improvement using only textual content.

LLMs for Natural Language Understanding  LLMs, though mostly generative models, also have powerful natural language understanding (NLU) capabilities, especially in few-shot in-context learning scenarios (Brown et al. 2020). Recent works in this line focus on benchmarking the latest LLMs in NLU. Results show that LLMs may not have comprehensive superiority over a well-trained small model in some types of NLU tasks (Zhong et al. 2023). Our results provide empirical findings in fake news detection with only textual content as the input.
Acknowledgements

The authors would like to thank the anonymous reviewers for their insightful comments. This work is supported by the National Natural Science Foundation of China (62203425), the Zhejiang Provincial Key Research and Development Program of China (2021C01164), the Project of Chinese Academy of Sciences (E141020), the Postdoctoral Fellowship Program of CPSF (GZC20232738), and the CIPSC-SMP-Zhipu.AI Large Model Cross-Disciplinary Fund. The corresponding author is Qiang Sheng.

References

Anthropic. 2023. Model Card and Evaluations for Claude Models. https://www-files.anthropic.com/production/images/Model-Card-Claude-2.pdf. Accessed: 2023-08-13.
Brown, T. B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; Agarwal, S.; Herbert-Voss, A.; Krueger, G.; Henighan, T.; Child, R.; Ramesh, A.; Ziegler, D. M.; Wu, J.; Winter, C.; Hesse, C.; Chen, M.; Sigler, E.; Litwin, M.; Gray, S.; Chess, B.; Clark, J.; Berner, C.; McCandlish, S.; Radford, A.; Sutskever, I.; and Amodei, D. 2020. Language Models Are Few-Shot Learners. In Advances in Neural Information Processing Systems, 1877–1901. Curran Associates Inc.
Caramancion, K. M. 2023. News Verifiers Showdown: A Comparative Performance Evaluation of ChatGPT 3.5, ChatGPT 4.0, Bing AI, and Bard in News Fact-Checking. arXiv preprint arXiv:2306.17176.
CHEQ. 2019. The Economic Cost of Bad Actors on the Internet. https://info.cheq.ai/hubfs/Research/THE ECONOMIC COST Fake News final.pdf. Accessed: 2023-08-13.
Cui, J.; Kim, K.; Na, S. H.; and Shin, S. 2022. Meta-Path-based Fake News Detection Leveraging Multi-level Social Context Information. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 325–334. ACM.
Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186. ACL.
Fisher, M.; Cox, J. W.; and Hermann, P. 2016. Pizzagate: From rumor, to hashtag, to gunfire in DC. The Washington Post.
Hinton, G.; Vinyals, O.; and Dean, J. 2015. Distilling the Knowledge in a Neural Network. arXiv preprint arXiv:1503.02531.
Hu, B.; Sheng, Q.; Cao, J.; Zhu, Y.; Wang, D.; Wang, Z.; and Jin, Z. 2023. Learn over Past, Evolve for Future: Forecasting Temporal Trends for Fake News Detection. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), 116–125. ACL.
Hu, L.; Wei, S.; Zhao, Z.; and Wu, B. 2022a. Deep Learning for Fake News Detection: A Comprehensive Survey. AI Open, 3: 133–155.
Hu, X.; Guo, Z.; Wu, G.; Liu, A.; Wen, L.; and Yu, P. 2022b. CHEF: A Pilot Chinese Dataset for Evidence-Based Fact-Checking. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 3362–3376. ACL.
Ji, Z.; Lee, N.; Frieske, R.; Yu, T.; Su, D.; Xu, Y.; Ishii, E.; Bang, Y. J.; Madotto, A.; and Fung, P. 2023. Survey of Hallucination in Natural Language Generation. ACM Computing Surveys, 55: 1–38.
Kaliyar, R. K.; Goswami, A.; and Narang, P. 2021. FakeBERT: Fake News Detection in Social Media with a BERT-based Deep Learning Approach. Multimedia Tools and Applications, 80(8): 11765–11788.
Kojima, T.; Gu, S. S.; Reid, M.; Matsuo, Y.; and Iwasawa, Y. 2022. Large Language Models are Zero-Shot Reasoners. In Advances in Neural Information Processing Systems, volume 35, 22199–22213. Curran Associates, Inc.
Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; and Neubig, G. 2023a. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Computing Surveys, 55(9): 1–35.
Liu, Y.; Deng, G.; Xu, Z.; Li, Y.; Zheng, Y.; Zhang, Y.; Zhao, L.; Zhang, T.; and Liu, Y. 2023b. Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study. arXiv preprint arXiv:2305.13860.
Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; and Stoyanov, V. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692.
Ma, Y.; Cao, Y.; Hong, Y.; and Sun, A. 2023. Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples! arXiv preprint arXiv:2303.08559.
Min, E.; Rong, Y.; Bian, Y.; Xu, T.; Zhao, P.; Huang, J.; and Ananiadou, S. 2022. Divide-and-Conquer: Post-User Interaction Network for Fake News Detection on Social Media. In Proceedings of the ACM Web Conference 2022, 1148–1158. ACM.
Mosallanezhad, A.; Karami, M.; Shu, K.; Mancenido, M. V.; and Liu, H. 2022. Domain Adaptive Fake News Detection via Reinforcement Learning. In Proceedings of the ACM Web Conference 2022, 3632–3640. ACM.
Mu, Y.; Bontcheva, K.; and Aletras, N. 2023. It's about Time: Rethinking Evaluation on Rumor Detection Benchmarks using Chronological Splits. In Findings of the Association for Computational Linguistics: EACL 2023, 736–743. ACL.
Naeem, S. B.; and Bhatti, R. 2020. The COVID-19 'infodemic': a new front for information professionals. Health Information & Libraries Journal, 37(3): 233–239.
Nan, Q.; Cao, J.; Zhu, Y.; Wang, Y.; and Li, J. 2021. MDFEND: Multi-domain Fake News Detection. In Proceedings of the 30th ACM International Conference on Information and Knowledge Management. ACM.
Nguyen, V.-H.; Sugiyama, K.; Nakov, P.; and Kan, M.-Y. 2020. FANG: Leveraging Social Context for Fake News Detection Using Graph Representation. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management, 1165–1174. ACM.
OpenAI. 2022. ChatGPT: Optimizing Language Models for Dialogue. https://openai.com/blog/chatgpt/. Accessed: 2023-08-13.
Pelrine, K.; Reksoprodjo, M.; Gupta, C.; Christoph, J.; and Rabbany, R. 2023. Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4. arXiv preprint arXiv:2305.14928v1.
Popat, K.; Mukherjee, S.; Yates, A.; and Weikum, G. 2018. DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 22–32. ACL.
Przybyla, P. 2020. Capturing the Style of Fake News. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 490–497. AAAI Press.
Qi, P.; Cao, J.; Li, X.; Liu, H.; Sheng, Q.; Mi, X.; He, Q.; Lv, Y.; Guo, C.; and Yu, Y. 2021. Improving Fake News Detection by Using an Entity-enhanced Framework to Fuse Diverse Multimodal Clues. In Proceedings of the 29th ACM International Conference on Multimedia, 1212–1220. ACM.
Ramlochan, S. 2023. Role-Playing in Large Language Models like ChatGPT. https://www.promptengineering.org/role-playing-in-large-language-models-like-chatgpt/. Accessed: 2023-08-13.
Roth, Y. 2022. The vast majority of content we take action on for misinformation is identified proactively. https://twitter.com/yoyoel/status/1483094057471524867. Accessed: 2023-08-13.
Sheng, Q.; Cao, J.; Zhang, X.; Li, R.; Wang, D.; and Zhu, Y. 2022. Zoom Out and Observe: News Environment Perception for Fake News Detection. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 4543–4556. ACL.
Sheng, Q.; Zhang, X.; Cao, J.; and Zhong, L. 2021. Integrating Pattern- and Fact-based Fake News Detection via Model Preference Learning. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 1640–1650. ACM.
Shu, K.; Cui, L.; Wang, S.; Lee, D.; and Liu, H. 2019. dEFEND: Explainable Fake News Detection. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 395–405. ACM.
Shu, K.; Mahudeswaran, D.; Wang, S.; Lee, D.; and Liu, H. 2020. FakeNewsNet: A Data Repository with News Content, Social Context and Spatiotemporal Information for Studying Fake News on Social Media. Big Data, 8: 171–188.
Shu, K.; Sliva, A.; Wang, S.; Tang, J.; and Liu, H. 2017. Fake News Detection on Social Media: A Data Mining Perspective. ACM SIGKDD Explorations Newsletter, 19: 22–36.
Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; Rodriguez, A.; Joulin, A.; Grave, E.; and Lample, G. 2023. LLaMA: Open and Efficient Foundation Language Models. arXiv preprint arXiv:2302.13971.
Wang, Y.; Ma, F.; Jin, Z.; Yuan, Y.; Xun, G.; Jha, K.; Su, L.; and Gao, J. 2018. EANN: Event Adversarial Neural Networks for Multi-Modal Fake News Detection. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 849–857. ACM.
Wei, J.; Tay, Y.; Bommasani, R.; Raffel, C.; Zoph, B.; Borgeaud, S.; Yogatama, D.; Bosma, M.; Zhou, D.; Metzler, D.; Chi, E. H.; Hashimoto, T.; Vinyals, O.; Liang, P.; Dean, J.; and Fedus, W. 2022a. Emergent Abilities of Large Language Models. Transactions on Machine Learning Research.
Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Ichter, B.; Xia, F.; Chi, E.; Le, Q. V.; and Zhou, D. 2022b. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems, volume 35, 24824–24837. Curran Associates, Inc.
Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; Davison, J.; Shleifer, S.; von Platen, P.; Ma, C.; Jernite, Y.; Plu, J.; Xu, C.; Le Scao, T.; Gugger, S.; Drame, M.; Lhoest, Q.; and Rush, A. 2020. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 38–45. Online: ACL.
Xu, C.; Xu, Y.; Wang, S.; Liu, Y.; Zhu, C.; and McAuley, J. 2023. Small Models are Valuable Plug-ins for Large Language Models. arXiv preprint arXiv:2305.08848.
Zhang, X.; Cao, J.; Li, X.; Sheng, Q.; Zhong, L.; and Shu, K. 2021. Mining Dual Emotion for Fake News Detection. In Proceedings of the Web Conference 2021, 3465–3476. ACM.
Zhang, Y.; Li, Y.; Cui, L.; Cai, D.; Liu, L.; Fu, T.; Huang, X.; Zhao, E.; Zhang, Y.; Chen, Y.; Wang, L.; Luu, A. T.; Bi, W.; Shi, F.; and Shi, S. 2023. Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models. arXiv preprint arXiv:2309.01219.
Zhao, W. X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; Du, Y.; Yang, C.; Chen, Y.; Chen, Z.; Jiang, J.; Ren, R.; Li, Y.; Tang, X.; Liu, Z.; Liu, P.; Nie, J.-Y.; and Wen, J.-R. 2023. A Survey of Large Language Models. arXiv preprint arXiv:2303.18223.
Zhong, Q.; Ding, L.; Liu, J.; Du, B.; and Tao, D. 2023. Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT. arXiv preprint arXiv:2302.10198.
Zhou, X.; and Zafarani, R. 2019. Network-Based Fake News Detection: A Pattern-Driven Approach. ACM SIGKDD Explorations Newsletter, 21(2): 48–60.
Zhu, Y.; Sheng, Q.; Cao, J.; Li, S.; Wang, D.; and Zhuang, F. 2022. Generalizing to the Future: Mitigating Entity Bias in Fake News Detection. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2120–2125. ACM.