Graph Neural Networks
The emergence of various social networks has generated vast volumes of data. Efficient
methods for capturing, distinguishing, and filtering real and fake news are becoming
increasingly important, especially after the outbreak of the COVID-19 pandemic. This study
conducts a multi-aspect and systematic review of the current state and challenges of graph
neural networks (GNNs) for fake news detection systems and outlines a comprehensive
approach to implementing fake news detection systems using GNNs. Furthermore, advanced
GNN-based techniques for implementing pragmatic fake news detection systems are discussed
from multiple perspectives. First, we introduce the background and overview related to fake
news, fake news detection, and GNNs. Second, we provide a GNN taxonomy-based fake news
detection taxonomy and review and highlight models in categories. Subsequently, we compare
critical ideas, advantages, and disadvantages of the methods in categories. Next, we discuss the
possible challenges of fake news detection and GNNs. Finally, we present several open issues in
this area and discuss potential directions for future research. We believe that this review can
help systems practitioners and newcomers overcome current impediments and navigate
future developments when deploying a fake news detection system using GNNs.
Keywords: Fake news, Fake news characteristics, Fake news features, Fake news detection,
Graph neural network
1. Introduction
A graph neural network is a novel technique that focuses on using deep learning algorithms
over graph structures [6]. Before their application in fake news detection systems, GNNs had
been successfully applied in many machine learning and natural language processing-related
tasks, such as object detection [7], [8], sentiment analysis [9], [10], and machine
translation [11], [12]. The rapid development of numerous GNNs has been achieved by
improving convolutional neural networks, recurrent neural networks, and autoencoders
through deep learning [13]. The rapid development of GNN-based methods for fake news
detection systems on social networks can be attributed to the rapid growth of social networks
in terms of the number of users, the amount of news posted, and user interactions.
Consequently, social networks naturally form complex graph structures, which are problematic
for earlier machine learning-based and deep learning-based fake news detection algorithms.
There are two main reasons: the graph size depends on the number of nodes, and nodes have
varying numbers of neighbors, so important operations such as convolutions are difficult to
compute in the graph domain. Additionally, earlier machine learning and deep learning-based
fake news detection algorithms assume that news items are independent. This assumption does
not hold for graph data because nodes connect to one another through various types of
relationships, such as citations, interactions, and friendships. GNN-based fake news detection
methods have therefore been developed. Although some state-of-the-art results have
been achieved (see Table 1), no complete GNN-based fake news detection and prevention
system existed when we conducted this study. Fake news on social networks thus remains a
major challenge that needs to be solved (the first justification).
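To make the variable-neighborhood problem concrete, the following sketch shows mean neighborhood aggregation, the core operation that lets GNNs handle nodes with different numbers of neighbors. The node features and edges are invented for illustration:

```python
# Toy illustration: mean aggregation over a graph with varying
# neighborhood sizes -- the situation that breaks fixed-size
# convolution filters. Features and edges are made up.
features = {
    "n1": [1.0, 0.0],
    "n2": [0.0, 1.0],
    "n3": [1.0, 1.0],
    "n4": [0.5, 0.5],
}
neighbors = {          # n1 has three neighbors, n4 has only one
    "n1": ["n2", "n3", "n4"],
    "n2": ["n1"],
    "n3": ["n1"],
    "n4": ["n1"],
}

def aggregate(node):
    """Mean of the neighbor feature vectors; works for any degree."""
    vecs = [features[v] for v in neighbors[node]]
    return [sum(xs) / len(vecs) for xs in zip(*vecs)]

h1 = aggregate("n1")   # averages three neighbors
h4 = aggregate("n4")   # averages one neighbor
```

The same function applies to every node regardless of its degree, which is exactly what a fixed-size convolution filter cannot do.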
Table 1
The least performance improvement achieved by GNN-based methods compared with
traditional methods.
Method | Ref | Improved methods | Dataset | Least improved performance
GCAN | [14] | DTC, SVM-TS, mGRU, RFC, tCNN, CRNN, CSI, dEFEND | Twitter15, Twitter16 | Accuracy: 18.7%, 19.9%
FANG | [5] | Feature SVM, CSI | Twitter | AUC: 6.07%
SAFER | [15] | HAN, dEFEND, SAFE, CNN, RoBERTa, Maj sharing baseline | FakeNewsNet, FakeHealth | F1: 5.19%, 5.00%
Bi-GCN | [16] | DTC, SVM-RBF, SVM-TS, RvNN, PPC_RNN+CNN | Weibo, Twitter15, Twitter16 | Accuracy: 4.5%, 13.6%, 14.3%
AA-HGNN | [17] | SVM, LIWC, text-CNN, Label propagation, DeepWalk, LINE, GAT, GCN, HAN | PolitiFact, BuzzFeed | Accuracy: 2.82%, 9.34%
Various survey papers of fake news detection have been published, such
as [18], [19], [20], [21], [22], [23]. We briefly summarize related work as follows: Klyuev
et al. [20] presented a survey of fake news detection methods based on semantics,
using natural language processing (NLP) and text mining techniques. Additionally, the authors
discussed automatic checking and bot detection on social networks. Meanwhile, Oshikawa
et al. [21] introduced a survey for fake news detection, focusing only on reviewing NLP-based
approaches. Collins et al. [18] presented various variants of fake news and reviewed recent
trends in preventing the spread of fake news on social networks. Shu et al. [22] conducted a
review on various types of disinformation, factor influences, and approaches that decrease the
effects. Khan et al. [19] presented fake news variants, such as misinformation, rumors, clickbait,
and disinformation. They provided a more detailed representation of some fake news variant
detection methods without limiting NLP-based approaches. They also introduced types of
available detection models, such as knowledge-based, fact-checking, and hybrid approaches.
Moreover, the authors introduced governmental strategies to prevent fake news and its
variants. Mahmud et al. [23] presented a comparative analysis by implementing several
commonly used methods of machine learning and GNNs for fake news detection on social
media and comparing their performance. No survey papers have attempted to provide a
comprehensive and thorough overview of fake news detection using the most current
technique, namely, the GNN-based approach (the second justification).
The above two justifications motivated us to conduct this survey. Although some similarities are
unavoidable, our survey is different from the aforementioned works in that we focus on
description, analysis, and discussion of the models of fake news detection using the most
recent GNN-based techniques. We believe that this paper can provide an essential and basic
reference for new researchers, newcomers, and systems practitioners in overcoming current
barriers and forming future directions when improving the performance of fake news detection
systems using GNNs. This paper makes the following four main contributions.
We provide the most comprehensive survey yet of fake news, including similar concepts,
characteristics, types of related features, types of approaches, and benchmark datasets. We
redefine similar concepts regarding fake news based on their characteristics. This survey can
serve as a practical guide for elucidating, improving, and proposing different fake news
detection methods.
We provide a brief review of existing types of GNN models. We also make necessary
comparisons among types of models and summarize the corresponding algorithms.
We introduce the details of GNN models for fake news detection systems, such as pipelines of
models, benchmark datasets, and open source code. These details provide a background and
guide experienced developers in proposing different GNNs for fake news prevention
applications.
We introduce and discuss open problems for fake news detection and prevention using GNN
models. We provide a thorough analysis of each issue and propose future research directions
regarding model depth and scalability trade-offs.
This section justified the problem and highlighted our motivations for conducting this survey.
The remaining sections of the paper are ordered as follows. Section 2 introduces the
background and provides an overview of fake news, fake news detection, and GNNs.
Section 3 presents the survey methodology used to conduct the review. General information on
the included papers is analyzed in Section 4. In Section 5, the selected papers are categorized
and reviewed in detail. Subsequently, we discuss the comparisons, advantages, and
disadvantages of the methods by category in Section 6. Next, the possible challenges of fake
news and GNNs are briefly evaluated in Section 7. Finally, we identify several open issues in this
area and discuss potential directions for future research in Section 8.
2. Background
Headline: Description of the main topic of the news with a short text to attract readers’
attention.
Body content: Detailed description of the news, including highlights and publisher
characteristics.
Image/Video: Part of the body content that provides a visual illustration to simplify the news
content.
“Fake news” was named word of the year by the Macquarie Dictionary in 2016 [24]. Fake news
has received considerable attention from researchers, with differing definitions from various
view opinions. In [24], the authors defined fake news as “a news article that is intentionally and
verifiably false”. Allcott and Gentzkow [2] provided a narrow definition of fake news as “news
articles that are intentionally and verifiably false, and could mislead readers”. In another
definition, the authors considered fake news as “fabricated information that mimics news
media content in form but not in organizational process or intent” [25]. In [26], the authors
considered fake news in various forms, such as false, misleading, or inventive news, including
several characteristics and attributes of the disseminated information. In [27], the authors
provided a broad definition of fake news as “false news” and a narrow definition of fake news
as “intentionally false news published by a news outlet”. Similar definitions have been employed
in previous fake news detection methods [3], [4], [28], [29].
Intention to deceive [35]: This characteristic is identified based on the hypothesis that “no one
inadvertently produces inaccurate information in the style of news articles, and the fake news
genre is created deliberately to deceive” [25]. Deception is prompted by political/ideological or
financial reasons [2], [36], [37], [38]. However, fake news may also appear and is spread to
amuse, to entertain, or, as proposed in [39], “to provoke”.
Malicious account: News on social networks currently comes from both genuine and
inauthentic accounts. Although fake news is created and primarily spread by inauthentic
accounts, some genuine users still spread it. Accounts created mainly to spread fake news
are called malicious accounts [27]. Malicious accounts are divided into three main types: social
bots, trolls, and cyborg users [24]. Social bots are social network accounts controlled by
computer algorithms. A social bot is called a malicious account when it is designed primarily to
spread harmful information and plays a large role in creating and spreading fake news [40]. This
malicious account can also automatically post news and interact with other social network
users. Trolls are real people who disrupt online communities to provoke an emotional response
from social media users [24]. Trolls aim to manipulate information to change the views of
others [40] by kindling negative emotions among social network users. Consequently, users
develop strong doubt and distrust [24]; they fall into confusion, unable to determine what is
real and what is fake, and gradually begin to believe lies and false information.
Cyborg users are malicious accounts created by real people;
however, they maintain activities by using programs. Therefore, cyborgs are better at spreading
false news [24].
Authenticity: This characteristic aims to identify whether news is factual [27]. Factual
statements are objective statements that can be verified; subjective opinions are not
considered factual statements. A factual statement must hold true: if a published statement
can be disproved, it is not a factual statement [41]. Nonfactual statements are statements
that one can agree or disagree with; such news may be partly or completely wrong. Fake news
consists mostly of nonfactual statements.
The information is news: This characteristic [27] reflects whether the item is actually presented
and distributed as news.
Based on the characteristics of fake news, we provide a new definition of fake news as
follows: “Fake news” is news containing nonfactual statements, spread by malicious accounts,
that can cause the echo chamber effect, with the intention to mislead the public.
Concepts related to Fake news: Various concepts regarding fake news exist. Using the
characteristics of fake news, we can redefine these concepts to distinguish them as follows.
False news [42], [43] is news containing nonfactual statements from malicious accounts that
can cause the echo chamber effect with undefined intentions.
Cherry-picking [45] is news or non-news containing common factual statements from malicious
accounts and can cause the echo chamber effect, with the intention to mislead the public.
Rumor [46] is news or non-news containing factual or nonfactual statements from malicious
accounts and can cause the echo chamber effect with undefined intentions.
Fake information is news or non-news of nonfactual statements from malicious accounts that
can cause the echo chamber effect, with the intention to mislead the public.
Deceptive news [2], [24], [27] is news containing nonfactual statements from malicious
accounts that can cause the echo chamber effect, with the intention to mislead the public.
Satire news [48] is news containing factual or nonfactual statements from malicious accounts
that can cause the echo chamber effect, with the intention to entertain the public.
Clickbait [49] is news or non-news containing factual or nonfactual statements from malicious
accounts that can cause the echo chamber effect, with the intention to mislead the public.
Fake facts [50] are undefined information (news or non-news) comprising nonfactual
statements from malicious accounts that can cause the echo chamber effect, with the intention
to mislead the public.
Sloppy journalism [19] is unreliable and unverified information (news or non-news) comprising
undefined statements shared by journalists that can cause the echo chamber effect, with the
intention to mislead the public.
Based on the content presented in Table 2, these datasets can be further detailed as follows:
ISOT: Real news collected from Reuters and fake news from websites flagged by
PolitiFact and Wikipedia.
Fakeddit: English multimodal fake news dataset including images, comments, and metadata
news.
LIAR: English dataset with 12,836 short statements regarding politics collected from online
streaming and two social networks – Twitter and Facebook – from 2007 to 2016.
Stanford Fake News: Fake news and satire stories, including hyperbolic support or
condemnation of a figure, conspiracy theories, racist themes, and discrediting of reliable
sources.
FA-KES: Labeled fake news regarding the Syrian conflict, such as casualties, activities, places,
and event dates.
BREAKING!: English dataset created using the Stanford Fake News dataset and BS detector
dataset. The data, including news regarding the 2016 US presidential election, were collected
from web pages.
BuzzFeedNews: English dataset with 2283 news articles regarding politics collected from
Facebook from 2016 to 2017.
FakeNewsNet: English dataset with 422 news articles regarding society and politics collected
from online streaming and Twitter.
FEVER: English dataset with 185,445 claims regarding society collected from online streaming.
FakeCovid: English dataset with 5182 news articles for COVID-19 health and society crawled
from 92 fact-checking websites, referring to Poynter and Snopes.
CredBank: English dataset with 60 million tweets about over 1000 events regarding society
collected from Twitter from October 2014 to February 2015.
Memetracker: English dataset with 90 million documents, 112 million quotes, and 22 million
various phrases regarding society collected from 165 million sites.
BuzzFace: English dataset with 2263 news articles and 1.6 million comments regarding society
and politics collected from Facebook from July 2016 to December 2016. This dataset was
extended in September 2016.
FacebookHoax: English dataset with 15,500 hoaxes regarding science collected from Facebook
from July 2016 to December 2016. Additionally, this dataset identifies posts with over 2.3
million likes.
Higgs-Twitter: English dataset with 985,590 tweets posted by 527,496 users regarding the
science of the new Higgs boson detection collected from Twitter.
Trust and Believe: English dataset with information from 50,000 politician users on Twitter. All
information was labeled manually or using available learning methods.
Yelp: English dataset with 18,912 technology fake reviews collected from online streaming.
PHEME: English and German dataset with 4842 tweets and 330 rumors conversations regarding
society and politics collected from Twitter.
Because of the limited number of manuscript pages, we do not describe further datasets. The
remaining datasets are presented in the Appendix under Description of Datasets.
Based on the above analysis, we compare the criteria of fake news datasets in Fig. 1, followed
by a discussion of observations and the main reason for these observations.
Fig. 1
A comparison among datasets in terms of four criteria.
First, regarding the type of news content, 29 of the 35 datasets contained text data (82.86%);
three of the 35 datasets comprised text, image, and video data (8.57%), namely, Fakeddit,
Stanford Fake News, and Verification Corpus; two of the 35 datasets contained text and image
data (5.71%), namely, FakeNewsNet and Breaking; and only one dataset contained text and
video data (2.86%). No dataset included separate images or videos because previous fake news
detection methods used mainly NLP-based techniques that were highly dependent on text data.
Additionally, labeled image or video data are scarce because annotating them is labor intensive
and costly.
Second, regarding the news domain, 20 and 19 of the 35 datasets focused on society news
(57.14%) and political news (54.29%), respectively, whereas only one dataset contained
economy, fraud/scam, and fauxtography news (2.86%). These findings can be explained by the
fact that fake news is more pertinent and widespread in political and societal domains than in
other domains [89].
Third, regarding the type of fake news concepts, 27 of the 35 datasets contained the fake news
concept (77.14%), followed by rumors (11.43%), satire (8.57%), hoaxes, and real news (5.71%),
and finally, fake reviews (2.86%). Therefore, datasets containing the fake news concept are
generally used for fake news detection applications because fake news contains false
information spread by news outlets for political or financial gains [46].
Finally, regarding the type of applications, the most common application objective of the 35
datasets was fake news detection (71.43%), followed by fact-checking (11.43%), veracity
classification, and rumor detection (8.57%) because fake news detection applications can be
used to solve practical problems. Additionally, fake news detection is the most general
application, covering the entire process of classifying false information as true or false. Thus,
fake information datasets are the most relevant for collection [52].
2.2.3. Features of fake news detection
The details of extracting and representing useful categories of features from news content and
context are summarized in Fig. 2.
Fig. 2
Categories of features for fake news detection methods.
Based on the news attributes and discriminative characteristics of fake news, we can extract
different features to build fake news detection models. Currently, fake news detection relies
mainly on news and context information. In this survey, we categorize factors that can aid fake
news detection into seven categories of features: network-, sentiment-, linguistic-, visual-,
post-, user-, and latent-based features.
Linguistic-based features: These are used to capture information regarding the attributes of the
writing style of the news, such as words, phrases, sentences, and paragraphs. Fake news is
created to mislead or entertain the public for financial or political gains. Therefore, based on
the intention of fake news, we can easily extract features related to writing styles that often
appear only in fake news, such as using provocative words to stimulate the reader’s attention
and setting sensational headlines. To best capture linguistic-based features, we divide them
into five common types: lexical, syntactic, semantic, domain-specific, and informality. Lexical
features refer to wording, such as the most salient characters (n-grams) [90], [91], frequency of
negation words, doubt words, abbreviation words, vulgar words [92], and the novelty of
words [93]. Syntactic features capture properties related to the sentence level, such as the
number of punctuations [94], number of function words (nouns, verbs, and adjectives) [93],
frequency of POS tags [95], and sentence complexity [96], [97]. Semantic features capture
properties related to latent content, such as the number of latent topics [98] and contextual
clues [99]. These features are extracted with state-of-the-art NLP techniques, such as
distribution semantics (embedding techniques) and topic modeling (LDA technique) [100].
Domain-specific features capture properties related to domain types in the news, such as
quoted words, frequency of graphs, and external links [101]. Informality features capture
properties related to writing errors, such as the number of typos, swear words, netspeak, and
assent words [27].
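Two of the lexical and informality features named above can be sketched as follows. The negation word list and the sample texts are illustrative assumptions, not taken from any cited method:

```python
# Hypothetical sketch of two linguistic-based features: character
# n-grams and the frequency of negation words. The word list and
# sample sentence are invented for illustration.
NEGATION_WORDS = {"not", "no", "never", "none"}  # assumed list

def char_ngrams(text, n=3):
    """Raw counts of character n-grams (a simple lexical feature)."""
    counts = {}
    for i in range(len(text) - n + 1):
        g = text[i:i + n]
        counts[g] = counts.get(g, 0) + 1
    return counts

def negation_frequency(text):
    """Fraction of tokens that are negation words."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in NEGATION_WORDS for t in tokens) / len(tokens)

freq = negation_frequency("This claim is not true and never was")
grams = char_ngrams("shocking")
```

In practice such counts feed a classifier as one slice of the overall feature vector.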
User-based features: This category of features is identified and extracted based on the
malicious account characteristics of fake news, specifically social bots and cyborg users. User-
based features are properties related to user accounts that create or spread fake news. These
features are classified into two levels, namely, the group level and the individual level [27]. The
individual level focuses on exploiting fake or real factors regarding each specific user, such as
registration age, number of followers, and number of opinions posted by users [102], [104].
Meanwhile, the group level focuses on factors regarding the group of users, such as the ratio of
users, the ratio of followers, and the ratio of followees [95], [105].
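The group-level ratios mentioned above can be computed directly from account records. The records below and the exact feature names are fabricated for illustration:

```python
# Illustrative computation of group-level user-based features
# (verified-user ratio, follower/followee ratio) over the accounts
# that interacted with a news item. The records are fabricated.
accounts = [
    {"verified": True,  "followers": 100, "followees": 50},
    {"verified": False, "followers": 10,  "followees": 200},
    {"verified": False, "followers": 5,   "followees": 100},
]

def group_features(users):
    n = len(users)
    total_followers = sum(u["followers"] for u in users)
    total_followees = sum(u["followees"] for u in users)
    return {
        "verified_ratio": sum(u["verified"] for u in users) / n,
        "follower_followee_ratio": total_followers / total_followees,
    }

feats = group_features(accounts)
```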
Post-based features: This category of features is identified and extracted based on the
malicious accounts and news characteristics of fake news. Post-based features are used to
capture properties related to users’ responses or opinions regarding the news shared. These
features are classified into three categories: group, post, and temporal [27]. The post level
focuses on exploiting factors regarding each post [28], such as other users’ opinions regarding
this post (support, deny), main topic, and degree of reliability. The group level focuses on
factors regarding all opinions related to this post [106], such as the ratio of supporting opinions,
ratio of contradicting opinions, and reliability degree [95], [105]. The temporal level notes
factors such as the changing number of posts and followers over time and the sensory
ratio [105].
Data-driven features: This category of features is identified and extracted based on the data
characteristics of fake news, such as the data domain, data concept, data content, and
application. The data domain exploits domain-specific and cross-domain knowledge in the news
to identify fake news from various domains [108]. The data concept focuses on determining
whether concept drift [109] exists in the news. The data content focuses on considering
properties related to latent content in the news, such as the number of latent topics [98] and
contextual clues [99]. These features are extracted based on state-of-the-art NLP techniques,
such as distribution semantics (embedding techniques) and topic modeling (LDA
technique) [100].
Visual-based features: Few fake news detection methods have been applied to visual
news [24]. This category of features is identified and extracted based on the authenticity, news,
and intended characteristics of fake news. Visual-based features are used to capture properties
related to news containing images, videos, or links [27], [100]. The features in this category are
classified into two groups: visual and statistical. The visual level reflects factors regarding each
video or image, such as clarity, coherence, similarity distribution, diversity, and clustering score.
The statistical level calculates factors regarding all visual content, such as the ratio of images
and the ratio of videos.
Latent features: A critical concept here is that of latent features, which are not directly
observable and include latent textual features and latent visual features. Latent
features are needed to represent the latent semantics of the original data more
effectively. This category of features is identified and extracted based on the characteristics of
fake news, such as the echo chamber, authenticity, and news information. Latent textual
features are often extracted by using the news text representation models to create news text
vectors. Text representation models can be divided into three groups: (i) contextualized text
representations, such as BERT [110] and ELMo [111]; (ii) non-contextualized text
representations, such as Word2Vec [112], FastText [113], and GloVe [114]; and (iii) knowledge
graph-based representations, such as the method of Koloski et al. [115], RotatE [116],
QuatE [117], and ComplEx [118]. Contextualized
text representations are word vectors that can capture richer context and semantic
information. Knowledge graph-based representations can enrich various contextual and
noncontextual representations by adding human knowledge representations via connections
between two entities with their relationship based on knowledge graphs. News text
representations can not only be used as inputs for traditional machine learning
models [119] but also be integrated into deep learning models for fake news detection, such as
neural networks [115], recurrent networks [120], transformers [110], [121], [122], and
GNN-based models [123], [124], [125]. Latent visual features are often extracted
from visual news, such as images and videos. Latent visual features are extracted by using
neural networks [126] to create a latent visual representation containing an image pixel tensor
or matrix.
2.2.4. Fake news detection techniques
Style-based detection:
Given a news item a with a set f_a^s of style features, where f_a^s is a set of features regarding
the news content, style-based fake news detection is defined as a binary classification task
that identifies whether news item a is fake or real; that is, we must find a mapping
function F such that F: f_a^s → Ψ_a. The techniques in this category are proposed based on the
intention and news characteristics of fake news. The objective of style-based techniques is to
capture the distinct writing style of fake news, which employs distinct styles to attract the
attention of many people and stand out from ordinary news. Writing styles are captured
automatically by two groups of techniques: style
representation techniques [132], [133], [134] and style classification
techniques [28], [91], [135].
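A minimal, hypothetical instance of the mapping F from style features to a fake/real label might look as follows. The features, weights, and threshold are invented for illustration; a real system would learn them from labeled news:

```python
# Toy style-based classifier: F maps a style feature vector to
# {fake, real}. Weights and threshold are fabricated, not learned.
WEIGHTS = {"exclaim_rate": 2.0, "caps_rate": 1.5, "avg_sentence_len": -0.1}
THRESHOLD = 0.5

def style_features(text):
    """Extract three simple writing-style features from raw text."""
    words = text.split()
    sentences = [s for s in text.split(".") if s.strip()]
    return {
        "exclaim_rate": text.count("!") / max(len(words), 1),
        "caps_rate": sum(w.isupper() for w in words) / max(len(words), 1),
        "avg_sentence_len": len(words) / max(len(sentences), 1),
    }

def F(features):
    """Linear score followed by a threshold decision."""
    score = sum(WEIGHTS[k] * features[k] for k in WEIGHTS)
    return "fake" if score > THRESHOLD else "real"

label = F(style_features("SHOCKING! You will NOT believe this!"))
```

The exclamation-heavy, all-caps sample scores above the threshold, while a long, sober sentence would score below it.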
Context-based detection:
Given a news item a with a set f_a^c of context features, where f_a^c includes the news text,
news source, news publisher, and news interactions, context-based fake news detection is
defined as a binary classification task that identifies whether news item a is fake or real; that
is, we must find a mapping function F such that F: f_a^c → Ψ_a. The techniques in this category
are proposed based on the malicious account and news characteristics of fake news. The
objective of source-based techniques is to capture the credibility of sources that appear,
publish, and spread the news [27]. Credibility refers to people’s emotional response to the
quality and believability of news. The techniques in this category are often classified into two
approaches: (i) assessing the reliability of sources where the news appeared and is spread
based on news authors and publishers [136], [137] and (ii) assessing the reliability of sources
where the news appeared and is spread based on social media users [105], [138], [139].
Propagation-based detection:
Given a news item a with a set f_a^p of propagation pattern features. Propagation-based
fake news detection is defined as a binary classification task that identifies whether news item
a is fake or real; that is, we must develop a mapping function F such that F: f_a^p → Ψ_a. The
techniques in this category are proposed based on the echo chamber effect and news
characteristics of fake news. The objective of propagation-based techniques is to capture and
extract information regarding the spread of fake news. That is, the methods in this category aim
to detect fake news based on how people share it. These techniques are often grouped into
two small categories:
(i) using news cascades [140], [141] and
(ii) using self-defined propagation graphs [142], [143], [144], [145].
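A news cascade of the kind used in category (i) can be sketched as a propagation tree whose root is the source post and whose children are reshares; structural statistics such as depth and size are typical propagation-based features. The cascade below is fabricated:

```python
# Sketch of a news cascade as a propagation tree. Keys are posts,
# values are the users who reshared them. Entirely fabricated data.
cascade = {
    "source": ["u1", "u2"],   # users who reshared the source post
    "u1": ["u3", "u4", "u5"],
    "u2": [],
    "u3": [], "u4": [], "u5": [],
}

def depth(node):
    """Length of the longest reshare chain starting at `node`."""
    return 1 + max((depth(c) for c in cascade[node]), default=0)

def size(node):
    """Total number of posts in the cascade rooted at `node`."""
    return 1 + sum(size(c) for c in cascade[node])

d = depth("source")   # source -> u1 -> u3 gives a chain of length 3
s = size("source")
```

Fake news cascades have been reported to differ from real ones in exactly such structural statistics, which is what propagation-based detectors exploit.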
Multilabel learning-based detection:
Let χ ⊆ R^d be the d-dimensional input feature space; hence, a news item a = [a_1, …, a_d] ∈ χ.
Let Γ = {real, fake}^l be the label space, such that Ψ = [Ψ_1, …, Ψ_l] ∈ Γ, where l is the number
of class labels. Given a training set {(a, Ψ)}, the task of multilabel learning detection is to learn
a function F: χ → Γ that predicts Ψ̂ = F(a). Multilabel learning-based detection is a learning method
where each news item in the training set is associated with a set of labels. The techniques in
this category are proposed based on the echo chamber effect and news characteristics of fake
news. The objective of multilabel learning-based techniques is to capture and extract
information regarding the news content and the news latent text. The techniques in this
category are often classified into four approaches: (i) using style-based
representation [17], [115], [146], [147]; (ii) using style-based
classification [15], [29], [148], [149], [150], [151]; (iii) using news cascades [140], [152]; and (iv)
using self-defined propagation graphs [4], [16], [125], [153].
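The multilabel formulation above can be made concrete with a toy mapping F that scores each of the l labels independently. The feature vectors, weight vectors, and labels are fabricated for illustration:

```python
# Toy multilabel mapping F: chi -> Gamma with l = 2 labels, each
# scored by its own linear model. Weights are fabricated, not learned.
LABELS = ["real", "fake"]                       # the l class labels
W = {"real": [1.0, -1.0], "fake": [-1.0, 1.0]}  # one weight vector per label

def F(a):
    """Predict Psi-hat: one independent binary decision per label."""
    return [1 if sum(w * x for w, x in zip(W[lab], a)) > 0 else 0
            for lab in LABELS]

psi_hat = F([0.2, 0.9])   # feature vector a in R^2
```

Scoring labels independently is the simplest multilabel strategy (one-vs-rest); richer methods model the correlations between labels.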
Hybrid-based detection: This method is a state-of-the-art approach for fake news detection
that simultaneously combines two previous approaches, such as content-context [154], [155],
propagation-content [147], [156], and context-propagation [4], [14]. These hybrid methods are
currently of interest because they can capture more meaningful information related to fake
news. Thus, they can improve the performance of fake news detection models.
A critical issue that needs to be discussed is fake news early detection. Early detection of fake
news provides an early alert of fake news by extracting only the limited social context with a
suitable time delay compared with the appearance of the original news item. Knowledge-based
methods are slightly unsuitable for fake news early detection because these methods depend
strongly on knowledge graphs; meanwhile, newly disseminated news often generates new
information and contains knowledge that has not appeared in knowledge graphs. Style-based
methods can be used for fake news early detection because they depend mainly on the news
content that allows us to detect fake news immediately after news appears and has not been
spread. However, style-based fake news early detection methods are only suitable for a brief
period because they rely heavily on the writing style, which creators and spreaders can change.
Propagation-based methods are unsuitable for fake news early detection because news that has
not yet been disseminated often contains very little information about its spread. To the best of
our knowledge, context-based methods are most suitable for fake news early detection
because they depend mainly on the news surroundings, such as news sources, news publishers,
and news interactions. This feature allows us to detect fake news immediately after news
appears and has not been spread by using website spam detection [157], distrust link
pruning [158], and user behavior analysis [159] methods. In general, early detection of fake
news is only suitable for a brief period because human intelligence is limitless. When an early
detection method of fake news is applied, it will not be long until humans create an effective
way to combat it. This issue is still a major challenge for the fake news detection field.
2.3. Understanding graph neural networks
In this section, we provide the background and definition of GNNs. The techniques, challenges,
and types of GNNs are discussed in the following subsections. Before presenting this content,
we introduce the notations used in this paper in Table 3.
Table 3
Descriptions of notations.
Notation – Description
|·| – The cardinality of a set
G – A graph
V – The set of nodes in a graph
v – A node in a graph
E – The set of edges in a graph
e_ij – An edge between two nodes v_i, v_j in a graph
A – The graph adjacency matrix
D – The degree matrix of A, with D_ii = Σ_(j=1)^n A_ij
n – The number of nodes
m – The number of edges
r – The number of edge relation types
d – The dimension of a node feature vector
c – The dimension of an edge feature vector
x^e_(v_i,v_j) ∈ R^c – The feature vector of edge e_ij
x^n_v ∈ R^d – The feature vector of node v
X^e ∈ R^(m×c) – The edge feature matrix of a graph
X ∈ R^(n×d) – The node feature matrix of a graph
X^(t) ∈ R^(n×d) – The node feature matrix at time step t
2.3.1. What is a graph?
Before we discuss deep learning models on graph structures, we provide a more formal
description of a graph. Formally, a simple graph is denoted as G = {V, E},
where V = {v_1, v_2, …, v_n} is the set of nodes and E is the set of edges,
with e_ij = (v_i, v_j) ∈ E, 1 ≤ i, j ≤ n, where v_i and v_j are two adjacent nodes. The
adjacency matrix A is an n × n matrix with
A_ij = 1 if e_ij ∈ E, and A_ij = 0 if e_ij ∉ E.
(2)
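As a toy illustration of this definition, the adjacency and degree matrices of a small simple graph can be built directly from an edge list. This is a sketch in Python with NumPy; the function name is ours:

```python
import numpy as np

def adjacency_matrix(n, edges):
    """Build the n x n adjacency matrix A of a simple undirected graph.

    `edges` is a list of pairs (i, j) of 0-based node indices;
    A[i][j] = 1 if (v_i, v_j) is an edge, and 0 otherwise.
    """
    A = np.zeros((n, n), dtype=int)
    for i, j in edges:
        A[i, j] = 1
        A[j, i] = 1  # simple graphs are undirected
    return A

# 4-node path graph: v0 - v1 - v2 - v3
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
D = np.diag(A.sum(axis=1))  # degree matrix: D_ii = sum_j A_ij
```

The degree matrix D defined in Table 3 follows directly from the row sums of A.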
We can construct richer graphs from simple graphs by adding information, such as attributed
graphs [6] and multi-relational graphs [160].
Attributed graphs are extended versions of simple graphs, obtained by adding node
attributes X or edge attributes X^e, where X ∈ R^(n×d) is a node feature matrix
with x^n_v ∈ R^d denoting the feature vector of node v, and X^e ∈ R^(m×c) is an edge feature
matrix with x^e_(v_i,v_j) ∈ R^c denoting the feature vector of edge e_ij.
Spatial–temporal graphs are special cases of attributed graphs in which the node attributes
change over time. Letting X^(t) be the feature matrix of the node representations at the t-th
time step, a spatial–temporal graph is defined as G^(t) = {V, E, X^(t)},
where X^(t) ∈ R^(n×d).
Multi-relational graphs are another extension of simple graphs, whose edges carry different
types of relations τ. In this case, each edge e_ij = (v_i, v_j) ∈ E becomes e_ij = (v_i, τ, v_j) ∈ E,
and each relation τ has its own adjacency matrix A_τ. The entire graph can be represented by
an adjacency tensor A ∈ R^(n×r×n). Multi-relational graphs can be divided into two subtypes:
heterogeneous and multiplex graphs.
Heterogeneous graphs: Here, the nodes are divided into different types, i.e.,
V = V_1 ∪ V_2 ∪ … ∪ V_k with V_i ∩ V_j = ∅ for i ≠ j. Edges must generally satisfy constraints
that follow from the node types: e_ij = (v_i, τ, v_j) ∈ E becomes e_ij = (v_i, τ_h, v_j) ∈ E,
where v_i ∈ V_t, v_j ∈ V_k, and t ≠ k.
Multiplex graphs: Here, a graph is divided into a set of k layers. Every node is replicated in
each layer, each layer has a unique relation called the intralayer edge type, and the other edge
type, the interlayer edge type, connects copies of the same node across layers. That is,
G = {G_i, i ∈ {1, 2, …, k}}, G_i = {V_i, E_i}, with V_i = {v_1, v_2, …, v_n},
E_i = E_i^intra ∪ E_i^inter, E_i^intra = {e_lj = (v_l, v_j) | v_l, v_j ∈ V_i}, and
E_i^inter = {e_lj = (v_l, v_j) | v_l ∈ V_i, v_j ∈ V_h, 1 ≤ h ≤ k, h ≠ i}.
2.3.2. What are graph neural networks?
GNNs apply deep learning models to graph-structured data: whereas standard deep learning
models deal with Euclidean data, GNNs [6], [161], [162], [163] deal with non-Euclidean
domains. Assume that we have a graph G = {V, E} with adjacency matrix A and node feature
matrix X (or edge feature matrix X^e). Given A and X as inputs, the main objective of a GNN is
to compute an output, e.g., node embeddings or node classifications. The representation after
the k-th layer is H^(k) = F(A, H^(k−1); θ^(k)), where F is a propagation function, θ^(k) is the
parameter set of F, and H^(0) = X. The propagation function can take a number of forms.
Let σ(·) be a non-linear activation function, e.g., ReLU; let W^(k) be the weight matrix of
layer k; and let Â be the normalized adjacency matrix, calculated as Â = D^(−1/2) Ã D^(−1/2),
where Ã = A + I (I is the identity matrix) and D is the diagonal degree matrix of Ã, i.e.,
D_ii = Σ_j Ã_ij. A simple and frequently used form of the propagation function is
F(A, H^(k)) = σ(A H^(k−1) W^(k)). In addition, the propagation function can be adapted to
specific GNN tasks as follows:
For the node classification task, function F often takes the following form [164]:
F(A, H^(k)) = σ(Â H^(k−1) W^(k))
(3)
For the node embedding task, function F often takes the following form [165]:
F(A, H^(k)) = σ((Q ϕ(H_e^(k−1) M_e) Q^⊤ ⊙ Â) H^(k−1) W^(k))
(4)
where Q is the incidence matrix indicating whether an edge e is connected to a given
node; M_e is a learnable matrix for the edges; ϕ is the diagonalization operator; ⊙ is the
element-wise product; and H_e^(k−1) is the hidden feature matrix of the edges in the (k−1)-th
layer, with H_e^(0) = X^e (the edge feature matrix). The term Q ϕ(H_e^(k−1) M_e) Q^⊤
normalizes the edge feature matrix, and Q ϕ(H_e^(k−1) M_e) Q^⊤ ⊙ Â fuses the edge
information into the adjacency matrix.
More choices of propagation functions in GNNs are presented in detail in Refs. [13], [165].
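One propagation step with the normalized adjacency Â of Eq. (3) can be sketched numerically. The helper names below are ours, and ReLU stands in for σ:

```python
import numpy as np

def normalized_adjacency(A):
    """Compute A_hat = D^{-1/2} (A + I) D^{-1/2}, adding self-loops via I."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)                # degrees of A_tilde
    D_inv_sqrt = np.diag(d ** -0.5)
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def propagate(A_hat, H, W):
    """One propagation step F(A, H) = sigma(A_hat H W), with ReLU as sigma."""
    return np.maximum(A_hat @ H @ W, 0.0)

# 3-node path graph v0 - v1 - v2 with one-hot node features
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H0 = np.eye(3)            # X = H^(0)
W1 = np.ones((3, 2))      # toy weight matrix for layer 1
H1 = propagate(normalized_adjacency(A), H0, W1)
```

Nodes v0 and v2 have identical neighborhoods in this toy graph, so their propagated representations coincide, which illustrates that the step mixes each node's features with those of its neighbors.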
Neural networks were first applied to acyclic graphs by Sperduti et al. [166] in 1997. In 2005,
Gori et al. [167] introduced the notion of GNNs, which was further detailed by Scarselli
et al. [168] in 2009 and by Gallicchio et al. [169] in 2010. According to Wu et al. [6], GNNs can
be divided into four main taxonomies: conventional GNNs, graph convolutional networks, graph
autoencoders, and spatial–temporal graph neural networks. In the next subsections, we
introduce these categories of GNNs.
Graph convolutional networks (GCNs) were first introduced by Kipf and Welling [164]. They are
capable of representing graphs and show outstanding performance in various tasks. In these
GNNs, after the graph is constructed, the function F is defined as in Eq. (3); however, the
recursive propagation step of a GCN at the k-th convolution layer is given by:
H^(1) = σ(Â H^(0) W^(1) + b^(1))
(6)
Hence,
H^(2) = σ(Â H^(1) W^(2) + b^(2))
(7)
That is:
H^(k) = σ(Â H^(k−1) W^(k) + b^(k))
(8)
where H^(0) = X, σ(·) is an activation function, W^(k), k ∈ {1, 2, 3, …}, is the weight matrix of
the k-th layer, and b^(k) is the bias of the k-th layer.
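The recursion in Eqs. (6)-(8) can be sketched as a stacked forward pass. This is a minimal NumPy sketch with our own helper names; the weights are random stand-ins, not trained parameters:

```python
import numpy as np

def gcn_forward(A, X, weights, biases):
    """Stacked GCN layers H^(k) = ReLU(A_hat H^(k-1) W^(k) + b^(k)), H^(0) = X,
    with A_hat = D^{-1/2}(A + I)D^{-1/2} as in Kipf and Welling."""
    A_tilde = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(A_tilde.sum(axis=1) ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt
    H = X
    for W, b in zip(weights, biases):
        H = np.maximum(A_hat @ H @ W + b, 0.0)  # ReLU as the activation sigma
    return H

rng = np.random.default_rng(7)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.standard_normal((4, 5))                       # 4 nodes, 5 input features
Ws = [rng.standard_normal((5, 8)), rng.standard_normal((8, 2))]
bs = [np.zeros(8), np.zeros(2)]
H2 = gcn_forward(A, X, Ws, bs)                        # 4 nodes, 2 output channels
```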
Graph autoencoders (GAEs) are deep neural architectures with two components: (i) an encoder,
which maps the nodes of the graph into a latent feature space, and (ii) a decoder, which
reconstructs graph information from the latent feature vectors. The first version of GAEs was
introduced by Kipf and Welling [170], [171]. In these GNNs, the function F is redefined as:
F(Ã, H^(k)) = σ(Ã H^(k−1) W^(k))
(9)
where Ã = φ(Z Z^⊤) is the reconstructed adjacency matrix, φ is the activation function of the
decoder, and Z is the output of the encoder. In these GAEs, GCNs are used in the encoder step
to create the embedding matrix; therefore, Z is calculated based on Eq. (3), i.e.,
Z = F(Â, H^(k)) with F(·) as in the GCN case, and Z^⊤ is the transpose of Z.
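A minimal numerical sketch of this encoder-decoder pattern, with our own function names: a two-layer GCN encoder produces Z, and an inner-product decoder with a sigmoid as φ reconstructs the adjacency matrix:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gae_reconstruct(A, X, W0, W1):
    """GAE sketch: a 2-layer GCN encoder produces Z; the decoder rebuilds
    A_tilde = phi(Z Z^T) with a sigmoid as the activation phi."""
    A_self = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(A_self.sum(axis=1) ** -0.5)
    A_hat = D_inv_sqrt @ A_self @ D_inv_sqrt
    Z = A_hat @ np.maximum(A_hat @ X @ W0, 0.0) @ W1  # encoder (2-layer GCN)
    return Z, sigmoid(Z @ Z.T)                        # decoder (inner product)

rng = np.random.default_rng(1)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = rng.standard_normal((3, 4))
Z, A_rec = gae_reconstruct(A, X,
                           rng.standard_normal((4, 8)),
                           rng.standard_normal((8, 2)))
```

Since Z Z^⊤ is symmetric, the reconstructed matrix is symmetric with entries in (0, 1), interpretable as edge probabilities.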
Spatial–temporal graph neural networks (STGNNs) address real-world tasks in which both the
graph structure and the graph inputs are dynamic. To represent such data, a spatial–temporal
graph is constructed as introduced in Section 2.3.1. To capture the dynamics of these graphs,
STGNNs model inputs containing nodes that are both dynamic and interdependent. STGNNs
can be divided into two approaches: RNN-based and CNN-based methods.
In the RNN-based approach, to capture the spatial–temporal relations, the hidden states of the
STGNN are passed to a recurrent unit based on graph convolutions [172], [173], [174]. The
propagation function again follows Eq. (3), but the hidden state at time step t is calculated as
follows:
H^(t) = σ(W X^(t) + U H^(t−1) + b)
(10)
where X^(t) is the node feature matrix at time step t. After introducing graph convolutions,
Eq. (10) becomes:
H^(t) = σ(GCN(X^(t), Â; W) + GCN(H^(t−1), Â; U) + b)
(11)
where GCN denotes a GCN model, and W and U are learnable weight matrices for the input and
the hidden state, respectively.
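Eq. (11) can be sketched as an unrolled recurrent update. This is a toy NumPy sketch with our own names; the identity matrix stands in for Â and tanh stands in for σ:

```python
import numpy as np

def gcn(H, A_hat, W):
    """Single graph convolution used inside the recurrent cell (no activation)."""
    return A_hat @ H @ W

def stgnn_step(X_t, H_prev, A_hat, W, U, b):
    """One recurrent STGNN update, as in Eq. (11):
    H(t) = sigma(GCN(X(t), A_hat; W) + GCN(H(t-1), A_hat; U) + b)."""
    return np.tanh(gcn(X_t, A_hat, W) + gcn(H_prev, A_hat, U) + b)

rng = np.random.default_rng(3)
n, d, h = 4, 3, 5
A_hat = np.eye(n)  # placeholder normalized adjacency for this demo
W = rng.standard_normal((d, h))
U = rng.standard_normal((h, h))
b = np.zeros(h)
H = np.zeros((n, h))
for t in range(6):                        # unroll over 6 time steps
    X_t = rng.standard_normal((n, d))     # node features at time step t
    H = stgnn_step(X_t, H, A_hat, W, U, b)
```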
Attention-based graph neural networks (AGNNs) [178] remove all intermediate fully
connected layers and replace the propagation layers with an attention mechanism that
preserves the structure of the graph [179]. The attention mechanism learns a dynamic and
adaptive local summary of each neighborhood to obtain more accurate predictions [180]. The
propagation function of an AGNN follows Eq. (3); however, the AGNN includes graph attention
layers. In each layer, a shared, learnable linear transformation M^(t) ∈ R^(d_t×d_(t−1)),
where d_t is the dimension of the t-th hidden layer, is applied to the input features of every
node as follows:
H^(t) = σ(M^(t) H^(t−1))
(12)
and the row vector of node v_i is defined as:
H^(t)_(v_i) = Σ_(v_j ∈ N(v_i) ∪ {v_i}) P^(t−1)_ij H^(t−1)_(v_j)
(13)
where P^(t−1) is the attention (propagation) matrix, with
P^(t−1)_ij = φ([β^(t−1) cos(H^(t−1)_(v_i), H^(t−1)_(v_j))]_(v_j ∈ N(v_i) ∪ {v_i}))
(14)
where β^(t−1) ∈ R is an attention-guided parameter of the propagation layer, whose value
changes across hidden states, and φ(·) is the activation function of the propagation layer.
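A sketch of the attention propagation in Eqs. (13)-(14), using a row-wise softmax as the activation φ over each neighborhood; the function and variable names are ours:

```python
import numpy as np

def agnn_layer(H, A, beta):
    """AGNN propagation sketch: attention scores from scaled cosine similarity
    between node states, softmax-normalized over each neighborhood (with
    self-loops), followed by a weighted aggregation of neighbor states."""
    n = H.shape[0]
    mask = (A + np.eye(n)) > 0                      # N(v_i) plus the node itself
    norms = np.linalg.norm(H, axis=1, keepdims=True) + 1e-12
    cos = (H / norms) @ (H / norms).T               # pairwise cosine similarities
    scores = np.where(mask, beta * cos, -np.inf)    # mask out non-neighbors
    P = np.exp(scores - scores.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)               # row-wise softmax
    return P @ H                                    # weighted aggregation

rng = np.random.default_rng(5)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
H = rng.standard_normal((3, 4))
H_next = agnn_layer(H, A, beta=1.0)
```

Masking non-neighbors to negative infinity before the softmax ensures each node attends only to its own neighborhood, which is how the mechanism preserves the graph structure.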
3. Survey methodology
In this study, we conducted a systematic review of articles on fake news detection using GNN
methods, following three primary steps: "literature search," "selection of eligible papers," and
"analysis and discussion" [181]. The research methodology is illustrated in Fig. 4:
Fig. 4
Flow diagram of research methodology.
The literature search is used to select peer-reviewed and English-language scientific papers
containing the following keywords: "GNN" OR "graph neural network" OR "GCN" OR "graph
convolutional network" OR "GAE" OR "graph autoencoder" OR "AGNN" OR "attention-based
graph neural network" combined with "fake news" OR "false news" OR "rumour" OR "rumor"
OR "hoax" OR "clickbait" OR "satire" OR "misinformation" combined with "detection". These
keywords were searched in Google Scholar, Scopus, and DBLP from January 2019 to the end
of Q2 2021.
The selection of eligible papers excludes papers that do not explicitly address fake news
detection using GNNs. To select the explicit papers, we specified a set of inclusion/exclusion
criteria. The inclusion criteria were: written in English, published in 2019 or later, peer-
reviewed, and available in full text. The exclusion criteria were: reviews, surveys, and
comparisons, or papers that only present mathematical models.
The analysis and discussion step compares the surveyed literature and captures the main
challenges and interesting open issues, aiming to provide various unique future orientations
for fake news detection.
Using the above strategy, a final total of 27 papers (5 papers in 2019, 16 in 2020, and 6 in the
first 6 months of 2021) were selected for comprehensive comparison and analysis. These
selected papers are classified into four groups based on GNN taxonomies (see Section 2.3.2):
conventional GNN-based, GCN-based, AGNN-based, and GAE-based methods. The eligible
papers were then analyzed via the criteria of method name, critical idea, loss function,
advantages, and disadvantages.
Method's name | PY and TG | Dataset | Performance | Approach-based
6-Marion et al. [153] | 2020, GCN | FakeNewsNet | Acc: 73.3% | Propagation
22-*GraphSAGE [125] | 2021, GCN | Twitter [140]; PHEME | Acc: 69.0–77.0%; Acc: 82.6–84.2% | Propagation
23-Bert-GCN, Bert-VGCN [150] | 2021, GCN | Covid-19 and 5G tweets | MCC: 33.12–47.95%; MCC: 39.10–49.75% | Content
24-*Lotfi et al. [204] | 2021, GCN | PHEME | F1: 80% (rumor); F1: 79% (non-rumor) | Content, propagation
25-*SAGNN [151] | 2021, GCN | Twitter 15 [140]; Twitter 16 [140] | Acc: 79.2–85.7%; Acc: 72.6–86.9% | Content
26-AA-HGNN [17] | 2021, AGNN | Fact-checking; BuzzFeedNews | Acc: 61.55%; Acc: 73.51% | Content, context
27-*EGCN [154] | 2021, GCN | PHEME | Acc: 63.8–84.1% | Propagation
PY and TG: publication year and type of GNN. MCC: Matthews correlation coefficient. T-MCC:
MCC for the test dataset. M-FCN: MALSTM-FCN model.
Table 5
Code sources.
Ref – Code source
1 – [Link]
2 – [Link]
3 – [Link]
4 – [Link]
5 – [Link]
6 – [Link]
7 – [Link]
8 – [Link]
9 – [Link]
Using the relationships among the information in Table 4, we quantitatively compare the
surveyed methods in terms of four distribution criteria of GNN-based fake news detection
approaches, as shown in Fig. 5.
Fig. 5
A comparison of four distribution criteria of GNN-based fake news detection approaches.
The number of surveyed papers from 2019 to the end of Q2 2021 regarding fake news
detection using GNNs shows that this problem is attracting increasing attention from system
practitioners (an increase of 40.74% from 2019 to 2020). Although only 22.22% of the
surveyed articles were published in 2021, Q2 had not yet ended at the time of writing, and we
believe the last two quarters of the year will produce more articles in this field, considering the
outbreak of fake news related to COVID-19 and the challenges of this problem.
With regard to the types of news concepts employed (types of objectives), 14 of the 27
surveyed papers are related to fake news detection (51.85%), followed by rumor and spam
detection (29.63% and 7.41%, respectively), whereas the other detection types constitute only
3.7% each. A likely reason for these results is that the creation and spread of fake news
correspond to active economic and political interests; that is, if fake news is not detected and
prevented in a timely manner, people will suffer many deleterious effects. Additionally, as
analyzed above, an equally important reason is that the datasets used for fake news detection
are now richer and more fully labeled than other datasets (see Section 2.2.2).
With regard to GNN-based techniques, the authors predominantly (74.07%) used GCNs for fake
news detection models, followed by conventional GNN-based methods (14.81%) and GAE and
AGNN (3.7% each). This choice is attributable to the suitability of GCNs for graph
representation, in addition to their state-of-the-art performance in a wide range of tasks and
applications [13].
5. Literature survey
In this section, we survey papers using graph neural networks for fake news detection. Based
on GNN taxonomies (see Section 2.3.2), we categorized GNN-based fake news detection
methods into conventional GNN-based, GCN-based, AGNN-based, and GAE-based methods, as
shown in Table 6.
Table 6
Category | Publications
Conventional GNN | [4], [5], [15], [192], [193]
GCN | [3], [14], [16], [123], [125], [147], [149], [150], [151], [153], [154], [155], [156], [194], [201], [204]
AGNN | [17], [192]
GAE | [124]
Conventional GNN-based methods (GNN∗) are the pioneering GNN-based fake news detection
methods. These methods apply the same set of recurrent parameters to all nodes in a graph to
create higher-level node representations.
GCN-based methods (GCN) often use the convolution operation to create node
representations of a graph. Unlike conventional GNN-based approaches, GCN-based methods
allow stacking multiple convolutional layers to improve the quality of the node representations.
AGNN-based methods are constructed mainly by feeding an attention mechanism into graphs.
Thus, AGNNs effectively capture and aggregate significant neighbors to represent the nodes of
the graph.
GAE-based methods are unsupervised learning approaches that encode the nodes of a graph
into latent vectors and decode the encoded information to reconstruct the graph data, creating
node representations that integrate latent information.
Most approaches proposed in the surveyed papers for detecting false information solve a
classification task that associates labels such as rumor/nonrumor or true/false with a
particular piece of text. In using GNNs for fake news detection, researchers have mainly
employed conventional GNNs and GCNs to achieve state-of-the-art results. Some researchers
have also applied other approaches, such as GAEs and AGNNs, to predict the corresponding
labels.
Ke et al. [156] constructed a heterogeneous graph model, namely KZWANG, for rumor
detection by capturing the local and global relationships on Weibo between sources, reposts,
and users. The method comprises three main steps: (i) word embedding converts the text
content of news into vectors using a multi-head attention mechanism,
T = MultiHead(Q, K, V) = Concat(head_1, …, head_h) W^O
(15)
where head_i = attention(Q W_i^Q, K W_i^K, V W_i^V);
Q ∈ R^(n_q×d), K ∈ R^(n_k×d), V ∈ R^(n_v×d) are the query, key, and value sentence
matrices; n_q, n_k, n_v are the numbers of words in each sentence; and
attention(Q, K, V) = Softmax(Q K^⊤ / √d_k) V; (ii) propagation and interaction representations
are learned via GCNs; and (iii) graph construction builds a model of potential interactions
among users: P = H^(k) = σ(Â H^(k−1) W^(k−1)). The difference between this model and
conventional GCNs is that KZWANG combines the news text representation obtained with
multi-head attention and the propagation representation obtained with GCNs. Thus, the
outputs of the GCN layer and the multi-head attention layer are the inputs of the rumor
classifier: R = Softmax(TP + b), where T is the text representation matrix, P is the
propagation representation matrix, and R is the output of the whole model.
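The multi-head attention of Eq. (15) can be sketched as follows. This is a toy NumPy sketch with our own names; a real model learns the projection matrices rather than sampling them:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: Softmax(Q K^T / sqrt(d_k)) V."""
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

def multi_head(Q, K, V, proj, W_o):
    """MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O, where
    head_i = attention(Q W_i^Q, K W_i^K, V W_i^V)."""
    heads = [attention(Q @ Wq, K @ Wk, V @ Wv) for Wq, Wk, Wv in proj]
    return np.concatenate(heads, axis=-1) @ W_o

rng = np.random.default_rng(2)
n, d, h, d_head = 6, 8, 2, 4
Q = K = V = rng.standard_normal((n, d))   # self-attention over n words
proj = [tuple(rng.standard_normal((d, d_head)) for _ in range(3))
        for _ in range(h)]                # (W^Q, W^K, W^V) per head
W_o = rng.standard_normal((h * d_head, d))
T = multi_head(Q, K, V, proj, W_o)        # text representation matrix
```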
Lotfi et al. [204] introduced a model that includes two GCNs: (i) a GCN of tweets (source and
reply tweets), T = H^(k) = σ(Â_T H^(k−1) W^(k−1)); and (ii) a GCN of users (interactions
among users), Re = H^(k) = σ(Â_Re H^(k−1) W^(k−1)). Here, A_T is the adjacency matrix of
the tweet GCN, with A_T[i][j] = 1 if tweet i replies to tweet j or i = j, and 0 otherwise.
Meanwhile, A_Re is the adjacency matrix of the user GCN, with A_Re[i][j] = 1 if user i sent
tweets to user j in the conversation or i = j, and 0 otherwise. H^(0) = X is determined
as X[i][j] = 1 if high-frequency word j appears in tweet i or according to the propagation time
interval between reply tweet i and the source tweet, and 0 otherwise. Unlike other models, the
authors constructed two independent GCNs and then concatenated their outputs into one
fully connected layer for fake news detection: Softmax((T ⊕ Re) W + b), where ⊕ is the
concatenation function.
Vu et al. [125] presented a method called GraphSAGE for propagation-based rumor detection.
In contrast to other propagation-based approaches, this method proposes a graph propagation
embedding based on a GCN that converts the news propagation process and its features into a
vector space by aggregating each node's feature vector with the feature vectors of its local
neighbors into a combined vector. Thus, the difference between the GraphSAGE model and
traditional GCN models concerns the aggregator functions, which include the following: (i)
Convolutional aggregator:
h^k_(v_j) = σ(W^k · (h^(k−1)_(v_j) + Σ_(v_i ∈ N(v_j)) h^(k−1)_(v_i)) / (|N(v_j)| + 1))
(16)
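A sketch of this convolutional aggregator follows; the function name is ours, and W is a random stand-in for the learned weight matrix:

```python
import numpy as np

def sage_conv_aggregate(H, A, W):
    """GraphSAGE convolutional aggregator, as in Eq. (16): each node averages
    its own feature vector with those of its neighbors, then applies W and a
    ReLU as the activation sigma."""
    n = H.shape[0]
    agg = np.zeros_like(H)
    for j in range(n):
        neigh = np.where(A[j] > 0)[0]                          # N(v_j)
        agg[j] = (H[j] + H[neigh].sum(axis=0)) / (len(neigh) + 1)
    return np.maximum(agg @ W, 0.0)

rng = np.random.default_rng(4)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
H = rng.standard_normal((3, 4))   # node feature vectors h^(k-1)
W = rng.standard_normal((4, 2))
H_next = sage_conv_aggregate(H, A, W)
```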
A social spammer detection model [201] was built by combining GCN and Markov random
field (MRF) models. First, the authors used convolutions on directed graphs to explicitly
consider different kinds of neighbors. They then modeled three influences of neighbors on a
user's label (followee, follower, reciprocal) using a pairwise MRF. Significantly, the MRF is
formulated as an RNN for multistep inference. Finally, MRF layers were stacked on top of the
GCN layers, and the entire model was trained end to end. Unlike conventional GCNs, this model
uses an improved forward propagation rule:
Q = H^(l+1) = σ(D_i^(−1) A_i H^(l) W_i^(l) + D_o^(−1) A_o H^(l) W_o^(l) + D̂_b^(−1/2) Â_b D̂_b^(−1/2) H^(l) W_b^(l))
(20)
where A_i, A_o, A_b are the adjacency matrices of the three neighbor types; Â_b = A_b + I;
and D_i, D_o, D̂_b are the degree matrices of A_i, A_o, and Â_b, respectively. The node feature
matrix X = H^(0) is created from bag-of-words (BoW) features. The authors then initialized the
posterior probabilities of the MRF layer with the GCN output as:
R = Softmax(log H^(k) − A_i Q [−w −w; w′ −w] − A_o Q [−w w′; −w −w] − A_b Q [−w w′; w′ −w])
(21)
where w, w′ ≥ 0 are two learnable parameters measuring the homophily and heterophily
strength of the MRF model. This method demonstrated the superiority of combining GCN and
MRF layers, and a multistep MRF layer is essential for convergence. However, the node feature
matrix was created simply with the bag-of-words method; this limitation could be addressed
with state-of-the-art embedding models in the future.
A novel GCN framework, called FauxWard [149], was proposed for fauxtography detection by
exploiting news characteristics such as linguistic, semantic, and structural attributes. The
authors modeled fauxtography detection as a classification problem and used GCNs to solve it.
FauxWard is similar to traditional GCN models; unlike them, however, it adds a cluster-based
pooling layer between the graph convolutional layers to learn node representations more
efficiently. The cluster-based pooling layer first assigns neighboring nodes to clusters based on
the node vectors of the previous graph convolution layer and then learns a cluster
representation as the input of the next graph convolution layer. It coarsens the graph
by Ã^(k) = C^(k−1)⊤ Ã^(k−1) C^(k−1), where Ã^(k) is the updated adjacency matrix
and C^(k) is the clustering matrix obtained after the k-th graph convolution layer, such
that H^(k) = C^(k−1)⊤ σ(Ã^(k−1) H^(k−1) W^(k−1)), with H^(0) = X the node feature matrix.
Unlike conventional GCNs, this X is created by concatenating text content (e.g., linguistic,
sentiment, and endorsement features) and image content (e.g., metadata).
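The cluster-based pooling step can be illustrated with a hand-built assignment matrix; the names and the toy clustering below are ours:

```python
import numpy as np

def cluster_pool(A, H, C):
    """Cluster-based pooling sketch: an assignment matrix C (n x k) coarsens
    the graph, A' = C^T A C and H' = C^T H, so each of the k clusters becomes
    one node of the next convolution layer."""
    return C.T @ A @ C, C.T @ H

# 4-node path graph v0 - v1 - v2 - v3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.arange(8, dtype=float).reshape(4, 2)  # toy node features
C = np.array([[1, 0],   # nodes 0, 1 -> cluster 0
              [1, 0],
              [0, 1],   # nodes 2, 3 -> cluster 1
              [0, 1]], dtype=float)
A_pooled, H_pooled = cluster_pool(A, H, C)
```

Here the pooled adjacency counts intra- and inter-cluster edges, and the pooled features are the sums of the member nodes' features; a learned clustering matrix would replace the hand-built C.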
Malhotra et al. [147] introduced a rumor detection method combining RoBERTa and BiLSTM
models (TD = Bi(RoTa(tweet)), where RoTa denotes a RoBERTa model [207] and Bi denotes a
BiLSTM model) with a GCN (GD = H^(k) = σ(Â H^(k−1) W^(k)), where H^(0) = X is a node
feature matrix built by concatenating eleven features, such as friend count, follower count, and
followee count): ŷ = Softmax(concat(TD, GD)). This model is based on rumor characteristics
such as propagation and content, and it exploits features regarding the structure, linguistics,
and graph of tweets.
Vlad et al. [195] produced a multimodal multitask learning method based on two main
components: meme identification and hate speech detection. The first combines a GCN and an
Italian BERT for text representation, whereas the second is an image representation method
that varies among different image-based structures. The image component employs VGG-16
with five CNN stacks [208] to represent images; the text component uses two mechanisms to
represent text, namely Italian BERT attention and convolution. This model is multimodal
because it simultaneously considers features related to both text and image content.
Meanwhile, Monti et al. [3] introduced a geometric deep learning-based fake news detection
method that constructs heterogeneous graph data to integrate information related to the
news, such as user profiles and interactions, network structure, propagation patterns, and
content. Given a URL u and the set of tweets mentioning u, the authors constructed a
graph G_u = {V, E}, where V is the set of nodes corresponding to tweets and their posters,
and E is the set of edges expressing one of four relations between two nodes: follow, followed,
spreading, and spread. This graph has node feature matrix X and adjacency matrix A. X is
created by characterizing user features such as profile, network structure, and tweet content,
while A is defined by A_ij = 1 if node v_j spreads a tweet of node v_i, node v_i spreads a tweet
of node v_j, node v_i follows node v_j, or node v_j follows node v_i, and A_ij = 0 otherwise.
Given X and A, similar to traditional GCNs, the authors utilized a four-layer GCN: two
convolutional layers for node representation and two fully connected layers to predict whether
the news is fake or real. However, unlike some previous GCNs, this proposal applies an
attention mechanism in the filters [178] and mean pooling to decrease the dimension of the
feature vectors in each convolutional layer, and SELU [209] is employed as the nonlinear
activation function for the entire network.
Li et al. [123] presented a GCN-based antispam method for large-scale advertisements, named
GAS. Unlike previous GCNs, the GAS model constructs a combined graph integrating the nodes
and edges of a heterogeneous graph and a homogeneous graph to capture the local and global
comment contexts. GAS is defined by the following steps. (i) Graph construction: The authors
built two graphs, named the Xianyu graph and the comment graph. The first is denoted
G = {U, I, E}, where U and I are sets of nodes representing users and their items, respectively,
and E is the set of edges representing comments; its adjacency matrix is defined
by A^X_ij = 1 if user i makes a comment on item j, and 0 otherwise. The second graph is
constructed by connecting nodes that express comments with similar meanings, i.e.,
A^C_ij = 1 if comment i has a similar meaning to comment j, and 0 otherwise. (ii) GCN on the
Xianyu graph: Let h_e^(l), h_U(e)^(l), and h_I(e)^(l) be the l-th layer node embeddings of the
edge, user, and item, respectively, with
z_e = h_e^(l) = σ(W_E^(l) · concat(h_e^(l−1), h_U(e)^(l−1), h_I(e)^(l−1)))
where h_e^(0) = TN(w_0, w_1, …, w_n); U(e) and I(e) are the user node and item node of
edge e; TN denotes the TextCNN model [210]; and w_k is the word vector of word k in the
tweet. Let h_N(u)^(l) and h_N(i)^(l) be the neighbor embeddings of nodes u and i. Hence,
h_N(u)^(l) = σ(W_U^(l) · att(h_u^(l−1), concat(h_e^(l−1), h_i^(l−1))))
(22)
where ∀e = (u, i) ∈ E(u), and
h_N(i)^(l) = σ(W_I^(l) · att(h_i^(l−1), concat(h_i^(l−1), h_e^(l−1)))),
(23)
where ∀e = (u, i) ∈ E(i); E(u) is the set of edges connected to u, and att stands for an
attention mechanism. From this, we
have z_u = h_u^(l) = concat(W_U^(l) · h_u^(l), h_N(u)^(l)) and
z_i = h_i^(l) = concat(W_I^(l) · h_i^(l), h_N(i)^(l)). (iii) GCN on the comment graph: In this
step, the authors used the GCN model proposed in [211] to represent the nodes of the
comment graph as node embeddings, p_e = GCN(X^C, A^C), where X^C is the node feature
matrix. (iv) GAS classifier: The output of the GAS model is defined
as y = classifier(concat(z_i, z_u, z_e, p_e)).
Ren et al. [17] introduced a novel approach, called AA-HGNN, that models user and community
relations as a heterogeneous information network (HIN) for content- and context-based fake
news detection. The primary technique of AA-HGNN is to improve the node representation
process by learning the heterogeneous information network. The AGNN uses a two-level
attention mechanism: node-level attention learns the weights of same-type neighbors and
represents a node by aggregating the weighted neighbors of each type, and schema-level
attention learns the optimal weights of the type-specific neighbor representations. Assume
that we have a news HIN and a news HIN schema, denoted by G = {V, E} and S_G = {V_T, E_T}.
Let V = {C ∪ N ∪ S}, with C (creators), N (news), and S (subjects), and E = {E_(c,n) ∪ E_(n,s)}.
Let V_T = {θ_n, θ_c, θ_s} and E_T = {write, belongsto} denote the types of nodes and links.
Node-level attention is defined as h′_(n_i) = M_(θ_n) · h_(n_i), where n_i ∈ N, h_(n_i) is the
feature vector of node n_i, and M_(θ_n) is the transformation matrix for type θ_n.
Let T ∈ {C ∪ N ∪ S} and let t_j ∈ T belong to type-neighbor θ_t with t_j ∈ neighbor(n_i).
Let e^(θ_t)_ij = att(h′_(n_i), h′_(t_j); θ_t) be the importance degree of node t_j for n_i,
where att is a node-level attention mechanism, with attention weight
coefficient α^(θ_t)_ij = Softmax_j(e^(θ_t)_ij). The schema node is then calculated by
aggregating the neighbors' features as T_(n_i) = σ(Σ_(t_j ∈ neighbor(n_i)) α^(θ_t)_ij · h′_(t_j)).
Let ω^(θ_t)_i = schema(W T_(n_i), W N_(n_i)) be the importance degree of schema
node T_(n_i), where schema is a schema-level attention mechanism and N_(n_i) is the schema
node corresponding to the neighbors of node n_i; the final fusion coefficient
is β^(θ_t)_i = Softmax_t(ω^(θ_t)_i). From this, we obtain a node representation
r_(n_i) = Σ_(θ_t ∈ V_T) β^(θ_t)_i · T_(n_i). AA-HGNN can achieve excellent performance
without much labeled data because it benefits from adversarial active learning, and it can be
applied to other tasks involving heterogeneous graphs because of its high generalizability.
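The node-level attention step can be sketched as follows; the concatenation-based scoring and all names are our own simplifications of the mechanism described above, and a real model would also apply the type-specific transformation matrices M_θ first:

```python
import numpy as np

def node_level_attention(h_i, neighbors, att_vec):
    """Node-level attention sketch: score each neighbor against the target
    node, softmax the scores into weights alpha, then aggregate the weighted
    neighbor features. `att_vec` is a hypothetical learnable attention vector."""
    scores = np.array([att_vec @ np.concatenate([h_i, h_t]) for h_t in neighbors])
    e = np.exp(scores - scores.max())
    alpha = e / e.sum()                       # attention weight coefficients
    agg = sum(a * h_t for a, h_t in zip(alpha, neighbors))
    return np.tanh(agg), alpha                # aggregated representation, weights

rng = np.random.default_rng(6)
d = 4
h_i = rng.standard_normal(d)                       # target node feature vector
neighbors = [rng.standard_normal(d) for _ in range(3)]  # same-type neighbors
T_i, alpha = node_level_attention(h_i, neighbors, rng.standard_normal(2 * d))
```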
Benamira et al. [192] proposed content-based fake news detection methods for binary text
classification. The objective was a GNN-based semisupervised method to address the problem
of limited labeled data. The method comprises the following steps: news embedding; news
representation based on k-nearest-neighbor graph inference; and news classification based on
GNNs, such as AGNN [179] and GCN [164], used as conventional GNNs without improvements
or updates.
Method [Ref] | Critical idea | Loss function | Advantage | Disadvantage

Benamira et al. [192]
– Critical idea: focuses on analyzing the news content using semi-supervised learning; binary
classification model.
– Loss function: cross-entropy loss.
– Advantages: can obtain good efficacy with limited labeled data.
– Disadvantages: has not been evaluated with big data or multi-labeled data.

AA-HGNN [17]
– Critical idea: the first model using adversarial active learning for fake news detection;
improves conventional GCN models with a new hierarchical attention mechanism for node
representation; multi-class classification model.
– Loss function: cross-entropy loss.
– Advantages: supports the early detection stage; still obtains high performance with limited
training data; can extract text and structure information simultaneously.
– Disadvantages: does not compare its efficacy with context-based methods.

Lin et al. [124]
– Critical idea: focuses on integrating information related to text, propagation, and network
structure; includes three parts: encoder, decoder, and detector; multi-class classification model.
– Loss function: the sum of cross-entropy loss and KL divergence loss.
– Advantages: the first GAE-based rumor detection method; can create better, higher-level
node representations; obtains better efficacy than other recent methods.
– Disadvantages: low performance for the non-rumor class; not highly generalizable.
Lin et al. [124] proposed a model to capture textual, propagation, and structural information
from news for rumor detection. The model includes three parts: an encoder, a decoder, and a
detector. The encoder uses a GCN to represent news text to learn information, such as text
content and propagation. The decoder uses the representations of the encoder to learn the
overall news structure . The detector also uses the representations of the encoder to predict
whether events are rumors. The decoder and detector are simultaneously implemented. These
parts are generally defined as follows: (i) Encoder component: Two layers of the GCN are used
to enhance the learning ability:
H(1)=GCN(X,A)=Aˆσ(AˆXW(0)W(1))
(24)
and
H(2)=GCN(H(1),A)=Aˆσ(AˆH(1)W(1)W(2))
(25)
where σ is the ReLU function and X is the matrix of word vectors obtained from TF-IDF values. The adjacency matrix A is defined by A_ij = 1 if node v_i responds to node v_j, and A_ij = 0 otherwise. The GCN is then used to learn a Gaussian distribution for the variational GAE via z = μ + ε·σ, where μ = GCN(H^(1), A) and log σ = GCN(H^(1), A) (μ, σ, and ε are the mean, the standard deviation, and a sample from the standard Gaussian distribution, respectively). (ii) Decoder component: in this step, an inner product is used to reconstruct the adjacency matrix as Ã = ZZ^⊤, where Z is the matrix of latent variables z. (iii) Detector component: this step represents the latent information and classifies the news. It is defined as S = MP(Z), where MP denotes the mean-pooling operator. Finally, the output layer of the model is defined as ŷ = Softmax(SW + b), where W is the parameter matrix of the fully connected layer.
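The three components above can be sketched end to end in a few lines of numpy. This is a minimal illustration, not the implementation from [124]: the layer sizes, the symmetric normalization Â = D^(-1/2)(A + I)D^(-1/2), the weight scaling, and the sigmoid applied to the reconstructed adjacency are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adj(A):
    # Â = D^(-1/2) (A + I) D^(-1/2), the usual symmetric GCN normalization
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

relu = lambda x: np.maximum(x, 0.0)

# Toy response graph over 4 posts: A[i, j] = 1 if post v_i responds to v_j
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], float)
A_hat = normalize_adj(A + A.T)          # symmetrized before normalization

X = rng.normal(size=(4, 8))             # stand-in for TF-IDF word vectors
W0 = rng.normal(size=(8, 16)) * 0.1
W1 = rng.normal(size=(16, 16)) * 0.1
W_mu = rng.normal(size=(16, 4)) * 0.1
W_sig = rng.normal(size=(16, 4)) * 0.1

# (i) Encoder: H^(1) = Â σ(Â X W^(0)) W^(1), then Gaussian heads for the VGAE
H1 = A_hat @ relu(A_hat @ X @ W0) @ W1
mu = A_hat @ H1 @ W_mu                  # μ = GCN(H^(1), A)
log_sigma = A_hat @ H1 @ W_sig          # log σ = GCN(H^(1), A)
Z = mu + rng.normal(size=mu.shape) * np.exp(log_sigma)   # z = μ + ε·σ

# (ii) Decoder: inner product reconstructs the adjacency structure
A_rec = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))   # sigmoid(Z Zᵀ), assumed here

# (iii) Detector: mean-pooling over nodes, then a softmax output layer
S = Z.mean(axis=0)                      # S = MP(Z)
W_out = rng.normal(size=(4, 2)) * 0.1
logits = S @ W_out
e = np.exp(logits - logits.max())
y_hat = e / e.sum()                     # ŷ = Softmax(SW + b), with b = 0 here
```

Note that the decoder output Ã is symmetric by construction, which is why this component is a better fit for reconstructing undirected structure than the directed response relation itself.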
6. Discussion
6.1. Discussion on GNNs∗-based methods
Previous studies on GNNs∗-based fake news detection models are compared in Table 7.
Table 7

Monti et al. [3]
– Critical idea: analyzes the news content, users, social structure, and propagation using geometric deep learning; binary classification task.
– Loss function: hinge loss.
– Advantages: can integrate heterogeneous data; obtains very high performance on large real-world data.
– Disadvantages: implemented only for the binary classification task.

GAS [123]
– Critical idea: captures the global and local contexts of the news; integrates homogeneous and heterogeneous graphs; binary classification task.
– Loss function: regression loss.
– Advantages: can handle spam problems such as adversarial actions and scalability; obtains high performance on large-scale data; applicable to online news.
– Disadvantages: implemented only for the binary classification task.

Marion et al. [153]
– Critical idea: captures the propagation features of the news using geometric deep learning; binary classification task.
– Loss function: NA.
– Advantages: applicable to non-URL-limited news.
– Disadvantages: uses a very broad definition; applies to a single data source; limited generalizability.

GCAN [14]
– Critical idea: models the relation between the original tweet and its retweets and the co-influence of user interactions and the original tweet; uses a dual co-attention mechanism; binary classification task.
– Loss function: cross-entropy loss.
– Advantages: supports early detection; can detect a tweet story as fake using only the short-text tweet, without needing user comments or network structure; can explain the reasons a tweet is judged fake.
– Disadvantages: applies to a single data source; limited generalization.

Pehlivan et al. [194]
– Critical idea: uses the temporal features of the network structure without considering any textual features; binary classification task.
– Loss function: cross-entropy loss.
– Advantages: applicable to metadata.
– Disadvantages: performance is not promising; the data split into training, testing, and validation sets is not reasonable.

Bi-GCN [16]
– Critical idea: analyzes features related to the dispersion and propagation of the news; constructs a top-down graph to learn rumor spread and a bottom-up graph to capture rumor dispersion; multi-class classification task.
– Loss function: cross-entropy loss.
– Advantages: has an early detection mechanism; can detect rumors in real time; obtains much higher performance than state-of-the-art methods.
– Disadvantages: limited generalization.

GCNSI [196]
– Critical idea: identifies multiple rumor sources without any knowledge of news propagation; improves previous GCN models by modifying the enhanced node representations and the loss function; multi-class classification task.
– Loss function: sigmoid cross-entropy loss.
– Advantages: the first model based on multiple rumor sources; improves the performance of state-of-the-art methods by about 15%.
– Disadvantages: the model must be retrained whenever the graph structure changes; training and finding suitable parameters take considerable time.

GCN with MRF [201]
– Critical idea: the first semi-supervised model that continuously integrates feature-based and propagation-based methods; uses a deep learning model with a refined MRF layer on directed graphs to enable end-to-end training; multi-class classification task.
– Loss function: cross-entropy loss.
– Advantages: obtains superior effectiveness; can ensure convergence.
– Disadvantages: uses a simple BoW feature representation.
Table 9

Malhotra et al. [147]
– Critical idea: combines features related to text and users; uses geometric deep learning with RoBERTa-based embeddings; multi-class classification task.
– Loss function: cross-entropy loss.
– Advantages: enables more efficient feature extraction.
– Disadvantages: evaluated on a limited dataset; overfitting on the test set.

FauxWard [149]
– Critical idea: uses features related to the linguistics and semantics of user comments and the user network structure; applies geometric deep learning on a user comment network; binary classification model.
– Loss function: cross-entropy loss.
– Advantages: obtains significant performance within a short time window.
– Disadvantages: does not directly analyze the content of image-centric news.

KZWANG [156]
– Critical idea: deeply integrates contextual information and propagation structure; uses a multi-head attention mechanism to create contextual representations without extracting any features; multi-class classification model.
– Loss function: cross-entropy loss.
– Advantages: has an early detection mechanism; can create a better semantics-integrated representation; improves performance significantly.
– Disadvantages: random split for validation data and manual split for training and testing data.

GraphSAGE [125]
– Critical idea: determines propagation-based patterns and information related to content, social network structure, and delay time; uses a graph embedding technique to integrate graph structure and node features; multi-class classification model.
– Loss function: cross-entropy loss.
– Advantages: high generalization on unseen data; reduces the detection error of state-of-the-art methods by up to 10%; efficiently integrates features of the whole propagated post.
– Disadvantages: performance can drop if the full information of a post (original and responses) in the spread process is not used.

Bert-GCN, Bert-VGCN [150]
– Critical idea: uses features related to the content of the news text; improves other GCN-based models using BERT-based embeddings; multi-class classification model.
– Loss function: NA.
– Advantages: can create better word representations; can significantly improve the performance of the conventional GCN method.
– Disadvantages: limited generalization; no suitable augmentation data to improve feature extraction and avoid overfitting.

Lotfi [204]
– Critical idea: uses information on text content, spread time, and social network structure; constructs weighted graphs based on user interactions in conversations; binary classification model.
– Loss function: cross-entropy loss.
– Advantages: obtains high efficacy in early detection; can significantly improve the performance of state-of-the-art methods.
– Disadvantages: depends strongly on the full information of both the original tweet and the response tweets of conversations.

SAGNN [151]
– Critical idea: captures information on user interactions; improves conventional GCN models by adding one or more aggregation layers; multi-class classification model.
– Loss function: cross-entropy loss.
– Advantages: captures user interactions well; better captures the features that differ between rumors and non-rumors.
– Disadvantages: generalization is unclear because only one baseline method is used for comparison.

EGCN [154]
– Critical idea: fully extracts features related to text content and structure; constructs weighted graphs of the source-replies relation for conversations; binary classification model.
– Loss function: NA.
– Advantages: obtains performance comparable to or better than machine learning methods; can use global and local structural information simultaneously.
– Disadvantages: limited generalization.
We presented the main steps, advantages, and disadvantages of GCN-based methods for fake news detection. In our assessment, the methods in [3], [14], [16], [123], [191], and [196] show the best efficiency: two are used for fake news detection, two for rumor detection, and two for spam classification. Regarding the two papers in the first
category, [3] was the first to apply GCNs for fake news detection. This method focuses on
extracting user-based, network-based, and linguistic-based features to build propagation-based
heterogeneous GCNs. The authors determined that this proposal can obtain a more promising
result than content-based methods. Conversely, [14] is an enriched GCN with a dual coattention
mechanism. This method uses user-based and linguistic-based features to construct
homogeneous GCNs with a dual coattention mechanism. In our assessment, although [14] used
dual coattention mechanisms, the efficiency was still lower than that in [3]. Noticeably, this
result is attributable mainly to more features being extracted by [3] than by [14]. Additionally,
the graph structure used in [3] was evaluated as better than the structure used in [14]. Moving
forward, we hope to improve the performance of fake news detection methods by building
dual coattention heterogeneous GCNs using user-based, network-based, and linguistic-based
features simultaneously. For the two papers in the second category, both methods were built
to detect rumors by propagation-based GCNs. The difference is that [16] constructed
bidirectional GCNs to capture the rumor dispersion structure and rumor propagation patterns
simultaneously. Meanwhile, [196] created unidirectional GCNs based on the information of
multiorder neighbors to capture rumor sources. In our view, [16] can outperform [196] because
rumor detection, rumor propagation, and dispersion are more critical than rumor sources. For
the two papers in the last category, [123], [191] also proposed similar methods for spam
detection using social context-based GCNs. The different points are that [123] built a model
integrating heterogeneous and homogeneous graphs to capture both local and global news
contexts. In contrast, [191] constructed only one heterogeneous graph to capture the general
news context. In our opinion, the model presented in [123] is more comprehensible, can be
reimplemented, and yields slightly better results than the method in [191]. The reason for this
result is that building each type of graph is suitable for the capture and integration of each type
of context, which can capture the news context more comprehensively than constructing one
graph for all contexts. Thus, when building fake news detection models based on GNNs,
different graphs should be constructed to capture each specific type of information and then
perform the fusion step. This approach promises to provide better performance than building
one type of graph to capture all types of information. Conversely, we avoid constructing one general graph and then dividing it into specific types, because breaking down the graph can easily lose information about the relationships among edges.
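As an illustration of the bidirectional construction in [16] discussed above, the sketch below runs two single-layer GCN branches over the same toy propagation tree with opposite edge directions and concatenates the pooled branch outputs. The layer sizes, mean-pooling, and concatenation fusion are our simplifications, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def normalize_adj(A):
    # Symmetric normalization with self-loops, as in a standard GCN layer
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def gcn(A_hat, X, W):
    return np.maximum(A_hat @ X @ W, 0.0)   # one ReLU GCN layer

# Directed propagation tree over 4 posts: edge i -> j means j responded to i
A_td = np.array([[0, 1, 1, 0],
                 [0, 0, 0, 1],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]], float)
A_bu = A_td.T                           # reversed edges give the dispersion view

X = rng.normal(size=(4, 8))             # toy node features
W_td = rng.normal(size=(8, 16)) * 0.1   # separate weights per branch
W_bu = rng.normal(size=(8, 16)) * 0.1

h_td = gcn(normalize_adj(A_td), X, W_td).mean(axis=0)   # propagation branch
h_bu = gcn(normalize_adj(A_bu), X, W_bu).mean(axis=0)   # dispersion branch
event_repr = np.concatenate([h_td, h_bu])               # fused event vector
```

Because each branch has its own weights, the model can learn distinct patterns for how a rumor spreads outward and how responses disperse back toward the source.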
We presented the main steps, advantages, and disadvantages of the two methods in the AGNN and AGE categories for fake news detection. Evidently, [17] presented a more detailed fake news detection method than [192]. Additionally, the method in [17] was proposed after that in [192] and refines it. For example, [192] constructed a homogeneous graph, whereas [17] created a heterogeneous graph. The heterogeneous graph was evaluated as superior to the homogeneous graph because it can capture more meaningful information; therefore, [17] obtains better results than [192]. Meanwhile, the Lin et al. [124] method uses a conventional GCN variant to encode the latent representation of graphs. This method can efficiently capture the entire structural information, and it enriches traditional GCNs by adding two components, namely, the decoder and the detector. However, this study focused only on user-based and linguistic-based features, ignoring network-based features; therefore, it does not achieve the desired effect.
7. Challenges
7.1. Fake news detection challenges
Based on recent publications in the field of fake news detection, we summarized and classified
challenges into five categories, where each category of challenge corresponds to one category
of fake news detection. The details of each type of challenge are shown in Fig. 6. The following
presents significant challenges that can become future directions in fake news detection.
Fig. 6
List of challenges of fake news detection.
Deepfake [214] refers to hyperrealistic, digitally manipulated videos that show people saying or doing things that never truly happened, or to composite documents generated using artificial intelligence techniques. Given the sophistication of these counterfeiting techniques, determining the veracity of public appearances or influencer claims is challenging owing to fabricated descriptions. Therefore, Deepfake currently poses a significant challenge to fake news detection.
The hacking of influencers’ accounts to spread fake news, or disinformation about a speech attributed to the celebrities themselves, is also a unique phenomenon in fake news detection. Such information is usually removed quickly once the actual owners of these accounts discover and correct it; however, while it spreads, it causes extremely harmful effects. Instantly detecting whether the posts of influencers are fake has thus become an important challenge.
News may be fake at one point in time and real at another. That is, the news is real or fake,
depending on the time it is said and spread. Therefore, real-time fake news detection has not
yet been thoroughly addressed.
Constructing benchmark datasets and determining the standard feature sets corresponding to
each approach for fake news detection remain challenges.
Shu et al. [215] constructed the first fake news detection method that effectively extracts content, context, and propagation features simultaneously through four embedding components: news content, news users, user-news interactions, and publisher-news relations.
Then, these four embeddings were fed into a semisupervised classification method to learn a
classification function for unlabeled news. In addition, this method can be used for fake news
early detection. Ruchansky et al. [28] constructed a more accurate fake news prediction model
by extracting the behavior of users, news, and the group behavior of fake news propagators.
Then, three features were fed into the architecture, including three modules as follows: (i) use
a recurrent neural network to capture the temporal activity of a user on given news via news
and propagator behaviors; (ii) learn the news source via user behavior; and (iii) integrate the
previous two modules for fake news detection with high accuracy. From this survey of
literature, we see that the most effective approaches combine features regarding content,
context, and propagation. Although these combination methods may have high complexity
regarding the algorithms used, the many extracted features, and high feature dimensions, they
can simultaneously capture various aspects of fake news. Therefore, the most efficacious and
least costly extraction of content, propagation patterns, and users’ stance simultaneously is not
only a promising solution but also a significant challenge for fake news detection.
7.2. Challenges related to graph neural networks
Based on studying the related literature, this section summarizes some challenges of GNN-
based methods and then identifies possible future directions.
Most conventional GNNs utilize undirected graphs with binary edge weights (0 and 1) [216], which is unsuitable for many real tasks. For example, in graph clustering, a graph partition is sought that satisfies two conditions: (i) the weights of edges between different groups are as low as possible; and (ii) the weights of edges within the same group are as high as possible. If the edge weights are binary values, this problem cannot be solved on such a graph. Therefore, future studies should construct graphs whose edge weights are real values that represent the relationships among the nodes as faithfully as possible.
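A small numpy example of the difference; the interaction counts and the row normalization are illustrative assumptions:

```python
import numpy as np

# Binary adjacency only records that two users interacted at all
A_binary = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]], float)

# Real-valued weights (hypothetical interaction counts, row-normalized) keep
# the strength of each relationship, which binary edges throw away
counts = np.array([[0, 9, 1],
                   [9, 0, 2],
                   [1, 2, 0]], float)
A_weighted = counts / counts.sum(axis=1, keepdims=True)

# A clustering objective can now prefer cutting the weak (0, 2) edge over the
# strong (0, 1) edge; on A_binary both cuts look identical
strong_tie = A_weighted[0, 1] > A_weighted[0, 2]
```

On the binary graph, the two edges from node 0 are indistinguishable, which is exactly why the partition conditions above cannot be expressed.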
For NLP tasks, GNNs have not represented node features by capturing the context of a paragraph or an entire sentence. Moreover, these methods have overlooked the semantic relationships among phrases within sentences. For example, for sentiment
classification tasks, we have the sentence “The smell of this milk tea is not very fragrant.” This
sentence includes a fuzzy sentiment phrase, namely “not very fragrant”. Some approaches
classify this sentence as expressing a positive sentiment because they only focus on “fragrant”,
ignoring the role of both “not” and “very”, whereas other models determine the expression as a
negative sentiment because they ignore the impact of “very”. Therefore, future directions for
improving GNN-based models should focus on determining node features based on sentence
embeddings or significant phrase embeddings.
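The milk-tea example can be made concrete with a toy lexicon; the polarity scores and combination rules below are illustrative assumptions, not a real sentiment model:

```python
# Toy polarity lexicon (hypothetical scores); only "fragrant" carries polarity
polarity = {"smell": 0.0, "milk": 0.0, "tea": 0.0,
            "not": 0.0, "very": 0.0, "fragrant": 0.8}
tokens = ["smell", "milk", "tea", "not", "very", "fragrant"]

# Word-level node feature: pooling isolated word scores keeps "fragrant"
# positive, so the sentence is misread as expressing positive sentiment
word_level = max(polarity[t] for t in tokens)

# Phrase-level node feature: treating "not very fragrant" as one unit lets a
# toy intensifier scale and a toy negator flip the head word's polarity
def phrase_score(phrase):
    score = polarity[phrase[-1]]        # polarity of the head word
    if "very" in phrase[:-1]:
        score *= 1.2                    # intensifier scales magnitude (toy rule)
    if "not" in phrase[:-1]:
        score *= -1.0                   # negation flips the sign (toy rule)
    return score

phrase_level = phrase_score(["not", "very", "fragrant"])
```

The word-level feature is positive while the phrase-level feature is negative, which mirrors the misclassification described above.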
GNN-based fake news detection is relatively new; thus, the number of published studies is limited. Although we did not implement the methods presented in the 27 surveyed studies on the same datasets or evaluate their efficiency under the same comparison criteria, these papers show that such methods have obtained excellent initial results. Additionally, many
challenges need to be addressed to achieve more comprehensive results, which we discussed
at the end of the corresponding sections. Nonetheless, given the 27 surveyed papers, promising
results are expected in the future. By addressing these challenges, we hope to improve the
effectiveness of GNN-based fake news detection. The following paragraphs analyze some
challenges for GNN-based fake news detection and discuss future directions.
Benchmark data: Recently, some researchers have argued that when training a system, data
affect system performance more than algorithms do [221]. However, in our assessment, the graph learning community still lacks benchmark graph data for fake news detection. Graph-
based fake news detection benchmarks may present an opportunity and direction for future
research.
Compatible hardware: With the rapid growth of Deepfake, the graphs representing these data will become more complex. However, the more scalable GNNs become, the higher the price and complexity of the algorithms are. Scientists often use graph clustering or graph sampling to address this problem, at the cost of the information these techniques discard from the graph. Therefore, in the future, graph scalability may be addressed by developing dedicated hardware that fits the graph structure; GPUs, for example, were a considerable leap forward in lowering the price and increasing the speed of deep learning algorithms.
Fake news early detection: Early detection of fake news involves identifying it at an early stage, before it is widely disseminated, so that people can intervene and prevent it early and limit its harm. It must be accomplished as soon as possible because the more widespread fake news becomes, the more likely the authentication effect takes hold, meaning that people become more likely to believe the information. Currently, fake news early detection focuses on analyzing the news content and the news context, which leads to three challenges. First, new news often carries new knowledge that has not yet been stored in the existing trusted knowledge graph and cannot be updated immediately when the news appears. Second, fake news tends to be written with the same content but in different deceptive writing styles, and it appears simultaneously in many different fields. Finally, the limited information related to news content, news context, news propagation, and latent information can adversely affect the performance of GNN-based detection methods.
Dynamic GNNs: Most graphs used in current GNN-based fake news detection methods have a static structure that is difficult to update in real time. In contrast, news authenticity can change continuously over time. Therefore, it is necessary to construct dynamic graphs that can change spatially and temporally with real-time information.
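One common way to realize such a dynamic graph is as a sequence of timestamped snapshots. The sketch below is a minimal illustration with a hypothetical edge stream; a full temporal GNN would additionally carry node states across snapshots, e.g. with a recurrent unit.

```python
import numpy as np

# A dynamic graph as a sequence of timestamped snapshots: each time step adds
# the edges observed in that window (toy edge stream over 4 nodes)
edge_stream = {0: [(0, 1)], 1: [(1, 2)], 2: [(1, 3), (3, 0)]}
n = 4

snapshots = []
A = np.zeros((n, n))
for t in sorted(edge_stream):
    for i, j in edge_stream[t]:
        A[i, j] = A[j, i] = 1.0
    snapshots.append(A.copy())          # graph state at time t

# A temporal GNN would run message passing on each snapshot and propagate node
# states forward in time; here we only track how connectivity grows
edges_per_step = [int(S.sum() // 2) for S in snapshots]
```

Because each snapshot is cumulative, a detector can be re-run at every time step, which is also what real-time verdict updates would require.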
Heterogeneous GNNs: The majority of current GNN-based fake news detection models construct homogeneous graphs. However, it is difficult to represent news texts, images, and videos simultaneously on such graphs. Using heterogeneous graphs that contain different types of edges and nodes is thus a future research direction, and new GNNs suitable for heterogeneous graphs are required in the fake news detection field.
Multiplex GNNs: As analyzed in Section 7.2, most GNN-based fake news detection approaches
have focused on independently using propagation, content, or context features for
classification. Very few methods have used a combination of two of the three features. No
approach uses a hybrid of propagation, content, and context simultaneously in one model.
Therefore, this issue is also a current challenge in fake news detection. In the future, research
should build GNN models by constructing multiplex graphs to represent news propagation,
content, and context in the same structure.
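A multiplex graph can be represented as a stack of adjacency layers, one per relation type. The sketch below is a toy construction; the adjacency values and fixed mixing weights are assumptions, and in a trained model the weights would be learned.

```python
import numpy as np

n = 4  # posts about one news event

# One adjacency layer per relation type (all toy values)
A_prop = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], float)   # propagation: who spread to whom
A_cont = np.eye(n)[::-1].copy()            # content-similarity edges (toy)
A_ctx = np.ones((n, n)) - np.eye(n)        # shared-context edges (toy)

A_multi = np.stack([A_prop, A_cont, A_ctx])   # multiplex tensor, shape (3, n, n)

# Per-layer mixing weights fuse the three relations into one message-passing
# operator; they are fixed here for illustration only
alpha = np.array([0.5, 0.3, 0.2])
A_fused = np.tensordot(alpha, A_multi, axes=1)   # shape (n, n)
```

Keeping the layers separate in A_multi (rather than collapsing to A_fused up front) is what lets a model weight propagation, content, and context differently per task.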
Footnotes
1. [Link]
2. [Link]
3. [Link]
4. [Link]
5. [Link]
6. [Link]
7. [Link] research/lt/resources/satire/.
8. [Link] sfu-discourse-lab/MisInfoText.
9. [Link]
10. [Link]
11. [Link]
Fact-checking: The English dataset with 221 statements regarding society and politics was
collected from online streaming.
EMERGENT: The English dataset with 300 claims and 2595 associated article headlines regarding society and technology was collected from online streaming and Twitter.
Benjamin Political News: The English dataset with 225 stories regarding politics was collected
from online streaming from 2014 to 2015.
Burfoot Satire News7: The English dataset with 4233 news articles regarding economy, politics,
society, and technology was collected from online streaming.
MisInfoText8: The English dataset with 1692 news articles regarding society was collected from
online streaming.
Ott et al.’s dataset: The English dataset with 800 reviews regarding tourism was collected from
TripAdvisor social media.
FNC-1: The English dataset with 49,972 articles regarding politics and society was collected from online streaming.
Fake_or_real_news: The English dataset with 6337 articles regarding politics and society was
collected from online streaming.
TSHP-17: The English dataset with 33,063 articles regarding politics was collected from online
streaming.
QProp9: The English dataset with 51,294 articles regarding politics was collected from online
streaming.
NELA-GT-201810: The English dataset with 713,000 articles regarding politics was collected from
online streaming from February 2018 to November 2018.
TW_info: The English dataset with 3472 articles regarding politics was collected from Twitter
from January 2015 to April 2019.
FCV-2018: The dataset, including 8 languages with 380 videos and 77,258 tweets regarding
society, was collected from three social networks, namely YouTube, Facebook, and Twitter from
April 2017 to July 2017.
Verification Corpus: The dataset including 4 languages with 15,629 posts regarding 17 society
events (hoaxes) was collected from Twitter from 2012 to 2015.
CNN/Daily Mail: The English dataset with 287,000 articles regarding politics, society, crime,
sport, business, technology, and health was collected from Twitter from April 2007 to April
2015.
Tam et al.’s dataset: The English dataset with 1022 rumors and 4 million tweets regarding
politics, science, technology, crime, fauxtography, and fraud/scam was collected from Twitter
from May 2017 to November 2017.
FakeHealth11: The English dataset with 500,000 tweets, 29,000 replies, 14,000 retweets, and 27,000 user profiles with timelines and friend lists regarding health was collected from Twitter.
Data availability