0% found this document useful (0 votes)

20 views11 pages

5 paper

This document reviews methodologies for detecting multimodal fake news using deep learning and fusion mechanisms. It highlights the challenges of current detection methods that often overlook the interaction between different modalities, leading to lower accuracy. The study discusses various factors influencing detection, including deep learning algorithms, data analysis methods, and fusion techniques, while also addressing the limitations of existing research and suggesting future directions.

Uploaded by

squad4005

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

20 views11 pages

5 paper

Uploaded by

squad4005

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Engineering, Technology & Applied Science Research Vol. 14, No.

4, 2024, 15665-15675 15665

Deep Learning and Fusion Mechanism-based

Multimodal Fake News Detection
Methodologies: A Review
Iman Qays Abduljaleel
Software Department, College of Information Technology, University of Babylon, Ιraq | Department of
Computer Science, College of Education for Pure Sciences, University of Basrah, Iraq
[Link]@[Link] (corresponding author)

Israa H. Ali
Software Department, College of Information Technology, University of Babylon, Iraq
israa_hadi@[Link]
Received: 21 May 2024 | Revised: 1 June 2024 | Accepted: 3 June 2024
Licensed under a CC-BY 4.0 license | Copyright (c) by the authors | DOI: [Link]

ABSTRACT
Today, detecting fake news has become challenging as anyone can interact by freely sending or receiving
electronic information. Deep learning processes to detect multimodal fake news have achieved great
success. However, these methods easily fuse information from different modality sources, such as
concatenation and element-wise product, without considering how each modality affects the other,
resulting in low accuracy. This study presents a focused survey on the use of deep learning approaches to
detect multimodal visual and textual fake news on various social networks from 2019 to 2024. Several
relevant factors are discussed, including a) the detection stage, which involves deep learning algorithms,
b) methods for analyzing various data types, and c) choosing the best fusion mechanism to combine
multiple data sources. This study delves into the existing constraints of previous studies to provide future
tips for addressing open challenges and problems.

Keywords-misinformation; attention mechanism; fusion methods; social media; vision transformer

I. INTRODUCTION
Over the past few decades, fake news has become
ubiquitous to the point of deceiving the public. When this kind
of information becomes available, it causes social divisions and
suspicions in the ruling environment and among individuals [1-
3]. When data about a specific event (correct or incorrect) are
Fig. 1. Google fake news trends [10].
disseminated, they changes people's beliefs, typically
emphasizing certain prejudices. Furthermore, deceptive or Social networks have become an ideal setting for the spread
manipulative news seeks to feed widespread ignorance and of rumors, threatening network order, people's health, and
greed to benefit individuals or groups at the expense of society social stability [12-13]. Social networks and live streaming
[4]. Recently, many social networks have become the first platforms have become an essential part of daily life. Several
choice for transmitting knowledge and exchanging information dictionaries have defined the term fake news [14], which can
and events, providing platforms for sharing opinions and be defined more broadly based on its authenticity or intent [15].
beliefs with others around the world [5-6]. Several studies have One possible explanation for the widespread transmission of
focused on fake news detection. As a result, specific fake news is a lack of basic knowledge and skills within the
components have been developed, using some classic datasets, population. The public is not informed of the legitimacy of the
to provide insight into their issue of interest [7]. Some information sources and the veracity of the news it reads.
distinctive examples of fake news are the "Zinoviev Letter" [8], Another factor is that there is a lack of automatic fact-checking
the fake news on the 2016 elections in the United States [9- procedures. Although few websites have made significant
10], and the untrue environmental report on the spread of fires efforts to detect fake news, most of them rely on time-
in the Amazon rainforest in 2018 [11]. consuming manual methods. It is too difficult to prevent fake
news since the extensive use of social networks allows the fast
propagation of disinformation [7, 16].

[Link] Abduljaleel & Ali: Deep Learning and Fusion Mechanism-based Multimodal Fake News Detection …
Engineering, Technology & Applied Science Research Vol. 14, No. 4, 2024, 15665-15675 15666

Fake news detection is an ongoing study subject that can be statistical NLP models can be implemented to determine the
interpreted from several angles. It aims to mitigate the negative relevance of these aspects, and so gain a greater understanding
effects of such news by creating a system that recognizes it of the model. In contrast, it is more difficult to explain what
using techniques, such as Machine Learning (ML), language occurs in a neural network model. Much of the analytical work
proficiency, optimization algorithms, Deep Learning (DL), and therefore seeks to understand how language ideas, often used as
others [5, 7]. However, since ML-based systems have several features in NLP systems, are captured in neural networks. NLP
constraints, involvig generating a large training dataset and techniques employ attention mechanisms to increase text
selecting appropriate features to best capture the deception, DL classification accuracy. The attention model aims to improve
algorithms have been applied to detect fake news. In particular, efficiency by predicting the result based on only a few words of
attention mechanisms have emerged as one of the most potent the input series rather than the complete phrase [19].
strategies in Natural Language Processing (NLP). They are Furthermore, the development of pre-trained language models
primarily used alongside Recurrent Neural Networks (RNNs) (e.g., BERT, RoBERTa, and GPT) and their utilization in NLP
to anticipate the most significant information in an input has opened up new ways to categorize fake news [18].
sequence, either textual or visual [17]. Fake news providers
frequently employ written content and visuals or distort facts to TABLE I. ABBREVIATIONS USED IN THIS PAPER
appeal to readers' psychology and entice and mislead them, Abbreviation Description
allowing for quick diffusion. In general, themes on social CNN Convolutional Neural Network
hotspots or disputes include detailed textual descriptions of ResNet Residual Neural Network
their emotional expression and visual influence on pictures [9]. RNN Recurrent Neural Network
Multimodal knowledge is more difficult to handle than single- ViT Vision Transformers
modal knowledge since it requires information fusion BERT Bidirectional Encoder Representation of Transformer
procedures. Data fusion, decision-making, features, and other LSTM Long Short-Term Memory
NLP Natural Language Processing
approaches are examples of information fusion. These GPT Generative Pre-trained Transformers
approaches contain two steps: combining data, information, POS Parts of Speech Tagging
and features from multiple data sources and then processing TF-IDF Term Frequency Inverse Document Frequency
them. As a result, they can provide an additional accurate and BoW Bag of Words
reliable data representation [18]. GRU Gated Recurrent Unit
A Lite Bidirectional Encoder Representation of
Table I portrays the most important abbreviations used in ALBERT
Transformer
this paper. This study explored recent suggestive literature on Decoding-enhanced Bidirectional Encoder
fake news detection. In particular, the former focused on DeBERTa Representation of Transformer with Disentangled
developing detection systems based on specific characteristics Attention
Robustly optimized Bidirectional Encoder
of multimodal fake news. The papers were obtained by RoBERTa
Representation of Transformer Pretraining approach
searching for the keywords "fake news" through the search VGG Visual Geometry Group
engines observed in Table II. Several review studies exist in MLP Multi-Layer Perceptron
this domain, as evidenced in Table III. The main contributions DenseNet Densely Connected Convolutional Networks
of this review study can be summarized as: Glove Global Vectors

 Provides knowledge about the specific fake news attributes TABLE II. SEARCH ENGINES
and their corresponding terms.
Number of Selected
Search engine Type
 Focuses on detecting multimodal fake news and explaining results references
these system's methods to compare them in all stages from [20, 21] Journals
ACM Digital Library 10,375
the perspective of description to detection. [22] Conference
Science Direct 1043 [23, 24] Journals
 Focuses briefly on the DL methods deployed in fake news Google Scholar 7854 [9, 25-26] Journals
detection models, such as attention mechanisms, CNN, ResearchGate 216 [16, 27] Journals
ResNet, etc. Scopus 361 [18, 28-30] Journals
[31-33] Conferences
IEEE 75
II. NATURAL LANGUAGE PROCESSING [34-36] Journals
MDPI 79 [11, 37] Journals
NLP systems include morphological traits, lexical classes, Springer 102 [38] Journal
syntactic categories, semantic connections, etc. In principle,
TABLE III. EXISTING REVIEWS ON DETECT MULTIMODAL FAKE NEWS
Word Fusion Deep Learning Techniques
Reference Datasets
Embedding Mechanism CNN RNN ViT Attention BERT LSTM
[1] √ √ √ √ √ × √ √ √
[2] √ √ × √ √ × √ √ √
[3] √ √ √ √ √ × √ √ √
[39] √ √ × √ √ × √ √ √
[14] √ √ × √ √ × √ √ √
This study √ √ √ √ √ √ √ √ √

[Link] Abduljaleel & Ali: Deep Learning and Fusion Mechanism-based Multimodal Fake News Detection …
Engineering, Technology & Applied Science Research Vol. 14, No. 4, 2024, 15665-15675 15667

Within NLP, the word and token features are similar. When not analyze the full image at once, instead it concentrates on
applying ML algorithms to extract text, it is critical to identify individual areas. This allows the concentrated areas of the
the best features. The goal of identifying these traits is to human visual space to be experienced in high resolution, while
develop effective indications that can be generalized for text the surroundings appear in low resolution. Instead of analyzing
classification. Some of them are mentioned below [40]. the entire vision space, the brain can examine and narrow down
the most important elements in a precise and efficient manner.
 n-grams is used to record the dependencies between all This aspect of human eyesight led researchers to design the
words that appear sequentially in a sentence structure. attention mechanism [42]. Attention mechanisms work by
However, n-grams does not maintain the syntactical or assigning varying weights to various types of information.
semantic relationships of the words. Thus, assigning more weight to important information draws
 Parts Of Speech (POS) Tagging: POS tagging distinguishes the focus of the DL model. Attention mechanism methods can
the grammatical meaning of words in a sentence putting be classified based on four criteria [16]:
into service particular tags, such as noun, pronoun, verb,  Softness of attention (deterministic attention): To generate
adjective, adverb, conjunction, etc. the final context vector, the network calculates the average
 TF-IDF: Its value increases linearly with the number of of each input weight item. The context vector is a high-
times a word appears in the document, but is offset by the dimensional vector that represents the components or
term's frequency in the body. Although this vectorization is sequence of the input factors, and the attention mechanism
effective, the semantic meaning of the words is lost in the generally seeks to add more contextual information to the
attempt to convert them to digits. final context vector. Hard attention (stochastic attention)
computes the final context vector by choosing pieces
 BoW: This approach treats a single news story as a arbitrarily from the sample set. This decreases the
document and calculates the frequency count of each word computation time. In addition, global and local attention is
to provide a numerical representation of the data. Along often deployed in computer vision tasks. Global attention is
with data loss, this strategy has other drawbacks. The like soft attention in that it evaluates all input items.
relative position of the words is ignored and contextual However, the former improves soft attention by using the
information is removed. This loss can sometimes be output of the current time step rather than the previous one,
significant when weighed against the benefits of processing while local attention combines soft and hard attention. This
a pleasant level of usage. technique evaluates a subset of input components at a time,
overcoming the drawback of hard attention (i.e., being non-
 Word2Vec provides a set of model designs and
differentiable) while remaining computationally efficient.
optimizations to extract word embeddings from large
datasets. Word embeddings learned through Word2Vec are  Attention mechanisms' ability can be classified according to
more effective in collecting word semantics and leveraging their input requirements: item-wise and location-wise. Item-
word relatedness. wise attention necessitates inputs that are directly known to
the model or generated through pre-processing. However,
III. DEEP LEARNING (DL) location-wise attention is not implied because the model
DL networks are given sensory information such as texts, must deal with difficult-to-distinguish input objects.
photographs, movies, or sounds to simulate the human learning
process. These networks outperform other cutting-edge  Attention models can work with single and multiple inputs.
approaches in several tasks, and as a result, the field has The overall processing strategy for the inputs varies
expanded enormously [41]. CNN, RNN, LSTM, and GRU are between the created models. Most contemporary attention
some of the conventional DL models used to identify fake networks utilize a single input and process it in two separate
news. CNN-based techniques can extract relevant information sequences (i.e., a distinctive model). Certain connections
from tiny areas but are incapable of dealing with larger exist within sources when recognizing multimodal systems
structural links. Time-series techniques examine the sequential (including images and text). Rather than simply splicing
spread of misinformation using temporal structural elements source features, the co-attention method is followed to
while ignoring the broader structural characteristics of fake simulate intense interactions between source features via
news. More importantly, these approaches cannot recognize sharing information and generates an attention-pooled
many modes concurrently. For example, existing designs limit feature for one modality (e.g., text) based on another one
the ability to expand the detection to other modalities. Current (e.g., image). The similarity of data pairs between sources
fusion algorithms are not particularly sophisticated and cannot is utilized to link them. A self-attention network computes
effectively integrate multi-modal advantages while avoiding attention solely based on model input, reducing the reliance
noises offered by other sources [34]. Transfer learning has on external data. This improves the model's performance in
proven to be indispensable in DL training, as it transfers images with complicated backgrounds by focusing more on
previously learned context knowledge to new designs that certain locations. The hierarchy attention mechanism
solve different issues [19]. computes weights based on the initial input and several of
its levels. This attention mechanism is often referred to as
A. Attention Mechanisms fine-grained attention in image classification.
Attention mechanisms try to deal with input in the same
way as the human brain/vision would. Human eyesight does

[Link] Abduljaleel & Ali: Deep Learning and Fusion Mechanism-based Multimodal Fake News Detection …
Engineering, Technology & Applied Science Research Vol. 14, No. 4, 2024, 15665-15675 15668

 Attention structures usually utilize a single output form. It embedding location. This increases learning effort in the
processes one characteristic at a time and calculates weight relevant step, but additional efforts could be almost completely
ratings. There are two more multidimensional and multi- avoided given the number of trainable parameters in the
head attention systems. Multi-head attention evaluates encoder [16]. In certain recent-related tasks, BERT-based
inputs linearly in several groups before combining them to models outperform RNN and CNN networks. The Swin
compute the final attention weights. This is especially transformer broadens the usefulness of the transformer,
advantageous when deploying the attention mechanism in transferring its outstanding performance to visual surroundings,
conjunction with CNN approaches. Multidimensional addresses the shortage of CNNs for global information feature
attention, which is mostly employed for NLP, calculates extraction, and, with its unique window mechanism,
weights utilizing a matrix representation of the substantially reduces the computational cost of self-attention
characteristics rather than vectors. and solves the challenge of secured token scale, which has
become the general core of computer vision research [37].
Different types of attention mechanisms for computer ALBERT is a more portable form of BERT to address the
vision can be classified into different categories [36]: drawbacks of the huge number of parameters and the lengthy
 Channel attention: This category assumes that in deep training time [44]. DeBERTa is an improved BERT with
CNNs, distinct channels in various feature maps frequently disentangled attention and has two new features. First, the
represent various objects. As a result, channel attention is model suggests a disentangled attention. In DeBERTa, each
responsible for automatically calibrating the weight of each token in the input is represented by two separate vectors that
channel. encode its word embedding and place. Attention weights
among words are acquired utilizing disentangled matrices in
 Spatial Attention: This category is similar to channel this paired form. Second, an Enhanced Mask Decoder (EMD)
attention. In this case, the attention mechanism is is employed to forecast the masked tokens during the pre-
responsible for flexibly calibrating the weight of each part training phase. Although BERT depends on relative places,
of the image. This system functions as an adaptive spatial EMD enables DeBERTa to make more accurate predictions
area selection process, selecting where to focus. since the syntactic functions of words are greatly influenced by
their current location within the sentence. In an equivalent
 Temporal attention: This category considers data to have a
time component. Thus, in computer vision tasks, this form spirit, the BERTweet approach shares a similar architecture to
BERT and was trained adopting the RoBERTa pre-training
of attention mechanism is commonly used for video
process [45]. Vision transformers break the image into 2D
analysis. This system operates as a dynamic temporal
patches and feed them into the framework. However, vision
selection process, selecting when to pay attention.
transformers face several hurdles, including computational
 Branch attention: This category covers multi-branched DL cost, dimensions, scalability to huge datasets, understanding,
architectures. Branch attention is to adapt to the weight of resilience to adversarial attacks, and generalization accuracy
each branch. This mechanism functions as a dynamic [46].
branch selection process, deciding which branches to pay
attention to. IV. FAKE NEWS DETECTION
Fake news detection models can be categorized according
 Channel and spatial attention: This approach functions as a to the following strategies: Strategies based on knowledge,
dynamic spatial area and object choice procedure, deciding
features, and modality [47]. From a knowledge viewpoint, an
what and where to focus attention.
impartial fact-checker reviews news stories and assigns an
 Spatial and temporal attention: This system functions as a actual value to statements. The three kinds of fact-checking are
dynamic geographic area and time-frame process to select expert-oriented, assessing the accuracy of information by
where and when to focus. relying on domain-matter experts who analyze data and
documents and draw conclusions, crowd-sourcing-oriented,
B. Transformers allowing users to discuss and comment on the accuracy of
Transformers primarily deploy the self-attention specific news resources, and computational-oriented, an
mechanism to extract fundamental characteristics and have intelligent system that classifies a news item as having true or
enormous promise for widespread use in AI [43]. false matter.
Transformers, compared to RNNs, can attend to full sequences,
AI-based algorithms to detect fake news rely on a variety of
and thus learn long-term connections. Transformers parse text
important criteria, including content-based, network-based, and
in parallel implementing a powerful attention mechanism,
user-based attributes. However, combining all these variables
producing complex and meaningful word descriptions. This
may not increase the classifier's performance. Many studies
approach looks at the relationships between textual phrases or
relied only on content features or content-based characteristics
entities. Many competing models of neural pattern transmission
(textual and visual) in conjunction with using additional
contain an encoder-decoder component. The encoder turns an
characteristics to detect fake news. Existing fake news
endless flow of symbols from the input to a continuous output.
identification research is divided into two groups: single-modal
The decoder then generates an output series involving one
and multimodal.
symbol at a time, using the encoder's continuous form [44].
BERT is an encoder layer with a transformer design. Instead of
a static periodic function in the transformer, BERT learns the

[Link] Abduljaleel & Ali: Deep Learning and Fusion Mechanism-based Multimodal Fake News Detection …
Engineering, Technology & Applied Science Research Vol. 14, No. 4, 2024, 15665-15675 15669

A. Single-Modal Fake News Approaches TABLE IV. IMPORTANT MULTIMODAL DATASETS

In general, text and image characteristics can be employed Dataset Description Study
to detect fake news alone, while other features are typically Contains 15000 items, including 176 images
deployed as supplementary to help identification. The single- MedialEval (2015) in 5,008 real news tweets and 185 misused [33]
images in 7032 fraudulent tweets.
model-based technique utilizes only one characteristic to It contains 7898 fake news, 6026 real news, [9, 11, 21,
detect. In [48], the relevance of an image component was used Twitter (2016)
and 514 images. 23, 30, 36]
for automatic false news detection on social media to address Contains 17,000 unique tweets on various
this issue. It has been established that authentic and false news MediaEval (2016) events. One-third are real and the remaining [32]
events have different image distribution patterns. In [49], a co- are fake news.
attention technique was followed to identify the top K most [9, 23, 27,
Weibo (2017) Consists of 4749 fake and 4779 real news.
33, 36, 37]
significant phrases in a news story and the top K most Multimodal standard dataset of 1,063,106
important user evaluations for the final classification. In [50], a Fakeddit (2019) [16, 38]
samples.
CNN-based capsule network model with pre-trained word News stories include text, news image link,
embeddings was implemented to classify false news in the Gossip (2020) publishing time, author name, and social [9, 21, 26]
ISOT and LIAR datasets. In [51], a generative model was media responses.
proposed to extract new patterns and aid in the identification of Contains text, news image location,
Politifact (2020) publishing time, and remarks made on [9, 18, 21]
fake news by examining previous relevant user reactions. In social networks.
[52], n-grams were applied with TF-IDF word embedding to All Data (2020) Contains 11,941 fake and 8,074 real news. [19, 34]
obtain content characteristics, and LSTM and BERT models Contains 2,029 news articles shared on
were trained to deal with contextual information. Then a ReCOVery (2020) social media, most of which (2,017) have [53]
feedforward neural network was utilized for classification. both textual and visual information.
However, this technique did not account for the complete use Contains a list of fake and accurate news
Twitter Indian Dataset
stories covered primarily from politics, [29]
of different textual characteristics. v3 (2021)
Bollywood, and religion.
B. Multi-Modal Fake News 20,000 articles from websites, including
Ti-CNN (2021) over 11,000 fake and more than 8,000 real [9, 21]
In general, social media postings featuring photos and news items.
graphics receive far more retweets and comments and spread Fake news sample by
45,569 news, 25,343 are real the remaining
much faster than those having only text. Images are spread Guilherme Pontes [20]
are fake news articles.
widely, captivating people's emotions, and expressing a sense (2021)
Twitter_database Includes 5 partitions to perform 5-fold
of reality. Images related to a post may have been edited or (2023) cross-validation.
[26]
simply taken out of context. It is not uncommon to distort
images for political or personal motives, as well as to use photo
editing software to change an image. As a result, when
analyzing both text and images, photo captions are critical to
identifying clickbait and false captions [29].
1) Datasets
Table IV lists multimodal datasets applied in various
studies, and Figure 3 describes the most popular dataset
dimensions.
2) Textual and Visual Preprocessing
Pre-processing in text starts with cleaning the input datasets
by extracting excess, extreme, and duplicative text parts. Word
embedding methods keep only meaningful tokens that are
transformed into vectors. The text is stemmed/lemmatized,
normalized, and tokenized. Stemming and lemmatization Fig. 2. Distribution of multimodal fake news datasets.
remove words and symbols without meaning. Normalization
transforms text into canonical form. Stemming cuts off the ends 3) Textual and Visual Feature Extraction and Selection
of input words to lower their inflection and convert them into
their core structures. In general, the canonical form of the Textual properties can be obtained at several levels in the
original input word is deployed. It is very important to hierarchy, such as word, sentence, and message. The most
normalize text in the web scope and social media data, as it basic lexical characteristics are the overall total of characters,
contains a lot of noise, such as abbreviations, misspellings, and the number of different words, the average length of words, and
words that are out of vocabulary. Image data are pre-processed so on. In the meantime, the semantics of linguistic
by reviewing that all URLs are correct. It is also important to characteristics, such as the proportion of first/third person
normalize the size of images and divide them for training and pronouns, the number of news detection by pooling and
testing. Textual and visual data are pre-processed individually attention blocks, and positive or negative emoji symbols, are all
and then merged to complete each instance in terms of its three accessible options. Unlike linguistic characteristics, syntactic
parameters: title, text, and vision [20]. features improve the aim of feature extraction to a significant

[Link] Abduljaleel & Ali: Deep Learning and Fusion Mechanism-based Multimodal Fake News Detection …
Engineering, Technology & Applied Science Research Vol. 14, No. 4, 2024, 15665-15675 15670

level: emotion score or part-of-speech labeling. Recently, vectors) to increase and multiply the exchanges between all
various complicated models, namely BoW, Word2Vec, and elements of both vectors. This process is more expressive.
other embedding techniques, have been used to recognize fake
news. Image extraction provides additional visual information.
Several studies employed the BERT pre-trained model to
extract text characteristics. However, the BERT model has
many parameters and a slower training speed. Furthermore,
visual and text characteristics are in separate semantic feature
spaces, resulting in heterogeneity [31].
4) Fusion Mechanism
The combination of textual content and images is one of the
widely utilized features for multimodal fake news detection.
The intuition behind this cue is that some fake news spreaders
deploy tempting images, e.g., exaggerated, dramatic, or
sarcastic graphics, that are far from the textual content to attract
users' attention. Information fusion techniques have an original
ability to manage input data with their multimodal nature.
Many experiments have proven the benefit of these techniques
and that their full exploitation leads to improved performance
[31, 38]. Several techniques combine textual and visual
information into a single representation, ignoring their
associations, which might lead to poor results. Fusion can be Fig. 3. Early and late fusion mechanism structures.
classified according to different times as follows [53]:
5) Model Evaluation Metrics
 Early fusion (feature fusion): Feature vectors from multiple
modalities are combined and fed into a model for A confusion matrix serves as the basis for evaluating a
prediction. Due to the fusion of pre-processed features from classification model. True Positives (TP) indicate news that
different modalities at the input layer, working with was projected to be true and was true, False positives (FP)
features with higher granularity becomes tedious (Figure 3). indicate news projected to be true but was fake, True Negatives
(TN) indicate news that was projected to be false and was
 Late fusion (decision-level fusion or kernel-level fusion) untrue, and False Negatives (FN) indicate news projected as
combines results from various modalities using summation, untrue but was accurate. [52]. The efficiency of a model is
maximization, averages, or weighted average methods. evaluated by [54-56]:
Most late fusion solutions employ handcrafted rules, prone
to human bias and far from real-world peculiarities (Figure (1)
3).
(2)
 Intermediate fusion (mid-fusion) involves combining units
from several modality-specific paths into a single shared (3)
layer. It is possible to create a representation layer either by
mapping multiple channels at the same time or by 1
!" # $%%
(4)
combining different modal sets at various levels. # $%% !"

Fusion mechanisms can be divided according to the C. Studies on Multimodal Fake News Detection
technology followed to merge textual and visual attributes in In [32], a scaled dot product attention mechanism was
[36]: implemented to capture the relationship between the text
features extracted by BERT and the image features extracted
 Simple operation-based: DL combines vectorized features by VGG-19. In [33], another model based also on BERT and
from several data sources using fundamental algorithms VGG19 was proposed, accepting both text and picture input.
such as concatenation or weighted addition. As models Subsequently, the pair of embedding was joined and subjected
based on DL techniques are trained concurrently, the to a multi-modal variation autoencoder to obtain the common
features of high-level standards could be extracted at a level latent representation. A multimodal cross-attention network
that accommodates both activities. Such processes often was designed to fuse the resulting features. In [23], four distinct
have minimal or no correlation factors. submodules made up a fake news system: feature fusion based
 Attention-based: Fusion often involves attention processing. on multi-modal factorized bilinear pooling, two attention
Different outputs are frequently used to provide different mechanisms, one for textual description combined with
sets of changing weights for summing, preserving more Stacked BiLSTM and the other for visual feature extraction,
information by merging the results from each peek. combined with multi-level CNN–RNN, and MLP for
classification. In [20], visual picture attributes were extracted
 Bilinear pooling-based: This is achieved by adding the using image captioning and forensic evaluation, and textual
external product of both vectors (text and image input hidden patterns were extracted employing a Hierarchical

[Link] Abduljaleel & Ali: Deep Learning and Fusion Mechanism-based Multimodal Fake News Detection …
Engineering, Technology & Applied Science Research Vol. 14, No. 4, 2024, 15665-15675 15671

Attention Network (HAN). In [21], a multi-modal coupled feature representation in news content. Then, the feature fusion
ConvNet architecture was presented, combining textual and layer combined these features with the help of the cross-modal
visual data modules from three datasets and utilizing a late attention module to promote various modal feature
fusion mechanism. In [58], a detection framework was representations to complement the information. In [29], a
proposed, deploying Word2Vec to fuse text input to the multi-modal DL technique was proposed to use and process
embedding layer and passing image input to a cross-modal visual and textual features, employing EfficientNet-B0 and a
attention residual and multi-channel CNN. The multi-channel sentence transformer. Feature embedding was performed on
CNN was implemented as a reducer to the amount of trash data individual channels, while fusion was performed on the last
produced by cross-modal fusion parts. In [9], an MCNN was classification layer. Late fusion was applied to mitigate the
proposed, considering the consistency of multi-modal data and noisy data generated by multi-modalities. In [11], TLFND was
capturing the overall characteristics of social media proposed, which was based on a three-phase feature-matching
information based on an early fusion mechanism. This model distance technique to detect fake news. An attention-guiding
used BERT in the text feature extraction module and the module was devised to assist in aggregating the cross-modality
attention mechanism with ResNet-50 in the visual semantic correlations and the aligned unimodal representations in an
feature extraction module. In [34], three modalities were effective and interpretable manner. In [30], a model based on
evaluated: text, image, and image attributes. Additionally, a transformers and multi-modal fusion was introduced. This
model based on dual attention fusion networks was applied to model extracts text and image features using different
combine features. Initially, the model extracts image (based on transformers, and fuse features implementing attention
ResNet-50 V2) and text modalities (based on BERT). In the mechanisms. In [18], a quantum-based standard was proposed
end, the features were combined to create a feature vector that for multimedia data fusion to identify fake news. This system
can be used for classification. In [28], news post images were extracted features in both textual and visual forms and sent
converted from their spatial dimension to their frequency them to the convolutional-quantum network to achieve
domain utilizing machine learning. Subsequently, a multi-layer classification.
CNN model was engaged to extract the characteristics of the
frequency picture, and MML was deployed to retrieve image- V. DISCUSSION
related web pages on Google. Simultaneously, MML uses the DL has begun to be strongly involved in multimodal fake
evidence veracity classification task to support the false news news detection systems at all stages, whether it is engaged in
detection task by selecting evidence. This part involved feeding extracting features of textual and visual inputs, in the
the evidence and the claim into a BERT-based encoder, mechanisms of fusing features extracted from multimodal data,
followed by learning evidence representations employing or in the classification of fake news. It is possible to detect fake
claim-evidence correlation representations. Ultimately, the co- news adopting these strategies but some restrictions limit their
attention process fuses the representations of the image with accuracy, involving the requirement for a huge dataset
relevant evidence. In [35], a model was presented based on two containing diverse data in all fields of life (political, economic,
principles, blocking and fusion. This model determined the technological, technical, and health, etc.), in addition to the
spatial and temporal location of the data in the fusion inability to fuse the extracted features efficiently, take
mechanism for the visual and textual attributes. In [37], text advantage of the most multimodal important features, and
features were extracted from bidirectional encoder measure the extent of interconnection between them. Some
representations of transformers, image features were extracted studies focused on a single social network, such as Twitter,
from Swin-transformers, and then deep autoencoding was used Weibo, or Facebook, but future fake news detection systems
as an early fusion technique by merging text and visual must be applicable on different websites and social networks to
attributes. In [38], the proposed framework was based on the acquire knowledge deeply and detect fake news quickly. Many
BERT and Xception models to learn visual and linguistic studies used BERT word embedding [9, 33-35, 37-38, 57] and
models. In [31], the ALBERT model was combined with a depreciated traditional techniques, such as GloVe [20-21] and
multi-modal circulant fusion technique to detect fake news. Word2Vec [24, 58] in their textual feature extraction model.
This system included a textual feature extractor (ALBERT), a BERT can discover the implicit associations within the
visual feature extractor (VGG-19), a feature fusion, a fake sentence words and texts in which the system is trained, but
news detector, and domain classification modules. In [26], that has not prevented a recent trend toward including derived
multimodal pre-processing of both words and images was models, like RoBERTa [11, 16], ALBERT [31], distilBERT
performed. Glove embedding and Word2vector approaches [29], and XLNet [18, 27]. Although all proposed multimodal
were deployed to extract the text characteristics and the fake news detection systems still use CNN [21, 25, 26], VGG
Adaptive Water Strider Algorithm (A-WSA) was applied to [31-33], and ResNet [34-35] neural networks in a visual feature
extract the best characteristics from both text and image data. extraction stage, there is a new strategy deploying ViT for
Feature fusion receives the optimized features, which are textual and visual feature extraction. Regarding fusion
obtained by the same A-WSA optimization process based on techniques, it is clear that in recent years there was no clear
the weight factor. Lastly, O-BiLSTM was utilized for fake interest in examining how to benefit from extracted features
news classification. In [27], a model based on BLIP (FNDB) and how to choose, as concatenation [32, 33, 38] of extracted
was proposed. XLNet and VGG-19-based feature extractors features is the common fusing operation in early or late fusion
were engaged to extract textual and visual feature mechanisms. However, there is interest in the technology of
representations, respectively, and the BLIP-based multimodal attention mechanisms [23, 32, 36] and their strong entry during
feature extractor was put into service to obtain multimodal the past two years to support the approved fusion mechanisms.

[Link] Abduljaleel & Ali: Deep Learning and Fusion Mechanism-based Multimodal Fake News Detection …
Engineering, Technology & Applied Science Research Vol. 14, No. 4, 2024, 15665-15675 15672

TABLE V. A COMPARISON OF SEVERAL MULTIMODAL FAKE NEWS DETECTION APPROACHES

Evaluation metrics Feature extraction Fusion
Ref. Datasets DL model Drawback Future scope
(%) Textual Visual mechanism
Acc=83, Pre=81, The only image system Apply probabilistic method and
Twitter
Rec=63, F1=71 scored lower than the only deep model to evaluate if the
[57] BERT VGG-19 Concatenation FCL
Acc=84.2, Pre=83, text one on the Twitter image pertains to the written
Weibo
Rec=87, F1=85 dataset. content.
Acc=96.3, Pre=97.2,
Ti-CNN
Rec=96.4, F1=96.8
Acc=94.7, Pre=95.2, Enhance the approach at the
Weibo ResNet-50
Rec=94.2, F1=94.6 feature fusion phase for a better
[9] BERT + Early fusion FCL High time cost.
Acc=78.4, Pre=85, fit of multimodal features in
Twitter Attention
Rec=81.4, F1=83.1 different locations.
Acc=88.4, Pre=97.3,
PolitiFact
Rec=86.7, F1=91.7
Acc=95.5, Pre=94.5,
All Data
Rec=94.4, F1=94.4 GloVe+
Max voting
FND by Acc=94.7, Pre=95.6, Hierarchical Error Level Add new features to improve
[20] Concatenation ensemble Few features
Jruvika Rec=93.1, F1=94.4 Attention Analysis efficacy.
technique
Fake News Acc=95.9, Pre=97.8, Network
Sample Rec=94.6, F1=96.25
Acc=96.2, Pre=95.7,
TI-CNN
Rec=96, F1=95.89 Effective classification
Acc=93.5, Pre=94.1, Glove+ Image- Cannot extract deep algorithm based on CNN with
[21] Emergent Late Fusion FCL
Rec=89.3, F1=93.12 Text-CNN CNN characteristics. fine-tuned hyperparameters to
MICC- Acc=95.1, Pre=95.1, improve fake news detection.
F220 Rec=78.2, F1=85.88
Acc=88.3, Pre=89, Can lose some critical
Twitter Attention + Multimodal Semantic connections of text
Rec=95, F1=92 Attention+ information depending on
[23] 2-Level factorized bi- MLP and images to improve fusion
Acc=83.2, Pre=82, Bi-LSTM the feature extraction
Weibo CNN–RNN linear pooling methods.
Rec=86,F1=84 approach.
Acc=87.2, Pre=83.7,
Pheme Hierarchical
Rec=78, F1=80.7 Word2vec+Bi
[24] VGG-19 attention FCL High time cost. Detect forged images.
Acc=83.4, Pre=86.3, LSTM
Weibo mechanism
Rec=78, F1=82.4
BERT+
VGG-19+
MediaEval Acc=81.2, Pre=81.3, CNN+ Use many images to identify
[32] Self- Concatenation FCL Small dataset.
(2016) Rec=87.4, F1=84.3 Attention fake news.
attention
mechanism
MediaEval Ensure data quality taken Context-dependent latent
F1=92.4
[33] (2016) BERT VGG-19 Concatenation FCL from an image or text representations such as image
Weibo F1=65.6 before fusion. captioning.
Pre=97.8, Rec=98.2, ResNet-50 Dual attention
[34] All Data BERT FCL Small dataset. Add other modalities.
F1=98.07 V2 mechanism
Acc=87.9, Pre=88.6, Easier way to leverage prior
[35] Weibo BERT ResNet-50 Cross-attention FCL Dataset constraints.
Rec=87.1, F1=87.9 experience in deep networks.
Acc=84.2, Pre=85.4,
Twitter
Rec=61.9, F1=71.8
Acc=85.3, Pre=89.1, Word2Vec+S VGG-19
Weibo A Investigate event-level
Rec=81.4, F1=85.1 elf-attention +Cross- Need testing on real-time
[58] Concatenation FCL multimodal fake news detection
Acc=86.9, Pre=93.5, +Cross modal modal fake information.
Weibo B using visual data.
Rec=79.6, F1=86 attention attention
Acc=92.2, Pre=89,
Weibo C
Rec=96.5, F1=92.6
Improved recognition of
changed material, improper Detect extra fake data, such as
Acc=88.1, Pre=87.1, DenseNet-
[16] Fakeddit RoBERTa Co-attention FCL connections, and reality imposter material, and
Rec=87.9, F1=87.51 161
compared to fraudulent satire/parody.
content.
Other DL approaches (GRU)
Pre=75, Rec=79, Early fusion+ Inaccuracy in extracting
[25] Fakeddit BERT CNN Linear layer and ways to merge visual and
F1=77 Concatenation the most relevant features
textual representations.
Acc=92.2, Pre=91.6, Using Google search takes Ideal extraction of images and
[28] CCMR BERT multiCNN Co-attention MLP
Rec=92.6, F1=92.15 more time. text dataset.
Acc=85, Pre=84.2,
Twitter
Rec=65.4, F1=73.6 Multi-modal Extracting attributes is not Stronger visual information
[31] ALBERT VGG-19 FCL
Acc=86.1, Pre=85.5, circulant fusion deep enough. technique.
Weibo
Rec=88.5, F1=87

[Link] Abduljaleel & Ali: Deep Learning and Fusion Mechanism-based Multimodal Fake News Detection …
Engineering, Technology & Applied Science Research Vol. 14, No. 4, 2024, 15665-15675 15673

Acc=65.4, Pre=66.4, VGG-19 Attention Failure of fine-tuned Other fusion methods based on
[36] Weibo BiLSTM FCL
Rec=66.8, F1=66.6 +pooling mechanism feature extraction. attention mechanisms.
Acc=75.6, Pre=72.8,
Twitter If a post has many images, Reduce the model's difficulty
Rec=97.7, F1=83.4 Swin- Trained deep
[37] BERT FCL only one may be used to to ensure its use on small
Acc=59.7, Pre=56.4, transformer autoencoder
Weibo identify it. devices.
Rec=99.4, F1=71.9
Use post content and
Acc=91.8, Pre=93.3,
[38] Fakeddit BERT Xception Concatenation FCL Small dataset. comments, together with user-
Rec=93.2, F1=93.2
related data.
Acc=94.4, Pre=97.4,
Politifact
Rec=96.6, F1=97 Adapting new areas and
A fusion approach that
Acc=90.9, Pre=93.2, VGG-19+ improving technology while
[11] Gossipcop RoBERTa Concatenation FCL focuses on the contents of
Rec=94.7, F1=93.9 BiLSTM testing the proposed model is
the extracted features.
Acc=83.1, Pre=85.2, underway.
Twitter
Rec=82.4, F1=83.7
Acc=86.8, Pre=83.1, ResNet-50 Multi-modal
Twitter BERT+ The use of multiple
Rec=75.4, F1=79.1 +two- bilinear pooling Combine textual information
[15] two-text- FCL techniques negatively
Acc=90.4, Pre=94.3, image- +Self-attention with several photos.
Weibo branch affects execution time
Rec=87.1, F1=90.5 branch mechanism
O-BiLSTM
Acc=96.5, Pre=88.8, Word2vec ResNet-50 Adaptive feature based on Not extracting textual Use audio signals and captions
[26] Twitter
Rec=96.2, F1=92.41 +Glove +VGG-16 fusion optimized features efficiently. to detect false news videos.
WSA
Acc=88.8, Pre=89.1,
Weibo
Rec=97.2, F1=93 Cross-modal Use a more effective model
[27] XLNet VGG-19 FCL Improve the fusion process.
Acc=87.3, Pre=79, attention to extract features.
Gossipcop
Rec=44, F1=56.5
Acc=86.4, Pre=84,
MediaEval
Rec=93, F1=88
Acc=81.4, Pre=80.3,
Weibo High-resolution photos
Rec=86.3, F1=83.6 Detect satirical news and the
Efficient- with only a small altered
[29] Twitter DistilBERT Late fusion ANN text that is placed over the
Net-B0 area appeared to be poorly
Indian Acc=67.1 photos.
detected.
Dataset v3
Acc=88.8, Pre=85,
Fakeddit
Rec=87, F1=86
Acc=93.5, Pre=96.5,
Twitter Cannot be used directly Improve feature extraction to
Rec=93.7, F1=95.1 BERT+
[30] VGG-19 Concatenation MLP when one of the modalities counteract intentionally
Acc=91.5, Pre=91.3, BiLSTM
Weibo is lacking. deceiving photos.
Rec=91.3, F1=91.3
Acc=91.8, Pre=91.2, Late fusion
Twitter
Rec=85.4, F1=91.8 GloVe+ based on Enhance the model for cross-
[59] ViT MLP Time complexity
Acc=92.2, Pre=96.9, Transformer attention domain news detection.
Weibo
Rec=88.6, F1=92.5 mechanism
Quantum
Acc=87.9, Pre=95.8, Quantum circuit with time- Apply quantum fuzzy neural
[18] Gossip XLNet VGG-19 multimodal FCL
Rec=89.9, F1=92.8 based complexity networks.
fusion

VI. RESEARCH GAPS AND CHALLENGES  There is a significant difference between image similarities
and sentences in most fake news, but existing algorithms do
Fake news is fundamentally multimodal and multilingual,
not fully capitalize on this.
taking visual, auditory, or literary forms and expressing itself in
a language that readers may not be familiar with. A new  The lack of large and rich multimodal fake news datasets
viewpoint can be developed to make deep systems more negatively affects system development. In addition, datasets
acceptable. Additionally, appropriate feature collection and are limited to the economic or political field only. In
classification techniques can improve the detection of fake addition, the lack of multilingual datasets supports the
news. Studies must investigate whether the classification possibility of developing fake news detection systems in
approach is most appropriate for certain features: textual or several languages and different dialects of the same
visual feature extractors. As a result, greater attention must be language as well.
paid to feature choice and fusion to improve performance. The
challenges in multimodal fake news detection approaches can  Not relying on psychological data, combined with the
be summarized as: contextual features of texts and images of published news,
saves a great deal of time in contacting people responsible
 Existing techniques often employ a basic concatenate for false information sharing and revealing their purposes.
strategy to fuse inter-modal information, yielding mediocre
detection results.

[Link] Abduljaleel & Ali: Deep Learning and Fusion Mechanism-based Multimodal Fake News Detection …
Engineering, Technology & Applied Science Research Vol. 14, No. 4, 2024, 15665-15675 15674

VII. CONCLUSION [14] Y. Shen, Q. Liu, N. Guo, J. Yuan, and Y. Yang, "Fake News Detection
on Social Networks: A Survey," Applied Sciences, vol. 13, no. 21, Jan.
After studying the literature on fake news analysis methods, 2023, Art. no. 11877, [Link]
this paper summarized the basic features of multimodal fake [15] Y. Guo, H. Ge, and J. Li, "A two-branch multimodal fake news detection
news detection systems, including datasets, visual and textual model based on multimodal bilinear pooling and attention mechanism,"
preprocessing, feature extraction, fusion mechanisms, and fake Frontiers in Computer Science, vol. 5, Apr. 2023, [Link]
10.3389/fcomp.2023.1159063.
news detection stages, as well as related techniques such as
BERT, transformer, ViT, and attention mechanisms. A brief [16] L. Qian, R. Xu, and Z. Zhou, "MRDCA: a multimodal approach for fine-
grained fake news detection through integration of RoBERTa and
review of important multimodal fake news detection systems DenseNet based upon fusion mechanism of co-attention," Annals of
was performed, with different deep learning methods in Operations Research, Dec. 2022, [Link]
different stages. Future studies could focus on modern attention 05154-9.
mechanisms in fake video detection systems. In addition, [17] A. M. Luvembe, W. Li, S. Li, F. Liu, and X. Wu, "CAF-ODNN:
efficient early detection mechanisms must be developed. Complementary attention fusion with optimized deep neural network for
multimodal fake news detection," Information Processing &
REFERENCES Management, vol. 61, no. 3, May 2024, Art. no. 103653, [Link]
10.1016/[Link].2024.103653.
[1] S. Hangloo and B. Arora, "Combating multimodal fake news on social [18] Z. Qu, Y. Meng, G. Muhammad, and P. Tiwari, "QMFND: A quantum
media: methods, datasets, and future perspective," Multimedia Systems, multimodal fusion-based fake news detection model for social media,"
vol. 28, no. 6, pp. 2391–2422, Dec. 2022, [Link] Information Fusion, vol. 104, Apr. 2024, Art. no. 102172,
s00530-022-00966-y. [Link]
[2] L. Hu, S. Wei, Z. Zhao, and B. Wu, "Deep learning for fake news [19] F. A. O. Santos, K. L. Ponce-Guevara, D. Macêdo, and C. Zanchettin,
detection: A comprehensive survey," AI Open, vol. 3, pp. 133–155, Jan. "Improving Universal Language Model Fine-Tuning using Attention
2022, [Link] Mechanism," in 2019 International Joint Conference on Neural
[3] C. Comito, L. Caroprese, and E. Zumpano, "Multimodal fake news Networks (IJCNN), Budapest, Hungary, Jul. 2019, pp. 1–7,
detection on social media: a survey of deep learning techniques," Social [Link]
Network Analysis and Mining, vol. 13, no. 1, Aug. 2023, Art. no. 101, [20] P. Meel and D. K. Vishwakarma, "HAN, image captioning, and
[Link] forensics ensemble multimodal fake news detection," Information
[4] D. Gifu, "An Intelligent System for Detecting Fake News," Procedia Sciences, vol. 567, pp. 23–41, Aug. 2021, [Link]
Computer Science, vol. 221, pp. 1058–1065, Jan. 2023, [Link] [Link].2021.03.037.
10.1016/[Link].2023.08.088. [21] C. Raj and P. Meel, "ConvNet frameworks for multi-modal fake news
[5] J. Li and M. Lei, "A Brief Survey for Fake News Detection via Deep detection," Applied Intelligence, vol. 51, no. 11, pp. 8132–8148, Nov.
Learning Models," Procedia Computer Science, vol. 214, pp. 1339– 2021, [Link]
1344, Jan. 2022, [Link]
[22] L. Wang, C. Zhang, H. Xu, Y. Xu, X. Xu, and S. Wang, "Cross-modal
[6] A. Gandhi, K. Adhvaryu, S. Poria, E. Cambria, and A. Hussain, Contrastive Learning for Multimodal Fake News Detection," in
"Multimodal sentiment analysis: A systematic review of history, Proceedings of the 31st ACM International Conference on Multimedia,
datasets, multimodal fusion methods, applications, challenges and future Ottawa, Canada, Nov. 2023, pp. 5696–5704, [Link]
directions," Information Fusion, vol. 91, pp. 424–444, Mar. 2023, 3581783.3613850.
[Link] [23] R. Kumari and A. Ekbal, "AMFB: Attention based multimodal
[7] M. Nirav Shah and A. Ganatra, "A systematic literature review and Factorized Bilinear Pooling for multimodal Fake News Detection,"
existing challenges toward fake news detection models," Social Network Expert Systems with Applications, vol. 184, Dec. 2021, Art. no. 115412,
Analysis and Mining, vol. 12, no. 1, Nov. 2022, Art. no. 168, [Link]
[Link] [24] J. Zeng, Y. Zhang, and X. Ma, "Fake news detection for epidemic
[8] A. Figueira, N. Guimaraes, and L. Torgo, "Current State of the Art to emergencies via deep correlations between text and images," Sustainable
Detect Fake News in Social Media: Global Trendings and Next Cities and Society, vol. 66, Mar. 2021, Art. no. 102652,
Challenges:," in Proceedings of the 14th International Conference on [Link]
Web Information Systems and Technologies, Seville, Spain, 2018, pp. [25] I. Segura-Bedmar and S. Alonso-Bartolome, "Multimodal Fake News
332–339, [Link] Detection," Information, vol. 13, no. 6, Jun. 2022, Art. no. 284,
[9] J. Xue, Y. Wang, Y. Tian, Y. Li, L. Shi, and L. Wei, "Detecting fake [Link]
news by exploring the consistency of multimodal data," Information [26] V. Kishore and M. Kumar, "Enhanced Multimodal Fake News Detection
Processing & Management, vol. 58, no. 5, Sep. 2021, Art. no. 102610, with Optimal Feature Fusion and Modified Bi-LSTM Architecture,"
[Link] Cybernetics and Systems, Jan. 2023, [Link]
[10] "Google Trends," Google Trends. [Link] 2023.2175155.
explore?date=today%205-y&q=fake%20news&hl=en (accessed May 30, [27] Z. Liang, "Fake News Detection Based on Multimodal Inputs,"
2024). Computers, Materials & Continua, vol. 75, no. 2, pp. 4519–4534, 2023,
[11] J. Wang, J. Zheng, S. Yao, R. Wang, and H. Du, "TLFND: A [Link]
Multimodal Fusion Model Based on Three-Level Feature Matching [28] X. Cui and Y. Li, "Fake News Detection in Social Media based on
Distance for Fake News Detection," Entropy, vol. 25, no. 11, Nov. 2023, Multi-Modal Multi-Task Learning," International Journal of Advanced
Art. no. 1533, [Link] Computer Science and Applications (IJACSA), vol. 13, no. 7, 31 2022,
[12] K. Liu and M. Hai, "Rumor Detection of Covid-19 Related Microblogs [Link]
on Sina Weibo," Procedia Computer Science, vol. 221, pp. 386–393, [29] D. K. Sharma, B. Singh, S. Agarwal, H. Kim, and R. Sharma,
Jan. 2023, [Link]
"FakedBits- Detecting Fake Information on Social Platforms using
[13] S. Ahmed, K. Hinkelmann, and F. Corradini, "Combining Machine Multi-Modal Features," KSII Transactions on Internet and Information
Learning with Knowledge Engineering to detect Fake News in Social Systems (TIIS), vol. 17, no. 1, pp. 51–73, Jan. 2023.
Networks - a survey." arXiv, Jan. 20, 2022, [Link] [30] L. Wu, P. Liu, and Y. Zhang, "See How You Read? Multi-Reading
arXiv.2201.08032. Habits Fusion Reasoning for Multi-Modal Fake News Detection,"
Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37,

[Link] Abduljaleel & Ali: Deep Learning and Fusion Mechanism-based Multimodal Fake News Detection …
Engineering, Technology & Applied Science Research Vol. 14, No. 4, 2024, 15665-15675 15675
no. 11, pp. 13736–13744, Jun. 2023, [Link] [46] J. Maurício, I. Domingues, and J. Bernardino, "Comparing Vision
26609. Transformers and Convolutional Neural Networks for Image
[31] X. Wang, X. Li, X. Liu, and H. Cheng, "Using ALBERT and Multi- Classification: A Literature Review," Applied Sciences, vol. 13, no. 9,
modal Circulant Fusion for Fake News Detection," in 2022 IEEE Jan. 2023, Art. no. 5521, [Link]
International Conference on Systems, Man, and Cybernetics (SMC), [47] Z. Jin, J. Cao, Y. Zhang, J. Zhou, and Q. Tian, "Novel Visual and
Prague, Czech Republic, 2022, pp. 2936–2942, [Link] Statistical Image Features for Microblogs News Verification," IEEE
SMC53654.2022.9945303. Transactions on Multimedia, vol. 19, no. 3, pp. 598–608, Mar. 2017,
[32] N. M. Duc Tuan and P. Quang Nhat Minh, "Multimodal Fusion with [Link]
BERT and Attention Mechanism for Fake News Detection," in 2021 [48] K. Shu, L. Cui, S. Wang, D. Lee, and H. Liu, "dEFEND: Explainable
RIVF International Conference on Computing and Communication Fake News Detection," in Proceedings of the 25th ACM SIGKDD
Technologies (RIVF), Hanoi, Vietnam, Aug. 2021, pp. 1–6, International Conference on Knowledge Discovery & Data Mining,
[Link] Anchorage, AK, USA, Apr. 2019, pp. 395–405, [Link]
[33] R. Jaiswal, U. P. Singh, and K. P. Singh, "Fake News Detection Using 3292500.3330935.
BERT-VGG19 Multimodal Variational Autoencoder," in 2021 IEEE 8th [49] T. Chen, X. Li, H. Yin, and J. Zhang, "Call Attention to Rumors: Deep
Uttar Pradesh Section International Conference on Electrical, Attention Based Recurrent Neural Networks for Early Rumor
Electronics and Computer Engineering (UPCON), Dehradun, India, Detection," in Trends and Applications in Knowledge Discovery and
Nov. 2021, pp. 1–5, [Link] Data Mining, Melbourne, Australia, 2018, pp. 40–52,
9667614. [Link]
[34] H. Yang et al., "Multi-Modal fake news Detection on Social Media with [50] M. H. Goldani, S. Momtazi, and R. Safabakhsh, "Detecting fake news
Dual Attention Fusion Networks," in 2021 IEEE Symposium on with capsule neural networks," Applied Soft Computing, vol. 101, Mar.
Computers and Communications (ISCC), Athens, Greece, Sep. 2021, pp. 2021, Art. no. 106991, [Link]
1–6, [Link] [51] F. Qian, C. Gong, K. Sharma, and Y. Liu, "Neural User Response
[35] L. Ying, H. Yu, J. Wang, Y. Ji, and S. Qian, "Multi-Level Multi-Modal Generator: Fake News Detection with Collective User Intelligence," in
Cross-Attention Network for Fake News Detection," IEEE Access, vol. Proceedings of the Twenty-Seventh International Joint Conference on
9, pp. 132363–132373, 2021, [Link] Artificial Intelligence, Stockholm, Sweden, Jul. 2018, pp. 3834–3840,
3114093. [Link]
[36] Y. Guo and W. Song, "A Temporal-and-Spatial Flow Based Multimodal [52] N. Kausar, A. AliKhan, and M. Sattar, "Towards better representation
Fake News Detection by Pooling and Attention Blocks," IEEE Access, learning using hybrid deep learning model for fake news detection,"
vol. 10, pp. 131498–131508, 2022, [Link] Social Network Analysis and Mining, vol. 12, no. 1, Nov. 2022, Art. no.
2022.3229762. 165, [Link]
[37] Y. Liang, T. Tohti, and A. Hamdulla, "False Information Detection via [53] S. Abdali, S. Shaham, and B. Krishnamachari, "Multi-modal
Multimodal Feature Fusion and Multi-Classifier Hybrid Prediction," Misinformation Detection: Approaches, Challenges and Opportunities."
Algorithms, vol. 15, no. 4, Apr. 2022, Art. no. 119, arXiv, Mar. 27, 2024, [Link]
[Link] [54] M.-H. Guo et al., "Attention mechanisms in computer vision: A survey,"
[38] S. K. Uppada, P. Patel, and S. B., "An image and text-based multimodal Computational Visual Media, vol. 8, no. 3, pp. 331–368, Sep. 2022,
model for detecting fake news in OSN’s," Journal of Intelligent [Link]
Information Systems, vol. 61, no. 2, pp. 367–393, Oct. 2023, [55] B. Ahmed, G. Ali, A. Hussain, A. Baseer, and J. Ahmed, "Analysis of
[Link] Text Feature Extractors using Deep Learning on Fake News,"
[39] S. K. Hamed, M. J. Ab Aziz, and M. R. Yaakub, "A review of fake news Engineering, Technology & Applied Science Research, vol. 11, no. 2,
detection approaches: A critical analysis of relevant studies and pp. 7001–7005, Apr. 2021, [Link]
highlighting key challenges associated with the dataset, feature [56] H. M. Al-Dabbas, R. A. Azeez, and A. E. Ali, "Two Proposed Models
representation, and data fusion," Heliyon, vol. 9, no. 10, Oct. 2023, Art. for Face Recognition: Achieving High Accuracy and Speed with
no. e20382, [Link] Artificial Intelligence," Engineering, Technology & Applied Science
[40] D. S. Asudani, N. K. Nagwani, and P. Singh, "Impact of word Research, vol. 14, no. 2, pp. 13706–13713, Apr. 2024, [Link]
embedding models on text analytics in deep learning environment: a 10.48084/etasr.7002.
review," Artificial Intelligence Review, vol. 56, no. 9, pp. 10345–10425, [57] T. Zhang et al., "BDANN: BERT-Based Domain Adaptation Neural
Sep. 2023, [Link] Network for Multi-Modal Fake News Detection," in 2020 International
[41] J. Egger, A. Pepe, C. Gsaxner, Y. Jin, J. Li, and R. Kern, "Deep Joint Conference on Neural Networks (IJCNN), Glasgow, UK, Jul. 2020,
learning—a first meta-survey of selected reviews across scientific pp. 1–8, [Link]
disciplines, their commonalities, challenges and research impact," PeerJ [58] C. Song, N. Ning, Y. Zhang, and B. Wu, "A multimodal fake news
Computer Science, vol. 7, Nov. 2021, Art. no. e773, [Link] detection model based on crossmodal attention residual and
10.7717/peerj-cs.773. multichannel convolutional neural networks," Information Processing &
[42] K. Han et al., "A Survey on Visual Transformer," IEEE Transactions on Management, vol. 58, no. 1, Jan. 2021, Art. no. 102437, [Link]
Pattern Analysis and Machine Intelligence, vol. 45, no. 1, pp. 87–110, 10.1016/[Link].2020.102437.
Jan. 2023, [Link] [59] P. Yang, J. Ma, Y. Liu, and M. Liu, "Multi-modal transformer for fake
[43] A. Choudhary and A. Arora, "Assessment of bidirectional transformer news detection," Mathematical biosciences and engineering, vol. 20, no.
encoder model and attention based bidirectional LSTM language models 8, pp. 14699–14717, Jul. 2023, [Link]
for fake news detection," Journal of Retailing and Consumer Services,
vol. 76, Jan. 2024, 103545, [Link]
103545.
[44] S. F. N. Azizah, H. D. Cahyono, S. W. Sihwi, and W. Widiarto,
"Performance Analysis of Transformer Based Models (BERT, ALBERT
and RoBERTa) in Fake News Detection." arXiv, Aug. 09, 2023,
[Link]
[45] D. Tomás, R. Ortega-Bueno, G. Zhang, P. Rosso, and R. Schifanella,
"Transformer-based models for multimodal irony detection," Journal of
Ambient Intelligence and Humanized Computing, vol. 14, no. 6, pp.
7399–7410, Jun. 2023, [Link]

[Link] Abduljaleel & Ali: Deep Learning and Fusion Mechanism-based Multimodal Fake News Detection …

Common questions

Attention mechanisms in multimodal fake news detection models enhance the model's ability to weigh different features according to their relevance and importance in a given context . They facilitate the generation of more accurate representations by allowing models to focus on crucial interactions between textual and visual inputs. This is particularly useful for highlighting significant details that may be critical for discerning falsehoods in the data . As a result, attention mechanisms improve not only the interpretability of models by making it clearer where focus is applied but also their accuracy in detecting fake news by capturing intricate inter-modal relationships .

Proposed improvements for enhancing early detection accuracy of fake news include the development of more advanced attention mechanisms and the creation of efficient early detection algorithms that can quickly assess credibility in rapidly spreading information . Emphasizing cross-modality feature extraction and effective fusion strategies, such as quantum-based standards, can better integrate diverse data inputs . Incorporating psychological data along with contextual content features could also improve understanding of fake news spread dynamics, providing faster identification of misleading content before it reaches wide dissemination . Moreover, expanding datasets to cover more languages and domains, along with improving multimodal data capabilities, are crucial steps in advancing these systems .

Major limitations include the reliance on large, diverse datasets that are often not available, which hampers the ability to generalize models across different contexts and domains . Additionally, there is a challenge in efficiently integrating and fusing features from multiple modalities, as current techniques may not fully capitalize on the complex relationships within multimodal data, potentially leading to noisy outputs . The computational requirements and time complexity of deep learning techniques also pose challenges, especially with high-dimensional data inputs . Finally, the lack of robust multilingual datasets further limits the system's applicability to diverse linguistic contexts .

The single-modal approach in fake news detection often focuses on either textual or visual features individually, utilizing models like CNN or LSTM for classification based on these standalone characteristics . It tends to be less robust because it misses cues available through complementary data, such as images or other metadata that multimodal approaches consider . Conversely, the multimodal approach integrates multiple data types – like text, images, and even user interactions – to enhance detection capabilities, often employing sophisticated fusion techniques using attention mechanisms . However, multimodal approaches require larger, more varied datasets and face challenges in effectively fusing diverse feature types, leading to increased computational complexity and sometimes noisy data integration .

Transformer-based models have improved the fusion of multimodal data by effectively capturing relationships across different data types with their attention mechanisms, which allows for better weighing of relevant information from both textual and visual inputs . They offer superior feature extraction capabilities by deriving dependencies and interactions between modalities more efficiently than traditional concatenation methods . This facilitates a more coherent integration of diverse input features, resulting in improved decision-making processes in fake news detection . However, these benefits come at the cost of increased computational complexity and data requirements, necessitating larger datasets for effective training .

Crucial preprocessing steps in textual data involve cleaning datasets by removing excess and duplicate text elements, token normalization, stemming, lemmatization, and transforming significant tokens into vector forms through word embedding techniques . For visual data, preprocessing includes filtering images, removing noise, aligning image characteristics across a dataset, and applying transformations to unify formats for accurate feature extraction . These steps are essential for effective deep learning model training and ensure that both textual and visual data are optimized for feature extraction and subsequent multimodal fusion in fake news detection .

Multimodal fake news detection systems feature the integration of both textual and visual data to improve accuracy, employing advanced techniques like deep learning and attention mechanisms . A key challenge is the need for large and diverse datasets encompassing various domains such as political, economic, and health-related topics . Current approaches frequently use basic concatenation strategies for feature fusion, which often results in suboptimal detection capabilities due to failure in capturing inter-modal relationships . Additionally, the absence of diverse multimodal datasets, especially in multilingual contexts, hinders the overall system development and performance .

BERT improves fake news detection by leveraging its transformer-based architecture to capture deep contextual relationships and semantic nuances, which enables it to understand implicit associations in textual data better than earlier techniques like GloVe or Word2Vec, which use static word embeddings . BERT's use of bidirectional transformers allows it to consider the context from both previous and following words, resulting in more effective feature extraction for nuanced content . However, BERT's major drawback includes higher computational demands and greater resource requirements compared to the simpler models like GloVe or Word2Vec . Additionally, fine-tuning BERT for specific tasks requires large datasets and significant computational power, potentially limiting its usage in environments with limited resources .

User-based attributes enhance fake news detection accuracy by providing additional context on how content is perceived and interacted with across social media platforms . These attributes may include user reactions, sharing patterns, engagement statistics, and historical behavior, each contributing insights into the credibility of the content . By incorporating these user interaction metrics, detection systems can identify anomalous patterns indicative of unnatural propagation or targeted misinformation campaigns, thus improving the robustness and contextuality of the model's decisions .

Feature fusion is challenging because it involves integrating disparate data types—text, images, and possibly others—into a unified model input, requiring reconciliation of different scales and dimensions . Misalignment and redundancy in multimodal data can introduce noise, hindering the model's accuracy . Late fusion techniques attempt to address these challenges by keeping feature extraction from each modality separate until the final stages of model processing, allowing for more contextual evaluation of features and reducing early-stage noise interference . This allows models to adjust weighting and importance of features more flexibly, which can improve classification accuracy .

Deep Learning for Multimodal Fake News Detection
No ratings yet
Deep Learning for Multimodal Fake News Detection
9 pages
Ensemble Methods for Fake News Detection
No ratings yet
Ensemble Methods for Fake News Detection
13 pages
SpotFake: Multi-Modal Fake News Detection
No ratings yet
SpotFake: Multi-Modal Fake News Detection
9 pages
Automatic Fake News Detection Trends
No ratings yet
Automatic Fake News Detection Trends
9 pages
NewsBag: Multimodal Fake News Dataset
No ratings yet
NewsBag: Multimodal Fake News Dataset
8 pages
Ieee Paper 3
No ratings yet
Ieee Paper 3
12 pages
ML Techniques for Fake News Detection
No ratings yet
ML Techniques for Fake News Detection
20 pages
Deep Learning for Fake News Detection Review
No ratings yet
Deep Learning for Fake News Detection Review
11 pages
Fake News Detection with Bi-LSTM Model
No ratings yet
Fake News Detection with Bi-LSTM Model
9 pages
Multimodal Fake News Detection Review
No ratings yet
Multimodal Fake News Detection Review
17 pages
Journal Pone 0312240
No ratings yet
Journal Pone 0312240
21 pages
Hybrid Model for Fake News Detection
No ratings yet
Hybrid Model for Fake News Detection
12 pages
Multimodal Fake News Detection
No ratings yet
Multimodal Fake News Detection
16 pages
A Comparative Study of Deep Learning Models For Fa
No ratings yet
A Comparative Study of Deep Learning Models For Fa
8 pages
Emotion-Driven Fake News Detection System
No ratings yet
Emotion-Driven Fake News Detection System
16 pages
Fake News Detection with Deep Learning
No ratings yet
Fake News Detection with Deep Learning
5 pages
Fake News Detection Using Machine Learning
No ratings yet
Fake News Detection Using Machine Learning
14 pages
Future Directions in Fake News Detection
No ratings yet
Future Directions in Fake News Detection
4 pages
A Survey of Multimodal Fake News Detection A Cross-Modal Interaction Perspective
No ratings yet
A Survey of Multimodal Fake News Detection A Cross-Modal Interaction Perspective
18 pages
Multimodal Fake News Detection in Tamil
No ratings yet
Multimodal Fake News Detection in Tamil
29 pages
Multimodal Fake News Detection Methods
No ratings yet
Multimodal Fake News Detection Methods
10 pages
TGA: Transformer for Fake News Detection
No ratings yet
TGA: Transformer for Fake News Detection
19 pages
Real-Time Detection of Fake News Articles Using Deep Learning Techniques
No ratings yet
Real-Time Detection of Fake News Articles Using Deep Learning Techniques
5 pages
Team 20
No ratings yet
Team 20
11 pages
Hybrid Metaheuristic for Fake News Detection
No ratings yet
Hybrid Metaheuristic for Fake News Detection
21 pages
Machine Learning for Fake News Detection
No ratings yet
Machine Learning for Fake News Detection
48 pages
Fake News Detection with LSTM & TensorFlow
No ratings yet
Fake News Detection with LSTM & TensorFlow
4 pages
Machine Learning for Fake News Detection
No ratings yet
Machine Learning for Fake News Detection
19 pages
Fake News Detection with ML & DL Techniques
No ratings yet
Fake News Detection with ML & DL Techniques
18 pages
Deep Learning for Fake News Detection
No ratings yet
Deep Learning for Fake News Detection
3 pages
Fake News Detection with CNN and ML
No ratings yet
Fake News Detection with CNN and ML
6 pages
Swarm Optimization for Fake News Detection
No ratings yet
Swarm Optimization for Fake News Detection
8 pages
Fake News Detection via Transfer Learning
No ratings yet
Fake News Detection via Transfer Learning
7 pages
Hybrid Deep Learning Model for Fake News Detection
No ratings yet
Hybrid Deep Learning Model for Fake News Detection
7 pages
NLP Techniques for Fake News Detection
No ratings yet
NLP Techniques for Fake News Detection
21 pages
Kaliyar2021 Article FakeBERTFakeNewsDetectionInSoc
No ratings yet
Kaliyar2021 Article FakeBERTFakeNewsDetectionInSoc
24 pages
Fake News Detection Using AI Techniques
No ratings yet
Fake News Detection Using AI Techniques
7 pages
A Deep Learning Multimodal Framework For Fake News Detection
No ratings yet
A Deep Learning Multimodal Framework For Fake News Detection
7 pages
Multimodal Neural Network for Fake News Detection
No ratings yet
Multimodal Neural Network for Fake News Detection
13 pages
Fake News Detection Using Deep Learning
No ratings yet
Fake News Detection Using Deep Learning
24 pages
Machine Learning for Fake News Detection
No ratings yet
Machine Learning for Fake News Detection
6 pages
Python-Based Fake News Detection
No ratings yet
Python-Based Fake News Detection
11 pages
Hybrid Deep Learning for Fake News Detection
No ratings yet
Hybrid Deep Learning for Fake News Detection
11 pages
Real-Time Fake News Detection with GNNs
No ratings yet
Real-Time Fake News Detection with GNNs
9 pages
Fake News Detection Using AGRC-RCNN
No ratings yet
Fake News Detection Using AGRC-RCNN
11 pages
Transformer Models for Fake News Detection
No ratings yet
Transformer Models for Fake News Detection
7 pages
Fake News Detection: ML & DL Review
No ratings yet
Fake News Detection: ML & DL Review
7 pages
NLP and Blockchain for Fake News Detection
No ratings yet
NLP and Blockchain for Fake News Detection
7 pages
Multimodal Framework for Fake News Detection
No ratings yet
Multimodal Framework for Fake News Detection
14 pages
Multimodal System for Fake News Detection
No ratings yet
Multimodal System for Fake News Detection
8 pages
Deep Learning for Fake News Detection
No ratings yet
Deep Learning for Fake News Detection
20 pages
Fake News Detection via Machine Learning
No ratings yet
Fake News Detection via Machine Learning
11 pages
Deep Learning for Fake News Detection Review
No ratings yet
Deep Learning for Fake News Detection Review
25 pages
RoBERTa for Fake News Detection
No ratings yet
RoBERTa for Fake News Detection
5 pages
Fake News Detection with NLP & ML
No ratings yet
Fake News Detection with NLP & ML
9 pages
Fake News Detection Overview and Insights
No ratings yet
Fake News Detection Overview and Insights
15 pages
FOD Prevention Program Guidelines
No ratings yet
FOD Prevention Program Guidelines
12 pages
Software Product Description: (Version 7.5.542.0, May 05, 2009)
No ratings yet
Software Product Description: (Version 7.5.542.0, May 05, 2009)
30 pages
GameCenter Startup Log Analysis
No ratings yet
GameCenter Startup Log Analysis
4 pages
Embroidery Stitch Techniques Guide
No ratings yet
Embroidery Stitch Techniques Guide
8 pages
JetControl 24x User Manual
No ratings yet
JetControl 24x User Manual
408 pages
Ds Esprimo Mobile v6505
No ratings yet
Ds Esprimo Mobile v6505
5 pages
Franktaylor@email - Ua: Congratulations!!
No ratings yet
Franktaylor@email - Ua: Congratulations!!
2 pages
PHP and MySQL Web Development Exam
No ratings yet
PHP and MySQL Web Development Exam
5 pages
Business Intelligence Skills Summary
No ratings yet
Business Intelligence Skills Summary
1 page
Using The Online Patching Readiness Report in Oracle E-Business Suite Release 12.2 (Doc ID 1531121.1) PDF
No ratings yet
Using The Online Patching Readiness Report in Oracle E-Business Suite Release 12.2 (Doc ID 1531121.1) PDF
19 pages
Angular 7 and TypeScript Basics
No ratings yet
Angular 7 and TypeScript Basics
95 pages
FIFA 22 Card Creator Tool
No ratings yet
FIFA 22 Card Creator Tool
1 page
GHX-10 Universal Cross Converter Overview
No ratings yet
GHX-10 Universal Cross Converter Overview
1 page
IPTV Solutions for Hotels and Guests
No ratings yet
IPTV Solutions for Hotels and Guests
9 pages
CARR-I/O Board Diagram and Connections
No ratings yet
CARR-I/O Board Diagram and Connections
19 pages
Nanobiotechnology: Concepts and Applications in Health, Agriculture, and Environment 1st Edition Rajesh Singh Tomar (Editor) Ebook Open-Access PDF
100% (1)
Nanobiotechnology: Concepts and Applications in Health, Agriculture, and Environment 1st Edition Rajesh Singh Tomar (Editor) Ebook Open-Access PDF
61 pages
Data Analysis and Processing Techniques
No ratings yet
Data Analysis and Processing Techniques
20 pages
Features of Distributed File Systems
No ratings yet
Features of Distributed File Systems
19 pages
Payment Notification
No ratings yet
Payment Notification
1 page
Transforming Public Sector Innovation
No ratings yet
Transforming Public Sector Innovation
36 pages
Manoj Nagar: Senior Software Architect Profile
No ratings yet
Manoj Nagar: Senior Software Architect Profile
2 pages
Template & Tutorial ByMe
No ratings yet
Template & Tutorial ByMe
8 pages
VLSI Major Project Overview
No ratings yet
VLSI Major Project Overview
3 pages
Email Troubleshooting for MFPs
No ratings yet
Email Troubleshooting for MFPs
2 pages
[能力]Oni Samuel Boluwatife_The Impact of AI on the Translation Industry
No ratings yet
[能力]Oni Samuel Boluwatife_The Impact of AI on the Translation Industry
19 pages
Monocular 3D Mapping for Tumor Surgery
No ratings yet
Monocular 3D Mapping for Tumor Surgery
7 pages
AyurSutra: Panchakarma Therapy Software
No ratings yet
AyurSutra: Panchakarma Therapy Software
11 pages
Creativity in Technology Entrepreneurship
No ratings yet
Creativity in Technology Entrepreneurship
32 pages
BHEL Graduate Apprenticeship Walk-In 2018
No ratings yet
BHEL Graduate Apprenticeship Walk-In 2018
6 pages
Salesforce Business Analyst Profile
No ratings yet
Salesforce Business Analyst Profile
5 pages

5 paper

Uploaded by

5 paper

Uploaded by

Engineering, Technology & Applied Science Research Vol. 14, No.

4, 2024, 15665-15675 15665

Deep Learning and Fusion Mechanism-based

Keywords-misinformation; attention mechanism; fusion methods; social media; vision transformer

A. Single-Modal Fake News Approaches TABLE IV. IMPORTANT MULTIMODAL DATASETS

TABLE V. A COMPARISON OF SEVERAL MULTIMODAL FAKE NEWS DETECTION APPROACHES

Common questions

What role do attention mechanisms play in the effectiveness of multimodal fake news detection models?

What are some proposed improvements to current fake news detection techniques that could enhance early detection accuracy?

What are the major limitations currently facing the effectiveness of deep learning in multimodal fake news detection?

How does the single-modal approach differ from the multimodal approach in fake news detection, and what are the limitations of each?

In what ways has the adoption of transformer-based models improved the fusion of multimodal data in fake news detection?

What have been identified as crucial preprocessing steps in both textual and visual data for fake news detection, according to recent studies?

What are the key features and challenges in implementing multimodal fake news detection systems?

How does BERT improve fake news detection in comparison to earlier techniques like GloVe or Word2Vec, and what are the drawbacks?

How does the presence of user-based attributes enhance the accuracy of fake news detection systems?

Why is feature fusion considered a challenging aspect in the development of fake news detection systems, and how do late fusion techniques attempt to address this?

You might also like