Securing Federated Learning: A Defense Strategy Against Targeted Data Poisoning Attack
Abstract
Ensuring the security and integrity of Federated Learning (FL) models against adversarial attacks is critical. Among these
threats, targeted data poisoning attacks, particularly label flipping, pose a significant challenge by undermining model
accuracy and reliability. This paper investigates targeted data poisoning attacks in FL systems, where a small fraction of
malicious participants corrupt the global model through mislabeled data updates. Our findings demonstrate that even
a minor presence of malicious participants can substantially decrease classification accuracy and recall, especially when
attacks focus on specific classes. We also examine the longevity and timing of these attacks during early and late training
rounds, highlighting the impact of malicious participant availability on attack effectiveness. To mitigate these threats,
we propose a defense strategy that identifies malicious participants by analyzing parameter updates across vulnerable
training rounds. Utilizing Principal Component Analysis (PCA) for dimensionality reduction and anomaly detection, our
approach effectively isolates malicious updates. Extensive simulations on standard datasets validate the effectiveness of
our algorithm in accurately identifying and excluding malicious participants, thereby enhancing the integrity of the FL
model. These results offer a robust defense against sophisticated poisoning strategies, significantly improving FL security.
Article Highlights
• Federated learning is vulnerable to data poisoning attacks that manipulate model updates to reduce accuracy.
• Our defense strategy uses data analysis to detect and block harmful updates, improving model security.
• Test results show our approach effectively protects learning systems without disrupting performance.
1 Introduction
In the landscape of contemporary machine learning, Federated Learning (FL) has emerged as a paradigm shift, pro‑
moting the decentralization of learning processes across multiple devices while ensuring data privacy and reducing
reliance on centralized data repositories. This innovative approach enables a multitude of devices to collaboratively
learn a shared prediction model while keeping the training data localized, thus addressing significant privacy con‑
cerns associated with traditional centralized machine learning techniques [1]. However, the decentralized nature
of FL introduces new vulnerabilities, with poisoning attacks, particularly label flipping attacks, posing a significant
threat to the integrity and reliability of the collective learning model [2].
Poisoning attacks in the context of FL involve the deliberate manipulation of training data or model updates
by adversaries, aiming to degrade the model’s performance or induce specific biases. Among these, label flipping
attacks are especially insidious, as they subtly alter the labels of the training data in a manner that is hard to detect
yet capable of significantly compromising the model’s accuracy. Such attacks not only undermine the model’s reli‑
ability but also erode trust in the FL system, making the development of effective defense mechanisms a critical
research endeavor [3]. Recognizing the urgency of addressing this vulnerability, our study introduces a novel algo‑
rithm designed to detect and mitigate the impact of poisoning attacks within FL systems. The proposed algorithm
leverages advanced analytical techniques, including Principal Component Analysis (PCA) for dimensionality reduction
and sophisticated anomaly detection processes, to identify and isolate malicious contributions from participating
devices. By focusing on parameter updates related to the source class and employing rigorous statistical methods,
our approach offers a robust defense against the subtleties of label flipping attacks.
Several researchers have delved into the realm of poisoning attacks in FL, recognizing their potential to undermine
the integrity and reliability of collaborative learning systems. Prior studies have highlighted the susceptibility of FL
models to various forms of poisoning, including data injection, model poisoning, and label flipping attacks [2, 4].
These attacks, while diverse in their execution, share a common goal of introducing malicious inputs or updates to
corrupt the learning process and compromise the accuracy of the global model. Existing research has emphasized the
importance of developing robust defense mechanisms capable of detecting and mitigating poisoning attacks in FL
settings [3]. However, challenges persist in effectively countering these adversarial threats, particularly in scenarios
where adversaries adapt their tactics to evade detection [5]. Additionally, there remains a need for comprehensive
evaluations of defense strategies across different FL configurations and datasets to assess their generalizability
and efficacy in real-world scenarios. Addressing these gaps in understanding and defense capabilities is crucial for
advancing the security and reliability of FL systems in practical applications.
In this study, we examine the susceptibility of FL systems to malicious actors intent on corrupting the globally trained
model. We adopt a conservative stance by assuming minimal capabilities for malicious FL participants, each limited to
manipulating the raw training data on their respective devices. This allows even inexperienced malicious actors to carry
out poisoning attacks without requiring knowledge of the model’s architecture, parameters, or FL procedure. Within this
framework, label flipping attacks emerge as a viable strategy for executing data poisoning, as they have demonstrated
effectiveness against traditional, centralized machine learning models [6–8]. We investigate the feasibility of applying
these attacks to FL systems utilizing complex deep neural network architectures. We present our FL poisoning attacks using the widely used Fashion-MNIST image classification dataset [9]. Our analysis reveals several noteworthy observations.
Firstly, we demonstrate that the effectiveness of the attacks, measured by the decrease in model utility, is contingent
upon the proportion of malicious users involved, with even a small percentage of malicious actors having a substan‑
tial impact. Secondly, we illustrate that these attacks can be tailored to target specific classes, resulting in significant
negative repercussions for the targeted subset while leaving other classes largely unaffected. This targeted approach is
advantageous for adversaries aiming to manipulate certain classes without triggering detection by compromising the
entire model. Additionally, we examine the influence of attack timing, considering whether poisoning occurs during
the early or late stages of FL training, and assess the impact of malicious participant availability. Our findings suggest
that the greatest impact of poisoning is achieved when malicious users participate in later rounds with high availability.
Furthermore, we observe that despite early-round poisoning, the global model can still converge accurately, indicating
the need for continued vigilance throughout the training process. These insights underscore the importance of strategic
planning and proactive defenses to mitigate the adverse effects of malicious activities in FL systems.
Given the substantial threat posed by poisoning attacks to FL systems, we propose a robust defense strategy aimed
at empowering FL aggregators to discern and neutralize malicious participants based on their model updates. Our
defense strategy capitalizes on the unique characteristics exhibited by updates sent from malicious participants, dis‑
tinguishing them from those of honest participants. By extracting pertinent parameters from the multi-dimensional
update vectors and employing PCA for dimensionality reduction, our defense strategy effectively mitigates the
risk posed by malicious actors. Empirical results obtained from experiments conducted on the Fashion-MNIST dataset [9],
covering a spectrum of malicious participant rates ranging from 2% to 20%, demonstrate the efficacy of our defense
approach. Specifically, our strategy enables the FL aggregator to achieve clear differentiation between updates origi‑
nating from malicious and honest participants, thereby facilitating the identification and subsequent exclusion of
malicious actors from the FL system. This proactive defense mechanism equips FL systems with enhanced resilience
against poisoning attacks, safeguarding the integrity and reliability of the collaborative learning process.
The rest of this paper is organized as follows. In Sect. 2, we present related work. In Sect. 3, we present the threat model
and adversary model. Section 4 demonstrates the effectiveness of FL poisoning attacks and analyzes their impact with
respect to malicious participant percentage, choice of classes under attack, attack timing, and malicious participant
availability. Our defense strategy is described and empirically demonstrated in Sect. 5. Section 6 concludes the paper.
2 Related work
The advent of FL has brought forth significant advancements in distributed machine learning, allowing multiple partici‑
pants to collaboratively train models while keeping their data localized [10]. This approach not only enhances privacy but
also enables the leveraging of diverse datasets across different devices and locations [4]. However, the distributed nature
of FL introduces vulnerabilities to adversarial threats, particularly data poisoning attacks, where malicious participants
manipulate their data or model updates to compromise the global model’s integrity [11, 12]. Data poisoning attacks are
recognized as a critical threat to the integrity of FL systems. Such attacks typically involve the injection of maliciously
crafted data or model updates, aiming to degrade the performance of the aggregated global model [5, 13]. These attacks
can be broadly categorized into two types: direct data poisoning and model update poisoning. In direct data poisoning, adversaries manipulate the training data on their devices [5], while in model update poisoning they tamper with the parameters submitted for aggregation [14].
Label flipping attacks continue to be a significant threat in Federated Learning (FL), where adversarial clients intention‑
ally manipulate class labels to mislead the global model. Recent studies [15] demonstrate how these attacks significantly
degrade model performance, particularly in heterogeneous and non-IID data environments, where inconsistencies in
client data amplify poisoning effects. Additionally, Distributed Backdoor Attacks (DBA) have emerged as a more coor‑
dinated form of poisoning, where multiple adversarial clients work together to introduce subtle backdoors into the
model [16]. These attacks pose a serious challenge, as they can remain undetected by standard anomaly detection
methods. To mitigate poisoning threats, various robust aggregation techniques, such as Krum, Trimmed Mean, and
Median-based filtering, have been widely adopted [16]. However, these approaches often fail against adaptive poison‑
ing attacks, where adversaries manipulate their updates to blend in with legitimate contributions. Recent research has
explored trust-based client selection and adaptive thresholding mechanisms, which assess client reliability based on
historical behavior to effectively detect and filter malicious updates [13]. Additionally, privacy-preserving solutions, such
as blockchain-enhanced secure aggregation, have been proposed to strengthen FL security [10].
To counteract these vulnerabilities, the research community has proposed various defense mechanisms. Robust aggre‑
gation algorithms, such as Krum [17] and its variants, aim to mitigate the impact of poisoned updates by identifying
and excluding outliers from the aggregation process. Although these methods have shown effectiveness against model
poisoning, they may not fully address the subtleties of data poisoning attacks.
Mei and Zhu describe error-generic poisoning attacks, which are designed to cause general misclassifications rather
than specific errors. The effectiveness of these attacks is gauged by the resulting drop in accuracy [18]. Common meth‑
ods include adding random noise or flipping labels randomly, which serve as foundational techniques for various other
attacks.
Suya et al. discuss targeted poisoning attacks, where the objective is to induce particular errors or misclassifications
[19]. The success of these attacks is measured by the rate at which they achieve the intended misclassifications during
model testing. A basic method for these attacks involves flipping labels according to a pre-set plan.
In the context of backdoor poisoning attacks, Bagdasaryan et al. explain that this involves manipulating a small seg‑
ment of the training data by inserting a specific pattern or perturbation and altering the label [4]. Once the model is
trained, this particular pattern or perturbation triggers the targeted misclassification.
Anomaly detection techniques have been explored to identify suspicious patterns in participants’ updates,
potentially indicative of poisoning attempts [20]. These approaches, ranging from statistical analyses to complex
machine learning models, seek to detect and mitigate the influence of anomalous contributions on the global
model. Furthermore, evaluating the trustworthiness of participants through behavioral analysis has emerged as
a complementary strategy. By assessing the reliability of contributions based on historical behavior, systems can
develop trust scores to help identify potential adversaries.
Numerous strategies have been devised to compromise machine learning models, spanning various models like
Support Vector Machines (SVM), regression, dimensionality reduction, linear classifiers, and neural networks [21–23].
However, most of this prior work focuses on attacking ML models in a conventional setting where data is collected
centrally [24]. In contrast, our study delves into attacks within the FL framework. Consequently, many existing
attack and defense mechanisms tailored for traditional ML are not directly applicable to FL. For instance, tactics
reliant on crafting optimal poison instances by analyzing the data distribution during training become irrelevant,
as malicious FL participants may only modify the data they possess [25, 26]. Similarly, server-side defenses, which
sift through and remove poison instances using anomaly detection or k-NN, are ineffective in FL since servers only
receive parameter updates from participants, not individual data points.
The growing adoption of FL has prompted investigations into various attack types within this framework, includ‑
ing backdoor attacks, gradient leakage attacks, and membership inference attacks. Our focus lies particularly on
poisoning attacks within FL, which can be categorized into data poisoning and model poisoning. Our work con‑
centrates on data poisoning, where a malicious participant manipulates their training data by adding or altering
instances, without directly interfering with the local learning process. Conversely, model poisoning involves altering
the learning process itself to generate adversarial gradients and updates. Some studies have demonstrated the
efficacy of both targeted and untargeted model poisoning attacks in significantly impacting model performance.
While model poisoning can be effective, data poisoning may offer advantages in certain scenarios, as it does not
require sophisticated manipulation of learning software on participant devices, it is resource-efficient, and it accom‑
modates non-expert attackers.
To improve the robustness of federated learning against model poisoning attacks, recent research has explored
contrastive learning-based defense mechanisms. The study in [27] presents a robust federated contrastive recom‑
mender system, leveraging contrastive loss to effectively distinguish malicious model updates from benign ones,
thereby mitigating the impact of poisoning. This approach aligns with existing robust aggregation techniques and
highlights the potential for integrating contrastive learning into federated learning security frameworks to enhance
resilience against adversarial manipulation.
Recent studies have highlighted the vulnerability of visually-aware federated recommender systems to adver‑
sarial manipulations, where poisoning attacks exploit feature dependencies in multimodal data to degrade model
performance. These attacks can significantly impact the personalization and recommendation accuracy of federated
systems by introducing maliciously crafted visual or textual data that skews model training. The work in [28] sys‑
tematically examines these threats, demonstrating how adversaries can manipulate latent feature representations
to mislead federated learning models. To mitigate such risks, the study proposes countermeasures that enhance
system resilience, including robust feature alignment, anomaly detection mechanisms, and adaptive model updates.
These approaches align with existing defense strategies in federated learning, reinforcing the need for adaptive
aggregation techniques and trust-based client selection to detect and mitigate poisoning attacks effectively.
Visually-aware federated recommender systems have been identified as vulnerable to adversarial manipulations,
where poisoning attacks exploit feature representations to degrade model performance. The work in [29] examines
these vulnerabilities and demonstrates how attackers can manipulate visual embeddings to mislead the federated
learning process. To address this issue, the study proposes countermeasures, including robust feature alignment
and anomaly detection mechanisms, to enhance system resilience.
Despite the progress made in defending against data poisoning attacks, several challenges persist. The dynamic
and heterogeneous nature of FL environments complicates the implementation of uniform defense mechanisms
across all participants. Additionally, there exists a delicate balance between enhancing security and preserving the
efficiency and privacy of the FL system [30]. Future research should focus on the development of adaptive, scalable,
and privacy-preserving defense mechanisms capable of countering sophisticated poisoning strategies.
3 Threat model and adversary model
Federated Learning, despite its decentralized nature and privacy-preserving benefits, is susceptible to various types of
attacks. Understanding the threat landscape is crucial to developing robust defense mechanisms. We explore different
aspects of the threat model, the objectives of adversaries, their constraints and capabilities, and specific attack strategies
such as label flipping. Additionally, we provide metrics for evaluating the effectiveness of attacks and defenses.
3.1 Threat model
We consider a scenario where a subset of participants in FL systems may be malicious or under the control of a malicious
adversary. The percentage of malicious participants among all participants is denoted as m%. Malicious participants
can infiltrate the system through various means, including the addition of adversary-controlled devices, compromising
benign participants’ devices, or incentivizing benign participants to engage in data poisoning activities. Throughout our
analysis, we assume the aggregator responsible for model aggregation is honest and not compromised.
3.2 Adversary objectives
Objective The adversary’s objective is to manipulate the learned parameters of the global model M such that it exhibits
high errors for particular classes, forming a subset C ′ ⊂ C . This constitutes a targeted poisoning attack, as opposed to
untargeted attacks that aim for indiscriminate high errors across all classes. Targeted attacks are preferred by adversar‑
ies as they minimize the chances of detection by focusing their influence on specific classes while avoiding significant
impacts on non-targeted classes.
Capability We consider a realistic adversary model with constraints. Each malicious participant can manipulate the
training data Di on their own device but cannot access or manipulate other participants’ data or the model learning pro‑
cess (e.g., Stochastic Gradient Descent (SGD) implementation, loss function, or server aggregation process). The attack
is not specific to the Deep Neural Network (DNN) architecture, loss function, or optimization function being used. While
it requires corrupting the training data, the learning algorithm remains unaltered.
Mathematical formulation Let M denote the global model trained over R rounds of FL. The objective of the adversary
can be mathematically expressed as follows:
minimize ∑_{c ∈ C′} error(M(x) ≠ c)    (1)
where error(M(x) ≠ c) denotes the error incurred when the global model M misclassified an instance x as not belonging
to class c.
Notation C ′ represents the subset of classes targeted by the adversary for high error rates.
Operational constraints Adversaries operate within realistic constraints where each malicious participant possesses
the capability to manipulate the training data Di on their respective device. However, they lack access to or control over
the data of other participants and are unable to influence the model learning process. This includes parameters such as
the SGD implementation, loss function, or server aggregation process. Moreover, the attack methodology is not tailored
to a specific DNN architecture, loss function, or optimization function. Instead, it relies on corrupting training data while
keeping the learning algorithm unchanged.
Limitations on capability Despite their malicious intent, adversaries are constrained to data manipulation within
their local environment. They lack knowledge about the global distribution of data, the specific DNN architecture in use,
or the internal mechanisms of the FL process beyond their individual participation. Consequently, the sophistication of
attacks is limited to data-level manipulations, preventing more advanced exploits targeting the model or training process.
Mathematical representation The constraints on adversary capability can be mathematically represented as follows:
Adversary capability = Data manipulation ⊆ Local environment. (2)
This equation underscores that the adversary’s capability is confined to manipulating data within their local environment,
without access to the broader FL system or global model parameters.
Label flipping attacks represent a potent strategy for implementing targeted data poisoning in FL systems. In such
attacks, malicious participants strategically manipulate the labels of their local datasets to influence the global
model’s training process. This manipulation involves changing the class labels of specific instances from their original
classes to designated target classes, thereby injecting bias into the training data. As shown in Fig. 1, a label flipping attack can significantly impact the accuracy of the global model. Mathematically, we can represent this process as follows: for a given source class csrc and a target class ctarget from the set of all classes C, each malicious participant Pi modifies their dataset Di as:
For x ∈ Di where class(x) = csrc, change class(x) to ctarget.
This attack can be denoted as csrc → ctarget. For instance, in an image classification scenario such as CIFAR-10, this could involve changing the class labels of images originally categorized as “airplane” (csrc) to “bird” (ctarget).
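To make the flipping rule concrete, the following minimal Python sketch applies csrc → ctarget to a participant's local label vector. The NumPy representation and the example class pair (0 → 6) are illustrative assumptions, not details taken from our experiments.

```python
import numpy as np

def flip_labels(labels: np.ndarray, c_src: int, c_target: int) -> np.ndarray:
    """Return a copy of `labels` with every instance of c_src relabelled as c_target,
    i.e. the rule: for x in D_i with class(x) = c_src, set class(x) = c_target."""
    poisoned = labels.copy()
    poisoned[poisoned == c_src] = c_target
    return poisoned

# Example: a malicious participant flipping class 0 into class 6 in its local data.
local_labels = np.random.randint(0, 10, size=1200)   # stand-in for a local label vector
poisoned_labels = flip_labels(local_labels, c_src=0, c_target=6)
```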
The label flipping and backdoor attack algorithms in federated learning involve adversarial clients manipulating
local training data or model updates to compromise the integrity of the global model. In the label flipping attack,
malicious clients alter class labels to mislead the learning process, while in the backdoor attack, specific trigger
patterns are embedded in the data to induce targeted misclassifications. The poisoned updates are then submit‑
ted to the server, where they are aggregated with benign updates, ultimately impacting model performance. The
pseudo-code (Algorithm 1) provides a structured breakdown of these attacks, detailing their execution at both the
client and server levels.
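As a hedged illustration of this client/server interplay (a simplified sketch, not a reproduction of the paper's pseudo-code), the snippet below runs one FL round in which a subset of clients flips labels before local training and the server applies plain FedAvg. The linear model, toy data, number of clients, and flip pair are placeholder assumptions.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def local_update(global_model, xs, ys, malicious, c_src=0, c_target=6, epochs=1, lr=0.01):
    """One client's local training step; malicious clients poison only their labels."""
    model = copy.deepcopy(global_model)
    if malicious:                      # data-level poisoning: labels are flipped,
        ys = ys.clone()                # the learning procedure itself is untouched
        ys[ys == c_src] = c_target
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(model(xs), ys).backward()
        opt.step()
    return model.state_dict()

def fed_avg(updates):
    """Plain FedAvg: average each parameter tensor across the received client updates."""
    avg = copy.deepcopy(updates[0])
    for key in avg:
        avg[key] = torch.stack([u[key].float() for u in updates]).mean(dim=0)
    return avg

# Toy round: 10 clients, the first 2 malicious, trained on random stand-in data.
global_model = nn.Linear(784, 10)
updates = []
for client in range(10):
    xs, ys = torch.randn(64, 784), torch.randint(0, 10, (64,))
    updates.append(local_update(global_model, xs, ys, malicious=client < 2))
global_model.load_state_dict(fed_avg(updates))
```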
This type of attack, while well-known in centralized machine learning settings, is particularly suited for FL environ‑
ments due to its effectiveness, efficiency, and ease of execution. Unlike other poisoning strategies, label flipping attacks
do not require knowledge of the global data distribution, the specifics of the deep neural network architecture, or the
optimization functions being employed. Furthermore, they can be carried out with minimal computational resources and
without the need for sophisticated expertise, making them attractive for adversaries seeking to undermine FL systems.
3.5 Evaluation metrics
At the conclusion of R rounds of FL, the model M is finalized with parameters θR. Let Dtest represent the test dataset used to evaluate M, where Dtest ∩ Di = ∅ for all participant datasets Di. We employ various evaluation metrics to assess the impact of label flipping attacks in FL:
Global model accuracy (Macc): This metric quantifies the percentage of instances x in Dtest that are correctly classified by the global model M with final parameters θR. Mathematically, it is calculated as:
Macc = (Number of correctly predicted instances / Total number of instances) × 100%.
Class recall (crecall_i): For any class ci in the set C, its class recall measures the proportion of instances whose true label is ci that the model MθR correctly classifies as ci. The formula for class recall is:
crecall_i = TPi / (TPi + FNi) × 100%.
Here, TPi represents the number of instances of class ci correctly predicted as ci, and FNi denotes the number of instances of class ci incorrectly predicted as some other class.
Baseline misclassification count (mcntij ): Suppose MNP is a global model trained for R rounds using FL without any
malicious attacks. For classes ci and cj , the baseline misclassification count from ci to cj , denoted mcntij , signifies the number
of instances x in Dtest where MNP (x) = cj while the true class of x is ci.
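The metrics above translate directly into code. The sketch below computes Macc, per-class recall, and the baseline misclassification count mcntij from arrays of true and predicted labels; for mcntij the predictions would come from the non-poisoned model MNP.

```python
import numpy as np

def global_accuracy(y_true, y_pred):
    """M_acc: percentage of test instances that are classified correctly."""
    return 100.0 * np.mean(np.asarray(y_true) == np.asarray(y_pred))

def class_recall(y_true, y_pred, c):
    """c_recall for class c: TP / (TP + FN) * 100, with FN counting instances of
    class c that are predicted as some other class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == c) & (y_pred == c))
    fn = np.sum((y_true == c) & (y_pred != c))
    return 100.0 * tp / (tp + fn) if (tp + fn) else 0.0

def misclassification_count(y_true, y_pred, c_i, c_j):
    """mcnt_ij: number of instances whose true class is c_i but are predicted as c_j."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return int(np.sum((y_true == c_i) & (y_pred == c_j)))
```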
4 Label flipping attacks in federated learning
In this section, we delve into the susceptibility of FL systems to label flipping attacks. Label flipping attacks involve maliciously altering the labels of training data to mislead the learning process and degrade model performance. This section presents a comprehensive investigation into how such attacks can be executed, their impact, and the conditions under which they are most effective. The following subsections outline our experimental setup, methods for generating label flipping attacks, simulation of these attacks within an FL environment, and an analysis of the timing of such attacks relative to key points in the learning process.
4.1 Experimental setup
Our experimentation employed the well-known Fashion-MNIST image classification dataset [9]. The dataset consists of 70,000 28×28 grayscale images of fashion products categorized into 10 classes, with 7000 images per class. The
dataset is divided into a training set of 60,000 images and a test set of 10,000 images, mirroring the structure of the
original MNIST dataset. In the experiment, our approach involved employing a two-layer convolutional neural network
coupled with batch normalization. This architecture attained a notable test accuracy of 91.75% under the centralized
scenario without any presence of poisoning. Further elucidation on the specific architectures and detailed experimental
methodologies will be provided in subsequent sections. The implementation of FL is carried out in Python, employing
the PyTorch library [31]. By default, our FL setup comprises 50 participants, one central aggregator, and a parameter k
set to 5. We adopt an independent and identically distributed (iid) data distribution approach, whereby the entire train‑
ing dataset is randomly and uniformly allocated among all participants. Each participant is assigned a distinct subset
of the training data for their individual training process. It’s noteworthy that the testing data is exclusively utilized for
evaluating the model’s performance and is not included in any participant’s training dataset. Furthermore, we observe
that the DNN model converges within fewer than 20 iterations.
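For readers who wish to reproduce a comparable setup, the sketch below shows a two-layer CNN with batch normalization and an iid partition of the 60,000 training images across 50 participants. The exact layer widths, kernel sizes, and pooling choices are assumptions for illustration rather than the precise architecture used in our experiments.

```python
import torch
import torch.nn as nn

class TwoLayerCNN(nn.Module):
    """A small two-convolutional-layer network with batch normalization for 28x28 inputs."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32),
            nn.ReLU(), nn.MaxPool2d(2),                  # 28x28 -> 14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64),
            nn.ReLU(), nn.MaxPool2d(2),                  # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(64 * 7 * 7, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def iid_partition(num_samples, num_participants, seed=0):
    """Randomly and uniformly split sample indices across participants (iid setting)."""
    generator = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_samples, generator=generator)
    return [chunk.tolist() for chunk in perm.chunk(num_participants)]

participant_indices = iid_partition(60_000, 50)   # 50 FL participants, as in our setup
```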
To simulate the label flipping attack within our FL framework, we follow a structured procedure. At the onset of each experiment, we designate a specific subset of participants as malicious, constituting m% of the total participant
pool. The remaining participants are categorized as honest contributors. To ensure robustness against the potential vari‑
ability introduced by the random selection of malicious participants, we repeat each experiment 10 times and compute
the average results. By default, we set the malicious participant percentage to m = 10%, unless specified otherwise.
We explore three distinct attack scenarios, each characterized by unique source and target class pairings. These
scenarios are carefully selected to represent diverse conditions for adversarial attacks. The evaluated pairings include
scenarios where certain source classes are frequently or infrequently misclassified as target classes during non-
poisoned federated training.
Mathematically, the label flipping process can be succinctly represented. Let N denote the total number of participants,
and m represent the percentage of malicious participants. Accordingly, the number of malicious participants, denoted as M, is calculated as M = N × (m/100). The remaining participants, totaling N − M, are considered honest contributors.
This systematic approach allows for a thorough investigation of label flipping attacks in FL systems, providing
insights into their effectiveness under various conditions and facilitating the assessment of their impact on model
integrity and performance.
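The designation of malicious participants described above can be sketched as follows; the use of Python's random module and a fixed seed per repetition are implementation assumptions.

```python
import random

def designate_malicious(num_participants=50, m_percent=10, seed=None):
    """Randomly mark M = N * m / 100 participants as malicious; the rest are honest."""
    rng = random.Random(seed)
    num_malicious = round(num_participants * m_percent / 100)
    malicious_ids = set(rng.sample(range(num_participants), num_malicious))
    return [i in malicious_ids for i in range(num_participants)]

# Repeat the random designation over 10 trials and average results across them.
flags_per_trial = [designate_malicious(seed=trial) for trial in range(10)]
```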
Figure 2 illustrates the impact of data poisoning in a federated learning environment by plotting the source class recall
against the count of malicious clients. The x-axis enumerates the absolute number of malicious participants involved
in the poisoning attack, while implicitly referring to a total of 25 clients participating in the learning process. These
numbers correspond to ratios of malicious clients, with markers at intervals indicating 3/25, 6/25, 9/25, and so on. The y-axis
represents the source recall metric, which gauges the ability of the federated model to correctly classify instances of
the source class targeted by the attack.
The observed data, indicated by black bars, suggests a robustness of the model to a small number of adversarial
clients, maintaining a high source recall as the count of malicious clients increases up to 6 out of 25. Beyond this
point, the recall demonstrates a marked decline, indicating that the federated model’s capacity to accurately clas‑
sify instances of the source class degrades as the number of malicious clients grows. Notably, when the number of
attackers reaches 15 of the 25 total clients (15/25), the recall drops significantly, approaching zero. This steep reduction
in recall underscores the critical level of vulnerability in federated learning systems to even a minority of malicious
actors, hence highlighting the need for defensive strategies to detect and neutralize such adversarial behaviors to
maintain the integrity and reliability of the federated learning process.
Figure 3 showcases a comparative analysis of the global model’s accuracy and the source class recall as a function of
the number of malicious participants in a federated learning system. Here, the x-axis quantifies the count of malicious
clients, denoting an incremental increase in the scale of the attack, while maintaining a static total of 25 clients. The
y-axis on the left measures the model accuracy ( Macc ), represented by the blue line with circle markers, whereas the
y-axis on the right, represented by the red line with ’x’ markers, measures the recall for the source class ( crecall_src ) that
the adversarial participants target.
The blue and red lines in the graph illustrate the trends in performance degradation as the number of malicious actors
increases. The data shows that both overall model accuracy and source class recall remain relatively stable until the num‑
ber of malicious clients reaches 6. Beyond this threshold, a clear decline is observed, with a significant performance drop
as the number of malicious clients approaches 15. The graph provides crucial insights into the resilience of federated
learning systems, highlighting a critical tipping point beyond which system performance deteriorates substantially. This
finding underscores the urgent need for robust detection and mitigation strategies to protect the integrity of federated
learning systems against data poisoning attacks.
Figure 4 illustrates the relationship between model accuracy and source precision as the number of malicious clients
increases in a federated learning system. The x-axis designates the absolute number of malicious clients, while the y-axis
is bifurcated to represent both model accuracy ( Macc ) and source class precision (cprecision_src ), depicted by the blue and
red lines, respectively.
Initially, the graph indicates a stable performance of the federated learning model with high accuracy and source
precision as the number of malicious clients ranges from 0 to 6. This stability suggests a degree of inherent robustness
to small-scale data poisoning attacks. However, as the count of malicious actors surpasses 6, a decoupling of the two
metrics is observed. The model accuracy, indicated by the blue line with circle markers, begins a modest decline, signify‑
ing a gradual degradation of overall model performance. In contrast, the source precision, shown by the red line with ’x’
markers, exhibits a more pronounced plummet, indicating that the precision with which the model identifies instances
of the attacked source class is substantially more sensitive to the presence of malicious clients.
At the extremity, when the malicious clients constitute the majority (15 out of 25), the source precision approaches
zero, while the model accuracy also falls but to a lesser extent, settling above 85%. This asymmetry highlights the critical
impact that malicious participants exert specifically on the compromised source class, while also affecting the general
model accuracy to a certain degree. It emphasizes the need for specialized defensive measures in federated learning
systems that not only maintain overall model integrity but also protect against targeted attacks that seek to undermine
specific model predictions.
The timing of such attacks is critical in understanding how and when the introduction of malicious clients affects the over‑
all system performance. We delineate our analysis into two distinct phases: before and after a defined performance break
point. This segmentation allows for a detailed exploration of the system’s vulnerability at different stages of the attack.
Figure 5 illustrates the class recall over communication rounds before reaching the break point. Here, we compare the
performance of the federated learning system under non-poisoned and poisoned conditions. Initially, both the non-
poisoned and poisoned models exhibit high class recall, with values close to 1.0. This stability persists through approxi‑
mately 35 communication rounds. At the 36th communication round, denoted by the green dotted line marking the
break point, there is a dramatic decline in the class recall for the poisoned model, dropping sharply to almost 0.5. This
significant drop indicates the point at which the influence of malicious clients becomes substantially detrimental to the
system’s performance. After this drop, the poisoned model shows considerable fluctuation, with class recall values rising
and falling but generally remaining below the performance of the non-poisoned model. This phase before the break
point underscores the initial resilience of the federated learning system. It demonstrates that the system can maintain
high accuracy and class recall despite the presence of some malicious actors, up to a critical threshold.
In Fig. 6, the second graph presents the class recall over communication rounds after surpassing the break point. In this
scenario, the break point is indicated at the 25th communication round by the green dotted line. Before reaching this
break point, the poisoned model’s class recall shows considerable volatility, with values fluctuating between 0.0 and 0.8,
reflecting the immediate impact of malicious data on the system. Upon reaching and surpassing the 25th communica‑
tion round, the class recall for the poisoned model exhibits a rapid increase, stabilizing near 1.0, which is comparable to
the non-poisoned model. This stabilization indicates that the federated learning system, once past the initial disruption
phase caused by the label flipping attack, can recover and maintain high class recall.
The behavior after the break point suggests that while the system initially suffers from the introduction of mali‑
cious clients, it has mechanisms or adaptations that allow it to regain performance over time. This phase highlights the
importance of continued communication rounds in mitigating the impact of data poisoning and achieving resilience
in federated learning systems.
5 Defense strategy
The defense method described here utilizes a server-side mechanism to protect a federated learning model from both malicious clients and clients with poor data quality. This is achieved through a process of weight validation before aggregation.
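A minimal sketch of this weight-validation step is given below, assuming the server holds a small validation set and applies the source-class recall threshold (0.5) used by our algorithm; the fallback to all updates when every contribution is flagged is an additional assumption to keep training from stalling.

```python
import copy
import torch

@torch.no_grad()
def source_class_recall(model, val_xs, val_ys, source_class):
    """Recall of the source class on a server-held validation set."""
    preds = model(val_xs).argmax(dim=1)
    mask = val_ys == source_class
    return (preds[mask] == source_class).float().mean().item() if mask.any() else 1.0

def validate_and_aggregate(global_model, client_updates, val_xs, val_ys,
                           source_class, recall_threshold=0.5):
    """Load each client update into a probe model, keep only updates whose
    source-class recall stays above the threshold, then average the survivors."""
    accepted = []
    for update in client_updates:
        probe = copy.deepcopy(global_model)
        probe.load_state_dict(update)
        if source_class_recall(probe, val_xs, val_ys, source_class) >= recall_threshold:
            accepted.append(update)
    chosen = accepted or client_updates       # fallback: never aggregate an empty set
    avg = copy.deepcopy(chosen[0])
    for key in avg:
        avg[key] = torch.stack([u[key].float() for u in chosen]).mean(dim=0)
    return avg
```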
Figure 7 illustrates the distribution of gradients collected from both malicious and honest participants across various pro‑
portions of malicious clients. This analysis underscores the effectiveness of a server-side defense mechanism designed to
safeguard federated learning models from both malicious clients and clients with poor data quality. Subplots (7a) through (7e) each depict the gradient distributions for a different proportion of malicious clients. Blue Xs represent gradients from malicious participants, while yellow Os represent gradients from honest participants. Subplot (7a), where 60% of the clients are malicious, shows a high concentration of blue Xs, indicating a significant presence of malicious gradients. In subplot (7b), with 40% of the clients malicious, there remains a considerable number of blue Xs, though less concentrated than in (7a). Subplot (7c), featuring 30% malicious clients, shows a more balanced distribution between honest and malicious gradients, with a further decrease in blue Xs. In subplot (7d), with 20% malicious clients, the blue Xs are fewer and more dispersed among the yellow Os. Finally, subplot (7e), where only 5% of the clients are malicious, has sparse blue Xs, indicating minimal impact from malicious gradients.
The subplots collectively demonstrate how the proportion of malicious clients influences the distribution of gra‑
dients. As the percentage of malicious clients decreases from (7a) through (7e), the density of malicious gradients
(blue Xs) diminishes. This trend indicates a reduction in the potential impact of poisoned weights on the federated
learning model. The defense mechanism effectively filters out these malicious gradients by evaluating and flagging
them based on the recall threshold, thereby maintaining the integrity of the global model. This analysis confirms
the robustness of the proposed defense mechanism in protecting federated learning systems from data poisoning
attacks.
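The separation visualized in Fig. 7 can be sketched as follows: each client's update is flattened into a vector (in practice, only the parameters tied to the source class need be extracted), projected to two dimensions with PCA, and the smaller of two clusters is flagged as suspicious. The clustering step and the minority assumption are our own simplifications of how the separation could be automated, and the toy data is synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def flag_suspicious_updates(update_vectors):
    """update_vectors: array of shape (num_clients, num_params).
    Returns boolean flags for suspected clients and their 2-D PCA coordinates."""
    projected = PCA(n_components=2).fit_transform(update_vectors)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(projected)
    minority = np.argmin(np.bincount(labels))   # assume attackers form the smaller cluster
    return labels == minority, projected         # (may not hold at very high attack rates)

# Toy example: 20 honest updates around one mode, 5 poisoned updates shifted away.
rng = np.random.default_rng(0)
honest = rng.normal(0.0, 0.1, size=(20, 128))
poisoned = rng.normal(1.0, 0.1, size=(5, 128))
flags, coords = flag_suspicious_updates(np.vstack([honest, poisoned]))
```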
Algorithm 1 forms the core of our defense strategy, utilizing a recall threshold (e.g., 0.5) to classify malicious updates,
balancing detection accuracy while minimizing false positives. This threshold was selected based on empirical observa‑
tions and prior research, though its effectiveness may vary across datasets, requiring adaptive tuning. The computational
complexity of PCA-based dimensionality reduction increases with the number of clients and update dimensions, making
scalability a concern. To address this, incremental PCA and feature selection techniques can optimize performance for
large-scale FL deployments. Additionally, adversaries could adapt their attack strategies to evade detection by mimicking
benign distributions. To counter this, dynamic thresholding and ensemble anomaly detection could enhance robust‑
ness. Finally, deploying this mechanism in resource-constrained environments (e.g., IoT, healthcare FL systems) requires
efficiency optimizations such as low-rank approximations and lightweight feature extraction, ensuring feasibility without
significant computational overhead.
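The incremental PCA option mentioned above could look like the following sketch, which fits the projection over mini-batches of client update vectors instead of materializing all of them at once; the batch size and component count are illustrative.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

def project_updates_incrementally(update_batches, n_components=2):
    """update_batches: iterable of arrays, each shaped (batch_clients, num_params)."""
    batches = list(update_batches)
    ipca = IncrementalPCA(n_components=n_components)
    for batch in batches:                        # first pass: fit the projection in chunks
        ipca.partial_fit(batch)
    return np.vstack([ipca.transform(batch) for batch in batches])

rng = np.random.default_rng(1)
batches = [rng.normal(size=(50, 1024)) for _ in range(4)]   # 200 clients in 4 batches
low_dim = project_updates_incrementally(batches)            # shape (200, 2)
```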
Figure 8 exhibits the model accuracy in a federated learning scenario post-implementation of a defensive weight
validation strategy. The y-axis denotes the model accuracy, and the x-axis enumerates the number of malicious clients
out of a fixed set, indicating the scale of adversarial presence within the system. Even with the increment in malicious
clients from 10 to 60, the model accuracy remains substantially stable, predominantly above the 0.90 threshold. This
consistent performance illustrates the robustness of the defense method, which effectively identifies and discards mali‑
cious gradients during the weight aggregation phase. By only incorporating updates from clients whose contributions
do not significantly diverge from expected accuracy benchmarks, the method ensures the integrity and trustworthiness
of the aggregated global model. The graph serves as a testament to the defense method’s effectiveness in protecting
the model against a substantial influx of adversarial interventions, maintaining high model accuracy irrespective of the
adversarial scale. Consequently, this defense mechanism demonstrates its potential as a viable solution for enhancing
the resilience of federated learning systems against data poisoning attacks.
Figure 9 presents the source class recall of a federated learning model when a defensive weight validation approach
is implemented. The x-axis categorizes the number of malicious clients from 10 to 60. The y-axis measures the source class
recall, demonstrating the model’s ability to correctly identify instances of a specific class, with a value ranging from
0 to 1. Post-defense, the recall across various client counts remains consistently high, close to 1. This suggests that
the defense method is effectively filtering out the malicious gradients that would otherwise compromise the model’s
performance. Unlike traditional methods where all client updates are aggregated, this selective aggregation ensures
that only contributions from clients with acceptable recall are considered. As a result, the integrity of the global model
is preserved, indicated by the maintained high recall rates regardless of the number of malicious clients. The graph
demonstrates the robustness of the defense strategy, highlighting its ability to withstand a considerable proportion
of malicious clients without a significant drop in recall, thereby evidencing the method’s efficacy in sustaining model
reliability in the face of adversarial attempts to degrade performance.
The observed stabilization of class recall after the break point is primarily due to the dominance of honest cli‑
ents’ updates over multiple training rounds and the model’s adaptive learning process, which gradually mitigates
the effects of poisoned updates. As training progresses, incorrect gradients from malicious clients become diluted,
allowing the model to reinforce correct patterns. This trend suggests that adaptive defense mechanisms, such as
weighted aggregation and client trust models, can leverage this recovery behavior to enhance federated learning
resilience. Furthermore, analysis across different datasets indicates that lower-dimensional datasets recover faster
due to simpler decision boundaries, while higher-dimensional datasets require more training rounds to correct adver‑
sarial influence. These findings highlight the significance of extended training and adaptive aggregation strategies
in improving model robustness against targeted data poisoning attacks.
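As one sketch of the adaptive, trust-based direction suggested above (not the defense evaluated in this paper), the snippet below maintains per-client trust scores that rise when updates pass validation and fall when they are flagged, and aggregates updates weighted by normalized trust; the reward and penalty values are arbitrary assumptions.

```python
import torch

def update_trust(trust, flagged, reward=0.1, penalty=0.3, floor=0.0, ceil=1.0):
    """trust: dict client_id -> score in [0, 1]; flagged: dict client_id -> bool."""
    for cid, is_bad in flagged.items():
        delta = -penalty if is_bad else reward
        trust[cid] = min(ceil, max(floor, trust.get(cid, 0.5) + delta))
    return trust

def trust_weighted_average(updates, trust):
    """updates: dict client_id -> state_dict; aggregation weights are normalized trust."""
    weights = torch.tensor([trust[cid] for cid in updates], dtype=torch.float)
    weights = weights / weights.sum().clamp(min=1e-8)
    avg = {}
    for key in next(iter(updates.values())).keys():
        stacked = torch.stack([sd[key].float() for sd in updates.values()])
        avg[key] = (weights.view(-1, *([1] * (stacked.dim() - 1))) * stacked).sum(dim=0)
    return avg
```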
6 Conclusion
This paper presents a comprehensive investigation into data poisoning attacks on Federated Learning systems, with
a particular focus on the vulnerability of these frameworks to label flipping attacks. Our study underscores the sig‑
nificant threat posed by such attacks, demonstrating their capacity to severely undermine the integrity of the global
model. We have empirically shown that as the proportion of malicious participants increases, the detrimental impact
on the global model escalates, highlighting the feasibility of achieving targeted poisoning effects. Furthermore, our
findings reveal that adversaries can amplify the effectiveness of their attacks by strategically manipulating the avail‑
ability of malicious participants during later rounds of training.
To address these security challenges, we proposed a robust defense mechanism aimed at assisting FL aggregators
in distinguishing between malicious and honest participants. Our defense strategy, which exhibits resilience against
gradient drift, effectively identifies and isolates anomalous gradients indicative of malicious behavior. This approach
significantly mitigates the risk of data poisoning attacks, thereby enhancing the security and reliability of FL systems.
Our research provides valuable insights into the inherent vulnerabilities of FL systems and introduces a practical
solution for safeguarding against sophisticated poisoning strategies. Future research will aim to extend this defense
mechanism to counter other types of FL attacks, including model poisoning and backdoor attacks, thereby further
strengthening the security of federated learning environments. Additionally, exploring the scalability and adapt‑
ability of our proposed defense mechanism across diverse FL configurations and datasets remains an important
direction for future work.
Author contributions Ansam Khraisat led the conceptualization, methodology, and formal analysis of the study. Ammar Alazab supervised
the research, managed project administration, and contributed to manuscript revision. Moutaz Alazab provided validation, resources, and
secured funding for the research. Tony Jan contributed to drafting the original manuscript and played a key role in reviewing and refining the
final version. Sarabjot Singh conducted investigations, and implemented software components. Md. Ashraf Uddin contributed to manuscript
revision and refinements. All authors have reviewed and approved the final version of the manuscript.
Data availability The datasets generated during and/or analyzed during the current study are available from the corresponding author on
reasonable request.
Declarations
Competing interests The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which
permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to
the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You
do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party
material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If
material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds
the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Agrawal S, Sarkar S, Aouedi O, Yenduri G, Piamrat K, Alazab M, Bhattacharya S, Maddikunta PKR, Gadekallu TR. Federated learning for
intrusion detection system: concepts, challenges and future directions. Comput Commun. 2022;195:346–61.
2. Nguyen TD, Rieger P, Miettinen M, Sadeghi A-R. Poisoning attacks on federated learning-based iot intrusion detection system. In: Proc.
workshop decentralized IoT syst. secur. (DISS), 2020. pp. 1–7.
3. Xia G, Chen J, Yu C, Ma J. Poisoning attacks in federated learning: a survey. IEEE Access. 2023;11:10708–22.
4. Bagdasaryan E, Veit A, Hua Y, Estrin D, Shmatikov V. How to backdoor federated learning. In: International conference on artificial intel‑
ligence and statistics. Cambridge: PMLR; 2020. pp. 2938–48.
5. Tolpegin V, Truex S, Gursoy ME, Liu L. Data poisoning attacks against federated learning systems. In: Computer security–ESORICS 2020:
25th European symposium on research in computer security, ESORICS 2020, Guildford, UK, September 14–18, 2020, Proceedings, Part I
25. Berlin: Springer; 2020. pp. 480–501.
6. Biggio B, Nelson B, Laskov P. Support vector machines under adversarial label noise. In: Asian conference on machine learning. 2011. pp.
97–112.
7. Steinhardt J, Koh PWW, Liang PS. Certified defenses for data poisoning attacks. In: NeurIPS. 2017. pp. 3517–29.
8. Xiao H, Xiao H, Eckert C. Adversarial label flips attack on support vector machines. In: ECAI. 2012. pp. 870–5.
9. Xiao H, Rasul K, Vollgraf R. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747. 2017.
10. Alazab A, Khraisat A, Singh S, Jan T. Enhancing privacy-preserving intrusion detection through federated learning. Electronics.
2023;12(16):3382.
11. Shejwalkar V, Houmansadr A, Kairouz P, Ramage D. Back to the drawing board: a critical evaluation of poisoning attacks on production
federated learning. In: 2022 IEEE symposium on security and privacy (SP). New York: IEEE; 2022. pp. 1354–71.
12. Cao D, Chang S, Lin Z, Liu G, Sun D. Understanding distributed poisoning attack in federated learning. In: 2019 IEEE 25th International
conference on parallel and distributed systems (ICPADS). New York: IEEE; 2019. pp. 233–9.
13. Khraisat A, Alazab A, Singh S, Jan T Jr, Gomez A. Survey on federated learning for intrusion detection system: concept, architectures,
aggregation strategies, challenges, and future directions. ACM Comput Surv. 2024;57(1):1–38.
14. Fang M, Cao X, Jia J, Gong N. Local model poisoning attacks to {Byzantine-Robust} federated learning. In: 29th USENIX security symposium
(USENIX Security 20). 2020. pp. 1605–22.
15. Guo Z, Zhang Y, Zhang Z, Xu Z, King I. Fedhlt: Efficient federated low-rank adaption with hierarchical language tree for multilingual
modeling. In: Companion proceedings of the ACM on web conference 2024. 2024. pp. 1558–67.
16. Wan Y, Qu Y, Ni W, Xiang Y, Gao L, Hossain E. Data and model poisoning backdoor attacks on wireless federated learning, and the defense
mechanisms: a comprehensive survey. IEEE Commun Surv Tutor. 2024;26(3):1861–97.
17. Blanchard P, El Mhamdi EM, Guerraoui R, Stainer J. Machine learning with adversaries: byzantine tolerant gradient descent. In: Advances
in neural information processing systems, vol. 30. 2017.
18. Mei S, Zhu X. Using machine teaching to identify optimal training-set attacks on machine learners. In: Proceedings of the AAAI conference
on artificial intelligence, vol. 29. 2015.
19. Suya F, Mahloujifar S, Suri A, Evans D, Tian Y. Model-targeted poisoning attacks with provable convergence. In: International conference
on machine learning. Cambridge: PMLR; 2021. pp. 10000–10.
20. Kairouz P, McMahan HB, Avent B, Bellet A, Bennis M, Bhagoji AN, Bonawitz K, Charles Z, Cormode G, Cummings R. Advances and open
problems in federated learning. Found Trends Mach Learn. 2021;14(1–2):1–210.
21. Lin J, Dang L, Rahouti M, Xiong K. Ml attack models: adversarial attacks and data poisoning attacks. arXiv preprint arXiv:2112.02797. 2021.
22. Papernot N, McDaniel P, Goodfellow I, Jha S, Celik ZB, Swami A. Practical black-box attacks against machine learning. In: Proceedings of
the 2017 ACM on Asia conference on computer and communications security. 2017. pp. 506–19.
23. Namiot D. Introduction to data poison attacks on machine learning models. Int J Open Inf Technol. 2023;11(3):58–68.
24. Khraisat A, Alazab A. A critical review of intrusion detection systems in the internet of things: techniques, deployment strategy, validation
strategy, attacks, public datasets and challenges. Cybersecurity. 2021;4:1–27.
25. Cao X, Jia J, Zhang Z, Gong NZ. Fedrecover: recovering from poisoning attacks in federated learning using historical information. In: 2023
IEEE symposium on security and privacy (SP). New York: IEEE; 2023. pp. 1366–83.
26. Kasyap H, Tripathy S. Beyond data poisoning in federated learning. Expert Syst Appl. 2024;235: 121192.
27. Yuan W, Yang C, Qu L, Ye G, Nguyen QVH, Yin H. Robust federated contrastive recommender system against model poisoning attack. arXiv
preprint arXiv:2403.20107. 2024.
28. Nguyen TT, Hung Quoc Viet N, Nguyen TT, Huynh TT, Nguyen TT, Weidlich M, Yin H. Manipulating recommender systems: a survey of
poisoning attacks and countermeasures. ACM Comput Surv. 2024;57(1):1–39.
29. Yang S, Wang C, Xu X, Zhu L, Yao L. Attacking visually-aware recommender systems with transferable and imperceptible adversarial styles.
In: Proceedings of the 33rd ACM international conference on information and knowledge managements. 2024. pp. 2900–9.
30. Lyu L, Yu J, Nandakumar K, Li Y, Ma X, Jin J, Yu H, Ng KS. Towards fair and privacy-preserving federated deep models. IEEE Trans Parallel
Distrib Syst. 2020;31(11):2524–41.
31. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L. Pytorch: an imperative style, high-
performance deep learning library. In: NeurIPS. 2019. pp. 8024–35.
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.