
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING
SECOND CYCLE, 60 CREDITS

Adversarial Attacks in Federated Learning

MATTEO DEMARTIS

Master's Programme, Machine Learning, 120 credits
Date: March 5, 2022
Supervisors: Jalil Taghia, Antoine Honoré
Examiner: Saikat Chatterjee
School of Electrical Engineering and Computer Science
Host company: Ericsson
Stockholm, Sweden 2021
© 2022 Matteo Demartis

Abstract
Machine learning requires diverse training datasets to perform well, but sharing datasets across countries and companies often raises legal and privacy issues. Federated learning is a learning framework in which the datasets are distributed across different agents, allowing training on complex datasets without centralizing the data. Each agent trains locally on its own dataset and only the model updates are centralized. Even though this process might seem secure, the fact that every single agent can manipulate its update arbitrarily raises many security concerns. The current literature highlights possible ways of attacking the final model. This thesis focuses mainly on targeted attacks, where the goal is to misclassify a selected label in the dataset. The principal attack and defense strategies are examined, and finally a new attack based on generative adversarial networks (GANs) is proposed. The GAN attack can break through the server's defense mechanisms, showing the problems that can arise when these are deployed. Finally, possible ideas to increase the security of federated learning are examined and discussed.

Sammanfattning (Swedish abstract)
Machine learning requires diverse training datasets to perform well. Sharing datasets is often a legal and privacy issue between countries and companies. Federated learning is a learning framework in which the datasets are distributed across different agents, allowing training on complex datasets without centralizing the data. Each agent performs local training on its own dataset and the model updates are centralized. Even though this process may seem secure, the fact that every single agent can manipulate its update arbitrarily raises many security concerns. The current literature highlights possible ways of attacking the final model. This thesis focuses mainly on targeted attacks, where the goal is to misclassify a selected label in the dataset. The main attack and defense strategies are examined, and finally a new attack based on generative adversarial networks (GANs) is proposed. The GAN attack can break through the server's defense mechanisms, showing the problems that can arise when these are implemented. Finally, possible ideas for increasing the security of federated learning are examined and discussed.

Acknowledgments
I would like to particularly thank my Ericsson supervisor Jalil, who guided and supported me throughout this project. I also want to thank Ericsson for giving me the opportunity to work on this project with them. A special thanks goes to my family and parents, who have always supported me in my choices. I would also like to thank all my friends who were close to me during these past months, as well as my KTH supervisor Antoine Honoré and my examiner Saikat Chatterjee.
Stockholm, September 2021
Matteo Demartis

Contents

1 Introduction
  1.1 Background and Problem
  1.2 Purpose
  1.3 Goals
  1.4 Sustainability and Ethics
  1.5 Structure of the thesis

2 Background and Related Work
  2.1 Federated Learning
    2.1.1 Federated Averaging
  2.2 Adversarial Attacks
  2.3 Generative Adversarial Networks
  2.4 Gradient Leakage Techniques
  2.5 Malware detection based on GAN
  2.6 Secure Aggregation

3 Methods
  3.1 Defense strategies
    3.1.1 Accuracy Checking
    3.1.2 Weight Statistics
    3.1.3 GAN defense
  3.2 Attack strategies
    3.2.1 Explicit boosting
    3.2.2 Proposed GAN attack
  3.3 Data Collection

4 Results and Analysis
  4.1 Vanilla Federated Learning
  4.2 Explicit boosting
  4.3 Weight Statistics
  4.4 GAN attack
    4.4.1 GAN attack + Weight Update Statistics
    4.4.2 Proposed defense against GAN attack

5 Conclusions and Future work
  5.1 Discussion and Conclusions
  5.2 Future work

References

List of Figures

2.1 The overview of our DLG algorithm. Variables to be updated are marked with a bold border. While normal participants calculate ∇W to update parameter using its private training data, the malicious attacker updates its dummy inputs and labels to minimize the gradients distance. When the optimization finishes, the evil user can obtain the training set from honest participants. [1]
2.2 MalGAN architecture [2]
3.1 GAN scheme. The figure shows the data flow between generator and discriminator. The discriminator is trained to distinguish between benign and malicious updates. The generator goal is to generate updates that are malicious but look as benign (the discriminator classifies them as benign).
3.2 Proposed attack sequence graph.
4.1 Vanilla Federated Learning Model
4.2 Explicit boosting with boosting factor 4
4.3 Explicit boosting with weight statistics (τ = 55)
4.4 GAN Attack with buffer 32 and λ = 0.015
4.5 Weight Statistics and Increasing Threshold Weight Statistics acceptance rate comparison

List of Tables

4.1 Explicit boosting with weight statistics agents accuracy
4.2 GAN attack (sample size 2) and Weight Update Statistics
4.3 GAN attack (sample size 8) and Weight Update Statistics
4.4 GAN attack (sample size 16) and Weight Update Statistics
4.5 GAN attack (sample size 32) and Weight Update Statistics
4.6 GAN attack (sample size 64) and Weight Update Statistics
4.7 GAN attack + Weight Update Statistics - summary
4.8 Explicit boosting, GAN attack (GA) comparison
4.9 GAN Attack with Weight Statistics (WS) and Increasing Threshold WS (IT WS)

Chapter 1

Introduction

The main objective of this thesis is first to investigate the current state-of-the-art attacks and defense mechanisms, so that a baseline to work and experiment with can be created. The second, main goal is to investigate potentially new attacks and show what weaknesses can arise from choosing a federated learning approach.

1.1 Background and Problem


Federated learning (FL) [3] can be seen as a variant of distributed learning under strict privacy constraints. Federated learning distributes model training among several agents without sharing data. Each client holds a portion of the dataset and trains the model locally. The local updates are then sent to the server, which combines them and updates the global model. The newly generated weights are sent back to the clients and the previous steps are repeated until convergence. FL addresses critical privacy issues related to data, such as data security and data access rights. Today, FL has found applications across industries including telecommunications, IoT, and the medical sector. Privacy is of prime importance in federated learning, and it is crucial to study the robustness of federated learning techniques against possible model poisoning attacks. Model poisoning [4] is an attack initiated by a single malicious agent, or a group of them, where the adversarial objective is to cause the model to misperform. Different kinds of attacks against federated learning models will be investigated, and research on the best defense strategies will be carried out.

1.2 Purpose
Finding vulnerabilities and weaknesses helps in better understanding the problems of a system. The aim of this project is to study the consequences of attack strategies that a malicious client could deploy in the federated training scheme. Federated learning is often used for medical data [5], where sensitive data (for example, electronic health records (EHR)) is shared among the clients (i.e., hospitals). In these cases, the privacy of the data is of extreme importance, but the stability of the final model is also crucial. Therefore, it is important to show new ways of attacking, so that stricter and more thoughtful defense mechanisms can be implemented. Having independent clients is, hence, dangerous, since a malevolent client could manipulate the updates in such a way that a crafted adversarial goal (for example, misclassification) is reached. The server must implement one or several defense strategies in order to identify and manage potential attackers.

1.3 Goals
This thesis is mainly research oriented. The first part consists of implementing the base federated learning system on top of which the attacks and defense strategies will be tested. The first attack to be implemented is explicit boosting, after which defense strategies such as model accuracy checking and weight statistics are incorporated on the server side.
The next part consists of implementing one of the main research ideas for attacking a federated system with a targeted attack. The idea is to generate an attack through a generative adversarial network (GAN) [6] that is able to learn, during the distributed training, the best way to send a poisoned update to the server and pass unnoticed.

1.4 Sustainability and Ethics


Machine learning, and in particular this project with the help of federated learning, can help train and develop systems that are able to improve living conditions and quality of life. Federated learning is particularly useful in medical applications, thereby contributing to the UN Sustainable Development Goals of Good Health and Well-Being as well as Industry, Innovation and Infrastructure.

1.5 Structure of the thesis


Chapter 1 gives an overview of the purpose and goals of the project. Chapter 2 presents relevant background information and literature needed for the project. Chapter 3 presents the methodology and methods used to solve the problem. Chapter 4 reports and analyses the experiments that have been carried out. Finally, Chapter 5 summarizes the final findings and discusses future work.

Chapter 2

Background and Related Work

2.1 Federated Learning


As mentioned in the introduction, federated learning is a distributed technique used for training a global model [3]. The data is, for example, distributed among mobile phones that together participate in the federated training using their local data without sharing it. The main reason federated learning is used is to keep the data private, without sharing it with the central server. Federated learning is based on assumptions that differ from classical distributed optimization. Firstly, the data is not necessarily IID (independent and identically distributed), since the data belongs to the specific client and is usually user specific. The data is often unbalanced among the clients, since some users may produce more data than others. Moreover, the system is expected to be massively distributed, having more clients than the average number of samples per client, and the communication between server and clients is generally limited.
In a federated learning setting, any kind of model can be chosen for the distributed training. In this project, feed-forward neural networks [7] are chosen as an example. More complex or different models (such as CNNs [8] or SVMs [9]) could also be selected.

2.1.1 Federated Averaging


Federated averaging is a strategy that the server deploys to aggregate client updates. After the server initializes the weights of the model, these weights are sent to the clients, which compute the model update locally and send it back to the server. The server is then in charge of aggregating all the client updates. One of the techniques used for this is called Federated Averaging [3]. Federated Averaging is based on stochastic gradient descent (SGD) [10]. In a federated setting, large-batch synchronous SGD is usually used, as it reaches state-of-the-art results compared with asynchronous methods [11]. The algorithm works iteratively, selecting a fraction of the clients at each round, computing the gradients on the local data held by the clients, and sending the gradients back to the server. The server then computes the (possibly weighted) average of the gradients and updates the server weights. In the case C = 1 (where C is the fraction of clients selected), the algorithm performs full-batch gradient descent. This is often called FederatedSGD. The FederatedSGD approach can be changed so that each client performs more than one training pass on the local data. Moreover, the client itself can batch the local data. This is called FederatedAveraging.
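As a concrete illustration of the round just described, the sketch below implements one FederatedAveraging round in plain Python/NumPy. The client interface (a local_training method returning updated weights and the local sample count) and the weighting by dataset size are assumptions for illustration, not the exact implementation used in this thesis.

import numpy as np

def federated_averaging_round(server_weights, clients, client_fraction=1.0, lr=1.0):
    # Select a random fraction of clients for this round.
    k = max(1, int(client_fraction * len(clients)))
    selected = np.random.choice(len(clients), size=k, replace=False)

    updates, sizes = [], []
    for i in selected:
        # Each client trains locally starting from the server weights and
        # returns its new weights; the update is delta_i = W_i - W_server.
        local_weights, n_samples = clients[i].local_training(server_weights)
        updates.append([wl - ws for wl, ws in zip(local_weights, server_weights)])
        sizes.append(n_samples)

    # Weighted average of the updates (weights proportional to local data size).
    total = float(sum(sizes))
    avg_update = [
        sum((n / total) * upd[layer] for n, upd in zip(sizes, updates))
        for layer in range(len(server_weights))
    ]

    # Server step: move the global weights along the averaged update.
    return [ws + lr * du for ws, du in zip(server_weights, avg_update)]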

2.2 Adversarial Attacks


When deploying a federated system, particular attention must be paid to potential adversarial clients. In a federated setting, the clients are independent units that influence the result of the final model. This means that, if not enough security measures are implemented, a malicious client can manipulate its updates in such a way that the final model reaches a previously crafted adversarial target. There are different kinds of attacks that can be carried out. In data poisoning attacks [12], the training data is changed in such a way that the final test error increases. Another type of attack often used against federated learning is model update poisoning [13], where the attacker forges its local update so that the global model reaches the attacker's goal (which could be, for example, a label exchange). Yet another type are inference-time attacks [14], where attacks are performed at inference time instead of training time. A typical example of inference-time attacks are adversarial examples, where a specifically crafted input is given to the model in order to change its output.

2.3 Generative Adversarial Networks

Generative adversarial networks (GANs) [6] are a kind of generative model in which two models are trained simultaneously. The first model, called the generator, captures and learns the distribution of the data, which is later used to generate new data. The second model is a discriminative model. The goal of the generator is to maximize the probability that the discriminator makes a mistake, while the discriminator tries to differentiate between real data and generated data. To reach the discriminator's goal, its loss function is built as follows:

    L_D = \frac{1}{m} \sum_{i=1}^{m} \left[ \log D\left(x_{\text{generated}}^{(i)}\right) + \log\left(1 - D\left(x_{\text{true}}^{(i)}\right)\right) \right]

where m is the batch size, D represents the discriminator network, x_{\text{generated}}^{(i)} is the i-th sample produced by the generator and x_{\text{true}}^{(i)} is the i-th real sample (a data sample coming from the dataset).
On the other hand, the generator loss function is

    L_G = \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D\left(x_{\text{generated}}^{(i)}\right)\right)        (2.1)

where x_{\text{generated}}^{(i)} = G(z^{(i)}), G represents the generator network and z^{(i)} is a random noise sample drawn from a noise prior p(z).
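As an illustration of how these two losses are used in practice, the following PyTorch-style sketch performs one training step. It assumes the discriminator ends in a sigmoid so its output lies in (0, 1); the flat noise input and the choice of optimizers are placeholder assumptions rather than anything prescribed by [6].

import torch

def gan_training_step(generator, discriminator, x_true, opt_g, opt_d, noise_dim=64):
    m = x_true.size(0)
    eps = 1e-8  # numerical stability inside the logs

    # Discriminator step: minimizing L_D pushes D(G(z)) -> 0 and D(x_true) -> 1.
    z = torch.randn(m, noise_dim)
    x_generated = generator(z).detach()          # do not backprop into G here
    loss_d = (torch.log(discriminator(x_generated) + eps)
              + torch.log(1 - discriminator(x_true) + eps)).mean()
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator step (eq. 2.1): minimizing log(1 - D(G(z))) pushes D(G(z)) -> 1.
    z = torch.randn(m, noise_dim)
    loss_g = torch.log(1 - discriminator(generator(z)) + eps).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()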

2.4 Gradient Leakage Techniques

Gradient-based methods are one of the main techniques used for training neural networks. Gradients are generally thought to be a secure quantity to share. However, recent research has shown that it is possible to restore the original training data from the gradients [1]. In federated learning, the gradients are generally exchanged between clients and the server, and this could potentially be exploited to reconstruct the data held by the clients. The paper Deep Leakage from Gradients [1] presents a method called DLG. This work shows that the original data can be recovered by generating a pair of dummy inputs and labels and then performing the usual forward and backward passes to obtain dummy gradients. Instead of optimizing the model weights, the dummy inputs and labels are optimized to minimize the distance between the dummy gradients and the real gradients. Figure 2.1 shows how the DLG algorithm works.
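The following PyTorch sketch follows the DLG recipe summarized above: dummy inputs and soft labels are optimized (here with L-BFGS, as in [1]) so that their gradients match the observed ones. The iteration count, loss form, and tensor shapes are illustrative assumptions.

import torch
import torch.nn.functional as F

def dlg_reconstruct(model, observed_grads, input_shape, num_classes, iters=300):
    # Dummy data and label logits that the attacker optimizes.
    dummy_x = torch.randn(1, *input_shape, requires_grad=True)
    dummy_y = torch.randn(1, num_classes, requires_grad=True)
    optimizer = torch.optim.LBFGS([dummy_x, dummy_y])

    for _ in range(iters):
        def closure():
            optimizer.zero_grad()
            pred = model(dummy_x)
            # Soft label so the dummy label is also differentiable.
            loss = torch.sum(-F.softmax(dummy_y, dim=-1) * F.log_softmax(pred, dim=-1))
            dummy_grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
            # Match the dummy gradients to the gradients observed by the server.
            grad_diff = sum(((dg - og) ** 2).sum()
                            for dg, og in zip(dummy_grads, observed_grads))
            grad_diff.backward()
            return grad_diff
        optimizer.step(closure)

    return dummy_x.detach(), dummy_y.detach()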

Figure 2.1: The overview of our DLG algorithm. Variables to be updated are marked with a bold border. While normal participants calculate ∇W to update parameter using its private training data, the malicious attacker updates its dummy inputs and labels to minimize the gradients distance. When the optimization finishes, the evil user can obtain the training set from honest participants. [1]

2.5 Malware detection based on GAN


Today, machine learning is often used for malware detection. The paper Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN [15] presents an approach based on GANs called MalGAN [2]. A malware dataset consisting of binary malware feature vectors m and the corresponding benign ones was used for training. It is assumed that a black-box malware detection model is available. The aim of MalGAN is to bypass the black-box detector. Figure 2.2 shows the architecture of MalGAN. The generator takes as input the malware feature vector m and a noise vector and generates a binary output o′. The final adversarial example is generated as m′ = m | o′, where | is the element-wise binary OR. In this way the original features required by the malware are kept, while other features can be added by the generator with the purpose of making the malware pass undetected. The generated adversarial malware and benign examples are then labelled by the black-box detector. A substitute detector is used to fit the black-box detector: it takes a program feature vector as input and classifies it as either a benign program or malware. This architecture is shown to be successful at fooling the black-box detector.

Figure 2.2: MalGAN architecture [2]

2.6 Secure Aggregation


Using federated aggregation, the server can see the gradients of each single client, and in this way it is possible for the server to retrieve information about the data stored in the clients (for example, using gradient leakage techniques). This raises a problem from a privacy perspective. Secure aggregation tries to solve this problem by using a function that enables the clients to send a particular value such that the server can learn only the sum of the clients' updates. Usually, homomorphic encryption is used for this purpose [16] [17].
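As a toy illustration of the sum-only idea (using cancelling pairwise additive masks rather than homomorphic encryption; this is not the protocol of [16] or [17], just a sketch of the principle):

import numpy as np

def masked_updates(updates, rng):
    """Each pair of clients (i, j) agrees on a random mask r_ij.
    Client i adds +r_ij and client j adds -r_ij, so the masks cancel in the sum
    and the server only learns the aggregate, not the individual updates."""
    n = len(updates)
    masked = [u.astype(float).copy() for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            r = rng.normal(size=updates[i].shape)
            masked[i] += r
            masked[j] -= r
    return masked

rng = np.random.default_rng(0)
updates = [rng.normal(size=4) for _ in range(3)]   # three clients' updates
masked = masked_updates(updates, rng)

# The server sums the masked updates; the result equals the true sum.
print(np.allclose(sum(masked), sum(updates)))      # True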

Chapter 3

Methods

In this chapter we address the steps taken by a malicious client to perform an attack under federated learning (FL) constraints. The first section explains the defense mechanisms used in the experiments, while the second section focuses on the attack mechanisms.

3.1 Defense strategies


3.1.1 Accuracy Checking
If a validation dataset is available, the server can check how the accuracy changes when adding the update from client i to the current server weights. If the resulting model has an accuracy lower, by more than a threshold γ, than the model obtained by adding the updates of all the other clients except i, then client i is classified as malicious and the corresponding update is discarded. The threshold γ must be tuned so that benign updates are kept while the malicious ones are discarded. This method is presented in [18].
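A minimal sketch of this check follows, under one possible interpretation: client i is discarded if adding its update lowers validation accuracy by more than γ relative to aggregating everyone else. The evaluate function and the apply_updates helper are hypothetical stand-ins, not part of any particular library.

def accuracy_checking(server_weights, client_updates, evaluate, gamma):
    """Flag client i as malicious if adding its update lowers validation
    accuracy by more than gamma compared with aggregating everyone else."""
    accepted = []
    for i, upd in enumerate(client_updates):
        others = [u for j, u in enumerate(client_updates) if j != i]
        acc_without_i = evaluate(apply_updates(server_weights, others))
        acc_with_i = evaluate(apply_updates(server_weights, others + [upd]))
        if acc_without_i - acc_with_i <= gamma:
            accepted.append(upd)      # benign: keep the update
        # otherwise the update is discarded as malicious
    return accepted

def apply_updates(weights, updates):
    # Average the updates and add them to the current server weights.
    avg = [sum(u[l] for u in updates) / len(updates) for l in range(len(weights))]
    return [w + d for w, d in zip(weights, avg)]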

3.1.2 Weight Statistics


One method to check whether an update is valid (not malicious) is to check how different the distances between a specific update and the rest are. Given a distance metric d(·,·) and a threshold κ, we can compute all the pairwise distances between the clients' updates and flag an agent as malicious if its range of distances differs from the others by more than κ. The parameter κ must be tuned. As stated in the original paper, for a potentially malicious client m the range

    R_m^t = \left[ \min_{i \in [k] \setminus m} d\left(\delta_m^t, \delta_i^t\right),\; \max_{i \in [k] \setminus m} d\left(\delta_m^t, \delta_i^t\right) \right]

is computed, where \delta_i^t is the update of client i at time t. The range R_m^t is computed for all clients; then the minimum lower bound R^l_{\min,[k]\setminus m} and the maximum upper bound R^u_{\max,[k]\setminus m} over the remaining clients are computed. A client is identified as malicious if

    \max\left\{ R_m^u - R^l_{\min,[k]\setminus m},\; \left| R_m^l - R^u_{\max,[k]\setminus m} \right| \right\} \geq \kappa.

In this way, it is ensured that the distances from the (possibly malicious) client to any other client are not too different from those between any other two clients.
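The following NumPy sketch implements the range test above. The Euclidean distance and the exact way the other clients' ranges are formed (here, excluding only each client's own distance to itself and to the candidate m) are interpretation choices, since the text does not pin them down.

import numpy as np

def weight_statistics_flags(updates, kappa):
    """updates: array of shape (k, d), one flattened update per client.
    Returns a boolean array where True marks a client flagged as malicious."""
    k = updates.shape[0]
    # Pairwise Euclidean distances between all client updates.
    dist = np.linalg.norm(updates[:, None, :] - updates[None, :, :], axis=-1)

    flags = np.zeros(k, dtype=bool)
    for m in range(k):
        others = [i for i in range(k) if i != m]
        # Range of distances from client m to every other client.
        r_low, r_up = dist[m, others].min(), dist[m, others].max()
        # Extreme bounds of the ranges of the remaining clients.
        low_bounds = [dist[i, [j for j in others if j != i]].min() for i in others]
        up_bounds = [dist[i, [j for j in others if j != i]].max() for i in others]
        r_min_low, r_max_up = min(low_bounds), max(up_bounds)
        flags[m] = max(r_up - r_min_low, abs(r_low - r_max_up)) >= kappa
    return flags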

3.1.3 GAN defense


The updates generated by the GAN attack strategy are able to pass through the weight statistics filter and be flagged as benign updates. One possible way to detect the attack is to modify the way weight statistics works. Instead of using a fixed threshold κ, as proposed in the original paper, an adaptive threshold is used during the federated training. The threshold is kept relatively small for the first training epochs so that the defense is not too strict. After a certain number of epochs, the threshold is increased. In this way, the defense mechanism is more flexible at the beginning, accepting more updates, and becomes stricter later, being able to discard some of the updates generated by the GAN. This technique can attenuate the damage caused by the GAN attack but is not able to block the attack completely. As the weight statistics threshold increases, the server is able to detect more of the incoming malicious updates, but it will also discard some of the benign ones. The threshold must be set so that as many malicious updates as possible are blocked while, at the same time, no benign updates are discarded.
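A small sketch of the adaptive threshold schedule; the single step shape and the switch epoch are assumptions, with the 55-to-85 values taken from the experiment in Section 4.4.2.

def adaptive_threshold(epoch, switch_epoch, low=55.0, high=85.0):
    # Keep the more flexible threshold early on, then switch to the
    # stricter value once the given number of federated epochs has passed.
    return low if epoch < switch_epoch else high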

3.2 Attack strategies


To achieve the adversarial goal, the client (or potentially several clients) manipulates its update so that it can influence the final model towards the target adversarial goal. For the method setup, we assume that the dataset D is distributed among the clients in portions D_i = {x_{i,j}, y_{i,j}} for j = 0, 1, ..., n, where n is the number of samples in D_i. The malicious client holds locally a portion D_i of the dataset. To perform the attacks, this dataset fraction is changed so that the labels are replaced with the target labels. The malicious portion of the dataset becomes D̂_i = {x_{i,j}, τ_{i,j}}, where τ_{i,j} is the new label for data point j in portion i of the dataset.

3.2.1 Explicit boosting


One well-known attack technique is called explicit boosting [18]. The malicious agent runs several steps of training using a gradient-based optimizer on the malicious data D̂_i to obtain the malicious weights W_m^t. Given that the previous weights sent by the server are W_S^{t-1}, the malicious update is computed as δ̂_m^t = W_m^t − W_S^{t-1}. The update is then boosted by a factor λ so that, once it is sent back to the server, the global weights satisfy the malicious objective. The value of λ is generally greater than one and commonly equal to the number of clients. The explicit boosting attack is proven to be effective in reaching the targeted poisoned accuracy. The results section reports the main experiments using explicit boosting. As the experiments will highlight, explicit boosting is ineffective when accuracy checking and/or weight update statistics are used.
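A minimal sketch of the boosted update, assuming a local_training routine that trains on the label-flipped data D̂_i starting from the server weights (its signature is an assumption):

def explicit_boosting_update(server_weights, malicious_data, local_training, boost):
    # Train locally on the label-flipped dataset starting from the server weights.
    malicious_weights = local_training(server_weights, malicious_data)
    # Boost the resulting update so it dominates the server-side average.
    return [boost * (wm - ws) for wm, ws in zip(malicious_weights, server_weights)]

# Example: with 4 clients, a boosting factor of 4 compensates for the 1/4
# weight the update receives in federated averaging.
# update = explicit_boosting_update(W_s, flipped_data, train_fn, boost=4)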

3.2.2 Proposed GAN attack


The idea behind this method is to use a GAN that is able to perform an attack against the server, making the update effective at misclassifying the target label while keeping it as close as possible to a benign update so that it can pass undetected through the server's defense mechanisms (in this case accuracy checking and weight statistics).
Figure 3.1 shows the general structure of the GAN. Compared to a classical GAN, the discriminator is changed so that it classifies between benign and malicious updates, while the generator tries to reproduce the malicious update from the benign one while making it be classified as benign. To reach this goal, the losses of both the generator and the discriminator must be changed. The discriminator can easily be changed so that it classifies the benign updates as benign (for example, label 1) and the malicious updates as malicious (for example, label 0); this implies that the discriminator will try to minimize the following average binary cross-entropy loss:

    L_D = \frac{1}{m} \sum_{i=1}^{m} \left[ \log D\left(x_{\text{malicious}}^{(i)}\right) + \log\left(1 - D\left(x_{\text{benign}}^{(i)}\right)\right) \right]

where m is the batch size, D represents the discriminator network, x_{\text{malicious}}^{(i)} is the i-th malicious update and x_{\text{benign}}^{(i)} is the i-th benign update.
With this loss the discriminator acts as a normal binary classifier, learning to distinguish between malicious and benign updates.

To achieve the generator's goal, the generator must be changed so that it takes the benign update as input and generates a malicious version of it. The loss can be constructed from two factors weighted by a parameter λ. The first factor must make sure that the generated update looks like an actual malicious update; this is obtained by minimizing the mean squared error (MSE). The second factor takes into account the fact that the update must pass undetected and be classified as benign. This can be done by pushing the discriminator to classify the update as benign (label 1). The generator will then try to minimize the following loss:

    L_G = \frac{1}{m} \sum_{i=1}^{m} \left[ (1-\lambda)\,\mathrm{MSE}\left(x_{\text{generated}}^{(i)}, x_{\text{malicious}}^{(i)}\right) + \lambda \log\left(1 - D\left(x_{\text{generated}}^{(i)}\right)\right) \right]        (3.1)

where λ is the weight factor, x_{\text{generated}}^{(i)} = G(x_{\text{benign}}^{(i)}) and G represents the generator network.

Figure 3.1: GAN scheme. The figure shows the data flow between the generator and the discriminator. The discriminator is trained to distinguish between benign and malicious updates. The generator's goal is to generate updates that are malicious but look benign (the discriminator classifies them as benign).
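A PyTorch-style sketch of one training step of the proposed attack GAN, combining the discriminator loss above with the generator loss of Equation 3.1. The flattened update vectors, the sigmoid-output discriminator, and the choice of optimizers are assumptions made for illustration.

import torch

def attack_gan_step(generator, discriminator, benign, malicious, opt_g, opt_d, lam):
    eps = 1e-8

    # Discriminator: label benign updates as 1 and malicious updates as 0.
    loss_d = (torch.log(discriminator(malicious) + eps)
              + torch.log(1 - discriminator(benign) + eps)).mean()
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator (eq. 3.1): stay close to the malicious update (MSE term) while
    # being classified as benign by the discriminator (adversarial term).
    generated = generator(benign)
    mse = ((generated - malicious) ** 2).mean(dim=1)
    adv = torch.log(1 - discriminator(generated).squeeze(-1) + eps)
    loss_g = ((1 - lam) * mse + lam * adv).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()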

For the GAN method to work, some initial data is required to start training the model. This data must be collected during the federated training process. The general idea is to start attacking with an already known method (which will potentially be blocked by the server's defense mechanisms) and to collect the benign and malicious updates, which are then added to the training dataset.

Fixed data samples

In this version, the updates are collected for X federated epochs. Once the data is collected, the GAN is trained until convergence is reached. From federated epoch X + 1 on, the generator network is used to generate the malicious update from the benign one. In this approach the GAN is trained with only the first X updates collected. This method, if a sufficient number of samples is collected, is effective at passing through the server's defense mechanisms. The algorithm is illustrated in Figure 3.2. Algorithm 1 and Algorithm 2 show the server and malicious client pseudocode.

Online increasing data

As in the previous version, the data is initially collected for X federated epochs and the GAN is trained until convergence; the GAN is then used to generate the malicious updates. In this version, from epoch X + 1, the accuracy on the malicious local dataset (where the targeted label is exchanged) is computed to check whether the previous update sent to the server helped in reaching the targeted goal. If the accuracy on the malicious dataset increased, the previously sent malicious update is added to the GAN training dataset and the oldest update is removed. Every time a new sample is added to the dataset, the GAN is trained with the new dataset for Y epochs.
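A sketch of the online buffer logic just described; the function names and the retraining call are illustrative, not part of any particular library.

from collections import deque

def online_update(buffer, last_sent_update, malicious_acc, prev_malicious_acc,
                  train_gan, y_epochs):
    # Keep the sent update only if it moved the model towards the adversarial
    # goal (accuracy on the label-flipped local data increased).
    if malicious_acc > prev_malicious_acc:
        buffer.append(last_sent_update)   # deque(maxlen=...) drops the oldest entry
        train_gan(list(buffer), epochs=y_epochs)
    return buffer

# buffer = deque(initial_samples, maxlen=X)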

Tuning of the generator loss factor λ

The factor λ plays an important role in the effectiveness of the attack. As can be seen from Equation 3.1, if λ increases, more weight is given to the right-hand term of the equation, which makes the update be classified as benign by the discriminator network. On the other hand, if λ decreases, more importance is given to the MSE term, meaning that the generated update will be closer to the malicious update. A proper value of λ must be found so that the update is effective at reaching the attacker's goal (a malicious update) while at the same time being classified as benign.

Figure 3.2: Proposed attack sequence graph.

Algorithm 1 Server’s algorithm


1: Ws ← initialize_weights() . Initialize Server Weights
2: while not_converged do
3: send_weights_to_agents(Ws ) . Sends w
4: δ ← get_agents_updates()
5: δavg ← f ederated_mean(δ)
6: Ws ← Ws + ηδavg . Update Old Weights, η is learning rate
7: end while

Algorithm 2 Client’s algorithm


1: Ws ← get_server_weights()
2: if data_not_collected then
3: δ ← explicit_boostig_update(Ws )
4: store(δ) . Used later to train GAN
5: else
6: if GAN_to_be_trained then
7: GAN ← train_GAN ()
8: end if
9: δ ← generate_attack(GAN ) . Generated using current update
10: end if
11: send_to_server(δ)

3.3 Data Collection


The data used for the experiments is a telecommunication dataset. The data comes from four different agents, which in the experimental setting correspond to the four clients. The input is composed of 69 features selected for the training. The output is a binary label.

Chapter 4

Results and Analysis

4.1 Vanilla Federated Learning


When no malicious client is involved and no defense mechanisms are implemented on the server side, the federated model reaches a final test accuracy of 0.800 and a label 1 test accuracy of 0.414. Figure 4.1 shows the test accuracy, the label 1 test accuracy and the single agents' accuracies, respectively. This will be considered as the baseline for the other experiments.

Figure 4.1: Vanilla Federated Learning Model (panels: test accuracy, label 1 test accuracy, agents' test accuracies, agents' label 1 test accuracies).

4.2 Explicit boosting


In this section, explicit boosting with a boosting factor of 4 (the number of clients) is used on top of the vanilla federated setup. One agent is selected to act maliciously while all the others behave normally. In Figure 4.2 it can be seen that explicit boosting causes a drop in the label 1 test accuracy, bringing it close to zero (the misclassification goal of the attacker is reached). This also influences the overall test accuracy, where a drop can be seen as well. The same kind of drop can be noticed in the respective agents' test accuracies. The final test accuracy using explicit boosting is 0.732, while the label 1 test accuracy is 0.002. Explicit boosting is thus proven to be an effective attacking technique.

Figure 4.2: Explicit boosting with boosting factor 4 (panels: test accuracy, label 1 test accuracy, acceptance rate, agents' test accuracies, agents' label 1 test accuracies).

4.3 Weight Statistics


In this experiment, weight statistics is used with a threshold parameter of 55. The threshold was found using the procedure explained in the methods section, and explicit boosting is used with the same parameters as in the previous section. From this experiment we can see that, when weight statistics is used, all the updates generated by the client using explicit boosting are detected and blocked. The number of times the malicious update was accepted is 0/150, the final test accuracy is 0.778 and the label 1 test accuracy is 0.557. Table 4.1 and Figure 4.3 show the per-agent results and the global test accuracy during training. As can be noticed in Figure 4.3, the acceptance rate is constantly zero, meaning that explicit boosting is ineffective when weight statistics is used on the server side. Also, we can observe that the test accuracy and the label 1 test accuracy tend to increase and remain stable (no drops).

Table 4.1: Explicit boosting with weight statistics agents accuracy


Agents Test Accuracy Label 1 Test Accuracy
Agent 1 0.731 0.778
Agent 2 0.800 0.541
Agent 3 0.763 0.183
Agent 4 0.814 0.736

Figure 4.3: Explicit boosting with weight statistics (τ = 55) (panels: test accuracy, label 1 test accuracy, acceptance rate, agents' test accuracies, agents' label 1 test accuracies).

4.4 GAN attack


In this section, the main results of the GAN attack are reported.

4.4.1 GAN attack + Weight Update Statistics


Experiment settings
In this experiment, the federated learning model is trained using the Ericsson dataset. The malicious client performs an attack using the GAN model, while the server uses weight update statistics as its defense mechanism. The first set of experiments is done using the fixed data samples version described in Section 3.2.2, with sample sizes of 2, 8, 16, 32, and 64. After the required number of samples has been collected, the GAN is trained for 2000 epochs (until convergence is reached).

Tuning of buffer size

From Table 4.7 it is possible to notice that if too small a buffer size is used (meaning that few samples are collected for the GAN training), the attack is unsuccessful and the malicious goal is not reached. As the buffer size increases, with a correctly tuned λ, the attack works successfully. This is because the GAN needs enough data to be able to learn the best way to attack. We can also observe that as the buffer size keeps increasing, the attack remains successful. The reason why it is convenient to use a smaller buffer size is to start attacking in the early steps of the federated global training. From this set of experiments, it is possible to see that a buffer of 8 (Table 4.3) is already sufficient for the attack to be successful.

Tuning of λ, the generator loss factor

As can be observed in Table 4.5, when the sample size is kept fixed (e.g. 32) and λ increases, the acceptance rate generally also increases. This means that the attack passes undetected more often. The drawback of increasing λ too much is that, even though the attack goes undetected, the efficiency of the attack decreases. Looking at Table 4.4, where the sample size is 16, it is possible to notice that as λ increases the label 1 test accuracy increases as well, meaning that the attack is not doing any damage to the final model, making it ineffective. Summarizing, the factor λ influences the strength of the attack: changing this factor changes its effectiveness. A λ that is too small will produce attacks that are detected by the server. On the other hand, if λ is too large the attacks will pass undetected but the malicious goal will not be reached. The loss factor λ must be tuned so that the correct balance is found.

Table 4.2: GAN attack (sample size 2) and Weight Update Statistics
λ Test Accuracy Label 1 Test Accuracy Acceptance rate
0.01 0.764 0.129 148/150
0.1 0.796 0.329 148/150
0.5 0.798 0.370 148/150

Table 4.3: GAN attack (sample size 8) and Weight Update Statistics
λ Test Accuracy Label 1 Test Accuracy Acceptance rate
0.01 0.731 0 93/150
0.1 0.732 0 120/150
0.5 0.785 0.243 142/150

Table 4.4: GAN attack (sample size 16) and Weight Update Statistics
λ Test Accuracy Label 1 Test Accuracy Acceptance rate
0.01 0.732 0 51/150
0.1 0.782 0.261 125/150
0.5 0.781 0.467 123/150

Table 4.5: GAN attack (sample size 32) and Weight Update Statistics
λ Test Accuracy Label 1 Test Accuracy Acceptance rate
0.01 0.732 0.0003 20/150
0.015 0.732 0 28/150
0.1 0.732 0 39/150
0.5 0.732 0 44/150

Table 4.6: GAN attack (sample size 64) and Weight Update Statistics
λ Test Accuracy Label 1 Test Accuracy Acceptance rate
0.01 0.732 0 21/150
0.1 0.777 0.236 58/150
0.5 0.794 0.461 3/150

Table 4.7: GAN attack + Weight Update Statistics - summary

Sample size λ Test Accuracy Label 1 Test Accuracy Acceptance rate


2 0.01 0.764 0.129 148/150
2 0.1 0.796 0.329 148/150
2 0.5 0.798 0.370 148/150
8 0.01 0.731 0 93/150
8 0.1 0.732 0 120/150
8 0.5 0.785 0.243 142/150
16 0.01 0.732 0 51/150
16 0.1 0.782 0.261 125/150
16 0.5 0.781 0.467 123/150
32 0.01 0.732 0.0003 20/150
32 0.015 0.732 0 28/150
32 0.1 0.732 0 39/150
32 0.5 0.732 0 44/150
64 0.01 0.732 0 21/150
64 0.1 0.777 0.236 58/150
64 0.5 0.794 0.461 3/150

Comparison with explicit boosting


Table 4.8 shows a comparison between the GAN attack and explicit boosting when weight statistics is used. It can clearly be seen that the attacks generated with explicit boosting are blocked by the server (the acceptance rate is 0/150). On the other hand, the GAN attack can pass through the defense mechanisms and makes the label 1 test accuracy drop to 0. In the case of the GAN attack, as discussed before, the acceptance rate increases as λ increases.

Table 4.8: Explicit boosting, GAN attack (GA) comparison.

Method Test Accuracy Label 1 Test Accuracy Acceptance rate


Explicit boosting 0.778 0.557 0/150
GA (buffer 32, λ = 0.01) 0.732 0.0003 20/150
GA (buffer 32, λ = 0.1) 0.732 0 39/150
GA (buffer 32, λ = 0.5) 0.732 0 44/150

Figure 4.4: GAN Attack with buffer 32 and λ = 0.015 (panels: test accuracy, label 1 test accuracy, acceptance rate, operators' test accuracies, operators' label 1 test accuracies).

4.4.2 Proposed defense against GAN attack


The proposed defense mechanism is able to reduce the damage produced by the GAN attack but is not able to block it completely. This experiment shows that using a threshold that increases after a certain number of epochs (the buffer size in this case) reduces the number of times the attack is accepted by the server. Since the attack is still accepted many times, it can still damage the performance of the model. Table 4.9 shows that when the increasing-threshold method is used (from 55 to 85 in this case), the acceptance rate drops from 51/150 to 36/150. This means that the server was able to detect the attacker more often compared to classic weight statistics. It is possible to notice that the label 1 test accuracy is 0 in both cases, meaning that even though the attacker passed through the server filters fewer times, it was still able to damage the final model.

Table 4.9: GAN Attack with Weight Statistics (WS) and Increasing Threshold
WS (IT WS)

Defense Method Test Accuracy Label 1 Test Accuracy Acceptance rate


WS 0.732 0.0 51/150
IT WS 0.732 0.0 36/150

Figure 4.5: Weight Statistics and Increasing Threshold Weight Statistics acceptance rate comparison (panels: WS acceptance rate, IT WS acceptance rate).

Chapter 5

Conclusions and Future work

5.1 Discussion and Conclusions


The increasing popularity of federated learning brings more concerns about the security flaws of this method. Each agent taking part in the federation is usually independent and has the freedom to change the updates and the data in any way. This might become a problem if one or more of the agents have malicious intentions. For this reason, it is important to show what potential risks can arise when federated learning is used. In this way, new defense mechanisms can be developed to make the whole federated model more robust. In this thesis, the main state-of-the-art attack and defense mechanisms have been implemented to show how they work from a server and a client perspective. In particular, it has been shown that attacks like explicit boosting can be detected and stopped by the server using methods like weight statistics. The majority of the thesis focuses on researching how to perform a targeted attack in a federated setting. The proposed attack, based on generative adversarial networks (GANs), can pass undetected through the server's weight statistics filters and reach the targeted adversarial goal. Once the buffer size and λ have been tuned, it is possible to reach a test accuracy of zero on the target label, as shown in the experiments. This means that the attack can pass through the weight statistics filter successfully.
This result is important because it highlights a weakness that could arise in a federated learning implementation. The aim of this finding is to make people aware of potential flaws in FL.
During the experiments, the test dataset was used to tune the parameters; in a real scenario a separate dataset is often not available. If a separate test dataset is not available, the training data can be used instead for tuning the parameters.

5.2 Future work


The work of the thesis mainly focused on a new attack strategy. In this thesis
has been proposed a variation of weight statistics using an increasing threshold
that is able to reduce the number of times the malicious update is accepted by
the server but is not still able to block the attach completely. As a further
step, would be interesting to investigate new ways of defending the proposed
attack so that it can be detected and blocked by the server. Research on defense
strategies is of great importance to increase the robustness and popularity of
federated learning. Moreover, since secure aggregation is commonly used
today, research on attacks and defense strategies on it could be relevant.

References

[1] L. Zhu, Z. Liu, and S. Han, "Deep leakage from gradients," 2019.

[2] W. Hu and Y. Tan, "Generating adversarial malware examples for black-box attacks based on GAN," CoRR, vol. abs/1702.05983, 2017. [Online]. Available: http://arxiv.org/abs/1702.05983

[3] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," 2017.

[4] A. N. Bhagoji, S. Chakraborty, P. Mittal, and S. Calo, "Model poisoning attacks in federated learning," in Proc. Workshop Secur. Mach. Learn. (SecML), 32nd Conf. Neural Inf. Process. Syst. (NeurIPS), 2018.

[5] J. Xu, B. S. Glicksberg, C. Su, P. Walker, J. Bian, and F. Wang, "Federated learning for healthcare informatics," 2020.

[6] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial networks," 2014.

[7] G. Bebis and M. Georgiopoulos, "Feed-forward neural networks," IEEE Potentials, vol. 13, no. 4, pp. 27–31, 1994. doi: 10.1109/45.329294

[8] Y. LeCun, P. Haffner, L. Bottou, and Y. Bengio, "Object recognition with gradient-based learning," in Shape, Contour and Grouping in Computer Vision. Berlin, Heidelberg: Springer-Verlag, 1999, p. 319. ISBN 3540667229

[9] C. Cortes and V. Vapnik, "Support vector networks," Machine Learning, vol. 20, pp. 273–297, 1995.

[10] L. Bottou, "Online learning and stochastic approximations," 1998.

[11] J. Chen, X. Pan, R. Monga, S. Bengio, and R. Jozefowicz, "Revisiting distributed synchronous SGD," 2017.

[12] B. Biggio, B. Nelson, and P. Laskov, "Poisoning attacks against support vector machines," in Proceedings of the 29th International Conference on Machine Learning (ICML'12). Madison, WI, USA: Omnipress, 2012, pp. 1467–1474. ISBN 9781450312851

[13] E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov, "How to backdoor federated learning," CoRR, vol. abs/1807.00459, 2018. [Online]. Available: http://arxiv.org/abs/1807.00459

[14] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," 2015.

[15] W. Hu and Y. Tan, "Generating adversarial malware examples for black-box attacks based on GAN," 2017.

[16] E. Shi, T.-H. H. Chan, E. Rieffel, R. Chow, and D. Song, "Privacy-preserving aggregation of time-series data," in NDSS, 2011.

[17] P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, R. G. L. D'Oliveira, H. Eichner, S. E. Rouayheb, D. Evans, J. Gardner, Z. Garrett, A. Gascón, B. Ghazi, P. B. Gibbons, M. Gruteser, Z. Harchaoui, C. He, L. He, Z. Huo, B. Hutchinson, J. Hsu, M. Jaggi, T. Javidi, G. Joshi, M. Khodak, J. Konečný, A. Korolova, F. Koushanfar, S. Koyejo, T. Lepoint, Y. Liu, P. Mittal, M. Mohri, R. Nock, A. Özgür, R. Pagh, M. Raykova, H. Qi, D. Ramage, R. Raskar, D. Song, W. Song, S. U. Stich, Z. Sun, A. T. Suresh, F. Tramèr, P. Vepakomma, J. Wang, L. Xiong, Z. Xu, Q. Yang, F. X. Yu, H. Yu, and S. Zhao, "Advances and open problems in federated learning," 2021.

[18] A. N. Bhagoji, S. Chakraborty, P. Mittal, and S. Calo, "Analyzing federated learning through an adversarial lens," 2019.
TRITA – TRITA-EECS-EX-2022:119

www.kth.se
