
Blockchain for Federated Learning
Sreya Francis, Martinez Ismael

Current Scenario
Existing technology – Issues

• Current data-collection practices are incredibly privacy-invasive
• We give our data away for free in return for a free service
• Latency issues
• High transfer costs
• Centralized ownership (users don't participate in the current system)
• Very limited data for healthcare research
Current Issues

● Privacy concerns
○ We don't have control over the data we generate!

● We are losing a source of natural income
○ Data is our natural resource, and we own it

● Sensitive-product problem – some services are creepy
○ High risk of theft, embarrassment, resale, etc.

● Centralized control by big tech giants
○ All of our data is controlled by tech giants like Google and Facebook
How can we solve this?

● Enhance user privacy
○ We should control our own data

● We should be rewarded for the data we own
○ Rewards based on data quality and quantity

● Decentralize power
○ Everyone has control over their own data

● Enable production of sensitive products/models
○ Enhanced privacy would make it easier to collect data related to sensitive fields like healthcare
Ingredients for the solution
Federated Learning
Blockchain
Internet of Things
Cryptography
1 Federated Learning
● What is federated learning?
● How does it work?
● Federated learning platforms

Federated Learning – Definition

● Idea: machine learning over a distributed dataset.

● Federated computation: a server coordinates a fleet of participating devices to compute aggregations of the devices' private data.

● Federated learning: a shared global model is trained via federated computation.

● Definition: training a shared global model, from a federation of participating devices which maintain control of their own data, with the facilitation of a central server.
Federated Learning –
Brief stepwise overview
● Step 1: Users download the current global model.
● Step 2: Users train the model on their own data.
● Step 3: Users upload their gradients to a server.
● Step 4: The gradients are aggregated, so no individual user's update is exposed.
● Step 5: Users' local models are updated from the new global model.
Federated Learning –
Algorithm

Server, until converged:
1. Select a random subset (e.g. 200) of the (online) clients.
2. In parallel, send the current parameters θ(t) to those clients.
3. Set θ(t+1) = θ(t) + the data-weighted average of the client updates.

Selected client k:
1. Receive θ(t) from the server.
2. Run some number of minibatch SGD steps, producing θ'.
3. Return θ' − θ(t) to the server.
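The server/client loop above can be sketched in plain NumPy. Everything concrete here — the least-squares client loss, learning rate, client and round counts — is an illustrative assumption, not part of the original slides:

```python
import numpy as np

def client_update(theta, data, lr=0.01, steps=5):
    """Run a few local gradient steps on the client's own data.
    A least-squares loss stands in for real on-device training."""
    X, y = data
    theta = theta.copy()
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / len(y)   # gradient of mean squared error
        theta -= lr * grad
    return theta

def federated_round(theta, clients, subset_size, rng):
    """One server round: sample clients, collect their θ' − θ(t)
    updates, and apply the data-weighted average."""
    chosen = rng.choice(len(clients), size=subset_size, replace=False)
    deltas = [client_update(theta, clients[k]) - theta for k in chosen]
    weights = [len(clients[k][1]) for k in chosen]   # weight by local data size
    return theta + np.average(deltas, axis=0, weights=weights)

# Toy demo: three clients whose data share one underlying linear model.
rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 80, 30):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w))

theta = np.zeros(2)
for _ in range(200):
    theta = federated_round(theta, clients, subset_size=2, rng=rng)
print(theta)   # converges toward true_w = [2, -1]
```

Because every round only touches a random subset of clients, the server never needs all devices online at once — the key systems property of federated computation.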


Federated Learning – Pros & Cons

Pros:
○ Enhanced user privacy: users keep their raw data private.

Cons:
○ Privacy: gradients still give hints about the underlying data.
○ Theft: participants can steal the updated model.
○ No sensitive products: the theft and privacy issues above block models in sensitive domains.
One Possible Solution:
Homomorphic Encryption
What is homomorphic encryption?

• Homomorphically encrypt the user gradients so that gradient privacy is preserved.
• A Privacy-Preserving Deep Neural Network model (2P-DNN), based on the Paillier homomorphic cryptosystem, could be used to enhance global-model privacy.
• This largely removes the theft and privacy-intrusion issues described above.
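To show why Paillier fits this setting, here is a toy textbook Paillier implementation demonstrating that multiplying ciphertexts adds the underlying gradient values, so a server can aggregate encrypted updates without reading them. The prime sizes and fixed-point scaling are illustrative assumptions only:

```python
import math, random

# Toy textbook Paillier cryptosystem (additively homomorphic).
# These primes are far too small for real security; production
# systems use >= 2048-bit moduli and a vetted library.
p, q = 10007, 10009
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                       # valid because we use g = n + 1

def encrypt(m, rng):
    r = rng.randrange(2, n)
    while math.gcd(r, n) != 1:             # r must be invertible mod n
        r = rng.randrange(2, n)
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Enc(a) * Enc(b) mod n^2 decrypts to a + b, so a server can sum
# encrypted gradient values without ever seeing the plaintexts.
SCALE = 1000                                # fixed-point encoding for floats
grads = [0.25, -0.10, 0.40]                 # one toy gradient entry per client
rng = random.Random(42)
ciphertexts = [encrypt(int(g * SCALE) % n, rng) for g in grads]

agg = 1
for c in ciphertexts:
    agg = (agg * c) % n2                    # homomorphic addition

total = decrypt(agg)
if total > n // 2:                          # map back to a signed value
    total -= n
print(total / SCALE)                        # → 0.55
```

Note Paillier is only *additively* homomorphic, which is exactly enough for gradient summation but not for arbitrary computation on ciphertexts.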
Reward Calculation
Possible way

• Based on user-model performance on a validation set
o To evaluate the validity of user data, we can run a validation check on the user model against a trusted validation set.
o Users are rewarded based on their performance on the validation set.
o If the validation accuracy falls below a specified threshold, the data is rejected.

• Pros
o An easy and fast way to calculate the user reward immediately after client-side training.

• Cons
o At any given iteration, an honest gradient may update the model in an incorrect direction, causing a drop in validation accuracy.
o This is compounded by the problem that clients may hold data that is not accurately represented by our trusted validation set.
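A minimal sketch of this validation-based reward rule, assuming a made-up accuracy `threshold` and payout `rate` (the slides fix neither):

```python
import numpy as np

def reward_for_update(predict, X_val, y_val, threshold=0.8, rate=100.0):
    """Hypothetical reward rule: score a client's model on a trusted
    validation set, reject it below `threshold`, otherwise pay in
    proportion to accuracy."""
    acc = float(np.mean(predict(X_val) == y_val))
    if acc < threshold:
        return 0.0, False                    # data rejected
    return rate * acc, True

# Toy demo: two 'client models' for a sign-based binary task.
X_val = np.array([-2.0, -1.0, 0.5, 1.5, 3.0])
y_val = np.array([0, 0, 1, 1, 1])
good = lambda x: (x > 0).astype(int)         # classifies every point correctly
bad = lambda x: (x > 2).astype(int)          # misses two of the positives

print(reward_for_update(good, X_val, y_val))  # → (100.0, True)
print(reward_for_update(bad, X_val, y_val))   # → (0.0, False): below threshold
```

The cons above show up directly here: an honest client whose data distribution differs from `X_val` can score below the threshold and be unfairly rejected.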
Issues with data in FL
What can go wrong?

• Gamber attack
o A user/attacker can randomly pick data points and maliciously change them.
o A user can give garbage input.
o A user/attacker can give data that does not contribute to the model.

• Omniscient attack
o Attackers are assumed to know the gradients sent by all the workers.
o They take the sum of all the gradients, scale it by a large negative value,
o and use it to replace some of the gradient vectors.

• Gaussian attack
o Some of the gradient vectors are replaced by random vectors sampled from a Gaussian distribution with large variance.
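The Gaussian attack is easy to simulate. In the sketch below, the worker count, σ, and which workers are Byzantine are all invented for the demo; it shows how a plain gradient average is corrupted:

```python
import numpy as np

def gaussian_attack(grads, num_byzantine, sigma=100.0, rng=None):
    """Gaussian attack as described above: replace `num_byzantine` of
    the gradient vectors with samples from a large-variance Gaussian."""
    rng = rng or np.random.default_rng(0)
    out = [g.copy() for g in grads]
    for i in range(num_byzantine):           # these workers are Byzantine
        out[i] = rng.normal(scale=sigma, size=out[i].shape)
    return out

honest = [np.ones(3) for _ in range(5)]       # every honest gradient is [1, 1, 1]
attacked = gaussian_attack(honest, num_byzantine=2)
naive_mean = np.mean(attacked, axis=0)
# With only 2 of 5 workers Byzantine, the naive average is already
# dominated by the attacker's high-variance noise.
print(naive_mean)
```

This is why plain averaging is not Byzantine-robust: a single unbounded gradient can move the mean arbitrarily far.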
How to counter adversaries?
Possible ways

• Based on the Krum algorithm
o Uses the Euclidean distance to rank the gradients.
o Determines which gradient contributions are removed.
o The top f contributions that are furthest from the mean client contribution are removed from the aggregated gradient.

• Pros
o Specifically designed to counter adversaries in federated learning.

• Cons
o Not an absolute measure of user contribution.
o Implementation is somewhat complicated.
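For concreteness, here is a simplified sketch of Krum-style selection. Note it follows the original Krum rule (score each gradient by distance to its nearest peers and keep the most central one) rather than the mean-distance filtering variant described above; the gradients and f are invented for the demo:

```python
import numpy as np

def krum(grads, f):
    """Simplified Krum: score each gradient by the sum of squared
    Euclidean distances to its n - f - 2 nearest peers, then return
    the gradient with the lowest score (the most 'central' update)."""
    n = len(grads)
    G = np.stack(grads)
    d2 = np.sum((G[:, None, :] - G[None, :, :]) ** 2, axis=-1)
    scores = []
    for i in range(n):
        others = np.delete(d2[i], i)              # distances to all peers
        scores.append(np.sum(np.sort(others)[: n - f - 2]))
    return grads[int(np.argmin(scores))]

# Toy demo: 5 honest gradients near [1, 1] plus 2 Byzantine outliers.
rng = np.random.default_rng(0)
honest = [np.array([1.0, 1.0]) + 0.01 * rng.normal(size=2) for _ in range(5)]
byz = [np.array([-50.0, 80.0]), np.array([100.0, -100.0])]
chosen = krum(honest + byz, f=2)
print(chosen)   # close to [1, 1]; the outliers are never selected
```

Outliers accumulate huge distance scores against the honest cluster, so they cannot win the argmin no matter how extreme they are.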
How to ensure validity of gradients?
Possible ways

Let us assume that q out of n gradient vectors are Byzantine/incorrect, where q < n.

[Figure: expected average gradient under Byzantine contamination]

Krum's algorithm in a nutshell:
• Works only when q < n.
• Ensures protection against up to 33% adversarial workers.
• The best solution proposed to date.
Proposed Solution to the User Reward Issue
• Data cost
o Each user calculates his/her data cost.
o Class id: C_i; number of samples: N_Ci.
o Cost per user: ∑_{j=1}^{k} (j · N_Cj)

• Generate validation set
o Based on the parameters passed to calculate the data cost.
o Automatically generate a validation set with some random samples.
o Samples pertain to the user-specified classes.

• Training
o Stop training before the model over-fits the data.
o If the validation error doesn't go down, the user's entry is wrong.
o If the validation error goes down, the user's entry is valid; pay the user based on the calculated data cost.
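One way to read the data-cost formula above, treating the class id j as the per-class weight (this is an interpretation — the slide's notation is terse, and a deployment would likely use real per-class prices):

```python
def data_cost(class_counts):
    """Per-user data cost following the slide's formula
    cost = sum over class ids j of j * N_cj, where N_cj is the
    number of samples the user holds for class j."""
    return sum(j * n for j, n in class_counts.items())

# A user holding 100 samples of class 1 and 40 samples of class 3:
print(data_cost({1: 100, 3: 40}))   # → 1*100 + 3*40 = 220
```

The result would then feed the validation-gated payout step: pay `data_cost(...)` only if the user's entry drives the validation error down.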
2 To Do: Causal Learning
● How can Causal Learning help FL?
● Issues?
● Possible solutions
