
Chair of Machine Learning
Chair of Communication Networks
Department of Electrical and Computer Engineering
Technical University of Munich

Review on Towards Deep Learning Models


Resistant to Adversarial Attacks

Anh Minh Nguyen


Seminar Machine Learning
17.06.2020

©2016 Technical University of Munich


Main ideas from Madry et al. 2017 [1]

 Show that deep learning models can be made resistant to adversarial attacks.
 A reliable adversarial training method.
 PGD attacks.
 The Madry Defense Model.

PGD algorithm (Projected Gradient Descent)

 Main idea:
 Start from a random perturbation inside the ε-ball around a sample
 Take a gradient step in the direction of greatest loss
 Project the perturbation back into the ball if necessary
 Repeat steps 2–3 until convergence (see the sketch below)

[Figure: a perturbation δ is projected back onto the ε-ball, e.g. projection onto the ℓ∞ ball and onto the ℓ2 ball]
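A minimal sketch of the attack loop described above, written in PyTorch for an ℓ∞ ball. The helper name, the default ε/α/step values, and the [0, 1] pixel clamp are illustrative assumptions, not the authors' reference implementation:

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=0.3, alpha=0.01, steps=40):
    """PGD attack inside an l_inf ball of radius eps around x."""
    # 1. Start from a random perturbation inside the eps-ball
    delta = torch.empty_like(x).uniform_(-eps, eps)
    delta.requires_grad_(True)

    for _ in range(steps):
        # 2. Take a gradient step in the direction of greatest loss
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            # 3. Project back into the eps-ball and keep pixels in [0, 1]
            delta.clamp_(-eps, eps)
            delta.copy_(torch.clamp(x + delta, 0, 1) - x)
        delta.grad.zero_()

    return (x + delta).detach()
```

Projection onto the ℓ∞ ball is a coordinate-wise clamp; for an ℓ2 ball one would instead rescale δ to norm ε whenever it leaves the ball.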

PGD algorithm

 PGD is used as the inner attacker when training the Madry Defense model, i.e. the saddle-point problem

min_θ E_(x,y)∼D [ max_{‖δ‖∞ ≤ ε} L(θ, x + δ, y) ]

 ⇒ an ℓ∞-bounded attack

Baseline models (Source A)

 MNIST
 CNN: 2 convolutional layers with 32 and 64 filters, 2 max-pooling layers, and a fully-connected layer with 1024 units (see the sketch below)
 ε = 0.3 (ℓ∞), 100,000 training iterations

 CIFAR-10
 ResNet model
 ε = 8 (ℓ∞, on the 0–255 pixel scale), 100,000 training iterations

Evaluation

 Attack models used for evaluation:


 White-box attacks with PGD, varying the number of iterations and restarts or the loss function (A)
 Black-box attacks from an independently trained copy of the network (A')
 Black-box attacks from a different CNN architecture (B)

Evaluation on MNIST

White-box attacks

Black-box attacks

 The Madry defense model works well against these transferable attacks, achieving considerably high accuracy.

Evaluation on CIFAR-10

White-box attacks

Black-box attacks

 The Madry defense model works well against transferable attacks.
 It fails to reach the same performance as the model trained on MNIST.

Is the Madry Defense Model that robust?

Resistance against different ℓ∞- and ℓ2-bounded attacks

 MNIST experiment:
 Types of attacks:
 ℓ∞ PGD with 100 steps, ε increasing from 0 to 0.5
 Decision-Based Attack (DBA) (Brendel et al. 2017) with 2000 steps
 ℓ2 PGD with 100 steps, ε increasing from 0 to 6

 Defense models to evaluate:
 Adversarially trained model against ℓ∞ PGD with ε = 0.3 (baseline model A)
 Standard (naturally) trained model with the same architecture as A

Resistance against different ℓ∞- and ℓ2-bounded attacks

 Works well against ℓ∞ PGD attacks for ε up to the training budget (ε ≤ 0.3).
 Robust against DBA attacks.
 Outperforms the naturally trained model.
 Poor robustness against ℓ2-bounded PGD attacks and against ℓ∞ PGD attacks with large ε.
Resistance against different ℓ∞- and ℓ2-bounded attacks

 CIFAR-10 experiment:
 Types of attacks:
 ℓ∞ PGD with 100 steps, ε increasing from 0 to 30
 ℓ2 PGD with 100 steps, ε increasing from 0 to 100

 Defense model to evaluate:
 Adversarially trained model against ℓ∞ PGD with ε = 8 (baseline model A)

Resistance against different ℓ∞- and ℓ2-bounded attacks

 Poor robustness against ℓ∞ attacks with large ε.
 Poor robustness against ℓ2-bounded attacks.
 Lower performance compared to the MNIST model.

Resistance against different ℓ∞- and ℓ2-bounded attacks

 Conclusion:
 The Madry Defense model only achieves considerably high accuracy against ℓ∞-bounded adversaries with ε ≤ 0.3.
 The model underperforms against ℓ2-bounded attacks: at those budgets the perturbations become large enough to visibly change the image and may even change the ground-truth label
→ visual distortion

 Sample adversarial examples with ℓ2 norm bounded by 4

Weaknesses of Madry Defense Model

Weaknesses of Madry Defense Model

Experiments in Schott et al. (2018) [2]

 For each model and each norm, measure how the model's accuracy decreases as the adversarial perturbation size increases
 Models used:
 CNN
 Madry defense model
 Nearest Neighbor
 Analysis by Synthesis model (ABS)
 Binary models (Binary CNN, Binary ABS), which preprocess images into binary inputs (input binarization; see the sketch below)
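Input binarization itself is a one-line preprocessing step; a sketch with an assumed threshold of 0.5:

```python
import torch

def binarize(x, threshold=0.5):
    """Map grayscale pixels to {0, 1} before feeding them to the classifier."""
    return (x > threshold).float()
```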

Experiments in Schott et al. (2018)

 The Madry defense model performs poorly against ℓ2-bounded and ℓ0-bounded attacks (in some cases even worse than a standard CNN).
 In the ℓ∞ case, the input-binarization models are more robust than the Madry model for large perturbations.
→ The model overfits to the ℓ∞ metric.

 The Madry defense model achieves strong performance on MNIST largely because of the quasi-binary nature of the dataset.

Experiments in Schott et al. (2018)

 Robustness against unrecognizable images:


 Unrecognizable images [5]:
 Also known as distal adversarials, rubbish-class examples, or fooling images
 Images that do not resemble anything from the training set and typically look like noise, yet are classified by the model with high confidence

Experiments in Schott et al. (2018)

 Compare the behavior of the CNN, the Madry defense model, and the ABS model when generating a fooling image for a fixed label using gradient ascent (see the sketch below).

 The Madry model readily assigns a wrong label to such unrecognizable images → it is more vulnerable to distal adversarials.

[Figure: generated images that are classified as 'one' with a probability above 90%]
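A rough sketch of the gradient-ascent procedure for such fooling images, assuming we start from uniform noise and maximize the log-probability of a fixed target class; step size and iteration count are placeholders:

```python
import torch
import torch.nn.functional as F

def fooling_image(model, target_class, shape=(1, 1, 28, 28), steps=200, lr=0.1):
    """Start from noise and run gradient ascent on the target-class log-probability."""
    x = torch.rand(shape, requires_grad=True)
    for _ in range(steps):
        log_prob = F.log_softmax(model(x), dim=1)[0, target_class]
        log_prob.backward()
        with torch.no_grad():
            x += lr * x.grad      # ascend: increase the target-class score
            x.clamp_(0, 1)        # keep pixels in the valid range
        x.grad.zero_()
    return x.detach()
```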

Weaknesses of Madry Defense Model

Weaknesses of Madry Defense Model

 From Sharma et al. (2018) [3]:

 The PGD attacks used to train and evaluate the Madry model were constrained to perturb each input by at most ε under the ℓ∞ distortion metric.
→ Reduces the power of the attacks
→ Imposes unrealistic constraints on attackers
→ Weakens the significance of the robustness claim

Sharma et al. 2018 results

 The Madry model achieves poor performance once the attacker is allowed perturbations beyond the ℓ∞ training budget.

 PGD generates adversarial examples with a high level of visual distortion at these larger ε.

Sharma et al. 2018 results

 Elastic-net attack to deep neural networks (EAD):

 Generalizes the C&W attack by combining ℓ1 and ℓ2 regularization
 Formulation: minimize c · f(x, t) + β · ‖x − x₀‖₁ + ‖x − x₀‖₂²  subject to  x ∈ [0, 1]^p

o Increasing κ increases the required margin between the predicted score of the target class and those of the remaining classes.
o Therefore, increasing κ improves transferability but compromises visual quality
→ more reliable (more transferable) adversarial examples
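A sketch of the two ingredients under the formulation above: the C&W-style margin with confidence κ and the elastic-net distortion term. The constant names c and β follow the formula; the functions below are an illustration, not the authors' code:

```python
import torch

def cw_margin(logits, target, kappa=0.0):
    """Targeted C&W margin f(x, t): push the target logit above all others by at least kappa."""
    target_logit = logits[0, target]
    mask = torch.ones(logits.size(1), dtype=torch.bool)
    mask[target] = False
    other_max = logits[0, mask].max()
    return torch.clamp(other_max - target_logit, min=-kappa)

def ead_objective(logits, target, x_adv, x_orig, c=1.0, beta=1e-2, kappa=0.0):
    """Elastic-net objective: c * f(x, t) + beta * ||x - x0||_1 + ||x - x0||_2^2."""
    diff = (x_adv - x_orig).flatten()
    return c * cw_margin(logits, target, kappa) + beta * diff.abs().sum() + (diff ** 2).sum()
```

The ℓ1 term encourages sparse perturbations (few changed pixels), which is what distinguishes EAD from purely ℓ2- or ℓ∞-driven attacks.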

Sharma et al. 2018 results

 Adversarial examples generated by EAD show distortion comparable to PGD-generated ones, but with better visual quality.

Sharma et al. 2018 results

 Explanation:

 Comparing three attacks that reach the same attack success rate (ASR):
- PGD and I-FGM have slightly smaller ℓ∞ distortion but much larger ℓ1 and ℓ2 distortion, leading to greater visual distortion
→ the examples lose their adversarial (imperceptible) nature.

 This illustrates the drawback of using ℓ∞ distortion as the sole distortion metric in the Madry model.

Weaknesses of Madry Defense Model

 Running-time complexity of PGD adversarial training

→ with M minibatches per epoch and N PGD steps per example, the number of gradient computations is O(MN) per epoch

→ roughly N times slower than standard training, which needs O(M) gradient computations per epoch

Weaknesses of Madry Defense Model

 Fast Adversarial Training [4]

 Idea: combine FGSM adversarial training + random initialization + DAWNBench techniques such as cyclic learning rates

→ Each epoch costs only about twice the number of gradient computations of standard training (see the sketch below)

→ Cyclic learning rates reduce the number of epochs needed
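A minimal sketch of the FGSM-with-random-start step: one attack gradient plus one weight-update gradient per batch. The ε and α values are typical CIFAR-10 settings and are assumptions; the cyclic learning-rate schedule would be handled by a separate scheduler:

```python
import torch
import torch.nn.functional as F

def fast_adv_step(model, x, y, optimizer, eps=8/255, alpha=10/255):
    """FGSM adversarial training step with a random start (in the spirit of Wong et al. 2020)."""
    # Random initialization inside the eps-ball, then a single FGSM step
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    with torch.no_grad():
        delta += alpha * delta.grad.sign()
        delta.clamp_(-eps, eps)
    # One weight update on the adversarial batch:
    # only two gradient computations per batch in total (attack + update)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta.detach()), y)
    loss.backward()
    optimizer.step()
```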

Weaknesses of Madry Defense Model

 Result:

[Table: time to train a CIFAR-10 classifier to 45% robust accuracy using various adversarial training methods, with and without the DAWNBench techniques of cyclic learning rates and mixed-precision arithmetic]

Summary
 Key problems of the Madry Defense Model: since PGD generates attack samples independently for each data sample, it does not necessarily lead to good generalization in terms of risk minimization.
 Overfits to the ℓ∞ metric.
 Vulnerable to unrecognizable images (distal adversarials).
 Adversarial examples show a high level of visual distortion at large ε.
→ Can be addressed with optimization-based approaches (ABS, EAD)
 Runtime problem: O(MN) gradient computations in a single epoch
→ Can be resolved with Fast Adversarial Training, using FGSM combined with the DAWNBench techniques

References

[1] Madry et al. (2017). Towards Deep Learning Models Resistant to Adversarial Attacks. [Link]
[2] Schott et al. (2018). Towards the First Adversarially Robust Neural Network Model on MNIST. [Link]
[3] Sharma et al. (2018). Attacking the Madry Defense Model with L1-based Adversarial Examples. [Link]
[4] Wong et al. (2020). Fast is Better than Free: Revisiting Adversarial Training. [Link]
[5] Nguyen et al. (2014). Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. [Link]

Questions?

