DOI: 10.1049/ipr2.12325
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is
properly cited.
© 2021 The Authors. IET Image Processing published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology
FIGURE 1 The overall process of our approach. The user submits the training dataset to the cloud platform. A malicious cloud platform may poison the training dataset and train a backdoor model that performs well on clean data but poorly on poisoned data. The user therefore receives a backdoor model from the malicious cloud platform together with the original training dataset. To mitigate the poison attack, the user randomly selects a small subset of the shuffled training dataset and retrains the backdoor model with PGD adversarial training and Boundary Augment, obtaining a mitigated model
methods are proposed to defend against these attacks, including empirical backdoor defenses and certified backdoor defenses. The empirical backdoor defenses often work and perform better in practice, although they lack theoretical evidence. In contrast, the certified defenses are more reliable. Moreover, the empirical defenses are more easily bypassed by strong attacks, which means that empirical defenses have poor general capacity against attacks designed to get past them. Compared with the empirical defenses, the certified defenses work better and show excellent general capacity.

We observe that an increasing number of poison attacks use global poisoned patterns to stay stealthy. [5, 16, 17] pointed out that fixed local trigger patterns can easily be found through manual inspection. Besides, [4] demonstrated that backdoors are clearly visible in the heatmaps of convolutional filters, and [18] used optimization and reverse engineering to recover modified triggers effectively. However, the reverse-engineered triggers still show up in the heatmaps of convolutional filters, although their magnitude may be small. In Section 3, we introduce the global pattern trigger and the local pattern trigger in detail. Furthermore, different attack methods, including local trigger pattern methods and global trigger pattern methods, will be compared in this paper.

Figure 1 shows the whole process of our approach. The user entrusts the third-party platform to train the model, and the malicious third-party platform returns a malicious model with a backdoor. The user can then retrain the model with little data, following the approaches in our paper: PGD adversarial training [19] and the Boundary Augment approach. The Boundary Augment approach uses a modified DeepFool algorithm [20] to perform adversarial training and then runs data augmentation. We explain these two methods in Section 4 in more detail. Our contributions are as follows:

∙ We propose a novel defense method belonging to certified defense, and this approach can defend against most poison attacks.
∙ We implement our defense on various datasets, including large-scale datasets, and it is easy to apply to different neural network architectures.
∙ We modify the DeepFool algorithm [20] to search for the distribution of clean and adversarial examples and to estimate the manifold of poisoned data.

2 RELATED WORK

[23] proposed an attack on SVMs that injects specially crafted training data to increase the test error of the SVM. [4] introduced a similar attack, different from the adversarial (evasion) attack [24]: it adds special patterns to the training data, and it demonstrated that the BadNets attack can change the meaning of traffic signs by
FIGURE 2 The first image is the original one. The second follows [4]: it becomes poisoned data by adding a small trigger pattern. The third installs a trigger via adversarial perturbation and then adds a small, slightly visible trigger pattern [16]. The fourth follows the Embedding backdoor [21], which looks the same as the BadNets attack but mainly modifies the target model. The next one uses a ramp signal as a global pattern to reduce the target accuracy [22]. The last one uses a physical reflection model to change the target object slightly. Compared with the others, the trigger pattern of the Refool attack is clearly the hardest to see [5]
installing a backdoor into the target model. In addition, [25] used reverse engineering to generate a general trigger even without knowledge of the datasets. However, the triggers generated by such an approach are local trigger patterns and lack stealthiness, although this attack strategy is powerful. [16] proposed a clean-label attack that modifies the BadNets poison attack. Clean-label attacks have two approaches: one utilizes a GAN [26] to generate poisoned data, and the other adds adversarial perturbations to target objects, which makes the poisoned data seem benign to human vision. Besides, [21] raised a stronger attack compared to BadNets, and [27] came up with a method based on steganography and regularization to apply invisible triggers. More recently, [8] proposed a hidden-trigger poison attack that hides the trigger most of the time and reveals it only at test time, when it is too late to defend. [7] proposed a novel backdoor method that corrupts only target samples, which benefits the stealthiness of the poisoned data. However, this method is also easy to detect because the pattern is fixed, although it uses global patterns. [5] utilized physical reflection models to implement a poison attack. Compared to local trigger patterns, this attack approach is much stronger and more stealthy. Recently, research on poison attacks has become abundant. [28] proposed to use a small and smooth warping field to generate poisoned examples, which makes the modification unnoticeable to a human being. [29] proposed how to poison federated learning with semantic attack approaches. [30] also explored the related idea of semantic poison attacks. Semantic poison attacks do not need to modify the inputs and instead utilize high-level semantic information of the inputs as triggers. Because the real physical world is complex and full of uncertainty, which constrains poison attacks, [31] proposed a novel attack method that increases the attack success rate in the physical world through physical transformations. [32] first introduced the poison attack for self-supervised learning. In a word, as the understanding of poison attacks deepens, many more poison attacks will appear, but at the same time there will also be many defense methods against these attacks. Figure 2 shows the results of some of these attacks.

In earlier research, [4] proposed a detection method showing that the convolutional filters of the first layer and some neurons belonging to the backdoor model are activated differently in comparison to the benign model. [33] explored the spectrum of the covariance of the feature representations of poisoned and clean data, used SVD decomposition to compute an outlier score for each example, chose which examples to remove and retrained the backdoor model. Similarly to [33], [34] proposed an activation clustering approach to detect poisoned data, whose intuitive idea is to cluster the activations of examples into two clusters, one poisoned and one benign. However, [35] and [36] successfully attacked the activation clustering approach, claiming that this detection method is not robust enough to detect some particular poisoned data. Recently, [37] proposed a detection method that utilizes adversarial perturbations to detect poisoned examples. [38] claimed that detection in the frequency domain is a feasible way. Without access to the training dataset, [39] proposed an unsupervised anomaly detection method to defend against poison attacks. Moreover, [40] proposed a black-box backdoor detection method that identifies poison attacks with only limited query access to the model and also makes use of reverse engineering to restore the triggers for each class, similar to Neural Cleanse [18].

In addition to these detection methods, there also exist many active defense methods. [41] proposed a defense scheme combining fine-tuning and pruning: the user first prunes the model and then fine-tunes the pruned model for accuracy. Besides, [18] introduced the Neural Cleanse method, which identifies backdoors and reconstructs possible triggers; three mitigation schemes, input filters, neuron pruning and unlearning, are proposed to effectively improve the robustness against the poison attack. [42] studied transformation-based preprocessing approaches that are practical against many SOTA poison attacks. However, [18] tends to assume that the defender knows the attack method coming from the adversary, which restricts the application of such a defense scheme. [43] perturbed the input by superimposing various image patterns and observed the randomness of predicted classes for the perturbed inputs; the input is more likely to be malicious if the entropy is low, and benign otherwise. [44] claimed that adversarial training can mitigate the risk of poisoned data, showing that minimizing the adversarial risk on the poisoned data is equivalent to optimizing an upper bound of the natural risk on the original data. [45] also proposed to use an adversarial training approach to train the model and improve its robustness. [46] showed that tight hyperparameters could
FIGURE 4 We use the t-SNE approach to plot this figure with features of clean data, adversarial examples and poisoned data from the MNIST dataset. The red stars represent the clean data, the green circles represent adversarial examples obtained by perturbing the clean data with different approaches, and the blue triangles represent the poisoned data. We can see that the clean data all gather in a small area, while the poisoned data and the adversarial examples scatter across the whole space. The targeted class of the adversarial attack and the poison attack is set to 9
where t is the number of iterations, x is the input, y is the label, and 𝜃 denotes the model parameters. Besides, for PGD adversarial training, the optimization objective is as follows:

$$\arg\min_{\theta}\;\mathbb{E}_{(x,y)\sim D}\Big[\max_{\|x'-x\|_{\infty}\leq\epsilon} L(\theta, x', y)\Big].$$
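In practice, this min-max objective is approximated by generating PGD examples on the fly and taking gradient steps on them. A minimal PyTorch-style sketch of one training step, assuming an L∞ ball of radius 𝜖 (the step size, number of inner steps and 𝜖 below are illustrative, not the paper's exact settings):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.1, alpha=0.02, steps=10):
    """Inner maximisation: projected gradient ascent inside the L-infinity eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def pgd_training_step(model, optimizer, x, y, eps=0.1):
    """Outer minimisation: one step of arg min_theta E[max_x' L(theta, x', y)]."""
    x_adv = pgd_attack(model, x, y, eps=eps)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```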
[49] considered that adversarial examples are features, and that models perform poorly because they do not generalize to these adversarial examples. Thus, PGD adversarial training can be used to find more features of the inputs and force the models to learn them. The intuitive idea of our paper is that if the model learns more features, it becomes easier to distinguish benign from poisoned inputs. For poisoned data, adding triggers changes some of their features regardless of whether the trigger pattern is local or global. Figure 4(a) shows the distribution of PGD adversarial example features and poisoned data features. It is clear that the PGD adversarial example features are near the poisoned data features. The immediate conclusion is that we can retrain the models with the PGD adversarial training approach to improve their generalization and robustness. [44] proposed that adversarial training can be a principled defense against delusive poisoning, which supports our intuition, and we show the results of adversarial training in a later section. However, [44] only proved that adversarial training can defend against poison attacks and did not demonstrate which training approach is more effective. In the next part, we introduce a powerful adversarial training approach.

4.2 Boundary augment

In this section, we propose a novel training approach. The PGD adversarial training approach forces the models to learn more features and improves their generalization ability. However, in some regions the adversarial examples do not overlap enough with the poisoned data, so it is less effective there. Another intuitive idea is to search the decision boundary and deploy more data located at the boundary to enhance generalization. We assume that poisoned data only slightly changes the original input, since the poisoned data would be easy to detect if the inserted triggers were apparent. Thus, we can choose the data augmentation method to force the decision boundary to move in the direction of the poisoned data. Figure 5 shows the principle of this process: Figure 5(a) shows the decision boundary on clean data, Figure 5(b) shows the decision boundary of the backdoored classifier after data poisoning, and Figure 5(c) shows that we can find the decision boundary. We then retrain the model using data augmentation approaches, the decision boundary under the poison attack moves, the new decision boundary generalizes better, and the poisoned data points are contained in the original class.

DeepFool is an adversarial attack method [24, 49] based on hyperplane classification. The key idea of DeepFool is to compute the hyperplane and move the input x to the boundary of the hyperplane, where the classification result changes. DeepFool is shown as Algorithm 1. We suppose the poisoned data only slightly changes the original input, so the manifold distribution of the poisoned data is close to that of the benign data. The decision boundary of the model lies between the manifolds of the poisoned data and the benign data, because the classification result of the poisoned data changes. Inspired by the DeepFool algorithm [20], once we compute the hyperplane, the decision boundary is also found, so we can move the decision boundary and obtain a correct classification result for the poisoned data. Therefore, our approach can be interpreted as follows: we use the DeepFool algorithm to calculate the decision boundary. When we find the decision boundary adjacent to the target class, we push the data points towards the decision boundary. The decision boundary can then be generalized so that the poisoned data fall on the correct side of the hyperplane.
FIGURE 5 The left panel shows the decision boundary (solid gray line) of a benign model on clean data consisting of three classes. The middle panel shows the poison attack process: the red stars and green triangles are poisoned data from class 2 and class 3, and the decision boundary changes from the original one. On the right, we search the boundary and use data augmentation methods to adjust it: the solid gray line is the decision boundary under the poison attack, and the pink dashed line is the decision boundary after the retraining approach
ALGORITHM 1 DeepFool (steps 1–5, which initialize x_0 ← x, i ← 0 and compute w′_k and f′_k for every k ≠ k̂(x_0) while k̂(x_i) = k̂(x_0), follow [20])
6: end for
7: l̂ ← argmin_{k ≠ k̂(x_0)} |f′_k| / ‖w′_k‖_2
8: r_i ← (|f′_l̂| / ‖w′_l̂‖_2²) ⋅ w′_l̂
9: x_{i+1} ← x_i + r_i
10: i ← i + 1
11: end while
12: return r̂ = Σ_i r_i

FIGURE 6 The difference between the DeepFool algorithm and the Boundary Search, one part of the Boundary Augment algorithm. The green point is the clean example; the red circles and the red triangle are inputs after adding perturbations. In the first image of the Boundary Augment, the perturbation is λ_i ⋅ r_i (red arrow), but k̂(x_i + λ_i ⋅ r_i) ≠ k̂(x_0), so we adjust the step to λ_i ← λ_i/2 (black arrow) until k̂(x_i + λ_i ⋅ r_i) = k̂(x_0). The second image shows that k̂(x_{i+1} + λ_{i+1} ⋅ r_{i+1}) = k̂(x_0) in the next step. The final image shows that the perturbed inputs converge near the decision boundary. Thus, the step λ guarantees that the perturbed inputs do not cross the decision boundary while getting closer to it
To prevent the data points from greatly surpassing the decision boundary during training and decreasing the classification accuracy, we adjust the step length so that the perturbed data stay around the decision boundary without changing the classification results. We initialize the step λ as 1 and the threshold t as 10⁻⁴. If the threshold t is too large, the algorithm ends too early, resulting in a large error in the decision boundary; if it is too small, the algorithm is too slow to run the whole experiment. Figure 6 provides a simple schematic diagram. The DeepFool algorithm is only used to compute the decision boundary; our approach computes a more accurate boundary and introduces adversarial training to improve the robustness of the model and mitigate the attack. Our algorithm follows Algorithm 2; we call it the Boundary Augment algorithm. In Algorithm 2, k̂(⋅) follows the notation from [20]: k̂(x) = argmax_k f_k(x), where f_k(x) is the output of f(x) that corresponds to the kth class. Figure 4(b) shows the feature distribution of the Boundary Augment approach. The augmented data points are much closer to the poisoned data points compared to PGD adversarial training. In other words, the Boundary Augment approach is more powerful than the PGD adversarial training approach. To evaluate the SIG and Refool attack methods, we compute the accuracy of the retrained models and the ASR of the attacks on the GTSRB dataset. We choose high-quality images whose height and width are larger than 100. Figure 7 shows the results of PGD adversarial training and the Boundary Augment approach. It is clear that our approach outperforms PGD adversarial training.
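A minimal PyTorch-style sketch of this boundary search (a DeepFool direction followed by halving of the step λ); the iteration cap, single-example interface and small numerical constants are assumptions of the sketch, not the paper's exact implementation:

```python
import torch

def deepfool_direction(model, x, num_classes):
    """DeepFool-style step: direction r towards the closest class boundary [20]."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x.unsqueeze(0)).squeeze(0)
    k0 = logits.argmax().item()
    grads = [torch.autograd.grad(logits[k], x, retain_graph=True)[0]
             for k in range(num_classes)]
    best, r = None, None
    for k in range(num_classes):
        if k == k0:
            continue
        w_k = grads[k] - grads[k0]                    # w'_k
        f_k = (logits[k] - logits[k0]).detach()       # f'_k
        score = f_k.abs() / (w_k.norm() + 1e-8)
        if best is None or score < best:              # l_hat = argmin |f'_k| / ||w'_k||
            best = score
            r = (f_k.abs() / (w_k.norm() ** 2 + 1e-8)) * w_k
    return r.detach(), k0

def boundary_augment_example(model, x, num_classes, t=1e-4, max_iter=50):
    """Move x towards the decision boundary without crossing it: the step lambda
    is halved whenever the prediction would flip, and the search stops once
    lambda falls below the threshold t."""
    x_aug = x.clone().detach()
    lam = 1.0
    with torch.no_grad():
        k0 = model(x_aug.unsqueeze(0)).argmax(dim=1).item()
    for _ in range(max_iter):
        if lam <= t:
            break
        r, _ = deepfool_direction(model, x_aug, num_classes)
        with torch.no_grad():
            candidate = x_aug + lam * r
            if model(candidate.unsqueeze(0)).argmax(dim=1).item() == k0:
                x_aug = candidate                     # prediction unchanged: accept the step
            else:
                lam = lam / 2.0                       # would cross the boundary: halve lambda
    return x_aug                                      # lies close to, not across, the boundary
```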
4.3 Ensemble training

Although retraining the model with the PGD adversarial training and boundary search approaches is effective in defending against poison attacks, we find that attacks under the global
4.3.3 PatchShuffle
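PatchShuffle [51] randomly permutes the pixels inside small non-overlapping patches, which keeps the global shape of the image while intuitively disturbing the local structure that fixed triggers rely on. A minimal sketch, assuming image sides divisible by the patch size and using one shared permutation per call for brevity (the original formulation draws a fresh permutation per patch with a given probability):

```python
import torch

def patch_shuffle(x, patch_size=2):
    """Shuffle the pixels inside each non-overlapping patch of a batch x of
    shape (N, C, H, W); H and W are assumed to be divisible by patch_size."""
    n, c, h, w = x.shape
    p = patch_size
    # split H and W into (H/p, p) and (W/p, p), then flatten each p*p patch
    patches = x.reshape(n, c, h // p, p, w // p, p).permute(0, 1, 2, 4, 3, 5)
    patches = patches.reshape(n, c, h // p, w // p, p * p)
    perm = torch.randperm(p * p, device=x.device)
    patches = patches[..., perm]                  # permute pixels within every patch
    # fold the shuffled patches back into image layout
    patches = patches.reshape(n, c, h // p, w // p, p, p).permute(0, 1, 2, 4, 3, 5)
    return patches.reshape(n, c, h, w)
```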
TABLE 1 The MNIST results. Acc(b): accuracy on benign data; Acc(p): accuracy on poisoned data; ASR: attack success rate; BA: Boundary Augment. The three column groups correspond to the BadNets, CleanLabel and Embedding attacks (the first three attacks of Section 5.2).

Defense             | Benign Acc(b) | BadNets Acc(b)/Acc(p)/ASR | CleanLabel Acc(b)/Acc(p)/ASR | Embedding Acc(b)/Acc(p)/ASR
Fine-Pruning        | 99.2 | 96.5 / 91.0 / 1.2  | 92.0 / 88.3 / 4.8  | 94.5 / 90.2 / 4.6
Neural Cleanse      | 99.2 | 97.2 / 92.4 / 0.8  | 94.3 / 92.4 / 3.6  | 94.1 / 90.7 / 4.0
Transformed-based   | 99.2 | 96.1 / 40.6 / 0.8  | 92.5 / 26.2 / 1.1  | 93.5 / 31.2 / 0.9
ConFoc              | 99.2 | 96.1 / 86.2 / 12.5 | 90.2 / 82.1 / 17.0 | 91.2 / 83.3 / 15.2
BA+ShrinkPad (ours) | 99.2 | 97.8 / 97.2 / 0.3  | 97.2 / 97.1 / 0.5  | 97.3 / 97.2 / 0.6
TABLE 2 The CIFAR-10 results. Acc(b): accuracy on benign data; Acc(p): accuracy on poisoned data; ASR: attack success rate; BA: Boundary Augment. The three column groups correspond to the BadNets, CleanLabel and Embedding attacks.

Defense             | Benign Acc(b) | BadNets Acc(b)/Acc(p)/ASR | CleanLabel Acc(b)/Acc(p)/ASR | Embedding Acc(b)/Acc(p)/ASR
Fine-Pruning        | 91.5 | 89.1 / 81.0 / 2.1  | 87.3 / 79.6 / 4.4  | 88.1 / 82.9 / 3.8
Neural Cleanse      | 91.5 | 89.8 / 83.4 / 1.5  | 88.1 / 84.0 / 4.7  | 89.2 / 84.2 / 3.5
Transformed-based   | 91.5 | 88.6 / 31.8 / 1.8  | 89.9 / 20.3 / 1.1  | 89.8 / 28.6 / 2.7
ConFoc              | 91.5 | 88.3 / 82.1 / 15.2 | 86.1 / 82.2 / 16.5 | 86.3 / 85.3 / 14.4
BA+ShrinkPad (ours) | 91.5 | 90.3 / 90.4 / 0.9  | 90.2 / 90.1 / 1.2  | 89.8 / 88.4 / 1.5
examples and a testing set of 10K examples. The CIFAR-10 dataset has 10 classes with 3-channel images; however, its image size is smaller, only 32×32×3. It also has a training set of 50K examples and a testing set of 10K examples. The GTSRB dataset is a multi-class, single-image classification dataset with 43 classes and more than 50,000 images in total. The images of GTSRB range from small to large, close to a practical setting. The ImageNet dataset [53] is a large-scale image dataset with 1000 classes, consisting of millions of images.

These datasets are benchmark datasets, and each class is balanced. Thus, in our paper, we randomly shuffle the whole training dataset and choose a small amount of training data to construct a subset training dataset.
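A minimal sketch of this subset construction, assuming a PyTorch-style dataset; the 20% fraction matches the settings reported in Section 5.4:

```python
import torch
from torch.utils.data import Subset

def random_retraining_subset(dataset, fraction=0.2, seed=0):
    """Shuffle the training set and keep a small fraction for retraining."""
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(len(dataset), generator=g).tolist()
    return Subset(dataset, perm[: int(fraction * len(dataset))])
```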
TABLE 3 The GTSRB results. Acc(b): accuracy on benign data; Acc(p): accuracy on poisoned data; ASR: attack success rate; BA: Boundary Augment; ET: ensemble training methods. Each column group reports Acc(b)/Acc(p)/ASR under one attack.

Defense             | Benign Acc(b) | Acc(b)/Acc(p)/ASR | Acc(b)/Acc(p)/ASR | Acc(b)/Acc(p)/ASR
Fine-Pruning        | 97.8 | 92.3 / 84.1 / 3.5 | 90.1 / 80.2 / 10.1 | 91.2 / 83.1 / 4.2
Neural Cleanse      | 97.8 | 93.1 / 86.7 / 2.1 | 88.0 / 86.2 / 7.3  | 90.8 / 85.2 / 3.6
Transformed-based   | 97.8 | 93.4 / 20.1 / 4.2 | 94.3 / 17.2 / 3.7  | 92.8 / 23.1 / 4.2
ConFoc              | 97.8 | 93.5 / 92.5 / 4.6 | 93.3 / 91.9 / 6.1  | 94.0 / 92.9 / 4.5
BA+ET (ours)        | 97.8 | 96.2 / 95.1 / 1.2 | 96.2 / 95.3 / 2.3  | 96.7 / 95.9 / 3.4
5.2 Attacks

We evaluate our defense approach against five attacks:

∙ BadNets: The poison attack from [4], which adds pixel triggers or small images to the target images and causes misclassification (a minimal sketch of this trigger-pasting is given after this list).
∙ CleanLabel: The poison attack from [16], which uses a GAN or adversarial perturbations to enhance the strength of the poison attack. In this paper, we choose adversarial perturbation as the primary attack method.
∙ Embedding: The poison attack from [21], which trains a discriminator feature network to decrease the detection ratio of poisoned example features.
∙ SIG: The poison attack from [22], which aims to reduce the target accuracy.
∙ Refool: The poison attack from [5], which adds a background image to the targeted image with a physical reflection model. This method has strong attacking ability and stealthiness.
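To make the first attack concrete, a minimal sketch of BadNets-style poisoning as described in [4]: a small patch is pasted onto a fraction of the training images and their labels are flipped to the target class. The patch value, size, corner position and the poison ratio below are illustrative assumptions:

```python
import torch

def badnets_poison(x, y, target_class, poison_ratio=0.2, trigger_size=3):
    """Paste a small white square in the bottom-right corner of a random subset
    of images x (N, C, H, W) in [0, 1] and relabel them to target_class."""
    x_p, y_p = x.clone(), y.clone()
    idx = torch.randperm(x.shape[0])[: int(poison_ratio * x.shape[0])]
    x_p[idx, :, -trigger_size:, -trigger_size:] = 1.0   # the trigger pattern
    y_p[idx] = target_class                             # poisoned labels point to the target class
    return x_p, y_p
```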
5.3 Defenses

In this paper, we adopt four defense methods to compare with our approach:

∙ Fine-Pruning: The defense from [41], which combines fine-tuning and pruning.
∙ Neural Cleanse: The defense from [18], which reconstructs triggers and mitigates the backdoor with three schemes.
∙ Transformed-based: The defense from [42], which perturbs the input by superimposing various image patterns and observes the randomness of the predicted classes for the perturbed inputs (a ShrinkPad-style transformation is sketched after this list).
∙ ConFoc: The defense from [11], which uses style transfer to retrain the model and improve its robustness.
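The transformation-based defenses above, and the ShrinkPad-4 preprocessing used together with our approach in Section 5.4, rework the input before classification. A minimal sketch of a ShrinkPad-style transform in the spirit of [42]; the shrink amount, bilinear resizing and zero-padding at a random offset are assumptions of this sketch:

```python
import torch
import torch.nn.functional as F

def shrink_pad(x, shrink=4):
    """Resize the batch x (N, C, H, W) down by `shrink` pixels, then zero-pad it
    back to the original size at a random offset, displacing any fixed trigger."""
    n, c, h, w = x.shape
    small = F.interpolate(x, size=(h - shrink, w - shrink),
                          mode="bilinear", align_corners=False)
    top = int(torch.randint(0, shrink + 1, (1,)))
    left = int(torch.randint(0, shrink + 1, (1,)))
    out = torch.zeros_like(x)
    out[:, :, top:top + h - shrink, left:left + w - shrink] = small
    return out
```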
5.4 Evaluation

5.4.1 MNIST

In this part, we run the experiment on the MNIST dataset and set the poison ratio to 0.2. For the BadNets attack, a pattern poison attack is chosen. For the CleanLabel attack, we set 𝜖 of the PGD attack to 0.15, also with a pattern attack, and we use a pattern backdoor for the Embedding attack. We implement the first three attacks to evaluate our defense approach on the MNIST dataset. For Fine-Pruning, fine-tuning and pruning 30% of the neurons is the basic setting. For the transformation-based defense, we use the ShrinkPad-4 method. For ConFoc and our approach, we use 20% of the training data to retrain the model, and we combine Boundary Augment and ShrinkPad as our approach. The result is shown in Table 1.

Remark: Our approach outperforms the other methods, especially in the accuracy on poisoned data. This result supports our idea that the Boundary Augment algorithm can fit the distribution of poisoned data well. Besides, the transformation-based method performs poorly on poisoned data, although it can decrease the ASR. Moreover, the ASR of the ConFoc method is also too high. This is because ConFoc aims to alleviate texture bias, but data from the MNIST dataset is only weakly biased towards texture, so ConFoc has little effect on the triggers.

5.4.2 CIFAR-10

We set the poison ratio to 0.2 on the CIFAR-10 dataset. Similar to the MNIST dataset, we choose the pattern poison attack and set 𝜖 of the PGD attack to 0.1. In this experiment, we also run the first three attacks. For ConFoc and our approach, we use 20% of the training data to retrain the models. Moreover, we choose ResNet50 as the base model for transfer learning on CIFAR-10. The result is shown in Table 2.

Remark: On the CIFAR-10 dataset, our approach is also better than the other methods. Similar to the result on the MNIST dataset, the transformation-based method and the ConFoc method have similar performances. One important reason is that the MNIST and CIFAR-10 datasets are small-scale datasets, and deep learning models trained on them are biased towards shape differently from models trained on large-scale datasets.

5.4.3 GTSRB

We set the poison ratio to 0.2 on the GTSRB dataset and choose high-quality images whose height and width are larger than 100. To increase the strength of BadNets and some other attacks, we use small images of size 10 × 10 as triggers and set 𝜖 of the PGD attack to 5/255. The background images for the Refool attack are taken from the ILSVRC dataset. For ConFoc and our approach, we use 20% of the training data to retrain the models, and we use each method from ensemble training together with the Boundary Augment approach to retrain the model. Moreover, ResNet50 is the base model for transfer learning on GTSRB. The result is shown in Table 3.

Remark: [5] proposed that the Refool attack is a powerful attack method that outperforms several others. In this experiment, we can see that our approach is robust enough to defend against such an attack. The main reason is that the ensemble training methods reduce the non-robust features and thereby lower the capacity of the triggers, and then Boundary Augment fits the poisoned data better.
TABLE 4 The ImageNet results. Acc(b): accuracy on benign data; Acc(p): accuracy on poisoned data; ASR: attack success rate; BA: Boundary Augment; ET: ensemble training methods. Each column group reports Acc(b)/Acc(p)/ASR under one attack.

Defense             | Benign Acc(b) | Acc(b)/Acc(p)/ASR | Acc(b)/Acc(p)/ASR | Acc(b)/Acc(p)/ASR
Fine-Pruning        | 75.1 | 68.2 / 60.8 / 12.3 | 68.5 / 57.7 / 20.6 | 67.9 / 51.8 / 25.1
Neural Cleanse      | 75.1 | 70.4 / 62.5 / 8.9  | 71.3 / 63.3 / 12.7 | 71.0 / 65.7 / 14.3
Transformed-based   | 75.1 | 71.2 / 61.1 / 4.7  | 70.6 / 69.5 / 9.7  | 72.1 / 68.4 / 12.1
ConFoc              | 75.1 | 71.1 / 66.2 / 5.1  | 70.3 / 61.1 / 9.9  | 71.1 / 63.4 / 8.1
BA+ET (ours)        | 75.1 | 72.4 / 68.9 / 2.2  | 71.5 / 66.9 / 5.7  | 71.6 / 67.0 / 7.8
Per-attack results (Acc(b) / Acc(p) / ASR) for the ensemble training components ShrinkPad, Feature Squeezing and PatchShuffle:

ShrinkPad           | 97.1 / 95.6 / 1.2  | 94.3 / 93.9 / 0.9 | 94.4 / 94.0 / 0.7
Feature Squeezing   | 96.8 / 1.1 / 98.7  | 96.8 / 1.7 / 98.1 | 96.8 / 2.5 / 97.0
PatchShuffle        | 97.4 / 96.9 / 1.3  | 97.5 / 97.1 / 1.1 | 97.8 / 97.0 / 2.3

ShrinkPad           | 89.1 / 87.6 / 1.4  | 88.3 / 87.7 / 1.0 | 89.4 / 87.3 / 1.9
Feature Squeezing   | 89.4 / 3.4 / 96.3  | 89.4 / 3.7 / 95.9 | 89.4 / 3.1 / 96.2
PatchShuffle        | 86.5 / 85.9 / 0.8  | 87.2 / 86.1 / 1.5 | 87.8 / 85.9 / 1.1

ShrinkPad           | 95.3 / 92.5 / 2.3  | 96.1 / 92.4 / 2.6 | 94.7 / 92.8 / 1.1
Feature Squeezing   | 95.1 / 2.7 / 96.5  | 95.1 / 3.1 / 95.8 | 95.1 / 3.3 / 95.1
PatchShuffle        | 94.1 / 93.3 / 1.5  | 93.4 / 93.5 / 1.7 | 93.7 / 93.4 / 0.9
Results under the SIG and Refool attacks for the ensemble training components:

Method              | SIG Acc(b)/Acc(p)/ASR | Refool Acc(b)/Acc(p)/ASR
ShrinkPad           | 94.3 / 92.1 / 4.1 | 94.8 / 42.2 / 45.2
Feature Squeezing   | 95.1 / 8.8 / 93.4 | 95.1 / 55.6 / 39.7
PatchShuffle        | 96.3 / 95.1 / 2.5 | 95.8 / 62.7 / 32.9
Per-attack results (Acc(b) / Acc(p) / ASR) for the ensemble training components:

ShrinkPad           | 71.4 / 68.2 / 3.5 | 70.9 / 65.7 / 4.6 | 71.2 / 66.8 / 3.1
Feature Squeezing   | 67.4 / 3.4 / 96.1 | 67.4 / 3.1 / 94.7 | 67.4 / 3.3 / 95.9
PatchShuffle        | 71.2 / 65.4 / 2.7 | 70.6 / 66.5 / 4.1 | 72.1 / 67.2 / 4.4
10. Liu, Y., et al.: Neural trojans. In: 2017 IEEE International Conference on Computer Design (ICCD), pp. 45–48. IEEE, Piscataway (2017)
11. Villarreal-Vasquez, M., Bhargava, B.: ConFoc: Content-focus protection against trojan attacks on neural networks. arXiv, abs/2007.00711, 2020
12. Cheng, H., et al.: Defending against backdoor attack on deep neural networks. arXiv, abs/2002.12162, 2020
13. Aiken, W., et al.: Neural network laundering: Removing black-box backdoor watermarks from deep neural networks. arXiv, abs/2004.11368, 2020
14. Subedar, M., et al.: Deep probabilistic models to detect data poisoning attacks. arXiv, abs/1912.01206, 2019
15. Jin, K., et al.: A unified framework for analyzing and detecting malicious examples of DNN models. arXiv, abs/2006.14871, 2020
16. Turner, A., et al.: Clean-label backdoor attacks. MIT (2018)
17. Zhu, C., et al.: Transferable clean-label poisoning attacks on deep neural nets. arXiv, abs/1905.05897, 2019
18. Wang, B., et al.: Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 707–723. IEEE, Piscataway (2019)
19. Madry, A., et al.: Towards deep learning models resistant to adversarial attacks. arXiv, abs/1706.06083, 2018
20. Moosavi-Dezfooli, S.-M., et al.: DeepFool: A simple and accurate method to fool deep neural networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2574–2582. IEEE, Piscataway (2016)
21. Tan, T., Shokri, R.: Bypassing backdoor detection algorithms in deep learning. In: 2020 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 175–183. IEEE, Piscataway (2020)
22. Barni, M., et al.: A new backdoor attack in CNNs by training set corruption without label poisoning. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 101–105. IEEE, Piscataway (2019)
23. Biggio, B., et al.: Poisoning attacks against support vector machines. In: ICML. Springer, Cham (2012)
24. Szegedy, C., et al.: Intriguing properties of neural networks. arXiv, abs/1312.6199, 2013
25. Liu, Y., et al.: Trojaning attack on neural networks. In: NDSS. Curran Associates, Inc., Red Hook (2018)
26. Goodfellow, I.J., et al.: Generative adversarial nets. In: NIPS. ACM, New York (2014)
27. Li, S., et al.: Invisible backdoor attacks on deep neural networks via steganography and regularization. IEEE Trans. Dependable Secure Comput. (2019)
28. Nguyen, A.M., Tran, A.: WaNet - imperceptible warping-based backdoor attack. arXiv, abs/2102.10369, 2021
29. Bagdasaryan, E., et al.: How to backdoor federated learning. PMLR 108, 2938–2948 (2020)
30. Li, Y., et al.: Hidden backdoor attack against semantic segmentation models. arXiv, abs/2103.04038, 2021
31. Xue, M., et al.: Robust backdoor attacks against deep neural networks in real physical world. arXiv, abs/2104.07395, 2021
32. Saha, A., et al.: Backdoor attacks on self-supervised learning (2021)
33. Tran, B., et al.: Spectral signatures in backdoor attacks. In: NeurIPS. Springer, Cham (2018)
34. Chen, B., et al.: Detecting backdoor attacks on deep neural networks by activation clustering. arXiv, abs/1811.03728, 2019
35. Tang, D., et al.: Demon in the variant: Statistical analysis of DNNs for robust backdoor contamination detection. arXiv, abs/1908.00686, 2019
36. Soremekun, E., et al.: Exposing backdoors in robust machine learning models. arXiv, abs/2003.00865, 2020
37. Huster, T.P., Ekwedike, E.: TOP: Backdoor detection in neural networks via transferability of perturbation. arXiv, abs/2103.10274, 2021
38. Zeng, Y., et al.: Rethinking the backdoor attacks' triggers: A frequency perspective. arXiv, abs/2104.03413, 2021
39. Xiang, Z., et al.: Detection of backdoors in trained classifiers without access to the training set. arXiv, abs/1908.10498, 2020
40. Dong, Y., et al.: Black-box detection of backdoor attacks with limited information and data. arXiv, abs/2103.13127, 2021
41. Liu, K., et al.: Fine-pruning: Defending against backdooring attacks on deep neural networks. In: Research in Attacks, Intrusions, and Defenses, pp. 273–294. Springer, Cham (2018)
42. Li, Y., et al.: Rethinking the trigger of backdoor attack. arXiv, abs/2004.04692, 2020
43. Gao, Y., et al.: STRIP: A defence against trojan attacks on deep neural networks. In: Proceedings of the 35th Annual Computer Security Applications Conference. IEEE Computer Society, Los Alamitos (2019)
44. Tao, L., et al.: Provable defense against delusive poisoning (2021)
45. Geiping, J., et al.: What doesn't kill you makes you robust(er): Adversarial training against poisons and backdoors. arXiv, abs/2102.13624, 2021
46. Carnerero-Cano, J., et al.: Regularization can help mitigate poisoning attacks… with the right hyperparameters (2021)
47. Pang, R., et al.: TrojanZoo: Everything you ever wanted to know about neural backdoors (but were afraid to ask). arXiv, abs/2012.09302, 2020
48. Xie, C., et al.: DBA: Distributed backdoor attacks against federated learning. In: ICLR. International Conference on Learning Representations, San Diego (2020)
49. Ilyas, A., et al.: Adversarial examples are not bugs, they are features. arXiv, abs/1905.02175, 2019
50. Xu, W., et al.: Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv, abs/1704.01155, 2018
51. Kang, G., et al.: PatchShuffle regularization. arXiv, abs/1707.07103, 2017
52. Geirhos, R., et al.: ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv, abs/1811.12231, 2019
53. Deng, J., et al.: ImageNet: A large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Piscataway (2009)
54. Dumford, J., Scheirer, W.: Backdooring convolutional neural networks via targeted weight perturbations. arXiv, abs/1812.03128, 2018

How to cite this article: Chen, X., Ma, Y., Lu, S., Yao, Y.: Boundary augment: A data augment method to defend poison attack. IET Image Process. 15, 3292–3303 (2021). https://doi.org/10.1049/ipr2.12325