
Received: 10 April 2021    Revised: 31 May 2021    Accepted: 13 July 2021

IET Image Processing
DOI: 10.1049/ipr2.12325

ORIGINAL RESEARCH PAPER

Boundary augment: A data augment method to defend poison attack

Xuan Chen    YueNa Ma    ShiWei Lu    Yu Yao

Department of Base, Air Force Engineer University, Xi'an, China

Correspondence
YueNa Ma, Department of Base, Air Force Engineer University, Xi'an, China.
Email: [email protected]

Abstract
In recent years, Deep Neural Networks (DNNs) have been applied in many fields such as computer vision and natural language processing. Many third-party cloud training platforms, for example, Colab (Google) or the AWS cloud platform, have been built to help individual users or small enterprises train their models. For these cloud platforms, there exist many potentially fatal risks, including poison attacks. At the same time, poison attacks are also a severe threat to federated learning. In this paper, a novel method to defend against poison attacks by estimating the distribution of the poisoned data and retraining the backdoor model with a small amount of training data is introduced. The distribution estimated with the modified DeepFool algorithm fits the poisoned data well and can be used to search the manifold boundary between the poisoned data and the clean data. Unlike empirical defense methods, the authors' approach is attack-agnostic, which means that it is robust to various attack methods. It is also shown that adversarial training is a practical approach to defend against poison attacks. The authors' approach is tested on the MNIST, CIFAR-10, GTSRB and ImageNet datasets. The accuracy of the retrained model decreases slightly, but the ASR drops drastically, which shows that the approach has powerful generalization to defend against most poison attacks.

1 INTRODUCTION

Deep Neural Networks (DNNs) have been applied in many fields such as computer vision [1], machine translation [2] and automatic speech recognition [3]. In particular, DNNs have achieved great success in real life. However, a good-performance deep learning model often requires a complex structure and a large amount of training data. It is difficult to train a deep learning model individually, so cloud platforms have come into being. However, this can lead to a series of threats because the training data and model are out of the individual's control and the users only have the local testing dataset [4]. Gu et al. firstly showed a new security risk: a malicious network trained by an adversary, BadNet. The malicious network can perform well on benign inputs but poorly on poisoned inputs injected with special patterns. Since then, there have been more attack and defense approaches related to data poisoning.

In this paper, we mainly focus on a particular and powerful poison attack, which aims to slightly change the training data by injecting a backdoor pattern. This type of poison attack is also named a backdoor attack. With a slight abuse of terminology, we also call it a poison attack. Besides the poisoning-based attacks in this paper, some non-poisoning-based attacks have also been proposed, such as [54]. The main research direction is the poisoning-based attacks and defenses against them. Next, we show some existing types of poison attacks. The visible trigger poison attacks [4, 5] are the most common and are a straightforward way to infect the target model. How to hide the trigger is a problem in this form of attack. Besides, there also exist some invisible trigger poison attacks [6, 7]. [8] demonstrated that the poisoned data can look the same as clean data and keep correct labels under a poison attack, which is more efficient and stealthy. In the next section, we explain these attack methods in detail.

[9] demonstrated that there exist varying types of defense approaches and that the mainstream strategies are about trigger-backdoor mismatch [10, 11], backdoor elimination [12, 13] and trigger elimination [14, 15].

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is
properly cited.
© 2021 The Authors. IET Image Processing published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology




FIGURE 1 This figure shows the whole process of our approach. The user submits the training dataset to the cloud platform. However, the malicious cloud platform poisons the training dataset and trains a backdoor model which performs well with clean data but performs poorly with poisoned data. The user then gets a backdoor model from the malicious cloud platform together with the original training dataset. How can the poison attack be mitigated? In our approach, the user chooses a few-shot subset of the training dataset. The chosen data are randomly selected from the shuffled training dataset. Next, we use PGD adversarial training and Boundary Augment to retrain the backdoor model, and then we obtain a mitigated model

A great diversity of defending methods has been proposed to defend against these attacks, including empirical backdoor defenses and certified backdoor defenses. The empirical backdoor defenses often work and perform better in practice, although they lack knowledge-based and theoretical evidence. In contrast, the certified defenses are more reliable. More than that, the empirical defenses are more easily bypassed by strong attacks, which means that empirical defenses have a poor general capacity against attacks that get past them. Compared with the empirical defenses, the certified defenses work better and have an excellent general capacity.

We observe that there exist increasing numbers of poison attacks whose poisoned patterns are global in order to hide stealthily. [5, 16, 17] pointed out that fixed local trigger patterns can easily be found through manual inspection. Besides, [4] demonstrated that backdoors are clearly shown in the heatmap of convolutional filters, and [18] used an optimization method and reverse engineering to recover modified triggers effectively. However, the reverse-engineered triggers are still visible in the heatmap of convolutional filters, although their magnitude may be small. In Section 3, we introduce the global pattern trigger and the local pattern trigger in detail. Furthermore, different attack methods, including local trigger pattern methods and global trigger pattern methods, will be compared in this paper.

Figure 1 shows the whole process of our approach. The user entrusts the third-party platform to train the model, and the malicious third-party platform returns a malicious model with a backdoor. Then the user can retrain the model with little data, and the retraining follows the approaches in our paper. We use PGD adversarial training [19] and the Boundary Augment approach. The Boundary Augment approach uses a modified DeepFool algorithm [20] to do adversarial training and then runs data augmentation. We explain these two methods in Section 4 in more detail. Our contributions are as follows:

∙ We propose a novel defense method belonging to certified defense, and this approach can defend against most poison attacks.
∙ We implement our defense technology on various datasets, including large-scale datasets, and this defense technology is easy to apply to different neural network architectures.
∙ We modify the DeepFool algorithm [20] to search the distribution of clean and adversarial examples to estimate the manifold of poisoned data.

2 RELATED WORK

[23] proposed an attack method against SVMs by injecting specially crafted training data to increase the test error of the SVM.

FIGURE 2 The first image is the original one, and the next one follows [4], becoming poisoned data through the addition of a small trigger pattern. The third one installs the trigger by adversarial perturbation and then adds a small, slightly visible trigger pattern [16]. The fourth one follows the Embedding backdoor [21], which looks the same as the BadNets attack but mainly modifies the target model. The following one develops a ramp signal as a global pattern to reduce the target accuracy [22]. The last one uses physical reflection models to change the target object slightly. Compared to the others, we can find that the trigger pattern of the Refool attack is not easy to see [5]

[4] claimed a similar attack method, which is different from the adversarial (evasion) attack [24]. This approach added some special patterns to the training data and demonstrated that the BadNets attack could change the meaning of traffic signs by installing a backdoor into the target model. In addition, [25] used reverse engineering to generate a general trigger even without knowledge of the datasets. However, the triggers generated by such an approach are local trigger patterns and lack stealthiness, although this attack strategy is powerful. [16] proposed a clean label attack to modify the BadNets poison attack. Clean label attacks have two approaches: one is utilizing a GAN [26] to generate poisoned data, and the other is to add adversarial perturbations to target objects, which can make the poisoned data seem benign to human vision. Besides, [21] raised a stronger attack compared to BadNets, and [27] came up with a method via steganography and regularization to apply invisible triggers. More recently, [8] proposed a hidden trigger poison attack that hides the trigger most of the time and reveals it at test time, when it is too late to defend. [7] proposed a novel backdoor method that corrupts only target samples, which benefits the stealthiness of the poisoned data. However, this method is also easy to detect because the pattern is fixed, although it has global patterns. [5] utilized physical reflection models to implement a poison attack. Compared to the local trigger pattern, this attack approach is much stronger and more stealthy. Recently, research on the poison attack has become numerous. [28] proposed to use a small and smooth warping field to generate poisoned examples, which can make the modification unnoticeable to a human being. [29] proposed how to poison federated learning with semantic attack approaches. [30] also explored the related idea of semantic poison attacks. The semantic poison attacks do not need to modify the inputs and utilize the high-level semantic information of the inputs as triggers to attack. Because the real physical world is complex and full of uncertainty, which constrains poison attacks, [31] proposed a novel attack method to increase the attack success rate in the physical world through physical transformations. [32] firstly introduced the poison attack for self-supervised learning. In a word, with the deepening of people's understanding of the poison attack, many more poison attacks will appear, but at the same time, there will also be many defense methods against these attack methods. Figure 2 shows the results of some of these attacks.

In the earlier research, [4] proposed a detecting method that showed the convolutional filters of the first layer and some neurons belonging to the backdoor model are activated differently in comparison to the benign model. [33] explored the spectrum of the covariance of the feature representation between poisoned data and clean data, used SVD decomposition to compute an outlier score for each example, then chose which examples to remove and retrained the backdoor model. Similar to [33], [34] proposed an activation clustering approach to detect poisoned data, whose intuitive idea is to cluster the activations of examples into two clusters, one a poisoned cluster and one a benign cluster. However, [35] and [36] successfully attacked the activation clustering approach to claim that this detecting method is not robust enough to detect some particular poisoned data. Recently, [37] proposed a detection method that utilizes adversarial perturbations to detect poisoned examples. [38] claimed that frequency domain detection is a feasible way. Without access to the training dataset, [39] proposed an unsupervised anomaly detection method to defend against poison attacks. Furthermore, [40] proposed a black-box backdoor detection method to identify poison attacks with limited query access to the model, which also makes use of reverse engineering to restore the triggers for each class, similar to Neural Cleanse [18].

In addition to these detecting methods, there also exist many active defending methods. [41] proposed a defense scheme combining fine-tuning technology and pruning defense. Initially, the user prunes the backdoored model and then fine-tunes the pruned model for accuracy. Besides, [18] introduced the Neural Cleanse method, including identifying backdoors and reconstructing possible triggers. Three mitigation schemes, input filters, neuron pruning and unlearning, are proposed to effectively improve the robustness against the poison attack. [42] studied transformation-based preprocessing approaches that are practical to defend against many SOTA poison attacks. However, [18] tends to assume that the defender knows the attack method coming from the adversary, which restricts the application of such a defending scheme. [43] perturbed the input by superimposing various image patterns and observing the randomness of predicted classes for perturbed inputs. The input is more likely to be malicious if the entropy is low; otherwise, the input is benign. [44] claimed that adversarial training can mitigate the risk of the poisoned data, showing that minimizing adversarial risk on the poisoned data is equivalent to optimizing an upper bound of natural risk on the original data. [45] also proposed to use an adversarial training approach to train the model and improve the robustness of the model.

[46] showed that tight hyperparameters can help mitigate poisoning attacks. They encouraged the use of L2 regularization to defend against poison attacks. To evaluate attack and defense methods more fairly, [47] released TROJANZOO, a framework including 12 attack approaches, 15 defense approaches, 6 attack performance metrics and 10 defense utility metrics. There is no doubt that this will be a handy tool for researchers. At the moment, many defense approaches are empirical defenses with no detailed explanation, but in this paper, we provide a certified defense, and we give a clear explanation in the next section.

3 PRELIMINARY

3.1 Poison attack

We assume that the user uploads the training dataset to a cloud platform and that the training process happens primarily on the cloud platform. We denote the training dataset as D_benign = {(x, y)}, where x denotes a training image and y denotes its label. The malicious adversary generates poisoned data as follows:

\[ x_{\mathrm{poisoned}} = \mathrm{clip}\left(x + \alpha \cdot x_{\mathrm{trigger}},\ x_{\min},\ x_{\max}\right), \]

where α ∈ [0, 1] is a trade-off hyper-parameter, x_trigger is the trigger pattern, and x_min, x_max are the minimum and maximum values of the images x. Next, we denote the training process with the poison dataset D_poison = {(x_poisoned, y)} as follows:

\[ f_{\mathrm{backdoor}} = \arg\min_{f}\ \mathbb{E}_{(x, y) \sim D_{\mathrm{benign}} \cup D_{\mathrm{poison}}}\left[L(x, y)\right], \]

where L(x, y) is the loss function.
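As a concrete illustration of the formulation above, the following sketch (a minimal NumPy example, not code from the paper) blends a trigger pattern into a randomly chosen fraction of a clean training set and relabels the blended images with the attacker's target class; the function name, blend ratio and target label are illustrative assumptions.

```python
import numpy as np

def poison_batch(images, labels, trigger, alpha=0.1, y_target=9, ratio=0.2, seed=0):
    """Blend a trigger into a random subset of images and relabel them with the target class.

    images: float array in [0, 1] with shape (N, H, W, C); trigger: array with shape (H, W, C).
    Implements x_poisoned = clip(x + alpha * x_trigger, x_min, x_max) with x_min = 0, x_max = 1.
    """
    rng = np.random.default_rng(seed)
    poisoned_images = images.copy()
    poisoned_labels = labels.copy()
    idx = rng.choice(len(images), size=int(ratio * len(images)), replace=False)
    poisoned_images[idx] = np.clip(images[idx] + alpha * trigger, 0.0, 1.0)
    poisoned_labels[idx] = y_target  # every poisoned sample is relabelled to the target class
    return poisoned_images, poisoned_labels, idx
```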
3.2 Attack success rate

To evaluate the performance of poison attacks in image classification tasks, the attack success rate (ASR) is an indicator used to measure attack performance. We define the ASR as follows:

\[ \mathrm{ASR}(P) = \frac{\sum_{(x, y) \in D_{\mathrm{test}}} \mathbb{1}\left[f(P(x)) = y_{\mathrm{target}} \mid y \neq y_{\mathrm{target}}\right]}{N_{D_{\mathrm{test}}}}, \]

where f(·) is the backdoor classifier model, P is the poison attack method, D_test is the testing dataset and y_target is the targeted class.
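A minimal sketch of how this definition can be evaluated in practice is given below, assuming `model_predict` returns predicted class labels and `apply_trigger` implements the poisoning function P; both names are placeholders rather than functions from the paper.

```python
import numpy as np

def attack_success_rate(model_predict, apply_trigger, x_test, y_test, y_target):
    """ASR: fraction of non-target test samples that the backdoor model assigns to y_target
    after the trigger is applied."""
    mask = y_test != y_target  # only count samples whose true label differs from the target
    preds = model_predict(apply_trigger(x_test[mask]))
    return float(np.mean(preds == y_target))
```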
3.3 Global pattern triggers and local pattern triggers

There exist many kinds of triggers in the data poison field, and the most common triggers are the local pattern and the global pattern. Semantic triggers are also one kind of trigger. There are some similarities between the local pattern triggers and global pattern triggers, but there are also some differences. Next, we introduce the similarities and differences between them.

3.3.1 Similarities

∙ The local pattern triggers and global pattern triggers are designed to make the poisoned data more stealthy and less visually distinct from the clean data.
∙ The local pattern triggers and global pattern triggers modify the clean inputs by inserting the trigger and then training the model with the poisoned data to run the attack.
∙ Sometimes local triggers can help global triggers to attack and achieve a high attack success rate [48].

3.3.2 Differences

∙ Local pattern triggers often do not need a careful design to achieve a high attack success rate. However, the adversary needs to craft global pattern triggers elaborately to obtain better attack capacity and stealthiness.
∙ There are many defenses against local pattern triggers, but few against global pattern triggers. For example, [18] proposed the Neural Cleanse approach to defend against poison attacks. One existing approach from Neural Cleanse is to use reverse engineering to restore the triggers, which is effective for small-size triggers such as BadNets [4]. However, it performs poorly when the size of the triggers becomes larger. The same holds for ConFoc [11].
∙ Compared with local pattern triggers, the global pattern triggers have a much more powerful attack ability and stealthiness. [42] found that local pattern triggers are vulnerable to simple transforms. However, the global pattern triggers are robust [5, 28].

FIGURE 3 An illustration of the PGD adversarial training

4 APPROACH

4.1 PGD adversarial training

In this section, we propose an approach through PGD adversarial training to defend against poison attacks. Adversarial training is a common defense method to improve the robustness of models, which retrains models by injecting adversarial examples into the training dataset. [19] proposed the projected gradient descent (PGD) attack method shown in Figure 3, which is a powerful attack method utilizing first-order information. To fully improve the robustness of the models, [19] applied adversarial training to adversarial examples generated by the PGD attack. The PGD attack function is as follows:

\[ x^{t+1} = \Pi_{x+S}\left(x^{t} + \alpha \cdot \mathrm{sign}\left(\nabla_{x} L(\theta, x, y)\right)\right), \]

FIGURE 4 We use the T-SNE approach to plot this figure with features of clean data, adversarial examples and poisoned data from the MNIST dataset. The red stars represent the clean data, the green circles represent the adversarial examples obtained by perturbing the clean data with different approaches and the blue triangles represent the poisoned data. We can see that the clean data all gather in a small area, while the poisoned data and adversarial examples are scattered over the whole space. We set the targeted class of the adversarial attack and poison attack as 9

where t is the iteration number, x is the input, y is the label and θ denotes the model parameters. Besides, for PGD adversarial training, the optimization objective is as follows:

\[ \arg\min_{\theta}\ \mathbb{E}_{(x, y) \sim D}\left[\max_{x'} L(\theta, x', y)\right]. \]

[49] considered that adversarial examples are features, and that the models perform poorly because they are not generalized enough for these adversarial examples. Thus, PGD adversarial training technology can be used to find more features of the inputs and force the models to learn them. The intuitive idea of our paper is that the model can learn more features, which makes it easier to distinguish benign from poisoned data among the inputs. For poisoned data, adding triggers changes some of their features regardless of whether the trigger pattern is local or global. Figure 4(a) shows the distribution of PGD adversarial example features and poisoned data features. It is clear that the PGD adversarial example features are near the poisoned data features. The immediate conclusion is that we can retrain the models with the PGD adversarial training approach to improve the models' generalization and improve the robustness of the models. [44] proposed that adversarial training can be a principled defense method against delusive poisoning, which supports our conjecture, and we will show the results of adversarial training in a later section. However, [44] only proved that adversarial training can defend against poison attacks and did not demonstrate which training approach is more effective. In the next part, we introduce a powerful adversarial training approach.
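The following PyTorch-style sketch shows one way to realize the PGD inner maximization and the retraining loop described above. It is a minimal illustration under assumed settings (ε, step size, iteration count and the existence of `model`, `loader` and `optimizer`), not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Generate PGD adversarial examples: iterated sign-gradient ascent projected onto the eps-ball."""
    x_adv = torch.clamp(x + torch.empty_like(x).uniform_(-eps, eps), 0.0, 1.0)  # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()            # ascent step on the loss
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)  # project back onto the eps-ball around x
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()

def pgd_adversarial_training_epoch(model, loader, optimizer):
    """One epoch of min-max retraining on the (few-shot) clean subset."""
    model.train()
    for x, y in loader:
        x_adv = pgd_attack(model, x, y)
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        optimizer.step()
```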
4.2 Boundary augment

In this section, we propose a novel training approach. The PGD adversarial training approach forces the models to learn more features and improves their generalization ability. However, in some regions the adversarial examples do not overlap enough, so it is less effective there. Another intuitive idea is to search the decision boundary and deploy more data located at the boundary to enhance generalization. We assume that poisoned data only slightly changes the original input, and the poisoned data would be easy to detect if the inserted triggers were apparent. Thus, we can choose the data augment method to force the decision boundary to move in the direction of the poisoned data. Figure 5 shows the principle of this process: Figure 5(a) shows the decision boundary of clean data, Figure 5(b) shows the decision boundary of the backdoor classifier after data poisoning, and Figure 5(c) shows that we can find the decision boundary. Then we retrain the model by using data augment approaches, and the decision boundary under poison attack moves; the new decision boundary is more generalized, and the poisoned data points are contained in the original class.

DeepFool is an adversarial attack method [24, 49] based on hyperplane classification. The key idea of DeepFool is to compute the hyperplane and move the input x to the boundary of the hyperplane, at which the classification result changes. DeepFool is shown as Algorithm 1. We suppose the poisoning data only slightly changes the original input; the manifold distribution of the poisoned data is close to the benign data. The decision boundary of the model lies between the manifold of the poisoned data and the benign data because the classification result of the poisoned data changes. Inspired by the DeepFool algorithm [20], if we compute the hyperplane, the decision boundary is also found, so that we can move the decision boundary and obtain a correct classification result for the poisoned data. Therefore, our approach can be interpreted as follows: We use the DeepFool algorithm to calculate the decision boundary. When we find the decision boundary adjacent to the target class, we can push the data points to move toward the decision boundary. The decision boundary can be generalized to make the poisoned data fall on the correct side of the hyperplane.

FIGURE 5 In this figure, the left one shows the decision boundary (solid gray line) of a benign model with clean data consisting of three classes. The middle one shows the poison attack process. The red stars and green triangles are the poisoned data from class 2 and class 3. We can find that the decision boundary changes and is different from the original. On the right, we try to search the boundary and use data augment methods to adjust the boundary. The solid gray line is the decision boundary under poison attack, and the pink dashed line is the decision boundary after the retraining approach

ALGORITHM 1 DeepFool

Require: Image x, classifier f.
Ensure: Perturbation r̂.
1: Initialize x_0 ← x, i ← 0.
2: while k̂(x_i) = k̂(x_0) do
3:   for k ≠ k̂(x_0) do
4:     w'_k ← ∇f_k(x_i) − ∇f_{k̂(x_0)}(x_i)
5:     f'_k ← f_k(x_i) − f_{k̂(x_0)}(x_i)
6:   end for
7:   l̂ ← argmin_{k ≠ k̂(x_0)} |f'_k| / ‖w'_k‖_2
8:   r_i ← (|f'_l̂| / ‖w'_l̂‖_2^2) · w'_l̂
9:   x_{i+1} ← x_i + r_i
10:  i ← i + 1
11: end while
12: return r̂ = Σ_i r_i

FIGURE 6 The difference between the DeepFool algorithm and the Boundary Search, one part of the Boundary Augment algorithm. The green point is the clean example; the red circles and the red triangle are the inputs with added perturbations. In the first image of the Boundary Augment, the perturbation is λ_i · r_i (red arrow), but k̂(x_i + λ_i · r_i) ≠ k̂(x_0), so we adjust the perturbation with λ_i ← λ_i/2 (black arrow) until k̂(x_i + λ_i · r_i) = k̂(x_0). The second image shows that k̂(x_{i+1} + λ_{i+1} · r_{i+1}) = k̂(x_0) in the next step. The final image shows that the inputs with perturbations converge near the decision boundary. Thus, the step λ can guarantee that the inputs with perturbations will not cross the decision boundary and will be closer to the decision boundary

To prevent the data points from greatly overshooting the decision boundary during training and decreasing the classification accuracy, we choose to adjust the step length to confirm that the perturbed data stay around the decision boundary, which does not change the classification results. We initialize the step λ as 1 and the threshold t as 10^-4. For the threshold t, if it is too large, the algorithm ends too early, resulting in a large error of the decision boundary; if it is too small, the algorithm is too slow to run the whole experiment. Figure 6 provides a simple schematic diagram. The DeepFool algorithm is only used to compute the decision boundary; our approach computes a more accurate boundary and introduces adversarial training to improve the robustness of the model to mitigate the attack. Our algorithm follows Algorithm 2. We call it the Boundary Augment algorithm. In Algorithm 2, k̂(·) follows the notation from [20]: k̂(x) = argmax_k f_k(x), and f_k(x) is the output of f(x) that corresponds to the kth class. Figure 4(b) shows the feature distribution of the Boundary Augment approach. The augmented data points are much closer to the poison data points compared to the PGD adversarial training. In other words, the Boundary Augment approach is more powerful than the PGD adversarial training approach. To evaluate the SIG and Refool attack methods, we compute the accuracy of retrained models and the ASR of attacks on the GTSRB dataset. We choose high-quality images whose height and width are larger than 100. Figure 7 shows the results of PGD adversarial training and the Boundary Augment approach. It is clear that our approach outperforms PGD adversarial training.
ALGORITHM 2 Boundary Augment

Require: Image x, classifier f, step λ = 1, threshold t = 10^-4.
Ensure: New classifier f̂.
1: Initialize x_0 ← x, λ_0 ← λ, i ← 0.
2: while λ_i ≥ t do
3:   for k ≠ k̂(x_0) do
4:     w'_k ← ∇f_k(x_i) − ∇f_{k̂(x_0)}(x_i)
5:     f'_k ← f_k(x_i) − f_{k̂(x_0)}(x_i)
6:   end for
7:   l̂ ← argmin_{k ≠ k̂(x_0)} |f'_k| / ‖w'_k‖_2
8:   r_i ← (|f'_l̂| / ‖w'_l̂‖_2^2) · w'_l̂
9:   x_{i+1} ← x_i + λ_i · r_i
10:  while k̂(x_0) ≠ k̂(x_{i+1}) do
11:    λ_i ← λ_i / 2
12:    x_{i+1} ← x_i + λ_i · r_i
13:  end while
14:  i ← i + 1, λ_{i+1} ← λ_i
15: end while
16: Data augment with the perturbed x
17: Retrain the classifier f
18: return new classifier f̂
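A simplified PyTorch sketch of the boundary search in Algorithm 2 for a single input is given below. It follows the DeepFool-style linearization (lines 3–8) and the step-halving rule (lines 10–13); the per-iteration shrinking of λ at the end is our own simplification to guarantee termination, and the model interface is an assumption.

```python
import torch

def boundary_augment_example(model, x, lam=1.0, threshold=1e-4, num_classes=10):
    """Move a single input x (shape (1, C, H, W)) toward the nearest decision hyperplane,
    halving the step lambda whenever the perturbed point would cross the boundary, so the
    augmented sample stays near, but on the correct side of, the decision boundary."""
    model.eval()
    with torch.no_grad():
        k0 = model(x).argmax(dim=1).item()          # original predicted class k_hat(x0)
    x_i = x.clone()
    while lam >= threshold:
        x_req = x_i.clone().requires_grad_(True)
        logits = model(x_req)[0]
        grads = [torch.autograd.grad(logits[k], x_req, retain_graph=True)[0]
                 for k in range(num_classes)]
        best_dist, r_i = None, None
        for k in range(num_classes):                # lines 3-8 of Algorithm 2
            if k == k0:
                continue
            w_k = (grads[k] - grads[k0]).detach()   # w'_k
            f_k = (logits[k] - logits[k0]).item()   # f'_k
            dist = abs(f_k) / (w_k.norm().item() + 1e-12)
            if best_dist is None or dist < best_dist:
                best_dist = dist
                r_i = (abs(f_k) / (w_k.norm().item() ** 2 + 1e-12)) * w_k
        with torch.no_grad():
            x_next = x_i + lam * r_i                # line 9
            while model(x_next).argmax(dim=1).item() != k0 and lam >= threshold:
                lam = lam / 2.0                     # lines 10-13: shrink until we stay on the correct side
                x_next = x_i + lam * r_i
        x_i = x_next
        lam = lam / 2.0                             # simplification: tighten the step so the loop terminates
    return x_i                                      # augmented sample near the decision boundary
```

The returned samples, kept with their original labels, would then be added to the few-shot retraining set, which is how we read the data augment and retraining steps (lines 16–17).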
FIGURE 7 The accuracy of the PGD adversarial training approach and the Boundary Augment approach, and the ASR of the five attack methods

4.3 Ensemble training

Although retraining the model with the PGD adversarial training and boundary search approaches is effective in defending against poison attacks, we find that attacks with global trigger patterns are more powerful than those with local trigger patterns [5]. [42] claimed that attacks with local trigger patterns are vulnerable when the poisoned inputs are transformed with some simple methods. These transforming methods can mismatch the triggers of local trigger patterns so that the attack success rate decreases. Compared to global pattern triggers, local pattern triggers are easier to influence. Especially for large-scale images, many non-robust features affect the classification of models, and methods eliminating non-robust characteristics may be effective in mitigating poison attacks in this case. In this part, we list some preprocessing methods that are combined with the Boundary Augment approach to defend against the attacks effectively, not only those with local trigger patterns.

4.3.1 ShrinkPad

[42] proposed a transform-based defense method named ShrinkPad, which shrinks images by a few pixels and then fills the shrunk images with random zero-padding. However, [42] did not evaluate the defense method on global trigger pattern attacks. Therefore, there is no sufficient evidence to prove that this defense method is powerful enough to defend against global trigger pattern attacks. This method is not robust in defending against the global trigger pattern attacks but is robust against the local ones. Thus, we adopt ShrinkPad to train models as a part of the retraining methods.
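A minimal PyTorch sketch of a ShrinkPad-style transform is shown below; whether the shrink amount is applied per side and the choice of interpolation mode are our assumptions, not details from [42].

```python
import torch
import torch.nn.functional as F

def shrink_pad(images, pad=4):
    """Shrink each image by `pad` pixels on every side, then paste it back onto a zero canvas
    at a random offset (random zero-padding)."""
    n, c, h, w = images.shape
    small = F.interpolate(images, size=(h - 2 * pad, w - 2 * pad),
                          mode='bilinear', align_corners=False)
    out = torch.zeros_like(images)
    for i in range(n):
        top = torch.randint(0, 2 * pad + 1, (1,)).item()
        left = torch.randint(0, 2 * pad + 1, (1,)).item()
        out[i, :, top:top + h - 2 * pad, left:left + w - 2 * pad] = small[i]
    return out
```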

4.3.2 Feature squeezing

[50] proposed the feature squeezing method to detect adversarial examples, claiming that feature squeezing reduces the search space available to the adversary. This method reduces bit depth without decreasing classifier accuracy. In other words, this method reduces the features, including non-robust features, to mitigate the backdoor model. Global trigger pattern attacks introduce more features compared to local ones; changing the local state only impacts local triggers, while global triggers are unaffected. Thus, in this paper, we use feature squeezing technology to reduce features, and we set the bit depth to 4, which aims to defend against global trigger pattern attacks.
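Bit-depth reduction of this kind can be sketched in a few lines; the function below quantizes pixel values in [0, 1] to 2^bits levels and is only an illustration of the idea in [50].

```python
import torch

def reduce_bit_depth(images, bits=4):
    """Feature-squeezing style bit-depth reduction: round pixels in [0, 1] to 2**bits discrete levels."""
    levels = 2 ** bits - 1
    return torch.round(images * levels) / levels
```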
4.3.3 PatchShuffle

[51] introduced a simple but effective method, PatchShuffle, for computer vision classification tasks. The main process is to randomly choose some images from each batch and shuffle the pixels within each local patch. They claimed this method improves the generalization ability and is more robust to noise and local changes in an image. For local trigger patterns, PatchShuffle can change the triggers, which can decrease the ASR. In our experiments, we find that this method does not perform well on the MNIST dataset. The most likely reason seems to be that the model is biased to the shape of the handwriting, so the accuracy decreases when we shuffle images. In contrast, [52] proposed that CNN models trained on ImageNet are biased towards texture, which shows that this method can work on large-scale datasets.
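The following sketch illustrates the PatchShuffle idea on a whole batch; the patch size is an assumed value, and shuffling every image (rather than a random subset of each batch, as in [51]) is a simplification.

```python
import torch

def patch_shuffle(images, patch=4):
    """Independently permute the pixels inside each non-overlapping patch of every image."""
    n, c, h, w = images.shape
    out = images.clone()
    for i in range(n):
        for y in range(0, h - patch + 1, patch):
            for x in range(0, w - patch + 1, patch):
                block = out[i, :, y:y + patch, x:x + patch].reshape(c, -1)
                perm = torch.randperm(block.shape[1])
                out[i, :, y:y + patch, x:x + patch] = block[:, perm].reshape(c, patch, patch)
    return out
```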
5 EXPERIMENT

5.1 DataSets

In this paper, we conduct experiments on four datasets: MNIST, CIFAR-10, GTSRB and ImageNet. The MNIST dataset is a handwritten digit dataset. The images in MNIST are single-channel, grayscale images. It has a training set of 60K examples and a testing set of 10K examples.

TABLE 1 The MNIST result. Acc(b) represents accuracy on benign data, Acc(p) represents accuracy on poisoned data. ASR represents the attack success rate.
BA represents boundary augment

BadNets CleanLabel Embedding

Benign model Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR

Fine-Pruning 99.2 96.5 91.0 1.2 92.0 88.3 4.8 94.5 90.2 4.6
Neural Cleanse 99.2 97.2 92.4 0.8 94.3 92.4 3.6 94.1 90.7 4.0
Transformed-based 99.2 96.1 40.6 0.8 92.5 26.2 1.1 93.5 31.2 0.9
ConFoc 99.2 96.1 86.2 12.5 90.2 82.1 17.0 91.2 83.3 15.2
BA+ShrinkPad(ours) 99.2 97.8 97.2 0.3 97.2 97.1 0.5 97.3 97.2 0.6

TABLE 2 The CIFAR-10 result. Acc(b) represents accuracy on benign data, Acc(p) represents accuracy on poisoned data. ASR represents the attack success rate. BA represents boundary augment

BadNets CleanLabel Embedding

Benign model Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR

Fine-Pruning 91.5 89.1 81.0 2.1 87.3 79.6 4.4 88.1 82.9 3.8
Neural Cleanse 91.5 89.8 83.4 1.5 88.1 84.0 4.7 89.2 84.2 3.5
Transformed-based 91.5 88.6 31.8 1.8 89.9 20.3 1.1 89.8 28.6 2.7
ConFoc 91.5 88.3 82.1 15.2 86.1 82.2 16.5 86.3 85.3 14.4
BA+ShrinkPad(ours) 91.5 90.3 90.4 0.9 90.2 90.1 1.2 89.8 88.4 1.5

The CIFAR-10 dataset has 10 classes, and its images have 3 channels. However, it has a smaller image size, only 32×32×3. It also has a training set of 50K examples and a testing set of 10K examples. The GTSRB dataset is a multi-class, single-image classification dataset with 43 classes and more than 50,000 images in total. The scale of the GTSRB images ranges from small to large, close to the practical setting. The ImageNet dataset [53] is a large-scale image dataset, which has 1000 classes and consists of millions of images.

These datasets are benchmark datasets, and each class is balanced. Thus, in our paper, we randomly shuffle the whole training dataset and choose a small amount of training data to construct a subset training dataset.

TABLE 3 The GTSRB result. Acc(b) represents accuracy on benign data, Acc(p) represents accuracy on poisoned data. ASR represents the attack success rate. BA represents boundary augment, and ET represents ensemble training methods

BadNets CleanLabel Embedding

Benign model Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR

Fine-Pruning 97.8 92.3 84.1 3.5 90.1 80.2 10.1 91.2 83.1 4.2
Neural Cleanse 97.8 93.1 86.7 2.1 88.0 86.2 7.3 90.8 85.2 3.6
Transformed-based 97.8 93.4 20.1 4.2 94.3 17.2 3.7 92.8 23.1 4.2
ConFoc 97.8 93.5 92.5 4.6 93.3 91.9 6.1 94.0 92.9 4.5
BA+ET(ours) 97.8 96.2 95.1 1.2 96.2 95.3 2.3 96.7 95.9 3.4
SIG Refool

Benign model Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR

Fine-Pruning 97.8 94.2 91.8 4.8 94 49.6 45.8


Neural Cleanse 97.8 95.7 92.6 3.7 95.2 62.1 30.3
Transformed-based 97.8 94.2 42.6 5.2 94.7 35.1 62.1
ConFoc 97.8 88.3 52.1 35.2 86.1 52.2 39.5
BA+ET(ours) 97.8 96.2 96.3 3.1 95.4 93.1 5.7

5.2 Attacks

We evaluate our defense approach against five attacks:

∙ BadNets: The poison attack method comes from [4], which adds some pixel triggers or small images to the target images and causes misclassification.
∙ CleanLabel: The poison attack method comes from [16], which uses a GAN and adversarial perturbation to enhance the strength of the poison attack. In this paper, we choose adversarial perturbation as the primary attack method.
∙ Embedding: The poison attack method comes from [21], which trains a discriminator feature network to decrease the detection ratio of poison example features.
∙ SIG: The poison attack method comes from [22], which aims to reduce the target accuracy.
∙ Refool: The poison attack method comes from [5], adding a background image to the targeted image with the physical reflection method. This method has strong attacking ability and stealthiness.

5.3 Defenses

In this paper, we adopt four defense methods to compare with our defending approach. The four approaches are:

∙ Fine-Pruning: This defense method comes from [41], which combines fine-tuning technology and pruning defense.
∙ Neural Cleanse: This defense method comes from [18], which is composed of reconstructing triggers and mitigating the backdoor with three schemes.
∙ Transformed-based: This defense method comes from [42], which perturbs the input by superimposing various image patterns and observes the randomness of predicted classes for perturbed inputs.
∙ ConFoc: This defense method comes from [11], which uses style transfer to retrain the model to improve the robustness of the model.

5.4 Evaluate

5.4.1 MNIST

In this part, we ran the experiment on the MNIST dataset and set the poison ratio as 0.2. For the BadNets attack, a pattern poison attack is chosen. For the CleanLabel attack, we set ε of the PGD attack as 0.15, also with a pattern attack, and a pattern backdoor for the Embedding attack. We implemented the first three attacks to evaluate our defense approach on the MNIST dataset. For Fine-Pruning, fine-tuning and pruning 30% of the neurons is the basic setting. For the Transformed-based defense, we use the ShrinkPad-4 method to defend against attacks. For ConFoc and our approach, we use 20% of the training data to retrain the model and combine Boundary Augment and ShrinkPad as our approach. The result is shown in Table 1.

Remark: We can find that our approach outperforms the other methods, especially in the accuracy on poisoned data. The result supports our idea that the Boundary Augment algorithm can fit the distribution of poisoned data well. Besides, the Transformed-based method performs poorly on poisoned data, although it can decrease the ASR. Moreover, the ASR of the ConFoc method is also too high. This is because the ConFoc method aims to alleviate the texture bias, but data from the MNIST dataset is only weakly biased towards texture, so the ConFoc method has no effect on the triggers.

5.4.2 CIFAR-10

We set the poison ratio as 0.2 on the CIFAR-10 dataset. Similar to the MNIST dataset, we choose a pattern poison attack. We set ε of the PGD attack as 0.1. In this experiment, we also run the first three attacks. For ConFoc and our approach, we use 20% of the training data to retrain the models. Moreover, we choose ResNet50 as the base model for transfer learning to train models with CIFAR-10. The result is shown in Table 2.

Remark: On the CIFAR-10 dataset, our approach is also better than the other methods. Similar to the result on the MNIST dataset, the Transformed-based method and the ConFoc method have similar performances. One important reason is that the MNIST dataset and the CIFAR-10 dataset are small-scale datasets, and the deep learning models trained on them are biased towards shape, unlike those trained on large-scale datasets.

5.4.3 GTSRB

We set the poison ratio as 0.2 on the GTSRB dataset and choose high-quality images whose height and width are larger than 100. To improve the strength of BadNets and some other attacks, we set some small images whose size is 10 × 10 as triggers and set ε of the PGD attack as 5/255. The background images are from the ILSVRC dataset to run the Refool attack. For ConFoc and our approach, we use 20% of the training data to retrain the models and use each method from ensemble training with the Boundary Augment approach to retrain the model. Moreover, ResNet50 is the base model of transfer learning to train the model on GTSRB. The result is shown in Table 3.

Remark: [5] proposed that the Refool attack is a powerful attack method and outperforms some others. In this experiment, we can see that our approach is robust enough to defend against such an attack. The main reason is that the ensemble training methods reduce the non-robust features to lower the capacity of the triggers, and then we use Boundary Augment to fit the poisoned data better.

TABLE 4 The ImageNet result. Acc(b) represents accuracy on benign data, Acc(p) represents accuracy on poisoned data. ASR represents the attack success
rate. BA represents boundary augment, and ET represents ensemble training methods

BadNets CleanLabel Embedding

Benign model Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR

Fine-Pruning 75.1 68.2 60.8 12.3 68.5 57.7 20.6 67.9 51.8 25.1
Neural Cleanse 75.1 70.4 62.5 8.9 71.3 63.3 12.7 71.0 65.7 14.3
Transformed-based 75.1 71.2 61.1 4.7 70.6 69.5 9.7 72.1 68.4 12.1
ConFoc 75.1 71.1 66.2 5.1 70.3 61.1 9.9 71.1 63.4 8.1
BA+ET(ours) 75.1 72.4 68.9 2.2 71.5 66.9 5.7 71.6 67.0 7.8
SIG Refool

Benign model Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR

Fine-Pruning 75.1 68.1 62.1 10.0 67.7 30 51.2


Neural Cleanse 75.1 70.1 61.4 12.1 71.1 45.2 49.7
Transformed-based 75.1 71.5 31.5 4.9 72.1 42.2 41.9
ConFoc 75.1 69.9 30.0 9.2 69.5 31.3 33.4
BA+ET(ours) 75.1 70.1 67.3 8.7 71.3 62.9 18.5

TABLE 5 The MNIST result of ablation experiment

BadNets CleanLabel Embedding

Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR

ShrinkPad 97.1 95.6 1.2 94.3 93.9 0.9 94.4 94.0 0.7
Feature Squeezing 96.8 1.1 98.7 96.8 1.7 98.1 96.8 2.5 97.0
PatchShuffle 97.4 96.9 1.3 97.5 97.1 1.1 97.8 97.0 2.3

TABLE 6 The CIFAR-10 result of ablation experiment

BadNets CleanLabel Embedding

Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR

ShrinkPad 89.1 87.6 1.4 88.3 87.7 1.0 89.4 87.3 1.9
Feature Squeezing 89.4 3.4 96.3 89.4 3.7 95.9 89.4 3.1 96.2
PatchShuffle 86.5 85.9 0.8 87.2 86.1 1.5 87.8 85.9 1.1

TABLE 7 The GTSRB result of ablation experiment

BadNets CleanLabel Embedding

Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR

ShrinkPad 95.3 92.5 2.3 96.1 92.4 2.6 94.7 92.8 1.1
Feature Squeezing 95.1 2.7 96.5 95.1 3.1 95.8 95.1 3.3 95.1
PatchShuffle 94.1 93.3 1.5 93.4 93.5 1.7 93.7 93.4 0.9
SIG Refool
Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR
ShrinkPad 94.3 92.1 4.1 94.8 42.2 45.2
Feature Squeezing 95.1 8.8 93.4 95.1 55.6 39.7
PatchShuffle 96.3 95.1 2.5 95.8 62.7 32.9

TABLE 8 The ImageNet result of ablation experiment

BadNets CleanLabel Embedding

Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR

ShrinkPad 71.4 68.2 3.5 70.9 65.7 4.6 71.2 66.8 3.1
Feature Squeezing 67.4 3.4 96.1 67.4 3.1 94.7 67.4 3.3 95.9
PatchShuffle 71.2 65.4 2.7 70.6 66.5 4.1 72.1 67.2 4.4
SIG Refool

Acc(b) Acc(p) ASR Acc(b) Acc(p) ASR

ShrinkPad 72.1 68.1 3.3 72.1 51.2 33.7


Feature Squeezing 67.4 4.2 94.1 67.4 61.9 18.1
PatchShuffle 71.5 68.2 5.1 72.1 45.7 31.9

5.4.4 ImageNet

We set the same parameters as in the GTSRB experiment. The result is shown in Table 4.

5.5 Ablation experiment

We investigated the effect of these preprocessing methods on the defense against poison attacks without the boundary augment. We chose the ShrinkPad, Feature Squeezing and PatchShuffle methods, and the settings follow the previous experiments. The results are shown in Tables 5–8.

From the experiments, we can find that the ShrinkPad approach performs better for local pattern triggers. Against the BadNets, CleanLabel, Embedding and SIG attack approaches, it shows powerful defense ability, but it does not perform well in defending against Refool. This indicates that the ShrinkPad approach can defend against attacks with local pattern triggers. For Feature Squeezing, the poor performance on local pattern triggers shows the limitation of this approach: because the Feature Squeezing method cannot eliminate the local pattern triggers, the triggers remain after preprocessing with Feature Squeezing, but it shows a promising defense ability against Refool. As for PatchShuffle, we can find that it also performs better in defending against the local patterns. The triggers change their positions after the PatchShuffle approach, whose performance is slightly better than ShrinkPad in defending against SIG and Refool. In other words, the ShrinkPad approach outperforms the other defense approaches for attacks with local pattern triggers, but Feature Squeezing is a promising approach to defend against attacks with global pattern triggers. Overall, we can conclude that our approach performs better than these approaches.

6 CONCLUSION

In this paper, we introduce a novel approach, the Boundary Augment approach, which can essentially fit the poisoned data. Considering large-scale images, we combine this technology with many preprocessing methods, including Transformed-based ones, to reduce non-robust features and alleviate model bias. The experiments show a remarkable ability to defend against poison attacks, with both local trigger patterns and global trigger patterns, which provides us with a promising direction to defend against poison attacks. To evaluate our experiments fairly, we do a series of ablation experiments, which show that our approach is practical. Furthermore, the poisoned data are similar to adversarial examples in a way, and both utilize the non-robust features of the inputs. We will explore the reasons in future work, and we would like to explore more powerful adaptive attack approaches to improve the defense capability.

DATA AVAILABILITY STATEMENT

These data were derived from the following resources available in the public domain:
MNIST: http://yann.lecun.com/exdb/mnist/
CIFAR-10: http://www.cs.toronto.edu/kriz/cifar.html
GTSRB: https://benchmark.ini.rub.de/section=gtsrb&subsection=news
ImageNet: http://image-net.org

REFERENCES
1. Krizhevsky, A., et al.: Imagenet classification with deep convolutional neural networks. Commun. ACM 60, 84–90 (2012)
2. Sutskever, I., et al.: Sequence to sequence learning with neural networks. In: NIPS. Springer, Cham (2014)
3. Hinton, G.E., et al.: Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 29, 82–97 (2012)
4. Gu, T., et al.: Badnets: Identifying vulnerabilities in the machine learning model supply chain. arXiv, abs/1708.06733, 2017
5. Liu, Y., et al.: Reflection backdoor: A natural backdoor attack on deep neural networks. In: ECCV. Springer, Cham (2020)
6. Turner, A.P., et al.: Label-consistent backdoor attacks. arXiv, abs/1912.02771, 2019
7. Liao, C., et al.: Backdoor embedding in convolutional neural network models via invisible perturbation. In: Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy. ACM, New York (2020)
8. Saha, A., et al.: Hidden trigger backdoor attacks. In: AAAI. AAAI Press, Menlo Park (2020)
9. Li, Y., et al.: Backdoor learning: A survey. arXiv, abs/2007.08745, 2020

10. Liu, Y., et al.: Neural trojans. In: 2017 IEEE International Conference on Computer Design (ICCD), pp. 45–48. IEEE, Piscataway (2017)
11. Villarreal-Vasquez, M., Bhargava, B.: ConFoc: Content-focus protection against trojan attacks on neural networks. arXiv, abs/2007.00711, 2020
12. Cheng, H., et al.: Defending against backdoor attack on deep neural networks. arXiv, abs/2002.12162, 2020
13. Aiken, W., et al.: Neural network laundering: Removing black-box backdoor watermarks from deep neural networks. arXiv, abs/2004.11368, 2020
14. Subedar, M., et al.: Deep probabilistic models to detect data poisoning attacks. arXiv, abs/1912.01206, 2019
15. Jin, K., et al.: A unified framework for analyzing and detecting malicious examples of DNN models. arXiv, abs/2006.14871, 2020
16. Turner, A., et al.: Clean-label backdoor attacks. MIT (2018)
17. Zhu, C., et al.: Transferable clean-label poisoning attacks on deep neural nets. arXiv, abs/1905.05897, 2019
18. Wang, B., et al.: Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 707–723. IEEE, Piscataway (2019)
19. Madry, A., et al.: Towards deep learning models resistant to adversarial attacks. arXiv, abs/1706.06083, 2018
20. Moosavi-Dezfooli, et al.: DeepFool: A simple and accurate method to fool deep neural networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2574–2582. IEEE, Piscataway (2016)
21. Tan, T., Shokri, R.: Bypassing backdoor detection algorithms in deep learning. In: 2020 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 175–183. IEEE, Piscataway (2020)
22. Barni, M., et al.: A new backdoor attack in CNNs by training set corruption without label poisoning. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 101–105. IEEE, Piscataway (2019)
23. Biggio, B., et al.: Poisoning attacks against support vector machines. In: ICML. Springer, Cham (2012)
24. Szegedy, C., et al.: Intriguing properties of neural networks. arXiv, arXiv:1312.6199, 2013
25. Liu, Y., et al.: Trojaning attack on neural networks. In: NDSS. Curran Associates, Inc., Red Hook (2018)
26. Goodfellow, I.J., et al.: Generative adversarial nets. In: NIPS. ACM, New York (2014)
27. Li, S., et al.: Invisible backdoor attacks on deep neural networks via steganography and regularization. IEEE Trans. Dependable Secure Comput. (2019)
28. Nguyen, A.M., Tran, A.: WaNet - imperceptible warping-based backdoor attack. arXiv, abs/2102.10369, 2021
29. Bagdasaryan, E., et al.: How to backdoor federated learning. PMLR 108, 2938–2948 (2020)
30. Li, Y., et al.: Hidden backdoor attack against semantic segmentation models. arXiv, abs/2103.04038, 2021
31. Xue, M., et al.: Robust backdoor attacks against deep neural networks in real physical world. arXiv, abs/2104.07395, 2021
32. Saha, A., et al.: Backdoor attacks on self-supervised learning (2021)
33. Tran, B., et al.: Spectral signatures in backdoor attacks. In: NeurIPS. Springer, Cham (2018)
34. Chen, B., et al.: Detecting backdoor attacks on deep neural networks by activation clustering. arXiv, abs/1811.03728, 2019
35. Tang, D., et al.: Demon in the variant: Statistical analysis of DNNs for robust backdoor contamination detection. arXiv, abs/1908.00686, 2019
36. Soremekun, E., et al.: Exposing backdoors in robust machine learning models. arXiv, abs/2003.00865, 2020
37. Huster, T.P., Ekwedike, E.: TOP: Backdoor detection in neural networks via transferability of perturbation. arXiv, abs/2103.10274, 2021
38. Zeng, Y., et al.: Rethinking the backdoor attacks' triggers: A frequency perspective. arXiv, abs/2104.03413, 2021
39. Xiang, Z., et al.: Detection of backdoors in trained classifiers without access to the training set. arXiv:1908.10498, 2020
40. Dong, Y., et al.: Black-box detection of backdoor attacks with limited information and data. arXiv, abs/2103.13127, 2021
41. Liu, K., et al.: Fine-pruning: Defending against backdooring attacks on deep neural networks. In: Research in Attacks, Intrusions, and Defenses, pp. 273–294. Springer, Cham (2018)
42. Li, Y., et al.: Rethinking the trigger of backdoor attack. arXiv, abs/2004.04692, 2020
43. Gao, Y., et al.: STRIP: A defence against trojan attacks on deep neural networks. In: Proceedings of the 35th Annual Computer Security Applications Conference. IEEE Computer Society, Los Alamitos (2019)
44. Tao, L., et al.: Provable defense against delusive poisoning (2021)
45. Geiping, J., et al.: What doesn't kill you makes you robust(er): Adversarial training against poisons and backdoors. arXiv, abs/2102.13624, 2021
46. Carnerero-Cano, J., et al.: Regularization can help mitigate poisoning attacks… with the right hyperparameters (2021)
47. Pang, R., et al.: TrojanZoo: Everything you ever wanted to know about neural backdoors (but were afraid to ask). arXiv, abs/2012.09302, 2020
48. Xie, C., et al.: DBA: Distributed backdoor attacks against federated learning. In: ICLR (2020)
49. Ilyas, A., et al.: Adversarial examples are not bugs, they are features. arXiv:1905.02175, 2019
50. Xu, W., et al.: Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv, abs/1704.01155, 2018
51. Kang, G., et al.: PatchShuffle regularization. arXiv, abs/1707.07103, 2017
52. Geirhos, R.: ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv, abs/1811.12231, 2019
53. Deng, J., et al.: ImageNet: A large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Piscataway (2009)
54. Dumford, J., Scheirer, W.: Backdooring convolutional neural networks via targeted weight perturbations. arXiv, abs/1812.03128, 2018

How to cite this article: Chen, X., Ma, Y., Lu, S., Yao, Y.: Boundary augment: A data augment method to defend poison attack. IET Image Process. 15, 3292–3303 (2021). https://doi.org/10.1049/ipr2.12325
