Abstract
In recent years, deep neural networks have been widely adopted for machine learning tasks, including classification. However, they have been shown to be vulnerable to adversarial perturbations: carefully crafted small perturbations can cause legitimate images to be misclassified. We propose Defense-GAN, a new framework that leverages the expressive capability of generative models to defend deep neural networks against such attacks. Defense-GAN is trained to model the distribution of unperturbed images. At inference time, it finds an output close to a given image that does not contain the adversarial changes, and this output is then fed to the classifier. Our proposed method can be used with any classification model and modifies neither the classifier's architecture nor its training procedure.
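The reconstruction step described above can be sketched as a search for a latent code whose generator output best matches the (possibly perturbed) input. The following is a minimal illustration, not the paper's implementation: the trained GAN generator is replaced by a hypothetical linear map `G(z) = W @ z` so that the gradient-descent projection can run end to end.

```python
import numpy as np

# Toy stand-in for a trained GAN generator: a fixed linear map from a
# 3-dim latent space to an 8-dim "image" space (hypothetical, for illustration).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 3))

def G(z):
    return W @ z

def reconstruct(x, steps=2000, lr=0.01):
    """Find z* minimizing ||G(z) - x||^2 by gradient descent on z,
    then return G(z*) as the cleaned input for the classifier."""
    z = rng.standard_normal(3)
    for _ in range(steps):
        grad = 2 * W.T @ (G(z) - x)  # gradient of the squared error w.r.t. z
        z -= lr * grad
    return G(z)

# An "adversarial" input: a point in the generator's range plus a perturbation.
x_clean = G(np.array([1.0, -0.5, 2.0]))
x_adv = x_clean + 0.3 * rng.standard_normal(8)

# Because the reconstruction lies in the generator's range, the component
# of the perturbation outside that range is removed before classification.
x_rec = reconstruct(x_adv)
```

In the actual framework, `G` would be a GAN generator trained on unperturbed images, and the minimization over `z` is typically run from several random restarts; the linear case here only shows the mechanics of projecting onto the generator's range.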