
Digital Communications and Networks 7 (2021) 453–460


Fooling intrusion detection systems using adversarially autoencoder


Junjun Chen a, Di Wu b,c, Ying Zhao a,*, Nabin Sharma b, Michael Blumenstein c, Shui Yu b

a College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
b School of Computer Science, University of Technology Sydney, Ultimo, 2007, Australia
c Centre for Artificial Intelligence, University of Technology Sydney, Ultimo, 2007, Australia

* Corresponding author. E-mail address: [email protected] (Y. Zhao).
https://doi.org/10.1016/j.dcan.2020.11.001
Received 11 September 2019; Received in revised form 2 November 2020; Accepted 17 November 2020; Available online 25 November 2020.
© 2020 Chongqing University of Posts and Telecommunications. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Keywords: Intrusion detection; Cyber attacks; Autoencoder; Generative adversarial networks

Abstract: Due to the increasing number of cyber-attacks, various Intrusion Detection Systems (IDSs) have been proposed to identify network anomalies. Most existing machine learning-based IDSs learn patterns from features extracted from network traffic flows, while deep learning-based approaches can learn data distribution features from the raw data to differentiate normal and anomalous network flows. Although widely used in the real world, the above methods are vulnerable to some types of attacks. In this paper, we propose a novel attack framework, the Anti-Intrusion Detection AutoEncoder (AIDAE), which generates features that disable the IDS. In the proposed framework, an encoder transforms features into a latent space, and multiple decoders reconstruct the continuous and discrete features, respectively. Additionally, a generative adversarial network is used to learn the flexible prior distribution of the latent space. The correlation between continuous and discrete features is preserved by the proposed training scheme. Experiments conducted on the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets show that the generated features indeed degrade the detection performance of existing IDSs dramatically.

1. Introduction

Cyber attacks are ever-present threats around the world. Some typical attack methods, such as denial of service, unauthorized access, and malicious code, cause tremendous damage to governments, enterprises, and organizations [1]. Consequently, various Intrusion Detection Systems (IDSs) and network traffic classification systems based on Machine Learning (ML) or Deep Learning (DL) techniques have been proposed to detect attacks and analyze network traffic [2,3]. However, both ML and DL techniques are feature-based methods: ML techniques learn patterns from handcrafted features, and DL techniques learn data distribution features from the raw data to classify traffic flows. Therefore, if attackers can mimic benign network flow features, they can disable the classifier and bypass the IDS to initiate attacks.

Traditionally, the IDS consists of three main components: the preprocessor, the detector, and the response module [4]. The preprocessor captures the raw network traffic and transforms it into data that can be handled by the detector. The detector consists of one or more predefined classification models for differentiating anomalous from normal network events. If an intrusion is detected, the response unit is triggered. Currently, researchers mainly focus on how to improve the detection performance of the detector.

Based on the detection mechanism, IDSs can be categorized into three main types: misuse detection, anomaly detection, and hybrid detection [5]. Misuse detection uses previous knowledge about anomaly patterns to identify network intrusions, which can achieve good detection performance with low false alarm rates for known vulnerabilities. However, this type of approach cannot detect zero-day attacks whose patterns are unknown to the detector; for example, if an IDS does not have or update the knowledge associated with a novel attack, it cannot identify this attack. Anomaly detection identifies anomalies by comparing the network traffic with a predefined normality model: in the detection process, traffic that does not fit the normality model is considered an anomaly by the IDS. Hybrid detection integrates the misuse and anomaly detection methods in the detection procedure.

Although successfully deployed in commercial and industrial environments, IDSs are still affected by threats from attackers. Recent research demonstrates that adversarial attacks limit the effectiveness of ML-based or DL-based detectors in real scenarios [6]. For example, attackers have created elaborately manipulated samples of Android malware to induce the detector to produce the outputs they expected [7]. Moreover, the Generative Adversarial Network (GAN) has been used to generate features that disable the IDS [8]. A series of solutions have been proposed to mitigate adversarial attacks against ML/DL techniques. To defend against attacks in the training phase, data sanitization [9] was used to identify and remove poisoned data from the training dataset. To defend against attacks in the testing phase, feature selection [10], adversarial training [11], and robust optimization methods [12] were proposed.

In the network security field, the conflict between attackers and defenders leads to an escalating arms race, where both attacks and defenses continually evolve to achieve their goals and overcome their opponents [13,14]. ML-based and DL-based detectors learn normal or anomalous data features to identify malicious network traffic, so attackers can in turn generate network traffic flows with specific patterns to disable the IDS.

In this paper, we propose a feature generative framework against the IDS. Different from existing adversarial attacks that carefully craft adversarial perturbations to samples, the proposed method learns the distribution of the normal features and can generate features that follow that distribution to bypass the IDS. Our purpose is not to promote cyber attacks and crimes, but rather to explore the limits of the IDS and improve the robustness of the detector. Our contributions can be summarized as follows:

- We propose a novel feature generative model, namely the Anti-Intrusion Detection AutoEncoder (AIDAE), to learn the distribution of normal features and randomly generate features that attackers can use to produce real network traffic flows that bypass existing IDSs.
- The multi-channel decoders are separated into continuous and discrete channels to generate continuous and discrete features, respectively. Moreover, the generated features keep the correlation between the continuous and discrete parts via the same well-trained encoder, so the generated features follow the original distribution of normal features.
- We evaluate the AIDAE on three representative network anomaly detection datasets (NSL-KDD, UNSW-NB15, and CICIDS2017) with ML and DL baseline IDSs. Experimental results show that the features generated by the proposed framework indeed disable the baseline IDSs.

The rest of this paper is organized as follows: we introduce the related work in Section 2. The attack model is given in Section 3. In Section 4, we describe the proposed AIDAE framework. The experimental evaluation is presented in Section 5, and the potential defense mechanisms are discussed in Section 6. In Section 7, we give the conclusion. To improve readability, the main acronyms used in this paper are listed in Table 1.

2. Related work

Cybersecurity has attracted widespread attention from research communities, and numerous anomaly detection methods based on ML/DL have been proposed [15]. ML/DL techniques are strong and effective learning frameworks for complex classification tasks, and these techniques have advanced radically in the past years. However, they are vulnerable to adversarial attacks, where adversaries initiate attacks to compromise the detector [16].

Adversaries can modify input instances to evade detection by the IDS. Evasion attacks can be divided into two categories: 1) problem space attacks, which generate malicious instances for a specific detection system, and 2) feature space attacks, where the attackers manipulate the features used by ML/DL to bypass detection [17]. For problem space attacks, Biggio et al. [18] used a gradient-based approach to systematically assess the security of several classification algorithms against evasion attacks. For feature space attacks, different approaches have been proposed. Yu et al. [19] used a semi-Markov model to simulate four user browsing behavior parameters to initiate attacks; however, the simulated parameters are simple, and second-order statistical metrics can detect this attack.

In computer vision research, the GAN and the AutoEncoder (AE) have shown a powerful ability to generate high-quality fake images or videos. Inspired by the success of the GAN and AE, some researchers have used generative models to disable the IDS. A self-adapting malware communication framework [20] was proposed, in which a GAN was deployed to generate parameters; the malware received the generated parameters and used them to mimic normal Facebook chat network traffic. This generative model can only generate three continuous features and does not separate discrete features from continuous features, so it is hard to generate real network traffic with the generated features. Another method, MalGAN [21], used the GAN to generate binary malware features to disable a black-box ML malware detection system, but these generated binary features can hardly represent complex network traffic features. Additionally, Lin et al. [8] designed IDSGAN to generate features to deceive and evade the IDS; because this method needs the outputs of the IDS to calculate the loss in each training epoch, the IDS can easily identify this adversarial training pattern and block it. Different from other evasion attack methods, the proposed AIDAE learns the distribution of the normal features to generate features randomly and does not need feedback from the IDS during training. Moreover, the AIDAE takes into account the correlation between continuous and discrete features in the feature generation process. Therefore, the AIDAE can be used by attackers in real scenarios.

3. Attack model

A network intrusion is an unauthorized penetration of the target network, where the attackers transmit malicious information or misuse network resources. The IDS is a protector of the target network, which plays an adversarial game with the intruders. To facilitate understanding of the intrusion attack, it is necessary to disclose the goal, knowledge, and capability of the attacker. Fig. 1 presents the evasion attack scenario, where the network traffic of attacker 1 is identified as anomaly traffic and is blocked by the IDS, and attacker 2 bypasses the IDS through the network traffic camouflage technique.
Table 1
Acronyms used in the manuscript.
Acronym Definition

AIDAE Anti-intrusion Detection Autoencoder


AE Autoencoder
GAN Generative Adversarial Network
ARAE Adversarially Regularized Autoencoder
LR Logistic Regression
k-NN k-Nearest Neighbor
DT Decision Tree
RF Random Forest
ML/DL Machine/Deep Learning
Fig. 1. Attack scenario.


Fig. 2. The framework of the AIDAE (anti-intrusion detection autoencoder).

Attacker's goal. In this attack setting, the attacker's goal is to keep communicating with the target network in such a way that the IDS cannot detect the network traffic with malicious content. To achieve this goal, the attacker needs to generate traffic that follows the distribution of the normal features, which can disable the IDS.

Attacker's knowledge. The attacker's knowledge about the target IDS is vital for launching an evasion attack. According to Kerckhoffs's principle, the attacker knows the details about the IDS [22]. Such knowledge may include the training data, detection algorithm, sample features, detection procedure, and others. In this scenario, the attackers can manipulate their network packets to bypass the specific detection algorithms. However, the IDS strives to protect itself from attacks in the real world, so the attacker cannot know everything about the IDS. In this paper, we assume that the attacker knows the features extracted from the network traffic by the IDS.

Attacker's capability. The attacker's capability is limited to spoofing or disguising the network traffic in the network intrusion scenario, because the attacker has no permission to access or modify the IDS. As encrypted protocols (e.g., TLS) are widely used in network transmission, traffic classifiers based on Deep Packet Inspection (DPI) find it hard to identify the traffic from the packet payload [23]. Therefore, the attacker can hide the malicious information in the encrypted traffic and make the flow-level and statistical-level features of the generated traffic obey the normal distribution, thereby initiating attacks.

4. Proposed framework

4.1. Overview of AIDAE

In this section, we present the framework and training procedure of the proposed method. To process discrete data, the Adversarially Regularized AutoEncoder (ARAE) [24] was proposed to learn more robust discrete-space representations. Based on ARAE, we design the AIDAE with multi-channel decoders, where each discrete decoder represents one discrete feature and the continuous decoder represents all continuous features. According to Ref. [24], the model performance strongly depends on the choice of prior distribution of the latent space, and using the Gaussian distribution N(0, 1) as the prior may lead to mode collapse in practice. Therefore, we also use a GAN to learn the flexible prior distribution of the latent space in the proposed method. The AIDAE learns the distribution of the traffic flow features and then uses latent space codes to reconstruct, via the multi-channel decoders, features that follow the distribution of the normal features.

The proposed framework is composed of two major parts, the AE and the GAN. The encoder of the AE transforms the traffic features into a latent space code, and the decoders reconstruct the features from the code. The discriminator of the GAN judges whether a code is real or fake, and the generator produces a fake code from a random vector input and tries to fool the discriminator. The structure of the AIDAE is presented in Fig. 2. After training, the generated fake codes are sent to the decoders to reconstruct features. Because the fake code c' is produced from a random vector z, the fake code is random, and so are the features reconstructed by the decoders.

The network structures of the AIDAE are shown in Table 2, where input_size is the dimension of the features, z_size is the dimension of the code c, d_i represents the dimension of the i-th one-hot encoded discrete feature, con_size represents the dimension of the continuous features, and noise_size is the dimension of the generator's random input vector.

Table 2. The network structures of AIDAE (each component listed layer by layer).
Encoder: Linear(input_size, 256), LeakyReLU(), Linear(256, 128), LeakyReLU(), Linear(128, 64), LeakyReLU(), Linear(64, z_size)
Continuous decoder: Linear(z_size, 128), LeakyReLU(), Linear(128, 256), LeakyReLU(), Linear(256, 512), LeakyReLU(), Linear(512, 128), LeakyReLU(), Linear(128, con_size), ReLU()
Discrete decoder: Linear(z_size, 256), LeakyReLU(), Linear(256, 512), LeakyReLU(), Linear(512, d_i), GumbelSoftmax()
Generator: Linear(noise_size, 256), ReLU(), Linear(256, 128), ReLU(), Linear(128, z_size)
Discriminator: Linear(z_size, 256), LeakyReLU(), Linear(256, 512), LeakyReLU(), Linear(512, 256), LeakyReLU(), Linear(128, 1), Sigmoid()
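For concreteness, the layer stacks in Table 2 can be written directly in PyTorch, the framework used for the experiments in Section 5. The sketch below is a minimal, hedged reading of Table 2 and not the authors' released code: the dimension names (input_size, z_size, con_size, noise_size, d_i), the use of torch.nn.functional.gumbel_softmax for the GumbelSoftmax() layer, and the assumed 256-to-1 final discriminator layer are our own choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps a full feature vector to a latent code (Table 2, first column)."""
    def __init__(self, input_size, z_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, 256), nn.LeakyReLU(),
            nn.Linear(256, 128), nn.LeakyReLU(),
            nn.Linear(128, 64), nn.LeakyReLU(),
            nn.Linear(64, z_size),
        )
    def forward(self, x):
        return self.net(x)

class ContinuousDecoder(nn.Module):
    """Reconstructs all continuous features from a code; the final ReLU keeps values non-negative."""
    def __init__(self, z_size, con_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_size, 128), nn.LeakyReLU(),
            nn.Linear(128, 256), nn.LeakyReLU(),
            nn.Linear(256, 512), nn.LeakyReLU(),
            nn.Linear(512, 128), nn.LeakyReLU(),
            nn.Linear(128, con_size), nn.ReLU(),
        )
    def forward(self, z):
        return self.net(z)

class DiscreteDecoder(nn.Module):
    """One decoder per discrete feature; emits a soft one-hot vector of dimension d_i."""
    def __init__(self, z_size, d_i, tau=1.0):
        super().__init__()
        self.tau = tau
        self.net = nn.Sequential(
            nn.Linear(z_size, 256), nn.LeakyReLU(),
            nn.Linear(256, 512), nn.LeakyReLU(),
            nn.Linear(512, d_i),
        )
    def forward(self, z):
        # Gumbel-Softmax as a differentiable stand-in for the GumbelSoftmax() layer in Table 2.
        return F.gumbel_softmax(self.net(z), tau=self.tau, hard=False)

class Generator(nn.Module):
    """Maps a random noise vector to a fake latent code."""
    def __init__(self, noise_size, z_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_size, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, z_size),
        )
    def forward(self, noise):
        return self.net(noise)

class Discriminator(nn.Module):
    """Scores whether a latent code comes from the encoder (real) or the generator (fake)."""
    def __init__(self, z_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_size, 256), nn.LeakyReLU(),
            nn.Linear(256, 512), nn.LeakyReLU(),
            nn.Linear(512, 256), nn.LeakyReLU(),
            # Table 2 prints Linear(128, 1); since the previous layer outputs 256 units,
            # we assume Linear(256, 1) here so the sketch runs.
            nn.Linear(256, 1), nn.Sigmoid(),
        )
    def forward(self, code):
        return self.net(code)
```

With n discrete features, one DiscreteDecoder is instantiated per feature, and their outputs are concatenated with the continuous reconstruction as described in Section 4.4.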


4.2. Continuous and discrete features

Features extracted from traffic flows can be categorized into continuous and discrete features. Fig. 3 shows the features of one traffic flow in the NSL-KDD dataset. Numbers such as "507", "437", and "14,421" (shown in black in the figure) represent the features "duration", "source bytes", and "destination bytes", respectively; the values of these features are continuous, so we consider them continuous features. Numbers such as "1", "12", and "10" (shown in red) represent discrete features, whose values lie within certain fixed ranges. For example, "1" represents the feature "protocol type", whose range is the integers 1 to 3, indicating different protocols.

Fig. 3. Features of one traffic flow in the NSL-KDD dataset.

4.3. Training algorithm

As shown in Algorithm 1, there are two steps in training the proposed method. First, we train the encoder, generator, discriminator, and continuous feature decoder. The input of the encoder is all the features (continuous and discrete), so the raw continuous and discrete features can be represented by a latent space code via the encoder. Then, we train the discrete feature decoders.

Algorithm 1. AIDAE training.

The loss function of the AIDAE is described as follows. Consider F as the set of input features, F = {f_1, f_2, ..., f_m}, where m is the number of instances. f_k^⋄ and f_k^* = {f_{k,1}^*, f_{k,2}^*, ..., f_{k,n}^*} represent the continuous and discrete features of the k-th instance, respectively, where n is the number of discrete features. E and D are the encoder and the decoder, and φ and θ^⋄ denote the parameters of the encoder and the continuous decoder, respectively. P_data^⋄ represents the distribution of the continuous features. For the continuous features, the Mean Square Error (MSE) loss L^⋄ can be represented as Eq. (1):

  L^{\diamond}(\varphi, \theta^{\diamond}) = \mathbb{E}_{f^{\diamond} \sim P^{\diamond}_{data}} \left\| f^{\diamond} - D_{\theta^{\diamond}}(E_{\varphi}(f)) \right\|^{2}   (1)

For each discrete decoder, the output can be represented as D_{\theta^{*}_{i}}(E_{\varphi}(f)), where θ_i^* denotes the parameters of the i-th discrete decoder (i ∈ {1, ..., n}). The outputs of all discrete decoders are concatenated according to their original positions through the function M:

  D_{\theta^{*}} = M\big( D_{\theta^{*}_{1}}(E_{\varphi}(f)), D_{\theta^{*}_{2}}(E_{\varphi}(f)), \ldots, D_{\theta^{*}_{n}}(E_{\varphi}(f)) \big)

Each discrete feature f_{k,t}^* can be represented by a one-hot encoded vector x_t. The cross-entropy loss of the discrete features L^* can be represented as Eq. (2), which is minimized to reduce the reconstruction error:

  L^{*}(\varphi, \theta^{*}) = -\sum_{t=1}^{n} \sum_{j=1}^{d_t} x_{t}^{j} \log \hat{x}_{t}^{j}   (2)

where d_t is the dimension of x_t, and x_t^j and \hat{x}_t^j are the real and generated values of the j-th dimension, respectively.

A GAN is used to learn the flexible prior distribution of the latent space in the AIDAE. Let c be the code from the encoder and z a random vector. The proposed GAN training scheme ensures that the generated code c' follows the code distribution produced by the encoder. The min-max optimization of the GAN can be written as follows:

  \min_{G} \max_{D} \; \mathbb{E}_{c \sim P(c)}[\log D(c)] + \mathbb{E}_{z \sim P(z)}[\log(1 - D(G(z)))]

4.4. Feature concatenating

In the feature generation process, the generator of the GAN produces a fake code and sends it to both the continuous and the discrete decoders. The decoders reconstruct the continuous and discrete features from the fake code, respectively. After that, the generated features are concatenated according to their original positions, as shown in Fig. 4. The combination of continuous and discrete features follows the distribution of the original training features; therefore, the correlation between the continuous and discrete features is kept.

Compared with other methods, the proposed framework can mimic complicated features and ensure that the continuous and discrete features keep their correlation. Thus, the generated features can be used to generate real network traffic flows.

Fig. 4. The features concatenating process.
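To make the first training step and the generation pass of Sections 4.3 and 4.4 concrete, the hedged sketch below shows one training iteration for the continuous branch (Eq. (1) plus the min-max game) and the feature-generation step. It reuses the module classes sketched after Table 2; batch handling, learning rates, and the separate step that trains the discrete decoders with the cross-entropy loss of Eq. (2) are our own assumptions rather than the authors' code.

```python
import torch
import torch.nn as nn

# Illustrative dimensions only (loosely modeled on NSL-KDD).
con_size = 33                       # number of continuous features
discrete_dims = [3, 70, 11]         # hypothetical one-hot sizes d_i of the discrete features
input_size = con_size + sum(discrete_dims)
z_size, noise_size = 32, 64         # assumed latent and noise dimensions

enc = Encoder(input_size, z_size)
dec_con = ContinuousDecoder(z_size, con_size)
dec_dis = [DiscreteDecoder(z_size, d) for d in discrete_dims]
gen = Generator(noise_size, z_size)
disc = Discriminator(z_size)

mse, bce = nn.MSELoss(), nn.BCELoss()
opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec_con.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)

def train_step(x, x_con):
    """x: batch of normal-feature vectors; x_con: their continuous part (used in Eq. (1))."""
    # 1) Reconstruction loss of Eq. (1) for the encoder and continuous decoder.
    opt_ae.zero_grad()
    loss_rec = mse(dec_con(enc(x)), x_con)
    loss_rec.backward()
    opt_ae.step()

    # 2) Discriminator: real codes from the encoder vs. fake codes from the generator.
    noise = torch.randn(x.size(0), noise_size)
    real_code, fake_code = enc(x).detach(), gen(noise).detach()
    opt_d.zero_grad()
    loss_d = bce(disc(real_code), torch.ones(x.size(0), 1)) + \
             bce(disc(fake_code), torch.zeros(x.size(0), 1))
    loss_d.backward()
    opt_d.step()

    # 3) Generator: fool the discriminator so fake codes match the encoder's code distribution.
    opt_g.zero_grad()
    loss_g = bce(disc(gen(torch.randn(x.size(0), noise_size))), torch.ones(x.size(0), 1))
    loss_g.backward()
    opt_g.step()

def generate_features(n):
    """Section 4.4: decode a random fake code and concatenate continuous and discrete parts."""
    with torch.no_grad():
        fake_code = gen(torch.randn(n, noise_size))
        parts = [dec_con(fake_code)] + [d(fake_code) for d in dec_dis]
        # Columns would still need to be reordered to the original feature positions (function M).
        return torch.cat(parts, dim=1)
```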


5. Evaluation

5.1. Datasets

To evaluate the AIDAE, we conducted experiments on three typical intrusion detection datasets, namely NSL-KDD, UNSW-NB15, and CICIDS2017. The details of the features of these datasets are shown in Table 3.

NSL-KDD [25] is refined from the KDD99 dataset and is still a benchmark dataset for testing different intrusion detection methods. Since the feature "num_outbound_cmds" has only a single value, we removed this feature in the experiments. Therefore, there are 1 label, 33 continuous features, and 7 discrete features in the NSL-KDD dataset.

UNSW-NB15 [26] is a well-known network intrusion detection dataset released by the Australian Centre for Cyber Security. UNSW-NB15 provides the training and testing datasets in CSV format. There are 1 label, 37 continuous features, and 5 discrete features in the dataset.

CICIDS2017 [27] is another state-of-the-art dataset, proposed by the Canadian Institute for Cybersecurity in 2017. This dataset consists of benign network events and six up-to-date common attacks, produced with realistic background traffic. We removed the flag features, such as "Fwd PSH Flags" and "Fwd URG Flags". For the discrete feature (destination port), we selected common application port numbers as its range of values, such as port 21, port 53, port 80, and so on. We selected 1 label, 59 continuous features, and 1 discrete feature from the CICIDS2017 dataset.

Table 3. Feature number.
Dataset       Continuous features   Discrete features
NSL-KDD       33                    7
UNSW-NB15     37                    5
CICIDS2017    59                    1

For all datasets, discrete features are one-hot encoded, and each continuous feature is logarithmically transformed and then scaled by Min-Max normalization (Eq. (3)) to eliminate the impact of the different value ranges between features:

  x' = \frac{x - x_{min}}{x_{max} - x_{min}}   (3)

where x is a feature value, x_min is the minimum logarithm value of this particular feature, x_max is the maximum logarithm value, and x' is the value after normalization.

For each dataset, 100,000 records were randomly selected to create new datasets, composed of a training set D_train and a testing set D_test, to evaluate the proposed model. D_train includes 30,000 normal records and 30,000 anomaly records, and D_test includes 20,000 normal records and 20,000 anomaly records. All experiments were conducted using Python and PyTorch on an RHEL 7.5 server with an Intel Xeon W-2133 3.6 GHz CPU and an Nvidia Quadro P5000 GPU.

5.2. Effectiveness of AIDAE

In this evaluation, we focused on whether the generated features can disable the IDS. To evaluate the evasion ability of the generated features, the Detection Rate (DR) and the Evasion Increase Rate (EIR) [8] were measured, which show the effectiveness of the proposed feature generative model directly. The DR is the proportion of correctly detected anomaly features among all detected anomaly features. The EIR reflects the evasion ability by comparing the adversarial DR with the original DR, as formulated in Eq. (4):

  \mathrm{EIR} = 1 - \frac{\mathrm{Adversarial\ DR}}{\mathrm{Original\ DR}}   (4)

In the training phase, D_train was used to train the intrusion detection classifiers, and the proposed AIDAE was trained only on the normal records in D_train. In the testing phase, the 20,000 anomaly records in D_test were fed to the intrusion detection classifiers to obtain the original DR. Then, we evaluated the DR of 20,000 generated records that were labeled as anomalies. We repeated the experiments five times and randomly reselected records from the original dataset to create the new datasets for each evaluation.

Table 4 shows the DRs on the different datasets. The DRs of both the ML and DL baseline IDS methods decrease dramatically. For the NSL-KDD dataset, the DR of CNN+LSTM decreases from 98.71±0.86% to 1.44±0.39%, while LR has the minimum decrease, from around 87.51±3.73% to around 5.53±1.02%. For the UNSW-NB15 dataset, CNN+LSTM has the maximum decrease, from around 98.83±0.61% to around 1.51±0.46%, and k-NN has the minimum decrease, from around 97.14±0.70% to around 7.11±1.38%. For the CICIDS2017 dataset, the maximum decrease happens for CNN+LSTM, which decreases from 99.15±0.06% to 0.94±0.11%, and the minimum decrease appears for LR, from 93.47±2.53% to 3.02±1.28%. The results show that the features generated by the AIDAE successfully disable the baseline IDSs.

Table 4. Detection Rate (%) on different datasets (Original DR / Adversarial DR).
Methods     NSL-KDD                    UNSW-NB15                  CICIDS2017
LR          87.51±3.73 / 5.53±1.02     96.52±1.42 / 2.54±1.07     93.47±2.53 / 3.02±1.28
k-NN        91.64±3.16 / 2.42±0.82     97.14±0.70 / 7.11±1.38     98.25±0.36 / 1.83±0.17
DT          95.88±2.14 / 3.01±0.33     98.50±0.31 / 4.72±2.64     97.18±0.85 / 3.94±1.45
AdaBoost    93.53±1.42 / 2.45±1.24     95.21±1.25 / 2.13±0.63     96.57±1.33 / 1.39±0.24
RF          95.15±0.75 / 1.56±0.61     98.78±0.04 / 2.77±1.41     98.72±0.34 / 2.14±1.07
CNN+LSTM    98.71±0.86 / 1.44±0.39     98.83±0.61 / 1.51±0.46     99.15±0.06 / 0.94±0.11

Fig. 5 shows the evasion increase rates of the baseline algorithms on the different datasets. According to Eq. (4), a higher evasion increase rate indicates that more adversarial examples can evade the IDS. The experimental results show that the EIRs of all baseline detection methods on the three datasets are higher than 0.9, which indicates that the proposed method can generate features to evade the IDS.

Fig. 5. Evasion increase rates of baseline algorithms.

5.3. Performance of AIDAE

The performance of feature generation is another important aspect of evaluating the proposed AIDAE. Therefore, we evaluated whether the model can efficiently generate diverse features.

For the feature generation model, a greater diversity of features means a larger range of feature values. Because the values of the discrete features come from a fixed set, we only evaluate the diversity of the continuous features. To obtain diverse feature values, we used ReLU as the activation function of the last layer in the continuous feature decoder. The diversity rate F can be formulated as Eq. (5):

  F = \frac{f'_{max} - f'_{min}}{f_{max} - f_{min}}   (5)

where f'_max is the maximum value of the generated feature f', f'_min is the minimum value of f', and f_max and f_min are the maximum and minimum values of the original feature f, respectively. We used F to evaluate the proposed model's ability to generate continuous features. If F is greater than 1, the value range of f' is greater than the value range of f, and vice versa.
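The preprocessing of Eq. (3) and the metrics of Eqs. (4) and (5) are simple element-wise computations; a hedged NumPy sketch is given below. The function names are ours, and the logarithmic transform is assumed to be log(1 + x) so that zero-valued features remain defined, since the paper only states that a log transform is applied before Min-Max scaling.

```python
import numpy as np

def log_minmax_normalize(col):
    """Eq. (3): log-transform one continuous feature column, then scale it to [0, 1].
    log1p is an assumption; the paper only specifies a logarithmic transform."""
    col = np.log1p(col.astype(float))
    cmin, cmax = col.min(), col.max()
    return (col - cmin) / (cmax - cmin) if cmax > cmin else np.zeros_like(col)

def evasion_increase_rate(original_dr, adversarial_dr):
    """Eq. (4): EIR = 1 - Adversarial DR / Original DR."""
    return 1.0 - adversarial_dr / original_dr

def diversity_rate(generated_col, original_col):
    """Eq. (5): ratio of the generated value range to the original value range."""
    return (generated_col.max() - generated_col.min()) / \
           (original_col.max() - original_col.min())

# Example with values taken from Table 4 (RF on NSL-KDD): DR drops from 95.15% to 1.56%.
print(evasion_increase_rate(0.9515, 0.0156))  # about 0.98, i.e., EIR > 0.9
```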


Fig. 6 shows the diversity rates of the continuous features generated for the NSL-KDD dataset. Except for feature 13, all diversity rates are higher than 0.95. The uneven value distribution is the reason for this anomalous diversity rate: for feature 13, the number of samples whose original values are higher than f'_max is only 0.65% of the total sample number, so the generated values still cover 99.35% of the training samples' values. Moreover, some features' diversity rates are higher than 1, which means that the AIDAE generates new values for some features. According to Table 4, the generated features can disable the IDS, so the attackers can use these new values. Therefore, the proposed generative framework can learn the manifolds of the continuous features well. Figs. 7 and 8 show the diversity rates of the generated continuous features on the UNSW-NB15 and CICIDS2017 datasets, respectively. The diversity rates in these two experiments are all higher than 92%, which means the generated features have a wide range of values and the proposed method can be used to generate features.

Fig. 6. Diversity rates of generated continuous features of NSL-KDD.
Fig. 7. Diversity rates of generated continuous features of UNSW-NB15.
Fig. 8. Diversity rates of generated continuous features of CICIDS2017.

Since the discrete feature values come from a fixed set, we measured the distribution similarity between the generated discrete features and the original discrete features to evaluate the performance of the AIDAE. The Jensen-Shannon Divergence (JSD) [28] was used in this evaluation. For each dataset, we randomly sampled 20,000 normal records from D_train as T_standard, and sampled another 20,000 records from the generated records as T_gen. We also sampled 20,000 normal records from D_test as T_real. We computed the Average Jensen-Shannon Divergence (AJSD) between the discrete features of T_standard and T_gen, and compared it with the AJSD between T_standard and T_real. The AJSD is defined in Eq. (6):

  \mathrm{AJSD} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{1}{2} \sum_{j=1}^{d_i} p_i(j) \log \frac{p_i(j)}{M_i(j)} + \frac{1}{2} \sum_{j=1}^{d_i} q_i(j) \log \frac{q_i(j)}{M_i(j)} \right]   (6)

where n is the number of discrete features and d_i is the dimension of the i-th feature. p_i(j) and q_i(j) represent the probability of the j-th dimension of the i-th discrete feature under the two distributions, and M_i(j) is equal to (p_i(j) + q_i(j))/2.
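Eq. (6) averages a Jensen-Shannon divergence over the discrete features; a small hedged sketch of that computation is shown below. It assumes each discrete feature is summarized by its empirical probability vector over the d_i one-hot dimensions and uses the natural logarithm, matching the formula as printed; the helper names are ours.

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two probability vectors p_i and q_i (inner term of Eq. (6))."""
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * np.sum(p * np.log(p / m)) + 0.5 * np.sum(q * np.log(q / m))

def ajsd(p_list, q_list):
    """Eq. (6): average JSD over the n discrete features.
    p_list/q_list hold one empirical distribution per discrete feature (e.g., from T_standard and T_gen)."""
    return np.mean([jsd(p, q) for p, q in zip(p_list, q_list)])

# Hypothetical example: distributions of a 3-valued protocol feature in T_standard vs. T_gen.
print(ajsd([[0.70, 0.20, 0.10]], [[0.65, 0.25, 0.10]]))
```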
Fig. 9 shows that the AJSD between T_standard and T_gen is similar to the AJSD between T_standard and T_real, which means that the proposed method can learn the distribution of the discrete features well.

Fig. 9. Average Jensen-Shannon Divergence.

As shown in Table 5, the time cost is another evaluation metric. The 400-epoch training times on the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets are 876.64 s, 639.79 s, and 714.28 s, respectively. The times for generating 100,000 records with the trained models are 3.13 s, 2.76 s, and 2.92 s, respectively. The results suggest that the AIDAE has low computational complexity.

Table 5. Time cost (seconds) on the three datasets.
Dataset       Training of AIDAE   Generating features
NSL-KDD       876.64              3.13
UNSW-NB15     639.79              2.76
CICIDS2017    714.28              2.92

5.4. Generating network traffic

To disable the IDS, the attackers can generate traffic flows with malicious payloads from the generated features. Inspired by the patents in Refs. [29,30], we introduce the network traffic generation procedure in this section. We categorized the generated features into flow-level features and statistic-level features. The flow-level features include packet size, packet time features, protocol, etc. The statistic-level features include, for example, the number of connections that share the same service and source address within 100 connections. Note that the generated value of an integer feature needs to be rounded in the generation procedure. The generation procedure is as follows:

1. Determining the protocol and status of each layer;
2. Generating the payload according to the distribution of payload size;
3. Constructing the header of each layer protocol;
4. Determining the transmission time of packets and performing retransmission operations according to the corresponding features;
5. Modifying the flows to fit the other flow-level features;
6. Modifying the flows to fit the statistic-level features in the generation procedure.
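As an illustration of the first four steps above, the hedged Scapy sketch below builds one unidirectional TCP flow whose payload sizes and inter-packet gaps are taken from a generated feature record. The field names (payload_sizes, gaps, dst, dport), the choice of TCP on port 80, and writing to a pcap instead of sending are assumptions made for the example, not part of the authors' generator.

```python
from scapy.all import IP, TCP, Raw, wrpcap
import os, random

def build_unidirectional_flow(dst, dport, payload_sizes, gaps, start_time=0.0):
    """Construct a TCP flow whose payload sizes and inter-packet gaps follow
    generated flow-level features (steps 1-4 of the procedure above)."""
    packets, t = [], start_time
    sport = random.randint(1024, 65535)
    for size, gap in zip(payload_sizes, gaps):
        pkt = IP(dst=dst) / TCP(sport=sport, dport=dport, flags="PA") / Raw(load=os.urandom(int(size)))
        t += gap
        pkt.time = t          # timestamp recorded when the flow is written to a pcap
        packets.append(pkt)
    return packets

# Hypothetical generated features: rounded payload sizes (bytes) and inter-packet gaps (seconds).
flow = build_unidirectional_flow("192.0.2.10", 80, [120, 87, 512], [0.00, 0.03, 0.10])
wrpcap("generated_flow.pcap", flow)   # statistic-level features would be fitted across many such flows
```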
6.3. Payload-based detection
Although NSL-KDD, UNSW-NB15, and CICIDS2017 are built from bidirectional flows, we consider both unidirectional and bidirectional flow scenarios. The unidirectional flow can be used in the scenario where the attackers send traffic flows with malicious payloads to the target, and the bidirectional flow can be used in the scenario where the attackers communicate with malicious clients. The time cost and the number of generated packets are presented in Fig. 10 and Fig. 11, respectively.

It can be seen from the results that the time consumption and the packet number scale linearly as the flow number grows. For the unidirectional case, generating 2,000 unidirectional flows (25,683 packets) from the features generated on the NSL-KDD dataset takes 41.76 s, generating 2,000 unidirectional flows (44,713 packets) from the UNSW-NB15-based features takes 77.61 s, and generating 2,000 unidirectional flows (26,579 packets) from the CICIDS2017-based features takes 52.17 s. Additionally, generating 2,000 bidirectional flows (35,640 packets) with the generated NSL-KDD features takes 71.89 s, generating 2,000 bidirectional flows (95,963 packets) with the generated UNSW-NB15 features takes 204.96 s, and generating 2,000 bidirectional flows (61,278 packets) with the generated CICIDS2017 features takes 159.53 s. We only counted the time for constructing the packets and sending them through the network interfaces; the packet transportation time and the waiting time needed to fit the time feature distribution were not counted.

Fig. 10. Time cost and packet number of the generated unidirectional flows.
Fig. 11. Time cost and packet number of the generated bidirectional flows.

6. Discussion on defense mechanisms

Defenses against evasion attacks, which identify and block anomalies to reduce the effects of malicious network communications, are challenging tasks in cybersecurity. In this section, we focus on the potential defenses against the proposed method and discuss how our attack evades them.

6.1. Feature selection

Feature selection is one of the core steps in ML/DL methods, which selects a subset of relevant features to improve the classification performance or reduce the computational complexity [31]. However, if the algorithm designer does not consider the evasion attack, feature selection may degrade the detection performance of the model, because the attackers need to manipulate fewer features to initiate attacks. Zhang et al. [10] proposed an adversarial feature selection method against evasion attacks to tackle this issue, using an optimization criterion that maximizes both the classifier's generalization capability and its security against evasion. Although it outperforms traditional approaches in classifier security, this feature selection method is not effective against the proposed AIDAE. The AIDAE can generate features that conform to the normal distribution and can maintain the correlation between different features, so the generated feature subsets also follow the distribution of normal features. Thus, the generated features can evade feature selection-based defenses.

6.2. Adversarial attack detection

Adversarial attacks seriously compromise the security of ML/DL applications. In the past few years, researchers have given different explanations for adversarial examples. According to Ref. [32], adversarial examples are hard to find because they represent low-probability "pockets" in the example manifold. Moreover, Goodfellow et al. [33] demonstrated that the linear nature of neural networks is the primary cause of their vulnerability to adversarial examples.

To detect adversarial attacks, substantial approaches have been devised in the recent literature. Kantchelian et al. [11] used a prediction-based algorithm to create adversarial instances and added them to the training data to harden the detection model against evasion attacks. Robust optimization [12] is another solution, which smooths the decision boundaries of the ML algorithm to limit the influence of adversarial samples. Adversarial samples can be identified because they are only similar to, but do not follow, the distribution of the normal samples. Unlike adversarial examples, we trained a generator that learns the manifolds of the normal features and generates features the attackers can use to produce network traffic with malicious payloads. Therefore, adversarial attack detection cannot identify the proposed attacks.

6.3. Payload-based detection

Payloads are the raw data encapsulated in network frames, such as the content that remains after the IP header and TCP/UDP header are removed from the datagram structure. Analyzing payloads is an effective method of identifying network anomalies because a malicious payload is an inevitable part of network attack traffic [34]. However, encryption protocols are widely used in real network communication, which makes it hard to inspect the payloads. Although some researchers focus on encrypted traffic classification [35], these methods can only classify network traffic with known patterns and cannot detect 0-day attacks. Consequently, the attackers can use the proposed method to generate network traffic with encrypted payloads to evade payload-based detection.


7. Conclusions

In this paper, we propose the AIDAE framework against existing IDSs. Compared with other generation methods, our proposed AIDAE can not only generate features that match the normal feature distribution, but also keep the correlation between the generated continuous and discrete features. Attackers can initiate attacks by using the generated features to produce network traffic flows. Experiments show that our proposed framework can indeed generate features that disable the baseline IDSs. In future work, we are strongly interested in defending against this type of attack. Moreover, we will study evasion attacks based on semantic-level information.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] N. Sun, J. Zhang, P. Rimba, S. Gao, L.Y. Zhang, Y. Xiang, Data-driven cybersecurity incident prediction: a survey, IEEE Communications Surveys & Tutorials 21 (2) (2018) 1744–1772.
[2] Q. Liu, P. Li, W. Zhao, W. Cai, S. Yu, V.C. Leung, A survey on security threats and defensive techniques of machine learning: a data driven view, IEEE Access 6 (2018) 12103–12117.
[3] J. Zhang, Y. Xiang, Y. Wang, W. Zhou, Y. Xiang, Y. Guan, Network traffic classification using correlation information, IEEE Trans. Parallel Distr. Syst. 24 (1) (2012) 104–117.
[4] I. Corona, G. Giacinto, F. Roli, Adversarial attacks against intrusion detection systems: taxonomy, solutions and open issues, Inf. Sci. 239 (2013) 201–225.
[5] P. Mishra, V. Varadharajan, U. Tupakula, E.S. Pilli, A detailed investigation and analysis of using machine learning techniques for intrusion detection, IEEE Communications Surveys & Tutorials 21 (1) (2018) 686–728.
[6] G. Apruzzese, M. Colajanni, L. Ferretti, M. Marchetti, Addressing adversarial attacks against security systems based on machine learning, in: 2019 11th International Conference on Cyber Conflict (CyCon), vol. 900, IEEE, 2019, pp. 1–18.
[7] X. Chen, C. Li, D. Wang, S. Wen, J. Zhang, S. Nepal, Y. Xiang, K. Ren, Android HIV: a study of repackaging malware for evading machine-learning detection, IEEE Trans. Inf. Forensics Secur. 15 (2019) 987–1001.
[8] Z. Lin, Y. Shi, Z. Xue, IDSGAN: Generative Adversarial Networks for Attack Generation against Intrusion Detection, arXiv preprint arXiv:1809.02077.
[9] Y. Cao, J. Yang, Towards making systems forget with machine unlearning, in: 2015 IEEE Symposium on Security and Privacy, IEEE, 2015, pp. 463–480.
[10] F. Zhang, P.P. Chan, B. Biggio, D.S. Yeung, F. Roli, Adversarial feature selection against evasion attacks, IEEE Transactions on Cybernetics 46 (3) (2015) 766–777.
[11] A. Kantchelian, J.D. Tygar, A. Joseph, Evasion and hardening of tree ensemble classifiers, in: International Conference on Machine Learning, 2016, pp. 2387–2396.
[12] P. Russu, A. Demontis, B. Biggio, G. Fumera, F. Roli, Secure kernel machines against evasion attacks, in: Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security, ACM, 2016, pp. 59–69.
[13] L. Chen, Y. Ye, T. Bourlai, Adversarial machine learning in malware detection: arms race between evasion attack and defense, in: 2017 European Intelligence and Security Informatics Conference (EISIC), IEEE, 2017, pp. 99–106.
[14] L. Liu, O. De Vel, Q.-L. Han, J. Zhang, Y. Xiang, Detecting and preventing cyber insider threats: a survey, IEEE Communications Surveys & Tutorials 20 (2) (2018) 1397–1417.
[15] R. Coulter, Q.-L. Han, L. Pan, J. Zhang, Y. Xiang, Data-driven cyber security in perspective: intelligent traffic analysis, IEEE Transactions on Cybernetics 50 (7) (2020) 3081–3093, doi: 10.1109/TCYB.2019.2940940.
[16] A. Chakraborty, M. Alam, V. Dey, A. Chattopadhyay, D. Mukhopadhyay, Adversarial Attacks and Defences: A Survey, arXiv preprint arXiv:1810.00069.
[17] L. Tong, B. Li, C. Hajaj, C. Xiao, N. Zhang, Y. Vorobeychik, Improving robustness of ML classifiers against realizable evasion attacks using conserved features, in: 28th USENIX Security Symposium (USENIX Security 19), 2019, pp. 285–302.
[18] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Srndic, P. Laskov, G. Giacinto, F. Roli, Evasion attacks against machine learning at test time, in: Proc. Conf. ECML-PKDD, Springer, 2013, pp. 387–402.
[19] S. Yu, S. Guo, I. Stojmenovic, Fool me if you can: mimicking attacks and anti-attacks in cyberspace, IEEE Trans. Comput. 64 (1) (2015) 139–151.
[20] M. Rigaki, S. Garcia, Bringing a GAN to a knife-fight: adapting malware communication to avoid detection, in: Proc. IEEE Symp. Secur. Priv. Workshops (SPW), 2018, pp. 70–75.
[21] W. Hu, Y. Tan, Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN, arXiv preprint arXiv:1702.05983.
[22] I. Rosenberg, E. Gudes, Bypassing system calls-based intrusion detection systems, Concurrency Comput. Pract. Ex. 29 (16) (2017), e4023.
[23] S. Rezaei, X. Liu, Deep learning for encrypted traffic classification: an overview, IEEE Commun. Mag. 57 (5) (2019) 76–81.
[24] J.J. Zhao, Y. Kim, K. Zhang, A.M. Rush, Y. LeCun, Adversarially regularized autoencoders, in: Proc. Int. Conf. Machine Learning (ICML), 2018, pp. 5897–5906.
[25] M. Tavallaee, E. Bagheri, W. Lu, A.A. Ghorbani, A detailed analysis of the KDD CUP 99 data set, in: Proc. IEEE Symp. CISDA, 2009, pp. 1–6.
[26] N. Moustafa, J. Slay, UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set), in: Proc. Conf. MilCIS, 2015, pp. 1–6.
[27] I. Sharafaldin, A.H. Lashkari, A.A. Ghorbani, Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization, 2018.
[28] B. Fuglede, F. Topsoe, Jensen-Shannon divergence and Hilbert space embedding, in: International Symposium on Information Theory (ISIT 2004), IEEE, 2004, p. 31.
[29] H. Cheng, L. Pan, Method and Apparatus for Network Traffic Simulation, Aug. 22, 2017, US9740816B2.
[30] J. Tang, D. Wang, X. Zhou, L. Dong, Q. Yan, X. Zhang, H. Zhi, J. Zhang, Y. Wu, H. Jin, A Method for Generating Network Traffic Data Stream Based on the Feature, Nov. 13, 2018, CN105049277B.
[31] S. Aljawarneh, M. Aldwairi, M.B. Yassein, Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model, Journal of Computational Science 25 (2018) 152–160.
[32] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing Properties of Neural Networks, arXiv preprint arXiv:1312.6199.
[33] I.J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, in: International Conference on Learning Representations, 2015.
[34] H. Liu, B. Lang, M. Liu, H. Yan, CNN and RNN based payload classification methods for attack detection, Knowl. Base Syst. 163 (2019) 332–341.
[35] G. Aceto, D. Ciuonzo, A. Montieri, A. Pescape, Mobile encrypted traffic classification using deep learning: experimental evaluation, lessons learned, and challenges, IEEE Trans. Netw. Serv. Manag.

