A R T I C L E  I N F O

Keywords:
Intrusion detection
Cyber attacks
Autoencoder
Generative adversarial networks

A B S T R A C T

Due to increasing cyber-attacks, various Intrusion Detection Systems (IDSs) have been proposed to identify network anomalies. Most existing machine learning-based IDSs learn patterns from features extracted from network traffic flows, and deep learning-based approaches can learn data distribution features from the raw data to differentiate normal and anomalous network flows. Although widely used in the real world, the above methods are vulnerable to certain types of attacks. In this paper, we propose a novel attack framework, the Anti-Intrusion Detection AutoEncoder (AIDAE), to generate features that disable the IDS. In the proposed framework, an encoder transforms features into a latent space, and multiple decoders reconstruct the continuous and discrete features, respectively. Additionally, a generative adversarial network is used to learn the flexible prior distribution of the latent space. The proposed training scheme preserves the correlation between continuous and discrete features. Experiments conducted on the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets show that the generated features indeed degrade the detection performance of existing IDSs dramatically.
    * Corresponding author.
      E-mail address: [email protected] (Y. Zhao).
https://doi.org/10.1016/j.dcan.2020.11.001
Received 11 September 2019; Received in revised form 2 November 2020; Accepted 17 November 2020
Available online 25 November 2020
2352-8648/© 2020 Chongqing University of Posts and Telecommunications. Publishing Services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an
open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
J. Chen et al.                                                                                                    Digital Communications and Networks 7 (2021) 453–460
used to generate features to disable the IDS [8]. A series of solutions have been proposed to mitigate adversarial attacks against ML/DL techniques. To defend against attacks in the training phase, data sanitization [9] was used to identify and remove poisoned data from the training dataset. To defend against attacks in the testing phase, feature selection [10], adversarial training [11], and robust optimization methods [12] have been proposed.

In the network security field, the conflict between attackers and defenders leads to an escalating arms race, where both attacks and defenses continually evolve to achieve their goals and overcome their opponents [13,14]. ML-based and DL-based detectors learn normal or anomalous data features to identify malicious network traffic, so attackers can craft network traffic flows with specific patterns to disable the IDS.

In this paper, we propose a feature generative framework against the IDS. Unlike existing adversarial attacks that carefully craft adversarial perturbations of samples, the proposed method learns the distribution of the normal features and can generate features following that distribution to bypass the IDS. Our purpose is not to promote cyber attacks and crimes, but rather to explore the limits of the IDS and improve the robustness of the detector. Our contributions can be summarized as follows:

• We propose a novel feature generative model, the Anti-Intrusion Detection AutoEncoder (AIDAE), which learns the distribution of normal features and randomly generates features that attackers can use to build real network traffic flows that bypass existing IDSs.
• The multi-channel decoders are separated into continuous and discrete channels to generate continuous and discrete features, respectively. Moreover, the generated features preserve the correlation between the continuous and discrete parts via the same well-trained encoder, so they follow the original distribution of normal features.
• We evaluate the AIDAE on three representative network anomaly detection datasets (NSL-KDD, UNSW-NB15, and CICIDS2017) against ML and DL baseline IDSs. Experimental results show that the features generated by the proposed framework indeed disable the baseline IDSs.

The rest of this paper is organized as follows: Section 2 introduces the related work, and Section 3 presents the attack model. Section 4 describes the proposed AIDAE framework. Section 5 presents the experimental evaluation, Section 6 discusses potential defense mechanisms, and Section 7 concludes the paper. To improve readability, the main acronyms used in this paper are listed in Table 1.

2. Related work

Cybersecurity has attracted widespread attention from research communities, and numerous anomaly detection methods based on ML/DL have been proposed [15]. The ML/DL techniques are strong and effective learning frameworks for complex classification tasks, and these techniques have advanced radically in the past years. However, they are vulnerable to adversarial attacks, where adversaries initiate attacks to compromise the detector [16].

Adversaries can modify input instances to evade detection by the IDS. Evasion attacks can be divided into two categories: 1) problem space attacks, which generate malicious instances for a specific detection system, and 2) feature space attacks, where the attackers manipulate the features used by ML/DL to bypass detection [17]. For problem space attacks, Biggio et al. [18] used a gradient-based approach to systematically assess the security of several classification algorithms against evasion attacks. For feature space attacks, different approaches have been proposed. Yu et al. [19] used a semi-Markov model to simulate four user browsing behavior parameters to initiate attacks. However, the simulated parameters are simple, and second-order statistical metrics can detect this attack.

In computer vision research, the GAN and the AutoEncoder (AE) have shown a powerful ability to generate high-quality fake images and videos. Inspired by their success, some researchers have used generative models to disable the IDS. A self-adapting malware communication framework [20] was proposed, in which a GAN was deployed to generate parameters that the malware used to mimic normal Facebook chat network traffic. However, this generative model can only generate three continuous features and does not separate discrete from continuous features, so it is hard to generate real network traffic from the generated features. Another method, MalGAN [21], used a GAN to generate binary malware features to disable a black-box ML malware detection system, but such binary features can hardly represent complex network traffic features. Additionally, Lin et al. [8] designed IDSGAN to generate features to deceive and evade the IDS. Because this method needs the outputs of the IDS to calculate its loss in each training epoch, the IDS can easily identify this adversarial training pattern and block it. Unlike these evasion attack methods, the proposed AIDAE learns the distribution of the normal features to generate features randomly and does not need feedback from the IDS during training. Moreover, the AIDAE takes the correlation between continuous and discrete features into account in the feature generation process. Therefore, the AIDAE can be used by attackers in real scenarios.

3. Attack model

A network intrusion is an unauthorized penetration of the target network, in which attackers transmit malicious information or misuse network resources. The IDS is a protector of the target network and plays an adversarial game with the intruders. To facilitate understanding of the intrusion attack, it is necessary to disclose the goal, knowledge, and capability of the attacker. Fig. 1 presents the evasion attack scenario, where the network traffic of attacker 1 is identified as anomalous and blocked by the IDS, while attacker 2 bypasses the IDS through a network traffic camouflage technique.

Table 1. Acronyms used in the manuscript.
Attacker's goal. In this attack setting, the attacker's goal is to keep communicating with the target network in such a way that the IDS cannot detect the network traffic carrying malicious content. To achieve this goal, the attacker needs to generate traffic following the distribution of the normal features, which can disable the IDS.

Attacker's knowledge. The attacker's knowledge about the target IDS is vital for launching an evasion attack. According to Kerckhoffs's principle, the attacker knows the details of the IDS [22]. Such knowledge may include the training data, detection algorithm, sample features, detection procedure, and others. In this scenario, the attackers can manipulate their network packets to bypass specific detection algorithms. However, the IDS struggles to protect itself from attacks in the real world, so the attacker cannot know everything about the IDS. In this paper, we assume that the attacker knows the features extracted from the network traffic by the IDS.

Attacker's capability. The attacker's capability is limited to spoofing or disguising the network traffic in the network intrusion scenario, because the attacker has no permission to access or modify the IDS. As encrypted protocols (TLS) are widely used in network transmission, traffic classifiers based on Deep Packet Inspection (DPI) can hardly identify the traffic by the packet payload [23]. Therefore, the attacker can hide malicious information in encrypted traffic and make the flow-level and statistical-level features of the generated traffic obey the normal distribution, thereby initiating attacks.

4. Proposed framework

4.1. Overview of AIDAE

In this section, we present the framework and training procedure of the proposed method. To process discrete data, the Adversarially Regularized AutoEncoder (ARAE) [24] was proposed to learn more robust discrete-space representations. Based on the ARAE, we design the AIDAE with multi-channel decoders, where each discrete decoder represents one discrete feature and the continuous decoder represents all continuous features. According to Ref. [24], the model performance strongly depends on the choice of the prior distribution of the latent space, and the Gaussian distribution N(0, 1) used as the prior may lead to mode collapse in practice. Therefore, we also use a GAN to learn a flexible prior distribution of the latent space in the proposed method. The AIDAE learns the distribution of the traffic flow features and then uses latent space codes to reconstruct, through the multi-channel decoders, features that follow the distribution of the normal features.

The proposed framework is composed of two major parts, the AE and the GAN. The encoder of the AE transforms the traffic features into a latent space code, and the decoders reconstruct the features from the code. The discriminator of the GAN discriminates whether a code is real or fake, and the generator generates a fake code from a random vector input and tries to fool the discriminator. The structure of the AIDAE is presented in Fig. 2. After training, the generated fake codes are sent to the decoders to reconstruct the features. Because the fake code c′ is generated from a random vector z, it is random, and so are the features reconstructed by the decoders.

The network structures of the AIDAE are shown in Table 2, where input_size is the dimension of the features, z_size is the dimension of the code c, d_i represents the dimension of the i-th one-hot encoded discrete feature, con_size represents the dimension of the continuous features, and noise_size is the size of the generator's random input vector.

Table 2. The network structures of AIDAE.

  Encoder:             Linear(input_size, 256), LeakyReLU(), Linear(256, 128), LeakyReLU(),
                       Linear(128, 64), LeakyReLU(), Linear(64, z_size)
  Continuous decoder:  Linear(z_size, 128), LeakyReLU(), Linear(128, 256), LeakyReLU(),
                       Linear(256, 512), LeakyReLU(), Linear(512, 128), LeakyReLU(),
                       Linear(128, con_size), ReLU()
  Discrete decoder:    Linear(z_size, 256), LeakyReLU(), Linear(256, 512), LeakyReLU(),
                       Linear(512, d_i), GumbelSoftmax()
  Generator:           Linear(noise_size, 256), ReLU(), Linear(256, 128), ReLU(),
                       Linear(128, z_size)
  Discriminator:       Linear(z_size, 256), LeakyReLU(), Linear(256, 512), LeakyReLU(),
                       Linear(512, 256), LeakyReLU(), Linear(128, 1), Sigmoid()

4.2. Continuous and discrete features

Features extracted from traffic flows can be categorized into continuous and discrete features. Fig. 3 shows the features of one traffic flow in the NSL-KDD dataset. Numbers such as "507", "437", and "14,421" in black represent the features "duration", "source bytes", and "destination bytes", respectively. The values of these features are continuous, so we consider them continuous features. Numbers in red such as "1", "12", and "10" represent discrete features, whose values lie within certain fixed ranges. For example, "1" represents the feature "protocol type", whose range is the integers 1 to 3, indicating different protocols.

Fig. 3. Features of one traffic flow in the NSL-KDD dataset.

4.3. Training algorithm

As shown in Algorithm 1, there are two steps for training the proposed method. First, we train the encoder, generator, discriminator, and continuous feature decoder. The input of the encoder is all the features (continuous and discrete), so the raw continuous and discrete features can be represented by a latent space code via the encoder. Then, we train the discrete feature decoders.

Algorithm 1. AIDAE Training
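To make the structure concrete, the layer stacks in Table 2 can be sketched in PyTorch (the framework used for the paper's experiments) roughly as follows. This is a minimal sketch, not the authors' code: the GumbelSoftmax wrapper here uses torch.nn.functional.gumbel_softmax, and a Linear(256, 128) step is interpolated in the discriminator so that the dimensions listed in Table 2 chain together.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelSoftmax(nn.Module):
    """Differentiable relaxation of sampling a one-hot vector from logits."""
    def __init__(self, tau: float = 1.0):
        super().__init__()
        self.tau = tau

    def forward(self, logits):
        return F.gumbel_softmax(logits, tau=self.tau)

def make_encoder(input_size: int, z_size: int) -> nn.Sequential:
    """Maps a full feature vector to a latent code c."""
    return nn.Sequential(
        nn.Linear(input_size, 256), nn.LeakyReLU(),
        nn.Linear(256, 128), nn.LeakyReLU(),
        nn.Linear(128, 64), nn.LeakyReLU(),
        nn.Linear(64, z_size),
    )

def make_continuous_decoder(z_size: int, con_size: int) -> nn.Sequential:
    """Reconstructs all continuous features; the final ReLU keeps values non-negative."""
    return nn.Sequential(
        nn.Linear(z_size, 128), nn.LeakyReLU(),
        nn.Linear(128, 256), nn.LeakyReLU(),
        nn.Linear(256, 512), nn.LeakyReLU(),
        nn.Linear(512, 128), nn.LeakyReLU(),
        nn.Linear(128, con_size), nn.ReLU(),
    )

def make_discrete_decoder(z_size: int, d_i: int) -> nn.Sequential:
    """Reconstructs the i-th one-hot encoded discrete feature (dimension d_i)."""
    return nn.Sequential(
        nn.Linear(z_size, 256), nn.LeakyReLU(),
        nn.Linear(256, 512), nn.LeakyReLU(),
        nn.Linear(512, d_i), GumbelSoftmax(),
    )

def make_generator(noise_size: int, z_size: int) -> nn.Sequential:
    """Maps a random noise vector z to a fake latent code c'."""
    return nn.Sequential(
        nn.Linear(noise_size, 256), nn.ReLU(),
        nn.Linear(256, 128), nn.ReLU(),
        nn.Linear(128, z_size),
    )

def make_discriminator(z_size: int) -> nn.Sequential:
    """Scores a latent code as real (from the encoder) or fake (from the generator)."""
    return nn.Sequential(
        nn.Linear(z_size, 256), nn.LeakyReLU(),
        nn.Linear(256, 512), nn.LeakyReLU(),
        nn.Linear(512, 256), nn.LeakyReLU(),
        # Bridging layer assumed: Table 2 jumps from Linear(512, 256) to Linear(128, 1).
        nn.Linear(256, 128), nn.LeakyReLU(),
        nn.Linear(128, 1), nn.Sigmoid(),
    )
```

In this reading, one discrete decoder would be instantiated per discrete feature, and its (near) one-hot outputs concatenated with the continuous decoder's output to rebuild the full feature vector.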
The loss function of the AIDAE is described as follows. Consider F as the set of input features, F = {f_1, f_2, …, f_m}, where m is the number of instances. f⋄_k and f*_k = {f*_{k,1}, f*_{k,2}, …, f*_{k,n}} represent the continuous and discrete features of the k-th instance, respectively, where n is the number of discrete features. E and D are the encoder and the decoder, and φ and θ⋄ denote the parameters of the encoder and the continuous decoder, respectively. P⋄_data represents the distribution of the continuous features. For the continuous features, the Mean Square Error (MSE) loss L⋄ can be written as Eq. (1):

    L⋄(φ, θ⋄) = E_{f⋄ ∼ P⋄_data} ‖ f⋄ − D_{θ⋄}(E_φ(f)) ‖²    (1)

For each discrete decoder, the result can be represented as D_{θ*_i}(E_φ(f)), where θ*_i denotes the parameters of the i-th discrete decoder (i ∈ {1, …, n}). The results of all discrete decoders are concatenated according to their original positions through the function M:

    D_{θ*} = M(D_{θ*_1}(E_φ(f)), D_{θ*_2}(E_φ(f)), …, D_{θ*_n}(E_φ(f)))

Each discrete feature f*_{k,t} can be represented by a one-hot encoded vector x_t. The cross-entropy loss L* of the discrete features, which is minimized to reduce the reconstruction error, is given by Eq. (2):

    L*(φ, θ*) = − Σ_{t=1}^{n} Σ_{j=1}^{d_t} x^j_t log x̂^j_t    (2)

where d_t is the dimension of x_t, and x^j_t and x̂^j_t are the real and generated values of the j-th dimension, respectively.

A GAN is used to learn the flexible prior distribution of the latent space in the AIDAE. c is the code from the encoder, and z is a random vector. The proposed GAN training scheme ensures that the generated code c′ follows the code distribution produced by the encoder. The min-max optimization for the GAN can be written as follows:

    min_G max_Dis E_{c ∼ P_c}[log Dis(c)] + E_{z ∼ P_z}[log(1 − Dis(G(z)))]

5. Evaluation

5.1. Datasets

To evaluate the AIDAE, we conducted experiments on three typical intrusion detection datasets: NSL-KDD, UNSW-NB15, and CICIDS2017. The details of the features of these datasets are shown in Table 3.

NSL-KDD [25] is refined from the KDD99 dataset and is still a benchmark dataset for testing intrusion detection methods. Since the feature "num_outbound_cmds" has only a single unique value, we removed this feature in the experiments. This leaves 1 label, 33 continuous features, and 7 discrete features in the NSL-KDD dataset.

UNSW-NB15 [26] is a well-known network intrusion detection dataset released by the Australian Centre for Cyber Security. UNSW-NB15 provides the training and testing sets in CSV format. There are 1 label, 37 continuous features, and 5 discrete features in the dataset.

CICIDS2017 [27] is another state-of-the-art dataset, released by the Canadian Institute for Cybersecurity in 2017. It consists of benign network events and six up-to-date common attacks, produced over realistic background traffic. We removed the flag features, such as "Fwd PSH Flags" and "Fwd URG Flags". For the discrete feature (destination port), we selected common application port numbers, such as ports 21, 53, and 80, as its range of values. We selected 1 label, 59 continuous features, and 1 discrete feature from the CICIDS2017 dataset.

For all datasets, discrete features are one-hot encoded, and each continuous feature is logarithmically transformed and then scaled by Min-Max normalization (Eq. (3)) to eliminate the impact of the different
value ranges between features:

    x′ = (x − x_min) / (x_max − x_min)    (3)

where x is a feature value, x_min is the minimum logarithm value of this particular feature, x_max is the maximum logarithm value, and x′ is the value after normalization.

Table 3. Feature number.

  Dataset        Continuous Features    Discrete Features
  NSL-KDD        33                     7
  UNSW-NB15      37                     5
  CICIDS2017     59                     1

For each dataset, 100,000 records were randomly selected to create new datasets, composed of a training set D_train and a testing set D_test, to evaluate the proposed model. D_train includes 30,000 normal records and 30,000 anomaly records, and D_test includes 20,000 normal records and 20,000 anomaly records. All experiments were conducted using Python and PyTorch on an RHEL 7.5 server with an Intel Xeon W-2133 3.6 GHz CPU and an Nvidia Quadro P5000 GPU.

5.2. Effectiveness of AIDAE

In this evaluation, we focused on whether the generated features can disable the IDS. To evaluate the evasion ability of the generated features, the Detection Rate (DR) and the Evasion Increase Rate (EIR) [8] were measured, which directly reflect the effectiveness of the proposed feature generative model. The DR is the proportion of correctly detected anomaly features among all detected anomaly features. The EIR reflects the evasion ability by comparing the adversarial DR with the original DR, as formulated in Eq. (4):

    EIR = 1 − (Adversarial DR / Original DR)    (4)

In the training phase, D_train was used to train the intrusion detection classifiers, and the proposed AIDAE was trained only on the normal records in D_train. In the testing phase, the 20,000 anomaly records in D_test were fed to the intrusion detection classifiers to obtain the original DR. Then, we evaluated the DR on 20,000 generated records that were labeled as anomalies. We repeated the experiments five times, randomly reselecting records from the original dataset to create the datasets for each evaluation.

Table 4 shows the DRs on different datasets. The DRs of both the ML and DL baseline IDS methods decrease dramatically. For the NSL-KDD dataset, the DR of CNN+LSTM decreases from 98.71 ± 0.86% to 1.44 ± 0.39%, while LR has the minimum decrease, from around 87.51 ± 3.73% to around 5.53 ± 1.02%. For the UNSW-NB15 dataset, CNN+LSTM has the maximum decrease, from around 98.83 ± 0.61% to around 1.51 ± 0.46%, and k-NN has the minimum decrease, from around 97.14 ± 0.70% to around 7.11 ± 1.38%. For the CICIDS2017 dataset, the maximum decrease occurs for CNN+LSTM, which drops from 99.15 ± 0.06% to 0.94 ± 0.11%, and the minimum decrease appears for LR, which drops from 93.47 ± 2.53% to 3.02 ± 1.28%. The results show that the features generated by the AIDAE successfully disable the baseline IDSs.

Table 4. Detection Rate (%) on different datasets (columns: Methods, NSL-KDD, UNSW-NB15, CICIDS2017).

Fig. 5 shows the evasion increase rates of the baseline algorithms on different datasets. According to Eq. (4), a higher evasion rate indicates that more adversarial examples can evade the IDS. The experimental results show that the EIRs of all baseline detection methods on the three datasets are higher than 0.9, which indicates that the proposed method can generate features to evade the IDS.

Fig. 5. Evasion increase rates of baseline algorithms.

5.3. Performance of AIDAE

The performance of generating features is another important aspect of evaluating the proposed AIDAE. Therefore, we evaluated whether the model can efficiently generate diverse features.

For the feature generation model, greater feature diversity means a larger range of feature values. Because the values of the discrete features come from a fixed set, we only evaluate the diversity of the continuous features. To obtain diverse feature values, we used ReLU as the activation function of the last layer in the continuous feature decoder. The diversity rate F can be formulated as Eq. (5):

    F = (f′_max − f′_min) / (f_max − f_min)    (5)

where f′_max and f′_min are the maximum and minimum values of the generated feature f′, and f_max and f_min are the maximum and minimum values of the original feature f, respectively. We used F to evaluate the proposed model's ability to generate continuous features. If F is greater than 1, the value range of f′ is greater than the value range of f, and vice versa.

Fig. 6 shows the diversity rates of the continuous features generated for the NSL-KDD dataset. Except for feature 13, all diversity rates are higher than 0.95. An uneven value distribution is the reason for the anomalous diversity rate: for feature 13, the number of samples whose original values are higher than f′_max is only 0.65% of the total, so the generated values can cover 99.35% of the training samples'
                                                                                         457
J. Chen et al.                                                                                                         Digital Communications and Networks 7 (2021) 453–460
Fig. 6. Diversity rates of generated continuous features of NSL-KDD. Fig. 7. Diversity rates of generated continuous features of UNSW-NB15.
values. Moreover, some features’ diversity rates are higher than 1, which
means that the AIDAE generates new values for some features. According
to Table 4, the generated features can disable the IDS so that the attackers
can use these new values. Therefore, the proposed generative framework
can learn well the manifolds of continuous features. Figs. 7 and 8 show
the diversity rates of the generated continuous features on the UNSW-
NB15 dataset and the CICIDS2017 dataset, respectively. The diversity
rates of the two experimental results are all higher than 92%, which
means the generated features have a wide range of values, and the pro-
posed method can be used to generate features.
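The diversity-rate check can be reproduced with a few lines of code. This is a minimal sketch that assumes the diversity rate of Eq. (5), which lies outside this excerpt, is the ratio of the generated feature's value range to the original feature's value range (consistent with the "ϝ greater than 1" reading above):

```python
def diversity_rate(original, generated):
    """Ratio of the generated feature's value range to the original
    feature's value range (assumed reading of Eq. (5))."""
    return (max(generated) - min(generated)) / (max(original) - min(original))

# A generated feature covering a wider range than the original gives a rate > 1.
f = [0.1, 0.4, 0.5, 0.9]        # original feature values (range 0.8)
f_prime = [0.0, 0.3, 0.7, 1.0]  # generated feature values (range 1.0)
rate = diversity_rate(f, f_prime)  # 1.0 / 0.8 = 1.25
```

A rate above 1 on this reading corresponds exactly to the case where the AIDAE generates values outside the training range.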
    Since the discrete feature values are drawn from a fixed value set, we measured the distribution similarity between the generated discrete features and the original discrete features to evaluate the performance of the AIDAE. The Jensen-Shannon Divergence (JSD) [28] was used in this evaluation. For each dataset, we randomly sampled 20,000 normal records from Dtrain as Tstandard, and sampled another 20,000 records from the generated records as Tgen. We also sampled 20,000 normal records from Dtest as Treal. We computed the Average Jensen-Shannon Divergence (AJSD) between the discrete features of Tstandard and Tgen, and compared it with the AJSD between Tstandard and Treal. The AJSD is shown in Eq. (6):

$$\mathrm{AJSD} = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{1}{2}\sum_{j=1}^{d_i} p_i(j)\log\frac{p_i(j)}{M_i(j)} + \frac{1}{2}\sum_{j=1}^{d_i} q_i(j)\log\frac{q_i(j)}{M_i(j)}\right] \quad (6)$$

Fig. 8. Diversity rates of generated continuous features of CICIDS2017.
Table 5
Time cost (seconds) on the three datasets.

    Dataset         Training of AIDAE         Generating features
    NSL-KDD         876.64                    3.13
    UNSW-NB15       639.79                    2.76
    CICIDS2017      714.28                    2.92

    To disable the IDS, the attackers can generate traffic flows with malicious payloads from the generated features. Inspired by the patents in Refs. [29,30], we introduce the network traffic generation procedure in this section. We categorized the generated features into flow-level features and statistic-level features. The flow-level features include the packet size, packet time features, protocol, etc. The statistic-level features include the number of connections that contain the same service and the source
address in 100 connections, etc. Note that the generated value of an integer feature needs to be rounded in the generation procedure. The generation procedure is as follows:

1. Determining the protocol and status of each layer;
2. Generating the payload according to the distribution of the payload size;
3. Constructing the header of each layer protocol;
4. Determining the transmission time of packets and performing retransmission operations according to the corresponding features;
5. Modifying the flows to fit the other flow-level features;
6. Modifying the flows to fit the statistic-level features in the generation procedure.

    Although NSL-KDD, UNSW-NB15, and CICIDS2017 contain bidirectional flows, we consider both unidirectional flow and bidirectional flow scenarios. The unidirectional flow can be used in the scenario where the attackers send traffic flows with malicious payloads to the target, and the bidirectional flow can be used in the scenario where the attackers communicate with malicious clients. The time cost and the number of generated packets are presented in Fig. 10 and Fig. 11, respectively.
    It can be seen from the results that the time consumption and the packet number scale linearly as the flow number grows. For the unidirectional case, generating 2,000 unidirectional flows takes 41.76 s with the features generated from the NSL-KDD dataset (25,683 packets), 77.61 s with the features generated from the UNSW-NB15 dataset (44,713 packets), and 52.17 s with the features generated from the CICIDS2017 dataset (26,579 packets). Additionally, generating 2,000 bidirectional flows takes 71.89 s with the generated NSL-KDD features (35,640 packets), 204.96 s with the generated UNSW-NB15 features (95,963 packets), and 159.53 s with the generated CICIDS2017 features (61,278 packets). We only measured the time of constructing packets and sending them through the network interfaces; the packet transportation time and the waiting time for fitting the time feature distribution were not counted.

6. Discussion on defense mechanisms

    Defenses against evasion attacks, which identify and block anomalies to reduce the effects of malicious network communications, are a challenging task in cybersecurity. In this section, we focus on the potential defenses against the proposed method and discuss how our attack evades them.

6.1. Feature selection

    Feature selection is one of the core steps in ML/DL methods; it selects a subset of relevant features to improve the classification performance or reduce the computational complexity [31]. However, if the algorithm designer does not consider the evasion attack, feature selection may degrade the detection performance of the model, because the attackers need to manipulate fewer features to initiate attacks. To tackle this issue, Zhang et al. [10] proposed an adversarial feature selection method against evasion attacks, which uses an optimization criterion to maximize the classifier's generalization capability and its security against evasion. Although it outperforms traditional approaches in classifier security, this feature selection method is not effective against the proposed AIDAE. The AIDAE can generate features that conform to the normal distribution and can maintain the correlation between different features, so the generated feature subsets also follow the distribution of the normal features. Thus, the generated features can evade feature selection-based defenses.

6.2. Adversarial attack detection

    Adversarial attacks seriously compromise the security of ML/DL applications. In the past few years, researchers have given different explanations for adversarial examples. According to Ref. [32], adversarial examples are hard to find because they represent low-probability "pockets" in the example manifold. Moreover, Goodfellow et al. [33] demonstrated that the linear nature of neural networks is the primary cause of their vulnerability to adversarial examples.
    To detect adversarial attacks, substantial approaches have been devised in the recent literature. Kantchelian et al. [11] used a prediction-based algorithm to create adversarial instances and added them to the training data to harden the detection model against evasion attacks. Robust optimization [12] is another solution; it smooths the decision boundaries of the ML algorithm to limit the influence of adversarial samples. Adversarial samples can be identified because they are only similar to, but do not follow, the distribution of the normal samples. Unlike the adversarial-example setting, we trained a generator to learn the manifolds of the normal features and to generate features from which the attackers produce network traffic with malicious payloads. Therefore, adversarial attack detection cannot identify the proposed attacks.

6.3. Payload-based detection

    Payloads are the raw data encapsulated in network frames, such as the content that remains after the IP header and the TCP/UDP header are removed from the datagram structure. Analyzing payloads is an effective method of identifying network anomalies because the malicious payload is an inevitable part of network attack traffic [34]. However, encryption protocols are widely used in real network communication, which makes it hard to inspect the payloads. Although some researchers focus on encrypted traffic classification [35], these methods can only classify network traffic with known patterns and cannot detect 0-day attacks. Consequently, the attackers can use the proposed method to generate network traffic with encrypted payloads to evade payload-based detection.

Fig. 10. Time cost and packet number of the generated unidirectional flow.
Fig. 11. Time cost and packet number of the generated bidirectional flow.
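The six-step generation procedure above can be sketched in code. The sketch below is a hypothetical illustration, not the paper's implementation: the keys of the `features` dict (`protocol`, `packet_count`, `mean_payload_size`, `payload_size_std`, `mean_inter_arrival`) are illustrative stand-ins for the generated feature schema, and only steps 1-4 are modeled:

```python
import random
from dataclasses import dataclass, field

@dataclass
class Flow:
    protocol: str
    packets: list = field(default_factory=list)

def generate_flow(features, seed=0):
    """Sketch of steps 1-4 of the generation procedure; steps 5-6
    would post-process the assembled flow. Feature names are hypothetical."""
    rng = random.Random(seed)
    flow = Flow(protocol=features["protocol"])   # step 1: protocol/status of each layer
    n = round(features["packet_count"])          # integer features are rounded
    t = 0.0
    for i in range(n):
        # step 2: draw a payload size from the generated size distribution
        size = max(0, round(rng.gauss(features["mean_payload_size"],
                                      features["payload_size_std"])))
        # step 3: construct a (much simplified) per-layer header
        header = {"proto": flow.protocol, "seq": i}
        # step 4: schedule the transmission time from the generated time features
        t += features["mean_inter_arrival"]
        flow.packets.append({"header": header, "payload_len": size, "time": t})
    # steps 5-6: fitting the remaining flow-level and statistic-level
    # features would modify the assembled flows here (omitted).
    return flow
```

A real implementation would emit actual packet bytes for each record and send them through a network interface, which is where the construction times reported in Fig. 10 and Fig. 11 come from.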
7. Conclusions

    In this paper, we propose the AIDAE framework against the existing IDSs. Compared with other generation methods, our proposed AIDAE can not only generate features matching the normal feature distribution, but also keep the correlation between the generated continuous and discrete features. The attackers can initiate attacks by using the generated features to generate network traffic flows. Experiments prove that our proposed framework can indeed generate features that disable the baseline IDSs. In our future work, we have a significant interest in defending against this type of attack. Moreover, we will research evasion attacks based on semantic-level information.

Declaration of competing interest

    The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

 [1] N. Sun, J. Zhang, P. Rimba, S. Gao, L.Y. Zhang, Y. Xiang, Data-driven cybersecurity incident prediction: a survey, IEEE Communications Surveys & Tutorials 21 (2) (2018) 1744–1772.
 [2] Q. Liu, P. Li, W. Zhao, W. Cai, S. Yu, V.C. Leung, A survey on security threats and defensive techniques of machine learning: a data driven view, IEEE Access 6 (2018) 12103–12117.
 [3] J. Zhang, Y. Xiang, Y. Wang, W. Zhou, Y. Xiang, Y. Guan, Network traffic classification using correlation information, IEEE Trans. Parallel Distr. Syst. 24 (1) (2012) 104–117.
 [4] I. Corona, G. Giacinto, F. Roli, Adversarial attacks against intrusion detection systems: taxonomy, solutions and open issues, Inf. Sci. 239 (2013) 201–225.
 [5] P. Mishra, V. Varadharajan, U. Tupakula, E.S. Pilli, A detailed investigation and analysis of using machine learning techniques for intrusion detection, IEEE Communications Surveys & Tutorials 21 (1) (2018) 686–728.
 [6] G. Apruzzese, M. Colajanni, L. Ferretti, M. Marchetti, Addressing adversarial attacks against security systems based on machine learning, in: 2019 11th International Conference on Cyber Conflict (CyCon), vol. 900, IEEE, 2019, pp. 1–18.
 [7] X. Chen, C. Li, D. Wang, S. Wen, J. Zhang, S. Nepal, Y. Xiang, K. Ren, Android HIV: a study of repackaging malware for evading machine-learning detection, IEEE Trans. Inf. Forensics Secur. 15 (2019) 987–1001.
 [8] Z. Lin, Y. Shi, Z. Xue, IDSGAN: Generative Adversarial Networks for Attack Generation against Intrusion Detection, arXiv preprint arXiv:1809.02077.
 [9] Y. Cao, J. Yang, Towards making systems forget with machine unlearning, in: 2015 IEEE Symposium on Security and Privacy, IEEE, 2015, pp. 463–480.
[10] F. Zhang, P.P. Chan, B. Biggio, D.S. Yeung, F. Roli, Adversarial feature selection against evasion attacks, IEEE Transactions on Cybernetics 46 (3) (2015) 766–777.
[11] A. Kantchelian, J.D. Tygar, A. Joseph, Evasion and hardening of tree ensemble classifiers, in: International Conference on Machine Learning, 2016, pp. 2387–2396.
[12] P. Russu, A. Demontis, B. Biggio, G. Fumera, F. Roli, Secure kernel machines against evasion attacks, in: Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security, ACM, 2016, pp. 59–69.
[13] L. Chen, Y. Ye, T. Bourlai, Adversarial machine learning in malware detection: arms race between evasion attack and defense, in: 2017 European Intelligence and Security Informatics Conference (EISIC), IEEE, 2017, pp. 99–106.
[14] L. Liu, O. De Vel, Q.-L. Han, J. Zhang, Y. Xiang, Detecting and preventing cyber insider threats: a survey, IEEE Communications Surveys & Tutorials 20 (2) (2018) 1397–1417.
[15] R. Coulter, Q.-L. Han, L. Pan, J. Zhang, Y. Xiang, Data-driven cyber security in perspective: intelligent traffic analysis, IEEE Transactions on Cybernetics 50 (7) (2020) 3081–3093.
[16] A. Chakraborty, M. Alam, V. Dey, A. Chattopadhyay, D. Mukhopadhyay, Adversarial Attacks and Defences: A Survey, arXiv preprint arXiv:1810.00069.
[17] L. Tong, B. Li, C. Hajaj, C. Xiao, N. Zhang, Y. Vorobeychik, Improving robustness of ml classifiers against realizable evasion attacks using conserved features, in: 28th USENIX Security Symposium, vol. 19, USENIX Security, 2019, pp. 285–302.
[18] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, F. Roli, Evasion attacks against machine learning at test time, in: Proc. Conf. ECML-PKDD, Springer, 2013, pp. 387–402.
[19] S. Yu, S. Guo, I. Stojmenovic, Fool me if you can: mimicking attacks and anti-attacks in cyberspace, IEEE Trans. Comput. 64 (1) (2015) 139–151.
[20] M. Rigaki, S. Garcia, Bringing a gan to a knife-fight: adapting malware communication to avoid detection, in: Proc. IEEE Symp. Secur. Priv. Workshops, SPW, 2018, pp. 70–75.
[21] W. Hu, Y. Tan, Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN, arXiv preprint arXiv:1702.05983.
[22] I. Rosenberg, E. Gudes, Bypassing system calls-based intrusion detection systems, Concurrency Comput. Pract. Ex. 29 (16) (2017) e4023.
[23] S. Rezaei, X. Liu, Deep learning for encrypted traffic classification: an overview, IEEE Commun. Mag. 57 (5) (2019) 76–81.
[24] J.J. Zhao, Y. Kim, K. Zhang, A.M. Rush, Y. LeCun, Adversarially regularized autoencoders, in: Proc. Int. Conf. ICML, 2018, pp. 5897–5906.
[25] M. Tavallaee, E. Bagheri, W. Lu, A.A. Ghorbani, A detailed analysis of the KDD CUP 99 data set, in: Proc. IEEE Symp. CISDA, 2009, pp. 1–6.
[26] N. Moustafa, J. Slay, UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set), in: Proc. Conf. MilCIS, 2015, pp. 1–6.
[27] I. Sharafaldin, A.H. Lashkari, A.A. Ghorbani, Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization, 2018.
[28] B. Fuglede, F. Topsoe, Jensen-Shannon divergence and Hilbert space embedding, in: International Symposium on Information Theory, ISIT 2004, Proceedings, IEEE, 2004, p. 31.
[29] H. Cheng, L. Pan, Method and Apparatus for Network Traffic Simulation, Aug. 22, 2017, US9740816B2.
[30] J. Tang, D. Wang, X. Zhou, L. Dong, Q. Yan, X. Zhang, H. Zhi, J. Zhang, Y. Wu, H. Jin, A Method for Generating Network Traffic Data Stream Based on the Feature, Nov. 13, 2018, CN105049277B.
[31] S. Aljawarneh, M. Aldwairi, M.B. Yassein, Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model, Journal of Computational Science 25 (2018) 152–160.
[32] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing Properties of Neural Networks, arXiv preprint arXiv:1312.6199.
[33] I.J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, in: International Conference on Learning Representations, 2015.
[34] H. Liu, B. Lang, M. Liu, H. Yan, CNN and RNN based payload classification methods for attack detection, Knowl. Base Syst. 163 (2019) 332–341.
[35] G. Aceto, D. Ciuonzo, A. Montieri, A. Pescapé, Mobile encrypted traffic classification using deep learning: experimental evaluation, lessons learned, and challenges, IEEE Trans. Netw. Serv. Manag.