A_Study_on_High_Speed_Outlier_Detection
A_Study_on_High_Speed_Outlier_Detection
sciences
Article
A Study on High-Speed Outlier Detection Method of Network
Abnormal Behavior Data Using Heterogeneous
Multiple Classifiers
Jaeik Cho 1, * , Seonghyeon Gong 2 and Ken Choi 1
Abstract: As the complexity and scale of the network environment increase continuously, various
methods to detect attacks and intrusions from network traffic by classifying normal and abnormal
network behaviors show their limitations. The number of network traffic signatures is increasing
exponentially to the extent that semi-realtime detection is not possible. However, machine learning-
based intrusion detection only gives simple guidelines as simple contents of security events. This is
why security data for a specific environment cannot be configured due to data noise, diversification,
and continuous alteration of a system and network environments. Although machine learning is
performed and evaluated using a generalized data set, its performance is expected to be similar in
that specific network environment only. In this study, we propose a high-speed outlier detection
method for a network dataset to customize the dataset in real-time for a continuously changing
network environment. The proposed method uses an ensemble-based noise data filtering model
using the voting results of 6 classifiers (decision tree, random forest, support vector machine, naive
Bayes, k-nearest neighbors, and logistic regression) to reflect the distribution and various environ-
Citation: Cho, J.; Gong, S.; Choi, K. mental characteristics of datasets. Moreover, to prove the performance of the proposed method,
A Study on High-Speed Outlier we experimented with the accuracy of attack detection by gradually reducing the noise data in the
Detection Method of Network time series dataset. As a result of the experiment, the proposed method maintains a training dataset
Abnormal Behavior Data Using of a size capable of semi-real-time learning, which is 10% of the total training dataset, and at the
Heterogeneous Multiple Classifiers.
same time, shows the same level of accuracy as a detection model using a large training dataset. The
Appl. Sci. 2022, 12, 1011. https://
improved research results would be the basis for automatic tuning of network datasets and machine
doi.org/10.3390/app12031011
learning that can be applied to special-purpose environments and devices such as ICS environments.
Academic Editor: Seungmin Rho
Keywords: noise reduction; outlier detection; intrusion detection; machine learning for IDS
Received: 8 November 2021
Accepted: 13 January 2022
Published: 19 January 2022
attack on a specific area using this expanding attack vector as well [5]. Therefore, these
types of attacks are the most urgent threat to modern security systems[6,7].
A common method to respond to sophisticated cyber-attacks is to predict and detect
attacks through machine learning or artificial intelligence inference. Many solutions and
techniques adopt machine learning methodologies [8]. The Intrusion Detection System
(IDS) is the most basic and universal means to protect the network environment. IDS
collects various types of data from network traffic to identify and observe user behavior
and uses it to detect abnormal behavior [9]. However, existing machine learning-based
security systems inevitably have to learn the data, which is directly related to the problems
of the size and complexity of data. An attack on the network essentially requires an
immediate or preemptive response in terms of availability [10]. Nowadays, the computing
environment and the amount of data collected are beyond what can be processed in real-
time. Due to this, the security system cannot keep up with the variability of the attacks,
and the issue of the increase in dwell time, which means the difference between the attack
detection time and the actual attack execution time, continues to arise [11,12]. Therefore,
a flexible response is required to detect and respond to sophisticated cyberattacks quickly.
To this end, the attack detection model requires the ability to reflect the characteristics of
the observed data in real-time. Therefore, the most critical problem is how to quickly and
accurately learn large-scale complex data generated from network traffic [13].
This paper proposes an abnormal data detection framework based on large amounts
of complex network traffic data learning effectively and quickly. The proposed method is
mainly composed of two parts. The first part is an ensemble model-based noise (outlier)
data detection and removal. In this part, we train a large dataset as an ensemble model and
filter out if it is relatively unnecessary for classifiers’ model training stage. This mechanism
lowers the complexity and size of abstracted data by selecting only the optimal data for
attack detection model training. The second part is a real-time behavior trend analysis
through recursive model learning. This allows the attack detection model to be quickly
trained using the previously derived optimal dataset. The optimal state of the dataset for
model training is always maintained by repeatedly performing a series of procedures for
classification, noise removal, and model training. In addition, this method allows the model
to quickly reflect the characteristics of newly observed data (new trend) by recursive model
learning the dataset in real-time. This paper presents the results of anomaly detection
experiments on network traffic datasets in general IT environments to verify the method’s
performance and effectiveness for deriving the optimal dataset. This experiment constructs
an optimal dataset in real-time through noise detection and recursive learning through the
proposed method.
The structure of this paper is as follows. Section 2 of this paper introduces related
works to network attack detection, anomaly detection, and noise detection techniques.
Section 3 describes the outline and detailed functional and structural elements of the
network data purification and attack detection framework proposed in this paper. Section 4
describe how to implement the proposed framework. Section 5 describes the experiments
to evaluate the performance of the proposed framework and the results and concludes in
Section 6.
2. Related Work
This section describes some related researches on machine learning-based attack
detection techniques and noise reduction techniques for network traffic data.
2.1. Machine Learning Based Anomaly Detection and Data Noise Reduction for Network
Traffic Dataset
Sharafaldin et al. [14] proposed a methodology to effectively generate a dataset op-
timized for learning IDS in a network environment where noise data exist. This research
proposed data set creation techniques that can effectively reflect the characteristics of each
Appl. Sci. 2022, 12, 1011 3 of 17
attack type for various types of attacks. The dataset created as a result of this research was
published as the CIC-IDS dataset [15].
Anderson et al. [16] researched how to mitigate the adverse effects of mislabeled data
on an attack detection model in the process of analyzing malicious behavior from encrypted
network traffic. This method constructs a model that enhances resistance to noise data
based on data distribution in an environment where the information on network traffic
data is limited.
Yu et al. [17] analyzed the influence of noise data on unbalanced datasets in network
traffic datasets. In addition, this research proposed a method of minimizing the influence
of noise and deriving high performance by appropriately refining data from an unbalanced
dataset using the methodology of Few-shot learning. Ahmed et al. [18] proposed filtering
normal data using AutoEncoder to detect and remove noise data present in a dataset in an
industrial control environment. This study detects noise data and intentionally changes
it into a shape similar to normal data, thereby minimizing data loss and reducing the
influence of noise data.
should use the appropriate kernel and parameters to construct the decision boundary. As
parameters, the margin for error tolerance, flexibility (gamma) for decision boundaries,
and kernel function should be appropriately selected considering the complexity and scale
of network data. For the kernel function (k( x, xi ), for arbitrary data x and support vec-
tor xi , linear, polynomial(( x · xi + 1)d ), Gaussian radial basis function(exp(−γ|| x − xi ||2 )),
sigmoid(tanh(αx T y + c)) are frequently used.
Figure 1. The overall process of the network attack detection framework through the proposed
high-speed noise detection and data purification methodology.
3.1.2. Online Data Collection and Attack Detection Using Ensemble Model
Detection of abnormal behavior on the network needs to be performed using contin-
uously collected traffic data. This step performs attack detection in semi real-time using
continuously observed traffic data. First, the newly observed network traffic data is classi-
fied through an initially learned model in real-time. At the same time, newly observed data
are divided into units of a specific size and used in the next step for noise reduction, data
trend analysis, and model re-learning. The first phase’s two steps can be seen as a general
attack detection technique using the classification model. The proposed method allows
various classification models to be flexibly applied depending on the purpose by con-
structing detecting attacks with multiple classification models. In addition, the first phase
is performed independently of the second step of performing noise reduction, and this
configuration improves the compatibility of the entire system [30].
an optimal training dataset using filtered data and retrains the attack classification model
to reflect the data trend effectively.
4. Implementation
This section describes the detailed structure and mechanisms for implementing the
proposed method. To implement the proposed methodology, the composition of the
ensemble model and the dataset purification mechanism that filters pure data and reduces
noise data must be appropriately organized. The following Figure 2 is a detailed flowchart
of the operation procedure of each element of the proposed methodology. The following
subsections describe the implementation method for each element.
different performance depending on the data distribution and their features. Combining
these models allows consistent performance even for datasets with various distributions
and characteristics. This paper constructs a hard-voting ensemble model using the above
six classifier models. The hard-voting method in the ensemble model filters only the results
of the unanimous classification of the six internal models. This ensemble model selects
only data classified as an attack (6 voted) or normal (0 voted) by all internal models, and all
remaining data (1 to 5 voted) is classified as noise data and removed. This extreme filtering
has the purest form of the decision boundary.
Figure 2. Detailed procedure flow chart for the proposed network attack detection and noise
data removal.
The proposed method uses the unanimity threshold u for the sum of the classification
results of the ensemble model. If the sum of classification results is higher than this
threshold, the data is judged as pure data. These pure data are used for retraining the
next model, and classification results lower than this threshold are excluded from the
retraining process. At the same time, the unanimity threshold is a factor that determines
how sensitively the classification model reflects the trend of newly observed data. If the
unanimity threshold is high, the classification model conservatively reflects the trend,
and if the unanimity threshold is low, the model actively reflects the trend of new data.
This threshold may be applied differently depending on the environment of the network.
In an environment where the pattern of network traffic changes frequently and a flexible
decision boundary is required, it can respond by setting the unanimity threshold low.
On the other hand, in a static environment where almost the same pattern is repeated,
such as ICS, it is possible to respond sensitively to abnormal behavior by applying a high
unanimity threshold.
Appl. Sci. 2022, 12, 1011 8 of 17
The dataset purification mechanism selects data for retraining and merges them with
the existing training dataset. This new dataset is used to update the classification model
temporarily. Then, the updated model performs recursive classification on the dataset used
to train the model. After that, for the recursive classification results, data to be included in
the final training dataset is selected based on the sum of the classification result values of
the sub-models of the classification model.
Dataset purification mechanism removes new noise data in the classification models
reflecting new data trends. This process can adjust the size of the training dataset by
tuning the ratio of increasing new data to decreasing noise data. In this process, the purity
threshold p is used. Like the unanimity threshold, this threshold refers to the reflection rate
of trends and determines the rate of change in the size of the training dataset. The purity
threshold is inversely proportional to the growth rate of the size of the training dataset.
If the purity threshold is high, the size of the training dataset is maintained or gradually
decreased. If the purity threshold value is low, the size of the training dataset continues to
increase in proportion to the new data. Therefore, the purity threshold should be adjusted
appropriately by considering all factors such as the amount of data observed in the network,
the performance of the classification model, and the time used for learning.
The training dataset P finally derived by Algorithm 1 is used to train the model again
so that the trend of the new observation data N is reflected in the training dataset for the
next step.
This experiment was performed in Ubuntu 20.04.3 LTS operating system environment
with Intel i7-9700KF (3.60 GHz) CPU, 64 GB of RAM, and GPU of Geforce RTX 2080 SUPER,
with Jupyter Lab version 3.1.4.
5.3. Parameters Used in Ensemble Model for Attack Detection and Noise Reduction
This section describes the ensemble model used to perform the proposed methodol-
ogy’s noise reduction and attack detection. In this experiment, a voting-based ensemble
model was constructed using the six models (Decision Tree, Random Forest, SVM, Naive-
Bayes, K-NN, Logistic Regression) described in Section 2.2. The constructed model was
Appl. Sci. 2022, 12, 1011 10 of 17
used for both network attack detection and noise reduction. The following is a description
of the parameters used in each internal model.
5.3.3. SVM
We assume that there is noise in the dataset in our experiments, which we mean here
includes data that has a low impact on constructing the decision boundary of the overall
model. Therefore, when the hyperplane of the SVM is determined, we set the regularization
parameter to 0.8, lower than the default value of 1.0, so that the entire model can form a
flexible decision boundary by forming a sufficient margin for the noise data. Also, due to
the complexity of the dataset, we use a linear kernel for the availability of classification.
5.3.4. Naive-Bayes
On the dataset used in this experiment, all 77 features had real type values, and no
unusual distribution was identified in the dataset. Thus, we constructed an ensemble
model using the Gaussian Naive-Bayes model to enhance the coverage of natural events.
5.3.5. K-NN
Although the K-NN model can perform classification most intuitively, its computa-
tional complexity is very high due to the complexity of features and the number of entries
in a given dataset. Therefore, we set a lower criterion of 3 than the default value of 5, which
reasonably compromises computational performance. Also, these shallow criteria make
the model more responsive to rapidly changing data patterns.
used as new data by dividing the data by a specific time unit. Experiments were performed
to compare the performance of the proposed method with the ideal machine learning
detection method. In the ideal experiment to compare the performance of the proposed
method, a greedy method was used in which all observed data up to a specific time point
were used for training. The new observation data were sequentially divided into 100 folds
for the experiment, with 2057 data observed per particular unit time. An experiment on the
accuracy of attack detection and classification using the proposed method was performed
by reflecting new observation data in the existing training data and updating the training
dataset at every point in time.
The following Figures 3–10 are graphs showing the experimental and control groups’
attack detection performance and learning time. Figures 3, 5, 7 and 9 are graphs of the classi-
fication accuracy of the experimental group and the control group, and Figures 4, 6, 8 and 10
show the learning time of the experimental group and the control group. In each graph,
the graph of the experimental group is presented by an orange line, and a blue line indicates
the graph of the control group. In addition, in the classification accuracy graph, the label
of the test set is indicated by a green line, a value of 1 means attack data, and a value of
0 means normal data.
In the attack detection accuracy graph in Figure 3, the ideal approach shows higher
classification performance than the proposed method. This result is predictable because the
ideal approach used more training data. However, in the classification accuracy after the
86th fold, where the last attack pattern was observed, it can be confirmed that the accuracy
in the ideal approach is consistently low even though only normal data is observed. This
is because the data observed in the past data patterns acted as noise data in subsequent
situations so that the decision boundary of the entire model was improperly formed. On the
other hand, the classification accuracy of the proposed method shows higher accuracy than
the ideal approach in the time after the 90th fold, even though it showed slightly lower
accuracy in previous periods. This is because, among the data observed in the past, noise
data in the currently observed data is removed correctly so that the training dataset is
appropriately configured. Also, in Figure 4, it can be seen that the proposed method shows
a constant and much faster learning time compared to the ideal approach. These results
mean that the model can be trained much more effectively than learning the entire data
with only 20,000 data of an appropriately organized dataset.
In Figure 3, the average difference between the two observed accuracies in the entire
period of 100 folds was 0.167314%. In the section after the last 90 folds, the comparison
group method showed an average attack detection accuracy of 98.89%. The proposed
method showed an average attack detection accuracy of 99.34%.
Figures 5 and 6 are the results of experiments performed after reducing attack data
to 20% to reflect the realistic attack environment and data distribution. When the ratio of
attack data is reduced, new data’s degree of trend change also decreases. Because of this,
the ideal approach of learning large amounts of data shows better classification accuracy.
On the other hand, in the accuracy graph of Figure 5, the proposed method shows low
accuracy in the first 0–20 fold periods. This result means that the number of data was
insufficient to reflect the pattern of new data. However, in the data after the 20th fold,
the classification accuracies of the proposed method and the ideal approach show similar
performances. This means that a sufficient number of pure data to learn the model’s
decision boundary effectively has been organized. In other words, this experiment shows
that the proposed method can effectively train a classification model even with a small
amount of data if repeated learning is performed for a sufficiently long time. As a result of
these, in Figures 4 and 6, the proposed method shows a more effective learning time than
the ideal approach.
Appl. Sci. 2022, 12, 1011 12 of 17
Figure 3. Attack classification accuracy of the proposed method and the ideal method when the size
of the pure dataset is 20,000.
Figure 4. Training time per unit data input of the proposed method and the ideal method when the
size of the pure dataset is 20,000.
Figure 5. Attack classification accuracy of the proposed and ideal methods when the size of the pure
dataset is 20,000 and the proportion of attack data is reduced.
Appl. Sci. 2022, 12, 1011 13 of 17
Figure 6. Training time per unit data input of the proposed method and the ideal method when the
size of the pure dataset is 20,000 and the proportion of attack data is reduced.
Figures 7 and 8 are the attack detection results for a 40,000-scale training dataset
generated by the proposed method, and Figures 9 and 10 are experiments that reduced the
proportion of attack data to 20%. In the attack classification accuracy graph of Figure 7,
where the change in data trends is significant, the difference in attack classification per-
formance between the ideal approach and the proposed method is not significant. These
results imply that it may not be appropriate to operate a training set of 40,000 for a given
dataset. Similarly, in Figure 9, where the change in the data trend is small, the proposed
method shows similar attack classification accuracy to the ideal approach up to about
50th fold intervals. However, the proposed method shows higher accuracy than the ideal
approach after the 75th fold, where the last attack data is observed. This means that the
proposed method eliminated noise data properly while at the same time enabling more
accurate classification than the ideal approach in the period where the data pattern changes
from attack to normal. In addition, the average classification accuracy measured by the
method proposed in Figures 7 and 9 is higher than the accuracy of Figures 3 and 5. These
results imply that considering the feature and distribution of each data in a given dataset,
and itis more appropriate to operate 40,000 training sets than 20,000 training sets. As a
result, if the learning dataset’s size can be adequately set considering the number of newly
observed data and the number of data that can be digested every unit training time interval,
the proposed method shows higher attack detection performance than simply learning
large amounts of data. Figures 7 and 8 show the experimental results of setting the size
of the training dataset generated by the proposed method to 40,000. Figures 9 and 10
show the experiment results in which the ratio of attack data was reduced to the level of
20%. In the attack classification accuracy graph of Figure 7, where the data trend changes,
the ideal method has slightly higher overall detection performance, and the proposed
method generates fewer false alarms in the last section by removing the noise data. In this
experiment, the difference in accuracy between the proposed method and the ideal method
is not significant in all folds. This result means that as the size of the training dataset of the
proposed method increases, the two methodologies converge in the direction of deriving
similar results. In Figure 9, where the change in data trend is small, the proposed method
shows attack classification accuracy similar to the ideal method up to about 50 folds. How-
ever, after the 75th fold, where the last attack data is observed, the proposed method shows
higher accuracy than the ideal method. This means that the proposed method reduces the
noise data better than previous experiments, and it means that more accurate classification
is possible even in the period where the data pattern changes in real-time. In other words,
this result shows that it is more effective to construct a pure training dataset of 40,000
considering the size of the entire dataset, the size of the initial training data, and the degree
of the data trend change. Therefore, if the number of newly observed data and the number
of data covered within every unit time are adequately considered, the proposed method
shows higher attack detection performance than simply learning a large amount of data.
Appl. Sci. 2022, 12, 1011 14 of 17
Figure 7. Attack classification accuracy of the proposed method and the ideal method when the size
of the pure dataset is 40,000.
Figure 8. Training time per unit data input of the proposed method and the ideal method when the
size of the pure dataset is 40,000.
Figure 9. Attack classification accuracy of the proposed and ideal methods when the size of the pure
dataset is 40,000 and the proportion of attack data is reduced.
Appl. Sci. 2022, 12, 1011 15 of 17
Figure 10. Training time per unit data input of the proposed method and the ideal method when the
size of the pure dataset is 40,000 and the proportion of attack data is reduced.
6. Conclusions
This paper proposed a method of efficiently learning attack detection and classification
models through ensemble-based noise reduction mechanisms using various classification
models. The proposed method derived higher classification accuracy than simply using a
large amount of data for learning by selecting data that could purify the decision bound-
ary of the ensemble model by synthesizing the classification results of various models.
In addition, the model can quickly reflect trends of new data by rapidly detecting and
removing noise data generated by changes in data trends. This approach allows the pro-
posed method to effectively detect abnormal behavior observed in the network in that it can
save system resources while at the same time performing real-time model modifications
using high-quality small datasets. In addition, the proposed methodology constructed a
flexible method to which various classification models could be applied by separating the
noise reduction procedure from the process of the general classification model. This has
the advantage of being used not only to network data but also to classification problems
with different distributions and characteristics of datasets. Thus, the proposed method is a
Appl. Sci. 2022, 12, 1011 16 of 17
classification approach that can be applied in all cases where the prediction of changes in
data characteristics is required in an environment where data is constantly observed. Of
course, the proposed method still has limitations and blind spots on the adaptability and
robustness of the model and the process of finding appropriate parameters. These issues
require a fundamental consideration of the distribution and characteristics of the data.
Therefore, we will continue to develop the proposed methodology to determine the optimal
training dataset size and appropriate new data observation frequency in future studies.
Author Contributions: Conceptualization, J.C.; methodology, J.C. and S.G.; software, S.G.; validation,
J.C. and S.G.; formal analysis, J.C.; investigation, J.C. and S.G.; resources, S.G.; data curation, S.G.;
writing—original draft preparation, J.C.; writing—review and editing, J.C. and S.G.; visualization,
S.G.; supervision, K.C.; project administration, K.C.; funding acquisition, K.C. and J.C. All authors
have read and agreed to the published version of the manuscript.
Funding: This work is supported by the Industrial Core Technology Development Program of
MOTIE/KEIT, KOREA. [#10083639, Development of Camera-based Real-time Artificial Intelligence
System for Detecting Driving Environment & Recognizing Objects on Road Simultaneously].
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The link to the data used in this research is described in the Reference
Section ([15]).
Acknowledgments: We thank our colleagues from KETI and KEIT who provided insight and exper-
tise that greatly assisted the research and greatly improved the manuscript.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
References
1. Berman, D.S.; Buczak, A.L.; Chavis, J.S.; Corbett, C.L. A survey of deep learning methods for cyber security. Information 2019,
10, 122. [CrossRef]
2. Chaabouni, N.; Mosbah, M.; Zemmari, A.; Sauvignac, C.; Faruki, P. Network intrusion detection for IoT security based on
learning techniques. IEEE Commun. Surv. Tutor. 2019, 21, 2671–2701. [CrossRef]
3. Lu, Y.; Da Xu, L. Internet of Things (IoT) cybersecurity research: A review of current research topics. IEEE Internet Things J. 2018,
6, 2103–2115. [CrossRef]
4. Tomić, I.; McCann, J.A. A survey of potential security issues in existing wireless sensor network protocols. IEEE Internet Things J.
2017, 4, 1910–1923. [CrossRef]
5. Gong, S.; Cho, J.; Lee, C. A reliability comparison method for OSINT validity analysis. IEEE Trans. Ind. Inform. 2018, 14, 5428–5435.
[CrossRef]
6. Coulter, R.; Han, Q.L.; Pan, L.; Zhang, J.; Xiang, Y. Data-driven cyber security in perspective—Intelligent traffic analysis. IEEE
Trans. Cybern. 2019, 50, 3081–3093. [CrossRef] [PubMed]
7. Xiong, X.L.; Yang, L.; Zhao, G.S. Effectiveness evaluation model of moving target defense based on system attack surface. IEEE
Access 2019, 7, 9998–10014. [CrossRef]
8. Vinayakumar, R.; Soman, K.; Poornachandran, P. Evaluation of recurrent neural network and its variants for intrusion detection
system (IDS). Int. J. Inf. Syst. Model. Des. (IJISMD) 2017, 8, 43–63. [CrossRef]
9. Borkar, A.; Donode, A.; Kumari, A. A survey on Intrusion Detection System (IDS) and Internal Intrusion Detection and protection
system (IIDPS). In Proceedings of the 2017 International conference on inventive computing and informatics (ICICI), Coimbatore,
India, 23–24 November 2017; pp. 949–953.
Appl. Sci. 2022, 12, 1011 17 of 17
10. Gopalakrishnan, T.; Ruby, D.; Al-Turjman, F.; Gupta, D.; Pustokhina, I.V.; Pustokhin, D.A.; Shankar, K. Deep learning enabled data
offloading with cyber attack detection model in mobile edge computing systems. IEEE Access 2020, 8, 185938–185949. [CrossRef]
11. Patel, A.; Roy, S.; Baldi, S. Wide-Area Damping Control Resilience towards Cyber-Attacks: A Dynamic Loop Approach. IEEE
Trans. Smart Grid 2021, 12, 3438–3447. [CrossRef]
12. Nilă, C.; Apostol, I.; Patriciu, V. Machine learning approach to quick incident response. In Proceedings of the 2020 13th
International Conference on Communications (COMM), Bucharest, Romania, 18–20 June 2020; pp. 291–296.
13. Cybenko, G.; Raz, G.M. Large-scale analogue measurements and analysis for cyber-security. In Data Science For Cyber-Security;
World Scientific: Singapore, 2019; pp. 227–250.
14. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic
characterization. ICISSp 2018, 1, 108–116.
15. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. A detailed analysis of the cicids2017 data set. In International Conference
on Information Systems Security and Privacy; Springer: Berlin/Heidelberg, Germany, 2018; pp. 172–188. Available online:
https://www.unb.ca/cic/datasets/ids-2017.html (accessed on 1 December 2021).
16. Anderson, B.; McGrew, D. Machine learning for encrypted malware traffic classification: Accounting for noisy labels and
non-stationarity. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
Halifax, NS, USA, 13–17 August 2017; pp. 1723–1732.
17. Yu, Y.; Bian, N. An intrusion detection method using few-shot learning. IEEE Access 2020, 8, 49730–49740. [CrossRef]
18. Ahmed, S.; Lee, Y.; Hyun, S.H.; Koo, I. Mitigating the impacts of covert cyber attacks in smart grids via reconstruction of
measurement data utilizing deep denoising autoencoders. Energies 2019, 12, 3091. [CrossRef]
19. Al-Abassi, A.; Karimipour, H.; Dehghantanha, A.; Parizi, R.M. An ensemble deep learning-based cyber-attack detection in
industrial control system. IEEE Access 2020, 8, 83965–83973. [CrossRef]
20. Jiang, W.; Chen, Z.; Xiang, Y.; Shao, D.; Ma, L.; Zhang, J. SSEM: A novel self-adaptive stacking ensemble model for classification.
IEEE Access 2019, 7, 120337–120349. [CrossRef]
21. Nancy, P.; Muthurajkumar, S.; Ganapathy, S.; Kumar, S.S.; Selvi, M.; Arputharaj, K. Intrusion detection using dynamic feature
selection and fuzzy temporal decision tree classification for wireless sensor networks. IET Commun. 2020, 14, 888–895. [CrossRef]
22. Li, L.; Yu, Y.; Bai, S.; Cheng, J.; Chen, X. Towards effective network intrusion detection: A hybrid model integrating gini index
and GBDT with PSO. J. Sens. 2018, 2018, 1578314. [CrossRef]
23. Oliveira, N.; Praça, I.; Maia, E.; Sousa, O. Intelligent cyber attack detection and classification for network-based intrusion
detection systems. Appl. Sci. 2021, 11, 1674. [CrossRef]
24. Ye, J.; Cheng, X.; Zhu, J.; Feng, L.; Song, L. A DDoS attack detection method based on SVM in software defined network. Secur.
Commun. Netw. 2018, 2018, 9804061. [CrossRef]
25. Sahoo, K.S.; Tripathy, B.K.; Naik, K.; Ramasubbareddy, S.; Balusamy, B.; Khari, M.; Burgos, D. An evolutionary SVM model for
DDOS attack detection in software defined networks. IEEE Access 2020, 8, 132502–132513. [CrossRef]
26. Kachavimath, A.V.; Nazare, S.V.; Akki, S.S. Distributed Denial of Service Attack Detection using Naïve Bayes and K-Nearest
Neighbor for Network Forensics. In Proceedings of the 2020 2nd International Conference on Innovative Mechanisms for Industry
Applications (ICIMIA), Bangalore, India, 5–7 March 2020; pp. 711–717.
27. Tuan, T.A.; Long, H.V.; Son, L.H.; Kumar, R.; Priyadarshini, I.; Son, N.T.K. Performance evaluation of Botnet DDoS attack
detection using machine learning. Evol. Intell. 2020, 13, 283–294. [CrossRef]
28. Gu, P.; Khatoun, R.; Begriche, Y.; Serhrouchni, A. k-Nearest Neighbours classification based Sybil attack detection in Vehicular
networks. In Proceedings of the 2017 Third International Conference on Mobile and Secure Services (MobiSecServ), Miami Beach,
FL, USA, 11–12 February 2017; pp. 1–6.
29. Besharati, E.; Naderan, M.; Namjoo, E. LR-HIDS: Logistic regression host-based intrusion detection system for cloud environments.
J. Ambient Intell. Humaniz. Comput. 2019, 10, 3669–3692. [CrossRef]
30. Iadarola, G.; Martinelli, F.; Mercaldo, F.; Santone, A. Towards an interpretable deep learning model for mobile malware detection
and family identification. Comput. Secur. 2021, 105, 102198. [CrossRef]