ML Techniques for Intrusion Detection
Manuscript received June 21, 2017; revised November 27, 2017 and April 2, 2018; accepted May 22, 2018. Date of publication June 15, 2018; date of current version February 22, 2019. (Corresponding author: Emmanuel S. Pilli.)
P. Mishra was with MNIT, Jaipur 302017, India. She is now with the Department of Computer Science and Engineering, Graphic Era (Deemed) University, Dehradun 248002, India (e-mail: [Link]@[Link]).
V. Varadharajan and U. Tupakula are with the Faculty of Engineering and Built Environment and Advanced Cyber Security Research Centre, University of Newcastle, Callaghan, NSW 2308, Australia (e-mail: [Link]@[Link]; [Link]@[Link]).
E. S. Pilli is with the Department of Computer Science and Engineering, Malaviya National Institute of Technology, Jaipur 302017, India (e-mail: [Link]@[Link]).
Digital Object Identifier 10.1109/COMST.2018.2847722
1553-877X © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See [Link] for more information.

Abstract—Intrusion detection is one of the important security problems in today’s cyber world. A significant number of techniques have been developed which are based on machine learning approaches. However, they are not very successful in identifying all types of intrusions. In this paper, a detailed investigation and analysis of various machine learning techniques has been carried out to find the cause of the problems these techniques face in detecting intrusive activities. Attack classification and a mapping of the attack features are provided for each attack. Issues related to detecting low-frequency attacks using network attack datasets are also discussed, and viable methods are suggested for improvement. Machine learning techniques have been analyzed and compared in terms of their capability for detecting the various categories of attacks. Limitations associated with each category of techniques are also discussed. Various data mining tools for machine learning have also been included in the paper. At the end, future directions are provided for attack detection using machine learning techniques.

Index Terms—Machine learning, intrusion, attacks, security.

I. INTRODUCTION

HACKING incidents are increasing day by day as technology rolls out. A large number of hacking incidents are reported by companies each year. A Distributed Denial of Service (DDoS) attack was launched against Estonian websites in 2007, allegedly by Russia [1]. On June 17, 2008, Amazon [2] started receiving authenticated requests from multiple users at one of its locations. The requests began to increase significantly, causing the servers to slow down. In January 2013, the European Network and Information Security Agency (ENISA) [3] reported that Dropbox was attacked by DDoS and suffered a substantial loss of service for more than 15 hours, affecting all users across the globe. Facebook [4] was hit by a suspected distributed denial of service attack on Sept 28, 2014. Panjwani et al. [5] reported that some form of network scanning activity precedes 50% of the attacks against cyber systems. Attackers are not only launching flooding and probing attacks but also spreading malware files in the form of viruses, worms and spam to exploit the vulnerabilities present in existing software, causing a threat to the sensitive information of users stored on machines. The Cisco Annual Security report mentioned [6] that spam related to the Boston Marathon bombing comprised 40% of all spam messages delivered worldwide on April 17, 2013. In a recent survey done by Cisco in 2017 [7], Trojan was classified as one of the top five malware which is used to gain initial access to users’ computers and organizational networks. Hence, security in such a complex technological environment is a big challenge and needs to be tackled intelligently.

Researchers have considered different categories of attacks for intrusion detection. For example, Denial of Service (DoS) attacks (Bandwidth and Resource Depletion), Scanning attacks (Probe), Remote to Local (R2L) attacks and User to Root (U2R) attacks are based on the KDD’99 dataset [12]. A recent attack dataset (UNSW-NB [13]) classifies attacks into nine categories: Fuzzer, Analysis, Reconnaissance, ShellCode, Worm, Generic, DoS, Exploit and Backdoor. All these attacks have been discussed in detail in Section III.

Current security solutions include the use of middle-boxes such as Firewall, Antivirus and Intrusion Detection Systems (IDS). A firewall controls traffic that enters or leaves a network based on source or destination address. It alters the traffic according to the firewall rules. Firewalls are also limited by the amount of state available and their knowledge of the hosts receiving the content. An IDS is a type of security tool that monitors network traffic, scans the system for suspicious activities and alerts the system or network administrator [14]. It is the main focus of concern in this paper.

IDSes are mainly of two types: Host based and Network based. A Host based Intrusion Detection System (HIDS) [15] monitors an individual host or device and sends alerts to the user if suspicious activities such as modifying or deleting a system file, an unwanted sequence of system calls or unwanted configuration changes are detected. A Network based Intrusion Detection System (NIDS) [16] is usually placed at network points such as gateways and routers to check for intrusions in the network traffic.
MISHRA et al.: DETAILED INVESTIGATION AND ANALYSIS OF USING MACHINE LEARNING TECHNIQUES FOR INTRUSION DETECTION 687
TABLE I
DIFFERENCE BETWEEN MISUSE DETECTION AND ANOMALY DETECTION
At a high level, the detection mechanisms used by these IDSes are of three types: misuse detection, anomaly detection, and hybrid detection. In the misuse detection approach, the IDS maintains a knowledge base (a set of rules) for detecting the known attack types. Misuse detection techniques can be broadly classified into knowledge based and machine learning based techniques. In the knowledge based technique, network traffic or host audit data (such as system call traces) are compared against predefined rules or attack patterns. Knowledge based techniques can be categorized into three types: (i) Signature matching (ii) State transition analysis and (iii) Rule based expert systems [17].

Signature matching based misuse detection techniques scan the incoming packets against fixed patterns. If any of the patterns match with the packet header, the packet is flagged as anomalous. State transition analysis based approaches maintain a state transition model of the system for the known suspicious patterns. Different branches of the model lead to a final compromised state of the machine. The rule based expert systems maintain a database of rules for different intrusive scenarios. The knowledge based IDS requires regular maintenance of the knowledge database in a dynamic manner and can fail to detect variants of attacks. Misuse detection can also be performed using supervised machine learning algorithms such as Back Propagation Artificial Neural Network (BP-ANN) [18], Decision Tree (DT) C4.5 [19] and Multi-class Support Vector Machine (SVM) [20].

Machine learning based IDS provides a learning based system to discover classes of attacks based on learned normal and attack behavior. The goal of machine learning based IDS (based on supervised learning algorithms) is to generate a general representation of known attacks. Misuse detection techniques fail to detect unknown attacks. However, these techniques provide good detection accuracy for well-known attacks. These types of IDSes also require regular maintenance of the signature database, which increases the overhead of the user.

Misuse detection based IDSes, particularly signature based ones, are very popular and have achieved commercial success. The pros and cons associated with these approaches are shown in Table I. These IDSes maintain a database of known attack signatures. An attack signature describes the characteristics of an attack. It can be in the form of a code script, a sequence of system call patterns or a behavioral profile, etc. The IDS stores the attack signatures in a certain format. Let us consider the TCP-ping attack for illustrating the signature based misuse detection system (particularly SNORT [8]). If an attacker wants to know whether a machine is active, he/she scans the machine, typically by sending ICMP ping packets. If the machine is configured not to respond to ICMP ECHO REQUEST ping packets, the attacker may use the nmap tool to send TCP ping packets to port 80 with the ACK flag set and acknowledgment number 0. The characteristic of this attack is that the flag is set to ‘A’ and the acknowledgment number is set to 0 [21]. As such packets are not acceptable at the victim side, on receiving the packets an RST packet is sent to the attacker’s machine, which signals that the machine is alive. The rule for detecting a TCP ping attack targeted against a victim machine residing in the network with IP [Link]/24 is as follows:

    alert tcp any any -> [Link]/24 any (flags: A; ack: 0; msg: “TCP ping detected”;)

The major limitation of signature based IDS is that it requires regular updates of the system for adding signature rules for up-to-date attacks. It generates more false alarms for the new evolving attacks whose signatures are not defined. Later, anomaly detection approaches are used for detecting intrusions.

Anomaly detection based IDSes are based on the hypothesis that an attacker’s behavior differs from a normal user’s behavior [22]. This helps in detecting the evolving attacks. Anomaly based IDSes model the normal behavior of the system and keep on updating it over a duration of time. For example, each network connection is identified by a set of features such as protocol, service, number of login attempts, packets per flow, bytes per flow, source address, destination address, source port, destination port, etc. The behavioral statistics of these features are recorded over a period. Any abnormal deviation in the feature values for any connection flow will be marked as anomalous by the anomaly detection engine. Anomaly detection techniques are widely categorized into three types: Statistical techniques, Machine learning based techniques and Finite state machine (FSM) based techniques [23]. A finite state machine (FSM) produces a behavioral model which is composed of states, transitions, and actions. Kumar et al. [24] have proposed an IDS which makes use of a Hidden Markov Model to model the transitions of user behavior over a longer span of time. Anomaly detection can also be performed using semi-supervised and unsupervised
688 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 21, NO. 1, FIRST QUARTER 2019
machine learning algorithms such as Self Organizing Map (SOM) Neural Network [25], clustering algorithms [26] and One class Support Vector Machine (SVM) [27]. Machine learning based IDSes for anomaly detection provide a learning based system to discover zero-day attacks. A zero-day attack refers to the exploitation of a vulnerability that has not been known earlier. However, these techniques suffer from high false positives because of their limitations in differentiating attack behavior and evolving normal behavior. The difference between the misuse detection and anomaly detection approaches is shown in Table I. Hybrid detection approaches integrate the misuse and anomaly detection approaches for detecting attacks. The details of these approaches, with examples from existing literature, are presented in Section V. In general, some of the advantages of using Machine learning based IDS over conventional signature based IDS are as follows:
• It is easy to bypass the signature based IDS by doing slight variations in an attack pattern, whereas Machine learning based IDS based on supervised techniques can easily detect the attack variants as they learn the behavior of the traffic flow.
• The CPU load is low to moderate in Machine learning based IDS as they do not analyze all signatures of the signature database as done by signature based IDS.
• Some of the Machine learning based IDS, particularly those based on unsupervised learning algorithms, can detect novel attacks.
• Machine learning based IDS can capture the complex properties of the attack behavior and improve the detection accuracy and speed compared to conventional signature based IDS.
• Different types of attacks keep on evolving. Signature based IDS will require the maintenance of the signature database from time to time to keep it up-to-date, whereas Machine learning based IDS based on clustering and outlier detection won’t require such updates.

In this paper, we have mainly focused on the use of machine learning for anomaly, misuse or hybrid detection mechanisms with their detailed analysis, and investigated their capability for attack detection. A detailed study of various machine learning approaches is helpful in exploring solutions for advanced cyber intrusion detection. The machine learning based intrusion detection approaches have been categorized into four types. These are as follows: (i) Single classifiers with all features of data set (ii) Single classifiers with limited features of data set (iii) Multiple classifiers with all features of data set and (iv) Multiple classifiers with limited features of data set. In a single classifier system, an individual classifier is used to detect intrusions. Multiple classifier is a broad term which considers a set of ML algorithms at the time of learning and detecting intrusions. A set of classifiers is integrated to provide a common output for detecting intrusions. For example, Kim et al. [28] proposed a multiple classifier method which hierarchically integrates a misuse detection model with an anomaly detection model rather than just combining their results. DT C4.5 acts as a misuse detection module, and one class SVM acts as an anomaly detection module. Multiple classifier based approaches lower the false alarm rate and improve the detection rate. Each of these approaches learns from the available dataset, which is described by a set of connection features such as source/destination port number, source/destination IP address, source bytes and destination bytes, etc. Ensemble learning is one of the forms of multiple learning algorithms in which predictions by a set of classifiers are combined in some way, discussed in detail in Section IV.

Our analysis reveals that the feature set for analyzing the behavior of one particular category of attack is different from the feature set of another category of attacks, since each of the attack categories possesses some unique characteristics. We discuss the standard feature selection methods used by researchers if the domain knowledge of the attacks is not known. Our main goal of the paper is to perform a detailed investigation and critical analysis of using machine learning approaches for intrusion detection in environment. The performance analysis of various categories of machine learning techniques is also carried out, and observations are provided concerning each category. Our paper mainly concentrates on intrusion detection in traditional wired networks.

A detailed discussion of intrusion detection applications in wireless networks can be found in [29]–[31]. Different types of machine learning based IDSes are available for mobile devices. For example, AmoxID [32] is based on SVM algorithms and implemented for iOS and Android OS. SMARTbot [33] is an off-device behavioral analysis framework based on the Artificial Neural Network back-propagation method for mobile botnet detection and achieves 99.49% accuracy. A light-weight Android malware detection system called Andromaly, which also uses machine learning algorithms, is proposed by Shabtai et al. [34]. Sikder et al. [35] propose a context-aware sensor based attack detector, called 6thSense, to detect attacks which bypass the flaws in the sensor management system. It makes use of Markov Chain, Naive Bayes, and Logistic Model Tree (LMT). A detailed survey of various types of machine learning based IDS for mobile devices, particularly mobile phones, can be found in [36]. A detailed survey on virtualization based attacks such as VM Escape, Side Channel Attacks, Hyperjacking, attacks on Guest-OS, etc. and their detection techniques in Cloud/Virtualization environments has been separately addressed in our recent work [37]. More specifically, a survey on cache based side channel attacks and prevention approaches has also been recently published by Anwar et al. [38].

The major contributions of our present research work are as follows:
• The classification of attacks based on their characteristics is presented. Various factors that make the detection of low-frequency attacks (like U2R and R2L, Worms, ShellCode etc.) difficult to achieve by machine learning techniques are discussed and methods are suggested for improving their detection rate.
• A discussion of the existing literature on intrusion detection is provided, highlighting the key characteristics, the detection mechanism, the feature selection employed and the attack detection capability.
• The critical performance analysis of various intrusion detection techniques is provided with respect to their
attack detection capability. The limitations and comparison with other approaches are also discussed. Various suggestions are provided for improvement in each category of techniques.
• Future directions of machine learning are provided for intrusion detection applications.

The paper is organized into eleven sections. In Section II, a comparison with related surveys is given, highlighting our specific contributions. In Section III, a detailed description of different types of attacks with their characteristics is provided. In Section IV, the description of various machine learning techniques and their characteristics is presented with a discussion on the importance of feature selection in machine learning. Section V provides a detailed and comprehensive summary of different machine learning approaches for intrusion detection, and Section VI classifies them based on their ability to detect an attack. Section VII discusses the performance analysis of some machine learning techniques for detecting different security attacks. The security issues associated with each category of machine learning techniques are discussed, and solutions are provided to overcome the security issues. Various useful measures are provided for improving their detection rate, followed by Section VIII, which describes the issues in detecting low-frequency attacks. In Section IX, various data mining tools for machine learning and deep learning are discussed. In Section X, future directions are provided to give a brief insight into the ongoing and future research work. Finally, in Section XI, concluding remarks are given with the scope of future work.

II. RELATED WORK

There are surveys on applying machine learning to intrusion detection. Some of them are discussed to highlight their contributions. The specific contributions which make our work different from others are also presented. Agrawal and Agrawal [39] have provided a survey on anomaly detection using data mining techniques for intrusion detection. They have categorized the anomaly detection approaches into three groups: clustering based approaches, classification based approaches and hybrid approaches. K-means, K-Medoids, EM clustering and Outlier detection algorithms have been described under clustering based approaches. Naive Bayes Algorithm, Genetic Algorithm, Neural Networks and Support Vector Machine have been described under classification based approaches. Hybrid approaches describe the combination of machine learning approaches. They have provided a brief comparison of papers using the ensemble based approaches.

Haq et al. [40] provided a survey on the application of machine learning techniques in intrusion detection. They have broadly classified the techniques into three major categories: supervised learning, unsupervised learning and reinforcement learning. In supervised learning, a classifier is trained on the labeled dataset. Unsupervised learning is used when we do not have the labeled dataset. In Reinforcement learning, a domain expert can label the unlabeled instances. They have provided a brief description of various single classifier and ensemble algorithms and provided references to papers using machine learning for intrusion detection without giving any critical analysis or observations.

Ahmed et al. [22] provided a survey on network anomaly detection approaches. Attacks are classified into four categories: DoS, probe, U2R and R2L, based on the KDD’99 dataset [12]. Each category points to a specific type of anomaly. They have provided a discussion of different types of machine learning approaches, i.e., classification based, clustering based, statistical based and information theory based approaches. The application of various types of machine learning approaches in intrusion detection distinguishes the normal instances from anomalous instances. A brief summary of issues with various network intrusion detection datasets is discussed. Collaborative IDSes are suggested as a future research direction. However, a detailed in-depth description and analysis of various existing IDS proposals based on machine learning is lacking in their survey. The authors have also not provided future directions for machine learning algorithms.

A discussion on machine learning and data mining techniques for intrusion detection has been given by Buczak and Guven [41]. Their survey describes the application of machine learning and data mining techniques for misuse and anomaly detection. They have clarified the difference between machine learning (ML) and data mining (DM) and stated that ML is an older sibling of DM. Since they both use the same methods for classification or knowledge discovery of data, they use the term ML/DM methods for the algorithms under study. In their survey, they have described various methods and related them to misuse, anomaly and hybrid detection techniques. The time complexity of the algorithms is also described in the paper. They have observed that KDD’99 and DARPA have been the most used data sets as this makes the comparison relevant to authors. However, some researchers have used NetFlow and tcpdump datasets. They have recommended which ML/DM method will be suitable for misuse and anomaly detection individually.

In our survey, a detailed investigation and analysis of various machine learning techniques has been carried out for finding the limitations associated with these techniques in detecting intrusive activities. The key factor which differentiates the present work from existing surveys is that it is based on the premise that no one particular intrusion detection technique, based on single/multiple classifier algorithms, can help in detecting all types of attacks. Hence, the use of a specific intrusion detection technique is recommended for detecting a specific set of attacks. The importance of various factors while selecting an algorithm is discussed. Attack classification is also provided with attack examples, and the specific attack features are mapped to each attack. Various issues in detecting low-frequency attacks are also mentioned and methods are suggested for improvement.

A summary of various intrusion detection approaches is discussed, including the literature on diverse datasets. In agreement with Buczak and Guven [41], we also found that people have mostly used the KDD’99 and DARPA datasets. Existing intrusion detection approaches based on machine learning techniques have been thoroughly analyzed concerning individual attack categories. Limitations associated with approaches
for each category are discussed with viable solutions. After the exhaustive survey of literature and critical analysis using the comparison of results reported by researchers, various observations are reported and analyzed. Future directions are provided in the field of intrusion detection. Future directions specifically point towards the usage of deep learning and reinforcement learning techniques for intrusion detection. Various challenges associated with these approaches have also been discussed in the paper.

III. CLASSIFICATION OF ATTACKS WITH RELEVANT ATTACK FEATURES

Network and host based attacks have become pervasive in today’s world. Attackers attempt to bypass the security of the network by exploiting the existing vulnerabilities in the network. They disturb the normal functioning of the network by causing network devices to malfunction, flooding the network with excessive packets, scanning the network, etc. This causes the unavailability of service to the legitimate users and highly reduces network throughput. Host based attacks attempt to bypass the security of a host machine. The attacker gains unprivileged access to a machine and tries to gain root access, which may lead to the destruction of important system files, modification of sensitive data, leakage of the user’s private information, etc. Host based attacks can be launched as the next step after network attacks. Any machine over the network can be compromised by a hacker who has access to the network. In this case, the attacker first tries to establish a connection over the network to the target machine by exploiting the weakness in the network protocols or security devices such as Firewall [42] and Intrusion Detection System [43], and then tries to copy malicious files over the network to the host machine. Once the user executes these files, the system is compromised and is under the control of the attacker. Now the attacker can perform any activity on these compromised hosts, such as executing malicious programs and damaging the system. Hackers exploit vulnerabilities present in the computer or network by using specialized tools such as Nmap, scapy, Metasploit, Armitage, Dsniff, Tcpdump, Net2pcap, Snoop, Ettercap, Nstreams, Argus, Karpski, Ethereal, Amap, Vmap, TTLscan and Paketto, etc. [44]–[47]. A detailed description of these tools can be found in [48]. In a secure environment, both network and host based security are important. In this Section, we have described attacks which are classified into three broad categories based on their characteristics as shown in Figure 1 and Figure 3. In each category, we have also described the important attack features for each attack based on the KDD’99 dataset [12] and the UNSW-NB dataset [13]. All these features are described in detail in Table II and Table III.

A. Denial of Service Attacks (Resource Depletion and Bandwidth Depletion Attacks)

This category of attacks causes the unavailability of service to the legitimate users and hence is also referred to as DoS (Denial of Service) attacks [49]. For example, let us take an attack scenario: An attacker can send multiple service requests either to register with the enterprise or to access any of the valid service instances running in the enterprise. In this case, the administrative server will be flooded with many service requests and will fail to provide services to other legitimate customers/users. There can be another attack scenario where multiple machines are used to launch a DoS attack: A large number of machines are connected to an organization or enterprise network. If an attacker has access to one or more machines of an organization/enterprise, it can misuse this privilege and launch a DoS attack against the other machines in the same network subnet. Here, the attacking surface is very broad, an attacker can occupy multiple machines (Zombies) and can
TABLE II
TCP CONNECTION FEATURES IN KDD’99 [50]
use them to launch DoS attacks. This kind of DoS is also called a Distributed Denial of Service (DDoS) attack. DoS attacks are classified into two types: Bandwidth Depletion and Resource Depletion attacks. In a Bandwidth Depletion attack, the attacker tries to overload the network with network packets. There are two classes of Bandwidth Depletion attacks: Flooding attacks and Amplification attacks. In Flooding attacks, the attacker tries to flood the network by sending excessive ICMP or UDP packets, overloading the network resources. In Amplification attacks, the attacker tries to exploit the IP address broadcast feature of most routers. This feature allows a sending system to specify a broadcast IP address as the destination address rather than a specific address. Examples of such attacks are smurf and fraggle attacks [52]. In Resource Depletion attacks, the attacker ties up the resources of a victim system. This attack can be launched by exploiting the network protocol (ex. neptune, mailbomb) or by forming malformed packets (ex. Land, Apache2, Back, teardrop, ping of death etc.) which are sent to the victim machine over the network. A brief explanation of some of these attacks [53] is given below:

Land: In a Land attack, an attacker sends a spoofed SYN packet in which the source address is the same as the destination address. It is effective in some of the TCP/IP implementations.
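As an illustration of the Land condition described above, the following minimal Python sketch flags a SYN packet whose source address equals its destination address. The `TcpPacket` record and its field names are hypothetical and introduced only for this example; they do not correspond to any particular capture library or to the datasets discussed in this survey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TcpPacket:
    """Hypothetical, simplified view of a TCP/IP header (illustration only)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    syn: bool = False
    ack: bool = False


def is_land_packet(pkt: TcpPacket) -> bool:
    """Return True for an initial SYN whose source and destination addresses match."""
    initial_syn = pkt.syn and not pkt.ack
    return initial_syn and pkt.src_ip == pkt.dst_ip


if __name__ == "__main__":
    spoofed = TcpPacket("10.0.0.5", "10.0.0.5", 139, 139, syn=True)
    normal = TcpPacket("10.0.0.9", "10.0.0.5", 40000, 139, syn=True)
    print(is_land_packet(spoofed), is_land_packet(normal))
```

A stricter variant could additionally require equal source and destination ports; that choice is left open here because the description above only mentions addresses.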
TABLE III
TCP CONNECTION FEATURES IN UNSW-NB
Attack Features: The attack can be detected by considering the feature ‘Land’. If the value of feature ‘Land’ is 1, it means that the source and destination addresses are identical. Hence this feature is the most important one in recognizing this attack.

Teardrop: In this attack, the attacker tries to send fragmented packets to a target machine. He sets the fragment offset in such a way that the subsequent packets overlap with each other. If there is a bug in the IP fragmentation reassembly code of the receiving target operating system, the machine crashes due to improper handling of the overlapping packets. Such attacks are successful on different operating systems such as Windows 3.1x, Windows 95, Windows NT and versions of the Linux kernel prior to 2.1.63.

Attack Features: Feature ‘Wrong Fragment’, which is the sum of bad checksum packets in a connection, provides some clue about the malformed IP packets. Hence this feature is important in recognizing this attack.
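The overlap condition exploited by Teardrop can be sketched as follows. This is an illustrative check under simplifying assumptions, not the detection method of any surveyed IDS: each fragment is represented by a hypothetical `(offset, length)` pair in bytes, and all fragments passed to the function are assumed to belong to the same IP datagram.

```python
from typing import List, Tuple

# Hypothetical representation: (offset_in_bytes, payload_length_in_bytes)
Fragment = Tuple[int, int]


def has_overlapping_fragments(fragments: List[Fragment]) -> bool:
    """Return True if any two fragments of one datagram cover the same bytes."""
    covered_until = 0  # first byte position not yet covered by earlier fragments
    for offset, length in sorted(fragments):
        if offset < covered_until:
            # This fragment starts inside data already claimed by another one.
            return True
        covered_until = max(covered_until, offset + length)
    return False


if __name__ == "__main__":
    well_formed = [(0, 1480), (1480, 1480), (2960, 500)]
    teardrop_like = [(0, 36), (24, 4)]  # second fragment lies inside the first
    print(has_overlapping_fragments(well_formed))
    print(has_overlapping_fragments(teardrop_like))
```

Sorting by offset keeps the check to a single pass, so fragments may be supplied in arrival order rather than in offset order.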
Smurf: A Smurf attack is an amplification-based denial of service attack in which the attacker sends a large number of ICMP echo messages to a broadcast IP address with the spoofed address of the victim’s machine as the source IP. On receiving the packet, each machine in the broadcast network replies to the victim’s machine, keeping its resources uselessly busy [54].
Attack Features: This attack can be easily detected at the victim machine by looking for a huge number of ICMP echo replies arriving at the victim machine without any ICMP echo request packets having been sent from it. There are some features such as ‘Service’ (ICMP), ‘Duration’, ‘Dst host same srv rate’ (used to find the percentage of connections to the same service and to the same destination IP address coming from attacker’s machines) and ‘Same srv rate’ (used to find the percentage of connections to the same service and to the same destination IP address going from victim machine) which are useful in determining the total number of ICMP echo packets to the victim machine within some duration of time and the total ICMP reply packets from the victim machine within some duration of time.
Ping of Death: Ping of Death (PoD) is a denial of service (DoS) attack caused by an attacker deliberately sending an IP packet larger than the 65,535 bytes allowed by the IP protocol. This maximum includes the packet header, which is typically 20 bytes long. Such an oversized packet causes the system to crash or freeze. Many operating systems are vulnerable to this attack [55].
Attack Features: An attempted Ping of Death can be identified by noting the size of all ICMP packets and flagging those that are longer than 65,535 bytes. Features ‘Dst bytes’ (total number of bytes received) and ‘Duration’ in a connection may provide some clue about a PoD attack: the total number of bytes received within a short duration of time can be compared with a threshold value (65,535).
Mailbomb: In a Mailbomb attack, unauthorized users send a large number of email messages with large attachments to a particular mail server, filling up disk space and resulting in denied email services to other users [56].
Attack Features: This attack can be identified by looking for thousands of mail messages coming from a particular user within a short period of time. Features such as ‘Destination IP’, ‘Dst bytes’ (total bytes received), ‘Service’ (SMTP/MIME), and ‘Dst host same src port rate’ (percentage of connections to the same port and to the same destination IP address) are important features in detecting the behavior of this attack.
SYN Flood: In a SYN flood, the TCP/IP implementation is exploited. An attacker sends SYN requests to the victim machine. The victim replies with a SYN-ACK and waits for the final ACK. The server adds the information of each half-open connection to the pending connection queue. The half-open connections on the victim server system will eventually fill the queue and the system will be unable to accept any new incoming connections [57].
Attack Features: A SYN flood attack can be distinguished from normal network traffic by looking for a number of simultaneous SYN packets destined for a particular machine that are coming from an unreachable host, or by setting a threshold for the duration of time a system has to wait for the reply. Hence features such as ‘Duration’, ‘Flag’ (S0: ‘Initial SYN but no further communication’ etc.), ‘Dst host count’ (percentage of connections to the same destination IP (victim machine)) are very important in recognizing this attack. Therefore, noting those connections which raise the SYN flag with no connection established within a short duration of time is useful in detecting the attack.
B. Scanning Attacks
A scanning activity is a growing cyber security concern because it is the primary stage of an intrusion attempt that is used to locate the target systems in the network and subsequently exploit known vulnerabilities. An attacker sends a large number of scan packets to gain a detailed description of the machines using scanning tools such as nmap, satan, saint, msscan etc. Bou-Harb et al. [58] provided a detailed discussion on scanning techniques. They have provided a classification of the cyber scanning topic into three parts: Nature, Strategy and Approach. The nature of a scanning attack can be active or passive. The attack strategies could be remote to local, local to remote, local to local and remote to remote. They also classified 19 cyber scanning techniques with their pros and cons. At a high level, all 19 techniques are explained under five major categories: Open Scans [59], Half-Open Scans [60], Stealthy Scans [61], Sweep Scans [62] and Miscellaneous scans [63], [64]. For example, an open scan and a stealthy scan, particularly the SYN-ACK scan, are shown in Figure 2 [58]. Open scan uses the TCP-handshake connection. It detects the TCP ports by making use of SYN flag and TCP protocols. A closed port replies with RST flag set (line i) whereas an open port replies with ACK flag set (line ii). The attacker can now reset the connection by sending the RST and ACK. A firewall can detect such simple scans by looking at logs. Stealthy scan advances the open scans and also makes use of other flags together with SYN flag to avoid its detection. For the stealthy SYN-ACK scan, an attacker sends the SYN and ACK flags to the target; closed ports send the RST flag (line iii) whereas open ports will not generate any response (line iv). It is a relatively fast method and does not require the three-way handshake or a solo SYN flag. Other than scanning scenarios, the authors also address the IP version issues with cyber scanning activities. A separate literature review is provided for distributed detection techniques, which are classified based on scanning activities into the one to one approach, one to many approach, many to one approach and many to many approach. These Probes are useful in launching future attacks [65]. Some of the scanning attacks are described below.
Ipsweep: An Ipsweep is used to determine which hosts are listening on the network by sending many ping packets. If a target host replies, the reply reveals the target’s IP address to the attacker.
Attack Features: A Network Intrusion Detection System can examine the total number of ping packets coming within a short duration of time. Features such as ‘Duration’, ‘Service’ (ICMP), ‘Dst host same srv rate’ (used to find the connections to the same service) and ‘Flag’ (used to find connection status) are important to find the total ping messages within short
694 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 21, NO. 1, FIRST QUARTER 2019
Fig. 3. A taxonomy of various attacks based on UNSW-NB dataset.
user and waits till the user enters the password in that display. The password is sent back to the attacker by the trojan version of the xlock program [72]. In a warezmaster attack, an attacker tries to exploit a bug present in the FTP server. If the FTP server has given write permissions to the guest account, an attacker can log in to the guest account in the public domain of FTP servers and can upload ‘warez’ (copies of illegal software) to the server. Users can later download these files [73]. A warezclient attack is launched by a legal user during an FTP connection after the execution of warezmaster by an attacker. Users download the files (illegal software copies) from the server that were previously created by warezmaster [74].
Attack Features: Network connection features are not sufficient to observe the behavior of the R2L attacks. In fact it is very difficult to distinguish these attacks from each other by considering the network connection features. However, a few features present in KDD’99 such as ‘Duration’, ‘Service’, ‘Src bytes’, ‘Dst bytes’, ‘Num failed login’, ‘Is guest login’, ‘Num compromised’, ‘Num File creation’, ‘Count’, ‘Dst host count’ and ‘Dst host srv count’ may provide some hint about abnormal behavior of a user in a local connection and are hence helpful in detecting R2L attacks.
There is a slight difference between R2L and U2R attacks. In U2R attacks, it is assumed that the user has local privileges on the victim machine (obtained via an R2L attack). The attacker tries to attain root privileges after accessing the machine. Hence the values of the traffic features will be similar to those of a normal connection in the case of U2R and are least important to consider. The basic and content features are important in this case, whereas in R2L attacks the attacker tries to obtain local access to a remote machine. In R2L all features are important. In DoS and Probe attacks, traffic features are very important together with other features. In Section VIII, we have described the difficulties in detecting these attacks using network attack data set.
On the basis of UNSW-NB attack dataset, attacks have been categorized into 9 types as shown in Figure 3. DoS is described earlier in detail. Other attacks are described below.
E. Fuzzers
In a fuzzer attack, the attacker sends a large amount of randomly generated input sequences from the command line or in the form of protocol packets. The attacker tries to discover security loopholes in the OS, program or network, and can make these resources suspended for a time period or even crash them.
F. Analysis
This category of attack refers to various intrusions that penetrate Web applications by various means such as port scanning, malicious Web scripting (like HTML files penetration) and sending spam emails etc.
Attack Features: The attack characteristics of various port scanning attacks and various important features for detecting those attacks are discussed in Section III-B. There are anti-spam filters provided by the mail service providers to filter such emails coming from unauthorized sources. Spam emails can bypass such filters. Hence, in addition to the source IP address, the analysis of overall network performance can be done by considering various possible features as listed in Table III.
In particular, Web application attacks can be detected by performing HTML header analysis, email header analysis or code analysis (scripting codes) [75].
G. Backdoor
In a backdoor attack, the attacker can bypass the normal authentication and can obtain unauthorized remote access to a system. The attacker tries to locate the data by doing fraudulent activities to bypass the security of the system. A hacker uses backdoor programs to install malicious files, modify the code or gain access to the system or data.
Attack Features: Some of the important features that must be present in the feature set are as follows: {sport, dsport, dur, sbytes, service, ackdat, sjit, djit, ct_flw_http_mthd, is_ftp_login, ct_srv_src, ct_dst_ltm}. It won’t be easy to get exact information about a backdoor attempt at the victim machine. However, by analyzing network features, one can get some clue about unauthorized network attempts.
H. Exploits
The Exploits category refers to intrusions that exploit software vulnerabilities, bugs or glitches within the operating system or software. Attackers utilize the knowledge of the software to launch exploits with an intention to cause harm to the system.
Attack Features: Various important features which are crucial for detecting the attempts of launching exploits at the monitored machine are as follows: {srcip, dstip, sport, dsport, sinpkt, synack, is_sm_ips_ports, ct_ftp_cmd, res_bdy_len, ct_src_ltm, ct_src_ltm} (refer Table III). These features may provide some hint about the attempt of launching exploits. However, exploits can be more appropriately detected by monitoring the operating system behavior using dynamic analysis techniques. One can refer to our work for the same [76], [77].
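The per-category feature sets listed for Backdoor and Exploits can be kept in a small lookup table and used to project a traffic record onto the features relevant to one category. The sketch below is only illustrative: the dictionary name `ATTACK_FEATURES`, the helper `project` and the sample record are assumptions, and the feature names are copied from the two lists above (the repeated `ct_src_ltm` in the Exploits list is kept once).

```python
# Feature names copied from the Backdoor and Exploits "Attack Features" lists.
ATTACK_FEATURES = {
    "Backdoor": [
        "sport", "dsport", "dur", "sbytes", "service", "ackdat", "sjit",
        "djit", "ct_flw_http_mthd", "is_ftp_login", "ct_srv_src", "ct_dst_ltm",
    ],
    "Exploits": [
        "srcip", "dstip", "sport", "dsport", "sinpkt", "synack",
        "is_sm_ips_ports", "ct_ftp_cmd", "res_bdy_len", "ct_src_ltm",
    ],
}


def project(record, category):
    """Keep only the features listed for `category`.

    Features absent from `record` are returned as None so that the
    output always has the same keys for a given category.
    """
    return {name: record.get(name) for name in ATTACK_FEATURES[category]}


# Hypothetical record with a handful of fields, for demonstration only.
sample = {"sport": 1390, "dsport": 21, "dur": 0.12, "is_ftp_login": 1, "proto": "tcp"}
backdoor_view = project(sample, "Backdoor")
```

Here `backdoor_view` contains the twelve Backdoor features, with `proto` dropped and the fields missing from the sample set to `None`.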
I. Generic
A Generic attack against a cryptographic system tries to break the key of the security system. It is independent of the implementation details of the cryptographic system. The structure of the block cipher is not considered. For example, the birthday attack is a Generic attack which considers the hash function as a black box.
Attack Features: It would be good to take all possible network features of Generic attack into consideration. The accuracy of the system may not be very good if only network features are considered. One can also perform dynamic analysis of code to check the behavior of the code running in the victim machine. UNSW-NB does not provide system specific features such as root_login, su_attempted, Hot, Num_Shell etc. as specified by KDD’99.
J. Reconnaissance
Reconnaissance refers to attacks that gather information about the target computer network in order to bypass its security control. It can be defined as a probe which is a preliminary step towards launching further attacks. Attackers use port scanning, OS scanning, nslookup, dig, whois, etc. to gather information about the system. Depending on the TCP responses collected for each crafted packet, we can make an intelligent guess of the operating system. After collecting sufficient information, attacks such as DDoS, worm, buffer-overflow exploits etc. can be launched.
Attack Features: Various important network features to detect such attacks: {sport, dsport, srcip, dstip, dur, spkts, sinpkt, service, synack, ct_srv_src, ct_src_ltm, ct_dst_ltm}. All the features provide the key network information about the source and destination system. The details about various port scanning attacks, corresponding features and attack characteristics are already described earlier in Section III-B.
K. Shellcode
A shellcode is used as a payload which is executed in the target machine to exploit the software’s vulnerability. It is called shellcode as it starts a command shell which is under the control of the attacker. Local shellcode tries to exploit the vulnerability of a high-privileged process on a local machine, for example a buffer overflow. Remote shellcode targets a vulnerable process running on a remote system. On successful execution, an attacker gains the remote access to the local machine. For example, bindshell connects the attacker to a certain port of the victim machine.
Attack Features: Some of the important features for attack analysis are: {sport, dsport, srcip, dstip, dur, service, sbytes, dbytes, state, res_bdy_len, synack, is_ftp_login} (refer Table III). Network features may be helpful for detecting remote shellcode. However, in order to provide lower false alarms and good accuracy, shellcode can be detected by doing the behavior analysis of the programs. These types of attacks fall under the category of low-frequency attacks and can be launched easily at remote machines in a few attempts of making a network connection to the remote machine.
L. Worms
Worms are malicious programs or malware that replicate themselves and spread to other computers. They use the network to spread the attack. Most worms are designed to replicate and do not try to change the system files. However, they can cause disruption to the services by increasing network traffic.
Attack Features: The important network features could be as follows: srcip, dstip, sport, dsport, proto, spkts, dpkts, tcprtt, stcpb, dtcpb, ct_srv_src, ct_flw_http_mthd, is_ftp_login etc. (refer Table III), which could help in analyzing the spread of packets from the same source address using a particular service and Internet Protocol (IP) over a period of time.
Attacks are intentional attempts to destroy or gain unauthorized access to a machine or access user’s data in an unauthorized way. Attacks target a computer network and/or a computer and harm the resources. Various attacks have been discussed in our study. Each attack is launched in some way and carries some unique characteristics which we have discussed. The network features which are essential for detection of a particular category of attacks have also been mapped to the specific attack category. KDD’99 has been used by most of the researchers. Hence, we have considered it for our attack study. However, as it is very old, we have also considered a very recent IDS attack dataset, i.e., UNSW-NB [13] which contains nine categories of attacks. ISCX-IDS attack dataset [78] is not publicly available. We obtained this dataset from University of New Brunswick (UNB) in the form of PCAP files on request. The attack features and their description are not provided by the authors. Hence, KDD’99 and UNSW-NB have been considered for the study.
IV. MACHINE LEARNING: TECHNIQUES AND FEATURE SELECTION
In this Section, we have discussed the most popular machine learning techniques used for detecting intrusions. These techniques hold different characteristics and provide different results for detecting intrusions. Here, we have mentioned the working of these techniques with their characteristics. We have further described the various feature selection approaches with their pros and cons and provide the optimal feature set for each attack.
A. Techniques Used in Machine Learning
Machine learning techniques work in two phases: training and testing. In the training phase, they perform the mathematical calculations over the training dataset and learn the behavior of traffic over a period. In the testing phase, a test instance is classified as normal or intrusive based on the learned behavior. Various popular machine learning techniques are described below.
1) Decision Tree: Decision tree learning methods use a branching method to illustrate every possible outcome of a decision. They can work with discrete-value attributes and continuous value attributes as well. The learned trees are then represented in the form of if-then rules. Three basic elements of the tree are decision node, branch and leaf node as shown in Figure 4. Decision node specifies a test over some attribute.
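To make the three elements concrete, the following sketch represents a tiny hand-written tree as nested dictionaries and flattens it into if-then rules. The tree itself, the dictionary layout and the function names `tree_to_rules` and `classify` are hypothetical; the tree is not learned from data and only borrows the KDD’99 attribute names ‘land’ and ‘flag’ mentioned earlier.

```python
# Decision node: {"attribute": ..., "branches": {value: subtree}}
# Leaf node:     {"label": ...}
TOY_TREE = {
    "attribute": "land",
    "branches": {
        1: {"label": "intrusive"},
        0: {
            "attribute": "flag",
            "branches": {
                "S0": {"label": "intrusive"},
                "SF": {"label": "normal"},
            },
        },
    },
}


def tree_to_rules(node, conditions=()):
    """Return one if-then rule (as text) for every leaf of the tree."""
    if "label" in node:  # leaf node: emit the accumulated path as a rule
        premise = " and ".join(conditions) or "true"
        return [f"if {premise} then {node['label']}"]
    rules = []
    for value, child in node["branches"].items():  # one branch per value
        test = f"{node['attribute']} == {value}"
        rules.extend(tree_to_rules(child, conditions + (test,)))
    return rules


def classify(node, record):
    """Follow the branch matching the record until a leaf is reached."""
    while "label" not in node:
        node = node["branches"][record[node["attribute"]]]
    return node["label"]


rules = tree_to_rules(TOY_TREE)
```

Each root-to-leaf path becomes one rule, so this toy tree with three leaves yields three rules; `classify` applies the same tests to a single record.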
reasoning. Fuzzy logic offers rigor of formal methods without requiring undue precision. It also offers alternative methods to handle policy preferences and conflicts [105]. A fuzzy set theory is defined in terms of fuzzy logic. The semantics of fuzzy operators are understood by using a geometric model. Fuzzy logic can interpret the properties of a neural network, and a precise description of its performance can be obtained. Neuro-fuzzy is very popular in the area of Intrusion Detection. It is applied by many researchers as discussed in Section V. A fuzzy set $A$ in $X$ is characterized by a membership function $f_A(x)$ which associates with each point in $X$ a real number in the interval [0, 1], with the value of $f_A(x)$ at $x$ representing the “grade of membership” of $x$ in $A$. Thus, the nearer the value of $f_A(x)$ to unity, the higher the grade of membership of $x$ in $A$ [106].
Fuzzy logic is not enough to detect all types of attacks. It performs well when it is integrated with other classifiers. Fuzzy Logic techniques have been used in correlation with intrusion detection systems [107], [108]. The key characteristics of fuzzy logic are as follows [109]: (a) The fuzzy rules allow constructing the if-then rules which can be easily modified based on security applications. (b) They can combine the input from varying sources. (d) The quantitative measures used by IDS such as connection interval, CPU usage time, etc. are fuzzy in nature. (e) A numerical value can belong to multiple fuzzy sets at the same time, i.e., a numerical value does not have to be fuzzified using only one membership function. (f) The degree of alert that can be produced by an IDS is often fuzzy. The disadvantages of the fuzzy rules are as follows: (i) They consider that all factors which are to be combined are equally important. (ii) A fuzzy system requires more fine tuning and simulation before it is operational. (iii) It is hard to develop a model from a fuzzy system in comparison to other machine learning solutions due to the complexity involved in building the fuzzy model.
9) Hidden Markov Model: A Markov Model produces a behavioral model which is composed of states, transitions and actions. Both Hidden Markov Models (HMM) and Markov Chains come under the category of Markov Models. In a Markov Chain, the transition probabilities are known, which determine the topology of the model. In HMM, the system being modeled is represented by a Markov process with unknown parameters. HMM can be defined as a tool for presenting the probability distribution of a sequence. In HMM, an observation $X_t$ at time $t$ is generated by a stochastic process. However, the state $Z_t$ of the process cannot be directly observed (hidden). HMM satisfies the Markov property where the state $Z_t$ depends only on the previous state $Z_{t-1}$ at time $t-1$ [110]. HMM maintains a $K \times K$ transition matrix. Each element $A_{ij}$ of the matrix describes the probability of the transition from $Z_{t-1}$ to $Z_t$, which can be written as: $A_{ij} = P(Z_{t,j} = 1 \mid Z_{t-1,i} = 1)$.
Ariu and Giacinto [111] proposed an HMM based IDS architecture known as HMMpayl which is an anomaly based IDS. The main goal of HMMpayl is to protect the Web server and the applications hosted by the server from attackers. A subset of n-grams is randomly selected from the sequence of payload and passed to HMM for further analysis. The advantage of HMMpayl is that it reduces the computational cost in comparison to other n-gram models where all the possible sequences are analyzed. The authors have simulated the model using the HTTP traffic of the DARPA’99 dataset. It trains k distinct HMMs over randomly generated sub-samples of sequences. The outputs produced by the individual HMMs are finally combined to produce the detection accuracy. The ensemble of HMMs is found to perform better than a single HMM classifier.
There are some advantages with HMM: (i) An HMM which is well tuned with parameters provides better compression than a simple Markov Model. (ii) The model is a fairly readable probabilistic graph model. (iii) HMM very well captures the dependencies between the consecutive sequences. (iv) The ensemble of HMMs is found to perform well for recognizing the structure of sequences. There are some disadvantages with HMM: (i) A fully connected HMM can lead to an overfitting problem, which happens when the model is trained with a dataset having a large parameter space. (ii) HMM, when implemented with the Viterbi algorithm, becomes expensive, both in terms of memory and compute time. However, Churbanov and Winters-Hilt [112] applied EM clustering with Viterbi to provide a linear memory requirement.
10) Swarm Intelligence: A swarm can be considered as a group of cooperating agents which work together to achieve some purpose and task. Swarm Optimization is an advanced machine learning algorithm which is based on evolutionary computations. Kolias et al. [113] provided a survey on swarm intelligence (SI) approaches for intrusion detection. They have provided a detailed comparison of various SI based IDS systems, pointing to their advantages and disadvantages. The core SI based techniques used for supervised classification have been described. Most of the IDS described by the authors are anomaly detection IDS. The SI based IDS approaches make use of multiple agents which collaborate with each other to solve a problem and provide the optimal solution. An agent can be used to find the classification rules for misuse detection or to find the clusters for anomaly detection. They have mainly categorized the SI based approaches into three types: (i) Ant Colony Optimization (ACO) based IDS, (ii) Particle Swarm Optimization (PSO) based IDS, (iii) Ant Colony Clustering (ACC) based IDS. ACO algorithms are motivated by the behavior of the ants to find the shortest paths from their nest to the food. AntNag [114] is the first ACO algorithm for intrusion detection, which is based on making directed graphs for attacks. PSO algorithms are motivated by the coordinated movement dynamics of animal groups. ACC algorithms are motivated by the clustering and sorting behavior of ants to work autonomously. A detailed study can be found in their literature. There are some advantages with swarm optimization: (i) SI based systems are adaptable and can be adjusted to new stimuli. (ii) These systems are scalable since the same control architecture can be applied to a group of agents. (iii) They are flexible as agents can be easily removed or added without affecting the architecture, etc. There are also some disadvantages for SI based systems: (i) The complexity associated with swarms provides unpredictable results. (ii) A rich hierarchical swarm based system takes time to shift to states. (iii) There is no central control, which makes the system redundant and uncontrollable, etc. [115]
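As a rough illustration of how a swarm of cooperating agents searches for an optimum, the sketch below implements the commonly used inertia-weight form of Particle Swarm Optimization on a toy objective. It is not taken from the surveyed IDS systems: the function name `pso_minimize`, the coefficient values and the sphere objective are assumptions chosen for demonstration. In an IDS setting the objective would instead be something like the classification error of a candidate rule or parameter set.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over a box using inertia-weight PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]             # each particle's best position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # best position of the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move and clamp the particle to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

There is no central controller in the loop: each particle only uses its own memory and the shared best position, which reflects the adaptability and the lack of central control noted above.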
11) Ensemble of Classifiers/Ensemble Learning: Ensemble learning makes use of multiple learners and combines the predictions made by a set of classifiers called base learners. The use of multiple machine learning algorithms helps in generating a set of hypotheses for a problem. The ensemble of classifiers integrates the hypotheses to generate a common result. The ensemble of classifiers provides a stronger generalization capability compared to individual base learners [116]. Each base learner is generated by using machine learning algorithms such as Decision Tree, Naive Bayes, Neural Network, Support Vector Machine etc. Some of the ensemble methods make use of homogeneous base learners, in which multiple instances of the same machine learning algorithm are used to generate a set of hypotheses over different sub-samples of the same training dataset. For example, Random Forest is one of the popular ensemble classifiers which combines the predictions made by multiple decision trees. Some of the ensemble methods make use of heterogeneous base learners, in which different machine learning algorithms are used as base learners to generate a set of hypotheses. For example, Neural Network, SVM and DT can be trained over a training dataset and their predictions can be combined to generate common predictions. There are many ways to combine the predictions made by multiple base learners. Some of the popular methods are bagging, boosting, majority voting and stacking [117].
The first effective method of ensemble learning was Bagging, also called bootstrap aggregation. Bagging is used to reduce the variance of machine learning algorithms having high variance. Variance can be termed as the amount of change in the prediction of the target function for different training datasets. Ideally, the prediction should not change too much. Bagging creates different instances of the same training dataset by selecting bootstrap samples. A bootstrap sample is created by selecting a subset of samples from the training dataset. If a training dataset of size n is given, Bagging will generate m new training sets of size k by sampling with replacement. In sampling with replacement, an observation may be repeated multiple times in a set. Each of the newly generated training sets is used to train a model. The outputs/predictions of the models can be combined by either voting (in case of classification) or averaging (regression).
In boosting, a random sample of the training dataset is extracted without replacement (no observation can be repeated in the subset) and used to train a weak learner. In the second iteration, another sample subset is extracted in the same way; however, 50% of the samples which were previously misclassified are added to the new training set and another weak learner is trained. In the next iteration, a new training set is formed from the samples on which the two previous learners disagree. After some such iterations, the predictions made by the weak learners are combined by the majority voting scheme. Both Boosting and Bagging combine the predictions based on the majority vote or average rule [118]. Stacking makes use of a combiner algorithm to combine the predictions. Stacking is a meta-algorithm which trains n base classifiers/learners for the given dataset. Each of the trained classifiers generates the predictions for the given input data. The output of each classifier is again learned by a combiner or meta-algorithm to make the final prediction. An ensemble classifier has some advantages over single classifiers: (i) The generalization ability of the ensemble is much better than that of single classifiers. (ii) It may not be possible to select a particular learner based on the available training dataset. The search process may take longer. (iii) An ensemble often reduces the overfitting problem of single classifiers and reduces the prediction error, etc. The disadvantages of ensemble learning are as follows: (i) The complexity of the ensemble affects the training time. (ii) Sometimes learning concepts become difficult to understand. (iii) It requires more memory than single classifiers, etc.
In this subsection, we discussed various machine learning techniques. Some of them are supervised, such as various DT algorithms, multi-class SVM, MLP BP-ANN, NB and KNN, etc. Supervised machine learning algorithms can detect known attack patterns. They require a labeled attack dataset. Other ML algorithms such as one-class SVM, K-Mean Clustering, Self Organizing Map (SOM), DBSCAN, etc., are examples of unsupervised machine learning algorithms. Unsupervised learning is helpful in analyzing an unlabeled attack dataset and finding the outliers. The outliers can be noise or they can be anomalies which are rarely found in normal scenarios. The outliers are further explored statistically and useful information is extracted out of them which can be helpful to find distinct characteristics from data. Fuzzy logic can be applied in Classification and Clustering algorithms to improve their learning capability and attack detection rate. For example, Neuro-Fuzzy and Fuzzy c-means clustering are some popular applications of fuzzy logic in different ML techniques.
Swarm intelligence techniques such as Particle Swarm Optimization (PSO) are helpful in nonlinear optimization problems. HMM can be used to capture the dependencies between sequences using a probabilistic approach. Both swarm intelligence and HMM can be used for supervised and unsupervised learning problems. Different classification algorithms have different characteristics. Each one of them has some pros and cons as discussed before. Ensemble learning provides the combination of same/different supervised and unsupervised algorithms used to solve the target problem altogether. It often reduces the over-fitting problem of single classifiers and improves the classification rate.
B. Feature Selection in Machine Learning
Two main things that highly affect the performance of a classifier are: Classifier’s Technique and Selected Feature Subset. Researchers have proposed various combinations of classifiers and feature selection methods (discussed in detail in Section IV). The goal of feature selection is to select the most important and optimal subset of features. Feature selection improves the generalization performance, reduces the computational cost of the classifier, makes the classifier faster for detecting unseen data and simplifies the understanding of data processing. There are various drawbacks of considering all features in the detection technique
such as: (i) it increases the computational overhead of the system and makes training and testing slower; (ii) it leads to a larger storage requirement, since the more features a database contains, the more space is required to store them; (iii) it limits the generalization capability of a classifier which uses data mining techniques for detecting intrusions; and (iv) it increases the error rate of the classifier, since irrelevant features diminish the discriminating power of relevant features.
The feature selection methods are categorized into three types [50]: (i) Filter methods, (ii) Wrapper methods and (iii) Hybrid/Embedded methods. Filter methods are independent of the classifier. They compute the intrinsic properties of the data. Filters are fast compared to the other methods and are relatively robust against overfitting. Their major drawback is that they do not consider the classifier’s performance over the selected features. Hence, they fail to provide the best feature subset for classification. Wrapper based methods combine a feature subset search algorithm with a classifier algorithm. The performance of a candidate subset is measured by its classification rate, and the feature subset with a good classification rate is chosen at the end. A classification rate threshold is used as the stopping criterion of feature selection. Thus wrapper based methods are classifier dependent. The major drawback of this method is that successive learning of the classifier may result in overfitting. Computational time is usually high, as it involves successive iterations of the subset selection algorithm and the classifier. Hybrid methods are combined with the classifier’s design in the training phase. Data exploitation is optimized, which reduces the number of times the classifier is retrained for each new subset. Hybrid methods have a higher computational cost than filter based methods. The importance of various feature selection algorithms for intrusion detection applications has been addressed in detail in [119].
Let us take the features of the KDD’99 network attack dataset as an example. There are 41 features. Table II (shown in Section III) categorizes all features into four main categories:
• Basic Features (1-9): These are the basic features of an individual TCP connection such as P3 (‘Service’), P1 (‘Duration’), P4 (‘Flag’), etc.
• Content Features (10-22): Content features are extracted from the data portions of the packets, such as P11 (num of failed logins), P14 (Root Shell), P10 (‘Hot’) and P13 (‘Num Compromised’). These features are important for detecting low-frequency attacks such as U2R and R2L. This is because DoS and Probe normally involve a large number of connections within a short period, whereas R2L and U2R normally involve a single connection and are embedded in the data portion of a packet.
• Traffic Features (23-31): Traffic features are computed using a 2s time window, such as P23 (‘Count’), P24 (‘Srv count’), P29 (‘Same srv rate’), etc. These are very important for detecting high-frequency attacks (DoS and Probe).
• Traffic Features (32-41): Traffic features are computed using a 2s time window from destination to host, such as P32 (‘Dst host count’), P33 (‘Dst host srv count’), P34 (‘Dst host same srv rate’) and P39 (‘Dst host srv serror rate’), etc. These are very important for detecting high-frequency attacks (DoS, Probe).
We have provided an overview of different types of machine learning algorithms, i.e., rule based, probability based, clustering based, ensemble learning, genetics based, swarm-intelligence based, etc. The key characteristics, advantages and disadvantages of various machine learning algorithms have also been discussed. It would be inappropriate to say that one type of machine learning algorithm will work best on all types of dataset. A classifier’s accuracy is not the sole important factor. Many factors affect the selection of an appropriate classification model, such as: Is our data composed of categorical values only, numeric values only, or both? What is the size of the dataset? Do we need to retrain the classifier often? Do we need a quick and fast deployment model? Do we have labeled or unlabeled data? What is the complexity of the data? Hence, the key characteristics of machine learning algorithms must be known before selecting a set of classifiers for performance analysis.
The importance of feature selection and the types of feature selection methods that can be used with machine learning algorithms have also been presented. Feature selection helps in identifying important, non-redundant and relevant attributes that contribute to the accuracy of the predictive model. Also, considering fewer, important features speeds up the classifiers.

V. SUMMARY OF IDS BASED ON SINGLE/MULTIPLE CLASSIFIER BASED MACHINE LEARNING APPROACHES
Machine learning techniques have been used in different ways for detecting intrusions using publicly available datasets such as KDD’99 [12] and DARPA 1998 [53]. These datasets have 41 features as shown in Table II in Section III. Recent attack datasets such as ISCX 2012 [78] and UNSW-NB15 [13] have been used in some of the approaches for validation. The summary of various IDS proposals is shown in Table IV and their performance results are shown in Table V. The acronyms for evaluation metrics are shown in Table VI.
Initially, single classifier techniques were used as standalone entities to classify intrusions, but they lack performance in correctly distinguishing intrusions from normal data instances. Later, feature selection methods were proposed to improve the detection rate and to reduce the computational cost. However, there was no significant improvement in classification rate. Afterwards, some researchers combined single classifiers to improve the detection rate of intrusions by using the 41 features of the dataset. There are some limitations with this technique, such as a low detection rate and high computational cost, especially for low-frequency attacks (those with few instances in the dataset) such as U2R and R2L. In some approaches, feature selection has been used with multiple classifiers, which greatly improved the detection rate for all types of attacks. However, there was no significant improvement in computational cost due to the multiple processing of classifier modules. By the gradual evolution of machine
MISHRA et al.: DETAILED INVESTIGATION AND ANALYSIS OF USING MACHINE LEARNING TECHNIQUES FOR INTRUSION DETECTION 703
TABLE IV
SUMMARY OF EXISTING IDS APPROACHES BASED ON MACHINE LEARNING
(Only the reference column of this table is available: [120], [121], [123], [124], [122], [125], [126], [127], [128], [129], [130], [131], [132], [61], [133], [134], [135], [136], [26], [137], [138], [139], [140], [141], [142], [143], [144], [145], [146], [147], [148], [149], [150], [151], [152], [153], [154], [155].)
learning techniques in intrusion detection, we have classified them into four categories: (i) Single classifiers with all features, (ii) Single classifiers with limited features, (iii) Multiple classifiers with all features and (iv) Multiple classifiers with limited features. These techniques are described in detail below.
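As a rough illustration of the filter methods described in the feature selection discussion above, the sketch below ranks discrete features by information gain with respect to the class label, without consulting any classifier. The toy records, the feature names and the helper names (`entropy`, `information_gain`, `rank_features`) are assumptions made for this example and are not taken from any of the surveyed approaches.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(column, labels):
    """Label entropy minus the expected label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels, names):
    """Filter-style ranking: each feature is scored independently of any classifier."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records: (protocol, flag, failed-login bucket).
names = ["protocol", "flag", "failed_logins"]
rows = [
    ("tcp", "SF", "0"), ("tcp", "SF", "0"), ("udp", "SF", "0"),
    ("tcp", "S0", "0"), ("tcp", "S0", "0"), ("icmp", "S0", "0"),
]
labels = ["normal", "normal", "normal", "dos", "dos", "dos"]

ranking = rank_features(rows, labels, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A wrapper method would instead retrain and evaluate a classifier for each candidate subset, which is why it tends to be slower and classifier dependent.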
TABLE V
PERFORMANCE RESULTS OF EXISTING IDS APPROACHES
(Only the reference column of this table is available: [120], [121], [123], [124], [122], [125], [147], CT SVM [126], TCM-KNN [127], [128], Bayesian clustering and DT C4.5 [129], [130], [131], [132], [134], [135], [136], [26], [137], [138], [156], [140], [28], [141], [142], [143], [144], [157], [158], [104].)
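For reference, the measures quoted throughout this section (detection rate, false positive rate, accuracy and so on) can be computed from confusion-matrix counts. The sketch below assumes the commonly used definitions, with the detection rate taken as the true positive rate; the function name `ids_metrics` and the counts in the example are hypothetical and are not results from any surveyed paper.

```python
def ids_metrics(tp, fn, fp, tn):
    """Compute common IDS evaluation measures from confusion-matrix counts.

    tp: attacks flagged as attacks      fn: attacks missed
    fp: normal traffic flagged          tn: normal traffic passed
    """
    def ratio(numerator, denominator):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return numerator / denominator if denominator else 0.0

    return {
        "detection_rate": ratio(tp, tp + fn),        # also called recall or TPR
        "false_positive_rate": ratio(fp, fp + tn),   # false alarm rate
        "true_negative_rate": ratio(tn, fp + tn),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
    }


# Hypothetical counts, for illustration only.
metrics = ids_metrics(tp=90, fn=10, fp=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A high overall accuracy can hide a poor detection rate for a rare class, which is one reason per-attack detection rates are reported separately for U2R and R2L.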
A. Single Classifier With All Features
Incorporating single classifiers in IDS was the first step towards intrusion detection using machine learning techniques. Kim and Park [120] proposed a misuse detection approach which applies Support Vector Machine (SVM) for Network Intrusion Detection using the KDD’99 dataset. In the initial step, it collects the training and test datasets from KDD’99. The datasets are pre-processed for use by the SVM classifier. The SVM is trained over the training dataset and, as a result, a decision model is generated. This decision model corresponds to hyperplanes
TABLE VI
GLOSSARY OF ACRONYMS USED IN PERFORMANCE MEASURES
in feature space with some support vectors and weight vector values. In the learning process, various C values (1, 500 and 1000) and kernel functions (linear, 2-poly and Radial Basis Function (RBF)) are considered. The system tunes the values of C and the kernel function to validate which kernel function is effective and efficient. After learning, a validation process is carried out in which test instances are passed to the learned classifier to check the validity of the classification. The performance is compared in terms of detection rate and misclassification rate. The classifier does not produce good results for detecting scanning attacks and low-frequency attacks. The approach achieves a detection rate of 91.6% for DoS, 36.65% for Probe, 12% for U2R and 22% for R2L attacks. The researchers have not reported results for false alarms.
Amor et al. [121] performed intrusion detection using two different misuse detection approaches, namely Naive Bayes and a Decision Tree classifier, separately and compared their performance. The KDD’99 dataset is used for training and testing. The Decision Tree algorithm builds the tree based on the dataset values. Each non-leaf node corresponds to a test attribute, whereas each branch represents an outcome of the test. An appropriate test attribute is chosen based on entropy and information gain. A leaf node represents the final class of the object. The decision tree produces rules by traversing from root to leaf. In Naive Bayes, conditional probabilities are calculated for each attribute of a test instance corresponding to each class label. The product of these probabilities helps in determining the final class. After the learning phase, the test instances are passed to the classifier to check the correctness of classification in terms of detection rate. Both perform very poorly in detecting the low-frequency attacks (U2R and R2L). NB achieves a detection rate of 96.65% for DoS and 88.33% for Probe, whereas DT achieves a detection rate of 97.24% for DoS and 77.92% for Probe. The detection rate for the low-frequency attacks is low (0.53%-11.84%) for both approaches, as shown in Table V.
Bouzida and Cuppens [122] performed intrusion detection using a Back Propagation Neural Network (BPL NN) classifier and a Decision Tree separately for misuse detection and compared their performance. They performed experiments on the KDD’99 dataset for training and testing. Before the training of the Neural Network, the number of neurons in the input layer is set to the number of input variables. The number of neurons in the output layer is equal to the total number of classes. They consider only one hidden layer. The Neural Network performs well for detecting DoS and Probe attacks, but it fails to detect the low-frequency attacks since the number of records for these attacks is very small in comparison to the other attacks (DoS and Probe). To improve the algorithm, an enhanced Decision Tree C4.5 is proposed. In the enhanced algorithm, the default condition of the original algorithm is treated as a new class, whereas earlier the default was treated as the normal class. Thus any new instance which does not match a rule is treated as suspicious. The modified C4.5 algorithm provides improvement for low-frequency attacks too. DT provides a detection rate of 99.99% for DoS, 99.78% for Probe, 90.39% for U2R and 98.93% for R2L attacks.
Tajbakhsh et al. [131] proposed a misuse detection approach based on fuzzy association rules using the KDD’99 dataset. In the training phase, a membership function is used to perform the feature-to-item transformation. Each attribute-value pair is called an item. The fuzzy membership function is based on the Fuzzy C-Means (FCM) clustering algorithm. In the next phase of training, the produced items are reduced. KDD’99 contains 189 items; rule generation over 189 items is not possible. The rules are generated based on the minimum support and confidence values. The fuzzy association rules are used to build the classifier. Based on the rule sets, an instance is assigned a label. In the testing phase, each feature of a test instance is transformed to an item using the membership function. These transformed records are then passed through the learned classifier, which classifies the instance. The training data have been sampled into five sets (normal, DoS, Probe, U2R and R2L). Rules are produced for each class. The total execution time of this classifier is 500s. The technique does not perform well for any of the attacks. It provides 78.9% DR for DoS, 88.5% DR for Probe, 68.6% DR for U2R and 6.2% DR for R2L attacks. The overall detection rate is 70%-90% with 2% false positives.
Kumar and Yadav [140] proposed the simplest model of a misuse detection system, which is based on a Neural Network.
In the first phase, the network dataset is selected and prepared for training and testing. Further, the selected data is pre-processed to make it compatible with the Neural Network by converting all symbolic values into numeric values, e.g., ICMP=1, TCP=2 and UDP=3. After this, a normalization step is performed in which Z-score values are calculated for each feature value. In the next phase, the Neural Network is trained over the transformed dataset. It consists of three layers (input, hidden and output) with 41, 29 and 5 neurons respectively. The learned classifier is tested over the testing dataset. The Neural Network is found to perform well for detecting intrusions except for low-frequency intrusions (U2R and R2L).
Amoli et al. [145] proposed an unsupervised clustering based anomaly detection approach to detect and classify DoS, DDoS and Probe attacks. The model is composed of two detection engines which monitor and inspect the behavior of the network in normal or encrypted communications. The first engine calculates a self-adaptive threshold value to detect the network traffic changes that are caused by attacks such as DoS, DDoS, scanning, worms, etc. The clustering is done in two steps. While the network traffic does not pass the threshold, the engine clusters the attack-free traffic according to the DBSCAN algorithm. The clustering algorithm calculates the acceptable distance of the network instances and puts the points into clusters. Once the traffic passes the threshold value, clusters are again created for outliers. The points that cross the acceptable distance are treated as outliers. The second engine aims to detect the botmaster. The first engine sends the IP addresses with attack details to the second engine, which then correlates the packets to find the main system controlling the DoS. They have considered the ISCX dataset to validate the approach. It achieves 98.39% accuracy, 100% recall, 98.12% precision, 96.39% TNR and 3.61% FPR and outperforms K-means outlier detection.
Bhamare et al. [152] presented the use of machine learning for detecting attacks in the cyber network. They executed various machine learning algorithms using two new network datasets other than KDD’99, i.e., the UNSW-NB15 and ISOT datasets. These are dynamically generated new datasets which provide real attack statistics. They evaluated various misuse detection algorithms such as DT, NB, LR and SVM with three different kernels (RBF, polynomial and linear). DT provides an accuracy of 88.67%, NB provides 73.8% accuracy, SVM with RBF kernel provides 70.15% accuracy, SVM with polynomial kernel provides 68.06% accuracy, SVM with linear kernel provides 69.54% accuracy, and LR provides 89.26% accuracy. DT provides 6.9% FPR, SVM with RBF kernel provides 4.1% FPR, SVM with polynomial kernel provides 53.3% FPR, SVM with linear kernel provides 50.7% FPR, NB provides 7.3% FPR and LR provides 4.3% FPR. Among all of these, logistic regression provides the better results with a low FPR. However, the results are not very good using such simple methods of ML.
Jazi et al. [157] presented a novel approach for detecting application-layer DDoS attacks which uses a non-parametric CUSUM algorithm. The authors investigated thirteen sampling techniques to perform filtering on data. Out of them, they observed that selective flow sampling and sketch-guided sampling perform better and are the most effective sampling techniques. Selective sampling provides the best performance, with 100% accuracy and no false positive alerts at sampling rates above 20%. A generic sketch-guided sampling also provides good results for detecting application-layer attacks. Sketch-guided sampling provides a 92% detection rate at a 40% sampling rate. The authors generated various DoS attack traces using different tools and intermixed the attack traffic with the attack-free traffic from the ISCX dataset.
Wang et al. [158] proposed an intrusion detection framework which uses a support vector machine (SVM) integrated with a data transformation method. Feature dependencies are incorporated in the algorithm. The logarithm marginal density ratios transformation (LMDRT) method is used to perform the data transformation. The transformed (augmented) features are of better quality and are supplied to the SVM. The LMDRT method is motivated by Naive Bayes theory for classification, as it considers the marginal density ratio. The proposed framework has been evaluated using the NSL-KDD 99 dataset. It achieves 99.31% accuracy, 99.20% detection rate and 0.60% false alarm rate.
Various intrusion detection techniques based on a single classifier have been discussed. Single classifiers are simple and easy to understand. However, the limitations of single classifier algorithms, such as sensitivity to the choice of input parameters, the choice of the kernel function, the number of training variables, overfitting, etc., reduce the chances of getting good evaluation results. Secondly, if the dataset has too many attributes, it becomes difficult for the classifier to provide timely results. Single classifier algorithms, when combined with feature selection algorithms, reduce the computational cost, as discussed in the next section.

B. Single Classifier With Limited Features
Feature selection techniques are used with single classifier approaches to improve their performance. Sangkatsanee et al. [61] proposed a real-time Intrusion Detection System (RT-IDS) based on decision tree C4.5 to detect two different types of network intrusions, Denial of Service (DoS) and Probe. The approach is applied in the context of misuse detection. The framework consists of three phases: data preprocessing, classification and post-processing. In the preprocessing phase, a packet sniffer is used which uses the Jpcap library and network information to extract the IP header, TCP header, UDP header, ICMP header, etc. The packet information is considered between the source and destination IP of each packet. The authors used information gain to extract the 12 features shown in Table VII. Next is the classification phase, in which the classifier is trained over the training dataset of known labels. In the testing phase, the learned model is used to classify the test instance as normal or intrusive based on the learned behavior. Post-processing is used to reduce false alarms. In this phase, the network data between source and destination is divided into groups of five records. In each group, if 3-5 records are reported as the same attack, that group is considered as
TABLE VII
LIST OF IDS TECHNIQUES WITH FEATURE SET
(Only the reference column of this table is available: [120], [121], [123], [124], [122], [126], [125], [147], [127], [128], [129], [130], [131], [132], [61], [133], [134], [135], [136], [26], [137], [138], [156], [140], [28], [141], [142], [143], [144], [159], [160].)
an attack type. The technique does not detect low-frequency attacks. The detection rate for other attacks is higher than 98% with a computation time of only two seconds. It detects DoS attacks with 99.434% DR and 0.73% false alarms, and Probe attacks with 98.868% DR and 0.9% false alarms.
Amiri et al. [133] proposed a Modified Mutual Information based feature selection approach (MMIFS) and used it with Support Vector Machine (SVM) to detect different types of attacks, mainly low-frequency attacks. They considered the training and testing datasets of KDD’99. In the initial phase, data normalization and reduction is carried out by dividing every attribute value by its own maximum value. In the next phase, feature selection is carried out over the imported training data. MMIFS initially sets the feature set to empty. It computes the mutual information of the features with the class output and selects the first feature that has the maximum mutual information (MI) value. In the next step, the MI is calculated between features, and those features are selected which meet particular criteria that are explained by the authors in their work. This step is repeated until the desired number of features is selected. The final set is provided as output to the user. The final set contains 8 features for DoS, 12 features for Probe, 14 features for U2R and 12 features for R2L. The features are shown in Table VII. In the training phase, five SVMs are trained over five different datasets (Normal, DoS, Probe, U2R and R2L). In the testing phase, the attack samples are supplied to each of the trained classifiers, which classify them into one of the five classes. The method does not achieve a detection rate greater than 90% for any of the attacks. Moreover, the detection rate for U2R attacks is the lowest (30.70%).
Lin et al. [144] proposed a new distance based feature extraction approach (CANN) applied with an anomaly detection approach based on a k-NN algorithm to detect intrusions. In the first phase, a clustering algorithm is used to form clusters of the training data, and then two distances are used to determine the new feature value: the first is between a specific data point and its cluster center, and the second is between a specific data point and its nearest neighbor. A new one-dimensional distance based feature value now represents each data point in the training data. In the next phase, principal component analysis (PCA) is used to select the relevant features. Only 6 features are selected, as shown in Table VII. The next phase is the classification phase, in which a k-NN classifier is trained on the new training dataset. In the testing phase, the CANN process
is again carried out for the test instances, and the testing data is also represented in the one-dimensional feature space. The k-NN classifier performs the classification over the test data instances. The classifier is not able to detect low-frequency attacks. The overall detection rate is 99.99%, accuracy 99.76% and false alarms 0.003%. It achieves accuracy of ∼99% for both DoS and Probe attack detection.
Koc et al. [143] proposed the Hidden Naive Bayes classifier, which is an extension of the Naive Bayes classifier, for misuse detection. In the first phase of the proposed framework, the attribute values are converted to discrete values using entropy minimization discretization and proportional k-interval discretization. In the next phase, feature selection is carried out using three methods: correlation based (CFS), consistency based (CONS) and INTERACT methods. The authors tested combinations of the various discretization methods and feature selection methods with the classifier to derive the best method with high detection accuracy. The next phase is the classification phase, in which the Hidden Naive Bayes classifier is used to learn the behavior of attack samples from the training data. The Naive Bayes (NB) classifier is based on the assumption that the attributes are independent of each other. Hidden Naive Bayes (HNB) relaxes this assumption and extends the posterior probability calculation formula so that it also considers the mutual information of attributes during probability calculation. In the testing phase, a test instance is classified based on the learned behavior. In intrusion detection, the attribute values are very much dependent on each other. For example, when checking the number of failed logins over a period, the content feature (num of failed login) value and the basic feature (duration) value both affect the output and also depend on each other’s values in determining the output. Hence, HNB improves on NB performance for detecting intrusions. The authors reported a DoS detection rate of 99.60% and an overall detection rate of 93.72% using the KDD’99 dataset.
Gharaee and Hosseinvand [153] proposed a new feature selection based intrusion detection model (GF-SVM) to detect intrusions in the network. A feature selection approach is proposed in which a Genetic Algorithm (GA) and SVM are integrated to provide an optimal set of features. The authors made slight modifications to the fitness function of the GA. Instead of using the accuracy and number of features (NumF) as parameters for the fitness function, they used three parameters: TPR, FPR and NumF. Each parameter is multiplied by a certain weight based on the user’s choice. In each iteration of the GA, each chromosome is evaluated and the chromosomes with the highest classification accuracy (using SVM) are selected. The optimal features are used to filter the dataset, and Least Squared Support Vector Machine (LSSVM) is used to learn/detect the train/test dataset with the selected features. They considered 7 features for normal traffic and 6-14 features for different types of attacks. The results using the UNSW-NB15 dataset are as follows: it achieves an accuracy of 97.45% with 98.47% TPR and 0.04% FPR for detecting normal traffic, and an accuracy of 79.19%-99.45% with TPR 67.31%-100% and FPR 0.01%-0.09% for detecting various types of attacks.
Bamakan et al. [161] proposed an intrusion detection framework which integrates time varying chaos particle swarm optimization (TVCPSO) with multiple criteria linear programming (MCLP) and SVM individually for parameter tuning and feature selection. In order to increase the speed of PSO in searching for the optimum and to avoid local optima, the authors introduced the chaotic concept into PSO. The framework has been implemented using NSL-KDD’99. The overall detection rate provided by TVCPSO-MCLP is 97.23%, with a false alarm rate of 2.41% and accuracy of 96.88% with feature selection. TVCPSO-SVM provides 97.84% accuracy, 97.03% detection rate and 0.87% false alarm rate with feature selection. Out of the two algorithms, TVCPSO-SVM provides better results for DoS, Normal and Probe, with detection rates of 98.84%, 99.13% and 89.29% respectively, whereas TVCPSO-MCLP provides better detection rates for R2L and U2R, with detection rates of 75.08% and 59.62% respectively.
Akashdeep et al. [159] provided an intrusion detection approach which uses ANN and combines it with a proposed feature reduction method. All the features of the original dataset are first ranked according to the Information Gain method and the correlation method individually. Three feature subsets are built using each method, i.e., IG1, IG2, IG3 and CR1, CR2, CR3. The first subset under each category contains the features ranked 1-10, the second subset contains the features ranked 11-30, and the third subset contains the remaining features. The first and second subset of each category are combined using the union operation, and the second subset of each category are combined using the intersection. The remaining subsets are ignored. A final subset of features (25 in total) is obtained by a union operation over the selected subsets, as shown in Figure 9. The reduced KDD 99 dataset with 25 features is used to train the ANN classifier. It provides 86.6% detection rate (DR) for U2R, 93.8% DR for DoS, 91.9% DR for R2L and 89.8% DR for Probe attacks.
Ambusaidi et al. [160] proposed a flexible mutual information (MI) based feature selection (FMIFS) algorithm which can handle linear and nonlinear features efficiently. FMIFS has been used to select features so as to remove the most irrelevant features. The filtered dataset is then evaluated by a machine-learning based network intrusion detection technique, namely Least Square Support Vector Machine based IDS (LSSVM-IDS). The performance is measured over the KDD’99 dataset. It provides 99.46% DR, 0.13% FPR and 99.79% accuracy. When evaluated with NSL-KDD’99, it provides 98.76% DR, 0.28% FPR and 99.91% accuracy. It outperforms other methods like MIFS and Flexible Linear Correlation Coefficient Based Feature Selection (FLCFS).
Various intrusion detection techniques based on a single classifier with a feature selection algorithm have been discussed. Applying feature selection improves the performance of classification. However, one needs to take care over which combination of feature selection and machine learning algorithm provides the best results. It also makes the classifier faster, since it operates over a selected set of features. However, there is little or only moderate improvement in the classification results. The drawbacks associated with one classifier algorithm can be overcome by combining it with other classifier algorithm(s) to improve the classification result, as discussed in detail in the next section.
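To make the distance based feature described above for CANN [144] more concrete, the following is a minimal sketch of collapsing each data point into a single value built from two distances. It assumes that the cluster centers are already available (for example from a prior clustering step), that a point's cluster center is the closest center, and that the two distances are simply summed; these choices, the helper names and the toy data are assumptions for illustration and may differ from the authors' exact formulation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_feature(point, centers, others):
    """One-dimensional feature: distance to the closest cluster center
    plus distance to the nearest other data point (assumed combination)."""
    to_center = min(euclidean(point, c) for c in centers)
    to_neighbor = min(euclidean(point, o) for o in others)
    return to_center + to_neighbor


def transform(points, centers):
    """Replace every multi-dimensional point by its single distance-based value."""
    features = []
    for i, point in enumerate(points):
        others = points[:i] + points[i + 1:]  # exclude the point itself
        features.append(distance_feature(point, centers, others))
    return features


# Hypothetical two-dimensional records and two precomputed cluster centers.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 3.0)]
centers = [(0.0, 0.5), (4.0, 1.5)]

features = transform(points, centers)
print(features)
```

The resulting one-dimensional values could then be passed to a k-NN classifier in place of the original attributes, as in the classification phase described for CANN.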
are trained to observe the intrusive activity. Each classifier is providing the detection of low-frequency attacks and achieves
trained over different samples of training data set. Each train- 100% detection rate for Probe attack.
ing sample represents exactly one class (Normal, DoS, Probe, Wang et al. [132] proposed an integrated hybrid intrusion
U2R or R2L). Each classifier acts as a signature based clas- detection approach which consists of Neural Network and
sifier. In the next phase, output of ANFIS module is supplied Fuzzy Clustering algorithm named as FC-ANN for detecting
to the fuzzy inference module which maps the output from intrusions. In the first stage, the data set is divided into training
the Neuro-Fuzzy classifiers to final output space and makes the and testing phases. Different training subsets are constructed
final decision for normal or intrusive nature. Output is the clas- using fuzzy clustering approach. In the second stage, for each
sifier label of the first layer (neuro-fuzzy output layer) in which training subset an Artificial Neural Network is trained and
the output is the close value to the interference module. Author different ANN classifiers are formulated. The results are com-
has also applied Genetic Algorithm (GA) to optimize the struc- bined and a new ANN is trained over the results to reduce
ture of fuzzy decision-making system. Author has specified the the errors. Fuzzy Clustering is performed by using a Fuzzy
overall detection rate of the classifier as 95.3% with a false positive rate of 1.9% over the KDD’99 dataset. It has adaptive learning capability. However, it does not work well for low-frequency attacks.
Khan et al. [126] proposed a hybrid detection approach, named CT SVM, that integrates the Support Vector Machine (SVM) with hierarchical clustering, evaluated on DARPA 1998. Hierarchical clustering is used to reduce the training time of the SVM and improve its attack detection efficiency. SVM uses a hypothesis space of linear functions in a higher-dimensional feature space. The trained model corresponds to the hyperplanes. The data points closest to the hyperplane are called support vectors. In the proposed model, hierarchical clustering is performed in the first phase over the training data set using the DGSOT algorithm. In each iteration, new nodes are added to the tree based on the learning process. After each iteration, an SVM is trained over the nodes of the tree to reduce the computational overhead. In the next iteration, the generated support vectors are passed to the clustering algorithm to control the tree growth. In this way, only support vector nodes grow. The process continues until a stopping criterion is met. The stopping criterion could be based on tree size, tree level or accuracy level. The training time of a single SVM is 17.34 hr; integrating it with the clustering algorithm reduces this to 13.18 hr. The system does not work well for detecting low-frequency attacks.
Tong et al. [130] presented an Intrusion Detection System which integrates the Radial Basis Function Neural Network (RBF NN) with the Elman Network using DARPA 1998. A Neural Network fails to remember past events. The Elman Network overcomes this weakness and provides this capability. It helps in allowing the occasional misuse behavior and detecting temporally co-located intrusions and collaborative events. The Elman Network has a set of context nodes. Each node takes its input from a hidden node and forwards its output to each hidden node of its hidden layer. Therefore, in the hybrid network, both input and hidden nodes activate the activation node, and the hidden nodes feed forward to activate the output nodes. The memory functionality introduced by the context nodes helps in remembering past events and correlating the sequence of events. The value of the context nodes increases in each iteration and slowly decreases back to zero based on the threshold. The hybrid model is easily adaptable to new intrusions and requires less retraining time. This technique is very helpful in detecting temporally dispersed and collaborative attacks. The technique is not
c-means algorithm which clusters the data points based on the membership grade. The cluster centers and membership grades are updated in each iteration to produce the optimized results. ANN is used as a classifier to classify the traffic based on the learned behavior of attacks. The sigmoid activation function takes as input the product of weight and input value and passes the result to other neurons. The output value is compared with the target and the error is calculated. The backpropagation algorithm is used to update the system for new values. The technique produces an overall detection rate of 91.32% using KDD’99. The technique improves the detection rate of ANN for low-frequency attacks.
Feng et al. [142] proposed an intrusion detection system (CSVAC) that takes advantage of both misuse and anomaly detection algorithms, particularly Support Vector Machine (SVM) and Clustering based Self-Organized Ant Colony Networks (CSOACN). SVM can learn from a small volume of data. CSOACN provides the power of adaptability. In a real-time Intrusion Detection System, it is essential that whenever new data points are added, the old model is updated immediately. This saves significant retraining time. In the first phase, the training data is normalized to remove the bias of some features over others. The next phase is the training phase, in which the SVM is trained repeatedly over several training data subsets. In the first iteration, initial hyperplanes are generated randomly. After that, in each iteration, the CSOACN clustering algorithm selects the points around the generated support vectors and forms clusters. These data points are used for training the SVM in the next iteration. Here, CSOACN not only learns the data points and identifies the outliers but also performs the data selection for training the SVM. At the end of the training phase, we have two learned classifiers: SVM and CSOACN. In the testing phase, the test instances are passed through the classifiers. An instance is flagged as anomalous only if both classifiers classify it as anomalous. CSOACN is used to determine the subclass of the attack. If the results differ from each other, the data item is labeled as “amphibolous”. It can be further used for analyzing the behavior of normal or intrusive data. The classifier requires 3.388 s of training time, but it does not perform well for Probe and U2R attack detection.
Elhag et al. [148] combined the genetic fuzzy system (GFS) with the pairwise learning (one-vs-one: OVO) architecture. The use of fuzzy sets creates a smoother borderline between rule sets, and pairwise learning improves the precision of the rare attack patterns. The combined misuse
detection approach provides high performance and is compared with a decision tree by the authors. In GFS, fuzzy association rules (FARC-HD) form the base classifier. Internally, FARC-HD applies a genetic algorithm to optimize its membership values and obtain compact rules. The OVO method converts the multiclass classification problem into binary sub-problems by forming all possible pairs of classes. A binary classifier is then trained for each pair on the subsample of data belonging to its two classes, ignoring all other samples. Each of the trained models processes an instance, and the predictions of all classifiers are combined to obtain the output. They have used preference relations solved by a non-dominance criterion to combine the results. They have selected 5 labels per variable for fuzzy sets. The minimum support considered is 0.05 and the minimum confidence is 0.8. The approach achieves 99% overall accuracy with a 97.77% attack detection rate and 0.191% false alarms. The accuracy is 98.05% for DoS attacks, 95.83% for Probe attacks, 87.54% for R2L and 65.38% for U2R. The approach achieves high accuracy for R2L and U2R and outperforms the layered approach discussed above.
Yassin et al. [149] proposed a hybrid detection approach which integrates K-Means clustering and the Naive Bayes classifier. The combination of anomaly detection and misuse detection is used to detect attacks which can bypass a system that has only one type of detection mechanism. In the first phase, K-means clustering is used as a pre-classification module to form the clusters. Each cluster represents a group of similar data. The entire data is labeled with its cluster among the K clusters. Afterwards, the Naive Bayes algorithm is used as a classification module to classify the data instances of the labeled clusters into attack and normal. The approach is implemented using the ISCX 2012 attack dataset. It achieves an accuracy of 99.8% with a 95.4% detection rate and 0.13% false alarms. The integration improves on Naive Bayes alone, which provides an 82.8% detection rate and 17.6% false alarms.
Various intrusion detection approaches based on multiclassifier algorithms have been discussed. The classifiers are trained to learn all features of the training dataset. There is an improvement in the accuracy of the system. However, the computational cost and complexity of the system are high. The detection rate is improved by multiple classifier algorithms, especially for low-frequency attacks such as U2R and R2L. Combining several classifiers may not always work better. The different possible combinations of multiclassifiers need to be cross-validated. The performance and speed can be improved by integrating the multiple classifier techniques with a suitable feature selection approach, discussed in the next section.

D. Multiple Classifier With Limited Features
Feature selection techniques are further used with Hybrid Classifiers for further improvement, especially for low-frequency attack detection. Chen et al. [128] proposed a misuse detection approach based on the flexible neural tree (FNT) technique using DARPA 1998. The parameters of FNT are optimized using the particle swarm optimization technique. A Genetic Algorithm is used to select the features for the algorithm. 12 features for DoS, 12 features for Probe, 8 features for U2R and 10 features for R2L are selected from the 41-variable input data set. The flexible neural tree works like a neural network except that it is flexible to expand. The tree expands based on the fitness function. The Particle Swarm Optimization (PSO) technique is used to optimize the parameters during this process. PSO conducts a search using a population of particles, each of which corresponds to an individual in an evolutionary algorithm. The algorithm works well for all categories of attack detection.
Xiang et al. [129] proposed a hybrid detection algorithm based on Decision Tree C4.5 and the Bayesian (AutoClass) clustering algorithm using the KDD’99 dataset. The technique operates in four stages. In the first stage, Decision Tree C4.5 is used to classify the training data set into three categories (DoS, Probe and others). The Decision Tree fails to separate the U2R and R2L attacks. In the second stage, Bayesian Clustering is used to separate the normal connections from U2R and R2L connections. Clustering algorithms perform better than supervised algorithms in detecting low-frequency attacks. In this stage, four features are used for clustering: duration, service, src bytes and dst bytes. In this stage, 178 clusters are formed and 31 are declared as attacks. In the third stage, Decision Tree C4.5 is again used to separate the U2R from R2L attacks. This is easier since normal connections are filtered out in the second stage. In this stage, only 41 features are used as shown in Table II (Section III) and Table VII (Specific features). The last stage further specifies individual U2R and R2L attacks based on the given training data. This classification is effective for known attacks, and the results depend on the availability of sufficient labeled data. This technique performs poorly for R2L attack detection.
Lin et al. [136] proposed an algorithm that uses multiple machine learning algorithms to detect intrusions, namely Support Vector Machine (SVM), Decision Tree (DT) and Simulated Annealing (SA), in the context of misuse detection. It takes advantage of all three: SVM performs well for classifying the intrusions, DT can produce rules and SA converges to global optima. In the first phase, the KDD’99 dataset is prepared for training and testing purposes. In the next stage, SVM and SA are combined to select the best features. SVM maps the training data into a high-dimensional feature space. SVM is trained and tested with the different possible feature sets, and finally the feature set with maximum accuracy is selected. During this process, SA is used to optimize two of the parameters (C and λ) used by SVM with the Gaussian radial basis function kernel. Here C is the parameter for the soft margin cost function, which controls the influence of each support vector. λ is the free parameter of the Gaussian radial basis function. In the next phase, DT is used to produce the rules for the classification using the selected features of the previous phase. Information gain and entropy are two important measures used by this algorithm while building the decision tree. Here, SA is again used to optimize the two parameters of DT: pruning confidence factor (CF) and minimum cases (M). This technique performs well for detecting all types of attacks. For DoS in particular, the detection rate is 100%.
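The role of simulated annealing in tuning the two SVM parameters (C and λ), as described for Lin et al. [136], can be illustrated with a small sketch. This is not the authors' implementation: the function names, the cooling schedule, the search in log10 space and the objective `toy_validation_error` are all assumptions made for illustration. In a real setting the objective would be replaced by the validation error of an SVM trained with the candidate parameters.

```python
import math
import random


def simulated_annealing(objective, start, steps=2000, t0=1.0,
                        cooling=0.995, step_size=0.5, seed=0):
    """Minimize objective(log_c, log_lam) by simulated annealing.

    The search runs in log10 space so that C and lambda can vary over
    several orders of magnitude (an assumption of this sketch).
    """
    rng = random.Random(seed)
    current = tuple(start)
    current_cost = objective(*current)
    best, best_cost = current, current_cost
    temperature = t0
    for _ in range(steps):
        # Propose a neighbour by perturbing both parameters with Gaussian noise.
        candidate = tuple(x + rng.gauss(0.0, step_size) for x in current)
        cost = objective(*candidate)
        delta = cost - current_cost
        # Always accept improvements; accept a worse move with
        # probability exp(-delta / T), which shrinks as T cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= cooling  # geometric cooling schedule
    return best, best_cost


def toy_validation_error(log_c, log_lam):
    """Hypothetical stand-in for SVM validation error.

    It has a single minimum at C = 10 and lambda = 0.01; a real
    objective would train and evaluate an SVM here instead.
    """
    return (log_c - 1.0) ** 2 + (log_lam + 2.0) ** 2


best_params, best_error = simulated_annealing(toy_validation_error, start=(0.0, 0.0))
best_c = 10 ** best_params[0]
best_lam = 10 ** best_params[1]
print(f"C ~ {best_c:.3g}, lambda ~ {best_lam:.3g}, error ~ {best_error:.4f}")
```

Early on, the high temperature lets the search accept worse parameter pairs and escape poor regions; as the temperature decays, it behaves like a local hill climber around the best region found. The same loop could be reused for the two decision tree parameters (CF and M) by swapping the objective.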
Casas et al. [26] proposed the Unsupervised Network Intrusion Detection System (UNIDS), which uses three algorithms, Sub-Space Clustering, DBSCAN and Evidence Accumulation for Ranking Outliers (EA4RO), for anomaly detection. Sub-Space Clustering is used to project X (a feature vector with n dimensions) into N k-dimensional subspaces (Xi). Each subspace Xi is then partitioned using the DBSCAN clustering algorithm. The DBSCAN algorithm generates clusters of various densities. Anomalies are present in low-density areas. It outputs a set of clusters together with a set of outliers. Clustering in a low-dimensional space is much faster and more efficient. Outliers represent the different IP flows. The EA4RO algorithm is used to rank these flows based on their degree of abnormality (dissimilarity). The degree of dissimilarity is measured as the distance from the outlier to the centroid of the biggest cluster. Here, the Mahalanobis distance is used to measure the dissimilarity. The flows are ranked according to their dissimilarity measure. IP flows whose dissimilarity is greater than a predefined threshold are marked as anomalous. The algorithm works well in detecting all attacks. For Probe attacks in particular, accuracy is 100% when tested with the KDD’99 dataset. It uses 9 features as shown in Table VII.
Li et al. [138] proposed a hybrid approach based on Support Vector Machine (SVM), the ant colony algorithm and a clustering algorithm. There are three phases of classification. In the first phase, the training data set (KDD’99) is pre-processed by deleting the repeated data in the database. Data compaction is further achieved by using K-means clustering, which groups data items into different clusters based on their similarity measure. Only the intersection of the original data and the clusters remains. This reduces the size of the data and makes classification more efficient. SVM takes a very long time to process a huge database. In the next phase, training subsets are selected using the ant colony algorithm. The effectiveness of each subset is evaluated using the SVM classifier. In the third phase, feature selection is performed using the proposed Gradual Feature Removal (GFR) method. In the GFR method, features are removed from the feature set one at a time and the accuracy of the classifier is examined for the resulting feature set. The process is carried out for each feature in the feature set and the influence of each feature is noted. The most balanced feature set is selected based on the accuracy of the classifier. In the next phase, SVM learns the behavior of the attack data supplied to it based on the selected feature set. The proposed technique does not perform well for detecting U2R attacks. However, the detection rate is greater than 90% for other attacks.
Kuang et al. [141] proposed a misuse detection system based on Kernel Principal Component Analysis (KPCA), Support Vector Machine (SVM) and Genetic Algorithm (GA). SVM provides good generalization capability over small-sample training data. GA performs the feature selection for classification. In the first phase, data preprocessing is carried out over the training dataset (KDD’99) to transform all data values into numeric form and normalize all values. The training data is divided into five subclasses based on the category. In the next phase, KPCA transforms the high-dimensional feature space into a low-dimensional eigenspace. Then GA performs the feature selection by iteratively selecting the populations and applying selection, crossover, and mutation to find the optimal subset until the stopping criterion is reached. In each iteration, the effectiveness of the feature set is determined by the classifier’s accuracy. Further, different SVM classifiers are trained over the selected parameters over the five training data subsets. The overall detection rate is 92.268% with a 1.025% false positive rate. However, the technique does not work well for low-frequency attacks.
Chandrasekhar and Raghuveer [139] proposed a hybrid model which uses the power of clustering (K-means), Fuzzy Neural Network (neuro-fuzzy) and Support Vector Machine (SVM) to identify intrusions. Processing a huge chunk of data introduces errors and affects the efficiency of the classifier. Hence, in the initial phase, the proposed framework divides the training data set into small subsets based on the similarity of the data items by using the K-means clustering algorithm. This reduces the sparsity in the data and makes it more suitable for the classifier. In the next phase, five neuro-fuzzy classifiers are trained over the five training subsets. It is difficult to determine the number of neurons and hidden layers in a Neural Network. The problem is overcome by introducing fuzzy logic into the neural network. It can manage imprecise, partial and vague information. It uses the backpropagation algorithm to find the input membership function. Each Neuro-Fuzzy Network outputs the set of features with the membership value. In the next phase, SVM is trained using the selected features for each of the training samples and support vectors are generated. In the testing phase, SVM classifies the test instances based on the generated hyperplanes. The algorithm performs well for detecting all types of attacks.
Horng et al. [134] proposed a hybrid framework based on hierarchical clustering and Support Vector Machine (SVM) using the KDD’99 dataset. To improve the detection rate of SVM and to reduce its training time, clustering is integrated with SVM. In the first phase, data transformation and scaling are performed to convert non-continuous values into continuous form and normalize the values to the range [0, 1]. In the next phase, a Clustering Feature (CF) tree is constructed for each category of attack. A CF tree is a height-balanced tree with two parameters: branching factor (B) and radius threshold (T). It is a compact representation of the dataset. Insertion and deletion of a data point are the same as in a B+ tree. One entry in a leaf node represents one cluster. The hierarchical clustering organizes the training data in tree form, making the data more balanced. In the next phase, feature selection is carried out using the gradual feature removal method, in which each attribute is taken out once and the accuracy of the classifier is noted to examine the influence of the feature on the output. The output of this phase is 19, 17, 24 and 24 features for DoS, Probe, U2R and R2L respectively. In the next phase, SVM is trained over the selected features using the RBF (Radial Basis Function) kernel. In testing, a test instance is classified as per the learned behavior of the classifier. The system achieves 95.72% accuracy with a 0.7% false positive rate. The performance is good for DoS and Probe and poor for low-frequency attacks.
Gupta et al. [150] proposed a layered misuse detection approach for achieving high accuracy and high efficiency. The attack accuracy is achieved by using the Conditional Random
Fields (CRF) and high efficiency is achieved by using the layered approach. CRFs are probabilistic systems which are used to model the conditional distribution over a set of random variables. CRFs have some advantages over Markov models as they are undirected models used for sequence tagging, which makes them free from the observation bias and label bias. They have considered four attack groups, namely DoS, Probe, U2R and R2L. They have selected features separately for each class based upon the attack characteristics without using a standard feature selection algorithm. Four different training sets are created, one for each of the layers, each of which comprises one of the attack classes and the normal class. Each layer is trained for a specific attack category using the specific feature set for that attack class. In each iteration of the algorithm in the training phase, a CRF model is trained for each layer for a specific class. After training, there are four CRF models which are plugged in sequentially in such a way that connections labeled as normal are passed to the next layer; otherwise the connection is detected as the attack class (corresponding to the layer) and is blocked. The layered approach achieves a 98.60% detection rate for Probe with 0.91% false alarms, a 97.40% detection rate for DoS attacks with 0.07% false alarms, an 86.3% detection rate for U2R with 0.05 false alarms and a 29.60% detection rate for R2L with 0.350% false alarms.
Mamun et al. [151] proposed a deep packet inspection technique based on the use of Shannon’s entropy to identify the application flows that are part of encrypted traffic. The feature set is composed of the following features: the entropy of the entire payload, sliding window or n-gram length, the entropy of the encoded payload and Bi-Entropy, considered for both binary payload and encoded payload. A logarithmic function is used to transform all the metrics as it was found to improve the accuracy of the classifier. All the features are further pre-processed using a genetic algorithm. The genetic algorithm is integrated with the Least Square SVM (LSSVM), used as a training algorithm. In each iteration, the genetic algorithm is used to select the features based on the fitness function. The fitness function is calculated by weighting the true positive rate, the false positive rate and the total number of selected features. A total of 10 features are considered using GF-LSSVM. LSSVM is trained with the selected features to classify the traffic. It achieves a detection rate of 96.7% for encrypted traffic and 96.6% for unencrypted traffic, with an almost identical false alarm rate of about 0.03%, using the ISCX dataset.
Chowdhury et al. [154] presented the use of machine learning for network intrusion detection. They have applied the combination of simulated annealing (SA) [162] and Support Vector Machine (SVM) [20] to improve the detection accuracy and reduce the false alarms. Misuse detection algorithms have the power to classify the normal and abnormal classes, given the attack behavior. In the proposed misuse detection algorithm, first n features are selected using the SA algorithm from a set of K features. Then the dataset N with the n selected features is used to train the SVM. The trained model is used to classify future test instances. The experiments have been performed on the UNSW-NB dataset. From the dataset, 150,000 samples are selected randomly, comprising 75,000 normal and 75,000 anomaly samples. 70% of the total dataset is used for training and 30% is used for testing. The normal SVM provides an accuracy of 88.03% whereas the proposed scheme provides 98.76% accuracy with three randomly selected features using the SA approach. It provides 0.09% FPR and 1.35% FNR, which is reasonably low.
Moustafa and Slay [155] proposed a hybrid feature selection approach to reduce the irrelevant set of features and to integrate it with other machine learning algorithms for intrusion detection. The proposed NIDS architecture makes use of both anomaly and misuse detection approaches for intrusion detection. The NIDS first takes its input from the dataset (UNSW-NB15 or NSL KDD’99). It then calculates the center points for the attribute values. A center point, or mode, is the most frequent value of the attribute. The center points for all the features are used as input to the association rule mining algorithm (Apriori) to reduce its processing time. The association rule mining finds the correlation of two or more features/attributes and identifies the highly ranked features. The dataset is filtered based on the selected features and given as input to the detection engine. Here, three algorithms, namely Expectation-Maximization (EM) clustering, Naive Bayes (NB) and Logistic Regression (LR), have been used. The results using UNSW-NB15 are as follows: EM provides an accuracy of 77.2% with 13.1% FAR, LR provides an accuracy of 83.0% with 14.2% FAR and NB provides 79.5% accuracy with 23.5% FAR.
Aburomman and Reaz [104] proposed an ensemble ML based intrusion detection approach in which six k-NN and six SVM classifiers are trained over KDD’99. The results of all 12 trained models are combined using three different approaches. In the first, PSO generates weights which are used by the Weighted Majority Voting algorithm (WMA) to combine the results of the trained models. In the second, the behavior parameters of PSO are tuned using Local unimodal sampling (LUS) and the rest is the same as the first approach. In the third, WMA is used to fuse the results of the classifiers. In all three ways, the results are produced and compared. LUS-PSO-WMA (the second way of combining results) provides the best accuracy of the three. It provides 83.6878% accuracy for detecting normal traces, 96.8576% accuracy for Probe attacks, 98.8534% for DoS attacks, 99.8029% for U2R attacks and 84.7615% for R2L attacks, as shown in Table V.
Various intrusion detection techniques based on multiple classifier algorithms integrated with a suitable feature selection approach have been discussed. Applying feature selection definitely improves the speed of classification and, in some cases, it also improves the detection rate. However, time complexity is not much reduced. The overall complexity is still high as these techniques consist of multiple classifier and feature selection algorithms. This class of algorithms can make use of parallel programming techniques to reduce the training time. Also, obtaining good classification results for low-frequency attacks is still a challenge.

VI. CLASSIFICATION OF TECHNIQUES FOR A SPECIFIC ATTACK DETECTION
In this Section, we provide the classification of various techniques for different attacks based on their performance. It helps the readers in choosing a particular technique for
TABLE VIII: CLASSIFICATION OF TECHNIQUES FOR DoS ATTACKS (table body not recoverable; references cited in the table: [136], [121], [144], [124], [132], [134], [122], [61], [128], [147], [121], [143], [129], [120], [123], [133], [126], [26], [131], [142], [138], [130], [139], [135], [141])
detecting the specific attack. The detailed description of these techniques is presented earlier in Section V, and the features used with the techniques are also shown in Table VII. Most of the approaches use KDD’99 as a common dataset for evaluation. Hence, these are considered for comparison.

A. Detection of Denial of Service Attacks
The single classifier approach is much easier to implement. Decision Tree (DT) performs well in this category with a detection rate of 97.24%. However, the detection rate improves to 99.43% when applied with 12 features. CANN [144] achieves 99.99% accuracy with 6 features. However, it performs very well specifically in detecting the Land attack. For other DoS attacks, it needs to improve its feature set (the reason is explained in the summary of observations in Section VII-B). ANN, when combined with SVM and MARS [124], improves its performance and provides a 99.97% classification rate. The performance of ANN [140] is also improved when combined with Fuzzy Clustering (FC-ANN) [132], with a 99.91% detection rate. Fuzzy Logic with ANN (FC-ANN) provides a 99.5% detection rate. As for SVM [120], its detection rate is also improved when it is combined with the clustering approach (CT SVM) [126]. Earlier the detection rate was just 91.6% and later it improved to 97.35%. There is also an improvement if SVM is combined with ant colony networks (CSVAC) [142], where the detection rate improves to 94.84%. All the above-mentioned hybrid techniques consider 41 features, which affects the computation time. SVM in combination with DT and SA [136] provides the highest detection rate of 100% with 23 features, and when combined with hierarchical clustering it achieves 99.5% detection with 19 features. The techniques mainly use the KDD’99 data set. The classification of various other techniques with their detection rates for DoS attack detection is shown in Table VIII.

B. Detection of Scanning (Probe) Attacks
Single classifiers with all 41 features do not perform well for Probe attack detection. None of the classifiers is achieving 90% in this category. Decision Tree [121] with 41 features provides a detection rate of just 77.92%, but when applied with selected features (12 features) [61], the detection rate improves to 98.868%. Similarly, SVM [120] provides a very poor detection rate of 36.65%, but when the MMIFS [133] feature selection technique is applied, which uses only 8 features, its detection rate improves to 86.46%. Further integrating the single classifiers provides a significant improvement in detection. ANN [122] achieves a 71.63% detection rate, but when it is integrated with the Elman network [130] it provides 100% detection. Even the integration with SVM and MARS [124] achieves a 99.85% detection rate. Results are further improved in terms of detection rate and computational time with hybrid classifiers with feature selection. Here the combination of Subspace clustering, DBSCAN and EA4RO [26] achieves a 100% detection rate with 9 features. The FNT technique with PSO and GA [128] also achieves a 98.39% detection rate with 12 features. SVM integrated with DT and SA [136] performs far better, with a detection rate of 98.35% with 23 features, in comparison to the SVM-only technique (36.65%) with 41 features. The performance of techniques is shown in Table IX.

C. Detection of User to Root Attacks
We have classified techniques based on their detection rate for U2R attacks. We have discussed those techniques which are performing well.
In case of U2R attacks, the performance of the single classifier is very poor. However, integration with other classifiers improves their performance. The FNT approach with PSO and GA [128] performs very well and achieves the highest detection rate of 99.7% with 12 features (refer to Table VII in Section V for the features). The Intrusion Detection technique which uses Multiple Neural Network classifiers [123] provides a 99.7% detection rate for a specific type of U2R attack, i.e., guess password. There is a significant improvement in Neural Network [140] performance, as its detection rate improves to 93.18% when it is combined with the fuzzy clustering
TABLE IX: CLASSIFICATION OF TECHNIQUES FOR SCANNING ATTACKS (table body not recoverable; references cited in the table: [139], [135])
TABLE X: CLASSIFICATION OF TECHNIQUES FOR U2R ATTACKS (table body not recoverable)
approach (FC-ANN) with no change in the set of features. Earlier the detection rate of ANN was 0% for U2R attacks. The integration of ANN with SVM and MARS [124] also provides an improvement with a detection rate of 76%, but this is not acceptable. There is very little improvement in SVM [120] (12%) when it is integrated with the clustering approach (CT SVM, 17.23%) [126], which is also not acceptable. Hybrid techniques with feature selection provide a good detection rate for U2R attacks. The performance of other techniques is shown in Table X.

D. Detection of Remote to User Attacks
We have classified techniques based on their detection rate for R2L attacks. The techniques which are performing well are discussed.
In case of R2L attacks, the performance of the single classifier is very poor. However, integration with other classifiers improves their performance. Here the ensemble of SVM, ANN and MARS [124] provides a 100% detection rate as claimed by the authors, which is a significant improvement over ANN [122] (26.68% detection rate). However, they have not mentioned which specific R2L attacks are detected. A hybrid classifier with feature selection also performs very well. Here the FNT approach with PSO and GA [128] provides a detection rate of 99.09% with 12 features. FPSO achieves a detection rate of 97.22% with 16 features. The integration of SVM, DT and SA [136] achieves a detection rate of 90.67% with 23 features. The performance of other techniques is shown in Table XI.
Multiple classifiers with feature selection perform better for R2L than for U2R attack detection. This is because (1) most of the hybrid classifiers filter the data before training the classifier by performing clustering on the data, which groups data items into groups
TABLE XI: CLASSIFICATION OF TECHNIQUES FOR R2L ATTACKS (table body not recoverable; references cited in the table: [128], [122], [133], [123], [135])
based on their similarity measure. This brings balance to the data, which is crucial for machine learning algorithms. (2) Multiple classifiers are more adaptable than single classifiers and hence can learn new attack behavior efficiently. However, there are still various difficulties associated with detecting low-frequency attacks, as discussed in Section VIII.
Existing intrusion detection approaches based on machine learning have been thoroughly analyzed with respect to individual attack categories. Limitations associated with the approaches for each category are discussed along with viable solutions. No single machine learning algorithm can detect all types of attacks. Hence, the use of a specific algorithm (misuse, anomaly or hybrid) is recommended for detecting a specific set of attacks.
VII. PERFORMANCE ANALYSIS OF DIFFERENT MACHINE LEARNING ALGORITHMS IN INTRUSIONS DETECTION
We have carried out a critical performance analysis of various machine learning techniques in detecting all types of attacks. The analysis is carried out with respect to each category of machine learning techniques described earlier in Section V. The results reported by researchers have been analyzed and compared. Based on our observation, most of the work has been validated using KDD’99. Hence, the performance analysis of techniques based on KDD’99 has been carried out. The best performing techniques are discussed for each attack category, along with the limitations of each category of techniques and the solutions applied to overcome those limitations. We provide an overall conclusion at the end of each category. This gives readers a clear view of the limitations of intrusion detection techniques and of why integration and feature selection are applied.
A. Performance Analysis of Single Classifier Algorithms With 41 Features
Among all four classifications, single classifiers with all features perform very poorly in detecting various attacks. We have mainly considered five standard machine learning classifiers: Decision Tree (C4.5) [121], Neural Network [140], Naive Bayes [121], Support Vector Machine [120] and Fuzzy Association rules [131], as shown in Figure 10. Decision tree [121] gives a better detection rate (97.24%) for DoS attacks than the other four classifiers. Neural Network [140] achieves the highest detection rate (90.95%) for Probe attacks, whereas Fuzzy Association rules [131] provide far better results for U2R attacks, with a detection rate of 68.6%, as they use a data reduction technique. However, Fuzzy rules do not work well for DoS attack detection, reaching only a 78.9% detection rate. The SVM classifier [120] has a highest detection rate of around 22% for U2R attacks, which is not acceptable in an environment where security is an important aspect. Although KDD’99 contains a large number of DoS and Probe connection records, single classifiers are still not able to detect the behavior of these attacks. The reasons for the degraded performance can be explained in terms of two major aspects.
(i) Low Detection Rate & High Computational Cost: The classification task becomes time-consuming and cumbersome when all features of the data are considered. This increases the computation power, storage and error rate, and badly affects the performance of the classifier. The classifier suffers from the “Curse of Dimensionality”: when the dimensionality increases, the volume of the space increases so fast that the available data becomes sparse. This sparsity can be problematic for machine learning algorithms because it reduces the statistical significance of the training dataset. In high dimensional data, all objects appear to be sparse and dissimilar in many ways; data organizing strategies may not work efficiently
MISHRA et al.: DETAILED INVESTIGATION AND ANALYSIS OF USING MACHINE LEARNING TECHNIQUES FOR INTRUSION DETECTION 717
MMIFS [133] does not consider feature P7 (land), which is very important for detecting land (DoS) attacks. It also does not take features such as P10 (hot), P11 (num failed logins), P14 (root shell) and P16 (num root) into consideration, which are very important for the detection of U2R and R2L attacks.
(ii) Algorithmic Drawback: Even after employing feature selection, the detection results are not good. This could be because of an algorithmic drawback of the classifier. The drawbacks of single classifiers, along with their characteristics, are mentioned in Section IV.
The limitations of single classifiers with feature selection can be addressed by considering the following factors:
(i) Each feature in the feature set should be sufficiently relevant, and the features should be dissimilar to each other, to learn the behavior of an attack. For example, P11 (num failed login), P14 (root shell) and P10 (hot) are very important in the detection of U2R attacks and should be in the feature set for U2R attack detection. There are various feature selection methods, such as Gradual Feature Removal, Information Gain and Chi-Square. They may not provide accurate features for all attacks, so the researcher should not depend solely on the output of these methods.
(ii) Detection results and training time can be improved by integrating classifiers so that one classifier can overcome the drawbacks of another. For example, Clustering with SVM (CT SVM) [126] improves the training time and detection rate of the SVM classifier [120], as shown in Figure 10 and Figure 11. Clustering preprocesses the data and groups them into clusters based on their characteristics. The SVM classifier is trained for one cluster. This improves the training time and detection accuracy of the SVM classifier.
Observations based on the analysis are as follows:
(i) Employing feature selection does not necessarily improve the detection results. The researcher should also focus on choosing an appropriate feature selection method. However, feature selection does reduce the computational time of a classifier and its storage overhead. For example, MMIFS [133] with SVM and 8 features provides a detection rate of 78.69% for DoS attacks, whereas SVM [120] without feature selection achieves a detection rate of 91.6% for DoS attacks.
(ii) A feature set that works well for analyzing the behavior of a particular attack may not work well for other attacks. There is a need to identify the behavior of each attack and to design a feature set for each attack accordingly.
(iii) If a classifier is reported to achieve 99.99% accuracy for DoS attacks or any other attack category, that figure should be the average over all types of DoS attacks, such as Backdoor, Land, neptune, Pod, Smurf, teardrop, etc. If a classifier achieves 99.99% accuracy for only one or two types of DoS attacks, it would be inappropriate to say that it achieves 99.99% accuracy for that category of attack. For example, in the Cluster Center and Nearest Neighbor (CANN) [144] approach, a one-dimensional distance-based feature is used to represent each data sample. The distance is the sum of two distances: the first is the distance between the data sample and its cluster center, and the second is the distance between the data sample and its nearest neighbor in the same cluster. Data samples are classified using a k-Nearest Neighbor (k-NN) classifier. This approach falls into the category of single classifier with limited features in our categorization of machine learning algorithms. The approach claims to achieve 99.99% accuracy for DoS attacks and 99.98% accuracy for Probe attacks. It uses only 6 features, namely P7 (land), P9 (urgent), P11 (num failed logins), P18 (num shells), P21 (is host login) and P20 (num outbound cmds).
As per our analysis of various research papers, out of these 6 features only feature 7 (land) is the most relevant feature for detecting the Land attack. For this attack, CANN may achieve 99.99% accuracy. However, for other attacks such as teardrop, feature 8 (wrong fragment) is the most relevant. To detect the Back attack, features P5 (src bytes) and P6 (dst bytes) [163] and features P10 (hot) and P13 (num compromised) [164] are the most relevant, calculated based on Rough Set theory and the Information Gain measure respectively, and hence should be included in the feature set. Olusola et al. [163] claim that feature P20 (outbound command count for FTP session) and feature P21 (hot login) are the least relevant features for intrusion detection. The claim is based on the dependency ratio calculated for each feature using rough set theory, which is 0.000 for both features (P20, P21) for all categories of DoS attacks. This indicates that CANN may work very well for the Land attack but has to improve its feature set for detecting other attacks.
C. Performance Analysis of Multiple Classifier Algorithms With All Features
We observed that multiple classifiers perform better than single classifiers in terms of detection rates, as shown in Figure 10 and Figure 12. In the case of multiple classifiers without feature selection, the ensemble of ANN, SVM and MARS [124] achieves the highest detection rate of 99.97% for DoS attacks. FC-ANN [132] achieves a good detection rate of 99.91% for DoS attacks and 93.18% for U2R attacks using the KDD’99 dataset. Neuro-Fuzzy [147] also achieves a good detection rate for DoS (99.5%) using the same KDD’99 dataset. ANN with Elman network [130], evaluated on DARPA 98, achieves the highest detection rate of 100% for Probe attacks. FC-ANN [132] provides the highest detection rate of 93.18% for U2R attacks on the KDD’99 dataset. CSVAC [142] achieves a good detection rate of 87% on the KDD’99 dataset for R2L attacks. Multiple NN [123] achieves a 99.7% detection rate for a specific U2R attack, i.e., the Guess password attack. However, it has not been considered in the comparison as it does not provide categorical results. The comparisons are performed with the other techniques in this category. On average, multiple classifiers with all features perform well or moderately for high-frequency attacks (DoS and Probe), but yield average or poor detection rates for low-frequency attacks (U2R and R2L). They also suffer from problems such as slow response time, high error rate and higher storage requirements. For example, the Neuro-Fuzzy [147] technique uses the KDD’99 dataset and provides a 99.5% detection rate for DoS, 84.1% for Probe, 41.1% for U2R and 31.5% for R2L attacks.
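The gap between high-frequency and low-frequency categories reported above is easiest to see when the detection rate is computed per attack category rather than as one overall accuracy. The following is a minimal sketch, assuming that detection rate means the fraction of records of a category that are assigned to that category (per-class recall). The function names and the toy label counts are hypothetical and are not taken from any of the surveyed papers or from KDD’99.

```python
from collections import Counter


def detection_rates(y_true, y_pred):
    """Return {category: fraction of its records predicted as that category}."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cat: hits[cat] / totals[cat] for cat in totals}


def overall_accuracy(y_true, y_pred):
    """Fraction of all records whose predicted category matches the true one."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_average(rates):
    """Unweighted mean of per-category rates, so rare categories count equally."""
    return sum(rates.values()) / len(rates)


# Hypothetical, heavily imbalanced toy data (assumed values, not KDD'99 figures).
y_true = ["dos"] * 900 + ["probe"] * 80 + ["u2r"] * 10 + ["r2l"] * 10
y_pred = (
    ["dos"] * 895 + ["normal"] * 5      # 895 of 900 DoS records detected
    + ["probe"] * 68 + ["normal"] * 12  # 68 of 80 Probe records detected
    + ["u2r"] * 4 + ["normal"] * 6      # 4 of 10 U2R records detected
    + ["r2l"] * 3 + ["normal"] * 7      # 3 of 10 R2L records detected
)

rates = detection_rates(y_true, y_pred)
accuracy = overall_accuracy(y_true, y_pred)
macro = macro_average(rates)

for category, rate in rates.items():
    print(f"{category}: {rate:.2%}")
print(f"overall accuracy: {accuracy:.2%}")
print(f"macro-averaged detection rate: {macro:.2%}")
```

In this toy setup the overall accuracy is 97% while the macro-averaged detection rate is roughly 64%, which illustrates how a single headline figure can hide poor U2R and R2L detection when DoS records dominate the data.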
the different optimal subset chosen for different attack. For example, FPSO [135] uses the same 16-feature subset {P1, P5, P6, P8, P9, P10, P11, P13, P16, P17, P18, P19, P23, P24, P32, P33} for detecting DoS, Probe, U2R and R2L. However, it yields good detection results for DoS (97.22%) and R2L (97.22%) and average results for Probe (77.77%) and U2R (69.44%). Here, features P3 (service), P7 (land) and P14 (root shell), which are among the most important features for attack detection, are not considered.
(iv) Multiple classifiers with selected features perform better than other categories of classifiers.
An exhaustive literature study of intrusion detection techniques with respect to each category of classification techniques has been carried out and critically analyzed. Various inferences are drawn by comparing results reported by researchers. The observations are analyzed thoroughly. The limitations associated with each category of techniques are discussed and viable solutions to overcome the limitations are provided. At the end, conclusions drawn with respect to each classification are mentioned to provide scope for further improvement.
VIII. ISSUES IN DETECTING LOW-FREQUENCY ATTACKS
Machine learning algorithms work on the statistics of the data obtained from an attack dataset. DoS and Probe attacks can be detected easily by careful examination of the statistics of the connections at the vulnerable host machine, whereas it is hard to detect low-frequency attacks such as U2R and R2L even by careful examination of the connection statistics in the KDD’99 dataset. This is because of the following reasons:
(i) The connection statistics of low-frequency attacks are very similar to those of normal connections.
(ii) There is similarity in the behavior of U2R and R2L connection records. Hence, it is also difficult to differentiate U2R and R2L attacks from each other. In fact, a U2R attack is one of the variations of an R2L attack. In an R2L attack, the attacker does not have local access to the machine. To obtain root privileges, he first has to access a normal user’s account by using various account hijacking exploits. Then, after logging in as a normal user, he can launch further exploits to gain root privileges, whereas in a U2R attack the attacker already has unprivileged local access to the victim machine.
(iii) Low-frequency attacks can be launched in a single connection. The information provided by the KDD’99 dataset about the connection is not sufficient. Although some content features are present in the KDD’99 dataset (refer to Table II in Section III), such as num failed logins (P11), root shell (P14) and num compromised (P13), they are not enough for attack identification. For example, the loadmodule attack (U2R) loads two dynamically loadable kernel drivers of the currently running system and creates special devices in the /dev directory to use those modules. Because of a bug in the way loadmodule sanitizes its environment, an unauthorized user can gain root access on the local machine. The attack can be detected by keyword spotting in the user’s session to find the strings ’set $IFS=‘V’ and ‘loadmodule’ [53]. This kind of keyword spotting is difficult to achieve with machine learning algorithms using the KDD’99 dataset.
(iv) The number of U2R and R2L samples in the training and testing datasets of KDD’99 is very small compared to DoS and Probe attacks. Insufficient learning of such attacks makes the classifier less suitable for detecting them. Moreover, the imbalanced distribution of data makes the classifier treat such attacks as normal connections.
(v) Activities performed by these attacks may be similar in terms of the number of file creations, root shell logins, the number of operations performed as root, etc. In such a case, identifying low-frequency attacks becomes more difficult. However, careful examination of the system call traces for the presence of specific modules or processes, suspicious sequences of system calls, invocation of specific commands, etc. may provide some clues to identify the attack activity in the system. For example, the Ffbconfig attack (U2R) exploits a buffer overflow. The ffbconfig program configures the Creator Fast Frame Buffer (FFB) Graphics Accelerator and is part of the FFB Configuration software package, SUNWffbcf. The attack can be detected by examining the system call traces for the presence of the ‘/usr/sbin/ffbconfig/’ command with an oversized argument for the ‘-dev’ parameter [53].
(vi) Very high accuracy (around 90-99%) is achieved by some approaches in detecting those attacks, but we cannot say that these techniques will achieve the same accuracy for detecting unseen or newly generated U2R or R2L attacks, since the techniques are validated over the test database of KDD’99, which contains feature values of the attacks that may be sufficient to separate them from DoS and Probe. For example, a dictionary attack is an R2L attack in which the attacker makes repeated guesses of usernames and passwords to gain access to some machine remotely. The attack can be detected by examining two features: the session protocol of every service (P2) and the number of failed login attempts (P11) over a period. However, the feature values may not provide sufficient information, which may happen if the victim’s password is not strong enough and the attacker accesses the victim machine in one or two guesses, such as by entering the victim’s phone number or school name. The feature values for this attack will then be similar to those of a normal connection. In this case, a machine learning algorithm will not work efficiently to detect these attacks.
It is difficult to detect low-frequency attacks just by examination of network features. The issues in detecting low-frequency attacks have been identified and discussed. Possible viable solutions to detect attacks such as buffer overflow, password cracking, dictionary attack, virus, etc. have been discussed. One can refer to our recent work [76], [165] for detecting these attacks using system call analysis. In another recent work [77], we have considered both system call and network features for detection of low-frequency attacks.
IX. DATA MINING TOOLS FOR MACHINE LEARNING
There exist many tools that support the implementation of various machine learning methodologies. Some of them are described below.
A. Weka
Waikato Environment for Knowledge Analysis (Weka) [166] is a machine learning software tool developed by the University of Waikato, New Zealand, in 1993, although many changes have been incorporated since then. It is an open source tool, freely available under the GNU General Public License and written in Java. Weka supports various data analysis tasks such as data pre-processing, feature selection, classification, clustering, regression and visualization. The tool takes as input a set of records in a flat file (.ARFF files) where a set of attributes describes each record. It is easy to use due to the Graphical User Interfaces (GUIs) provided by the tool.
B. Scikit-Learn
Scikit-learn is an open source machine learning library, developed as a Google Summer of Code project by David Cournapeau in 2007 [167]. It is written in Python and builds on Python numerical and scientific libraries such as NumPy and SciPy, and on other libraries such as pandas and matplotlib. It provides various efficient tools for machine learning such as classification, clustering and regression algorithms. It also supports methods for feature extraction and provides learning tutorials to understand the concepts.
C. TensorFlow
TensorFlow [168] is an open source software library for machine learning applications, developed by the Google Brain Team. It was released under the Apache 2.0 open source license in Nov 2015. Version 1.0.0 was released in Feb 2017. TensorFlow is a very useful tool for deep learning as it provides support for building and training neural networks. Data flow graphs are used to create machine learning models and perform computations. The data arrays that flow along the edges between graph nodes are called tensors. TensorFlow supports multiple APIs such as Python, C++, Go, Java, Haskell and Rust. Third party packages are available for Julia, Scala and R. It can run on multiple CPUs and GPUs and supports different 64-bit operating systems such as Linux, Windows, MacOS, Android and iOS. TensorFlow Lite is a recent release for Android.
D. KNIME
Konstanz Information Miner (KNIME) [169] is an open source tool for data analytics, developed at the University of Konstanz and released under a dual licensing scheme. It uses data pipelining concepts to integrate the components of machine learning and data mining. It provides a GUI to perform various tasks such as data loading, transformation, feature extraction, modeling and visualization. It is written in Java but provides wrappers to run code in other languages such as Python and Perl. It supports the processing of large data volumes. For example, it can analyze 300 million customer addresses, 10 million molecular structures and 20 million cell images. It integrates open source projects such as ML algorithms from Weka, R packages, LibSVM, JFreeChart, ImageJ, etc. using plugins.
E. RapidMiner
RapidMiner [170] is software developed by the RapidMiner company for data mining applications. It provides an integrated environment for data pre-processing, machine learning, deep learning, text mining, etc. It is available in both commercial and free editions. The free edition, named RapidMiner Studio, is limited to 1 logical processor and 10,000 rows and can be obtained under the AGPL license. It uses a client/server model in which the server is hosted on a cloud platform. It provides a template-based framework and removes the need for coding. It supports text mining, image mining, video mining and social network analysis. The supported import formats include SQL, TXT, XML, XLS, etc. It performs data extraction, transformation, analysis and visualization.
F. Environment for Developing KDD-Applications Supported by Index-Structure (ELKI)
ELKI [171] is open source software for data mining applications, developed at the Ludwig Maximilian University of Munich, Germany. It emphasizes unsupervised machine learning methods such as K-means clustering, K-medians clustering, DBSCAN, OPTICS, Expectation-maximization, Hierarchical clustering, Canopy clustering, etc. It also provides data index structures such as R-tree, R*-tree, M-tree and k-d tree. The advanced mining algorithms and their interaction with database index structures are evaluated using many measures such as ROC, histograms and scatterplots.
G. Massive Online Analysis (MOA)
MOA [172] is a popular open source data stream mining tool for big data stream processing in real time. It includes various machine learning algorithms for classification, clustering, regression, outlier detection, etc. It is written in Java and can be extended with new algorithms, streams and evaluation methods. It provides storable settings for real and synthetic data streams for conducting repeatable experiments.
We have discussed some of the important data mining tools used for data analysis with machine learning algorithms. Some of them also support deep learning algorithms. Most of these tools provide an easy-to-use GUI that researchers can readily use in their research domain. Other machine learning libraries, such as Apache SAMOA [173] and MLlib (Spark) [174], can also be used by researchers.
X. FUTURE DIRECTIONS
Deep learning is an advancement of the neural network. Deep learning uses successive layers of information processing arranged in a hierarchy for classification or feature representation. It makes use of deep networks having multiple layers of processing. It consists of an input layer providing the basic data, followed by consecutive hidden layers which analyze the data, after which the output is produced. It has gained popularity in recent years. Existing IDSs can be improved by embracing this technique. Deng and Yu [175] provided a categorization of deep learning methods based on their
architecture into the following types: generative (unsupervised), discriminative (supervised) and hybrid. Unsupervised deep learning, or generative architectures, make use of the following methods: Auto Encoder (AE) and Boltzmann Machine (BM). Similar to ANN, AE makes use of hidden layers; however, it has only three hidden layers. The number of nodes in the input layer and the output layer is the same. The hidden nodes are used to reduce the feature dimensionality and provide the new feature set [176]. Different sets of features are learned at successive depths of the cascade to train the network more precisely. BM makes stochastic decisions using a neuron-like structure of binary units. Deep BM (DBM) has a cascaded structure, whereas Restricted BM has no connections among the hidden units. Multiple layers stacked one by one form a deep belief network (DBN). Supervised learning is used to distinguish some parts of data and has been used for pattern classification. Convolutional Neural Network (CNN) is an example of supervised learning which provides fast learning. CNN uses three concepts: local receptive fields, shared weights and pooling. The hybrid approach makes use of both methods. An example of hybrid architecture is Deep Neural Networks (DNN). DNN provides fully connected hidden layers forming cascaded multilayer networks.
The use of deep learning for image classification is quite popular. However, the challenge lies in adopting deep learning for attack detection in network traffic. In recent years, deep learning has been applied to intrusion detection, as shown in Table XII. Seok and Kim [188] have employed deep learning for attack detection based on converting malware code into an image and applying a CNN which takes these images as input to learn attack features. Also, most of the work on deep learning based IDS uses this approach for reducing the dimensionality of features. It has many advantages. Combining the supervised and unsupervised approaches of deep learning improves the detection results of traditional approaches [189]. It helps in developing new methods for network security which are more certain than traditional machine learning approaches. Deep learning is adaptable to the changing context of data as it performs exhaustive data analysis. However, the use of deep learning for attack analysis is still a challenging and open area for researchers to work on. The resources required for training the network are also quite large. Deep learning is suitable when it is difficult to find the correlation between the raw input and the target class. A reliable intrusion detection system should be able to handle noisy inputs and large discrete or continuous data.
TABLE XII
RECENT WORKS BASED ON APPLICATION OF DEEP LEARNING FOR INTRUSION DETECTION
[Table body not recovered; works listed: [177] to [188]]
Reinforcement learning (RL) is another interesting area of research. RL is a machine learning approach in which multiple agents and the machine work and interact together to learn the behavior within a particular context and improve the performance of attack detection. Sensors or agents sense the environment at discrete time intervals, and the input is mapped to locate the state information. The RL agents then execute an action, and feedback is observed from the environment. Correct actions of agents are rewarded by the environment; this reward is called the reinforcement signal. Agents then leverage the rewards and improve their knowledge about the environment to select the next action. Some researchers have applied RL to detect distributed denial of service attacks [190].
Servin and Kudenko [191] proposed a hierarchical distributed architecture for intrusion detection in which multiple network agents learn to capture the local state information. All the sensors communicate with the agent above them in the hierarchy. The topmost agent decides when to fire and generate an alarm. Each agent uses a slightly modified version of Q-learning and a simple exploration/exploitation strategy to learn the actions to execute in a particular state. Random selection is carried out to choose between normal and abnormal states, and the global state of the network is simulated. The approach achieves an accuracy of 98.9% with a 1.1% error rate with two sensor agents on a self-generated dataset.
In order to improve efficiency and computational power, Xu and Xie [192] apply RL to host based intrusion detection. They applied a Markov reward process model for modeling the behavior of the host. The RL prediction method makes use of the temporal difference learning algorithm [193] for learning the behavior of the processes. It helps in detecting abnormal process behavior of the applications running on the host. The use of RL for predicting the behavior of normal system call sequences has helped in improving the accuracy. They obtained a 100% detection rate with 20 sec of training time using the MIT lpr dataset [194]. The computational cost is very low in comparison to traditional ML algorithms such as HMM, RIPPER, etc. Some of the crucial research challenges associated with deep learning and reinforcement learning approaches are summarized as follows:
• One of the primary challenges in using deep learning algorithms is to generate or obtain a large amount of data for training the classification algorithm. For example, in order to make an IDS learn the various attack scenarios, researchers supply terabytes of data to the deep learning classifier for training. Availability of sufficient data specific to the problem domain is one of the crucial challenges.
• It will be challenging to adopt deep learning algorithms for real-time classification because of the level of complexity involved in training on huge amounts of data. Most of the existing work applies deep learning for feature extraction and dimensionality reduction only [176].
• Most of the existing deep learning methods are suitable for pattern and image recognition. However, how to apply deep learning to classify the network traffic and/or system
logs properly is still a challenge. Some deep learning algorithms, such as Convolutional Neural Network (CNN) and Deep Belief Network (DBN), have proven to be good classifiers. However, experimental work is in progress to determine the efficiency and reliability of these learning algorithms in detecting attacks [195].
• Another challenge with deep learning algorithms is the requirement of high-performance hardware to process the huge training data. Machines need to have sufficient processing power to solve real world problems. To reduce the training time and improve efficiency, researchers will require multi-core, high-performance GPUs, which are very costly and consume more power. For example, Monte Carlo Tree search integrated with a deep neural network requires 48 CPUs and 8 GPUs for conducting a 40-thread multi-threaded search [183].
• Growth in computer memory and computational power is possible through parallel and distributed computing. Researchers can work in this direction to cope with the issues related to communication and computation management in order to scale deep learning algorithms to huge datasets.
• Reinforcement learning (RL) is a growing field, and research on applying it to attack detection is still going on. The slow learning speed of RL affects the feasibility of multi-agent learning in the real world. How to speed up the performance of multi-agent classifiers is another important research challenge [196].
• In a multi-agent RL system, there has to be proper coordination among the RL agents. Therefore, designing a fast and effective means of communication is another important research concern. Researchers are still working on how to apply RL for filtering network traffic more accurately.
Deep Learning and Reinforcement Learning are the future research directions in the field of intrusion detection. Deep Reinforcement Learning [197] is the next step in this direction to make learning effective for huge collections of data. It has been applied for pattern classification and resource control in past years. Researchers can apply it to intrusion detection applications.
XI. CONCLUSION
The increasing rate of intrusions in network and host machines has badly affected the security and privacy of users. Researchers have worked extensively on various solutions to detect intrusions. The security aspects of intrusion detection using machine learning approaches have been considered in our paper. We have described various types of attacks in network and host systems with a brief description of their attack features. The analysis performed reveals that if a technique performs well for detecting one attack, it may not perform equally well for detecting other attacks. Hence, the relevance
The critical performance analysis of various machine learning algorithms has been done in an evolutionary way. The comparison has been carried out with single classifier approaches and multiple classifier approaches. Not only the influence of one classifier on another but also the influence of a feature subset on the classifier is analyzed. We have shown that even if an optimal feature set is sufficient for analyzing the behavior of one attack, it is not good for analyzing the behavior of other attacks. Hence, there is a need to define the optimal feature subset and a suitable technique for each type of attack, as the behavior of attacks varies from one to another.
The difficulties associated with detecting low-frequency attacks using machine learning techniques over network datasets have been described. This motivates researchers to work on other solutions to detect low-frequency attacks. Future research directions are provided to help researchers explore more efficient solutions for attack detection.
We have described existing literature based on similar techniques and evaluated on most of the popular datasets to date, in order to generalize our observations. We have not implemented all the techniques to evaluate their performance and ensure that the results are reproducible. This remains a limitation of our paper, and we are keen to address it in future work. In future, we would also like to propose an attack detection model aimed especially at improving the detection of low-frequency attacks by exploring deep learning approaches. Later, we will focus on issues that arise when IDS techniques are applied to dynamic and changing network environments such as Cloud Computing.
REFERENCES
[1] BBC News. (2008). Estonia Fines Man for ‘Cyber War’. [Online]. Available: [Link]
[2] L. Dignan. (2008). Amazon Exploits Its S3 Outage. [Online]. Available: [Link]
[3] M. Dekker, D. Liveri, and M. Lakka, “Cloud security incident reporting: Framework for reporting about major cloud security incidents,” ENSIA, St. Paul, MN, USA, Rep. TP-04-13-105-EN-N, 2013.
[4] DDoS. (2014). Ello Social Network Hit by Suspected Bloody DDoS Attack. [Online]. Available: [Link]social-network-hit-by-suspected-bloody-dDoS-attack/
[5] S. Panjwani, S. Tan, K. M. Jarrin, and M. Cukier, “An experimental evaluation to determine if port scans are precursors to an attack,” in Proc. IEEE Int. Conf. Depend. Syst. Netw. DSN, 2005, pp. 602–611.
[6] CISCO. (2014). Cisco Anual Report. [Online]. Available: [Link]/web/offer/gist-ty2-asset/[Link]
[7] “Cisco annual cyber security report,” CISCO, San Jose, CA, USA, Rep., 2017.
[8] SNORT. (2017). Snort [Link]. [Online]. Available: [Link]
[9] OISF. (2018). Suricata 4.0.4. [Online]. Available: [Link][Link]/about/
[10] T. F. Lunt, “Ides: An intelligent system for detecting intruders,” in Proc. Symp. Comput. Security Threat Countermeasures, 1990, pp. 30–45.
[11] L. Ertöz et al., “MINDS-minnesota intrusion detection system,” in Next Generation Data Mining. Cambridge, MA, USA: MIT Press, 2004, pp. 199–218.
[12] KDD. (1999). KDD Cup 1999 Data. [Online]. Available: [Link]
[13] N. Moustafa and J. Slay, “UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set),” in Proc. Mil. Commun. Inf. Syst. Conf. (MilCIS), Canberra, ACT,
of a technique for specific attacks has been presented by clas- Australia, 2015, pp. 1–6.
[14] E. Vasilomanolakis, S. Karuppayah, M. Mühlhäuser, and M. Fischer,
sifying various machine learning techniques for each type of “Taxonomy and survey of collaborative intrusion detection,” ACM
attack. Comput. Surveys, vol. 47, no. 4, p. 55, 2015.
MISHRA et al.: DETAILED INVESTIGATION AND ANALYSIS OF USING MACHINE LEARNING TECHNIQUES FOR INTRUSION DETECTION 725
[15] A. Torkaman, G. Javadzadeh, and M. Bahrololum, “A hybrid intelligent HIDS model using two-layer genetic algorithm and neural network,” in Proc. IEEE 5th Conf. Inf. Knowl. Technol. (IKT), 2013, pp. 92–96.
[16] R. Puzis, M. D. Klippel, Y. Elovici, and S. Dolev, “Optimization of NIDS placement for protection of intercommunicating critical infrastructures,” in Proc. IEEE Int. Conf. Intell. Security Informat., Taipei, Taiwan, 2008, pp. 191–203.
[17] A.-S. K. Pathan, The State of the Art in Intrusion Prevention and Detection. Boca Raton, FL, USA: CRC Press, 2014.
[18] R. Hecht-Nielsen, “Theory of the backpropagation neural network,” in Proc. IEEE Int. Joint Conf. Neural Netw., Washington, DC, USA, 1989, pp. 593–605.
[19] J. R. Quinlan, C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann, 1993.
[20] C. Cortes and V. Vapnik, “Support vector machine,” Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
[21] S. Kumar, Survey of Current Network Intrusion Detection Techniques, Washington Univ., St. Louis, MO, USA, pp. 1–18, 2007.
[22] M. Ahmed, A. N. Mahmood, and J. Hu, “A survey of network anomaly detection techniques,” J. Netw. Comput. Appl., vol. 60, pp. 19–31, Jan. 2016.
[23] P. García-Teodoro, J. Díaz-Verdejo, G. Maciá-Fernández, and E. Vázquez, “Anomaly-based network intrusion detection: Techniques, systems and challenges,” Comput. Security, vol. 28, nos. 1–2, pp. 18–28, 2009.
[24] P. Kumar et al., “A novel approach for security in cloud computing using hidden Markov model and clustering,” in Proc. IEEE World Congr. Inf. Commun. Technol. (WICT), 2011, pp. 810–815.
[25] T. Kohonen, “The self-organizing map,” Proc. IEEE, vol. 78, no. 9, pp. 1464–1480, Sep. 1990.
[26] P. Casas, J. Mazel, and P. Owezarski, “Unsupervised network intrusion detection systems: Detecting the unknown without knowledge,” Comput. Commun., vol. 35, no. 7, pp. 772–783, 2012.
[27] J. Yang, T. Deng, and R. Sui, “An adaptive weighted one-class SVM for robust outlier detection,” in Proc. Chin. Intell. Syst. Conf., 2016, pp. 475–484.
[28] G. Kim, S. Lee, and S. Kim, “A novel hybrid intrusion detection method integrating anomaly detection with misuse detection,” Exp. Syst. Appl., vol. 41, no. 4, pp. 1690–1700, 2014.
[29] C. Kolias, G. Kambourakis, A. Stavrou, and S. Gritzalis, “Intrusion detection in 802.11 networks: Empirical evaluation of threats and a public dataset,” IEEE Commun. Surveys Tuts., vol. 18, no. 1, pp. 184–208, 1st Quart., 2015.
[30] Y. Zhang, W. Lee, and Y.-A. Huang, “Intrusion detection techniques for mobile wireless networks,” Wireless Netw., vol. 9, no. 5, pp. 545–556, 2003.
[31] C. Kolias, V. Kolias, and G. Kambourakis, “TermID: A distributed swarm intelligence-based approach for wireless intrusion detection,” Int. J. Inf. Security, vol. 16, no. 4, pp. 401–416, 2016.
[32] M. Halilovic and A. Subasi, “Intrusion detection on smartphones,” arXiv e-print 1211.6610, pp. 1–8, Nov. 2012.
[33] A. Karim, R. Salleh, and M. K. Khan, “SMARTbot: A behavioral analysis framework augmented with machine learning to identify mobile botnet applications,” PLoS ONE, vol. 11, no. 3, pp. 1–35, 2016.
[34] A. Shabtai, U. Kanonov, Y. Elovici, C. Glezer, and Y. Weiss, “‘Andromaly’: A behavioral malware detection framework for android devices,” J. Intell. Inf. Syst., vol. 38, no. 1, pp. 161–190, 2012.
[35] A. K. Sikder, H. Aksu, and A. S. Uluagac, “6thSense: A context-aware sensor-based attack detector for smart devices,” in Proc. 26th USENIX Security Symp., 2017, pp. 397–414.
[36] P. Faruki et al., “Android security: A survey of issues, malware penetration, and defenses,” IEEE Commun. Surveys Tuts., vol. 17, no. 2, pp. 998–1022, 2nd Quart., 2015.
[37] P. Mishra, E. S. Pilli, V. Varadharajan, and U. Tupakula, “Intrusion detection techniques in cloud environment: A survey,” J. Netw. Comput. Appl., vol. 77, pp. 18–47, Jan. 2017.
[38] S. Anwar et al., “Cross-VM cache-based side channel attacks and proposed prevention mechanisms: A survey,” J. Netw. Comput. Appl., vol. 93, pp. 259–279, Sep. 2017.
[39] S. Agrawal and J. Agrawal, “Survey on anomaly detection using data mining techniques,” Procedia Comput. Sci., vol. 60, pp. 708–713, Dec. 2015.
[40] N. F. Haq et al., “Application of machine learning approaches in intrusion detection system: A survey,” Int. J. Adv. Res. Artif. Intell., vol. 4, no. 3, pp. 9–18, 2015.
[41] A. L. Buczak and E. Guven, “A survey of data mining and machine learning methods for cyber security intrusion detection,” IEEE Commun. Surveys Tuts., vol. 18, no. 2, pp. 1153–1176, 2nd Quart., 2015.
[42] D. Csubak and A. Kiss, “OpenStack firewall as a service rule analyser,” in Proc. Int. Conf. Human Aspects Inf. Security Privacy Trust, 2016, pp. 212–220.
[43] P. S. Kenkre, A. Pai, and L. Colaco, “Real time intrusion detection and prevention system,” in Proc. 3rd Int. Conf. Front. Intell. Comput. Theory Appl. (FICTA), 2015, pp. 405–411.
[44] P. Deshpande, A. Aggarwal, S. Sharma, P. S. Kumar, and A. Abraham, “Distributed port-scan attack in cloud environment,” in Proc. 5th Int. Conf. Comput. Aspects Soc. Netw. (CASoN), Fargo, ND, USA, 2013, pp. 27–31.
[45] M. J. Schoelles and W. D. Gray, “Argus: A suite of tools for research in complex cognition,” Behav. Res. Methods Instrum. Comput., vol. 33, no. 2, pp. 130–140, 2001.
[46] A. Crenshaw. (2008). OSfuscate: Change Your Windows OS TCP/IP Fingerprint to Confuse P0f, NetworkMiner, Ettercap, Nmap and Other OS Detection Tools. [Online]. Available: [Link] security/osfuscate-change-your-windows-os-tcp-ip-fingerprint-to-confu [Link]
[47] D. Norton, An Ettercap Primer. Singapore: SANS Inst. InfoSec Reading Room, 2004.
[48] N. Hoque, M. H. Bhuyan, R. C. Baishya, D. Bhattacharyya, and J. K. Kalita, “Network attacks: Taxonomy, tools and systems,” J. Netw. Comput. Appl., vol. 40, pp. 307–324, Apr. 2014.
[49] G. Mantas, N. Stakhanova, H. Gonzalez, H. H. Jazi, and A. A. Ghorbani, “Application-layer denial of service attacks: Taxonomy and survey,” Int. J. Inf. Comput. Security, vol. 7, nos. 2–4, pp. 216–239, 2015.
[50] F. Iglesias and T. Zseby, “Analysis of network traffic features for anomaly detection,” Mach. Learn., vol. 101, nos. 1–3, pp. 59–84, 2014.
[51] E. Guillén, J. Rodríguez, R. Paez, and A. Rodriguez, “Detection of non-content based attacks using GA with extended KDD features,” in Proc. World Congr. Eng. Comput. Sci., 2012, pp. 30–35.
[52] D. Kumar, “DDoS attacks and their types,” in Network Security Attacks and Countermeasures. Hershey, PA, USA: Inf. Sci. Ref., 2016, p. 197.
[53] MIT. (1999). Darpa Intrusion Detection Attacks Database. [Online]. Available: [Link]
[54] M. Malekzadeh, M. Ashrostaghi, and M. S. Abadi, “Amplification-based attack models for discontinuance of conventional network transmissions,” Int. J. Inf. Eng. Electron. Bus., vol. 7, no. 6, p. 15, 2015.
[55] S. Maiti, C. Garai, and R. Dasgupta, “A detection mechanism of DoS attack using adaptive NSA algorithm in cloud environment,” in Proc. IEEE Int. Conf. Comput. Commun. Security (ICCCS), 2015, pp. 1–7.
[56] T. Bass, A. Freyre, D. Gruber, and G. Watt, “E-mail bombs and countermeasures: Cyber attacks on availability and brand integrity,” IEEE Netw., vol. 12, no. 2, pp. 10–17, Mar./Apr. 1998.
[57] T. Halagan, T. Kováčik, P. Trúchly, and A. Binder, “Syn flood attack detection and type distinguishing mechanism based on counting bloom filter,” in Proc. Inf. Commun. Technol. EurAsia Conf., 2015, pp. 30–39.
[58] E. Bou-Harb, M. Debbabi, and C. Assi, “Cyber scanning: A comprehensive survey,” IEEE Commun. Surveys Tuts., vol. 16, no. 3, pp. 1496–1519, 3rd Quart., 2014.
[59] A. K. Kaushik, E. S. Pilli, and R. C. Joshi, “Network forensic system for port scanning attack,” in Proc. IEEE 2nd Int. Adv. Comput. Conf. (IACC), 2010, pp. 310–315.
[60] L. Aniello, G. Lodi, and R. Baldoni, “Inter-domain stealthy port scan detection through complex event processing,” in Proc. 13th Eur. Workshop Depend. Comput., 2011, pp. 67–72.
[61] P. Sangkatsanee, N. Wattanapongsakorn, and C. Charnsripinyo, “Practical real-time intrusion detection using machine learning approaches,” Comput. Commun., vol. 34, no. 18, pp. 2227–2235, 2011.
[62] D. Mankins, R. Krishnan, C. Boyd, J. Zao, and M. Frentz, “Mitigating distributed denial of service attacks with dynamic resource pricing,” in Proc. 17th Annu. Comput. Security Appl. Conf. (ACSAC), 2001, pp. 411–421.
[63] G. Helmer et al., “A software fault tree approach to requirements analysis of an intrusion detection system,” Requirements Eng., vol. 7, no. 4, pp. 207–220, 2002.
[64] A. Sridharan, T. Ye, and S. Bhattacharyya, “Connectionless port scan detection on the backbone,” in Proc. 25th IEEE Int. Perform. Comput. Commun. Conf. (IPCCC), Phoenix, AZ, USA, 2006, p. 10.
726 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 21, NO. 1, FIRST QUARTER 2019
[65] E. Raftopoulos, E. Glatz, X. Dimitropoulos, and A. Dainotti, “How dangerous is Internet scanning?” in Proc. Int. Workshop Traffic Monitor. Anal., 2015, pp. 158–172.
[66] M. Rostamipour and B. Sadeghiyan, “Network attack origin forensics with fuzzy logic,” in Proc. IEEE 5th Int. Conf. Comput. Knowl. Eng. (ICCKE), 2015, pp. 67–72.
[67] S. Bahl and S. K. Sharma, “A minimal subset of features using correlation feature selection model for intrusion detection system,” in Proc. 2nd Int. Conf. Comput. Commun. Technol., 2016, pp. 337–346.
[68] C. Edge, W. Barker, B. Hunter, and G. Sullivan, “Malware security: Combating viruses, worms, and root kits,” in Enterprise Mac Security, Berkeley, CA, USA: Apress, 2016, pp. 221–242.
[69] D. G. Johnson and T. M. Powers, “Computer systems and responsibility: A normative look at technological complexity,” Ethics Inf. Technol., vol. 7, no. 2, pp. 99–107, 2005.
[70] A. A. Ghorbani, W. Lu, and M. Tavallaee, “Network attacks,” in Network Intrusion Detection and Prevention. Cham, Switzerland: Springer Int., 2010, pp. 1–25.
[71] P. K. Manadhata and J. M. Wing, “An attack surface metric,” IEEE Trans. Softw. Eng., vol. 37, no. 3, pp. 371–386, May/Jun. 2011.
[72] S. Singh and S. Silakari, “A survey of cyber attack detection systems,” Int. J. Comput. Sci. Netw. Security, vol. 9, no. 5, pp. 1–10, 2009.
[73] M. K. Sabhnani and G. Serpen, “KDD feature set complaint heuristic rules for R2L attack detection,” in Proc. Security Manag., 2003, pp. 310–316.
[74] K. S. Wutyi and M. M. S. Thwin, “Heuristic rules for attack detection charged by NSL KDD dataset,” in Genetic and Evolutionary Computing. Cham, Switzerland: Springer Int., 2016, pp. 137–153.
[75] P. Mishra, E. S. Pilli, and R. C. Joshi, “Forensic analysis of e-mail date and time spoofing,” in Proc. IEEE 3rd Int. Conf. Comput. Commun. Technol. (ICCCT), 2012, pp. 309–314.
[76] P. Mishra, E. S. Pilli, V. Varadharajan, and U. Tupakula, “VAED: VMI-assisted evasion detection approach for infrastructure as a service cloud,” Concurrency Comput. Pract. Exp., vol. 29, no. 12, pp. 1–21, 2017.
[77] P. Mishra, E. S. Pilli, V. Varadharajan, and U. Tupakula, “PSI-NetVisor: Program semantic aware intrusion detection at network and hypervisor layer in cloud,” J. Intell. Fuzzy Syst., vol. 32, no. 4, pp. 2909–2921, 2017.
[78] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, “Toward developing a systematic approach to generate benchmark datasets for intrusion detection,” Comput. Security, vol. 31, no. 3, pp. 357–374, 2012.
[79] V. H. Garcia, R. Monroy, and M. Quintana, “Web attack detection using ID3,” in Professional Practice in Artificial Intelligence. Cham, Switzerland: Springer Int., 2006, pp. 323–332.
[80] J. R. Quinlan, C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann, 1993.
[81] A. L. Prodromidis and S. J. Stolfo, “Cost complexity-based pruning of ensemble classifiers,” Knowl. Inf. Syst., vol. 3, no. 4, pp. 449–469, 2001.
[82] T. M. Mitchell, Machine Learning, 1st ed. New York, NY, USA: McGraw-Hill, 1997.
[83] M. F. Augusteijn and B. A. Folkert, “Neural network classification and novelty detection,” Int. J. Remote Sens., vol. 23, no. 14, pp. 2891–2902, 2002.
[84] M. M. Moya, M. W. Koch, and L. D. Hostetler, “One-class classifier networks for target recognition applications,” Sandia Nat. Labs., Albuquerque, NM, USA, Rep. SAND-93-0084C, 1993.
[85] S. Albrecht, J. Busch, M. Kloppenburg, F. Metze, and P. Tavan, “Generalized radial basis function networks for classification and novelty detection: Self-organization of optimal Bayesian decision,” Neural Netw., vol. 13, no. 10, pp. 1075–1093, 2000.
[86] A. Jagota, “Novelty detection on a very large number of memories stored in a hopfield-style network,” in Proc. IEEE Seattle Int. Joint Conf. Neural Netw. (IJCNN), vol. 2. Seattle, WA, USA, 1991, p. 905.
[87] D. Martinez, “Neural tree density estimation for novelty detection,” IEEE Trans. Neural Netw., vol. 9, no. 2, pp. 330–338, Mar. 1998.
[88] A. Bivens et al., “Network-based intrusion detection using neural networks,” Intell. Eng. Syst. Artif. Neural Netw., vol. 12, no. 1, pp. 579–584, 2002.
[89] G. H. John and P. Langley, “Estimating continuous distributions in Bayesian classifiers,” in Proc. 11th Conf. Uncertainty Artif. Intell., Montreal, QC, Canada, 1995, pp. 338–345.
[90] A. McCallum and K. Nigam, “A comparison of event models for naive Bayes text classification,” in Proc. AAAI Workshop Learn. Text Categorization, vol. 752. Madison, WI, USA, 1998, pp. 41–48.
[91] A. M. Kibriya, E. Frank, B. Pfahringer, and G. Holmes, “Multinomial naive Bayes for text categorization revisited,” in Proc. Aust. Conf. Artif. Intell., Cairns, QLD, Australia, 2004, pp. 488–499.
[92] L. Jiang, H. Zhang, and Z. Cai, “A novel Bayes model: Hidden naive Bayes,” IEEE Trans. Knowl. Data Eng., vol. 21, no. 10, pp. 1361–1371, Oct. 2009.
[93] W.-H. Chen, S.-H. Hsu, and H.-P. Shen, “Application of SVM and ANN for intrusion detection,” Comput. Oper. Res., vol. 32, no. 10, pp. 2617–2634, 2005.
[94] S. B. Kotsiantis, I. D. Zaharakis, and P. E. Pintelas, “Machine learning: A review of classification and combining techniques,” Artif. Intell. Rev., vol. 26, no. 3, pp. 159–190, 2006.
[95] I. Ahmad, A. Abdullah, A. Alghamdi, and M. Hussain, “Optimized intrusion detection mechanism using soft computing techniques,” Telecommun. Syst., vol. 52, no. 4, pp. 2187–2195, 2013.
[96] H.-Y. Huang and C.-J. Lin, “Linear and kernel classification: When to use which?” in Proc. SIAM Int. Conf. Data Min., 2016, pp. 216–224.
[97] B. Schölkopf, R. C. Williamson, A. J. Smola, J. Shawe-Taylor, and J. C. Platt, “Support vector method for novelty detection,” in Proc. Adv. Neural Inf. Process. Syst., Denver, CO, USA, 2000, pp. 582–588.
[98] S. Owais, V. Snasel, P. Kromer, and A. Abraham, “Survey: Using genetic algorithm approach in intrusion detection systems techniques,” in Proc. 7th IEEE Comput. Inf. Syst. Ind. Manag. Appl. (CISIM), Ostrava, Czech Republic, 2008, pp. 300–307.
[99] S. Selvakani and R. S. Rajesh, “Genetic algorithm for framing rules for intrusion detection,” Int. J. Comput. Sci. Netw. Security, vol. 7, no. 11, pp. 285–290, 2007.
[100] P. Gupta and S. K. Shinde, “Genetic algorithm technique used to detect intrusion detection,” in Proc. 1st Int. Conf. Adv. Comput. Inf. Technol., Chennai, India, 2011, pp. 122–131.
[101] O. Depren, M. Topallar, E. Anarim, and M. K. Ciliz, “An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks,” Expert Syst. Appl., vol. 29, no. 4, pp. 713–722, 2005.
[102] A. Ahmad and L. Dey, “A k-mean clustering algorithm for mixed numeric and categorical data,” Data Knowl. Eng., vol. 63, no. 2, pp. 503–527, 2007.
[103] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Comput. Surveys, vol. 41, no. 3, p. 15, 2009.
[104] A. A. Aburomman and M. B. I. Reaz, “A novel SVM-KNN-PSO ensemble method for intrusion detection system,” Appl. Soft Comput., vol. 38, pp. 360–372, Jan. 2016.
[105] H. H. Hosmer, “Security is fuzzy! Applying the fuzzy logic paradigm to the multipolicy paradigm,” in Proc. ACM Workshop New Security Paradigms, Little Compton, RI, USA, 1993, pp. 175–184.
[106] L. A. Zadeh, “Fuzzy sets,” Inf. Control, vol. 8, no. 3, pp. 338–353, 1965.
[107] C. Tang, Y. Xiang, Y. Wang, J. Qian, and B. Qiang, “Detection and classification of anomaly intrusion using hierarchy clustering and SVM,” Security Commun. Netw., vol. 9, no. 16, pp. 3401–3411, 2016.
[108] S. Raja and S. Ramaiah, “An efficient fuzzy-based hybrid system to cloud intrusion detection,” Int. J. Fuzzy Syst., vol. 19, no. 1, pp. 62–77, 2017.
[109] M. Gyanchandani, J. Rana, and R. Yadav, “Taxonomy of anomaly based intrusion detection system: A review,” Int. J. Sci. Res. Publ., vol. 2, no. 12, pp. 1–13, 2012.
[110] L. R. Rabiner and B. H. Juang, “An introduction to hidden Markov models,” IEEE ASSP Mag., vol. 3, no. 1, pp. 4–16, Jan. 1986.
[111] D. Ariu and G. Giacinto, “HMMPayl: An application of HMM to the analysis of the HTTP payload,” in Proc. WAPA, 2010, pp. 81–87.
[112] A. Churbanov and S. Winters-Hilt, “Implementing EM and Viterbi algorithms for hidden Markov model in linear memory,” BMC Bioinformat., vol. 9, no. 1, p. 224, 2008.
[113] C. Kolias, G. Kambourakis, and M. Maragoudakis, “Swarm intelligence in intrusion detection: A survey,” Comput. Security, vol. 30, no. 8, pp. 625–642, 2011.
[114] M. Abadi and S. Jalili, “An ant colony optimization algorithm for network vulnerability analysis,” Iran. J. Elect. Elect. Eng., vol. 2, no. 3, pp. 106–120, 2006.
[115] C. Blum and X. Li, “Swarm intelligence in optimization,” in Swarm Intelligence. Cham, Switzerland: Springer, 2008, pp. 43–85.
[116] Z.-H. Zhou, Ensemble Methods: Foundations and Algorithms. Boca Raton, FL, USA: CRC Press, 2012.
[117] M. Sewell, “Ensemble learning,” Res. Note, vol. 11, no. 2, pp. 1–18, 2008.
[118] Y. Freund, “Boosting a weak learning algorithm by majority,” Inf. Comput., vol. 121, no. 2, pp. 256–285, 1995.
[119] K. Anusha and E. Sathiyamoorthy, “Comparative study for feature selection algorithms in intrusion detection system,” Autom. Control Comput. Sci., vol. 50, no. 1, pp. 1–9, 2016.
[120] D. S. Kim and J. S. Park, “Network-based intrusion detection with support vector machines,” in Information Networking. ICOIN 2003 (LNCS 2662), H. K. Kahng. Heidelberg, Germany: Springer, 2003, pp. 747–756.
[121] N. B. Amor, S. Benferhat, and Z. Elouedi, “Naive Bayes vs decision trees in intrusion detection systems,” in Proc. ACM Symp. Appl. Comput., Nicosia, Cyprus, 2004, pp. 420–424.
[122] Y. Bouzida and F. Cuppens, “Neural networks vs. decision trees for intrusion detection,” in Proc. IEEE/IST Workshop Monitor. Attack Detection Mitigation (MonAM), vol. 28. Tübingen, Germany, 2006, p. 29.
[123] C. Zhang, J. Jiang, and M. Kamel, “Intrusion detection using hierarchical neural networks,” Pattern Recognit. Lett., vol. 26, no. 6, pp. 779–791, 2005.
[124] S. Mukkamala, A. H. Sung, and A. Abraham, “Intrusion detection using an ensemble of intelligent paradigms,” J. Netw. Comput. Appl., vol. 28, no. 2, pp. 167–182, 2005.
[125] W. Wang and R. Battiti, “Identifying intrusions in computer networks with principal component analysis,” in Proc. IEEE 1st Int. Conf. Availability Rel. Security (ARES), 2006, p. 8.
[126] L. Khan, M. Awad, and B. Thuraisingham, “A new intrusion detection system using support vector machines and hierarchical clustering,” J. Int. J. Very Large Data Bases, vol. 16, no. 4, pp. 507–521, 2007.
[127] Y. Li, B.-X. Fang, L. Guo, and Y. Chen, “TCM-KNN algorithm for supervised network intrusion detection,” in Proc. Pac. Asia Conf. Intell. Security Informat., Chengdu, China, 2007, pp. 141–151.
[128] Y. Chen, A. Abraham, and B. Yang, “Hybrid flexible neural-tree-based intrusion detection systems,” Int. J. Intell. Syst., vol. 22, no. 4, pp. 337–352, 2007.
[129] C. Xiang, P. C. Yong, and L. S. Meng, “Design of multiple-level hybrid classifier for intrusion detection system using Bayesian clustering and decision trees,” Pattern Recognit. Lett., vol. 29, no. 7, pp. 918–924, 2008.
[130] X. Tong, Z. Wang, and H. Yu, “A research using hybrid RBF/Elman neural networks for intrusion detection system secure model,” Comput. Phys. Commun., vol. 180, no. 10, pp. 1795–1801, 2009.
[131] A. Tajbakhsh, M. Rahmati, and A. Mirzaei, “Intrusion detection using fuzzy association rules,” Appl. Soft Comput., vol. 9, no. 2, pp. 462–469, 2009.
[132] G. Wang, J. Hao, J. Ma, and L. Huang, “A new approach to intrusion detection using artificial neural networks and fuzzy clustering,” Expert Syst. Appl., vol. 37, no. 9, pp. 6225–6232, 2010.
[133] F. Amiri, M. R. Yousefi, C. Lucas, A. Shakery, and N. Yazdani, “Mutual information-based feature selection for intrusion detection systems,” J. Netw. Comput. Appl., vol. 34, no. 4, pp. 1184–1199, 2011.
[134] S.-J. Horng et al., “A novel intrusion detection system based on hierarchical clustering and support vector machines,” Expert Syst. Appl., vol. 38, no. 1, pp. 306–313, 2011.
[135] D. Boughaci, M. D. E. Kadi, and M. Kada, “Fuzzy particle swarm optimization for intrusion detection,” in Proc. Int. Conf. Neural Inf. Process., Doha, Qatar, 2012, pp. 541–548.
[136] S.-W. Lin, K. C. Ying, C.-Y. Lee, and Z.-J. Lee, “An intelligent algorithm with feature selection and decision rules applied to anomaly intrusion detection,” Appl. Soft Comput., vol. 12, no. 10, pp. 3285–3290, 2012.
[137] S. S. S. Sindhu, S. Geetha, and A. Kannan, “Decision tree based light weight intrusion detection using a wrapper approach,” Expert Syst. Appl., vol. 39, no. 1, pp. 129–141, 2012.
[138] Y. Li et al., “An efficient intrusion detection system based on support vector machines and gradually feature removal method,” Expert Syst. Appl., vol. 39, no. 1, pp. 424–430, 2012.
[139] A. Chandrasekhar and K. Raghuveer, “Intrusion detection technique by using k-means, fuzzy neural network and SVM classifiers,” in Proc. IEEE Int. Conf. Comput. Commun. Informat. (ICCCI), Coimbatore, India, 2013, pp. 1–7.
[140] S. Kumar and A. Yadav, “Increasing performance of intrusion detection system using neural network,” in Proc. Int. Conf. Adv. Commun. Control Comput. Technol. (ICACCCT), 2014, pp. 546–550.
[141] F. Kuang, W. Xu, and S. Zhang, “A novel hybrid KPCA and SVM with GA model for intrusion detection,” Appl. Soft Comput., vol. 18, pp. 178–184, May 2014.
[142] W. Feng, Q. Zhang, G. Hu, and J. X. Huang, “Mining network data for intrusion detection through combining SVMs with ant colony networks,” Future Gener. Comput. Syst., vol. 37, pp. 127–140, Jul. 2014.
[143] L. Koc, T. A. Mazzuchi, and S. Sarkani, “A network intrusion detection system based on a hidden Naïve Bayes multiclass classifier,” Expert Syst. Appl., vol. 39, no. 18, pp. 13492–13500, 2012.
[144] W.-C. Lin, S.-W. Ke, and C.-F. Tsai, “Cann: An intrusion detection system based on combining cluster centers and nearest neighbors,” Knowl. Based Syst., vol. 78, pp. 13–21, Apr. 2015.
[145] P. V. Amoli, T. Hamalainen, G. David, M. Zolotukhin, and M. Mirzamohammad, “Unsupervised network intrusion detection systems for zero-day fast-spreading attacks and botnets,” Int. J. Digit. Content Technol. Its Appl., vol. 10, no. 2, pp. 1–13, 2016.
[146] G. Kumar and K. Kumar, “A multi-objective genetic algorithm based approach for effective intrusion detection using neural networks,” in Intelligent Methods for Cyber Warfare (Studies in Computational Intelligence), vol. 563, R. Yager, M. Reformat, and N. Alajlan, Eds. Cham, Switzerland: Springer, 2015, pp. 173–200.
[147] A. N. Toosi and M. Kahani, “A new approach to intrusion detection based on an evolutionary soft computing model using neuro-fuzzy classifiers,” Comput. Commun., vol. 30, no. 10, pp. 2201–2212, 2007.
[148] S. Elhag, A. Fernández, A. Bawakid, S. Alshomrani, and F. Herrera, “On the combination of genetic fuzzy systems and pairwise learning for improving detection rates on intrusion detection systems,” Expert Syst. Appl., vol. 42, no. 1, pp. 193–202, 2015.
[149] W. Yassin, N. I. Udzir, Z. Muda, and M. N. Sulaiman, “Anomaly-based intrusion detection through k-means clustering and Naives Bayes classification,” in Proc. 4th Int. Conf. Comput. Informat. (ICOCI), 2013, pp. 298–303.
[150] K. K. Gupta, B. Nath, and R. Kotagiri, “Layered approach using conditional random fields for intrusion detection,” IEEE Trans. Depend. Secure Comput., vol. 7, no. 1, pp. 35–49, Jan./Mar. 2010.
[151] M. S. I. Mamun, A. A. Ghorbani, and N. Stakhanova, “An entropy based encrypted traffic classifier,” in Proc. Int. Conf. Inf. Commun. Security, Beijing, China, 2015, pp. 282–294.
[152] D. Bhamare, T. Salman, M. Samaka, A. Erbad, and R. Jain, “Feasibility of supervised machine learning for cloud security,” in Proc. IEEE Int. Conf. Inf. Sci. Security (ICISS), Pattaya, Thailand, 2016, pp. 1–5.
[153] H. Gharaee and H. Hosseinvand, “A new feature selection IDS based on genetic algorithm and SVM,” in Proc. 8th Int. Symp. Telecommun. (IST), Tehran, Iran, 2016, pp. 139–144.
[154] M. N. Chowdhury, K. Ferens, and M. Ferens, “Network intrusion detection using machine learning,” in Proc. Int. Conf. Security Manag. (SAM), 2016, pp. 1–7.
[155] N. Moustafa and J. Slay, “A hybrid feature selection for network intrusion detection systems: Central points,” in Proc. 16th Aust. Inf. Warfare Conf., 2015, pp. 1–10.
[156] A. M. Chandrasekhar and K. Raghuveer, “Intrusion detection technique by using k-means, fuzzy neural network and SVM classifiers,” in Proc. IEEE Int. Conf. Comput. Commun. Informat. (ICCCI), Coimbatore, India, 2013, pp. 1–7.
[157] H. H. Jazi, H. Gonzalez, N. Stakhanova, and A. A. Ghorbani, “Detecting HTTP-based application layer DoS attacks on Web servers in the presence of sampling,” Comput. Netw., vol. 121, pp. 25–36, Jul. 2017.
[158] H. Wang, J. Gu, and S. Wang, “An effective intrusion detection framework based on SVM with feature augmentation,” Knowl. Based Syst., vol. 136, pp. 130–139, Nov. 2017.
[159] Akashdeep, I. Manzoor, and N. Kumar, “A feature reduced intrusion detection system using ANN classifier,” Expert Syst. Appl., vol. 88, pp. 249–257, Dec. 2017.
[160] M. A. Ambusaidi, X. He, P. Nanda, and Z. Tan, “Building an intrusion detection system using a filter-based feature selection algorithm,” IEEE Trans. Comput., vol. 65, no. 10, pp. 2986–2998, Oct. 2016.
[161] S. M. H. Bamakan, H. Wang, T. Yingjie, and Y. Shi, “An effective intrusion detection framework based on MCLP/SVM optimized by time-varying chaos particle swarm optimization,” Neurocomputing, vol. 199, pp. 90–102, Jul. 2016.
[162] P. J. Van Laarhoven and E. H. Aarts, “Simulated annealing,” in Simulated Annealing: Theory and Applications (Mathematics and Its Applications), vol. 37, P. J. van Laarhoven and E. H. Aarts, Eds. Dordrecht, The Netherlands: Springer, 1987, pp. 7–15.
[163] A. A. Olusola, A. S. Oladele, and D. O. Abosede, “Analysis of KDD ’99 intrusion detection dataset for selection of relevance features,” in Proc. World Congr. Eng. Comput. Sci., vol. 1, 2010, pp. 20–22.
[164] H. G. Kayacik, A. N. Zincir-Heywood, and M. I. Heywood, “Selecting features for intrusion detection: A feature relevance analysis on KDD 99 intrusion detection datasets,” in Proc. 3rd Annu. Conf. Privacy Security Trust, 2005, pp. 1–6.
[165] P. Mishra, E. S. Pilli, V. Varadharajan, and U. Tupakula, “Securing vir- [191] A. Servin and D. Kudenko, “Multi-agent reinforcement learning for
tual machines from anomalies using program-behavior analysis in cloud intrusion detection,” in Adaptive Agents and Multi-Agent Systems
environment,” in Proc. IEEE 18th Int. Conf. High Perform. Comput. III. Adaptation and Multi-Agent Learning (LNCS 4865), K. Tuyls,
Commun., 2016, pp. 991–998. A. Nowe, Z. Guessoum, and D. Kudenko, Eds. Heidelberg, Germany:
[166] (2016). Weka 3.8.1: Data Mining Software in Java. [Online]. Available: Springer, 2008, pp. 211–223.
[Link] [192] X. Xu and T. Xie, “A reinforcement learning approach for host-based
[167] F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” J. Mach. intrusion detection using sequences of system calls,” in Advances in
Learn. Res., vol. 12, pp. 2825–2830, Oct. 2011. Intelligent Computing. ICIC 2005 (LNCS 3644). Heidelberg, Germany:
[168] Google. (2017). Installing Tensorflow. [Online]. Available: Springer, 2005, pp. 995–1003.
[Link] [193] R. S. Sutton, “Learning to predict by the methods of temporal
[169] [Link]. (2017). Knime 3.4.1: Download Knime Analytics differences,” Mach. Learn., vol. 3, no. 1, pp. 9–44, 1988.
Platform & SDK. [Online]. Available: [Link] [194] UNM. (1998). UNM Dataset. [Online]. Available: [Link]
downloads [Link]/immsec/[Link]
[170] RapidMiner. (2017). Real Data Science, Fast and Simple (Stable [195] E. Hodo, X. J. A. Bellekens, A. Hamilton, C. Tachtatzis, and
Release 7.5). [Online]. Available: [Link] R. C. Atkinson, “Shallow and deep networks intrusion detection
system: A taxonomy and survey,” ACM Survey, 2017. [Online].
[171] E. Achtert, H.-P. Kriegel, and A. Zimek, “ELKI: A software system
Available: [Link]
for evaluation of subspace clustering algorithms,” in Scientific and
[196] W. Qiang and Z. Zhongli, “Reinforcement learning model, algorithms
Statistical Database Management (LNCS 5069), B. Ludäscher and
and its application,” in Proc. IEEE Int. Conf. Mechatronic Sci. Elect.
N. Mamoulis, Eds. Heidelberg, Germany: Springer, 2008, pp. 580–585.
Eng. Comput. (MEC), 2011, pp. 1143–1146.
[172] University of Waikato. (2014). MOA (Massive Online Analysis). [197] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning
[Online]. Available: [Link] with double Q-learning,” in Proc. AAAI, 2016, pp. 2094–2100.
[173] A. Bifet and G. D. F. Morales, “Big data stream learning with
SAMOA,” in Proc. IEEE Int. Conf. Data Min. Workshop (ICDMW),
Shenzhen, China, 2014, pp. 1199–1202.
Preeti Mishra (M’14) received the Ph.D. degree in computer science and
engineering from the Malaviya National Institute of Technology Jaipur, India,
in 2017, under the supervision of Dr. Emmanuel S. Pilli and
Prof. V. Varadharajan. She is currently an Associate Professor with Graphic Era
University, Dehradun, India. She has been a Visiting Scholar at Macquarie
University, Sydney, NSW, Australia, in 2015. Her area of interest includes
Cloud security, E-mail security, Network security, and IoT.

Vijay Varadharajan is the Global Innovation Chair Professor with the University
of Newcastle, Australia and the Director of the Advanced Cyber Security
Research Centre. He has published over 380 papers in international journals and
conferences, ten books on information technology, security, networks, and
distributed systems, and has held three patents. He has been/is on the
Editorial Board of several journals including ACM Transactions on Information
and System Security, the IEEE Transactions on Dependable and Secure Computing,
the IEEE Transactions on Information Forensics and Security, and the IEEE
Transactions on Cloud Computing.

Uday Tupakula (M’12) received the Ph.D. degree in 2016, under the supervision
of Prof. Varadharajan. He is a Senior Lecturer with the University of
Newcastle, Australia. He has 75 publications in different research areas such
as network security, DDoS attacks, MANET security, and secure virtual systems.
He is a member of BCS and ACM.

Emmanuel S. Pilli (SM’16) received the Ph.D. degree from IIT Roorkee, Roorkee,
in 2012. He is currently an Associate Professor with the Malaviya National
Institute of Technology, Jaipur, India. He has 20 years of teaching, research,
and administrative experience. His areas of interest include Security and
Forensics, Cloud computing, Big data, and IoT. He is also a Senior Member of
ACM and CSI and actively involved in Cloud Computing Innovation Council of
India, NIST Cloud Forensic Workgroup.