64 Original Article Journal of Information Security & Cybercrimes Research 2020; Volume 3 Issue (1), 64-74

JISCR
Naif Arab University for Security Sciences
Journal of Information Security & Cybercrimes Research
مجلة بحوث أمن المعلومات والجرائم السيبرانية
https://journals.nauss.edu.sa/index.php/JISCR

DDOS Botnets Attacks Detection in Anomaly Traffic : A Comparative Study


Ahmed Elsherif *, Arwa A. Aldaej
Forensic Sciences Department, College of Criminal Justice, Naif Arab University for Security Sciences, Riyadh, Saudi Arabia.
Received 27 Aug. 2020; Accepted 21 Oct. 2020; Available Online 20 Nov. 2020

Abstract
Botnet-based DDoS attacks are one of the major challenges to the acceptance and growth of business and governmental sites. A flooding DDoS attack strikes a victim machine by sending a vast amount of malicious traffic, causing a significant drop in quality of service (QoS) in IoT devices. Flooding DDoS attacks are nonetheless hard to detect and tackle, owing to the large number of attacking machines, the use of source-address spoofing, and the overlap between legitimate and malicious traffic. New kinds of attacks are identified daily, and some remain undiscovered. Accordingly, this paper aims to improve the classification of network traffic that hackers try to make ambiguous or misleading. Recorded simulated traffic was used for both samples, normal and DDoS attack traffic, with approximately 104,000 cases of each. Both datasets, which were created for this study, are the input data for building a classification model to be used as a tool to mitigate the risk of being attacked.
The next step is putting the datasets in a format suitable for classification. This is done through preprocessing techniques that convert categorical data into numerical data. A classification process is then applied to the captured datasets to create a classification model, using five classification algorithms: Decision Tree, Support Vector Machine, Naive Bayes, K-Neighbours and Random Forest. The core classification code is written in Python and is controlled by a user interface. The highest prediction precision and accuracy are obtained using the Decision Tree and Random Forest classification algorithms, which also have the lowest processing time.

Keywords: Cybercrimes, DDoS, Botnet, Anomaly traffic, Machine learning.

* Corresponding Author: Ahmed Elsherif
Email: [email protected]
Production and hosting by NAUSS
doi: 10.26735/ZRXN1433
1658-7782© 2020. JISCR. This is an open access article, distributed under the terms of the Creative Commons, Attribution-NonCommercial License.

I. INTRODUCTION

The growing number of botnet attacks on the net has made it vital to develop tougher, more advanced techniques to deal with them, since human intervention is not enough to examine such attacks and provide the necessary response. Moreover, the nature and techniques of recent online attacks have changed drastically, particularly after the appearance of intelligent agents such as computer variant DDoS attacks and worms [1]. That is why the need to combat them intelligently is increasing. In this section we give a general description of the problem, the objectives and a brief outline of the proposed solution.

Online platforms today need to check whether the user is human or not to avoid brute force and flooding attacks, which are the most common threats owing to the availability of enhanced computing power and network speed [2].

The problem in taking countermeasures against attacks is that the HTTP DDoS attack acquires legitimacy
by representing legitimate behavior, and is thus able to penetrate the available security and protection measures, affect the services provided and cause a negative impact on clients. The negative impact is reduced service availability and failure to provide the service in proper time. If the attacker succeeds in penetrating the security measures, the system becomes subject to other types of attacks or malware infection.

Because network devices differ in environment and architecture, traditional attack detection systems cannot be used competently in the detection process. Additionally, the potential incidents or attacks might differ from those observed on conventional network devices.

Internet of Things (IoT) devices provide businesses with a number of advantages: they can monitor all business operations, improve the customer experience, save both time and money, increase employee efficiency, combine and adapt business models, improve the decisions of administrators and improve sales [3].

Recently, the IoT has become very sophisticated and has been embedded in many everyday applications. It has become the direction of the future internet, providing many facilities to users, whether on a personal level or across a range of industries. Researchers have become interested in developing multiple technologies to apply it to all uses [4].

It is expected that the number of IoT devices will increase from 8 billion in 2017 to 20 billion in 2020 [5]. However, many IoT devices are inherently exposed to hacking. An analysis of attacks on IoT devices found that 10 appliances connected to the internet exposed 250 vulnerabilities, including open Telnet ports, old Linux firmware and unencrypted transmission of critical data [6].

Smart devices are very costly, and their adoption and penetration are not yet high. This can be related in part to the presence of many suppliers and vendors, although standardization is the key to decreasing the price of these devices and guaranteeing their interoperability [7]. Various cybercrimes can be committed using IoT devices in different fields, for example online transportation [8] and medical records [9].

A Distributed Denial of Service (DDoS) attack occurs when services are purposefully stopped from being delivered by the action of an attacker or attackers. This happens by blocking access to all kinds of services such as the internet, servers, appliances, applications, and the interaction of data through applications. A DDoS attack has a source that transmits malicious data or requests, and that source may consist of several systems.

Mostly, these attacks occur by flooding a system with requests for information. A web server may be sent a huge number of requests for a page until it becomes unavailable, or a database may be subjected to a large number of queries. As a result, the network bandwidth and capacity, the CPU and the main memory become saturated [10]. Although DDoS attacks use simple methods in comparison with other cyber-attacks, they spread in more severe and developed ways.

In this paper, a comparative study of machine learning techniques is carried out to find the best algorithm for detecting anomalous attacks in network traffic. According to the findings of this study, the proposed system should mitigate attacks and be applicable to all devices in Saudi Arabia.

This paper is organized as follows. The next section provides an overview of previous work and a discussion of the authors' contributions to the main aspects of traffic classification in detecting DDoS attacks. Section three presents the dataset, techniques and methodology used in attack detection. Section four shows the results obtained and indicates the best algorithm to mitigate attacks. Finally, the conclusions are presented.

II. RELATED WORK

This section details different studies on the detection of DDoS attacks using machine learning techniques, along with a summary of their results and conclusions. These studies guide our research towards appropriate ML techniques, tools and datasets, as deduced from each paper's findings.

Authors in [11] suggested applying machine learning classifiers to discover HTTP botnets. They used the fields of the TCP packet to extract the dataset from network traffic. They also worked on finding the most effective machine learning classifier for obtaining the best results. Their experiment classifies HTTP botnet traffic in the network flow using the best classifier they found during the experiment, with an accuracy that could reach 92.93%.

Botnets have adapted to various types of attacks and use different web protocols to carry out malicious actions. Take the peer-to-peer (P2P) botnet, which uses P2P software to execute orders from the command and control (C&C) server. The P2P botnet nevertheless has a disadvantage in the intricacy of running bots on a decentralized network architecture; consequently, the HTTP botnet was introduced to overcome this issue. The HTTP botnet works in a centralized network architecture, like the IRC botnet, with the advantage of evading detection. For example, DNS fast flux and the use of the HTTP protocol create obstacles to detecting the HTTP botnets responsible for carrying out DDoS attacks [11].

Authors in [12] suggested a botnet detection model based on ML that uses DNS query data. This model comes from the idea that the bots belonging to a botnet regularly send lookup queries to DNS in order to discover the IP addresses of command and control (C&C) servers, using domain names that have been generated automatically.

This model is executed in two stages: (a) the training stage and (b) the detection stage. In the training stage, it gathers DNS query data and then extracts the domain names present in the DNS queries. Features are then extracted from the set of domain names for use in training, and ML algorithms are used to learn the classifiers. Through evaluation, the authors were able to identify the ML algorithm that achieves the highest precision. In the detection stage, DNS queries are monitored and the domain names are extracted from them. The trained classifiers then check the legitimacy of each domain name.

Sheriff Saad et al. [13] suggest a novel method for characterizing and detecting bots using network traffic behaviour. It concentrates on detecting the newest and most challenging bot types before they start an attack. Several machine learning techniques were studied against the requirements of online botnet detection: adaptability, novelty detection and early detection. The results of the experimental evaluation on the dataset used indicate that botnets can be detected effectively during the botnet command-and-control (C&C) stage, before they start their attacks, using traffic behaviour alone.

Authors in [14] examined a PCAP file and studied the DoS attack using a decision tree data mining tool. They used a classifier in WEKA as the intrusion detection tool. Using decision tree algorithms, several rules were produced to show whether a SYN flood exists or not. In the resulting tree, when the SYN flag count for packets from the same origin to the same destination is greater than zero, the packet is considered to be a threat. Otherwise, it is normal. For example: Tcp.flags.syn <= 0: Normal and Tcp.flags.syn > 0: Threat.

III. METHODOLOGY

Different machine learning methodologies for botnet detection have been discussed, such as in [15]. In this research, the experiments are implemented by simulating every part of the desired environment, to avoid critical effects on other network components. The test environment is composed of several virtual machines that represent the botnets, and other virtual elements that serve as the victim and as the attack machine used by an attacker to control attacks by the botnets. The code is designed to satisfy the performance required by the various parts of the test environment: the attacker part of the code is used to recruit the botnets, and the code installed on the botnets is capable of generating different forms of DDoS attack. In order to detect a DDoS attack and to ensure suitable actions to mitigate it, we have implemented code that monitors and analyzes network traffic for the purpose of attack detection and mitigation.

A. Environment Preparation

The methodology followed in this study is based on a virtual environment running Linux (Ubuntu 18.2), to ensure a safe test environment and an HTTP response directed towards the botnets that carry out an HTTP attack.

The attack environment consists of 6 Ubuntu virtual machines working as botnets; the 6 botnets are used to attack a 7th Ubuntu virtual machine, which acts as the victim. Fig. 1 shows the DDoS attack test environment and the data-control flow.

Each of the 6 botnets is supplied with a botnet message generator (a DDoS botnet traffic simulator) to simulate an HTTP flood attack.

The BoNeSi [16] botnet traffic simulator can produce attacks over different protocols (ICMP, UDP and TCP (HTTP)) to simulate a flood attack. It can send different packet sizes, packet counts and packet rates, and more parameters can be configured.
The victim is supplied with a web server (Linux Apache MySQL PHP/Perl web server, LAMPP) used to provide web services and to host web pages that will be the target of an HTTP flood attack. This structure ensures that the attack messages penetrate through to layer 7 (the application layer), rather than stopping at layer 3 or 4 (a network layer attack). In other words, the HTTP flood messages use the web service to reach layer 7 of the victim's communication model.

We also supply the victim with a network sniffer (a network traffic capture and analysis tool) to capture traffic for the purpose of analysis (the training and mitigation processes).

Fig. 1 DDoS attack environment.

B. Botnet Preparation Methodology

The proposed methodology focuses on building and controlling a Java application that performs all the operations needed to recruit botnets in order to carry out HTTP DDoS attacks on a victim machine.

Our Java application procedure can be summarized as follows:
1. Select an IP range to scan for suitable devices to be recruited as botnets.
2. Scan each IP in the given range for weak usernames and passwords.
3. Store all discovered combinations of IP address, username and password.
4. Store the other IP addresses, for which no login data could be discovered, to be used as victims.
5. Monitor all discovered data.
6. Allow the user to select as many recruited botnets as required to start the attack.
7. Select the attack protocol (UDP, TCP, ICMP, HTTP, etc.).
8. Select the attack packet count.
9. Start the attack.
10. Reboot the selected botnets.
Fig. 2 shows the botnet recruiting and attack process.

Fig. 2 Botnet recruiting and attack process.

C. Attack Methodology

This methodology can be summarized in 2 steps: the first centers on capturing simulated attack traffic in order to create the training model, and the second focuses on using this model to examine traffic and alert users about real attack traffic. Fig. 3 and Fig. 4 show this process.

Fig. 3 Learning Process.

Fig. 4 Mitigation Process.

The following points illustrate our Java application procedure in detail:
1. Search the selected range of IP addresses (from xxx.xxx.xxx.1 to xxx.xxx.xxx.254) for the botnets that will be recruited as zombies to attack victims. One range is searched at a time, but more than one IP range can be checked to collect as many botnets as possible for the attack.
2. For each responding IP address, the Java application checks usernames and passwords by applying a list of standard usernames and passwords.
3. As soon as the Java application succeeds in logging in to one of the botnets, the botnet login information (IP address, username, password) is stored in the SQLite database as a botnet for future use.
4. IPs that cannot be logged into are added to the victim list.
5. Alongside collecting login information for a sufficient number of botnets, the Java code plays a second important role by recruiting botnets. This operation consists of downloading BoNeSi (the DDoS botnet traffic simulator) onto the botnet and then setting it up, so that the botnet is ready to act as a zombie for attacking victims. This step is done by downloading a shell script file (file.sh) that contains a series of commands for adding the BoNeSi botnet simulator, as well as all the libraries BoNeSi needs to achieve the required performance.
6. The role of the Java code is not limited to downloading and setting up BoNeSi; it extends to running various configurations of BoNeSi attributes to simulate UDP, TCP, ICMP and other traffic protocols used for flood attacks.
7. Other factors can be controlled, such as packet size, packet count and send rate.
8. Finally, the Java code is capable of restarting the botnet if required.
9. The 7th virtual machine plays the role of the victim in our environment. It is equipped with TShark, the command-line version of Wireshark, used for network traffic sniffing. The main purpose of TShark here is to capture the traffic exchanged between the botnets and the victim, then export the captured messages into CSV files. These files contain clean traffic and DDoS attack traffic information.
10. The sniffing process is carried out on the host machine by supplying a single command line.
11. Commands written in the CLI (the Ubuntu terminal command-line interpreter) are used to capture data passing into or from the host machine. A capture filter is used to select exactly the required information and store the extracted data in a CSV file for the purpose of training. The sniffing filter extracts the required features using many data collections. Other properties, used to format the sniffed data, allow TShark to extract the target features and store them for future use. Fig. 5 describes the attack control tasks and Fig. 6 describes the tasks that are provided by the botnet software. Fig. 7 shows the network traffic capture commands used to monitor network traffic.
12. Once the data is captured, we carry out data pre-processing (removing unneeded data, filling in missing data and selecting features).
13. We prepared Python code to process the captured data (clean DDoS and normal traffic) for the purpose of machine learning.
14. We used different algorithms (Nearest Neighbours, Random Forest [17] and other classification algorithms) to train on the captured data (normal and attack traffic). Different classifiers are used to secure the best prediction for unknown traffic. The result of this step is a pickle file. Each pickle file contains all the information required for classification (prediction of the traffic/packet type) of unknown traffic for the purpose of DDoS mitigation.
15. Besides using pickle files for classification, we used other pickle files for data preparation, such as replacing empty values with 0 and converting categorical data to numerical feature data.
16. The Python code is used to detect an HTTP flood attack; the attack status is declared to inform the user about the presence of an attack.
17. We selected the protocol name (categorical) and the frame size as the features for traffic training; the reasons are given in Section D.

Fig. 5 Controller Tasks.

Fig. 6 Botnets Tasks.

Fig. 7 T-Shark capture and filter commands for normal and attack traffic.

D. DDOS Analysis

The basic idea of DDoS analysis is to use supervised classification algorithms to create a model that identifies attack traffic, based on what is learned from training on well-known (simulated) attack traffic alongside normal traffic, using different algorithms.
1. Features used:
   a. Frame size (numerical).
   b. Protocol (categorical).
   c. Traffic type (numerical).
2. Packet count: 100,000.
3. Traffic types count: 2
   a. Normal = 0.
   b. DDoS Attack = 1.

We selected the protocol name (categorical) and the frame size for traffic training to produce a training model. The reason for selecting the protocol name and frame size is that this amount of information is sufficient for detecting the presence of a DDoS attack, since the attack data is characterized by a repetitive and periodic pattern. These two features repeat in a distinctive pattern that helps in attack detection. Another reason for selecting only these two features is that it is preferable to complete the job using as few resources as possible, to avoid wasting system resources.

Data Sample: Fig. 8 shows the captured data features of concern, for analysis.

E. Selected classifiers

We apply more than one classifier to predict the traffic type. We selected five classifiers to study traffic classification and to discover the classifier with the highest performance when detecting the traffic type.
1. Decision Tree.
2. Support Vector Machine.
3. Naive Bayes.
4. K-Neighbours Classifier (K=3).
5. Random Forest Classifier.

1) Measuring classification output:
The purpose of classification is to predict the type of unknown traffic, using a model created during the training process. The training process results are summarized in a confusion matrix, from which accuracy and other parameters that verify the quality of the training process are calculated. For unknown traffic, only predictions can be obtained as a result of the classification process. If the prediction value is positive, the Python code alerts the user about a detected attack.
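
The preprocessing, training and prediction steps described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the packets in `ROWS` are made up, only the K-Neighbours classifier (K=3) is implemented, using the standard library alone, protocol codes start at 1 so that 0 can stand for a missing value, and the names `fit_protocol_codes`, `to_features`, `train` and `predict` are hypothetical.

```python
import pickle
from collections import Counter

# Made-up sample packets: (frame size in bytes, protocol name, traffic type).
# Traffic type follows Section D: 0 = normal, 1 = DDoS attack.
ROWS = [
    (66, "TCP", 1), (66, "TCP", 1), (74, "TCP", 1), (74, "HTTP", 1),
    (342, "DNS", 0), (60, "ARP", 0), (1514, "TCP", 0), (583, "HTTP", 0),
]


def fit_protocol_codes(rows):
    """Assign each protocol name an integer code, starting at 1."""
    names = sorted({protocol for _, protocol, _ in rows})
    return {name: code for code, name in enumerate(names, start=1)}


def to_features(frame_size, protocol, codes):
    """Build the (frame size, protocol code) vector; missing values become 0."""
    return (float(frame_size or 0), float(codes.get(protocol, 0)))


def train(rows):
    """K-Neighbours 'training' only stores the encoded samples."""
    codes = fit_protocol_codes(rows)
    samples = [(to_features(size, proto, codes), label)
               for size, proto, label in rows]
    return {"codes": codes, "samples": samples}


def predict(model, frame_size, protocol, k=3):
    """Return the majority traffic type among the k nearest stored samples."""
    x = to_features(frame_size, protocol, model["codes"])

    def squared_distance(sample):
        features, _ = sample
        return sum((a - b) ** 2 for a, b in zip(features, x))

    nearest = sorted(model["samples"], key=squared_distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


model = train(ROWS)

# Persist and reload the model, mirroring the pickle files mentioned in the text.
restored = pickle.loads(pickle.dumps(model))

for size, proto in [(66, "TCP"), (342, "DNS")]:
    if predict(restored, size, proto) == 1:
        print(f"ALERT: {proto} packet of {size} bytes classified as DDoS attack")
    else:
        print(f"{proto} packet of {size} bytes classified as normal")
```

Because frame size has a much larger numeric range than the protocol code, it dominates the distance in this sketch; scaling the features would be a natural refinement, and a machine learning library would normally supply all five classifiers.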

2) Results Calculation:
The Accuracy Score is evaluated to find the best classifier. “Time consumed” is the performance measure of the classification process.

To calculate the Accuracy Score we have to obtain the confusion matrix. The confusion matrix holds the count of true/false predictions for each class. We can represent the confusion matrix as follows:
1. Count of packets detected as attack that are DDoS attack packets (T/T).
2. Count of packets detected as attack that are not DDoS attack packets (T/F).
3. Count of packets detected as not attack that are DDoS attack packets (F/T).
4. Count of packets detected as not attack that are not DDoS attack packets (F/F).
The Accuracy Score can be calculated from the following equation:

$$\text{Accuracy Score} = \frac{\text{Number of Elements Correctly Classified}}{\text{Total Count of Elements}} = \frac{(T/T) + (F/F)}{\text{Total Count of Elements}}$$
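
As a quick check of the accuracy equation, the score can be recomputed from a reported confusion matrix. The sketch below assumes the matrix is laid out with the correctly classified counts on its main diagonal, and takes the Random Forest matrix reported in Table V as input; the function name `accuracy_from_confusion` is illustrative only.

```python
def accuracy_from_confusion(matrix):
    """Return (sum of the diagonal) / (sum of all cells) for a square matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return correct / total


# Confusion matrix reported for the Random Forest classifier (Table V).
random_forest = [[32343, 958], [749, 32874]]
accuracy = accuracy_from_confusion(random_forest)
print(f"accuracy = {accuracy:.6f}")
```

Under that layout assumption, the computed value agrees with the accuracy of about 0.9745 listed for the same classifier.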

IV. RESULTS AND DISCUSSIONS


The results are achieved by running the implemented code during the various steps, which include creating the test environment, detecting the botnet IPs, identifying usernames and passwords, storing botnet information in the database, recruiting the botnets (deploying BoNeSi on the botnets), detecting the victim IP and sending attack commands to the recruited botnets. The attack packets are captured at the victim's network card for the purpose of training on the attack dataset to produce the classification model. This model is used to classify runtime traffic and judge whether it belongs to a DDoS attack. Collecting test results is necessary in order to evaluate how far we have succeeded in our research and to provide evidence for the discussion of the test results, thereby enriching the thinking on detecting and mitigating DDoS attacks.

Fig. 8 Captured Data Sample.

Fig. 9 Normal and attack captured data at victim position.

The results achieved in our research are collected in 3 steps:
1. Data collection.
2. Production of the attack traffic model.
3. Runtime traffic analysis.
Each step produces the input data for the next step.

A. Data Collection Results:
The collected traffic can be described, as shown in Fig. 9, as follows:
• Frame Length: The TCP length field is the length of the TCP header and data (measured in octets).
• IP Source: The IP (v4) address of the sender of the packet.
• IP Destination: The IP (v4) address of the receiver of the packet.
• Protocol: The protocol used in the data portion of the packet.
The result of this step is two CSV files named NormalTrafic.csv and AttackTrafic.csv. These two files are saved in a dataset folder, giving one sample file for each of the 2 traffic types.
In light of the above, we notice that:
• Normal traffic has a sequence of packets that contain different protocols such as TCP, ARP, DNS, HTTP and more. This sequence is repeated on a regular basis.
• Attack traffic is composed of a long sequence of pure TCP packets followed by a sequence of TCP and HTTP packets for a short period, after which it goes back to a long sequence of TCP packets. Another feature is that the HTTP packets have a constant length.

B. Dataset Training and Cross Validation Results
The training process is conducted by applying different classification algorithms to generate a model to be used in calculating predictions for unseen data.
1. Decision Tree: The Decision Tree Classifier presents the highest prediction precision, accuracy and performance, as shown in Table I.
2. Support Vector Machine: The Support Vector Machine Classifier presents a good Accuracy Score, but with much lower performance, as shown in Table II.
3. Naive Bayes: The Naive Bayes Classifier presents a lower Accuracy Score, as shown in Table III.
4. K-Neighbours Classifier (K=3): The K-Neighbours Classifier presents a good Accuracy Score, but with lower performance, as shown in Table IV.
5. Random Forest Classifier: The Random Forest Classifier provides good performance and prediction precision, as shown in Table V.
Fig. 10 shows the results of the test evaluation parameters. Unlike other methodologies used to detect DDoS attacks, our methodology does not depend on the application under attack or on the attacking IP; it detects any DDoS attack that can penetrate and reach layer 7 in the OSI network communication model. In order to find the best classifier, 5 classifiers were tested and compared on prediction quality and cross-validation processing time.

Rudy et al. [10] suggested applying machine learning classifiers to discover HTTP botnets, in order to predict HTTP DDoS attacks within different bot families. The values were as follows:
• Accuracy: (51.84%-97.88%).
• Precision: (71.02%-98.66%).
• Recall: (43.58%-99.99%).
• FPR: (3.06%-97.12%).
Xuan et al. [11] used 4 machine learning algorithms on the T1, T2 and T3 training datasets, which are the training datasets most similar to ours. We can summarize the results as follows:
• Accuracy: (90.2%-90.8%).
• PVR: (83.1%-90.7%).
• FPR: (86.5%-90.8%).
• TPR: (90.2%-91.2%).
• F1: (86.5%-90.8%).
We used a sufficient amount of traffic packets (101400 packets for normal traffic and 101400 packets for attack traffic).
We selected five classifiers in order to produce a suitable model, using well-known traffic, and used it to predict the traffic type. We succeeded in obtaining the following results:
• Accuracy (81.378%-97.61%).
• F1 score (80.68%-97.61%).
• Recall (81.37%-97.61%).
JISCR 2020; Volume 3 Issue (1)


72 Elsherif & Aldaej

TABLE I
DECISION TREE CLASSIFIER RESULT SUMMARY
Accuracy            0.9760796751341257
F1 score            0.9761284983757427
Recall              0.9760796751341257
Precision           0.9762409084974778
Confusion Matrix    [[32431, 937], [1484, 66359]]
Time Consumed       0:0:2
Note: Processing time depends on hardware limitations.

TABLE II
SUPPORT VECTOR MACHINE CLASSIFIER RESULT
Accuracy            0.9744785129400514
F1 score            0.9744778762903109
Recall              0.9744785129400514
Precision           0.9744964320164035
Confusion Matrix    [[32342, 959], [749, 32874]]
Time Consumed       0:2:41

TABLE III
NAIVE BAYES CLASSIFIER RESULT SUMMARY
Accuracy            0.8748653802452303
F1 score            0.8655517847719487
Recall              0.8748653802452303
Precision           0.8945507526205738
Confusion Matrix    [[20703, 12665], [0, 67843]]
Time Consumed       0:0:2

TABLE IV
K-NEIGHBOURS CLASSIFIER RESULT SUMMARY
Accuracy            0.9744187436495129
F1 score            0.974418088463661
Recall              0.9744187436495129
Precision           0.9744373625841419
Confusion Matrix    [[32338, 963], [749, 32874]]
Time Consumed       0:1:18

TABLE V
RANDOM FOREST CLASSIFIER RESULT SUMMARY
Accuracy            0.974493455262686
F1 score            0.9744928232060458
Recall              0.974493455262686
Precision           0.9745112014551831
Confusion Matrix    [[32343, 958], [749, 32874]]
Time Consumed       0:0:2

Fig. 8 Captured data sample.

Fig. 9 Normal and attack captured data at victim position.

Fig. 10 Accuracy against classifier result.

Fig. 11 Comparison of our results with other efforts to enhance the detection of DDoS attacks.

The corresponding precision range was 86.41%-97.62%.

Fig. 11 compares our results with the efforts in [10] and [8] to enhance the detection of DDoS attacks.

C. Mitigation Results
The mitigation results can be summarized as blocking HTTP DDoS attacks by adding the botnet IPs to the firewall. Fig. 12 shows the outputs of the prediction and mitigation modules.

Fig. 12 py_anti.java prediction output and firewall action.

V. CONCLUSION
The types of DDoS attacks, and the difficulties in detecting them, require us to carry out further research in order to obtain an algorithm that is capable of detecting attacks for the purpose of attack mitigation. We’ve worked on designing code that implements classification algorithms for analyzing network traffic. The process depends on analyzing well-known attack traffic, then using this experience when detecting real attacks.
The classification results show that the Decision Tree algorithm has the highest accuracy and precision, while Random Forest and SVM come second, with slightly lower accuracy and precision. The Decision Tree and Naive Bayes algorithms have the lowest classification time, while K-nearest neighbour and SVM have a much higher classification time.
As for machine learning models, SVM, among others, achieved the highest score in terms of accuracy. This score is 97.37%, while all other algorithms achieved acceptable false negative scores.

REFERENCES
[1] S. Karnouskos, “Stuxnet worm impact on industrial cyber-physical system security,” IECON 2011 - 37th Annu. Conf. IEEE Ind. Electron. Soc., Melbourne, VIC, 2011, pp. 4490-4494, doi: 10.1109/IECON.2011.6120048.
[2] N. Kheshaifaty and A. Gutub, “Preventing Multiple Accessing Attacks via Efficient Integration of Captcha Crypto Hash Functions,” in Int. J. Comput. Sci. Netw. Secur. (IJCSNS), vol. 20, no. 9, pp. 16-28, Sept. 2020, doi: 10.22937/IJCSNS.2020.20.09.3
[3] M. Rouse. “Internet of things (IoT).” internetofthingsagenda.techtarget.com. https://internetofthingsagenda.techtarget.com/definition/Internet-of-Things-IoT
[4] P. Gokhale, O. Bhat and S. Bhat, “Introduction to IOT,” in Int. Adv. Res. J. Sci. Eng. Technol., vol. 5, no. 1, pp. 41-44, Jan. 2018, doi: 10.17148/IARJSET.2018.517
[5] J. Manyika et al. “Unlocking the potential of Internet of Things.” Mckinsey.com. https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/the-internet-of-things-the-value-of-digitizing-the-physical-world#
[6] Broadband Internet Technical Advisory Group, “Internet of Things Security and Privacy Recommendation,” BITAG, Nov. 2016. [Online]. Available: https://www.bitag.org/documents/BITAG_Report_-_Internet_of_Things_(IoT)_Security_and_Privacy_Recommendations.pdf
[7] N. Farooqi, A. Gutub and M. O. Khozium, “Smart Community Challenges: Enabling IoT/M2M Technology Case Study,” in Life Sci. J., vol. 16, no. 7, pp. 11-17, July 25, 2019, doi: 10.7537/marslsj160719.03.
[8] A. Alsaidi, A. Gutub and T. Alkodaidi, “Cybercrime on Transportation Airline,” in J. Forensic Res., vol. 10, no. 4, Nov. 19, 2019.
[9] H. Samkari and A. Gutub, “Protecting Medical Records against Cybercrimes within Hajj Period by 3-layer Security,” in Recent Trends Inf. Technol. Appl., vol. 2, no. 3, 2019.
[10] G. V. Hulme. “DDoS explained: How distributed denial of service attacks are evolving.” Csoonline.com. https://www.csoonline.com/article/3222095/ddos-explained-how-denial-of-service-attacks-are-evolving.html


[11] R. Fadhlee, M. Dollah, M. A. Faizal, M. Z. Mas’ud and L. K. Xin, “Machine Learning for HTTP Botnet Detection Using Classifier Algorithms,” in J. Telecommun. Electron. Comput. Eng., vol. 10, no. 1-7, Jan./Mar. 2018.
[12] X. D. Hoang and Q. C. Nguyen, “Botnet Detection Based On Machine Learning Techniques Using DNS Query Data,” in Future Internet, vol. 10, no. 5, May 18, 2018, doi: 10.3390/fi10050043
[13] S. Saad et al., “Detecting P2P botnets through network behavior analysis and machine learning,” 2011 Ninth Annu. Int. Conf. Priv. Secur. Trust, Montreal, QC, 2011, pp. 174-180, doi: 10.1109/PST.2011.5971980.
[14] N. Sharma, A. Mahajan and V. Mansotra, “Identification and analysis of DoS attack Using Data Analysis tools,” in Int. J. Innov. Res. Comput. Commun. Eng., vol. 4, no. 6, June 2016, doi: 10.15680/IJIRCCE.2016.0406208.
[15] M. S. Gadelrab, M. Elsheikh, M. A. Ghoneim and M. Rashwan, “BotCap: Machine Learning Approach for Botnet Detection Based on Statistical Features,” in Int. J. Commun. Netw. Inf. Secur. (IJCNIS), vol. 10, no. 3, pp. 563-579, Dec. 2018.
[16] M. Goldstein. “BoNeSi – the DDoS Botnet Simulator.” Github.com. https://github.com/Markus-Go/bonesi
[17] N. Sirikulviriya and S. Sinthuponyo, “Integration of rules from a random forest,” Int. Conf. Inf. Electron. Eng., Bangkok, Thailand, May 28-29, 2011.
