Over the past decade, more and more businesses and organizations have been digitizing their confidential data. This has
increased the volume of network traffic, with data being created at a very large scale. Computer networks have
expanded tremendously over the last decade, especially with the emergence of new devices and services such as cloud
computing and the Internet of Things (IoT). Securing this data is a major challenge. Attacks on networks have also
increased significantly, and network intrusion is acknowledged to be one of the greatest dangers to security [1],[2].
Attacks such as Denial of Service (DoS), zero-day attacks and Advanced Persistent Threats (APT) have become significant
problems for today's global information technology community. This is where the Intrusion Detection System (IDS)
comes in handy. Intrusion Detection Systems are hardware and software systems that can identify such harmful
behaviors. The main objective of an IDS is to observe the behavior of the system, identify
attacks and generate alarms so that appropriate actions can be taken to prevent any harmful consequences [2].
Intrusion can be detected using two classification techniques, i.e., signature-based and anomaly-based detection. In signature-based
(also known as pattern-based) detection, observed activity is matched against a list of patterns already stored in a database. Signature-based
intrusion detection comes with a drawback: it is unable to learn, by itself, any new anomalous patterns and intrusions from
raw data.
Anomaly-based intrusion detection establishes a profile of normal or benign activity and looks for anything that deviates from it. It
can learn abnormal patterns using Machine Learning and Deep Learning techniques. The inputs of an IDS can be
traffic logs, application logs, file system changes, packets, etc. that are monitored, and the output is a label for each input
[2].
Numerous research studies have been conducted in the fields of Machine Learning (ML) and Deep Learning (DL) because
these techniques can learn trends of malicious behavior while reducing false alarms [9]. Several authors have attempted
comprehensive surveys of Machine Learning and Deep Learning techniques for anomaly detection [2],[4],[5].
It turns out that much of the research in this area is based on shallow learning techniques, which require a lot of time,
effort and resources, and whose effectiveness depends on the expertise and extent of knowledge of the researchers in the
field [10].
Network Intrusion Detection using Machine Learning (ML) and Deep Learning (DL) is one of the most significant
developments in the field of information security. There is a competition among researchers, leading companies, and
economies to advance Deep Learning and Artificial Intelligence. In some applications, such as modern mobile applications,
stock prediction and movie-rating prediction, Artificial Intelligence has exceeded human performance.
Although DL and ML have accomplished a lot in detecting network attacks, there are still areas where effectiveness is
lacking. The precision, accuracy and overall performance of the algorithms that classify these attacks, so that they can be
prevented, can still be improved.
As noted above, network traffic volumes have grown enormously and computer networks have expanded over the last
decade, especially with the emergence of new devices and services like cloud computing and the Internet of Things (IoT);
as a result, attacks on networks have increased significantly worldwide [34]. Malware, spear-phishing and
ransomware top the list of cybersecurity threats. Beyond those, many other network intrusion attacks such as denial of
service, zero-day attacks and advanced persistent threats (APT) have been reported as significant problems for today's
global information technology community. APTs can be particularly dangerous and costly, as they are powerful attacks launched by
malicious actors against government and private organizations with the intent of causing great damage.
Objective of this project
The main objective of this study is to evaluate the effectiveness of machine learning models using various
performance metrics. The performance metrics used in this study are accuracy, precision, recall and F1-score. The
goal of this study is to test the performance of various machine learning algorithms on the various category-specific subsets
of the realistic evaluation dataset CICIDS-2017.
It was expected that our machine learning model, comprising feature selection using Pearson's correlation coefficient
coupled with these algorithms, would increase the accuracy on the CICIDS2017 dataset. This is the contribution of
this study to the field of applying machine learning to anomaly detection.
Machine Learning and Deep Learning Algorithms
In recent years, Machine Learning and Deep Learning algorithms in anomaly detection
have garnered huge interest [4],[23]. Anomaly-based intrusion detection is essentially a
classification problem and Machine Learning and Deep Learning algorithms have proven to be
useful in Network Intrusion Detection [5],[6].
Machine Learning is a branch of Artificial Intelligence that gives computers the
ability to learn without being explicitly programmed [23]. Deep Learning is an advanced field within
Machine Learning research that simulates the way the human brain analyzes and interprets data.
Deep Learning is essentially an advancement of the Machine Learning process, derived
and formulated from the Artificial Neural Network. It is believed that Deep Learning algorithms
are the most significant breakthrough of the century, which significantly drives applications
towards Artificial Intelligence [11].
Traditional Machine Learning methods used for intrusion detection, such as Support
Vector Machine (SVM), Decision Tree, Linear Regression and the Hidden Markov Model, have
shallow architectures and are not capable of handling intrusion detection in modern data
environments [24]. The idea of Deep Learning was proposed by Hinton [25]; it is a Machine
Learning method based on learning characterizations (representations) of data. Some examples of Deep Learning
algorithms include the Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) and the
Deep Boltzmann Machine (DBM).
Logistic Regression: Logistic regression is a predictive analysis algorithm based on the
concept of probability. It is used for classification problems, typically binary classification,
and uses a logistic function called the sigmoid function for prediction. Although its name makes
it sound like a regression algorithm, logistic regression is a classification algorithm.
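As an illustration of how the sigmoid links a linear score to a class probability, the following minimal sketch fits scikit-learn's LogisticRegression on a small synthetic dataset (a stand-in for the flow features used in this study, not the study's own code) and checks that its predicted probabilities are the sigmoid of the linear score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Small synthetic binary-classification set standing in for flow features/labels.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# The predicted probability of class 1 is the sigmoid of the linear score w.x + b.
scores = X @ clf.coef_.ravel() + clf.intercept_
print(np.allclose(sigmoid(scores), clf.predict_proba(X)[:, 1]))  # True
```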
Kernelized Support Vector Machine (SVM): The Support Vector Machine comprises a set of
supervised learning methods. It is one of the simplest and most common ML algorithms used to
categorize different types of data. It is a non-probabilistic method. It creates one or more
hyperplanes in a high-dimensional space to classify the instances.
It is a powerful model and performs well on a variety of datasets. It has been used to identify
network intrusions quickly and accurately [41]. However, it requires very meticulous and careful
pre-processing of the data and tuning of parameters.
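A minimal sketch of a kernelized SVM in scikit-learn, again on a synthetic stand-in dataset; the StandardScaler step reflects the careful pre-processing mentioned above, and C and gamma are the parameters that typically need tuning.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the flow features; the real study uses CICIDS2017 subsets.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scaling the inputs is part of the careful pre-processing an RBF-kernel SVM needs.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_train, y_train)
print(svm.score(X_test, y_test))
```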
K-Nearest Neighbor: KNN is a classification algorithm based on the Standard Euclidean
Distance (SED) between two points in the same space [8]. It is a very simple and easy-to-implement
algorithm, and there is no need to build a model or optimize parameters. However,
the algorithm becomes very slow as the number of examples or variables increases.
The two important parameters in the KNN algorithm are the number of neighbors and the way
the distance between data points is measured. The default is the Euclidean distance,
which generally works well.
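A short KNN sketch in scikit-learn on a synthetic stand-in dataset, highlighting the two parameters mentioned above (number of neighbors and the distance metric).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# The two key parameters: number of neighbours and the distance metric
# (Euclidean distance is the default behaviour).
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)          # "training" only stores the examples
print(knn.score(X_test, y_test))   # prediction searches the stored examples
```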
Naive Bayes: The Naive Bayes algorithm is a supervised learning algorithm based on Bayes'
theorem, which assumes conditional independence between every pair of features given the value
of the class variable. It is an easy algorithm to implement, but it requires the predictors to be
independent. Since in most realistic cases the predictors are dependent, the performance of
the classifier is affected negatively. Naive Bayes classifiers are efficient because they learn
parameters by looking at each feature individually and collecting per-feature statistics.
There are three classes of Naive Bayes classifiers implemented in scikit-learn:
BernoulliNB, MultinomialNB and GaussianNB. For this study, GaussianNB was used because it
can be applied to any continuous data [31].
The dataset used in this study is comparatively high-dimensional, and GaussianNB is mostly used
on very high-dimensional data. The GaussianNB model requires very little training time and
makes fast predictions.
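A brief GaussianNB sketch on a synthetic stand-in dataset; the per-class, per-feature mean and variance it estimates are what make training fast and suitable for continuous data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# GaussianNB fits a per-class mean and variance for every feature independently,
# which is why training is fast and works with continuous-valued features.
gnb = GaussianNB()
gnb.fit(X_train, y_train)
print(gnb.score(X_test, y_test))
```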
Decision tree: A decision tree is a supervised ML algorithm used to classify data. The
architecture of a decision tree comprises leaf (category) nodes, internal nodes and a root node.
Decision trees are the building blocks of Random Forest. They are simple and easy to
implement and can handle high-dimensional data. One advantage of running Random
Forest (RF) is that fewer parameters have to be specified compared with other machine learning
methods such as Support Vector Machines (SVM) and Artificial Neural Networks (ANN) [28].
Decision trees make their decisions (classifications) by learning a hierarchy of if/else questions. In
the language of Machine Learning, these if/else questions are known as 'tests'. To build a tree,
the algorithm searches over all possible tests and finds the one that is most informative about the target variable
[31]. The main downside of decision trees is that they tend to overfit the training data and
generalize poorly.
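A small sketch, using a synthetic stand-in dataset, of how limiting the depth of a scikit-learn DecisionTreeClassifier is one common way to curb the overfitting described above; the depth value is illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# An unrestricted tree keeps asking if/else questions until its leaves are pure,
# which is where overfitting comes from; limiting the depth is one way to curb it.
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X_train, y_train)
print(tree.score(X_train, y_train), tree.score(X_test, y_test))
```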
Random Forest (RF)
The random forest classifier was proposed by Breiman [29]. It is essentially an extension of the decision
tree concept, constructed from many decision trees. It takes thousands of input
variables without deleting variables and classifies them based on their importance [28]. It is an
ensemble of classification trees.
In a random forest, the collection of individual tree-structured classifiers can be expressed
mathematically as
{ h(x, Θ_k), k = 1, 2, … } [30]
where h represents the RF classifier, the {Θ_k} are independent, identically distributed random
vectors, and each tree casts a unit vote for the most popular class at input variable x.
Figure 1. Structure of a decision tree.
As discussed above, a random forest is basically a collection of decision trees, where the trees are slightly different from one
another. Random forests address the tendency of individual decision trees to overfit the training data, making the random
forest a strong classifier.
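A minimal RandomForestClassifier sketch on a synthetic stand-in dataset; n_estimators sets how many decision trees make up the ensemble, and the forest's prediction is the majority vote of its trees, matching the formulation above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# n_estimators is the number of decision trees in the ensemble; each tree sees a
# bootstrap sample of the data, and the forest votes for the most popular class.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))
```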
Methodology
Introduction
The data for this study is secondary data, i.e., collected by other researchers. It was
generated by researchers from the Canadian Institute for Cybersecurity [1] and is designed to be
realistic.
Figure 2. Flow Chart of the method used in the calculation.
In this study, a small subset of data from CICIDS2017 was taken to optimize a
Machine Learning model that can help detect the attacks listed in Table 1. The dataset
comprises attacks captured using CICFlowMeter [16], with timestamp, source and destination
IPs, source and destination ports, protocols and type of attack.
Hardware and Software Environment
Operating System : Windows 10 Home
Processor : AMD Ryzen 5 3600 6-Core Processor, 3.6 GHz
Installed RAM: 16.0 GB
Startup Disk : Macintosh HD
Software Environment: Python 3.9.4 64-bit
Design of the Study
The study is computational in nature. Our model uses the Pearson correlation
coefficient as the feature elimination technique and various supervised Machine Learning
classifiers for performing classification. The Python libraries used in the study
are scikit-learn, NumPy, pandas, Keras, matplotlib, TensorFlow, and PyTorch.
The calculation was performed in a Jupyter Notebook using Python. In order to perform
the calculation, the required Python libraries were first imported. Then the dataset was imported
and analyzed. As with every dataset, missing data must be handled and
appropriate features selected. The scikit-learn library comes in very handy for these tasks
in Python.
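As a rough sketch of this first step, the snippet below loads one of the CICIDS2017 CSV files with pandas and removes missing and non-finite values; the file name is illustrative, and the column-name stripping reflects a common quirk of these files rather than a prescribed procedure.

```python
import numpy as np
import pandas as pd

# Illustrative file name; the actual CSV names come from the CICIDS2017 download.
df = pd.read_csv("Wednesday-workingHours.pcap_ISCX.csv")
df.columns = df.columns.str.strip()          # CICIDS2017 headers often carry stray spaces

# Handle missing / non-finite values before feature selection and training.
df = df.replace([np.inf, -np.inf], np.nan).dropna()

print(df.shape)
print(df["Label"].value_counts())            # benign vs. attack classes in this file
```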
Feature selection is a very important task, as it helps reduce computational complexity
and eliminates unnecessary and irrelevant features while enhancing the performance of the IDS [34],
[35],[38]. Correlation-based feature selection has been found to improve classification accuracy
and reduce the dimensionality of the dataset [36],[37]. The correlation function available in the
Python libraries is used to obtain the correlation matrix between features. A correlation coefficient is a
measure of the degree to which variation in one variable is related to variation in one or more other
variables [32],[34].
The value of the correlation coefficient can range from -1 to 1. A value close to +1 indicates a very
strong positive relationship between the variables, and a value close to -1 indicates a very strong
negative relationship. The sign of the correlation thus shows the direction of the relationship between
variables [33], and its magnitude tells us how strong that relationship is. In the case of continuous
variables, if two features are highly correlated, they contribute the same information to the target
result, so one of them can be dropped and appropriate selection of features can be done.
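The following sketch, assuming the DataFrame df from the previous step, computes the Pearson correlation matrix with pandas and drops one feature from every highly correlated pair; the 0.9 threshold is an illustrative choice, not the exact rule used in this study.

```python
# Assumes `df` holds the cleaned flow records with a "Label" column (as loaded above).
features = df.drop(columns=["Label"]).select_dtypes(include="number")

# Pearson correlation matrix between all pairs of numeric features.
corr = features.corr(method="pearson")

# Illustrative rule: drop one feature of every pair whose |correlation| exceeds 0.9,
# since highly correlated features contribute the same information to the target.
threshold = 0.9
to_drop = set()
cols = corr.columns
for i in range(len(cols)):
    for j in range(i + 1, len(cols)):
        if abs(corr.iloc[i, j]) > threshold:
            to_drop.add(cols[j])

selected = [c for c in cols if c not in to_drop]
print(len(selected), "features retained")
```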
Figure 3. Scatter plot showing how the value of correlation coefficient defines the relationship
between attributes.
Figure 4. Setup for Pearson's correlation coefficient in a Jupyter Notebook.
Figure 5. Pearson’s correlation plot showing features considered for this study.
After the analysis of the Pearson’s correlation plot, the final 14 features that were selected were:
1. Total Flow Duration
2. Total Forward Packets
3. Total Length of Forward Packets
4. Forward Packet’s maximum length
5. Forward Packet’s minimum length
6. Forward Packet’s mean length
7. Backward Packet maximum length
8. Backward Packet minimum length
9. Flow Bytes per second
10. Flow Packets per second
11. Backward Packets per second
12. Minimum Packet Length
13. Initial Window Bytes (Forward)
14. Initial Window Bytes (Backward)
Description of each feature selected
Total Flow Duration: The total duration of the flow in microseconds.
Total Forward Packets: The total number of packets in the forward direction.
Total Length of Forward Packets: The total size of packets in the forward direction.
Forward Packet's maximum length: The maximum size of a packet in the forward direction.
Forward Packet's minimum length: The minimum size of a packet in the forward direction.
Forward Packet's mean length: The mean size of a packet in the forward direction.
Backward Packet maximum length: The maximum size of a packet in the backward direction.
Backward Packet minimum length: The minimum size of a packet in the backward direction.
Flow Bytes per second: The number of flow bytes per second.
Flow Packets per second: The number of flow packets per second.
Backward Packets per second: The number of backward packets per second.
Minimum Packet Length: The minimum length of a packet.
Initial Window Bytes (Forward): The total count of bytes sent in the initial window in the
forward direction.
Initial Window Bytes (Backward): The total count of bytes sent in the initial window in the
backward direction.
This process helped in selecting appropriate features for the study. Usually, cluster
analysis is done to serve this purpose in unsupervised studies [43],[45], but this study
examines the performance of supervised algorithms.
After feature selection, the datasets were imported and, using scikit-learn's train_test_split
function, the data was split into an 80 % training set and a 20 % test set. After this, the classifier's
(i.e., the machine learning model's) parameters were defined, and the model was trained on the
training set and then tested on the test set. The predictions were inspected through a confusion matrix. A
classification report was generated for each dataset and algorithm, showing the traffic
classified into 'BENIGN' and the attack type or types, along with various other metrics such as precision,
recall, F1-score and support for further analysis and conclusions.
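A condensed sketch of this pipeline, assuming df and the selected feature list from the previous steps; the random forest here stands in for whichever classifier is being evaluated.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Assumes `df` and `selected` come from the previous steps.
X = df[selected]
y = df["Label"]

# 80 % / 20 % split as used in the study.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)  # any of the classifiers above
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))  # precision, recall, F1-score, support per class
```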
Performance Metrics
As discussed above, in order to measure the performance of the machine learning
algorithms, we use metrics such as accuracy, precision, recall, and F1-score. The
performance indicators used for classification problems are based on the following four
possibilities:
True Positive (TP): correct classification of attack packets as attacks.
True Negative (TN): correct classification of normal packets as normal.
False Positive (FP): normal activity that is wrongly labeled as intrusive by the IDS.
False Negative (FN): intrusive activity that is classified as normal.
Accuracy, precision, recall and F1-score are defined as follows:
Accuracy: The accuracy rate is the main prediction indicator for the various machine and deep
learning classifiers. It is simply the measure of how correctly the model classifies.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
where TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative.
Precision: It is the ratio of correctly identified positive observations to all the predicted positive
observations. In other words, Precision measures the number of correct instances retrieved
divided by all retrieved instances [39].
Precision = TP / (TP + FP)
The precision is intuitively the ability of the classifier not to label as positive a sample that
is negative [40].
Recall: Recall is the ratio of correctly identified positive cases to all the observed cases. In
other words, recall measures the number of correct instances retrieved divided by all correct
instances [39].
Recall = TP / (TP + FN), where TP is the number of true positives and FN the number of
false negatives. Recall is intuitively the ability of the classifier to find all the positive samples
[40].
F1-score: It is the harmonic mean of precision and recall. It is needed when we want to find a
balance between precision and recall.
F1-score = 2 × (Precision × Recall) / (Precision + Recall)
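As a small worked example with illustrative counts (not results from this study), the four metrics can be computed directly from TP, TN, FP and FN:

```python
# Illustrative counts only (not results from this study).
TP, TN, FP, FN = 90, 95, 10, 5

accuracy  = (TP + TN) / (TP + TN + FP + FN)          # 185 / 200 = 0.925
precision = TP / (TP + FP)                           # 90 / 100  = 0.90
recall    = TP / (TP + FN)                           # 90 / 95  ~= 0.947
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```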
The CICIDS-2017 Dataset
CIC-IDS2017 contains benign traffic and common attacks, closely resembling true real-world data [1]. It
also includes the results of network traffic analysis with CICFlowMeter, with flows labeled based on
timestamp, source and destination IPs, source and destination ports, protocols and attack
(CSV files) [1].
Table 1. Types of Intrusion in the CICIDS-2017 dataset
No. | Group of intrusion | Type of Intrusion
1 | Normal | Benign
2 | Denial of Service (DoS) | Botnet, DDoS, DoS GoldenEye, DoS Hulk, DoS Slowhttp, DoS Slowloris
3 | Password attack | FTP-Patator, SSH-Patator, Web-Attack-Brute-Force
4 | Probing | Port Scan
5 | Vulnerability | Heartbleed Attack, Infiltration, Web-Attack-SQL-Injection, Web-Attack-XSS
The data was captured between July 3, 2017 and July 7, 2017, for a total of 5 days. The
implemented attacks include Brute Force FTP, Brute Force SSH, DoS, Heartbleed, Web Attack,
Infiltration, Botnet and DDoS [1]. CICIDS2017 is a very large dataset containing approximately
3 million network flows spread across different files [1],[27]. CICIDS2017 does not specify
training or test sets to be used in experiments. For this study, only 10 % of the dataset was
selected for training and testing in order to keep training and testing times manageable; otherwise
they would have been very lengthy. In addition, the computer used for this study ran out of memory
when larger portions of the dataset were used for the calculations. The 10 % subset was
selected randomly using sampling without replacement, to ensure the diversity of traffic records and
avoid overfitting.
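A brief pandas sketch of this sampling step, assuming df holds one of the daily CSV files loaded as above; sample with replace=False draws the 10 % subset without replacement.

```python
# Assumes `df` is one of the CICIDS2017 daily CSV files loaded as shown earlier.
# pandas samples without replacement when replace=False (the default).
subset = df.sample(frac=0.10, replace=False, random_state=42)
print(len(subset), "of", len(df), "flows kept")
```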
The CICIDS2017 collection lists datasets under different categories. There are eight different
categories of datasets within the main folder containing the datasets. The objective was to
perform the study on each dataset separately, so instead of combining these different files into
one, the machine learning study was performed on each category of dataset separately. However,
some datasets were excluded from the study: the 'Monday-WorkingHours.pcap' dataset, as it
contained only normal benign traffic, and 'Friday-WorkingHours-Morning.pcap_ISCX', because it
presented a one-class problem. The detailed study and the results obtained from the classification
are presented in the next section of this paper.