CHAPTER 1

INTRODUCTION
The advancement of fifth-generation (5G) mobile communication technology
has led to the diversification of access environments and the establishment of
distributed networks, enabling the transmission of various types of data through
network systems. These data, originating from sensors, computers, and the
Internet of Things (IoT), are now processed more efficiently due to the
expanded capacity of network systems. However, the increased diversity of
access points has also expanded the attack surface, making network systems
more susceptible to potential threats. Furthermore, cyber-attack techniques
have evolved to become more intricate and frequent, underscoring the critical
importance of cybersecurity. Consequently, numerous studies are actively
being conducted to mitigate potential network threats. A key challenge in
cybersecurity lies in the identification of network threats, with various findings
emerging in the realm of network intrusion detection systems (NIDSs). Recent
studies have predominantly focused on integrating artificial intelligence (AI)
technology into NIDS, resulting in significant advancements in performance.
Initially, research efforts concentrated on applying traditional machine learning
models like decision trees (DTs) and support vector machines (SVMs) to
existing intrusion detection systems, with current emphasis shifting towards
deep learning methodologies such as convolutional neural networks (CNNs),
long short-term memory (LSTM) networks, and autoencoders. While these approaches
have shown promising results in anomaly detection, challenges persist in their
practical deployment in real-world systems.
The majority of network flow data is typically normal traffic, with rare
occurrences of malicious behavior that can lead to service failure. Furthermore,
within the realm of malicious behavior, most of the data consists of well-
known attacks, while specific types of attacks are exceptionally uncommon.
This data imbalance poses a challenge for AI models deployed in Network
Intrusion Detection Systems (NIDS), as they struggle to adequately learn the
characteristics of specific network threats. Consequently, this can leave
network systems vulnerable to attacks due to poor detection performance.
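To make the imbalance concrete, the minimal Python sketch below measures it and then applies naive random oversampling, the simplest baseline that GAN-based generation is meant to improve on. The class names and counts are hypothetical and are not taken from any data set used in this study.

```python
import random
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the data set as a fraction."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class has as
    many samples as the majority class. This is a naive baseline, not a GAN:
    it only copies existing samples and adds no new information."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


# Hypothetical labels: mostly normal traffic, some known attacks, very few rare ones.
labels = ["normal"] * 950 + ["dos"] * 45 + ["u2r"] * 5
samples = list(range(len(labels)))  # stand-ins for feature vectors
print(imbalance_ratio(labels))  # 190.0
```

Copying the five rare samples until they number 950 tends to make a model memorize them rather than learn the attack class, which is the gap that plausible synthetic samples are intended to fill.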

To tackle this inherent problem, our study introduces a novel AI-based
NIDS that effectively addresses the data imbalance issue and enhances the
performance of previous systems. To overcome this challenge, we utilized a
state-of-the-art deep learning architecture known as generative adversarial
networks (GANs) to generate synthetic network traffic data. Specifically, we
focused on a GAN architecture that incorporates reconstruction error and
Wasserstein distance, enabling the generation of plausible synthetic data for
minor attack traffic. By combining this generative model with anomaly
detection models, we have demonstrated that our proposed system surpasses
previous results in terms of classification performance. The system's
architecture comprises four main stages: preprocessing, generative model
training, autoencoder training, and predictive model training. During the
preprocessing stage, the system refines the raw data set into a format suitable
for deep learning models to learn. Following preprocessing, the system
sequentially trains generative models and an autoencoder model. The trained
generative models are then utilized to train the autoencoder model. Finally, the
system trains predictive models by applying the trained generative models and
the encoder of the trained autoencoder. The generative models are used to
generate scarce data, while the encoder serves as a feature extractor.
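The exact GAN loss is not given in this section, so the sketch below is only an illustration under stated assumptions. It pairs the standard WGAN critic objective, whose negation estimates the Wasserstein distance, with a mean-squared reconstruction-error penalty on the generator side. The weight `recon_weight` and all function names are hypothetical. In a real system, the scores would come from a critic network and the reconstructions from an autoencoder; here they are plain Python lists.

```python
def mean(values):
    return sum(values) / len(values)


def critic_loss(real_scores, fake_scores):
    """WGAN critic loss: the critic minimises E[D(fake)] - E[D(real)],
    i.e. it learns to score real traffic higher than generated traffic."""
    return mean(fake_scores) - mean(real_scores)


def wasserstein_estimate(real_scores, fake_scores):
    """Critic-based estimate of the Wasserstein distance (negated critic loss)."""
    return mean(real_scores) - mean(fake_scores)


def reconstruction_error(originals, reconstructions):
    """Mean squared error over all features of all samples."""
    total, count = 0.0, 0
    for x, x_hat in zip(originals, reconstructions):
        for a, b in zip(x, x_hat):
            total += (a - b) ** 2
            count += 1
    return total / count


def generator_loss(fake_scores, originals, reconstructions, recon_weight=10.0):
    """Assumed generator objective: raise the critic's score on generated
    samples (-E[D(fake)]) while penalising reconstruction error.
    recon_weight is a hypothetical hyperparameter, not a value from the study."""
    return -mean(fake_scores) + recon_weight * reconstruction_error(originals, reconstructions)


# Toy critic scores and one reconstructed feature vector.
print(critic_loss([1.0, 3.0], [0.0, 1.0]))               # -1.5
print(reconstruction_error([[1.0, 2.0]], [[1.0, 4.0]]))  # 2.0
```

A full WGAN also constrains the critic (by weight clipping or a gradient penalty); that part is omitted here.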

Regarding classifier models, we consider three deep learning models
widely used in AI-based NIDS: deep neural networks (DNNs), CNNs, and
LSTM models. To evaluate our system, we conducted experiments using four
network flow data sets that encompass different scenarios: NSL-KDD,
UNSW-NB15, an IoT data set, and a real-world data set. Through these
experiments, we demonstrate that the proposed system outperforms previous
results. Additionally, we show how our methodology can enhance the
performance of existing AI-based NIDS by addressing the data imbalance
problem.

1.1 PROBLEM STATEMENT

Despite advancements in communication and information sharing,
certain drawbacks have emerged. The importance of information security has
become paramount, especially with the billions of transactions taking place
online. A single second of network failure can result in millions in losses for an
organization. Various hacking techniques are being employed to breach client
servers, and network intrusion is among the most prevalent forms of cyber-attack
today. Hackers often use botnets, which can disrupt almost any network and
pose a significant threat on the Internet.

1.2 OBJECTIVE OF THE PROJECT

Most of the techniques used in modern IDSs cannot manage the
flexible and complex environment of Internet attacks on computer networks.
Deep learning methods offer acceptable computational and communication
costs.

This study describes how deep learning can be used to identify intruders.
This helps prevent related attacks from interfering with the network.
The model can also achieve real-time identification through dimensionality
reduction and a simple classifier. This study focuses on the following points:

• Selecting the appropriate algorithm for each task depending on the
data types, data size, network behavior, and requirements.

• Implementing a well-structured development process by preparing and selecting
benchmark data sets to build a promising NIDS.

• Applying data acquisition, analysis, modeling, and key feature engineering,
combining several processing techniques in a deliberate order to achieve the best
accuracy with a compact data representation.
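As one illustration of the feature-engineering point above, the sketch below shows a common preprocessing step for flow records: one-hot encoding a categorical field and min-max scaling numeric fields to [0, 1]. The field names `protocol_type`, `duration` and `src_bytes` follow NSL-KDD naming, but the records and the `preprocess` helper are hypothetical.

```python
def one_hot(values):
    """One-hot encode categorical values; categories are sorted so the
    column order is stable across runs."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0.0] * len(categories)
        vec[index[v]] = 1.0
        vectors.append(vec)
    return categories, vectors


def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (x - lo) / span for x in column]


def preprocess(records, categorical_key, numeric_keys):
    """Hypothetical helper: turn dict records into fixed-length numeric
    vectors (one-hot categorical block followed by scaled numeric fields)."""
    _, encoded = one_hot([r[categorical_key] for r in records])
    scaled = [min_max_scale([r[k] for r in records]) for k in numeric_keys]
    return [cat_vec + [col[i] for col in scaled] for i, cat_vec in enumerate(encoded)]


# Invented records using NSL-KDD-style field names.
records = [
    {"protocol_type": "tcp", "duration": 0, "src_bytes": 100},
    {"protocol_type": "udp", "duration": 10, "src_bytes": 300},
    {"protocol_type": "icmp", "duration": 5, "src_bytes": 200},
]
for row in preprocess(records, "protocol_type", ["duration", "src_bytes"]):
    print(row)
```

In practice, the scaling bounds should be fitted on the training split only and reused on the test split, so that test statistics do not leak into training.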

1.3 SCOPE OF THE PROJECT

Detecting an intrusion can be challenging, but it is possible within a
short period of time. Attacks can originate from thousands of IP addresses,
and tracing them back strengthens Internet security given the numerous threats
posed to servers and networks. One such threat is the Distributed Denial of
Service (DDoS) attack, which aims to overwhelm online servers/services with
high volumes of traffic from various sources, causing servers to become busy
or crash. Attackers also distribute malware to personal computers, enabling
them to remotely control these infected computers as botnets to launch attacks
on their targets. In our proposed system, we use deep learning (DL) models to
predict intrusion attacks and evaluate the performance of these models.

CHAPTER 2

LITERATURE SURVEY

1.TITLE: Machine Learning and Deep Learning Methods for Intrusion
Detection Systems: A Survey

AUTHORS: Hongyu Liu and Bo Lang

ABSTRACT:

Networks play important roles in modern life, and cyber security has
become a vital research area. An intrusion detection system (IDS) which is an
important cyber security technique, monitors the state of software and
hardware running in the network. Despite decades of development, existing
IDSs still face challenges in improving the detection accuracy, reducing the
false alarm rate and detecting unknown attacks. To solve the above problems,
many researchers have focused on developing IDSs that capitalize on machine
learning methods. Machine learning methods can automatically discover the
essential differences between normal data and abnormal data with high
accuracy. In addition, machine learning methods have strong generalizability,
so they are also able to detect unknown attacks. Deep learning is a branch of
machine learning, whose performance is remarkable and has become a research
hotspot. This survey proposes a taxonomy of IDS that takes data objects as the
main dimension to classify and summarize machine learning-based and deep
learning-based IDS literature. We believe that this type of taxonomy
framework is fit for cyber security researchers. The survey first clarifies the
concept and taxonomy of IDSs. Then, the machine learning algorithms
frequently used in IDSs, metrics, and benchmark datasets are introduced. Next,
combined with the representative literature, we take the proposed taxonomic
system as a baseline and explain how to solve key IDS issues with machine
learning and deep learning techniques. Finally, challenges and future
developments are discussed by reviewing recent representative studies.

Merits:

Comprehensive coverage of machine learning and deep learning methods for
IDSs, offering insights into improving detection accuracy and addressing key
cybersecurity challenges.
Proposal of a taxonomy framework enhances understanding and classification,
providing a structured approach for cybersecurity researchers.

Demerits:

Limited scope may overlook emerging intrusion detection techniques beyond
machine learning.

Lack of comparative analysis between machine learning algorithms, hindering
method selection for specific use cases.

2.TITLE: Evaluation of Machine Learning Algorithms for Intrusion Detection
System

AUTHORS: Mohammad Almseidin, Maen Alzubi, Szilveszter Kovacs and
Mouhammd Alkasassbeh

ABSTRACT:

Intrusion detection system (IDS) is one of the implemented solutions
against harmful attacks. Furthermore, attackers always keep changing their
tools and techniques. However, implementing an accepted IDS system is also a
challenging task. In this paper, several experiments have been performed and
evaluated to assess various machine learning classifiers based on the KDD
intrusion dataset. It succeeded in computing several performance metrics in order
to evaluate the selected classifiers. The focus was on false negative and false
positive performance metrics in order to enhance the detection rate of the
intrusion detection system. The implemented experiments demonstrated that
the decision table classifier achieved the lowest value of false negative while
the random forest classifier has achieved the highest average accuracy rate.
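The false-negative and false-positive metrics emphasized here both derive from the binary confusion matrix. The following is a small sketch, assuming "attack" is the positive class; the labels and predictions are invented for illustration.

```python
def binary_metrics(y_true, y_pred, positive="attack"):
    """Confusion-matrix counts and the rates derived from them,
    treating `positive` as the attack class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1   # normal traffic flagged as attack (false alarm)
        elif t == positive:
            fn += 1       # attack missed by the detector
        else:
            tn += 1

    def rate(num, den):
        return num / den if den else 0.0

    return {
        "accuracy": rate(tp + tn, tp + fp + tn + fn),
        "false_positive_rate": rate(fp, fp + tn),
        "false_negative_rate": rate(fn, fn + tp),
        "detection_rate": rate(tp, tp + fn),
    }


# Hypothetical ground truth and predictions for ten flows.
y_true = ["attack"] * 4 + ["normal"] * 6
y_pred = ["attack", "attack", "attack", "normal",
          "normal", "normal", "normal", "normal", "normal", "attack"]
print(binary_metrics(y_true, y_pred))
```

Here one missed attack and one false alarm give 80% accuracy, but the two error types carry different costs in an IDS. This is why the paper reports them separately.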

Merits:

Practical experimentation with a benchmark dataset (KDD) and focus on key
performance metrics to enhance intrusion detection system effectiveness.

Demerits:

Lack of contextualization regarding dataset characteristics and absence of
detailed comparative analysis among classifiers.

3.TITLE: Enhancing Network Intrusion Detection Model Using Machine
Learning Algorithms

AUTHORS: Youngsoo Kim; Jong-Geun Park

ABSTRACT:

After the digital revolution, large quantities of data have been generated
with time through various networks. The networks have made the process of
data analysis very difficult by detecting attacks using suitable techniques.
While Intrusion Detection Systems (IDSs) secure resources against threats,
they still face challenges in improving detection accuracy, reducing false alarm
rates, and detecting the unknown ones. This paper presents a framework to
integrate data mining classification algorithms and association rules to
implement network intrusion detection. Several experiments have been
performed and evaluated to assess various machine learning classifiers based
on the KDD99 intrusion dataset. Our study focuses on several data mining
algorithms, such as naïve Bayes, decision trees, support vector machines,
decision tables, k-nearest neighbor algorithms, and artificial neural networks.
Moreover, this paper is concerned with the association process in creating
attack rules to identify those in the network audit data, by utilizing a KDD99
dataset anomaly detection. The focus is on false negative and false positive
performance metrics to enhance the detection rate of the intrusion detection
system. The implemented experiments compare the results of each algorithm
and demonstrate that the decision tree is the most powerful algorithm as it has
the highest accuracy (0.992) and the lowest false positive rate (0.009).
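Association rules of the kind described above are typically scored by support and confidence. The following is a minimal sketch of those two measures over audit records represented as sets of attribute=value items. The items and records are hypothetical, and rule mining itself (for example, Apriori) is not shown.

```python
def support(records, itemset):
    """Fraction of records that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for r in records if itemset <= r) / len(records)


def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A | C) / support(A),
    where A | C is the union of the two itemsets."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0
    return support(records, set(antecedent) | set(consequent)) / base


# Hypothetical audit records as sets of attribute=value items.
records = [
    {"proto=tcp", "flag=S0", "label=attack"},
    {"proto=tcp", "flag=S0", "label=attack"},
    {"proto=tcp", "flag=SF", "label=normal"},
    {"proto=udp", "flag=SF", "label=normal"},
]
# Candidate rule: flag=S0 -> label=attack
print(support(records, {"flag=S0", "label=attack"}),
      confidence(records, {"flag=S0"}, {"label=attack"}))  # 0.5 1.0
```

A rule is usually kept only if both measures exceed chosen minimum thresholds. For example, the rule proto=tcp -> label=attack has a confidence of only 2/3 on these records.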

Merits:

Integration of data mining algorithms and association rules to
enhance network intrusion detection accuracy using a benchmark dataset (KDD99).

Demerits:

Limited discussion on potential biases in dataset selection and absence
of in-depth analysis on computational efficiency of implemented algorithms.

4.TITLE: Prediction of Denial of Service Attack using Machine Learning
Algorithms

AUTHORS: Hyunjin Kim; Dowon Hong

ABSTRACT:

DDoS attack is one of the significant security threats in today’s Internet
world. The main intention of this network threat is to make resources
unavailable, for example through flooding attacks. Here, machine learning
algorithms have been used for detecting DDoS attacks. Generally, the success of
any algorithm depends on the selection of appropriate data sets and the
identification of attack parameters. The KDD-CUP dataset has been taken for a
detailed investigation of the DDoS attack. The K-nearest neighbor, ID3, Naive
Bayes and C4.5 algorithms are compared on a single platform, concluding in
favor of Naive Bayes. The main objective of the paper is to compare and
predict the error rate, computation time, and accuracy of the algorithms using the
Tanagra tool. Finally, these algorithms have been compared and
verified through experiments and graphical representation.
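Of the algorithms compared above, K-nearest neighbor is the simplest to state: a new flow takes the majority label of its k closest training flows. The following is a minimal sketch using Euclidean distance. The two features (imagine a normalized packet rate and mean packet size) and all values are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    neighbours = sorted(zip(train_x, train_y),
                        key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical normalised features: [packet rate, mean packet size].
train_x = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
           [0.9, 0.8], [0.85, 0.95], [0.95, 0.9]]
train_y = ["normal", "normal", "normal", "ddos", "ddos", "ddos"]

print(knn_predict(train_x, train_y, [0.8, 0.85]))  # ddos
```

Because distances are computed on raw feature values, features must first be scaled to comparable ranges, or one large-valued feature will dominate the vote.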

Merits:

Comparative analysis of machine learning algorithms for DDoS attack
detection using a benchmark dataset (KDD-CUP) with focus on error rate,
computation time, and accuracy.

Demerits:

Limited exploration of other potential algorithms and lack of discussion
on scalability and robustness of the implemented models.

5.TITLE: Prediction of DDoS Attacks using Machine Learning and Deep
Learning Algorithms

AUTHOR: Cheolhee Park

ABSTRACT:

With the emergence of network-based computing technologies like
Cloud Computing, Fog Computing and IoT (Internet of Things), the context of
digitizing the confidential data over the network is being adopted by various
organizations where the security of that sensitive data is considered as a major
concern. Over the past decade there has been massive growth in the usage of the
internet along with technological advancements, creating the need for the
development of efficient security algorithms that could withstand various
patterns of security breaches. The DDoS attack is the most significant network-based
attack in the domain of computer security that disrupts the internet traffic of the
target server. This study mainly focuses on identifying the advancements and
research gaps in the development of efficient security algorithms addressing
DDoS attacks in various ubiquitous network environments.

Merits:

Exploration of security algorithms for DDoS attack prediction in diverse
network environments with emphasis on identifying advancements and
research gaps.

Demerits:

Lack of specific methodology or experimental validation for proposed
algorithms and absence of comparative analysis with existing techniques.

6.TITLE: An Enhanced AI-Based Network Intrusion Detection System Using
Generative Adversarial Networks

AUTHORS: Cheolhee Park; Jonghoon Lee; Youngsoo Kim; Jong-Geun Park;
Hyunjin Kim; Dowon Hong

ABSTRACT:

As communication technology advances, various and heterogeneous data are
communicated in distributed environments through network systems.
Meanwhile, along with the development of communication technology, the
attack surface has expanded, and concerns regarding network security have
increased. Accordingly, to deal with potential threats, research on network
intrusion detection systems (NIDSs) has been actively conducted. Among the
various NIDS technologies, recent interest is focused on artificial intelligence
(AI)-based anomaly detection systems, and various models have been proposed
to improve the performance of NIDS. However, there still exists the problem of
data imbalance, in which AI models cannot sufficiently learn malicious
behavior and thus fail to detect network threats accurately. In this study, we
propose a novel AI-based NIDS that can efficiently resolve the data imbalance
problem and improve the performance of the previous systems. To address the
aforementioned problem, we leveraged a state-of-the-art generative model that
could generate plausible synthetic data for minor attack traffic. In particular,
we focused on the reconstruction error and Wasserstein distance-based
generative adversarial networks, and autoencoder-driven deep learning models.
To demonstrate the effectiveness of our system, we performed comprehensive
evaluations over various data sets and demonstrated that the proposed systems
significantly outperformed the previous AI-based NIDS.

Merits: Introduction of a novel AI-based NIDS utilizing generative adversarial
networks to resolve data imbalance and enhance detection performance.

Demerits: Potential complexity in implementation and potential challenges in
generalizing results across diverse network environments.

7.TITLE: Explaining Network Intrusion Detection System Using Explainable
AI Framework

AUTHORS: Shraddha Mane, Dattaraj Rao

ABSTRACT:

Cybersecurity is a domain where the data distribution is constantly
changing with attackers exploring newer patterns to attack cyber infrastructure.
Intrusion detection system is one of the important layers in cyber safety in
today's world. Machine learning based network intrusion detection systems
started showing effective results in recent years. With deep learning models,
detection rates of network intrusion detection system are improved. The more
accurate the model, the greater its complexity and hence the lower its
interpretability. Deep neural networks are complex and hard to interpret, which
makes them difficult to use in production as the reasons behind their decisions
are unknown. In
this paper, we have used deep neural network for network intrusion detection
and also proposed explainable AI framework to add transparency at every stage
of machine learning pipeline. This is done by leveraging Explainable AI
algorithms which focus on making ML models less of black boxes by
providing explanations as to why a prediction is made. Explanations give us
measurable factors as to what features influence the prediction of a cyberattack
and to what degree. These explanations are generated from SHAP, LIME,
Contrastive Explanations Method, ProtoDash and Boolean Decision Rules via
Column Generation. We apply these approaches to NSL KDD dataset for
intrusion detection system and demonstrate results.
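SHAP and LIME require their own libraries. As a lighter stand-in, the sketch below uses a different, simpler model-agnostic technique: permutation feature importance, which measures how much accuracy drops when one feature's values are shuffled. It is not one of the methods used in the paper, and the toy model and data are hypothetical.

```python
import random


def accuracy(model, xs, ys):
    return sum(1 for x, y in zip(xs, ys) if model(x) == y) / len(ys)


def permutation_importance(model, xs, ys, n_repeats=5, seed=0):
    """Average drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, xs, ys)
    importances = []
    for j in range(len(xs[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in xs]
            rng.shuffle(column)
            shuffled = [x[:j] + [v] + x[j + 1:] for x, v in zip(xs, column)]
            drops.append(baseline - accuracy(model, shuffled, ys))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(x):
    """Hypothetical detector that only looks at feature 0."""
    return "attack" if x[0] > 0.5 else "normal"


xs = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.3], [0.6, 0.9],
      [0.4, 0.2], [0.3, 0.8], [0.2, 0.5], [0.1, 0.6]]
ys = ["attack"] * 4 + ["normal"] * 4
print(permutation_importance(toy_model, xs, ys))
```

Because the toy model ignores feature 1, shuffling it changes nothing, so its importance is exactly zero. Like SHAP and LIME, this explains the model's behavior, not the underlying cause of the attack.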

Merits:

Integration of an explainable AI framework with a deep neural network-based
intrusion detection system enhances interpretability and transparency of
model decisions.

Demerits:

Potential computational overhead and complexity introduced by the
use of multiple explainable AI algorithms may impact deployment scalability.

8.TITLE: Network Intrusion Detection System using Deep Learning

AUTHORS: Lirim Ashiku, Cihan Dagli

ABSTRACT:

The widespread use of interconnectivity and interoperability of computing
systems has become an indispensable necessity to enhance our daily activities.
Simultaneously, it opens a path to exploitable vulnerabilities that go well
beyond human control capability. The vulnerabilities deem cyber-security
mechanisms essential to assume communication exchange. Secure
communication requires security measures to combat the threats and needs
advancements to security measures that counter evolving security threats. This
paper proposes the use of deep learning architectures to develop an adaptive
and resilient network intrusion detection system (IDS) to detect and classify
network attacks. The emphasis is how deep learning or deep neural networks
(DNNs) can facilitate flexible IDS with learning capability to detect recognized
and new or zero-day network behavioral features, consequently ejecting the
system's intruder and reducing the risk of compromise. To demonstrate the
model’s effectiveness, we used the UNSW-NB15 dataset, reflecting real
modern network communication behavior with synthetically generated attack
activities.

Merits: Utilization of deep learning architectures for adaptive network
intrusion detection, enhancing detection capabilities against evolving threats.

Demerits: Potential challenges in scalability and interpretability of deep
learning models for real-world deployment.

9.TITLE: Using Deep Learning Techniques for Network Intrusion Detection

AUTHORS: Sara Al-Emadi; Aisha Al-Mohannadi; Felwa Al-Senaid

ABSTRACT:

In recent years, there has been a significant increase in network intrusion
attacks, which raises great concern from the privacy and security aspects. Due
to the advancement of technology, cyber-security attacks are becoming
so complex that current detection systems are not sufficient
to address this issue. Therefore, an implementation of an intelligent and
effective network intrusion detection system would be crucial to solve this
problem. In this paper, we use deep learning techniques, namely, Convolutional
Neural Networks (CNN) and Recurrent Neural Networks (RNN) to design an
intelligent detection system which is able to detect different network intrusions.
Additionally, we evaluate the performance of the proposed solution using
different evaluation metrics and we present a comparison between the results
of our proposed solution to find the best model for the network intrusion
detection system.
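To show what a convolutional layer computes on a flow record, here is a minimal pure-Python sketch of one 1-D convolution, implemented as cross-correlation as deep learning frameworks do, followed by ReLU and max pooling. The input vector and kernel are hypothetical. In a trained CNN the kernel weights are learned, and the RNN variant is not shown.

```python
def conv1d(signal, kernel, bias=0.0):
    """'Valid' 1-D cross-correlation: slide the kernel over the input
    without padding, as a Conv1D layer with stride 1 does."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]


def relu(values):
    return [max(0.0, v) for v in values]


def max_pool(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Hypothetical scaled flow features and a hand-picked 2-tap kernel
# (a trained CNN would learn these weights).
features = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
kernel = [1.0, -1.0]
feature_map = conv1d(features, kernel)   # [-1.0, -1.0, -1.0, 1.0, 1.0]
print(max_pool(relu(feature_map)))        # [0.0, 1.0]
```

This kernel responds wherever consecutive features decrease. A real layer applies many such learned kernels in parallel and stacks the resulting feature maps.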

Merits:

Integration of deep learning techniques, CNN and RNN, for improved
network intrusion detection, enhancing accuracy and adaptability to complex
attack patterns.

Demerits:

Potential challenges in training and fine-tuning deep learning models
for optimal performance, requiring significant computational resources and
expertise.

10.TITLE: Intelligent Intrusion Detection System Using Deep Learning
Models

AUTHORS: Sumit Varshney; Shikha; Shefali Singhi; Bharti Sharma

ABSTRACT:

Cyber attacks are a very common issue in the modern world, and
since there is a growing array of challenges in accurately detecting intrusion,
this results in damage to security services, i.e. confidentiality, integrity, and
availability of data. Attackers find new types of attacks day by day, so each
type of attack should first be analyzed properly with the help of an IDS in order
to prevent such attacks. Signature-based (SIDS) and anomaly-based (AIDS)
intrusion detection systems are separate proposed methods of intrusion
detection to manage security threats. This paper has reviewed numerous deep
learning algorithms that have been proposed to detect intrusion, i.e.,
Convolutional Neural Network, Recurrent Neural Network, Restricted
Boltzmann Machine, Deep Belief Network and Autoencoder. It proposes an
IDS approach based on a deep learning (DL) algorithm, drawing on
comparisons of the literature and on expertise in intrusion
detection and deep learning algorithms.
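Of the algorithms listed, the autoencoder is commonly used for anomaly detection by thresholding reconstruction error. A model trained on normal traffic reconstructs normal flows well and attacks poorly. The sketch assumes the reconstruction errors have already been computed by some trained autoencoder (not shown). It uses a mean-plus-three-standard-deviations rule, which is one common heuristic rather than a rule taken from the reviewed paper. All numbers are hypothetical.

```python
import statistics


def mse(x, x_hat):
    """Mean squared reconstruction error of one feature vector."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(normal_errors, n_std=3.0):
    """Threshold = mean + n_std * (population) standard deviation of the
    reconstruction errors observed on normal traffic."""
    return statistics.mean(normal_errors) + n_std * statistics.pstdev(normal_errors)


def flag_anomalies(errors, threshold):
    """True where a sample's reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]


# Hypothetical errors from an autoencoder trained only on normal traffic.
normal_errors = [0.01, 0.02, 0.03]
threshold = fit_threshold(normal_errors)
print(flag_anomalies([0.02, mse([1.0, 2.0], [1.0, 4.0])], threshold))  # [False, True]
```

The choice of `n_std` trades false alarms against missed attacks, so it is usually tuned on a validation set.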

Merits:

Review and integration of various deep learning algorithms for
intrusion detection, enhancing detection accuracy and adaptability to evolving
attack patterns.

Demerits:

Potential challenges in selecting the most suitable deep learning
algorithm and fine-tuning parameters for optimal performance in different
network environments.

CHAPTER 3

SYSTEM ANALYSIS

3.1 EXISTING SYSTEM


The current system employs various strategies to combat intrusions,
such as CAPTCHA puzzles, which provide a straightforward method
for mitigating attacks. However, recent studies have revealed that this approach
is ineffective. Additionally, a digital signature model utilizing meta-heuristic
methods was developed for analyzing network flow to detect abnormal traffic,
resulting in improved accuracy in DDoS detection. Nevertheless, this model
was unable to identify conventional DoS attacks. Another method, SeVen, relies on
Adaptive Selective Verification to address network-layer intrusions.
While this technique operates on the concept of state, it cannot
handle application-layer intrusions, which lack a notion of state.
Consequently, this mechanism is vulnerable to HTTP POST flooding attacks, in
which a large number of reflectors are used to transmit payloads. To address these
limitations, a machine learning approach is recommended.
3.2 DISADVANTAGES OF EXISTING SYSTEM
• Accuracy is low.
• The segmentation methods have shortcomings.
• Feature extraction is not accurate.
• Computational load is very high.

3.3 OBJECTIVE OF EXISTING SYSTEM:


The existing system employs various strategies to combat intrusions,
such as CAPTCHA puzzles, which provide a simple way to thwart
attacks. However, recent studies have revealed that this approach is ineffective.
In response, a digital signature model utilizing meta-heuristic methods has been
developed to analyze network flow and detect abnormal traffic, resulting in
improved accuracy in DDoS detection. However, this model is unable to detect
conventional DoS attacks. Another method, SeVen, relies on Adaptive Selective
Verification for network-layer intrusion detection. While effective at the
network layer, it cannot detect application-layer intrusions because it
lacks a notion of state, making it vulnerable to HTTP POST flooding
attacks. To address these limitations, a machine learning approach is
recommended.
3.4 PROPOSED SYSTEM
The proposed framework introduces a cutting-edge network intrusion
detection system that utilizes Deep learning techniques. It combines various
machine learning models such as ANN, CNN, and LSTM to improve the
identification of threats. The main objective of this system is to harness the
strengths of multiple algorithms and address the weaknesses of individual
models. By leveraging Deep learning, it aims to enhance detection accuracy,
adaptability to evolving threats, and resilience against adversarial attacks. The
system prioritizes ensemble diversity and consensus decision-making to
minimize false positives and effectively handle complex network behaviors.
Ultimately, the goal is to develop a robust, versatile, and collaborative system
that can proactively identify and counter emerging cyber threats in intricate
network environments.
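The section does not specify how the ANN, CNN and LSTM outputs are combined. Two standard options for the consensus step are hard (majority) voting over predicted labels and soft voting over predicted class probabilities. The following is a minimal sketch of both, with hypothetical model outputs.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote over per-model labels; ties go to the label that
    appears first in the list."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def soft_vote(probabilities):
    """Average per-model class probabilities; return (best class, averages)."""
    classes = probabilities[0].keys()
    avg = {c: sum(p[c] for p in probabilities) / len(probabilities) for c in classes}
    return max(avg, key=avg.get), avg


# Hypothetical outputs of the ANN, CNN and LSTM for one flow.
probs = [{"attack": 0.90, "normal": 0.10},   # ANN
         {"attack": 0.40, "normal": 0.60},   # CNN
         {"attack": 0.45, "normal": 0.55}]   # LSTM
labels = [max(p, key=p.get) for p in probs]  # ['attack', 'normal', 'normal']

print(hard_vote(labels))     # normal
print(soft_vote(probs)[0])   # attack
```

The two rules disagree here: soft voting lets one confident model outweigh two hesitant ones, so the choice of rule directly affects the false-negative and false-positive balance.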
3.5 PROPOSED SYSTEM ADVANTAGES
• Improved Precision: Combining multiple deep learning models enhances
detection accuracy.
• Enhanced Resilience: The combination of diverse models helps to reduce
individual vulnerabilities, thus enhancing the overall system's robustness.
• Minimized Overfitting: Ensemble methods are frequently used to reduce
overfitting, thereby improving the system's ability to generalize.
• Superior Adaptability: Diverse models can adjust to emerging threats by
contributing a variety of perspectives for comprehensive threat detection.

CHAPTER 4
SOFTWARE SPECIFICATION

These are the requirements for carrying out the project. Without these
tools and software, the project cannot be done. The requirements fall into
two categories:

1. Hardware Requirements.

2. Software Requirements.

SYSTEM REQUIREMENTS

4.1 HARDWARE REQUIREMENTS

The hardware requirements may serve as the basis for a contract for the
implementation of the system and should therefore be a complete and
consistent specification of the whole system. They are used by software
engineers as the starting point for the system design. It shows what the system
does and not how it should be implemented.

PROCESSOR : Intel Core i5
RAM : 4 GB
HARD DISK : 500 GB
4.2 SOFTWARE REQUIREMENTS

The software requirements document is the specification of the system.
It should include both a definition and a specification of requirements. It is a
set of what the system should do rather than how it should do it. The software
requirements provide a basis for creating the software requirements
specification. It is useful in estimating cost, planning team activities,
performing tasks, and tracking the team’s progress
throughout the development activity.

PYTHON IDE : Anaconda, Jupyter Notebook

PROGRAMMING LANGUAGE : Python

ANACONDA

It is a free and open-source distribution of the Python and R
programming languages for scientific computing (data science, machine
learning applications, large-scale data processing, predictive analytics, etc.)
that aims to simplify package management and deployment.

Anaconda distribution comes with more than 1,500 packages as well as
the Conda package and virtual environment manager. It also includes a
GUI, Anaconda Navigator, as a graphical alternative to the Command Line
Interface (CLI).

The big difference between Conda and the pip package manager is in how
package dependencies are managed, which is a significant challenge for Python
data science and the reason Conda exists. Pip installs all Python package
dependencies required, whether or not those conflict with other packages you
installed previously.

So your working installation of, for example, Google TensorFlow can suddenly
stop working when you pip install a different package that needs a different
version of the NumPy library. More insidiously, everything might still appear
to work but now you get different results from your data science, or you are
unable to reproduce the same results elsewhere because you didn't pip install in
the same order.

Conda analyzes your current environment, everything you have installed, any
version limitations you specify (e.g. you only want tensorflow >= 2.0) and
figures out how to install compatible dependencies. Or it will tell you that what
you want can't be done. Pip, by contrast, will just install the thing you wanted
and any dependencies, even if that breaks other things. Open source packages
can be individually installed from the Anaconda repository, Anaconda Cloud
(anaconda.org), or your own private repository or mirror, using the conda
install command. Anaconda Inc compiles and builds all the packages in the
Anaconda repository itself, and provides binaries for Windows 32/64 bit, Linux
64 bit and macOS 64-bit. You can also install anything on PyPI into a Conda
environment using pip, and Conda knows what it has installed and what pip has
installed. Custom packages can be made using the conda build command, and
can be shared with others by uploading them to Anaconda Cloud, PyPI or other
repositories.

The default installation of Anaconda2 includes Python 2.7 and
Anaconda3 includes Python 3.7. However, you can create new environments
that include any version of Python packaged with Conda. Anaconda Navigator
is a desktop Graphical User Interface (GUI) included in Anaconda distribution
that allows users to launch applications and manage conda packages,
environments and channels without using command-line commands. Navigator
can search for packages on Anaconda Cloud or in a local Anaconda Repository,
install them in an environment, run the packages and update them. It is
available for Windows, macOS and Linux.

The following applications are available by default in Navigator :

• JupyterLab
• Jupyter Notebook
• QtConsole
• Spyder
• Glueviz
• Orange
• RStudio
• Visual Studio Code

Microsoft .NET is a set of Microsoft software technologies for rapidly building
and integrating XML Web services, Microsoft Windows-based applications,
and Web solutions. The .NET Framework is a language-neutral platform for
writing programs that can easily and securely interoperate. There’s no language
barrier with .NET: numerous languages are available to the developer,
including Managed C++, C#, Visual Basic and JScript. The .NET
framework provides the foundation for components to interact seamlessly,
whether locally or remotely on different platforms. It standardizes common
data types and communications protocols so that components created in
different languages can easily interoperate.

“.NET” is also the collective name given to various software components built
upon the .NET platform. These include both products (Visual Studio .NET and
Windows .NET Server, for instance) and services (such as Passport and .NET My
Services).

Microsoft Visual Studio is an Integrated Development Environment (IDE) from
Microsoft. It is used to develop computer programs, as well as websites, web
apps, web services and mobile apps.

Python is a powerful multi-purpose programming language created by Guido
van Rossum. Its simple, easy-to-use syntax makes it a good choice for someone
learning computer programming for the first time. Python's features are:

• Easy to code
• Free and open source
• Object-oriented language
• GUI programming support
• High-level language
• Extensible
• Portable
• Integrated
• Interpreted
• Large standard library
• Dynamically typed

PYTHON

Features of Python:

1. Easy to code:
Python is a high-level programming language that is very easy to learn
compared with languages such as C, C#, JavaScript and Java. It is easy to code
in Python, and anybody can learn the basics of the language in a few hours or
days. It is also a developer-friendly language.

2. Free and Open Source:
Python is freely available from its official website (python.org). Since it is
open source, its source code is also available to the public, so you can
download it, use it and share it.

3. Object-Oriented Language:
One of the key features of Python is object-oriented programming. Python
supports object-oriented concepts such as classes, objects, inheritance and
encapsulation.
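As a minimal illustration of classes, objects, inheritance and encapsulation, the sketch below models a network packet. The class names `Packet` and `UdpPacket` and their fields are hypothetical and are not part of the project code.

```python
class Packet:
    """Hypothetical example class: a network packet with a protocol and size."""

    def __init__(self, protocol, size_bytes):
        self.protocol = protocol
        self._size_bytes = size_bytes  # leading underscore: internal by convention

    @property
    def size_bytes(self):
        # Read-only access to the encapsulated attribute (no setter is defined)
        return self._size_bytes

    def is_large(self, threshold=1000):
        """Return True if the packet is larger than `threshold` bytes."""
        return self._size_bytes > threshold


class UdpPacket(Packet):
    """Inheritance: a UDP packet is a Packet with a fixed protocol name."""

    def __init__(self, size_bytes):
        super().__init__("udp", size_bytes)


p = UdpPacket(1500)
print(p.protocol, p.size_bytes, p.is_large())  # udp 1500 True
```

Exposing `size_bytes` through a property keeps the stored value encapsulated: callers can read it, but an assignment such as `p.size_bytes = 5` raises `AttributeError`.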

4. GUI Programming Support:
Graphical user interfaces can be built in Python using modules such as PyQt5,
PyQt4, wxPython or Tk (available through the standard tkinter module).
PyQt5 is one of the most popular options for creating graphical apps with Python.

5. High-Level Language:
Python is a high-level language. When we write programs in Python, we do not
need to remember the system architecture, nor do we need to manage memory
manually.

6. Extensible Feature:
Python is an extensible language: parts of a program can be written in C or
C++, compiled, and then used from Python code.

7. Python is a Portable Language:
Python is also a portable language. For example, Python code written on
Windows can usually run on other platforms such as Linux, Unix and macOS
without changes, as long as it avoids platform-specific features.
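One common way to keep code portable is to avoid hard-coding operating-system-specific path separators (the sample code in Appendix 1 uses a Windows-style path). A minimal sketch with the standard `pathlib` module; the directory names are illustrative only.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Build a path from components instead of writing "dir\\file" or "dir/file";
# pathlib inserts the correct separator for the current operating system.
data_file = Path("project") / "dataset" / "data" / "KDDTrain+.txt"
print(data_file.name)    # KDDTrain+.txt
print(data_file.suffix)  # .txt

# The same logical path rendered explicitly for each platform
print(PureWindowsPath("project", "dataset", "data", "KDDTrain+.txt"))  # project\dataset\data\KDDTrain+.txt
print(PurePosixPath("project", "dataset", "data", "KDDTrain+.txt"))    # project/dataset/data/KDDTrain+.txt
```

The same script then runs unchanged on Windows, Linux and macOS, because no separator is written by hand.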

8. Python is an Integrated Language:
Python is also called an integrated language because it can easily be
integrated with other languages such as C and C++.

9. Interpreted Language:
Python is an interpreted language: Python code is executed one statement at
a time. Unlike C, C++ or Java, there is no separate compilation step before a
program runs, which makes code easier to debug. Internally, the source code is
converted into an intermediate form called bytecode, which the interpreter
then executes.
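The bytecode step can be observed with the built-in `compile()` function and the standard `dis` module. This is a small sketch; the exact instruction names that `dis` prints differ between Python versions, so nothing below depends on them.

```python
import dis

source = "total = price * quantity"

# compile() turns source text into a code object containing bytecode
code = compile(source, "<example>", "exec")

# dis shows the bytecode instructions the interpreter will execute
dis.dis(code)

# The interpreter then executes the bytecode
namespace = {"price": 4, "quantity": 3}
exec(code, namespace)
print(namespace["total"])  # 12
```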

10. Large Standard Library:
Python has a large standard library that provides a rich set of modules and
functions, so you do not have to write your own code for every single thing.
The standard library includes modules for regular expressions, unit testing,
web browsers and more.
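For example, the regular-expression module `re` and the `collections` module ship with Python, so a small log-parsing task needs no third-party packages. The log lines below are made up for illustration.

```python
import re
from collections import Counter

# Made-up log lines, for illustration only
log_lines = [
    "2024-01-01 10:00:01 UDP 192.168.1.5 -> 10.0.0.1",
    "2024-01-01 10:00:02 TCP 192.168.1.7 -> 10.0.0.1",
    "2024-01-01 10:00:03 UDP 192.168.1.5 -> 10.0.0.1",
]

# re: extract the protocol and the source IPv4 address from each line
pattern = re.compile(r"(UDP|TCP|ICMP) (\d{1,3}(?:\.\d{1,3}){3})")
matches = [pattern.search(line).groups() for line in log_lines]

# collections.Counter: count packets per (protocol, source) pair
counts = Counter(matches)
print(counts.most_common(1))  # [(('UDP', '192.168.1.5'), 2)]
```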

11. Dynamically Typed Language:
Python is a dynamically typed language. This means that the type of a
variable (for example int, float or str) is decided at run time, not in
advance. Because of this feature, we do not need to declare the type of a
variable.
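A short sketch of what dynamic typing means in practice: a variable's type is determined by the value currently bound to it, and type errors surface only when the offending line runs.

```python
# No type declarations: the type belongs to the value, not to the variable
value = 42
print(type(value).__name__)   # int

value = 3.5                   # the same name can be rebound to a float...
print(type(value).__name__)   # float

value = "udp"                 # ...or to a string
print(type(value).__name__)   # str

# Type errors are detected at run time, when the operation is attempted
try:
    result = value + 1
except TypeError as exc:
    print("TypeError:", exc)
```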

APPLICATIONS OF PYTHON:

WEB APPLICATIONS

• You can create scalable web apps using frameworks and CMSs (Content
Management Systems) that are built on Python. Some of the popular
platforms for creating web apps are Django, Flask, Pyramid, Plone and
Django CMS.
• Sites such as Mozilla, Reddit, Instagram and PBS use Python.

SCIENTIFIC AND NUMERIC COMPUTING:

• There are numerous libraries available in Python for scientific and
numeric computing. Libraries such as SciPy and NumPy are used for
general-purpose computing, and there are specialized libraries such as
EarthPy for earth science and Astropy for astronomy.
• The language is also heavily used in machine learning, data mining and
deep learning.

CREATING SOFTWARE PROTOTYPES:

• Python is generally slower than compiled languages such as C++ and Java.
It might not be a good choice if resources are limited and efficiency is a
must.
• However, Python is a great language for creating prototypes. For
example, you can use Pygame (a library for creating games) to create
your game's prototype first. If you like the prototype, you can use a
language such as C++ to create the actual game.

GOOD LANGUAGE TO TEACH PROGRAMMING:

• Python is used by many companies to teach programming to kids.
• It is a good language with a lot of features and capabilities, yet it is one
of the easiest languages to learn because of its simple, easy-to-use syntax.

CHAPTER 5

MODULE DESCRIPTION

5.1 DATA LOADING

Data loading is the process of copying and loading data or data sets from
a source file, folder or application to a database or similar application. It is
usually implemented by copying digital data from a source and loading it into
a data storage or processing utility. Data loading is used in database-based
extraction and loading techniques. Typically, such data is loaded into the
destination application in a different format from the one used at the
original source location.

For example, when data is copied from a word processing file to a
database application, the data format is changed from .doc or .txt to a .csv or
.dat format. Usually, this process is performed as the last phase of the
Extract, Transform and Load (ETL) process: the data is extracted from an
external source, transformed into the destination application's supported
format, and then loaded.
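A minimal sketch of the extract, transform and load steps using only the standard `csv`, `io` and `sqlite3` modules. It assumes comma-separated input shaped like the NSL-KDD records used in Appendix 1, reduced here to three columns and three made-up rows; the table name `connections` is hypothetical.

```python
import csv
import io
import sqlite3

# Extract: read raw comma-separated text (an in-memory stand-in for a file)
raw_text = "0,tcp,491\n0,udp,146\n2,tcp,232\n"
rows = list(csv.reader(io.StringIO(raw_text)))

# Transform: convert each field to the type the destination expects
records = [(int(duration), protocol, int(src_bytes))
           for duration, protocol, src_bytes in rows]

# Load: insert the transformed records into a database table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE connections "
             "(duration INTEGER, protocol_type TEXT, src_bytes INTEGER)")
conn.executemany("INSERT INTO connections VALUES (?, ?, ?)", records)
conn.commit()

total = conn.execute("SELECT SUM(src_bytes) FROM connections").fetchone()[0]
print(total)  # 869
```

An in-memory database keeps the example self-contained; passing a file name to `sqlite3.connect` would persist the loaded table instead.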

5.2 DATA PREPROCESSING

Missing values were imputed to guarantee that all the algorithms would
be able to handle them, even though some algorithms, such as XGBoost, can
deal with missing values automatically without imputation. To limit the
complexity of the comparison, the missing values were imputed based on their
data type. For numerical data types, the missing entries were replaced by the
median value of the complete entries. For categorical data, the missing entries
were replaced by the mode value of the complete entries.
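The median/mode rule above can be sketched in plain Python with the standard `statistics` module. Here `None` marks a missing entry, and the column values are illustrative only. In pandas, the equivalent is `fillna` with the column's `median()` or `mode()[0]`.

```python
from statistics import median, mode

def impute(column):
    """Replace None entries: median for numeric columns, mode for categorical ones."""
    present = [v for v in column if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = median(present)   # numerical data type
    else:
        fill = mode(present)     # categorical data type
    return [fill if v is None else v for v in column]

src_bytes = [491, None, 146, 232, None]
protocol = ["tcp", "udp", None, "tcp", "tcp"]

print(impute(src_bytes))  # [491, 232, 146, 232, 232]
print(impute(protocol))   # ['tcp', 'udp', 'tcp', 'tcp', 'tcp']
```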

5.3 DATA CLEANING

In this module the data is cleaned. After cleaning, the data is grouped as
required; this grouping of the data is referred to here as data clustering.
Then the data set is checked for missing values. If there are missing values,
they are replaced with a default value. After that, any data that needs a
different format is converted. This whole process before prediction is known
as data preprocessing. After that, the data is used for the prediction and
forecasting step.
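A minimal sketch of the cleaning steps described above: converting fields to a consistent format, replacing missing values with a default, and dropping duplicate records. The two-field record layout and the default value `0` are assumptions for illustration.

```python
def clean(records, default=0):
    """Normalise protocol names, fill missing byte counts and drop duplicates."""
    seen = set()
    cleaned = []
    for protocol, src_bytes in records:
        protocol = protocol.strip().lower()          # format change: " TCP" -> "tcp"
        src_bytes = default if src_bytes is None else int(src_bytes)
        key = (protocol, src_bytes)
        if key not in seen:                           # skip duplicate records
            seen.add(key)
            cleaned.append(key)
    return cleaned

raw = [(" TCP", "491"), ("tcp", 491), ("udp", None), ("UDP", "146")]
print(clean(raw))  # [('tcp', 491), ('udp', 0), ('udp', 146)]
```

Duplicates are detected after normalisation, so `" TCP", "491"` and `"tcp", 491` count as the same record.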

5.4 DATA SPLITTING

For each experiment, we split the entire dataset into a 70% training set
and a 30% test set. We used the training set for resampling, hyperparameter
tuning and training the model, and we used the test set to evaluate the
performance of the trained model. While splitting the data, we specified a
random seed (any fixed number), which ensured the same data split every time
the program was executed.
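The seeded 70/30 split can be sketched with the standard `random` module. Appendix 1 does the same with scikit-learn's `train_test_split(..., test_size=0.3, random_state=42)`. The helper name `split_70_30` is hypothetical.

```python
import random

def split_70_30(samples, seed=42):
    """Shuffle a copy of `samples` with a fixed seed and split it 70% / 30%."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # private RNG: same order for the same seed
    cut = (len(shuffled) * 7) // 10        # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

data = list(range(10))
train, test = split_70_30(data)
print(len(train), len(test))               # 7 3
print(split_70_30(data) == (train, test))  # True: the same seed gives the same split
```

Using a private `random.Random(seed)` instance makes the split reproducible without changing the global random state used elsewhere in the program.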

5.5 DATA TRAINING

Algorithms learn from data. They find relationships, develop
understanding, make decisions, and evaluate their confidence from the training
data they are given, and the better the training data is, the better the model
performs. In fact, the quality and quantity of your training data has as much to
do with the success of your data project as the algorithms themselves.

Now, even if you’ve stored a vast amount of well-structured data, it might not
be labeled in a way that actually works for training your model. For example,
autonomous vehicles don’t just need pictures of the road, they need labeled
images where each car, pedestrian, street sign and more are annotated;
sentiment analysis projects require labels that help an algorithm understand
when someone’s using slang or sarcasm; chatbots need entity extraction and
careful syntactic analysis, not just raw language.

In other words, the data you want to use for training usually needs to be
enriched or labeled, or you might simply need to collect more of it to power
your algorithms. Chances are that the data you have stored is not quite ready
to be used to train your classifiers.

To build a great model, you need great training data.

5.8 ALGORITHMS:

• LSTM
• ANN

LSTM (LONG SHORT-TERM MEMORY):

Long Short-Term Memory (LSTM) is a special type of recurrent neural network
(RNN) that can learn long-term dependencies. An LSTM enables an RNN to
remember inputs over a long period of time. It keeps information in a memory,
similar to computer memory, and is able to read, write and delete information
in that memory. This memory can be seen as a gated cell, in which the gates
decide whether to store or delete information. An LSTM has three gates: the
input, forget and output gates. These gates determine whether new input is let
in (input gate), whether information is deleted because it is not important
(forget gate), and whether it is allowed to affect the output at the current
time step (output gate).

FIG 5.2 LSTM MODEL

LSTMs are a special subset of RNNs that can capture context-specific
temporal dependencies for long periods of time. Each LSTM neuron is a
memory cell that can store information, i.e., it maintains its own cell state.
While neurons in normal RNNs merely take in their previous hidden state and
the current input to output a new hidden state, an LSTM neuron also takes in its
old cell state and outputs its new cell state.

1. Forget gate:
The forget gate determines which parts of the cell state are kept and which
are discarded to make room for more recent information. It outputs values
close to 1 for parts of the cell state to be kept and close to 0 for values to
be ignored.
2. Input gate:
Based on its inputs (e.g., the previous output o(t-1), the input x(t), and the
previous cell state c(t-1)), this gate determines the conditions under which
new information should be stored (or updated) in the cell state.
3. Output gate:
Depending on the input and the cell state, this gate determines which
information is passed on to the next step in the network.
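For reference, the three gates are usually written as follows in the standard formulation without peephole connections, which is the form implemented by Keras's `LSTM` layer. Here σ is the logistic sigmoid, ⊙ is element-wise multiplication, x_t is the input, h_{t-1} is the previous output (o(t-1) in the text above) and c_{t-1} is the previous cell state. The matrices W and U and the biases b are learned. Peephole variants additionally feed c_{t-1} into the gates, as in the description of the input gate above.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{input gate} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{output gate} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{candidate values} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{new cell state} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{new output}
\end{aligned}
```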
Thus, LSTM networks are well suited to exploring how variation in one
stock's price can affect the prices of several other stocks over a long period of
time. They can also decide (in a dynamic fashion) how long information
about specific past trends in stock price movement needs to be retained in order
to predict future trends in the variation of stock prices more accurately.
Advantages of LSTM:
The main advantage of LSTM is its ability to learn context-specific temporal
dependencies. Each unit remembers information for a long or a short period
(hence the name) without explicitly applying an activation function within the
recurrent components. An important point is that the cell state is multiplied
only by the output of the forget gate, which varies between 0 and 1. That is,
the forget gate in the LSTM cell is responsible for both the weights and the
activation function of the cell state. Thus, information from the previous cell
state can pass through a cell unchanged instead of increasing or decreasing
exponentially at each time step or layer, and the weights can converge to
appropriate values in a reasonable amount of time. This allows LSTMs to largely
avoid the vanishing gradient problem: because the value stored in the memory
cell is not repeatedly transformed, the gradient does not vanish when the
network is trained with backpropagation through time.
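A minimal NumPy sketch of one LSTM time step using the standard equations, with no peephole connections. The weights are random placeholders rather than trained values, and the function and variable names are illustrative. The second part demonstrates the property described above: with the forget gate saturated near 1 and the input gate near 0, the cell state passes through the step essentially unchanged, even though the candidate values are non-zero.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,); gate order f, i, o, g."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b          # all four pre-activations at once
    f = sigmoid(z[0:H])                   # forget gate
    i = sigmoid(z[H:2 * H])               # input gate
    o = sigmoid(z[2 * H:3 * H])           # output gate
    g = np.tanh(z[3 * H:4 * H])           # candidate cell values
    c_t = f * c_prev + i * g              # additive cell-state update
    h_t = o * np.tanh(c_t)                # new hidden state / output
    return h_t, c_t

rng = np.random.default_rng(0)
D, H = 3, 4                               # input size, hidden size
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):       # run 5 time steps
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)                   # (4,) (4,)

# Forget gate ~1 and input gate ~0: the cell state is carried through unchanged
b_keep = np.zeros(4 * H)
b_keep[0:H] = 20.0                        # large bias -> forget gate ~ 1
b_keep[H:2 * H] = -20.0                   # large negative bias -> input gate ~ 0
b_keep[3 * H:] = 1.0                      # non-zero candidate values, blocked by the input gate
W0, U0 = np.zeros_like(W), np.zeros_like(U)
c_in = np.array([0.5, -1.0, 2.0, 0.0])
_, c_out = lstm_step(rng.normal(size=D), np.zeros(H), c_in, W0, U0, b_keep)
print(np.allclose(c_in, c_out, atol=1e-6))  # True
```

Because `c_t` is updated by an element-wise product and a sum rather than by a full matrix multiplication and squashing function, the gradient along the cell-state path is scaled only by the forget gate.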
ARTIFICIAL NEURAL NETWORK (ANN)

An artificial neural network (ANN) is a computational model inspired
by the functioning of nerve cells in the human brain. ANNs employ learning
algorithms that autonomously adjust their behavior as they receive new input,
making them effective for non-linear statistical data modeling.

Deep learning ANNs are crucial in machine learning (ML) and support broader
artificial intelligence (AI) technologies. An artificial neural network typically
comprises three or more interconnected layers.

The initial layer contains input neurons, which transmit data to deeper layers,
culminating in the final output layer. The intermediate layers, termed hidden
layers, process information adaptively through transformations. Each layer acts
as both input and output, enabling the ANN to comprehend complex objects.
Units within the hidden layers learn by weighting information based on internal
guidelines, producing transformed outputs for subsequent layers.

Backpropagation, a learning process, enables the ANN to adjust its outputs by
considering errors. During supervised training, errors are propagated backward,
and weights are updated accordingly to minimize discrepancies between
desired and actual outcomes. Training ANNs involves selecting appropriate
models and associated algorithms. One of the main advantages of ANNs is
their ability to learn from data observations, serving as effective tools for
function approximation and cost-effective solution estimation.
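To make the forward pass and backpropagation concrete, here is a minimal NumPy sketch of a network with one hidden layer, trained by full-batch gradient descent on the XOR problem. The layer sizes, learning rate and number of epochs are arbitrary choices for illustration, not values from this project.

```python
import numpy as np

rng = np.random.default_rng(1)

# XOR: a classic problem that is not linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Input layer (2) -> hidden layer (8, tanh) -> output layer (1, sigmoid)
W1 = rng.normal(size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1))
b2 = np.zeros(1)
lr = 0.5
losses = []

for epoch in range(2000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))  # binary cross-entropy
    losses.append(loss)

    # Backward pass: propagate the error from the output layer to the hidden layer
    d_out = (p - y) / len(X)                 # dLoss/d(output pre-activation)
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_hidden = (d_out @ W2.T) * (1 - h ** 2)  # chain rule through tanh
    dW1 = X.T @ d_hidden
    db1 = d_hidden.sum(axis=0)

    # Gradient-descent update of the weights
    W2 -= lr * dW2
    b2 -= lr * db2
    W1 -= lr * dW1
    b1 -= lr * db1

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print((p > 0.5).astype(int).ravel())
```

Frameworks such as Keras compute these gradients automatically, but the update rule is the same idea: each weight moves against the gradient of the loss.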

ANNs can learn from samples of the data rather than requiring the entire data
set, which saves time and resources. They find applications in various domains,
including predictive analytics, spam detection, natural language processing,
and more.

NEURAL NETWORKS AND DEEP LEARNING


While a proper description of neural networks and deep learning is far
beyond the scope of this chapter, we will discuss an example use case
of one of the most popular frameworks for deep learning: Keras⁴.
In this section we will use Keras to build a simple neural network to classify
the Wisconsin breast cancer dataset that was described earlier. Deep learning
algorithms and neural networks are often used to classify images, and
convolutional neural networks are especially used for image-related
classification. However, they can of course be used for text or tabular
data as well. Here we will build a standard feed-forward, densely connected
neural network and classify a tabular cancer dataset in order to demonstrate
the framework's usage. In this example we are once again using the Wisconsin
breast cancer dataset, which consists of 30 features and 569 individual samples.
To make it more challenging for the neural network, we will use a training set
consisting of only 50% of the entire dataset, and test our neural network on the
remaining 50% of the data.
Note that Keras is not installed as part of the Anaconda distribution; to
install it, use pip:

pip install keras

Keras additionally requires either Theano or TensorFlow to be installed. In the
examples in this chapter we are using Theano as a backend; however, the code
will work identically for either backend. You can install Theano using pip, but
it has a number of dependencies that must be installed first. Refer to the
Theano and TensorFlow documentation for more information [12]. (Theano is no
longer actively developed; the code in Appendix 1 uses TensorFlow, which
includes Keras as tf.keras.)
Keras is a modular API. It allows you to create neural networks by building a
stack of modules, from the input of the neural network to its output, piece by
piece until you have a complete network. Keras can also be configured to use
your graphics processing unit (GPU), which usually makes training neural
networks much faster than on a CPU. We begin by importing Keras:

We may want to view the network's accuracy on the test set (or its loss on the
training set) over time (measured at each epoch) to get a better idea of how well
it is learning. An epoch is one complete cycle through the training data.
Fortunately, this is quite easy to plot, as Keras's fit method returns a History
object whose history attribute we can use to do exactly this:

This will result in a plot similar to that shown. Often you will also want to plot
the loss on the test set and training set, and the accuracy on the test set and
training set.
Plotting the loss and accuracy can be used to see whether you are overfitting (you
experience tiny loss on the training set, but large loss on the test set) and to see
when your training has plateaued.
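Besides plotting, the `history.history` dictionary returned by Keras's `fit` can be inspected programmatically. This is a minimal sketch, assuming the model was trained with validation data so that the dictionary contains `loss` and `val_loss` lists. The helper name `diagnose` and its patience-style rule are illustrative assumptions, and the numbers are made up. Keras's `EarlyStopping` callback (with `monitor='val_loss'`) applies a similar idea automatically during training.

```python
def diagnose(history, patience=3):
    """Return (best_epoch, overfitting) from a Keras-style history dict.

    best_epoch is the 1-based epoch with the lowest validation loss.
    overfitting is True if validation loss has not improved for `patience`
    epochs while training loss kept falling (a simple, illustrative rule).
    """
    val_loss = history["val_loss"]
    train_loss = history["loss"]
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    epochs_since_best = len(val_loss) - 1 - best
    train_still_falling = train_loss[-1] < train_loss[best]
    overfitting = epochs_since_best >= patience and train_still_falling
    return best + 1, overfitting

# Made-up numbers: training loss keeps falling, validation loss turns upward
hist = {
    "loss":     [0.90, 0.60, 0.45, 0.35, 0.28, 0.22, 0.17],
    "val_loss": [0.95, 0.70, 0.55, 0.52, 0.56, 0.61, 0.67],
}
print(diagnose(hist))  # (4, True)
```

With a real model, the same call would be `diagnose(history1.history)`, where `history1` is the object returned by `model.fit`.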

CHAPTER 6

SYSTEM DESIGN

System design is the process of defining the interfaces, modules and data of
a system to satisfy specified requirements. System design can be seen as the
application of systems theory. The main goal of designing a system is to
develop the system architecture by providing the data and information that are
necessary for the implementation of the system.

6.1 ARCHITECTURE DIAGRAM

FIG 6.1 ARCHITECTURE DIAGRAM

6.2 DATA FLOW DIAGRAM

Data flow diagrams are used to graphically represent the flow of data in
a business information system. A DFD describes the processes that are involved
in a system to transfer data from the input to file storage and report
generation. Data flow diagrams can be divided into logical and physical. The
logical data flow diagram describes the flow of data through a system to perform
certain functionality of a business. The physical data flow diagram describes
the implementation of the logical data flow.

FIG 6.2 DATA FLOW MODEL

6.3 USE CASE DIAGRAM:

Use case diagrams are a way to capture the system's functionality and
requirements in UML diagrams. They capture the dynamic behavior of a live
system. A use case diagram consists of use cases and actors. Here, the data
owner and the user have separate registration and login; the data owner then
uploads the text document, using a symmetric key to encrypt the cloud data.

FIG 6.3 USE CASE DIAGRAM

6.4 CLASS DIAGRAM:

Class diagrams are the main building block in object-oriented modeling.
They are used to show the different objects in a system, their attributes, their
operations and the relationships among them. The objects in this UML diagram
are the Data Owner, the Cloud User and the Cloud Admin. Their operations
include uploading documents, generating keys to secure the data, maintaining
the cloud data, and downloading and accessing the cloud data using the key.

FIG 6.4 CLASS DIAGRAM

6.5 SEQUENCE DIAGRAM:

A sequence diagram is a type of interaction diagram because it describes
how and in what order a group of objects works together. These diagrams are
used by software developers and business professionals to understand
requirements for a new system or to document an existing process.

FIG 6.5 SEQUENCE DIAGRAM

6.6 ACTIVITY DIAGRAM

Activity diagrams describe how activities are coordinated to provide a
service, which can be at different levels of abstraction. Typically, an event
needs to be achieved by some operations, particularly where the operation is
intended to achieve a number of different things that require coordination.

FIG 6.6 ACTIVITY DIAGRAM

CHAPTER 7

SOFTWARE TESTING

Testing is vital to the success of the system. System testing makes the
logical assumption that if all parts of the system are correct, the goal will be
successfully achieved. System testing is the stage of implementation aimed at
ensuring that the system works accurately and efficiently. In the testing
process we test the actual system in an organization, gather the errors found in
the new system and take steps to correct them. All front-end and back-end
connectivity is tested to make sure that the new system operates at full
efficiency as stated.
The main objective of testing is to uncover errors in the system. To uncover
them, we must give the system proper input data, so the input data must be
chosen carefully; correct inputs are essential for efficient testing.

Testing is done for each module. After testing all the modules, the modules are
integrated and the final system is tested with test data specially designed to
show that the system will operate successfully under all conditions. Thus,
system testing is a confirmation that all is correct and an opportunity to show
the user that the system works. Inadequate testing or no testing leads to errors
that may appear a few months later. This creates two problems: the time delay
between the cause and the appearance of the problem, and the effect of the
system errors on files and records within the system. The purpose of system
testing is to consider all the likely variations to which the system will be
subjected and to push it to its limits. The testing process focuses on the
logical internals of the software, ensuring that all statements have been
tested, and on the functional externals, i.e., conducting tests to uncover
errors and ensure that defined inputs will produce actual results that agree
with the required results. Testing has to be done using two common steps: unit
testing and integration testing. In this project, system testing is done as
follows:

Procedure-level testing is done first. By giving improper inputs, the errors
that occur are noted and eliminated. This is the final step in the system life
cycle. Here we deploy the tested, error-free system into a real-life
environment and make the necessary changes, so that it runs in an online
fashion. System maintenance is done every month or year based on company
policies, and the system is checked for errors such as runtime errors and
long-run errors, along with other maintenance tasks such as table verification
and reports. Integration testing is a level of software testing where
individual units are combined and tested as a group. The purpose of this level
is to expose faults in the interaction between integrated units. Test drivers
and test stubs are used to assist in integration testing. Black box testing,
white box testing and gray box testing methods can all be used. Normally, the
method depends on your definition of ‘unit.’

TASKS:

• Integration Test Plan
    ◦ Prepare
    ◦ Review
    ◦ Rework
    ◦ Baseline
• Integration Test Cases/Scripts
    ◦ Prepare
    ◦ Review
    ◦ Rework
    ◦ Baseline
• Integration Test
    ◦ Perform

7.1 UNIT TESTING
Unit testing focuses verification effort on the smallest unit of software
design, the module. This is known as “module testing.” The modules are tested
separately, and this testing is carried out during the programming stage itself.
In these testing steps, each module is checked to confirm that it works
satisfactorily with regard to its expected output.
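A minimal sketch of a module-level unit test using Python's built-in `unittest` framework. The function under test, `one_hot`, is a hypothetical stand-in for the one-hot encoding step in Appendix 1, which uses Keras's `to_categorical`. One test deliberately passes an improper input, as described above.

```python
import unittest

def one_hot(labels, num_classes):
    """Hypothetical module under test: convert integer labels to one-hot rows."""
    if any(not 0 <= label < num_classes for label in labels):
        raise ValueError("label out of range")
    return [[1 if j == label else 0 for j in range(num_classes)] for label in labels]

class OneHotTest(unittest.TestCase):
    def test_expected_output(self):
        self.assertEqual(one_hot([0, 2], 3), [[1, 0, 0], [0, 0, 1]])

    def test_each_row_sums_to_one(self):
        for row in one_hot([4, 1, 3, 0], 5):
            self.assertEqual(sum(row), 1)

    def test_rejects_improper_input(self):
        # Improper inputs should raise an error rather than fail silently
        with self.assertRaises(ValueError):
            one_hot([5], 5)

if __name__ == "__main__":
    # exit=False lets the script continue after the tests have run
    unittest.main(argv=["one_hot_tests"], exit=False)
```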
7.2 BLACK BOX TESTING
Black box testing, also known as Behavioral Testing, is a software
testing method in which the internal structure/ design/ implementation of the
item being tested is not known to the tester. These tests can be functional or
non-functional, though usually functional.
7.3 WHITE-BOX TESTING
White-box testing (also known as clear box testing, glass box testing,
transparent box testing, and structural testing) is a method of testing software
that tests internal structures or workings of an application, as opposed to its
functionality (i.e. black-box testing).
7.4 GREY BOX TESTING
Grey box testing is a technique for testing an application with limited
knowledge of its internal workings. Grey box testing is commonly used to test
web services applications. It can be performed by end users as well as by
testers and developers.
7.5 INTEGRATION TESTING
Integration testing is a systematic technique for constructing tests to
uncover errors associated with the interfaces between modules. In the project,
all the modules are combined and then the entire program is tested as a whole.
In the integration testing step, all the errors uncovered are corrected before
the next testing steps. Software integration testing is the incremental
integration testing of two or more integrated software components on a single
platform to produce failures caused by interface defects. The task of the
integration test is to check that components or software applications interact
without error.
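A minimal sketch of an integration test in which a test stub replaces one unit, so that the test focuses on the interface between two units. The classifier, the `Monitor` class and the alert service are all hypothetical and do not come from the project code; `unittest.mock.Mock` plays the role of the stub, and the test case acts as the test driver.

```python
import unittest
from unittest.mock import Mock

def classify(packet):
    """Hypothetical unit 1: flag large UDP packets as suspicious."""
    return "attack" if packet["protocol"] == "udp" and packet["size"] > 1000 else "normal"

class Monitor:
    """Hypothetical unit 2: uses the classifier and notifies an alert service."""
    def __init__(self, alert_service):
        self.alert_service = alert_service

    def process(self, packets):
        attacks = [p for p in packets if classify(p) == "attack"]
        for p in attacks:
            self.alert_service.send(p["src"])
        return len(attacks)

class MonitorIntegrationTest(unittest.TestCase):
    def test_attacks_reach_alert_service(self):
        stub = Mock()                      # test stub standing in for the alert service
        monitor = Monitor(stub)
        packets = [
            {"protocol": "udp", "size": 1500, "src": "10.0.0.5"},
            {"protocol": "tcp", "size": 1500, "src": "10.0.0.6"},
            {"protocol": "udp", "size": 64, "src": "10.0.0.7"},
        ]
        self.assertEqual(monitor.process(packets), 1)
        # The interface check: exactly one alert, carrying the attacker's address
        stub.send.assert_called_once_with("10.0.0.5")

if __name__ == "__main__":
    unittest.main(argv=["integration_tests"], exit=False)
```

Once the real alert service exists, the stub can be swapped for it and the same test re-run as a broader integration step.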

7.6 ACCEPTANCE TESTING
User acceptance testing is a critical phase of any project and requires
significant participation by the end user. It also ensures that the system meets
the functional requirements.
ACCEPTANCE TESTING FOR DATA SYNCHRONIZATION
The acknowledgements are received by the sender node after the packets are
received by the destination node. The route add operation is performed only
when a route request requires it. Node status information is updated
automatically in the cache updating process.
BUILD THE TEST PLAN
Any project can be divided into units that can be further processed in detail.
A testing strategy is then carried out for each of these units. Unit testing
helps to identify possible bugs in the individual components, so that the
components that have bugs can be identified and corrected.

CHAPTER 8
CONCLUSION AND FUTURE WORK
As the number of devices used to access the internet increases day by day, the
danger of intrusion also increases at an alarming rate. Most current systems,
such as IPS and IDS, which are used to detect and prevent intrusions, are not
able to detect and prevent attacks that have new signatures or attacks that have
not yet been identified. Therefore, machine learning and pattern recognition are
used to enable systems such as IDS or IPS to analyze new forms of intrusion and
prevent them without user intervention. Algorithms such as ANN and LSTM help to
classify and cluster the packets inbound to the network. This project focuses in
depth on identifying intrusions based on UDP flooding; classifying other types of
attacks, such as TCP flood, ICMP flood, Smurf attack and HTTP flood, can be
researched as future work. Based on the findings from the analyses conducted,
recommendations can be proposed to enhance the effectiveness of intrusion
detection systems. One suggestion is to enhance performance by implementing
ongoing real-time model training instead of relying on training the model with
static data. Additionally, a combination of Machine Learning (ML) and Deep
Learning (DL) can be used to further boost performance, where features are
extracted from the hidden layers of DL models and fed into other ML or DL models
for further refinement. It is also advisable to subject the hybrid model to
various attacks, including zero-day attacks, in a real-world or simulated
network environment. This will allow vulnerabilities to be identified and the
models to be retrained to detect such attacks. Furthermore, integrating ANDAE
with other anomaly detection techniques can improve the detection of abnormal
traffic patterns. Lastly, it is recommended to explore the use of tools such as
Apache Spark to improve the training and detection capabilities of the model.

APPENDIX 1 SAMPLE CODINGS
import pandas as pd
# list of useful imports used below
# Jupyter notebook magic (only valid inside a notebook)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import random
import pickle
from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_curve
# NSL-KDD training file; it has no header row, so pandas uses the first record
# as column names (these are renamed below)
data = pd.read_csv(r"c:\users\deepi\music\project\dataset\data\kddtrain+.txt")
data
data.columns
data.head()
data.normal.value_counts()
#renaming columns
data=data.rename(
columns={
'0':'duration',
'tcp':'protocol_type',
'ftp_data':'service',
'sf':'flag',
'491':'src_bytes',
'0.1':'dst_bytes',
'0.2':'land',
'0.3':'wrong_fragment',
'0.4':'urgent',
'0.5':'hot',
'0.6':'num_failed_logins',
'0.7':'logged_in',
'0.8':'num_compromised',
'0.9':'root_shell',
'0.10':'su_attempted',
'0.11':'num_root',
'0.12':'num_file_creations',
'0.13':'num_shells',
'0.14':'num_access_files' ,
'0.15':'num_outbound_cmds',
'0.16':'is_host_login',
'0.17':'is_guest_login',
'2':'count',
'2.1':'srv_count',
'0.00':'serror_rate',
'0.00.1':'srv_serror_rate',
'0.00.2':'rerror_rate',
'0.00.3':'srv_rerror_rate',
'1.00':'same_srv_rate',
'0.00.4':'diff_srv_rate',
'0.00.5':'srv_diff_host_rate',
'150':'dst_host_count',
'25':'dst_host_srv_count',
'0.17.1':'dst_host_same_srv_rate',
'0.03':'dst_host_diff_srv_rate',
'0.17.2':'dst_host_same_src_port_rate',
'0.00.6':'dst_host_srv_diff_host_rate',
'0.00.7':'dst_host_serror_rate',
'0.00.8':'dst_host_srv_serror_rate',
'0.05':'dst_host_rerror_rate',
'0.00.9':'dst_host_srv_rerror_rate',
'normal':'class',
'20':'num'})
data.head()
data.info()
data.describe()
data.isnull().sum()
data.isnull().any()
from sklearn.preprocessing import LabelEncoder

columns = data.columns
label_encoder = LabelEncoder()

# label-encode every string-valued feature column; the target ('class')
# and the difficulty level ('num') are left unchanged
for cols in columns:
    if cols in ('class', 'num'):
        continue
    if isinstance(data[cols].values[0], str):
        data[cols] = label_encoder.fit_transform(data[cols].values)

# now 'data' contains the label-encoded values
data['class'].value_counts()
# resample data
from sklearn.utils import resample
# separate the five classes; this assumes the 'class' column already holds
# the category labels normal, dos, probe, r2l and u2r
df_c0 = data[data['class'] == 'normal']
df_c1 = data[data['class'] == 'dos']
df_c2 = data[data['class'] == 'probe']
df_c3 = data[data['class'] == 'r2l']
df_c4 = data[data['class'] == 'u2r']

# resample every class (with replacement) to 500 rows, which downsamples the
# majority classes and upsamples the minority classes
df_c0_upsampled = resample(df_c0, replace=True, n_samples=500, random_state=100)
df_c1_upsampled = resample(df_c1, replace=True, n_samples=500, random_state=100)
df_c2_upsampled = resample(df_c2, replace=True, n_samples=500, random_state=100)
df_c3_upsampled = resample(df_c3, replace=True, n_samples=500, random_state=100)
df_c4_upsampled = resample(df_c4, replace=True, n_samples=500, random_state=100)
# df_majority_downsampled = resample(df_majority,
#     replace=False, n_samples=2500, random_state=100)

# combine the resampled classes into one balanced data set
df_balanced = pd.concat([df_c0_upsampled, df_c1_upsampled, df_c2_upsampled,
                         df_c3_upsampled, df_c4_upsampled])

# display new class counts
df_balanced['class'].value_counts()
fig = plt.figure(figsize=(10, 5))  # create the figure before drawing on it
sns.barplot(x=df_balanced['class'].value_counts().index,
            y=df_balanced['class'].value_counts())
# show plot
plt.show()
data1 = df_balanced.sample(frac=1)  # shuffle the rows
data1.head()
# 41 feature columns; 'class' is the target and 'num' (difficulty) is not used
x = data1.drop(columns=['class', 'num'])
y = data1['class']
x.head()
y.head()
# pip install tensorflow
import tensorflow as tf
from sklearn.preprocessing import labelencoder
enc=labelencoder()
y = enc.fit_transform(y)
from keras.utils import to_categorical
y1 = to_categorical(y)
enc.classes_
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y1, test_size=0.3,
random_state=42)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
x_test.to_csv(r"c:\users\deepi\music\project\test.csv")
# keras model-building imports (to_categorical and train_test_split are already imported above)
from keras.models import Sequential
from keras.layers import (Dense, Dropout, Flatten, Conv1D, MaxPool1D, GlobalAvgPool1D,
                          GlobalMaxPooling1D, LSTM, RepeatVector, TimeDistributed)
from tensorflow.keras.optimizers import RMSprop, Adam
# reshape the features to (samples, 41 timesteps, 1 channel) for the LSTM and Conv1D layers
x_test1 = x_test.values.reshape((len(x_test), 41, 1))
x_train1 = x_train.values.reshape((len(x_train), 41, 1))
x_train1.shape
y_train.shape
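Assuming the balanced set has 5 × 500 = 2,500 rows and 41 feature columns, the split and reshape give the shapes below. The split sizes follow scikit-learn's rule for a float `test_size` (test rows = ceil(0.3 × n), training rows = the remainder); the zero array is only a stand-in for `x_train.values`.

```python
import math
import numpy as np

n_rows, n_features = 5 * 500, 41      # balanced set: 5 classes x 500 rows, 41 features
n_test = math.ceil(0.3 * n_rows)      # test_size=0.3
n_train = n_rows - n_test
print(n_train, n_test)                # 1750 750

# stand-in for x_train.values: a 2-D (samples, features) array
x_train_2d = np.zeros((n_train, n_features))
# LSTM and Conv1D layers expect 3-D input: (samples, timesteps, channels)
x_train_3d = x_train_2d.reshape((len(x_train_2d), n_features, 1))
print(x_train_3d.shape)               # (1750, 41, 1)
```

Under the same assumption the one-hot label array `y_train` would have shape (1750, 5).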
import tensorflow as tf
# LSTM model: one LSTM layer, a dense hidden layer and a 5-class softmax output
model = Sequential()
model.add(LSTM(100, input_shape=(41, 1)))
model.add(Dropout(0.5))
model.add(Dense(100, activation='relu'))
model.add(Dense(5, activation='softmax'))
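As a sanity check on the LSTM architecture, its trainable parameter count can be worked out by hand. The sketch assumes Keras' default LSTM with bias terms, where each of the four gates has input weights, recurrent weights and a bias; the number of timesteps (41) does not change the count because the weights are shared across steps. The helper functions are hypothetical.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # weight matrix plus one bias per unit
    return input_dim * units + units

layers = [
    ("LSTM(100)", lstm_params(100, 1)),        # input: 41 timesteps x 1 channel
    ("Dropout(0.5)", 0),                        # no trainable weights
    ("Dense(100)", dense_params(100, 100)),
    ("Dense(5)", dense_params(5, 100)),
]
for name, count in layers:
    print(f"{name:<14}{count:>8}")
total = sum(count for _, count in layers)
print("total:", total)   # total: 51405
```

The total should agree with the "Trainable params" line of `model.summary()` for this model.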

# model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.compile(loss='categorical_crossentropy',
              # optimizer='sgd',  # almost the same result
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])
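The loss used by all three models is categorical cross-entropy, L = -(1/N) Σ_i Σ_c y_ic log p_ic, where y is the one-hot label and p the softmax output. A small NumPy sketch with two hypothetical samples; the clipping constant `eps` is an assumption added here to avoid log(0).

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum_c y_c * log(p_c)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

# two hypothetical samples: true classes 1 and 0 (one-hot over 5 classes)
y_true = np.array([[0, 1, 0, 0, 0],
                   [1, 0, 0, 0, 0]], dtype=float)
y_prob = np.array([[0.10, 0.70, 0.10, 0.05, 0.05],
                   [0.50, 0.20, 0.10, 0.10, 0.10]])

loss = categorical_crossentropy(y_true, y_prob)
print(loss)   # mean of -log(0.7) and -log(0.5)
```

Only the probability assigned to the true class contributes, so the loss falls as the model becomes more confident in the correct class.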
history1 = model.fit(x_train1,y_train, batch_size= 128,
epochs = 30, validation_data = (x_test1,y_test))
# training (red) vs. test (blue) accuracy per epoch; the test split is used as validation data
plt.plot(history1.history['accuracy'], 'r')
plt.plot(history1.history['val_accuracy'], 'b')
plt.legend(['train accuracy', 'test accuracy'])
plt.show()
score = model.evaluate(x_test1, y_test, verbose=0)
print('test accuracy:', score[1])
score = model.evaluate(x_train1, y_train, verbose=0)
print('train accuracy:', score[1])
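With `loss='categorical_crossentropy'` and `metrics=['accuracy']`, Keras reports categorical accuracy: the fraction of samples whose highest-probability class matches the one-hot label. A minimal NumPy sketch of that computation on six made-up predictions (the helper name is hypothetical):

```python
import numpy as np

def categorical_accuracy(y_true_onehot, y_prob):
    # a sample counts as correct when the most probable class equals the true class
    return float(np.mean(np.argmax(y_prob, axis=1) == np.argmax(y_true_onehot, axis=1)))

y_true = np.eye(5)[[0, 1, 2, 3, 4, 1]]            # six one-hot labels
y_prob = np.array([[0.9, 0.05, 0.02, 0.02, 0.01],
                   [0.1, 0.80, 0.05, 0.03, 0.02],
                   [0.2, 0.10, 0.60, 0.05, 0.05],
                   [0.1, 0.10, 0.10, 0.30, 0.40],   # wrong: predicts class 4
                   [0.1, 0.10, 0.10, 0.10, 0.60],
                   [0.5, 0.30, 0.10, 0.05, 0.05]])  # wrong: predicts class 0

acc = categorical_accuracy(y_true, y_prob)
print(acc)   # 4 of 6 correct
```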
# save the model
tf.keras.models.save_model(model,file_name)
# confusion matrix: rows = actual class, columns = predicted class
from sklearn.metrics import confusion_matrix
class_names = enc.classes_
y_true_idx = np.argmax(y_test, axis=1)
y_pred_idx = np.argmax(model.predict(x_test1), axis=1)
df_heatmap = pd.DataFrame(confusion_matrix(y_true_idx, y_pred_idx, labels=list(range(len(class_names)))),
                          columns=class_names, index=class_names)
df_heatmap
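scikit-learn's `confusion_matrix(y_true, y_pred)` puts actual classes on the rows and predicted classes on the columns, so swapping the two arguments transposes the matrix. The hand-rolled `confusion` helper below is a hypothetical illustration of that convention on toy labels.

```python
import numpy as np

def confusion(y_true, y_pred, n_classes):
    # rows = actual class, columns = predicted class (scikit-learn's convention)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]
cm = confusion(y_true, y_pred, 3)
print(cm)
# [[1 1 0]
#  [0 1 0]
#  [1 0 2]]
```

The diagonal holds the correct predictions (4 of 6 here); reading along a row shows how samples of one actual class were spread over the predicted classes.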
#heatmap = sns.heatmap(df_heatmap, fmt="d")
enc.classes_
# spot-check three individual test samples (1-based positions 3, 6 and 8)
for i in (3, 6, 8):
    y_pred = model.predict(x_test1[i-1:i])
    classes_x = np.argmax(y_pred, axis=1)
    act = np.argmax(y_test[i-1])
    print("predicted class: {}".format(enc.classes_[classes_x]))
    print("actual class: {}".format(enc.classes_[act]))
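# results table; the accuracy columns hold fractions between 0 and 1
all_model_result = pd.DataFrame(columns=['model', 'test_accuracy', 'train_accuracy'])
# `model` is still the LSTM here, so its scores can be recomputed directly
new = ['lstm',
       model.evaluate(x_test1, y_test, verbose=0)[1],
       model.evaluate(x_train1, y_train, verbose=0)[1]]
all_model_result.loc[0] = new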
# 1-D CNN model: six Conv1D layers with dropout, then a dense classifier head
model = Sequential()
model.add(Conv1D(filters=32, kernel_size=3, activation='relu', input_shape=(41, 1)))
model.add(Conv1D(filters=32, kernel_size=3, activation='relu'))
model.add(Dropout(0.4))
model.add(Conv1D(filters=32, kernel_size=3, activation='relu'))
model.add(Conv1D(filters=32, kernel_size=3, activation='relu'))
model.add(Dropout(0.4))
model.add(Conv1D(filters=64, kernel_size=3, activation='relu'))
model.add(Conv1D(filters=64, kernel_size=3, activation='relu'))
model.add(Dropout(0.4))
model.add(Flatten())
model.add(Dense(256, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(5, activation="softmax"))
model.summary()
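The shapes that `model.summary()` prints for the 1-D CNN can be derived by hand. With Keras' default "valid" padding and stride 1, each Conv1D layer with kernel size 3 shortens the sequence by 2, so six such layers take 41 steps down to 29. The sketch below (helper names are hypothetical) also adds up the trainable parameters, skipping the Dropout and Flatten layers because they have none.

```python
def conv1d_out_len(length, kernel_size, stride=1):
    # 'valid' padding (the Keras default): no zero padding at the sequence edges
    return (length - kernel_size) // stride + 1

def conv1d_params(kernel_size, in_channels, filters):
    # one kernel_size x in_channels weight block per filter, plus one bias per filter
    return kernel_size * in_channels * filters + filters

length, channels, total = 41, 1, 0
for filters in [32, 32, 32, 32, 64, 64]:       # the six Conv1D layers, kernel_size=3
    total += conv1d_params(3, channels, filters)
    length, channels = conv1d_out_len(length, 3), filters
    print(f"Conv1D({filters}) -> output shape ({length}, {channels})")

flat = length * channels                        # Flatten
total += flat * 256 + 256                       # Dense(256)
total += 256 * 5 + 5                            # Dense(5)
print("flattened features:", flat)              # 1856
print("trainable params:", total)               # 504677
```

Most of the parameters sit in the Dense(256) layer after Flatten, which is a common trade-off of flattening instead of using global pooling.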
model.compile(optimizer = 'rmsprop' , loss = "categorical_crossentropy",
metrics=["accuracy"])
history = model.fit(x_train1,y_train, batch_size= 128,
epochs = 30, validation_data = (x_test1,y_test))
# training (red) vs. test (blue) accuracy per epoch; the test split is used as validation data
plt.plot(history.history['accuracy'], 'r')
plt.plot(history.history['val_accuracy'], 'b')
plt.legend(['train accuracy', 'test accuracy'])
plt.show()
score1 = model.evaluate(x_test1, y_test, verbose=0)
print('test accuracy:', score1[1])
score = model.evaluate(x_train1, y_train, verbose=0)
print('train accuracy:', score[1])
# save the model
tf.keras.models.save_model(model,file_name)
new = ['cnn 1d', score1[1], score[1]]  # score1 = test, score = train, matching the column order
all_model_result.loc[1] = new
all_model_result
# initialising the ANN
classifier = Sequential()
# flatten the (41, 1) input so the same reshaped arrays can be fed to the dense layers
classifier.add(Flatten(input_shape=(41, 1)))
# adding the first hidden layer
classifier.add(Dense(units=45, kernel_initializer='uniform', activation='relu'))
# adding the second hidden layer
classifier.add(Dense(units=20, kernel_initializer='uniform', activation='relu'))
# adding the output layer: softmax over the 5 mutually exclusive classes
classifier.add(Dense(units=5, kernel_initializer='uniform', activation='softmax'))
# compiling the ANN with the Adam optimizer
classifier.compile(optimizer='adam', loss='categorical_crossentropy',
                   metrics=['accuracy'])
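The ANN's output layer produces one score per class for mutually exclusive labels. The sketch below contrasts sigmoid and softmax on hypothetical scores: sigmoid squashes each score independently, so the outputs need not sum to 1, while softmax yields a probability distribution, which is what categorical cross-entropy expects. Both keep the same arg-max, since each is increasing in its input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtract the max for numerical stability; the result sums to 1 across classes
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0, -2.0])   # hypothetical scores for 5 classes
print("sigmoid:", sigmoid(logits).round(3), "sum =", round(float(sigmoid(logits).sum()), 3))
print("softmax:", softmax(logits).round(3), "sum =", round(float(softmax(logits).sum()), 3))
```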

history=classifier.fit(x_train1, y_train,batch_size=128,
epochs=200,validation_data = (x_test1,y_test))
# training (red) vs. test (blue) accuracy per epoch; the test split is used as validation data
plt.plot(history.history['accuracy'], 'r')
plt.plot(history.history['val_accuracy'], 'b')
plt.legend(['train accuracy', 'test accuracy'])
plt.show()
score1 = classifier.evaluate(x_train1, y_train, verbose=0)
print('train accuracy:', score1[1])
score = classifier.evaluate(x_test1, y_test, verbose=0)
print('test accuracy:', score[1])
# save the model
tf.keras.models.save_model(classifier,file_name)
# confusion matrix: rows = actual class, columns = predicted class
from sklearn.metrics import confusion_matrix
class_names = enc.classes_
y_true_idx = np.argmax(y_test, axis=1)
y_pred_idx = np.argmax(classifier.predict(x_test1), axis=1)
df_heatmap = pd.DataFrame(confusion_matrix(y_true_idx, y_pred_idx, labels=list(range(len(class_names)))),
                          columns=class_names, index=class_names)
df_heatmap
# heatmap = sns.heatmap(df_heatmap, annot=True, fmt="d")
enc.classes_
i=8
y_pred = classifier.predict(x_test1[i-1:i])
classes_x=np.argmax(y_pred,axis=1)
act = np.argmax(y_test[i-1])
print("predicted class: {}".format(enc.classes_[classes_x]))
print("actual class: {}".format(enc.classes_[act]))
new = ['ann',score[1], score1[1]]
all_model_result.loc[2] = new
all_model_result

APPENDIX 2: SNAPSHOTS

