Malware Detection with ML Techniques

This project aims to use supervised machine learning techniques to detect malware. Dynamic malware analysis is performed by executing malware samples in a controlled environment to generate log files. Natural language processing and the bag-of-words model are used to extract data from the log files, which are then labeled and used to train four different machine learning models: Decision Tree, Random Forest, Logistic Regression, and SVM. The Random Forest model achieved the highest accuracy of 99.99806% for malware detection.

Uploaded by

Secdition 30

Malware Detection Using Supervised Machine Learning

Submitted to

Niteesh Kunwar
Semester V

Abstract

This project applies supervised machine learning techniques to malware
detection, using primary data collected for the project. Malware is analyzed
by performing dynamic malware analysis in a malware analysis lab. The sandbox
is Flare VM running on a Windows 10 virtual machine. Both malware and goodware
are run on Flare VM to obtain log files. Data is extracted from these log files
using an NLP technique called bag of words and is then labeled. This is
followed by model training and evaluation. Four algorithms were used to train
models: Random Forest, Decision Tree, Logistic Regression, and SVM. Among
them, Random Forest gave the highest accuracy, 99.99806%.

Background

Cyber security is essential for protecting sensitive information, data, and
patents in today's increasingly digitized world. Malware is short for malicious
software: software designed to disrupt, damage, or gain unauthorized access to
a device or network. Common examples include viruses, worms, Trojans, spyware,
adware, and ransomware. There are two types of malware analysis, static and
dynamic. This project focuses on dynamic malware analysis, in which malware
files are executed in a controlled environment. The log files generated during
execution are used to extract data and train the machine learning models.

Problem Statement

As cyber-attacks increase, traditional programming methods struggle to deal
with them efficiently because of the sheer volume and variety of malware.
Machine learning (ML) can be used to detect malware on a system with greater
accuracy.

Objective

The main objective of this project is to integrate machine learning with cyber
security in order to detect malware on a system from its generated log files.

Motivation and significance

Any compromise of cyber security can harm an organization in both the short and
the long term. Cyber security is also an important component of a country's
overall security. ML handles large amounts of data well and overcomes some
limitations of conventional programming methods when dealing with
cyber-attacks.

Because it can process large amounts of data, machine learning is well suited
to strengthening cyber security. A common deployment is an unsupervised machine
learning model that detects anomalies in network traffic and alerts cyber
security systems. Many businesses have escaped ransomware attacks in this way.
Financial organizations, such as banks, increasingly rely on machine learning
for cyber protection.

Malware is a term for malicious software designed to disrupt, damage, or gain
unauthorized access to a device or network. Viruses, worms, Trojan horses,
spyware, adware, and ransomware are all examples of prevalent malware.

Malware analysis can be divided into two categories. The first is static
analysis, in which a malware file is examined without running it. Properties
such as file size, hash, and extension are inspected. The second is dynamic
analysis, which entails running the malware file in a sandbox and observing how
it behaves.
Ricardo Calix used log file data for malware detection. He ran roughly fifty
samples of both goodware and malware in an isolated environment and gathered
the log files of each program. The files were named good1, good2, and so on for
goodware, and badrabbit1, badrabbit2, and so on for malware. He extracted data
using the bag-of-words technique and labeled it 1 (goodware) or -1 (malware).
Model training followed. This project takes a similar approach. However, the
data is retrieved from two large log files, one from a malware-infected system
and one from a non-infected system, rather than from several short log files
each representing one goodware or malware program.

Methodology

System Requirement

Software Tools for Log File Generation

Malware Analysis Lab:

Kali Linux Host (preferred)

Oracle VirtualBox

Windows 10 VM

Flare VM

ProcMon

Malware samples from VirusBazaar and GitHub

Goodware samples

For Data Extraction, Model Training, and Evaluation

Google Colab

Code: Python

Hardware Specification

8 GB RAM for the system

Modules

Pandas

NumPy

Seaborn

Matplotlib

scikit-learn (sklearn)

System Design

Fig: System for Log File Generation

Algorithm Used

Decision Tree

A decision tree is a flowchart-like tree structure in which each internal node
denotes a test on an attribute, each branch represents an outcome of that test,
and each leaf node (terminal node) holds a class label. Decision trees are
simple to interpret and often give good results.
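
To make the node, branch, and leaf structure concrete, here is a small hand-written tree and a function that walks it. This is only an illustrative sketch: the feature names (`regsetvalue_count`, `createfile_count`) and thresholds are hypothetical and were not learned from the project's data.

```python
# A hand-written toy tree: internal nodes test a feature, leaves hold a label.
# Feature names and thresholds are hypothetical, chosen only for illustration.
toy_tree = {
    "feature": "regsetvalue_count",
    "threshold": 10,
    "left": {"label": "goodware"},  # branch taken when value <= threshold
    "right": {                      # branch taken when value > threshold
        "feature": "createfile_count",
        "threshold": 50,
        "left": {"label": "goodware"},
        "right": {"label": "malware"},
    },
}


def classify_with_tree(node, sample):
    """Walk from the root to a leaf, following one branch per test."""
    while "label" not in node:
        goes_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node["label"]


quiet_sample = {"regsetvalue_count": 3, "createfile_count": 200}
noisy_sample = {"regsetvalue_count": 40, "createfile_count": 200}
print(classify_with_tree(toy_tree, quiet_sample))  # goodware
print(classify_with_tree(toy_tree, noisy_sample))  # malware
```

A learning algorithm chooses the tests and thresholds automatically from labeled data; the walk from root to leaf at prediction time works the same way.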

Random Forest

Random forest is a supervised learning algorithm that builds a forest from an
ensemble of decision trees. It is easy to use and produces good results most of
the time, even without hyper-parameter tuning. Because it combines the results
of multiple decision trees, it typically gives better accuracy than a single
decision tree.

Fig 2: Random Forest Diagram
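
The combining step can be sketched with majority voting, which is one common rule for merging the outputs of the individual trees. The votes below are hypothetical, and the labels follow the 1 (goodware) and -1 (malware) convention described earlier. Building the trees themselves is not shown.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical outputs of five trees for one sample (1 = goodware, -1 = malware).
votes = [-1, 1, -1, -1, 1]
forest_prediction = majority_vote(votes)
print(forest_prediction)  # -1, since three of the five trees voted malware
```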

Advantages

• It reduces overfitting by averaging or combining the results of different
decision trees.
• Random forests work well over a larger range of data items than a single
decision tree does.
• A random forest has less variance than a single decision tree.
• Random forests are very flexible and can reach very high accuracy.
• The random forest algorithm does not require scaling of the data.
• Random forest algorithms maintain good accuracy even if a large proportion
of the data is missing.

Disadvantages

• The model takes a lot of storage.

• It needs considerably more time and computation power than a single
decision tree.

Logistic Regression

Logistic regression is a supervised classification algorithm. In a
classification problem, the target variable (or output), y, can take only
discrete values for a given set of features (or inputs), X. Because malware
detection here is a binary classification problem (malware or goodware),
logistic regression is a suitable choice.
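
For the binary case, logistic regression passes a weighted sum of the features through the sigmoid function and thresholds the resulting probability. The sketch below shows only this prediction step; the weights, bias, and sample values are hypothetical rather than fitted to the project's data, and the 0.5 threshold is a conventional default.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical parameters and sample, for illustration only.
weights = [0.8, -0.5]
bias = -0.2
sample = [3, 1]

probability = predict_probability(weights, bias, sample)
predicted_class = 1 if probability >= 0.5 else 0
print(round(probability, 3), predicted_class)
```

Training consists of choosing the weights and bias so that these probabilities match the labeled examples as closely as possible.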

SVM

A Support Vector Machine (SVM) creates a line or a hyperplane that separates
the data into classes. It tends to perform poorly when the classes overlap
heavily or when there is too much noise in the data. SVM was considered
suitable here because the dataset has a large number of features.
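
Once a linear separating hyperplane w·x + b = 0 has been found, classifying a point only requires checking which side of it the point falls on. The following sketch shows that step with hypothetical values for w and b; finding the hyperplane (the training step) is not shown.

```python
def decision_value(weights, bias, features):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def svm_predict(weights, bias, features):
    """Return 1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(weights, bias, features) >= 0 else -1


# Hypothetical hyperplane x1 - x2 = 0 in two dimensions.
w = [1.0, -1.0]
b = 0.0

print(svm_predict(w, b, [5, 2]))  # 1, because 5 - 2 >= 0
print(svm_predict(w, b, [2, 5]))  # -1, because 2 - 5 < 0
```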

Workflow

Data Collection

Log file generation

• Set up a malware analysis lab and take a snapshot.
• Collect malware and goodware samples.
• Run the malware samples and save the log file as malware1.csv.
• Restore the lab to the previous snapshot.
• Run the goodware samples and save the log file as good1.csv.
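
The saved CSV logs can then be read row by row and turned into short text lines for later feature extraction. The sketch below is written under assumptions: the column names match what ProcMon typically exports by default, but they should be checked against the actual files, and the two sample rows and the helper `rows_to_text` are invented for illustration. An in-memory string stands in for malware1.csv.

```python
import csv
import io

# Stand-in for a saved log such as malware1.csv; rows are invented examples.
# Column names are an assumption about the ProcMon CSV export format.
sample_csv = io.StringIO(
    '"Time of Day","Process Name","PID","Operation","Path","Result","Detail"\n'
    '"10:00:01","sample.exe","1234","RegSetValue","HKCU\\Software\\Run","SUCCESS",""\n'
    '"10:00:02","sample.exe","1234","WriteFile","C:\\Temp\\a.bin","SUCCESS","Length: 10"\n'
)


def rows_to_text(handle, columns=("Process Name", "Operation", "Result")):
    """Join the chosen columns of each log row into one text line."""
    reader = csv.DictReader(handle)
    return [" ".join(row[column] for column in columns) for row in reader]


log_lines = rows_to_text(sample_csv)
print(log_lines)
```

Which columns to keep is a design choice; fields such as timestamps and PIDs vary from run to run and may add noise rather than signal.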

Data Extraction from log files

Data is extracted by applying bag of words, a Natural Language Processing
technique, with the help of CountVectorizer from sklearn. The dataset is also
labeled during this step.
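
A minimal sketch of this step is shown below. The log lines are hypothetical stand-ins for entries from the two log files, and the 1 (goodware) and -1 (malware) labels follow the convention described earlier; whether the project used exactly these labels and the default CountVectorizer settings is an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical log lines; real entries contain many more fields.
good_lines = ["notepad.exe ReadFile SUCCESS", "explorer.exe RegOpenKey SUCCESS"]
malware_lines = ["dropper.exe WriteFile SUCCESS", "dropper.exe RegSetValue SUCCESS"]

texts = good_lines + malware_lines
labels = [1] * len(good_lines) + [-1] * len(malware_lines)  # 1 = goodware, -1 = malware

# Each row of X counts how often each vocabulary word occurs in one log line.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
print(X.toarray())
```

With the default settings, text is lowercased and split on punctuation, so `dropper.exe` contributes the two tokens `dropper` and `exe`.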

Model Training

Four models are trained, using the Decision Tree, Random Forest, Logistic
Regression, and SVM algorithms.
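
Training and comparing the four models could look roughly like the sketch below. It uses a synthetic dataset from `make_classification` in place of the project's bag-of-words features, and the split ratio, hyper-parameters, and linear SVM kernel are all assumptions, so the accuracies it prints are unrelated to the results reported in this document.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the labeled feature matrix (not the project's data).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="linear"),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Evaluating on a held-out split, rather than on the training data, gives a fairer estimate of how each model would perform on unseen logs.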

Conclusion

The aim of this project is to use machine learning to detect malware on a
system. This helps identify infected systems and improve cyber security at the
individual or organizational level.
