Malware Analysis Using Python and Kaggle Dataset

The lab focuses on analyzing malware using Python and a Kaggle dataset, covering steps such as data exploration, preprocessing, feature engineering, and machine learning model training. Key techniques include handling missing values, understanding class distribution, and using algorithms like Random Forest for classification. The lab concludes with model evaluation and saving the trained model for future predictions.

Uploaded by

Nadou She

Malware Analysis Lab Dr Benabderrezak

Lab: Malware Analysis Using Python and Kaggle Dataset

Objective
The objective of this lab is to analyze malware using Python by exploring a Kaggle dataset, performing feature
extraction, and applying machine learning techniques for malware classification.

Prerequisites
- Python : Basic understanding of Python programming.
- Pandas & NumPy : Used for data manipulation and numerical operations.
- Matplotlib & Seaborn : Visualization libraries for data analysis.
- Scikit-learn : Essential for machine learning tasks such as data preprocessing, model training, and evaluation.
- Joblib : Used for saving and loading trained models.
- Kaggle Account : Required to download datasets.
- Jupyter Notebook or Python IDE : Recommended for running the lab efficiently.

Step 1: Install Required Libraries

pip install pandas numpy scikit-learn matplotlib seaborn joblib
Step 2: Download the Malware Dataset from Kaggle
- Visit Kaggle and search for a malware dataset (e.g., "Microsoft Malware Classification")
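- Download the dataset and place it in your working directory
- The following steps assume a single CSV file with one row per sample and a target column named label; adjust the file and column names if your dataset differs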
Step 3: Load the Dataset

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load dataset (adjust filename as needed)
df = pd.read_csv('malware_dataset.csv')
# Display basic info: column types, non-null counts, and the first rows
df.info()
print(df.head())

Step 4: Data Exploration and Preprocessing

1. Checking for Missing Values
- Before proceeding with data analysis, it is essential to check if there are any missing values in the dataset.
- Missing data can impact the accuracy of machine learning models.

# Check for missing values

print("Missing values:")
print(df.isnull().sum())

If any missing values are found, we handle them appropriately by filling them with zeros or using other imputation
techniques.

# Handle missing values (if any)

df.fillna(0, inplace=True)
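
Filling with zeros is the simplest option, but a zero can be a misleading value for features such as file size. As one possible alternative, the sketch below uses median imputation with scikit-learn's SimpleImputer. The toy DataFrame and its column names (file_size, num_sections) are assumptions made up for illustration and are not taken from any particular Kaggle dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy frame standing in for the lab's df
toy = pd.DataFrame({
    "file_size": [1200.0, np.nan, 560.0, 980.0],
    "num_sections": [4.0, 5.0, np.nan, 3.0],
    "label": ["malware", "benign", "benign", "malware"],
})

# Only the numeric feature columns are imputed; the label is left untouched
feature_cols = ["file_size", "num_sections"]

# Replace each missing value with the median of its column
imputer = SimpleImputer(strategy="median")
toy[feature_cols] = imputer.fit_transform(toy[feature_cols])

print(toy)
```

In a full pipeline it is usually safer to fit the imputer on the training split only and reuse it on the test split, so that no information from the test data leaks into preprocessing.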

2. Understanding Class Distribution

- Class distribution analysis helps in understanding if the dataset is imbalanced.
- In malware classification, an imbalanced dataset can lead to biased model predictions.

# Check class distribution

sns.countplot(x='label', data=df)
plt.title("Class Distribution")
plt.show()

If the dataset is highly imbalanced, techniques such as oversampling (e.g., SMOTE), undersampling, or class-weighted
algorithms can be applied.
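
As a rough illustration of oversampling, the sketch below duplicates minority-class rows at random with sklearn.utils.resample until both classes have the same size. This is plain random oversampling, not SMOTE (SMOTE is provided by the separate imbalanced-learn package, which the lab does not install). The toy data and the entropy column are assumptions for illustration only.

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical imbalanced toy data: 6 malware rows, 2 benign rows
toy = pd.DataFrame({
    "entropy": [7.1, 6.8, 7.4, 6.9, 7.0, 7.2, 3.1, 2.9],
    "label": ["malware"] * 6 + ["benign"] * 2,
})

majority = toy[toy["label"] == "malware"]
minority = toy[toy["label"] == "benign"]

# Sample the minority class with replacement until it matches the majority size
minority_upsampled = resample(
    minority,
    replace=True,
    n_samples=len(majority),
    random_state=42,
)

balanced = pd.concat([majority, minority_upsampled], ignore_index=True)
print(balanced["label"].value_counts())
```

Resampling should normally be applied to the training split only; oversampling before the split puts copies of the same row in both training and test sets and inflates the measured accuracy.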
Step 5: Feature Engineering

from sklearn.preprocessing import LabelEncoder

# Encode the class labels (the target column) as integers
label_encoder = LabelEncoder()
df['label'] = label_encoder.fit_transform(df['label'])

# Separate the feature columns from the target column

features = df.drop(columns=['label'])
labels = df['label']
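
Many malware datasets also contain non-numeric columns such as file hashes or file names, which RandomForestClassifier cannot use directly. One simple way to handle this, shown as a sketch below, is to keep only the numeric columns with select_dtypes. The toy frame and its columns (sha256, file_size, entropy) are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame with an identifier column that is not a useful feature
toy = pd.DataFrame({
    "sha256": ["ab12", "cd34", "ef56"],
    "file_size": [1200, 560, 980],
    "entropy": [7.1, 3.2, 6.9],
    "label": [1, 0, 1],
})

# Drop the target, then keep numeric columns only
toy_features = toy.drop(columns=["label"]).select_dtypes(include="number")
toy_labels = toy["label"]

print(list(toy_features.columns))
```

Categorical columns that do carry signal could instead be encoded (for example with one-hot encoding) rather than dropped; which option fits depends on the dataset.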

Step 6: Split Dataset into Training and Testing Sets

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)
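
When the classes are imbalanced, a purely random split can leave the test set with very few samples of the rare class. A possible refinement, sketched below on synthetic data, is to pass the stratify argument of train_test_split so that both splits keep the original class proportions. The variable names and the 20/80 class ratio are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 samples, 20% of them in class 1
rng = np.random.default_rng(0)
toy_features = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
toy_labels = pd.Series([1] * 20 + [0] * 80, name="label")

# stratify keeps the class ratio the same in the train and test splits
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_features, toy_labels, test_size=0.2, random_state=42, stratify=toy_labels
)

print("Share of class 1 in train:", y_tr.mean())
print("Share of class 1 in test:", y_te.mean())
```

Note that stratification raises an error if any class has fewer than two samples, so very rare classes may need to be merged or removed first.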

Step 7: Train a Machine Learning Model

Commonly Used Machine Learning Algorithms for Malware Detection :
- Random Forest - Ensemble learning method for classification.
- Support Vector Machine (SVM) - Effective in high-dimensional spaces.
- Gradient Boosting (e.g., XGBoost, LightGBM) - Powerful boosting techniques.
- Neural Networks (Deep Learning) - Advanced detection with deep models.

from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import accuracy_score, classification_report

# Train the model

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:")
print(classification_report(y_test, y_pred))
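
Accuracy alone can look good on an imbalanced dataset even when the rare class is mostly missed. As a complementary check, the sketch below prints a confusion matrix and the macro-averaged F1 score. It uses synthetic data from make_classification with an assumed 90/10 class ratio, not the lab's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 90% class 0, 10% class 1
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, pred)
print(cm)

# Macro F1 gives each class equal weight regardless of its size
print("Macro F1:", f1_score(y_te, pred, average="macro"))
```

The off-diagonal cells of the matrix show how many samples of each class were misclassified, which is often more informative for malware detection than a single accuracy figure.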

Step 8: Feature Importance Analysis

feature_importances = pd.Series(model.feature_importances_, index=features.columns)

feature_importances.nlargest(10).plot(kind='barh')
plt.title("Top 10 Important Features")
plt.show()
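
The feature_importances_ attribute is computed from the training data and can favour features with many distinct values. A possible cross-check, sketched below on synthetic data, is permutation importance measured on held-out samples with sklearn.inspection.permutation_importance. The feature names feature_0 to feature_5 are placeholders.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with placeholder feature names
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, random_state=0
)
cols = [f"feature_{i}" for i in range(X.shape[1])]
X = pd.DataFrame(X, columns=cols)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Shuffle each feature on the held-out set and measure the drop in score
result = permutation_importance(clf, X_te, y_te, n_repeats=5, random_state=0)
perm_importances = pd.Series(result.importances_mean, index=cols).sort_values(
    ascending=False
)
print(perm_importances)
```

If the two rankings disagree strongly, the permutation-based ranking is usually the better guide to which features the model actually relies on for unseen data.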

Step 9: Save the Model

import joblib
joblib.dump(model, "malware_classifier.pkl")

Step 10: Detect Malware on New Data

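# Load saved model

model = joblib.load("malware_classifier.pkl")

# Predict on new data (adjust filename accordingly).
# The file must contain the same feature columns, in the same order,
# as the training data, and no 'label' column.

new_data = pd.read_csv('new_malware_sample.csv')
new_data = new_data.fillna(0)  # same missing-value handling as in Step 4
new_pred = model.predict(new_data)

# Predictions are the integer codes produced by label_encoder in Step 5
print("Prediction:", new_pred)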
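
The saved model only returns integer class codes, and it expects the same columns it was trained on. One way to make prediction on new files more robust, sketched below, is to save the label encoder and the training column list together with the model, then use them to align and decode at prediction time. The bundle layout, the file name malware_bundle.pkl, and the toy columns are assumptions for illustration, not part of the original lab.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy training data
train = pd.DataFrame({
    "file_size": [100, 120, 900, 950, 110, 920],
    "entropy": [3.0, 3.2, 7.5, 7.8, 3.1, 7.6],
    "label": ["benign", "benign", "malware", "malware", "benign", "malware"],
})

encoder = LabelEncoder()
y = encoder.fit_transform(train["label"])
X = train.drop(columns=["label"])
clf = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)

# Save model, encoder, and column order in a single file
path = os.path.join(tempfile.mkdtemp(), "malware_bundle.pkl")
joblib.dump({"model": clf, "encoder": encoder, "columns": list(X.columns)}, path)

# Later: load the bundle and predict on new samples
bundle = joblib.load(path)
new_samples = pd.DataFrame({
    "entropy": [7.7, 3.05],
    "file_size": [930, 105],
    "sha256": ["a", "b"],  # extra column that the model never saw
})

# Keep only the training columns, in the training order
aligned = new_samples[bundle["columns"]]

# Map integer predictions back to the original class names
decoded = bundle["encoder"].inverse_transform(bundle["model"].predict(aligned))
print(decoded)
```

Files created with joblib.dump should only be loaded from trusted sources, because loading a pickle file can execute arbitrary code.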

Conclusion

In this lab, we explored a malware dataset, performed feature engineering, trained a machine learning model, and
evaluated its performance. This approach can be expanded with deep learning techniques and additional feature
extraction methods for better malware detection.
