Detection of Adversarial Malware Using Deep Learning on Executable Files
Project Description:
The increasing sophistication of malware, including obfuscation techniques and adversarial
manipulations, makes traditional signature-based detection systems insufficient. This project
proposes a deep learning-based framework to detect such advanced malware and provide
explainable insights about why a file is flagged as malicious.
By combining static and dynamic analysis of executable files, robustness against adversarial
modifications, and explainable AI techniques, this framework aims to enhance malware
detection performance and interpretability for cybersecurity analysts.
Dataset:
EMBER Dataset (Endgame Malware BEnchmark for Research)
Project Steps:
1. Data Preparation
Load the EMBER dataset and split it into training, validation, and test sets
Perform preprocessing:
o Normalize numeric features
o Encode categorical features
o Optionally, transform raw binaries into sequences suitable for deep learning
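The preprocessing above can be sketched in plain NumPy. This is a toy stand-in: the arrays here are invented examples, and in practice the vectorized EMBER features would be loaded with the EMBER toolkit (e.g. `ember.read_vectorized_features`) before applying these transforms.

```python
import numpy as np

def normalize_numeric(X):
    """Min-max scale each numeric column to [0, 1]; constant columns map to 0."""
    mins = X.min(axis=0)
    spans = X.max(axis=0) - mins
    spans[spans == 0] = 1.0          # avoid division by zero on constant columns
    return (X - mins) / spans

def encode_categorical(values):
    """Map each distinct category string to a stable integer index."""
    vocab = {v: i for i, v in enumerate(sorted(set(values)))}
    return np.array([vocab[v] for v in values]), vocab

# invented example rows, not real EMBER data
num = np.array([[10.0, 0.5],
                [20.0, 1.5],
                [30.0, 2.5]])
cats = ["pe32", "pe64", "pe32"]

scaled = normalize_numeric(num)        # each column now spans [0, 1]
codes, vocab = encode_categorical(cats)
```

The same fitted statistics (mins, spans, vocab) must be reused on the validation and test splits to avoid leakage.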
2. Feature Engineering
Combine multi-modal features:
o Static PE file features (headers, section info, imports)
o Dynamic behavior features (API calls, execution logs), collected from sandbox runs, since EMBER itself provides only static features
o Raw byte sequences (optional CNN or Transformer input)
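One simple fusion strategy is to flatten each modality into a fixed-length vector and concatenate them. The sketch below assumes a hypothetical representation: a static feature vector, a list of API-call IDs hashed into a small count histogram, and a normalized 256-bin byte histogram for the raw binary.

```python
import numpy as np

def fuse_features(static_vec, api_call_ids, raw_bytes, n_api_bins=8):
    """Concatenate static PE features, an API-call count histogram,
    and a normalized byte histogram into one model input vector."""
    api_hist = np.zeros(n_api_bins)
    for call_id in api_call_ids:
        api_hist[call_id % n_api_bins] += 1   # hash calls into fixed bins
    byte_hist = np.bincount(np.frombuffer(raw_bytes, dtype=np.uint8),
                            minlength=256).astype(float)
    byte_hist /= max(len(raw_bytes), 1)       # normalize to a distribution
    return np.concatenate([static_vec, api_hist, byte_hist])

# invented sample: 2 static features, 3 API events, a 4-byte file stub
vec = fuse_features(np.array([1.0, 0.2]), [3, 5, 3], b"MZ\x90\x00")
```

For the optional sequence branch, the raw bytes would instead be kept as an integer sequence and fed to a CNN or Transformer embedding layer rather than histogrammed.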
3. Model Design
Train a Deep Neural Network (DNN), CNN, or LSTM on the processed features
Include regularization techniques (dropout, batch normalization) to improve
generalization
Apply adversarial training:
o Simulate minor modifications of malware that attempt to evade detection
o Train the model to remain robust against such changes
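Adversarial training of the kind described above can be illustrated end-to-end with a minimal example. The sketch below substitutes a logistic-regression scorer for the deep network so it stays dependency-free, and uses FGSM-style perturbations (stepping each feature in the sign of the loss gradient) as the simulated evasion attempt; a real implementation would apply the same loop to a DNN/CNN/LSTM in a framework such as PyTorch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(w, b, x, y, eps=0.1):
    """FGSM: move each feature in the direction that increases the loss."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w                     # d(log-loss)/dx for a logistic model
    return x + eps * np.sign(grad_x)

def adversarial_train(X, y, eps=0.1, lr=0.5, epochs=200):
    """Gradient descent on a mix of clean and FGSM-perturbed samples."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1]) * 0.01
    b = 0.0
    for _ in range(epochs):
        X_adv = np.array([fgsm_perturb(w, b, x, t, eps) for x, t in zip(X, y)])
        Xb = np.vstack([X, X_adv])           # train on clean + adversarial
        yb = np.concatenate([y, y])
        p = sigmoid(Xb @ w + b)
        w -= lr * Xb.T @ (p - yb) / len(yb)
        b -= lr * np.mean(p - yb)
    return w, b

# toy separable data: the feature signs determine the label
X = np.array([[1.5, 1.0], [2.0, 0.5], [-1.5, -1.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, 0.0, 0.0])
w, b = adversarial_train(X, y)
preds = (sigmoid(X @ w + b) > 0.5).astype(float)
```

For PE malware, perturbations must also respect format validity (e.g. appending bytes or padding sections) rather than arbitrary feature edits, which is a key practical difference from image-domain FGSM.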
4. Explainable AI Integration
Use techniques such as SHAP, LIME, or Integrated Gradients to interpret model
predictions
Provide explanations such as:
o Which features or file sections contributed most to labeling the file as
malicious
o Alerts that security analysts can understand and act on
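Of the techniques listed, Integrated Gradients is simple enough to sketch directly: attributions are accumulated gradients along a straight path from a baseline (e.g. an all-zero feature vector) to the input, scaled by the input-baseline difference. The scorer below is a hypothetical stand-in for the trained model; in practice the `shap` or `captum` libraries would be used instead of this hand-rolled version.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def numeric_grad(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[i] += h
        xm[i] -= h
        g[i] = (f(xp) - f(xm)) / (2 * h)
    return g

def integrated_gradients(score_fn, x, baseline, steps=100):
    """Riemann-sum approximation of Integrated Gradients along the
    straight path from baseline to x."""
    grad_sum = np.zeros_like(x)
    for k in range(1, steps + 1):
        point = baseline + (k / steps) * (x - baseline)
        grad_sum += numeric_grad(score_fn, point)
    return (x - baseline) * grad_sum / steps

# hypothetical malicious-score model over two features
w = np.array([2.0, -0.5])
score = lambda x: sigmoid(x @ w)
x = np.array([1.0, 1.0])
baseline = np.zeros(2)
attr = integrated_gradients(score, x, baseline)
```

A useful sanity check is the completeness property: the attributions should sum (approximately) to the difference between the model's score at the input and at the baseline.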
5. Evaluation
Metrics:
o Precision, Recall, F1-score, AUC
Test model on unseen malware variants to assess robustness
Compare performance to baseline methods (Random Forest, SVM, classic signature-based detection)
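The evaluation metrics can be computed from first principles (in practice `sklearn.metrics` provides them directly); the scores and labels below are invented examples. AUC is computed here via its rank interpretation: the probability that a randomly chosen malicious sample scores higher than a randomly chosen benign one.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def auc_score(y_true, scores):
    """AUC as the fraction of (malicious, benign) pairs ranked correctly."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = np.array([1, 1, 0, 0, 1])       # invented ground truth
scores = np.array([0.9, 0.8, 0.3, 0.6, 0.4])
y_pred = (scores >= 0.5).astype(int)
p, r, f1 = precision_recall_f1(y_true, y_pred)
auc = auc_score(y_true, scores)
```

For the robustness test, the same metrics would be reported separately on the held-out unseen variants and on adversarially perturbed samples, alongside the Random Forest and SVM baselines.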
6. Automated Recommendations
Extend the system using a generative model (LLM) to propose mitigation actions for
flagged files
o Example: “Isolate process X, block file execution, notify analyst”
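The recommendation layer can be prototyped before wiring in a real LLM. The sketch below is a rule-based stand-in: the verdict dictionary schema (`path`, `confidence`, `top_features`) is hypothetical, and in the full system this function would instead build a prompt from the verdict and the explanation output and send it to a generative model.

```python
def recommend_actions(verdict):
    """Rule-based placeholder for an LLM call: turn a detection verdict
    into concrete mitigation actions for the analyst."""
    actions = []
    if verdict["confidence"] >= 0.9:
        actions.append(f"Block execution of {verdict['path']} and isolate the host")
    elif verdict["confidence"] >= 0.5:
        actions.append(f"Quarantine {verdict['path']} pending analyst review")
    # tie recommendations back to the explainability output
    if "imports:VirtualAlloc" in verdict["top_features"]:
        actions.append("Inspect process memory for injected code")
    actions.append("Notify the on-call security analyst")
    return actions

# invented verdict for a flagged file
acts = recommend_actions({
    "path": "invoice.exe",
    "confidence": 0.95,
    "top_features": ["imports:VirtualAlloc", "section_entropy"],
})
```

Feeding the top attributed features into the prompt lets the generated recommendations reference the same evidence shown to the analyst, keeping the detection and mitigation outputs consistent.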