Cyber Attack Prediction
BCSE355L CLOUD ARCHITECTURE
Sakethram Sathish 23BCE0934
Yug Raithatha 23BCE0964
Divya Juliet 23BCE2297
AIM of the project
To develop an advanced, web-based Cyber Attack Prediction
System that leverages Machine Learning (ML), Deep Learning (DL),
and Generative Adversarial Networks (GANs) for:
● Detecting network intrusions in real time using multiple AI
models
● Improving detection accuracy through ensemble methods and
hybrid modeling
● Addressing class imbalance in cybersecurity datasets using
GAN-based synthetic data augmentation
● Providing an interactive and user-friendly interface for data
analysis, model training, evaluation, and live prediction
MOTIVATION behind the project
We chose a fundamental cybersecurity problem
and approached it with practical elegance.
Instead of chasing accuracy metrics alone, as
much research does, we focused on real-world
applicability: tackling class imbalance and
reducing the false positive rate to build a
system that works in production environments.
Project novelties compared to the existing methods
● Existing methods typically use one or two approaches in their pipeline; we integrate three (ML, DL, and generative AI)
in a single pipeline.
● Existing models focus on theoretical data generation; our GAN model targets a real-world problem, namely handling
data imbalance.
● The predictions of each model are used to augment the training data of the next model, so that the next model
learns from both the previous model's predictions and its own dataset. Existing approaches treat models as separate
entities without knowledge transfer.
● Our model achieves a consistent 88% accuracy across all three configurations. Other papers report higher theoretical
accuracy but lack practical consistency. Our models show excellent normal-traffic identification (98% accuracy) and
minimal false alarms (2-14 mistakes per 1,000 normal samples).
● Practical problem-solving focus
● Our model specifically addresses: class imbalance, through GAN-based augmentation; the vanishing gradient problem,
using LeakyReLU activation; and overfitting, using dropout layers (rates 0.1, 0.2, 0.3).
● Unlike other papers that focus only on accuracy, our work acknowledges: the critical issue of missing 25% of
actual attacks; the trade-off between model complexity and practical performance; and the importance of false positive
rates in real deployments.
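As a rough illustration of two of the techniques named above (LeakyReLU against vanishing gradients, dropout against overfitting), here is a minimal pure-Python sketch. The slope `alpha=0.01` and the function names are assumptions for illustration, not taken from the project; only the dropout rates 0.1, 0.2, 0.3 come from the slide.

```python
import random

def leaky_relu(x, alpha=0.01):
    """Return x for positive inputs and alpha * x otherwise."""
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.01):
    """Gradient of leaky_relu: 1 for positive inputs, alpha otherwise.
    It is never exactly 0, so negative units still pass some gradient."""
    return 1.0 if x > 0 else alpha

def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate` and scale
    the survivors by 1 / (1 - rate) so the expected value is unchanged."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [leaky_relu(x) for x in [-2.0, -0.5, 0.0, 1.5]]
for rate in (0.1, 0.2, 0.3):  # dropout rates quoted in the slide
    print(rate, dropout(activations, rate, rng))
```

Dropout is applied only during training; at prediction time the activations are passed through unchanged.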
Additional Findings
In the same pipeline, we replaced
the GAN model with a SMOTE +
Tomek model.
Why was this change implemented?
SMOTE + Tomek is faster, simpler,
and more predictable (hence more
reliable), and it is designed for
imbalanced data.
This variant showed better accuracy
than our previous model and a
substantial increase in the
out-of-bag (OOB) score.
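To make the SMOTE + Tomek idea concrete, the following is a minimal sketch, not the project's implementation. It assumes Euclidean distance and tiny 2-D points; the function names and sample coordinates are hypothetical. SMOTE creates synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours. Tomek-link cleaning then drops majority points that are mutual nearest neighbours with a minority point.

```python
import math
import random

def nearest(idx, points):
    """Index of the point closest to points[idx], excluding idx itself."""
    return min((j for j in range(len(points)) if j != idx),
               key=lambda j: math.dist(points[idx], points[j]))

def smote(minority, n_new, k, rng):
    """Create n_new synthetic points, each on the segment between a minority
    sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(minority[i], minority[j]))
        j = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic

def remove_tomek_links(points, labels, majority_label):
    """Drop majority samples that form a Tomek link: a pair of opposite-class
    points that are each other's nearest neighbour."""
    drop = set()
    for i in range(len(points)):
        j = nearest(i, points)
        if labels[i] != labels[j] and nearest(j, points) == i:
            drop.add(i if labels[i] == majority_label else j)
    kept = [(p, y) for idx, (p, y) in enumerate(zip(points, labels))
            if idx not in drop]
    return [p for p, _ in kept], [y for _, y in kept]

# Hypothetical toy data: label 0 = Normal (majority), 1 = Attack (minority).
normal = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.9, 0.9)]
attack = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.2)]
new_attacks = smote(attack, n_new=2, k=2, rng=random.Random(42))
points = normal + attack + new_attacks
labels = [0] * len(normal) + [1] * (len(attack) + len(new_attacks))
clean_points, clean_labels = remove_tomek_links(points, labels, majority_label=0)
print(len(points), "->", len(clean_points))
```

In practice a library such as imbalanced-learn would normally be used instead of hand-written code; the sketch only shows the mechanics.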
Proposed Architecture
Screenshots
Figure 1: Home Screen of the Cyber Attack Prediction System. This interface highlights the three core technological components of
the solution: Machine Learning (Random Forest), Deep Learning (Neural Network), and GAN Augmentation, along with key dataset
information and instructions for system usage.
Figure 2: Streamlit Application Control Panel and Data Analysis Interface. This view shows the main navigation sidebar and the
Data Analysis page, where the user can load the dataset before proceeding to model training.
Figure 3: Dataset Load Confirmation and Sample Data View. The application successfully loads the dataset, displaying the total
number of samples and a head-view of the raw features, including session_id, various network metrics, and the binary target
variable, attack_detected.
Figure 4: Attack Detection Distribution of the Loaded Dataset. This bar chart displays the class imbalance, showing the count of
'Normal' instances (blue bar, class 0) versus 'Attack' instances (red bar, class 1) in the cybersecurity dataset.
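The imbalance shown in this chart can be quantified with a simple count. The sketch below uses a short, made-up label list (not the project's real class counts) and a hypothetical helper name; it follows the caption in using 0 for Normal and 1 for Attack.

```python
from collections import Counter

def class_balance(labels):
    """Return per-class counts and the majority-to-minority ratio."""
    counts = Counter(labels)
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio

# Hypothetical labels: 0 = Normal, 1 = Attack (illustrative only).
attack_detected = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
counts, ratio = class_balance(attack_detected)
print(dict(counts), round(ratio, 2))
```

A ratio well above 1 is the situation that augmentation (GAN or SMOTE-based) is meant to correct.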
Figure 5: Feature Correlation Matrix Visualization. This heatmap, displayed in the Data Analysis section, illustrates
the linear relationship between the numeric features. The darker red shades (e.g., along the diagonal) indicate strong
positive correlations, while lighter shades or low absolute values (like the -0.01) indicate weak or no linear correlation
between features.
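For reference, each cell of such a heatmap is a Pearson correlation coefficient. The following is a minimal sketch of that computation; the column names and values are hypothetical, not read from the project's dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Map every pair of column names to its Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}

# Hypothetical numeric features (illustrative values only).
columns = {
    "login_attempts": [1, 3, 2, 5, 4],
    "failed_logins": [0, 2, 1, 4, 3],
    "session_duration": [50, 10, 40, 20, 30],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

Every feature correlates perfectly with itself, which is why the diagonal of the heatmap is always the darkest shade.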
Figure 6: Descriptive Feature Statistics (Numerical). This table provides a statistical summary of the dataset's numerical
columns, including the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum values for each
feature.
Figure 7: Completed Model Training Pipeline. This view summarizes the sequential execution of the training steps, including
data preprocessing, training of the Random Forest and Deep Learning models, and the final training of the Hybrid RF + DL model
using GAN Augmented data, along with their preliminary performance metrics.
Figure 8: Model Performance Comparison on Test Data (Metrics Tab). This view presents a side-by-side comparison of the three
models' performance, highlighting Accuracy and ROC AUC scores. The RF + DL (GAN Augmented) model is shown achieving a
competitive accuracy, demonstrated visually by the accompanying bar chart.
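The two headline metrics in this view can be computed as follows. This is a small sketch with made-up scores and hypothetical function names: accuracy depends on a decision threshold (0.5 is assumed here), while ROC AUC is threshold-free and equals the probability that a randomly chosen attack sample receives a higher score than a randomly chosen normal sample.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)

def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative;
    ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = Attack) and predicted attack probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(accuracy(y_true, scores), roc_auc(y_true, scores))
```

The pairwise loop is quadratic in the number of samples, which is fine for a demonstration but not for a large test set.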
Figure 9: Visual Comparison of Model Accuracies. This bar chart clearly contrasts the performance of the three models:
Random Forest (Baseline), Deep Learning + Random Forest (Ensemble), and the final Hybrid RF + DL with GAN
Augmentation, showing that the most advanced model maintains the highest performance on the test set.
Figure 10: Detailed Classification Reports for Baseline and GAN Augmented Models. This table provides a comprehensive
breakdown of model performance using metrics such as precision, recall, and F1-score for each class (0=Normal, 1=Attack),
highlighting the effectiveness of the augmentation strategy in balancing performance across classes.
Figure 11: Confusion Matrices for Random Forest, Deep Learning + Random Forest, and RF + DL (GAN Augmented) Models.
This side-by-side comparison shows the number of True Positives (Attack correctly identified), True Negatives (Normal
correctly identified), False Positives, and False Negatives for each model, demonstrating the classification performance on
both Normal and Attack classes.
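The four confusion-matrix cells, and the per-class metrics derived from them, can be sketched as below. The labels and predictions are hypothetical and the function names are illustrative. The example is built so that one in four attacks is missed (recall 0.75), to show how a miss rate and a false-alarm rate per 1,000 normal samples fall out of the counts.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN with 1 = Attack (positive) and 0 = Normal."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(t == 1 and p == 1 for t, p in pairs),
        "tn": sum(t == 0 and p == 0 for t, p in pairs),
        "fp": sum(t == 0 and p == 1 for t, p in pairs),
        "fn": sum(t == 1 and p == 0 for t, p in pairs),
    }

def attack_metrics(c):
    """Precision, recall and F1 for the Attack class, plus false alarms
    per 1,000 normal samples."""
    predicted_attacks = c["tp"] + c["fp"]
    actual_attacks = c["tp"] + c["fn"]
    normals = c["fp"] + c["tn"]
    precision = c["tp"] / predicted_attacks if predicted_attacks else 0.0
    recall = c["tp"] / actual_attacks if actual_attacks else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    false_alarms = 1000 * c["fp"] / normals if normals else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "false_alarms_per_1000": false_alarms}

# Hypothetical ground truth and predictions (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts, attack_metrics(counts))
```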
Figure 12: Deep Learning Training History (Learning Curves Tab). These plots show the model's performance over epochs,
detailing the Area Under the Curve (AUC) and Loss for both the training and validation datasets, confirming the
convergence and stability of the Deep Learning component of the hybrid system.
Figure 13: Real-time Predictions Interface. This view demonstrates the final capability of the system by generating a random
scaled input sample and simultaneously displaying the prediction (Normal/Attack) and the confidence score from each of the
three trained models: Random Forest, Deep Learning, and the Augmented RF (Hybrid) model.
Figure 14: Model Comparison and Final Decision Logic. This section of the predictions interface combines the
individual model results (Model Votes) to arrive at a definitive Final Decision on the network traffic. In this case, a
'Perfect Agreement' among all three models leads to the classification of NORMAL TRAFFIC.
Figure 15: Detailed Prediction Analysis and Confidence Insights. This final prediction view provides a tabular
summary of each model's prediction, confidence, and the probability breakdown for both Normal and Attack
classes. The Insights section confirms the perfect agreement and highlights the model with the highest
prediction confidence (Deep Learning).
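One way the vote combination and confidence insight in Figures 14 and 15 could work is sketched below. The slides do not state the exact rule, so a simple majority vote is assumed; the function name, the "Majority" label, and the confidence values are hypothetical.

```python
from collections import Counter

def final_decision(votes):
    """Combine model votes.

    `votes` maps a model name to a (label, confidence) pair. Returns the
    majority label, an agreement description, and the most confident model.
    """
    tally = Counter(label for label, _ in votes.values())
    label, count = tally.most_common(1)[0]
    agreement = "Perfect Agreement" if count == len(votes) else "Majority"
    most_confident = max(votes, key=lambda name: votes[name][1])
    return label, agreement, most_confident

# Hypothetical outputs from the three trained models.
votes = {
    "Random Forest": ("Normal", 0.91),
    "Deep Learning": ("Normal", 0.97),
    "Augmented RF (Hybrid)": ("Normal", 0.88),
}
print(final_decision(votes))
```

With three models and two classes a majority always exists, so no tie-breaking rule is needed in this setup.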
Screenshots
Figure 1: The Amazon Web Services (AWS) login interface, illustrating the choice between logging in as the high-privilege Root
user or a managed-privilege IAM user for accessing the AWS Management Console.
Figure 2: The subsequent stage of the AWS Root user sign-in process, requiring the entry of the Root user password to
authenticate access to the AWS Management Console. This screen follows the email entry step (as shown in Figure 1).
Figure 3: The enforcement of Multi-Factor Authentication (MFA) during the AWS sign-in process. This security layer requires the
user to input a unique, time-sensitive code from a designated MFA device, significantly enhancing account protection against
unauthorized access.
Figure 4: The AWS Management Console Home dashboard displayed upon successful login. This interface provides the user with an
overview of recently accessed services (e.g., EC2), regional settings (e.g., Europe (Stockholm)), account identification, and current
status information (AWS Health, Cost and usage).
Figure 5: The Amazon EC2 (Elastic Compute Cloud) Instances dashboard within the AWS Management Console, showing the details of
a single virtual machine instance. The view highlights key operational data relevant to cloud resource management, including the
instance ID, state (Stopped), type ([Link]), and associated network details (Public and Private IPv4 addresses).
Figure 6: The Actions dropdown menu for an EC2 instance, illustrating the administrative capabilities available to the user. This
menu includes critical operational controls such as Launch Instances, Start/Stop/Reboot/Terminate instance, and options for
managing networking and security configurations.
Figure 7: Confirmation of an action execution within the EC2 dashboard. The notification banner confirms the successful
initiation of the instance start operation. The instance state is transitioning from Stopped (Figure 5) to Running, demonstrating
the real-time feedback provided by the AWS Management Console during resource provisioning and operational changes.
Conclusions
This project introduced a novel hybrid model for
cyber attack prediction by sequentially
combining Random Forest, Deep Learning, and
GANs. The key finding was that while the
advanced DL and GAN models provided slight
enhancements, the Random Forest classifier
remained a robust and interpretable core for
detection. Feature analysis confirmed that failed
login attempts are critical predictors.
Future Work
While our current implementation successfully leverages
GAN-based augmentation to address class imbalance in
cybersecurity data, we recognize the proven effectiveness of
SMOTE variants demonstrated in alternative approaches. Future
iterations could explore hybrid strategies that combine the realistic
sample generation of GANs with the computational efficiency of
SMOTE+ techniques. This would allow us to balance the trade-off
between synthetic data quality and training performance, potentially
further optimizing our false positive ratios while maintaining the
practical elegance that defines our approach to this fundamental
cybersecurity challenge.
Outcomes Achieved
1. Successful AWS Deployment:
The Cyber Attack Prediction System was successfully hosted on AWS,
enabling global accessibility and demonstrating cloud deployment skills such
as EC2 configuration, environment setup, and security management.
2. End-to-End ML Pipeline Implementation:
Implemented a complete data pipeline from preprocessing, feature scaling,
model training, and evaluation to live prediction, showcasing practical
understanding of machine learning workflow in a cloud environment.
3. Integration of Multiple AI Techniques:
Combined Machine Learning (Random Forest), Deep Learning (Neural
Networks), and Generative Adversarial Networks (GANs) to enhance
accuracy and detect anomalous or unseen cyber-attack patterns effectively.
4. Interactive Web Application Development:
Designed and deployed a Streamlit-based interactive web interface that
allows real-time input and prediction, improving usability and demonstrating
frontend-backend integration with ML models.
5. Model Optimization and Evaluation:
Achieved measurable accuracy improvements through hyperparameter tuning
and normalization, assessed with the confusion matrix, ROC-AUC, and
classification reports.
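As an illustration of the feature scaling and normalization steps listed in the outcomes above, here is a minimal z-score standardization sketch. The slides do not say which scaler the project uses, so z-scoring is an assumption, and the function names and data are hypothetical. The point it shows is that the scaler is fit on training data only and then reused unchanged for test data and live inputs.

```python
import statistics

def fit_scaler(rows):
    """Compute per-column mean and population standard deviation.
    Constant columns get a divisor of 1.0 to avoid division by zero."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stds

def transform(rows, scaler):
    """Apply a previously fitted scaler to new rows."""
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in rows]

# Hypothetical feature rows (illustrative values only).
train = [[1.0, 10.0], [3.0, 10.0]]
test = [[5.0, 12.0]]
scaler = fit_scaler(train)           # statistics come from training data only
print(transform(train, scaler))
print(transform(test, scaler))       # same statistics reused for unseen data
```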