Malicious URL Detection Using Machine Learning: Mr. Swapnil Thorat

This document discusses using machine learning techniques to detect malicious URLs. It presents an approach that uses machine learning algorithms like random forests, SVMs, and Naive Bayes on a training dataset of URLs classified as malicious or benign. The random forest classifier performed better than SVM for this problem. Previous literature on using machine learning for malicious URL detection is also reviewed, showing methods like associative classification and deep learning have also been applied to this issue.

Uploaded by

ITWorld

Malicious URL Detection Using Machine Learning
Presented By
Mr. Swapnil Thorat
TE (Computer Engineering)
Roll No. : TC52

Under the Guidance of

Prof. Dr. Sashikala Mishra
DEPARTMENT OF COMPUTER ENGINEERING
Hope Foundation’s
International Institute of Information Technology, Hinjewadi,
Pune-411057
Contents
1. Introduction
2. Identify the Social Problem to be solved using Computing Algorithms
3. Motivation
4. Literature Survey
5. Objective
6. Approach
7. Architecture
8. Details of Design and structure of module
9. Advantages and Disadvantages
10. Conclusion & Future work
11. References
Introduction

• Phishing is the most commonly used social engineering and cyber attack.
• Through such attacks, the phisher targets naïve online users by tricking them
into revealing confidential information, with the purpose of using it
fraudulently.
• One defence is to maintain a blacklist of phishing websites, which requires
that a website has already been identified as phishing.
• Another is to detect phishing websites early in their appearance, using
machine learning and deep neural network algorithms.
• Of these methods, the machine learning based method is reported to be more
effective than the others.
• Even then, online users are still being trapped into revealing sensitive
information on phishing websites.
Identify the Social Problem to be solved using Computing Algorithms
• Malicious websites are the basis of most criminal activities over
the internet.

• The dangers that arise from malicious sites are enormous, and
end users must be prevented from visiting such sites.

• Users should refrain from clicking on such Uniform Resource Locators (URLs).

• The detection of malicious URLs is a binary classification problem.
Several machine learning algorithms, namely Random Forests, SVMs
and Naive Bayes, are implemented on the training dataset. The
Random Forest classifier was seen to perform better on this
problem than the SVM classifier.
Motivation
• Currently, the risk of network information insecurity is increasing
rapidly in number and level of danger. The methods most used by
hackers today are to attack end-to-end technology and to exploit
human vulnerabilities.

• These techniques include social engineering, phishing, pharming, etc.
One of the steps in conducting these attacks is to deceive users with
malicious Uniform Resource Locators (URLs). As a result, malicious URL
detection is of great interest nowadays.

• Several scientific studies have presented methods to detect malicious
URLs based on machine learning and deep learning techniques.

• In this paper, we propose a malicious URL detection method using
machine learning techniques based on our proposed URL behaviors and
attributes.

• The proposed system may be considered an optimized and user-friendly
solution for malicious URL detection.
Literature Survey
Sr. No. 1
Paper name and year: Empirical Study on Malicious URL Detection Using Machine Learning [2018]
Authors: Ripon Patgiri, Hemanth Katari, Ronit Kumar, and Dheeraj Sharma
Summary: Malicious Web sites are the basis of most of the criminal activities over the
internet. The dangers that arise due to the malicious sites are enormous and the
end-users must be prohibited from visiting such sites. The users should prohibit
themselves from clicking on such Uniform Resource Locator (URL).

Sr. No. 2
Paper name and year: Detection of URL based Phishing Attacks using Machine Learning [Nov 2019]
Authors: Ms. Sophiya Shikalgar, Dr. S. D. Sawarkar, and Mrs. Swati Narwane, Department of
Computer Engineering, Datta Meghe College of Engineering, Airoli, Navi Mumbai, India
Summary: This paper addresses the widespread cybersecurity concern where threat actors
bypass security defenses and use URLs to launch various forms of malicious attacks on
unsuspecting individuals. In order to prevent such attacks, the paper proposes the use of
machine learning algorithms to detect malicious URLs. The proposed MuD (Malicious URL
Detection) model is trained using an existing dataset which contains URLs, each with unique
features, and is applied to three different machine learning classifiers: support vector
machine, logistic regression and Naïve Bayes. After training and testing the algorithms, it
is observed that the Naïve Bayes classifier recorded the highest accuracy.

Sr. No. 3
Paper name and year: Malicious URL Detection Based on Associative Classification [2020]
Authors: Sandra Kumi, ChaeHo Lim, and Sang-Gon Lee
Summary: Cybercriminals have invented sophisticated ways such as injecting malicious code
into websites to disseminate malware in an attempt to infect target systems. Associative
classification approaches to detect malicious URLs mainly focus on phishing websites.
Regarding this, we present an approach based on the classification based on association
(CBA) algorithm to detect malicious URLs comprising phishing, malware, and
drive-by-download websites.

Sr. No. 4
Paper name and year: Using Deep Learning to Detect Malicious URLs [2019]
Authors: Yuchen Liang (Shady Side Academy, Pittsburgh, PA, United States) and Xiaodan Yan
(Beijing University of Posts and Telecommunications, Beijing, China)
Summary: This paper presents different approaches to detect DGA-generated domains based on
the features of URLs. The result proves that the DBLSTM algorithm is superior to other
conventional machine learning methods. The source code is posted on GitHub for other groups
to use or to reproduce the same result
(https://github.com/liangy2019/Using-Deep Learning-to-Detect-Malicious-URLs). The deep
learning technique presented in the paper can be widely utilized in the realm of
cybersecurity, especially for energy network security, to detect attacks initiated by
different domain generation algorithms.
Objectives
• Collect a dataset consisting of a large number of URLs, both malicious and
non-malicious.
• Divide the collected dataset into two subsets in the ratio of 80:20 for
training and testing respectively.
• Extract features from the training data, categorized into lexical features,
network-based features and host-based features.
• Calculate the accuracy obtained with each of the algorithms.
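The 80:20 division named in the objectives can be sketched with the Python standard library alone. This is a minimal illustration, not the project's own code: the function name `split_80_20`, the fixed seed and the toy URLs are assumptions introduced here.

```python
import random

def split_80_20(samples, train_fraction=0.8, seed=42):
    """Shuffle labelled samples and split them into training and testing subsets."""
    shuffled = list(samples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle gives a repeatable split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical toy dataset of (url, label) pairs: label 1 = malicious, 0 = benign.
dataset = [(f"http://example{i}.test/page", i % 2) for i in range(10)]
train_set, test_set = split_80_20(dataset)
print(len(train_set), len(test_set))
```

Shuffling before the cut keeps the two subsets from inheriting any ordering in the collected data, for example all malicious URLs listed first.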
APPROACH
The steps involved in completing this project are:
• Collect a dataset containing phishing and legitimate websites from open source platforms.
• Write code to extract the required features from the URL database.
• Analyze and preprocess the dataset using EDA techniques.
• Divide the dataset into training and testing sets.
• Run the selected machine learning and deep neural network algorithms, such as SVM, Random Forest
and Autoencoder, on the dataset.
• Write code to display the evaluation results in terms of accuracy metrics.
• Compare the results obtained for the trained models and specify which is better.
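The last two steps, reporting accuracy and comparing models, could look roughly like the sketch below. The helper names `accuracy` and `compare_models`, the model labels and the prediction lists are all made up for illustration; they are not results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def compare_models(y_true, predictions_by_model):
    """Return the best-scoring model name and the accuracy of every model."""
    scores = {name: accuracy(y_true, preds)
              for name, preds in predictions_by_model.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Made-up labels and predictions, purely to exercise the functions.
y_true = [1, 0, 1, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0],  # 4 of 5 correct
    "model_b": [1, 1, 0, 1, 0],  # 3 of 5 correct
}
best, scores = compare_models(y_true, predictions)
print(best, scores)
```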
Architecture of System
Technology
Naive Bayes:
This classifier is also known as a generative learning model. Classification
is based on Bayes' theorem and assumes independent predictors. In simple
words, the classifier assumes that the presence of a specific feature in a
class is unrelated to the presence of any other feature. Even if the features
depend on each other or on the presence of other features, each one is
treated as contributing independently to the probability of the output. This
classification algorithm is well suited to large datasets and is easy to use.
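The independence assumption can be made concrete with a small Bernoulli Naive Bayes written from scratch for binary (0/1) features. This is a sketch under assumptions: the function names, the Laplace smoothing and the three toy features are illustrative choices, not details taken from the presentation.

```python
import math

def train_bernoulli_nb(X, y):
    """Estimate class priors and P(feature = 1 | class) with Laplace smoothing."""
    model = {}
    n_features = len(X[0])
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        # (ones + 1) / (rows + 2) keeps every probability strictly between 0 and 1
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(n_features)]
        model[c] = (prior, probs)
    return model

def predict_bernoulli_nb(model, x):
    """Pick the class with the highest log posterior, treating features as independent."""
    best_class, best_score = None, -math.inf
    for c, (prior, probs) in model.items():
        score = math.log(prior)
        for value, p in zip(x, probs):
            # each feature adds its own term: this sum is the independence assumption
            score += math.log(p if value == 1 else 1 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical binary feature vectors: [has_ip_address, has_at_symbol, is_long]
X = [[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 0], [0, 0, 1], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0]
model = train_bernoulli_nb(X, y)
print(predict_bernoulli_nb(model, [1, 1, 0]), predict_bernoulli_nb(model, [0, 0, 0]))
```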
Random Forest:
This classification algorithm is an ensemble learning method for
classification, regression and other tasks. It works by building a group of
decision trees at training time and outputting the class that is the mode of
the individual trees' predictions (classification) or their mean prediction
(regression). This classifier corrects for the tendency of decision trees to
overfit the training data set.
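Only the aggregation step, taking the mode of the individual trees' outputs, is sketched here. The three "trees" are hypothetical hand-written stand-ins that each inspect one binary feature; a real forest would learn its trees from bootstrap samples of the training data.

```python
from collections import Counter

def forest_predict(trees, x):
    """Return the mode (most common vote) of the individual tree predictions."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for trained trees.
# x = [has_ip_address, has_at_symbol, is_long]
trees = [
    lambda x: 1 if x[0] == 1 else 0,
    lambda x: 1 if x[1] == 1 else 0,
    lambda x: 1 if x[2] == 1 else 0,
]
print(forest_predict(trees, [1, 1, 0]))  # votes are 1, 1, 0
print(forest_predict(trees, [0, 0, 1]))  # votes are 0, 0, 1
```

An odd number of trees is used so that a two-class vote cannot tie.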

Support vector machine (SVM):

This is a supervised classification algorithm that is easy to use. It can
be used for both classification and regression applications, but it is
better known for classification. In this algorithm, each data item is
plotted as a point in an n-dimensional space, where n is the number of
features of the data. Classification is done by finding the hyperplane
that best separates the points of the different classes.
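For a linear SVM, prediction reduces to checking on which side of a separating hyperplane w·x + b = 0 a point falls. The sketch below shows only that prediction step; the weights and bias are assumed values chosen by hand, not parameters learned by an SVM solver.

```python
def svm_decision(weights, bias, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def svm_classify(weights, bias, x):
    """Label 1 on the non-negative side of the hyperplane, 0 on the other side."""
    return 1 if svm_decision(weights, bias, x) >= 0 else 0

# Assumed (not learned) parameters for three binary URL features.
weights = [2.0, 1.0, 0.5]
bias = -1.5
print(svm_classify(weights, bias, [1, 1, 0]))  # score 2.0 + 1.0 - 1.5 = 1.5
print(svm_classify(weights, bias, [0, 0, 1]))  # score 0.5 - 1.5 = -1.0
```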
XGBoost:
Researchers have recently adopted the XGBoost algorithm, which is very
useful for machine learning classification. It is fast and performs well
because it is an implementation of gradient-boosted decision trees. This
classification model is used to improve both the performance and the speed
of the model.
Data set:
The URL data is obtained from the PhishTank website; PhishTank is an
anti-phishing site. It contains URLs in unstructured form. Our main
objective is to detect whether a URL is phishing or legitimate based on
the extracted features. In preprocessing, we perform feature extraction:
the URLs are passed to the feature extractor, which extracts feature
values using predefined URL-based features. Each feature is assigned a
binary value, 0 or 1, indicating whether the feature is present or not,
as shown in the figure below. The extracted feature values are stored and
passed as input to the classifiers, so the classifiers receive a structured
dataset. We use four classification methods, namely XGBoost, SVM, Naive
Bayes and a stacking classifier, to detect whether a URL is phishing or
legitimate. The classifier then determines whether a requested site is a
phishing site: when there is a page request, the URL of the requested site
is passed to the feature extractor, which extracts the feature values
using the predefined URL-based features. These feature values act as input
for the classifier, which reports whether the site is phishing or not.
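A feature extractor of the kind described, mapping a URL to binary 0/1 features, might be sketched as follows. The presentation does not list its predefined features, so the four features here, their names and the length threshold of 54 characters are assumptions chosen only to illustrate the idea.

```python
import ipaddress
from urllib.parse import urlparse

def has_ip_host(url):
    """1 if the host part of the URL is a literal IP address, else 0."""
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return 1
    except ValueError:
        return 0

def extract_features(url, long_threshold=54):
    """Map a URL to binary features (1 = feature present, 0 = absent)."""
    host = urlparse(url).hostname or ""
    return {
        "has_ip_address": has_ip_host(url),
        "has_at_symbol": 1 if "@" in url else 0,
        "is_long": 1 if len(url) > long_threshold else 0,  # assumed threshold
        "has_hyphen_in_host": 1 if "-" in host else 0,
    }

print(extract_features("http://192.168.0.1/login@secure"))
print(extract_features("https://www.example-site.test/index.html"))
```

The resulting dictionaries can be turned into fixed-order lists of 0s and 1s before being passed to a classifier.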
Advantages,
Disadvantages/Limitations of System
Advantages:
• Provides a clear idea of how effective each classifier is at phishing email detection
• High level of accuracy, achieved by combining the advantages of many classifiers
• Fast classification process, low memory consumption, evolves with time, works online
Disadvantages:
• Time consuming, because the technique has many layers that produce the final result
• Huge number of features
• Many classification algorithms, which adds to the time consumed
• Memory consuming; non-standard classifier
• Higher cost
• Needs a large mail server and has a high memory requirement
• Lower accuracy because it depends on unsupervised learning; needs to be fed continuously
Next work
• Working on this project was very instructive and worth the effort.

• Through this project, one can learn a lot about phishing websites and
how they are differentiated from legitimate ones.

• This project can be taken further by creating a browser extension or
developing a GUI.

• These should classify the input URL as legitimate or phishing using
the saved model.
Conclusion & Future work
Phishing attacks are a critical threat, and it is important to have a
mechanism to detect them. Because very important and personal information of
the user can be leaked through phishing websites, it becomes even more critical
to address this issue. This problem can be addressed by using a machine learning
algorithm with a classifier.
The proposed technique is more secure because it detects both new and previously
seen phishing sites. The proposed model is also planned to be deployed online by
integrating it as a Web browser plug-in capable of warning users of potentially
malicious URLs in real time. URLs clicked or typed will be checked based on their
features to determine whether they are malicious. If a URL is malicious or
suspected to be malicious, a pop-up will inform the user of the potential threat
and the URL will be temporarily blocked unless the user chooses to navigate to it
anyway.
The future work also includes evaluating the proposed model against more
recent and diverse datasets along with using additional classifiers such as
decision trees and random forest.
References
1. F. Vanhoenshoven, G. Nápoles, R. Falcon, K. Vanhoof, M. Köppen, Detecting
malicious URLs using machine learning techniques, in 2016 IEEE Symposium Series on
Computational Intelligence (SSCI) (IEEE, 2016), pp. 1–8
2. A. Singh, N. Goyal, A comparison of machine learning attributes for detecting
malicious websites, in 2019 11th International Conference on Communication
Systems & Networks (COMSNETS) (IEEE, 2019), pp. 352–358
3. A.S. Manjeri, R. Kaushik, M. Ajay, P.C. Nair, A machine learning approach for
detecting malicious websites using URL features, in 2019 3rd International
conference on Electronics, Communication and Aerospace Technology (ICECA) (IEEE,
2019), pp. 555–561
4. Internet Security Threat Report (ISTR) 2019, Symantec.
https://www.symantec.com/content/dam/symantec/docs/reports/istr-24-2019-en.pdf
