Phishing Website Detection

A research paper

Uploaded by

theclassyexsistence


Phishing Website detection

Submitted in partial fulfillment of the requirements

of the degree of

Bachelor of Engineering

Megha Agarwal (04)

Arieyshma Chowhan (20)
Shruti Jani (44)
Hansika Koli (54)

Supervisor:
Prof. Renuka Nagpure

Department of Information Technology

Atharva College of Engineering

Year: 2022-2023

ATHARVA COLLEGE OF ENGINEERING
MALAD (W), MUMBAI 400 095
YEAR: 2021-22

CERTIFICATE
This is to certify that

Megha Agarwal
Arieyshma Chowhan
Shruti Jani
Hansika Koli

have submitted the project report for the requirements of the Bachelor of
Engineering in Information Technology satisfactorily
on

“Phishing Website Detection”

As prescribed by the University of Mumbai Under the guidance of

PROJECT GUIDE H.O.D. PRINCIPAL

INTERNAL EXAMINER COLLEGE SEAL EXTERNAL EXAMINER

B.E. Mini-Project Report Approval

This mini-project synopsis entitled Phishing Website Detection by

Megha Agarwal, Arieyshma Chowhan, Shruti Jani, Hansika Koli
is approved for the degree of Information Technology from University
of Mumbai.

Examiners

Date:

Place:

Declaration

I declare that this written submission represents my ideas in my own words

and where others' ideas or words have been included, I have adequately cited
and referenced the original sources. I also declare that I have adhered to all
principles of academic honesty and integrity and have not misrepresented or
fabricated or falsified any idea/data/fact/source in my submission. I understand
that any violation of the above will be cause for disciplinary action by the
Institute and can also evoke penal action from the sources which have thus not
been properly cited or from whom proper permission has not been taken when
needed.

-----------------------------------------

(Signature)

-----------------------------------------

Megha Agarwal (04)

Arieyshma Chowhan (20)
Shruti Jani (44)
Hansika Koli (54)

Date:
Date:

Table of Contents

Chapter 1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Objectives
1.4 Scope
Chapter 2 Review of Literature
Chapter 3 Report on Present Investigation
3.1 Proposed System
3.1.1 Block diagram
3.2 Implementation
3.2.1 ML Algorithm
3.2.2 Dataset description / Data Preparation/Feature Engineering
Chapter 4 Model Implementation
• Training of Model
• Evaluation of Model
Chapter 5 Results and Discussion (Screenshots of the output with description)
5.1 Parameter Tuning and Inference
Chapter 6 Conclusion
Chapter 7 Future Scope
References

List of Figures
Figure No. Figure Name
3.1 BLOCK DIAGRAM

List of Tables
Table No. Table Name
3.1 LITERATURE REVIEW

Chapter 1
INTRODUCTION
In recent years, advancements in Internet and cloud technologies have led to a significant
increase in electronic trading, in which consumers make online purchases and transactions.
This growth has also created more opportunities for unauthorized access to users' sensitive
information and for damage to the resources of an enterprise. Phishing is one of the most
familiar attacks: it tricks users into accessing malicious content and giving up their
information. In terms of website interface and uniform resource locator (URL), most phishing
webpages look nearly identical to the actual webpages.

1.1 Motivation

Website phishing costs internet users billions of dollars per year. Phishers steal personal
information and financial account details such as usernames and passwords, leaving users
vulnerable in the online space. According to the CheckPoint Research Security Report 2018,
77% of IT professionals feel their security teams are unprepared for today's cybersecurity
challenges, and 64% of organizations have experienced a phishing attack in the past year.
Detecting phishing websites is not easy: attackers use URL obfuscation to shorten the URL,
chain link redirections, and manipulate links so that they look trustworthy, among other
tricks. This motivates the switch from traditional rule-based programming methods to a
machine learning approach.

1.2 Problem Statement
Phishing detection techniques suffer from low detection accuracy and a high false alarm rate,
especially when novel phishing approaches are introduced. Besides, the most common
technique, the blacklist-based method, is inefficient in responding to emerging phishing
attacks: since registering a new domain has become easier, no blacklist can guarantee a
complete, up-to-date database.

1.3 Objectives

The objectives are as follows:

• To develop a novel approach to detect malicious URLs and alert users.
• To apply ML techniques in the proposed approach in order to analyze real-time URLs and
produce effective results.
• To implement the concept of RNN, a familiar ML technique that has the capability
to handle huge amounts of data.

The rest of the report is organized as follows: Chapter 1 introduces the concept of malicious
URLs and the objectives of the study. The background of the study and related literature on
detecting phishing URLs are discussed in Chapter 2. Chapter 3 presents the methodology of the
research. Model implementation and results are presented in Chapters 4 and 5. Finally,
Chapter 6 concludes the study and Chapter 7 outlines its future direction.

1.4 Scope

Website Phishing costs internet users billions of dollars per year. Phishers steal personal
information and financial account details such as usernames and passwords, leaving users
vulnerable in the online space.
The COVID-19 pandemic has boosted the use of technology in every sector, shifting activities
such as official meetings, classes, shopping, and payments from the physical to the online
space. This gives phishers more opportunities to carry out attacks that affect victims
financially, psychologically, and professionally.

Chapter 2
Review of Literature

Table 3.1

Sr No. | AUTHOR/YEAR | TITLE | WORK
1 | Amani Alswailem, 2019 | Detecting Phishing Websites Using Machine Learning | We have selected the Random Forest. They conclude their paper with combination of 26 features.
2 | Weiheng Bai, 2020 | Detection of Phishing Website using Machine Learning | This software is designed to show awareness of the extensive level of its functionality, whereas our software blacklists the particular website.
3 | Abdulhamit Subasi, 2020 | Comparison of Adaboost with MultiBoosting for Phishing Website Detection | This paper aims to enhance detection method to detect phishing websites using SVM.
4 | Gururaj Harinahalli Lokesh, 2020 | Phishing website detection based on effective machine learning approach | This paper aims to enhance detection method to detect phishing websites using Random Forest, K nearest neighbors.

A phishing attack is one of the simplest ways to obtain sensitive information from
unsuspecting users. The papers reviewed above apply machine learning to the detection of
phishing URLs by extracting and analyzing various features of legitimate and phishing URLs.
Machine learning algorithms such as decision tree, random forest, and support vector
machine are used to detect phishing websites.
These papers report accuracy above 85%, and their results show that classifiers perform
better when more data is used for training.

Chapter 3

Report on Present Investigation

3.1 Proposed System

The proposed system uses different machine learning models trained over features such as
whether the URL contains an @ symbol, whether it uses double-slash redirection, the page rank
of the URL, and the number of external links embedded in the webpage.
A perceptron-based neural network applied to such machine learning features has been reported
to achieve better accuracy, with up to a 92% true positive rate and a 0.4% false positive
rate.
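
As an illustration of the address-bar features mentioned above, the following is a minimal sketch in Python using only the standard library. The function name, the feature names, and the index threshold used for the double-slash check are assumptions made for this example and are not taken from the project notebook. Page rank and external-link counts are left out because they require network access or the page's HTML.

```python
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Return a few address-bar features as 0/1 flags or counts.

    Hypothetical helper: names and thresholds are illustrative only.
    """
    parsed = urlparse(url)
    # In an ordinary URL, "//" appears only right after the scheme
    # ("http://" -> index 5, "https://" -> index 6). A later occurrence
    # suggests a redirect embedded in the path.
    last_double_slash = url.rfind("//")
    return {
        "has_at_symbol": int("@" in url),
        "double_slash_redirect": int(last_double_slash > 7),
        "url_length": len(url),
        "num_dots_in_host": parsed.netloc.count("."),
    }


legit = extract_url_features("https://www.example.com/index.html")
suspicious = extract_url_features("http://example.com@evil.test//login")
print(legit)
print(suspicious)
```

Each URL becomes a small numeric vector, which is the form a classifier expects as input.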

3.1.1 Block diagram

Figure 3.1

Steps for the training and evaluation of model:

• Dataset Collection - The set of phishing URLs is collected from an open-source service
called Kaggle. This service provides a set of phishing URLs in multiple formats such as
CSV and JSON that gets updated on a regular basis. To download the data:
https://www.kaggle.com/datasets/shashwatwork/phishing-dataset-for-machine-learning

• Data Exploration - This step helps identify patterns and issues inside the dataset, as
well as decide which model or algorithm to apply in the next steps.
• Extracting Features and Feature Selection - Address bar based features, domain
based features, and HTML & JavaScript based features are used.
Altogether, 48 features are extracted from the 10,000-URL dataset and are
stored in the 'Phishing_Legitimate_full.csv' file in the DataFiles folder.

• Model Training and Classification - Before starting the ML model training, the data is
split 80-20, i.e., 8,000 training samples and 2,000 testing samples. From the dataset, it
is clear that this is a supervised machine learning task. There are two main types of
supervised machine learning problems: classification and regression.
This dataset is a classification problem, because the input URL is classified as
phishing (1) or legitimate (0). The supervised machine learning (classification) models
considered for training on the dataset in this project are:
Logistic Regression
K-Nearest Neighbors Classifier
Random Forest
Decision Tree
XGBoost

All these models are trained on the training set and evaluated on the test set. The
detailed description of the models and their training is available at
https://colab.research.google.com/drive/1t6eUJFBhe-rfBK2NMc2DU1-TPcHMt7In?usp=sharing

• Model Evaluation - From the results obtained for the above models, the XGBoost classifier
has the highest model performance, at 99%.
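
The 80-20 split described in the steps above can be sketched as follows. This is a dependency-free illustration, not the code from the project notebook: the helper name, the seed, and the placeholder samples are assumptions. In practice a library routine such as scikit-learn's `train_test_split` is commonly used for the same purpose.

```python
import random


def split_80_20(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test).

    Hypothetical helper; a fixed seed keeps the split reproducible.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data standing in for the 10,000-URL dataset:
# (feature_vector, label) pairs, label 1 = phishing, 0 = legitimate.
samples = [([i % 7, i % 3], i % 2) for i in range(10_000)]
train, test = split_80_20(samples)
print(len(train), len(test))  # 8000 2000
```

Shuffling before splitting matters when the file is ordered by class, since otherwise the test set could contain only one label.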

3.2 Implementation

3.2.1 ML Algorithm
XGBoost
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient,
flexible and portable. It implements machine learning algorithms under the Gradient Boosting
framework. XGBoost provides parallel tree boosting (also known as GBDT, GBM) that
solves many data science problems in a fast and accurate way.
STEPS:
Step 1: Load the important libraries
Step 2: Import dataset.
Step 3: Divide the dataset into train and test
Step 4: Initializing the models
Step 5: Fitting the models
Step 6: Coming up with predictions
Step 7: Evaluating model’s performance
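
To make the idea behind Steps 4 to 7 concrete, here is a minimal sketch of plain gradient boosting with one-split trees (decision stumps) for binary classification. It is a simplified stand-in, not XGBoost itself: XGBoost adds regularization, second-order gradient information, and parallel tree construction. The toy data, function names, and hyperparameters are assumptions for this example. In a real project the fitting step would normally be delegated to the `xgboost` library, for instance its `XGBClassifier`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the one-feature threshold split that best fits the residuals."""
    mean = sum(residuals) / len(residuals)
    # Fallback when no valid split exists: predict the mean everywhere.
    best = (float("inf"), 0, float("inf"), mean, mean)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)


def fit_boosting(X, y, rounds=20, learning_rate=0.5):
    """Step 5: each round fits a stump to the gradient of the log-loss."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, lv, rv = fit_stump(X, residuals)
        lv, rv = learning_rate * lv, learning_rate * rv
        stumps.append((j, t, lv, rv))
        scores = [s + (lv if row[j] <= t else rv) for s, row in zip(scores, X)]
    return stumps


def predict(stumps, row):
    """Step 6: sum the stump outputs and threshold the probability at 0.5."""
    score = sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)
    return int(sigmoid(score) > 0.5)


# Toy data (assumed): [has_at_symbol, dots_in_host]; label 1 = phishing.
X_train = [[0, 1], [1, 1], [0, 2], [1, 2], [0, 3], [1, 3], [0, 4], [1, 4]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

stumps = fit_boosting(X_train, y_train)
predictions = [predict(stumps, row) for row in X_train]
# Step 7: accuracy on the toy training data.
accuracy = sum(p == t for p, t in zip(predictions, y_train)) / len(y_train)
print(accuracy)
```

Each round corrects the errors left by the previous rounds, which is the core of the boosting idea. A real evaluation would use the held-out test split rather than the training data.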

3.2.2 Dataset description / Data Preparation/Feature Engineering

Data Preparation is the process of collecting, cleaning, and consolidating data into one file or
data table, primarily for use in analysis.
The major tasks we use in data preparation are as follows:
• Data discretization
• Data cleaning
• Data integration
• Data transformation
• Data reduction
We have collected the dataset from Kaggle under the name Phishing_Legitimate_full.
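
A few of the data preparation tasks listed above (cleaning, transformation, discretization) can be sketched on a tiny inline CSV. The column names `url_length`, `has_at`, and `label`, and the bin thresholds, are hypothetical and chosen only for illustration. They are not the columns of Phishing_Legitimate_full.

```python
import csv
import io

# Tiny inline sample with one duplicate row and one row with a missing value.
RAW_CSV = """url_length,has_at,label
34,0,0
120,1,1
34,0,0
,1,1
75,0,1
"""


def prepare(csv_text):
    rows, seen = [], set()
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Data cleaning: skip rows with any missing value.
        if any(value == "" for value in record.values()):
            continue
        # Data transformation: cast every field from text to int.
        values = {name: int(value) for name, value in record.items()}
        # Data cleaning: skip exact duplicates.
        key = tuple(values.items())
        if key in seen:
            continue
        seen.add(key)
        # Data discretization: bucket URL length into three bins
        # (thresholds are arbitrary for this example).
        length = values["url_length"]
        values["length_bin"] = 0 if length < 54 else 1 if length <= 75 else 2
        rows.append(values)
    return rows


prepared = prepare(RAW_CSV)
for row in prepared:
    print(row)
```

Of the five input rows, the duplicate and the incomplete row are dropped, leaving three cleaned rows with an added `length_bin` column.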

Chapter 4

Model Implementation
• Training of Model

• Evaluation of Model
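
Evaluation of a binary phishing classifier usually rests on confusion-matrix metrics such as accuracy, true positive rate, false positive rate, and F1-score, which are the quantities this report quotes. The sketch below shows how they are computed. The label vectors are made-up examples, not results from this project.

```python
def evaluate(y_true, y_pred):
    """Compute common binary-classification metrics (1 = phishing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "true_positive_rate": tpr,
        "false_positive_rate": fpr,
        "f1": f1,
    }


# Made-up labels: 4 phishing and 6 legitimate URLs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Reporting the false positive rate alongside accuracy is useful here, because flagging a legitimate site as phishing has a direct cost for users.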

Chapter 5
Results and Discussion

5.1 Parameter Tuning and Inference

Chapter 6
Conclusion
To the best of our knowledge, the present study is the first review which included results
from all studies that applied machine learning methods to the detection of phishing websites.

The proposed study treats phishing detection as a classification problem, in which websites
are automatically categorized into a predetermined set of class values based on several
features and the class variable. ML-based phishing detection techniques depend on website
features to gather information that can help classify websites and so detect phishing sites.
The problem of phishing cannot be eradicated, but it can be reduced by combating it in two
ways: improving targeted anti-phishing procedures and techniques, and informing the public
on how fraudulent phishing websites can be detected and identified. To combat the
ever-evolving and increasingly complex phishing attacks and tactics, ML anti-phishing
techniques are essential. The outcome of this study indicates that the proposed method
offers better results than the existing deep learning methods. The model achieved higher
accuracy and F1-score within a limited amount of time. The future direction of this study is
to develop an unsupervised deep learning method to generate insight from a URL. In addition,
the study can be extended to generate outcomes for a larger network and to protect the
privacy of individuals.

Chapter 7
Future Scope
• This project can be further extended by creating a browser extension or developing a GUI
that takes a URL and predicts its nature, i.e., legitimate or phishing.
• As of now, I am working towards the creation of a browser extension for this
project, and may also try the GUI option.
• Further developments will be updated at the earliest.
• We look forward to building a full-fledged application that directly blocks the
website instead of only checking it.

References

• Alswailem, A. (2019). Detecting Phishing Websites Using Machine Learning.
• Bai, W. (2020). Phishing Website Detection Based on Machine Learning Algorithm.
• Subasi, A. (2020). Comparison of Adaboost with MultiBoosting for Phishing Website Detection.
• Boregowda, G. (2020). Phishing website detection based on effective machine learning approach.
