SECUREURL
- A PHISHING WEBSITE DETECTION APPLICATION
- GAYATHRI BELIDE (1602-21-733-013)
- CH HARISH REDDY (1602-21-733-012)
Internal Guide: Dr. T. Adilakshmi
Project Coordinator: Dr. D. Baswaraj
INTRODUCTION
What is Phishing?
• A cyberattack where hackers use deceptive methods to obtain sensitive information.
• Targets include:
• Personal Information: Passwords, credit card numbers, etc.
• Financial Data: Bank account details, transaction credentials.
How Does Phishing Work?
• Conducted via:
• Fake emails or messages.
• Fraudulent websites imitating legitimate businesses.
• Malicious links or attachments.
• Phishing attacks usually involve creating fake websites or emails
that mimic those of legitimate businesses, such as banks, social
networking platforms, or online stores.
• Phishing websites are fraudulent websites that imitate legitimate ones,
aiming to deceive users into disclosing sensitive information.
• These websites often have URLs that closely resemble those of
reputable websites, making it challenging for users to distinguish
between them.
Why Detect Phishing Websites?
• Protect Users:
Prevent identity theft and financial losses.
• Maintain Trust:
Ensure that businesses and users retain confidence in online platforms.
• Combat Evolving Threats:
Phishing tactics are constantly evolving, requiring robust detection
mechanisms.
• Ways to avoid getting phished:
• Raise user awareness of phishing websites.
• Maintain a blacklist of phishing websites, which requires prior
knowledge that a website has been identified as phishing.
• Detect phishing websites early, as soon as they appear, using machine
learning and deep neural network algorithms.
LITERATURE SURVEY
1. Paper Title: "List-Based Detection Methods for Phishing Websites"
• Approach: This study explores list-based detection mechanisms, which include
maintaining blacklists of known phishing URLs and whitelists of trusted URLs. Blacklists,
used in browsers like Google Chrome and Mozilla Firefox via Google Safe Browsing,
provide warnings when malicious sites are detected. Whitelists, on the other hand, help
identify trusted websites by flagging URLs not present in the list as suspicious.
• Limitations: List creation and updates must rely on lightweight mechanisms so
that they do not delay detection; lists must be updated constantly to defend
against newly discovered phishing attacks; and the rules and heuristics used to
create and update the lists must keep pace with the evolving tactics adopted
by attackers.
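A minimal sketch of the list-based lookup described in this paper, assuming both lists are small in-memory sets of hostnames. The names `BLACKLIST`, `WHITELIST`, `normalize_host`, and `check_url` and the example hostnames are hypothetical; production services such as Google Safe Browsing use their own update and lookup protocol, which this sketch does not reproduce.

```python
from urllib.parse import urlparse

# Hypothetical example lists; a real deployment would load and refresh these
# from a maintained feed.
BLACKLIST = {"paypa1-login.example", "secure-bank-update.example"}
WHITELIST = {"www.google.com", "www.wikipedia.org"}


def normalize_host(url: str) -> str:
    """Return the lowercase hostname of a URL, adding a scheme if it is missing."""
    if "://" not in url:
        url = "http://" + url  # urlparse needs '//' to recognise the host part
    host = urlparse(url).hostname or ""
    return host.lower().rstrip(".")


def check_url(url: str) -> str:
    """Classify a URL as 'phishing', 'trusted', or 'suspicious' by list lookup."""
    host = normalize_host(url)
    if host in BLACKLIST:
        return "phishing"    # known-bad host: warn the user
    if host in WHITELIST:
        return "trusted"     # known-good host
    return "suspicious"      # whitelist approach: anything unlisted is flagged


print(check_url("https://paypa1-login.example/verify"))  # phishing
print(check_url("www.wikipedia.org/wiki/Phishing"))       # trusted
print(check_url("http://unknown-site.example"))           # suspicious
```

The third call shows the whitelist trade-off: any unlisted site, including a new legitimate one, is flagged as suspicious until the list is updated.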
2. Paper Title: "Page Similarity-Based Detection"
• Approach: The paper explores page similarity methods for detecting
phishing websites. It compares suspicious pages to legitimate ones based
on textual (HTML, CSS, DOM) and visual (images, logos) content. The
similarity scores help identify phishing attempts by comparing these
elements, with final decisions based on predefined thresholds.
• Limitations: Evasion techniques such as code obfuscation and image distortion
can reduce detection accuracy; approaches that rely on external services can be
slow; and large datasets of legitimate pages require significant storage capacity.
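For the textual side of this approach, a minimal sketch could compare two HTML pages by the Jaccard similarity of their tag names and words, then flag a near-copy that is served from a different domain. The token choice, the 0.8 threshold, and the function names are assumptions for illustration; the surveyed methods also use CSS, DOM structure, and visual features such as logos.

```python
import re
from html.parser import HTMLParser


class _TokenCollector(HTMLParser):
    """Collects start-tag names and lowercase words from an HTML document."""

    def __init__(self):
        super().__init__()
        self.tokens = set()

    def handle_starttag(self, tag, attrs):
        self.tokens.add("<" + tag + ">")

    def handle_data(self, data):
        self.tokens.update(re.findall(r"[a-z0-9]+", data.lower()))


def html_tokens(html: str) -> set:
    parser = _TokenCollector()
    parser.feed(html)
    parser.close()
    return parser.tokens


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B| (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_like_clone(suspicious_html, legit_html, suspicious_domain, legit_domain,
                     threshold=0.8):
    """Return (is_clone, score): a very similar page hosted on another domain is flagged."""
    score = jaccard(html_tokens(suspicious_html), html_tokens(legit_html))
    return score >= threshold and suspicious_domain != legit_domain, score


legit = ("<html><body><h1>MyBank Login</h1><form><input name='user'>"
         "<input name='pass'></form></body></html>")
clone = legit  # an exact copy hosted elsewhere
other = "<html><body><p>Weather today: sunny</p></body></html>"
print(looks_like_clone(clone, legit, "mybank-secure.example", "mybank.example"))  # (True, 1.0)
print(looks_like_clone(other, legit, "weather.example", "mybank.example"))
```

Obfuscating the HTML (for example, rendering the login text as an image) would lower this score, which is exactly the evasion limitation noted above.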
3. Paper Title: "Phishing Attack, Its Detections and Prevention Techniques"
• Approach: The study outlines various phishing techniques, including email phishing, spear
phishing, whaling, smishing, vishing, clone phishing, and social media phishing. It
emphasizes the sophistication of modern attacks, which leverage social engineering and
advanced technologies like AI.
• Limitations: Although the paper provides a broad review, it lacks detailed explanations of
the technical implementation of detection and prevention techniques, such as specific
algorithms or systems.
4. Paper Title: "Study on Phishing Attacks"
• Approach: The paper emphasizes methods such as:
• Using custom DNS services to block malicious sites.
• Leveraging browser-based phishing lists.
• Manually verifying links for authenticity.
• Limitations: While the paper categorizes and discusses phishing attacks, its analysis of
detection and prevention methods lacks depth, particularly for emerging phishing techniques
(e.g., AI-driven phishing).
PROPOSED SOLUTION
• The objective of this project is to train machine learning models and
deep neural networks on a purpose-built dataset to predict phishing websites.
Both phishing and benign website URLs are gathered to form the
dataset, and the required URL-based and website content-based
features are extracted from them. The performance of each model is
measured and compared.
STEPS FOR IMPLEMENTATION
1) DATA COLLECTION
• Legitimate URLs are collected from the dataset provided by the
University of New Brunswick, https://www.unb.ca/cic/datasets/url-2016.html.
• From this collection, 5000 URLs are randomly picked.
• Phishing URLs are collected from an open-source service called
PhishTank. This service provides a set of phishing URLs in multiple formats,
such as CSV and JSON, that is updated hourly.
• From the obtained collection, 5000 URLs are randomly picked.
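The sampling step above could look like the following sketch, assuming the UNB and PhishTank URLs have already been loaded into Python lists (file parsing is omitted). `build_dataset`, the duplicate removal, and the fixed seed are hypothetical choices made for reproducibility; labels follow the convention used later in this deck (1 = phishing, 0 = legitimate).

```python
import random


def build_dataset(legit_urls, phish_urls, n_per_class=5000, seed=42):
    """Randomly pick n_per_class URLs from each source and attach labels.

    Labels: 0 = legitimate, 1 = phishing.
    """
    rng = random.Random(seed)  # fixed seed so the same sample can be rebuilt
    # Drop duplicate URLs (keeping first occurrences) before sampling.
    legit_pool = list(dict.fromkeys(legit_urls))
    phish_pool = list(dict.fromkeys(phish_urls))
    legit = rng.sample(legit_pool, n_per_class)
    phish = rng.sample(phish_pool, n_per_class)
    rows = [(url, 0) for url in legit] + [(url, 1) for url in phish]
    rng.shuffle(rows)  # mix the two classes
    return rows


# Usage with placeholder URLs standing in for the real collections.
legit_urls = [f"https://site{i}.example/" for i in range(20000)]
phish_urls = [f"http://phish{i}.example/login" for i in range(12000)]
dataset = build_dataset(legit_urls, phish_urls)
print(len(dataset))  # 10000
```

Sampling the same number of URLs from each source gives a balanced dataset, so plain accuracy remains a meaningful comparison metric later on.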
2) FEATURE SELECTION
• Address bar-based features considered: Domain of URL, Redirection
‘//’ in URL, IP Address in URL, ‘http/https’ in Domain Name, ‘@’ Symbol in
URL, Using URL Shortening Service, Length of URL, Prefix or Suffix ‘-’ in
Domain, Depth of URL.
• Domain-based features considered: DNS Record, Age of
Domain, Website Traffic, End Period of Domain.
• HTML and JavaScript-based features considered: Iframe
Redirection, Disabling Right Click, Status Bar Customization, Website
Forwarding.
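Several of the features listed above can be sketched with the Python standard library. This is an illustrative sketch, not the project's extractor: the function names, the abbreviated shortener list, the 54-character and 6-month cut-offs, and the regex patterns are all assumptions. DNS Record, Website Traffic, and Website Forwarding are omitted because they require live network lookups, and the WHOIS creation and expiry dates are passed in rather than fetched.

```python
import ipaddress
import re
from datetime import date
from urllib.parse import urlparse

# Abbreviated, illustrative list of URL shortening services (an assumption).
SHORTENERS = re.compile(r"\b(bit\.ly|goo\.gl|tinyurl\.com|t\.co|ow\.ly|is\.gd)\b")


def _is_ip(host: str) -> bool:
    """True if the hostname is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def address_bar_features(url: str) -> dict:
    """Flags use 1 = phishing-like, 0 = legitimate-like; url_depth is a count."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return {
        "domain": host,  # kept for reference, not used as a numeric feature
        "have_ip": int(_is_ip(host)),
        "have_at": int("@" in url),
        "url_length": int(len(url) >= 54),  # 54-character cut-off is an assumption
        "url_depth": len([part for part in parsed.path.split("/") if part]),
        "redirection": int(url.rfind("//") > 7),  # a '//' beyond the scheme separator
        "https_in_domain": int("https" in host),
        "tiny_url": int(SHORTENERS.search(url) is not None),
        "prefix_suffix": int("-" in host),
    }


def domain_features(created: date, expires: date, today: date) -> dict:
    """Age / end-period flags from WHOIS-style dates (6-month cut-off assumed)."""
    age_months = (today - created).days / 30
    months_left = (expires - today).days / 30
    return {
        "domain_age": int(age_months < 6),
        "domain_end": int(months_left < 6),
    }


def html_js_features(html: str) -> dict:
    """Regex checks on page source for iframes, right-click blocking, status-bar tricks."""
    return {
        "iframe": int(re.search(r"<iframe", html, re.I) is not None),
        "right_click": int(re.search(r"event\.button\s*==\s*2", html) is not None),
        "mouse_over": int(
            re.search(r"onmouseover\s*=.*?window\.status", html, re.I | re.S) is not None
        ),
    }


print(address_bar_features("http://192.168.0.1/secure-login/update@account"))
print(domain_features(date(2024, 1, 1), date(2024, 12, 31), today=date(2024, 3, 1)))
print(html_js_features("<iframe src='x'></iframe><script>if (event.button == 2) {}</script>"))
```

Keeping each feature as a 0/1 flag (apart from the depth count) makes the resulting rows directly usable by the classifiers listed in the next step.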
3) MACHINE LEARNING MODELS
• This is a supervised machine learning task, specifically a binary classification
problem: each input URL is classified as phishing (1) or legitimate (0). The
classification models trained on the dataset in this notebook are:
• Decision Tree
• Random Forest
• Multilayer Perceptrons
• XGBoost
• Support Vector Machines
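The listed models would normally come from an ML library. To illustrate the core idea behind the first of them under the 0/1 labelling above, the sketch below fits a depth-one decision tree (a decision stump) in plain Python: it picks the single binary feature whose split gives the lowest weighted Gini impurity. The toy data and the function names are hypothetical, and a full decision tree would repeat this split recursively on each side.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(X, y, feature_names):
    """Choose the binary feature whose 0/1 split minimizes weighted Gini impurity."""
    best = None
    for j, name in enumerate(feature_names):
        left = [label for row, label in zip(X, y) if row[j] == 0]
        right = [label for row, label in zip(X, y) if row[j] == 1]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or impurity < best[0]:
            # Each side predicts its majority label (ties and empty sides give 0).
            left_pred = int(sum(left) * 2 > len(left))
            right_pred = int(sum(right) * 2 > len(right))
            best = (impurity, j, name, left_pred, right_pred)
    _, j, name, left_pred, right_pred = best
    return {"feature": j, "name": name, "if_0": left_pred, "if_1": right_pred}


def predict(stump, row):
    """Return 1 (phishing) or 0 (legitimate) for one feature row."""
    return stump["if_1"] if row[stump["feature"]] == 1 else stump["if_0"]


names = ["have_ip", "have_at", "url_length"]
X = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = phishing, 0 = legitimate
stump = fit_stump(X, y, names)
print(stump["name"])  # have_ip
```

Random Forest and XGBoost build on the same split-selection idea by combining many such trees, while MLPs and SVMs learn decision boundaries in a different way.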
4) MODEL TRAINING AND EFFICIENCY EVALUATION
• The sample dataset of URLs, along with the extracted features, is split
into training (80%) and test (20%) data.
• The training data is used to train each model; the test data is used to
measure the model's efficiency after training.
• The model with the highest efficiency on the test data is chosen for further use.
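The split-and-select procedure above might be sketched as follows. Accuracy is used as the efficiency measure here, which is an assumption, and the `fit(train) -> predict` interface, the fixed seed, and the two toy rule-based "models" are hypothetical placeholders for the trained classifiers.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle (features, label) rows and split them into train/test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for features, label in rows if predict(features) == label)
    return correct / len(rows)


def select_best(models, train, test):
    """Fit every model on the training data and keep the one with the best test accuracy."""
    scores = {name: accuracy(fit(train), test) for name, fit in models.items()}
    best = max(scores, key=scores.get)  # the first model listed wins on ties
    return best, scores


# Toy data: the label equals the 'have_ip' flag, so the IP rule is perfect.
rows = [({"have_ip": i % 2}, i % 2) for i in range(100)]
train, test = train_test_split(rows)
models = {
    "ip_rule": lambda train_rows: (lambda f: f["have_ip"]),
    "always_legitimate": lambda train_rows: (lambda f: 0),
}
best, scores = select_best(models, train, test)
print(len(train), len(test), best)  # 80 20 ip_rule
```

Scoring every model on the same held-out 20% keeps the comparison fair, since none of the models sees those rows during training.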
CONCLUSION
• This project demonstrates a robust approach to detecting phishing
websites using machine learning algorithms with high efficiency in
classifying the websites as legitimate or phishing.
• The project aims to protect people from cyberattacks and increase customers’
trust in legitimate businesses.
REFERENCES
1. "Phishing or Not Phishing? A Survey on the Detection of Phishing Websites"
Rasha Zieni, Luisa Massari, and Maria Carla Calzarossa (Senior Member, IEEE), Department of Electrical,
Computer and Biomedical Engineering, Università di Pavia, 27100 Pavia, Italy
2. "A Study on Adversarial Sample Resistance and Defense Mechanism for Multimodal
Learning-Based Phishing Website Detection"
Phan, Vo Quang Minh, Bui Tan Ha
3. "Computer Vision-Based Exercise Form Analysis"
• Reference: Cheng, Z., Lu, Z., Shi, X., & Cao, Y. (2020). "3D Human Pose Estimation
Using Convolutional Neural Networks". IEEE Access, 8, 56863-56873.
• Source: IEEE Xplore.
4. "Using OpenCV for Real-Time Image Processing in Fitness Applications"
• Reference: Bradski, G. (2000). "The OpenCV Library". Dr. Dobb's Journal of
Software Tools. Source: OpenCV.
5. "Fuzzy Inference Systems in Decision Making"
• Reference: Jang, J. S. R. (1993). "ANFIS: Adaptive-Network-Based Fuzzy Inference
System". IEEE Transactions on Systems, Man, and Cybernetics, 23(3), 665-685.
6. "Analyzing Human Motion Patterns for Injury Prevention in Sports"
• Reference: Le, T. X., & Makihara, Y. (2022). "Analysis of Human Motion in Sports:
From 2D to 3D Data". IEEE Transactions on Biomedical Engineering.
THANK YOU