Real Time Phishing Website Detection using ML
ISSN No:-2456-2165
Abstract:- Phishing involves fraudulent activities in which attackers impersonate trustworthy websites to unlawfully obtain private information, including usernames, passwords, and financial details. Traditional detection methods, including blacklists and heuristic-based approaches, struggle to identify new, evolving phishing sites. In recent times, AI based on machine learning (ML) has emerged as a powerful tool for phishing detection, offering predictive capabilities that adapt to changing attack patterns. This survey examines state-of-the-art ML techniques for phishing website detection, covering feature extraction, model types, and challenges in data handling. By analyzing recent methodologies, this paper highlights the strengths and limitations of various ML models and proposes directions for further improving phishing detection systems.

Keywords:- Phishing Detection, Machine Learning, Cybersecurity, Feature Extraction, Classification Models, URL Analysis.

I. INTRODUCTION

Phishing is one of the most widespread and deceptive forms of cybercrime, targeting users to obtain sensitive data such as account credentials, financial data, or personal identity details. Attackers accomplish this by creating false sites that mirror the appearance of legitimate ones, often exploiting human psychology through urgent or enticing messages. These attacks have evolved significantly over the years, becoming more sophisticated and harder to detect, especially as the internet expands in both user base and functionality. Traditional methods, such as blacklists and heuristic-based detection, offer some protection by filtering known phishing sites or applying basic rule-based criteria. However, these techniques are inherently limited: blacklists cannot identify newly emerging phishing sites, and heuristic rules are often bypassed by attackers who adjust their tactics to avoid detection.

Machine learning (ML) has proven to be a promising solution to these limitations, bringing predictive capabilities that allow systems to recognize phishing attempts based on patterns rather than specific pre-identified threats. By analyzing numerous characteristics, such as URL structure, domain registration details, and website content, ML algorithms can classify websites as legitimate or phishing with a high degree of accuracy. Recent advances in both conventional machine learning techniques and deep learning architectures have greatly improved the precision of phishing detection. Algorithms including Random Forest, SVM, and neural networks are widely applied, each offering distinct advantages in handling complex data.

II. LITERATURE SURVEY

The literature on machine learning-based phishing detection shows both advancements and ongoing challenges in the areas of feature extraction, detection efficiency, and adaptability to new phishing techniques. Below, we review significant studies on feature-based detection, deep learning methods, ensemble models, and hybrid approaches.

Feature-Based Identification through Machine Learning: Sarma et al. (2021) conducted a detailed analysis of machine learning methods applied to phishing prevention, focusing on Random Forest (RF), Support Vector Machine (SVM), and K-nearest neighbors (KNN). Among these, RF showed the highest accuracy (98%) in distinguishing phishing from legitimate sites, which the study attributes to its handling of complex features like URL structure, domain age, and HTTPS status. This study underscores the importance of well-chosen features but also highlights challenges in adapting models to new phishing patterns (Sarma2021_Chapter_Compa…).

Machine Learning in Phishing Lifecycle Detection: Tang and Mahmoud (2021) analyzed ML techniques at different stages of phishing detection, such as URL analysis, feature extraction, and classification. They noted that each phase benefits from specific ML models: decision trees are effective in feature extraction, while neural networks can identify deeper patterns. The study suggests that a multi-stage ML framework enhances detection accuracy, but real-time deployment remains challenging due to high computational costs (make-03-00034 (1)).

Deep Learning and Convolutional Neural Networks (CNNs): Odeh et al. (2021) explored advanced deep learning architectures, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, to improve phishing detection. CNNs process URLs and web content to detect phishing patterns more accurately but at a higher computational cost. The authors conclude that while CNNs improve detection rates, a hybrid approach may balance accuracy and efficiency more effectively in resource-constrained environments (2020013989).
Real-Time Detection: The hybrid ML model aims to achieve faster detection suitable for real-time applications.
Improved Accuracy: By combining deep learning with feature-based methods, the model can achieve improved detection rates with fewer false alarms.
Adaptability: The model’s design allows it to adapt to emerging phishing tactics, improving its relevance in dynamic online environments.
Scalability: The use of ensemble methods and dimensionality reduction enables efficient handling of large datasets, essential for real-world deployment.

The Methodology of the Proposed System Involves Several Stages:

Data Collection: Collect URL data and webpage content from sources like PhishTank and OpenPhish for phishing sites and Alexa for legitimate sites.
Feature Extraction: Identify key features, including URL length, domain age, and HTTPS presence. Extract visual and structural features for deep learning models.
Model Training: Train various ML classifiers, such as Random Forest, SVM, CNN, and LSTM, on labeled data. Fine-tune models through cross-validation to optimize accuracy.
Ensemble Learning: Apply ensemble methods by
combining RF with PCA to minimize data complexity
while preserving high accuracy.
Evaluation: Assess models using metrics like accuracy,
precision, recall, and F1 score. Compare performance
across models to determine the optimal configuration.
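
The feature extraction and evaluation stages listed above can be illustrated with a minimal sketch. It uses only the Python standard library and is not the system's actual implementation: the function names `extract_url_features` and `classification_metrics`, and the particular set of lexical features, are assumptions chosen for illustration. Domain age is left out because it would require an external WHOIS lookup.

```python
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Return a few lexical URL features (an illustrative subset, not a full feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),                         # total number of characters
        "uses_https": parsed.scheme == "https",         # HTTPS presence
        "num_dots_in_host": host.count("."),            # rough proxy for subdomain depth
        "has_at_symbol": "@" in url,                    # '@' can hide the real host
        "host_is_ip": host.replace(".", "").isdigit(),  # dotted numeric host instead of a name
    }


def classification_metrics(y_true, y_pred) -> dict:
    """Accuracy, precision, recall, and F1, with 1 = phishing and 0 = legitimate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(extract_url_features("https://example.com/login"))
    print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In a pipeline along these lines, the feature dictionaries would be converted to numeric vectors before training, and the metric function would be applied to each model's predictions on held-out data so that configurations can be compared on the same footing.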
Creating a Fake Website:

Attackers build a phishing site that closely resembles a legitimate website, often using similar logos, colors, and layout.
To deceive users, attackers may alter the URL subtly, for example through slight spelling changes or similar-looking characters. For instance, a fake URL might contain "aimazon" instead of "amazon."

Delivering the Phishing Link:

Attackers send out links to the fake site, often through emails, SMS, voice messages, or QR codes.
Social media and messaging apps are commonly used, expanding the reach of these phishing attempts.
These messages often create urgency, using language that pressures users to click, such as warnings about account suspensions or overdue payments.

Collecting User Information:

Once users click the phishing link, they’re taken to the fake website, where they’re asked to enter sensitive data such as login credentials or payment details.
The phishing site may mimic login or payment pages to make the experience feel authentic.
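
The "aimazon" versus "amazon" spelling trick described above suggests one simple check a detector could apply: flag a domain label that is very close to, but not identical to, a known brand name. The sketch below uses Levenshtein edit distance for this. The function names, the brand list, and the distance threshold of 1 are illustrative assumptions, not part of the surveyed systems.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def closest_brand(label: str, brands, max_distance: int = 1):
    """Return the brand that `label` appears to imitate, or None.

    Exact matches are not flagged, since they are the brand itself.
    """
    label = label.lower()
    for brand in brands:
        distance = edit_distance(label, brand)
        if 0 < distance <= max_distance:
            return brand
    return None


if __name__ == "__main__":
    known_brands = ["amazon", "paypal"]     # hypothetical watch list
    for candidate in ["aimazon", "amazon", "example"]:
        print(candidate, "->", closest_brand(candidate, known_brands))
```

A check like this only covers small spelling changes; visually similar characters from other alphabets would need a separate normalization step, and in practice the result would be one feature among many rather than a verdict on its own.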
VIII. CONCLUSION
REFERENCES