Sufficiency of Ensemble Machine Learning Methods For Phishing Websites Detection
ABSTRACT Phishing is a globally widespread form of cybercrime that uses disguised websites to trick
users into downloading malware or providing sensitive personal information to attackers. With the
rapid development of artificial intelligence, more and more researchers in the cybersecurity field utilize
machine learning and deep learning algorithms to classify phishing websites. In order to compare the
performance of various machine learning and deep learning methods, several experiments are conducted in
this study. According to the experimental results, ensemble machine learning algorithms stand out among
the candidates in both detection accuracy and computational cost. Furthermore, the ensemble
architectures still perform well when the number of features in the dataset decreases sharply.
Subsequently, the paper discusses why ensemble machine learning methods are better suited to
the binary phishing classification challenge in environments that require up-to-date training and real-time
detection, which reflects the sufficiency of ensemble machine learning methods in anti-phishing techniques.
INDEX TERMS Phishing websites detection, machine learning, ensemble learning, deep learning.
I. INTRODUCTION
With the expansion of the Internet and the ubiquity of social media, data breaches have emerged as one of the main concerns in the cyber security field. Most security problems and data breaches are caused by malicious criminals. Phishing is a common form of cybercrime in which hackers attempt to lure individuals into divulging private information, such as bank account details, credit card numbers, and even employee login credentials for use in unauthorized access to a specific company. To lure a victim, hackers create fraudulent messages that seem to come from a trustworthy person or entity but actually contain disguised links. Then, they send these fake messages to the targets by email or instant message. If the victim is tricked by the malicious link, his or her confidential data will be stolen in this cyber fraud.
Since the coronavirus pandemic, when people were ordered to work remotely, Covid-19-themed phishing attacks have spiked. Phishers take advantage of the virus-related fear and anxiety of the public in the wake of the spread of the virus. Emails allegedly providing ways to stop the coronavirus outbreak were the most common kind of phishing emails employed [1]. In order to boost the likelihood of success, phishing attempts that occurred during the pandemic also had distinctive features; for instance, the registration of covid-related domains soared during the first months of the pandemic [2]. Threats on social media continued to escalate, with a 47% increase from Q1 to Q2 2022, according to a recent trends report by the APWG (Anti-Phishing Working Group) [3].
Artificial Intelligence (AI) is an emerging science that has captured tremendous attention over the past decades. It investigates how to build intelligent machines that can creatively find solutions to problems without human intervention. Machine Learning (ML) is a branch of AI that gives machines the capability to automatically learn and make decisions from experience. As a subset of ML, deep learning (DL) employs neural networks with a structure resembling the human neural system to analyze a wide range of variables. Researchers in the cybersecurity domain have proposed various AI solutions to detect illegal phishing attacks.
The associate editor coordinating the review of this manuscript and approving it for publication was Li He.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 10, 2022 124103
Y. Wei, Y. Sekiya: Sufficiency of Ensemble Machine Learning Methods for Phishing Websites Detection
A typical AI-based phishing detection procedure is shown in FIGURE 1, in which AI techniques can learn and extract features to classify phishing attacks effectively and efficiently. Existing phishing detection methods usually choose ML or DL to detect unknown attacks. Due to its ability to automatically extract features, DL has recently been seen as a promising phishing detection tool [4]. However, our research found that, based on some generally recognized phishing website features [5], conventional ML methods achieve higher accuracy and a lower false-positive rate. Besides, DL techniques always suffer from computational constraints and high time complexity. This study is intended to indicate the sufficiency of traditional ML algorithms for phishing URL detection.
In summary, this paper makes the following contributions:
• We evaluated multiple ML algorithms for phishing detection empirically and contrasted their performances.
• We implemented and evaluated a 3-layer fully connected neural network (FCNN) model, an LSTM model, and a CNN model on a dataset.
• We analyzed the performances of ML-based methods and DL-based methods. Moreover, we discussed the sufficiency of ML-based methods for phishing detection and provided suggestions for the phishing feature selection approach.
FIGURE 1. Phishing detection steps by applying AI solutions.
The rest of this paper is organized as follows: Section II presents the previous research employing, respectively, ML and DL. Section III introduces and compares three published datasets and features. Section IV provides the detection results by utilizing conventional ML algorithms. In Section V, we build several DL models and compare the results with ML. Finally, Section VI discusses ML methods’ sufficiency for phishing detection and proposes future work in the phishing detection field.
II. LITERATURE REVIEW
Based on the methodologies used, phishing detection solutions can be categorized into many different groups including blacklist and whitelist [6], heuristic-based method [7], visual similarity [8], machine learning, deep learning, and hybrid [9]. This section mainly discusses two categories in the literature: ML-based phishing detection techniques and DL-based phishing detection approaches.
A. ML-BASED PHISHING DETECTION
Machine Learning comprises supervised, semi-supervised, unsupervised, and reinforcement methods. The most popular one used to detect phishing acts is the supervised method, where machines try to make intelligent decisions by learning certain features of the phishing and legitimate samples in a dataset [10]. These kinds of solutions typically extract features from URLs [11], [12], [13], hyperlink information [14], webpage content [15], [16], hybrid features [17], and other resources. The performance of these methods typically depends on the quality of the dataset, the characteristics, and the algorithm employed in the approach [18]. The following are typical ML algorithms used in phishing detection methods: Support Vector Machine, Classification and Regression Tree, Random Forest, AdaBoost, Light Gradient Boosting Machine, etc.
A phishing detection engine using the features extracted from URLs was proposed by A. Butnaru et al. [13]. They also assessed how well phishing detection performed over time without retraining the model. As a result, their solution works better than Google Safe Browsing (GSB), which is the default security tool in most popular web browsers. It is worth mentioning that the model performs well against phishing URLs even after one year. Although the methodology achieves good performance, the authors are concerned about the robustness against adversarial attacks, which are frequently exploited by malevolent entities even when the system produces good performance.
Jain and Gupta [14] presented a novel method that analyzes hyperlinks included in the HTML source code of websites to identify phishing attacks. In their feature selection process, six new features were proposed to increase the detection performance, which is also the key contribution of this work because both processing time and response time were thus reduced. Moreover, their approach is language-independent and can handle webpages written in any textual language. However, the approach has certain restrictions because it is totally dependent on the website’s source code. If the attackers change all the page resource references, their method will make a false prediction.
The performance of an ML-based system heavily depends on the feature sets. Useless features will increase the cost of storage, time, and power. Feature engineering is crucial since traditional ML techniques depend on human expertise for feature extraction and selection. K. L. Chiew et al. [19] introduced a Hybrid Ensemble Feature Selection (HEFS) framework for ML-based phishing detection systems, where major feature subsets are created using a novel Cumulative Distribution Function gradient (CDF-g) method. By using a function perturbation, they can get a set of baseline features. After integrating with Random Forest, the detection accuracy
can achieve 94.6% using only 20.8% of the original number of features.
The main agenda of our previous work [20] also focuses on the feature selection approach for phishing detection. In our proposed framework, the existing feature importance methods Mean Decrease in Impurity (MDI), Permutation, and SHapley Additive explanation (SHAP) are leveraged to obtain a ranking of the importance of features. By assigning different weights to evaluation metrics under various conditions, we can automatically generate the optimal feature subsets. According to experimental results, our feature selection framework outperforms HEFS [19] on the same dataset. Based on the top 10 features we select, detection accuracy achieves 96.83%, which is higher than their results (94.6%) with 10 baseline features. Both of the feature selection frameworks above can provide a fully automatic, flexible, and robust system to produce high-quality sub-feature sets. Furthermore, the framework can be applied to various datasets, which can provide a solution to the problem discussed in [4] that manual feature engineering is separated from classification tasks in conventional ML models.
B. DL-BASED PHISHING DETECTION
Precisely because of its capability to find hidden information in complicated datasets, DL has recently emerged as a viable substitute for traditional ML techniques. In order to enhance the effectiveness of phishing detection solutions, various DL-based approaches have been applied. Popular DL algorithms used in phishing detection include Multi-Layer Perceptron (MLP) [21], [22], Long Short-Term Memory (LSTM) [23], Convolutional Neural Network (CNN) [24], [25], Recurrent Neural Network (RNN) [26], [27], hybrid models [28], etc.
Yerima and Alzaylaee [25] presented a DL-based approach with high detection accuracy, where a CNN is utilized to distinguish legitimate websites from phishing websites. A 1D-CNN model with two convolutional layers, two max-pooling layers, and one fully connected layer was constructed in their method. The model surpassed several popular machine learning classifiers, according to testing on a benchmark dataset of 4,898 examples from phishing websites and 6,157 instances from reliable websites. However, to fine-tune the important parameters (i.e. number of filters, filter lengths, and the number of fully connected units), they conducted a series of experiments. This time-consuming and labor-intensive procedure is frequently observed in DL-based methods [29], [30].
Li et al. [23] proposed an LSTM-based phishing detection method for big email data which consists of two important stages: a sample expansion stage and a testing stage. Because deep learning requires sufficient training samples, they merged KNN with K-Means in the sample expansion stage. Prior to testing, they preprocessed the data by generalizing, word segmenting, and creating word vectors. The LSTM model was then trained using the preprocessed data. Finally, they categorized phishing emails. The accuracy rate of their proposed phishing email detection method can approach 95%, according to experimental results. In their research, to make the detection system more efficient, they labeled a small amount of data manually. Based on this small dataset, they used KNN and K-Means to expand it into the final samples. It is commonly known that DL can manage large amounts of data and that DL performs better when the size of the dataset increases. However, it is difficult for researchers to find abundant and appropriate datasets to work with. At the same time, using a single processor to train DL models on such a large dataset is also a challenge.
In a recent comprehensive DL-based review in the phishing detection field [4], Do et al. indicated that each DL algorithm has unique properties that make it ideal for a specific application. For example, RNN is more appropriate for processing sequential data such as natural language and text. When analyzing two-dimensional data, such as images and videos, CNN produces better results. In addition, the main drawback is that supervised DL requires a massive amount of labeled instances, which adds a high level of computational complexity to the detection system [31]. Additionally, DL models are unable to justify the inferences they draw. It would be tough to comprehend the relationship between input attributes and output decisions [32].
III. DATASET AND FEATURES
Several high-quality phishing datasets are widely used by various authors in their research, such as UCI_2015 [33], Mendeley_2018 [34], and Mendeley_2020 [35]. Phishing instances are usually derived from PhishTank [36], which is a cooperative repository for data and information about phishing attacks on the Internet. Legitimate instances are from Alexa, DMOZ, and Common Crawl. Features used in phishing detection are usually extracted from URLs (protocol, domain, path, parameter shown in FIGURE 2) and other external resources. In this section, we will give an introduction and comparison of these three popular phishing datasets.
FIGURE 2. An example of URL structure.
A. UCI_2015
The University of California, Irvine (UCI) Machine Learning Repository is a common repository that contains both fraudulent and trustworthy website URLs, which is popular among phishing detection researchers [4], [37], [38]. The dataset was donated in 2015 and collected primarily from PhishTank and MillerSmiles archives. The dataset comprises 30 features and 11055 instances (6157 legitimate websites and 4898 phishing websites).
B. MENDELEY_2018
The Mendeley_2018 dataset contains 48 features and
includes 5000 malicious and 5000 legitimate instances.
The legitimate websites are derived from Alexa and Common
Crawl, whereas phishing instances are from PhishTank and
OpenPhish. Based on this dataset, K. L. Chiew et al. [19]
proposed the HEFS framework mentioned in Section II.
Table 2 shows a list of features in Mendeley_2018.
C. MENDELEY_2020
Dataset Mendeley_2020 is the primary dataset utilized in our
research, which consists of two sub-datasets: dataset_full and
dataset_small. There are 88647 instances in the full dataset
and 58645 instances in the small dataset. Data were collected
from PhishTank and the Alexa ranking. This dataset contains
111 features; for better understanding, we regrouped them into
8 groups. The two sub-datasets are illustrated in FIGURE 3, and
the feature descriptions are given in Table 3.
FIGURE 3. Dataset Mendeley_2020.
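To illustrate how URL-based features of the kind used in these datasets could be derived, the following minimal Python sketch splits a URL into the parts named in FIGURE 2 (protocol, domain, path, parameter) and computes a few count-based features. The function names, the feature names, and the example URL are hypothetical; they are not the actual columns of Mendeley_2020.

```python
from urllib.parse import urlsplit, parse_qs


def url_parts(url):
    """Split a URL into protocol, domain, path, and parameters."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "domain": parts.netloc,
        "path": parts.path,
        "parameters": parse_qs(parts.query),
    }


def simple_url_features(url):
    """Hypothetical count-based features (an assumption, not the dataset schema)."""
    p = url_parts(url)
    return {
        "url_length": len(url),
        "dots_in_domain": p["domain"].count("."),
        "hyphens_in_domain": p["domain"].count("-"),
        # Number of non-empty path segments, e.g. /secure/update.php has 2.
        "path_depth": len([seg for seg in p["path"].split("/") if seg]),
        "parameter_count": len(p["parameters"]),
    }


# Made-up example URL used only for illustration.
example = "https://login.example-bank.com/secure/update.php?id=42&lang=en"
parts = url_parts(example)
features = simple_url_features(example)
```

Each resulting feature dictionary would correspond to one row of a tabular dataset, which a classifier can then consume together with the phishing or legitimate label.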
D. COMPARISON
Comparisons among the three datasets are provided in
Table 4 and FIGURE 4. As shown in TABLE 4, there are more instances in dataset Mendeley_2020, roughly eight times as many as in either UCI_2015 or Mendeley_2018. In addition, all features in dataset UCI_2015 were transformed into Boolean type based on specified rules, which makes further analysis difficult. Dataset Mendeley_2020 was selected in our research for its quantity of instances and features.
IV. ML-BASED PHISHING DETECTION RESULTS
In this section, we performed an empirical analysis of various traditional ML algorithms for phishing detection. First, traditional ML algorithms including K-Means Clustering (KMeans), Support Vector Machine (SVM), Naive Bayes Classifier (NB), K-Nearest Neighbor (KNN), Logistic Regression (LR), Linear Discriminant Analysis (LDA), Classification and Regression Tree (CART), and Random Forest (RF) were utilized for classification. Then, the results of ensemble ML methods including RF, AdaBoost, GBDT, XGBoost, and LightGBM are compared in the second subsection. As in most studies [4], [14], performance was analyzed using Accuracy, Precision, Recall, F1 score, ROC Curve, and P-R Curve.
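The threshold-based metrics named here (Accuracy, Precision, Recall, and F1 score) follow directly from the confusion-matrix counts. The sketch below is a minimal, dependency-free illustration rather than the code used in the experiments; the function name and the toy labels are assumptions, with label 1 taken to denote phishing.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (assumption): 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

In this toy case there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four metrics equal 0.75. ROC and P-R curves additionally require predicted scores rather than hard labels, so they are not covered by this sketch.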
FIGURE 4. Number of instances in three phishing datasets.
A. TRADITIONAL ML ALGORITHMS
All of the models were trained on Jupyter Notebook (6.4.3) using the scikit-learn (1.1.2) library with the Python (3.8.11) programming language. We used 10-fold cross-validation in our studies on the full dataset in Mendeley_2020. The performances are provided in Table 5, and the ROC Curves and P-R Curves are illustrated in FIGURE 5 and FIGURE 6. As a result, RF shows the best performance on all metrics with a 97.01% accuracy rate. As can be seen from the graphs, the highest value of Area Under Curve (AUC) belongs to RF, which means that it can separate the positive class and negative class correctly. Besides, RF presents the ability to
B. ENSEMBLE ML ALGORITHMS
The learning algorithms known as "ensemble ML methods" classify new data by performing a (weighted) vote on the predictions made by each classifier [39]. They are considered state-of-the-art solutions for many ML challenges [40]. We implemented 5 ensemble ML methods on the dataset including AdaBoost, Gradient Boosted Decision Trees (GBDT), LightGBM (version 3.3.3), Histogram-Based Gradient Boosting (HGB), and the most popular ensemble method, Random Forest (RF). In this experiment, we split the original dataset into two parts, using 70% for training and 30% for testing.
Performances are provided in Table 6 and ROC curves are illustrated in FIGURE 7, where RF outperforms the other methods in both accuracy rate and AUC value. LightGBM shows high efficiency, with the lowest training and testing time consumption. We can conclude that ensemble ML methods,
in particular the boosting methods, tend to achieve the best performance in phishing classification.
TABLE 5. Performance metrics of various traditional ML algorithms.
V. DL-BASED PHISHING DETECTION RESULTS
The goal of this section is to assess the performance of current popular DL-based methods including FCNN, LSTM, and CNN. Fully Connected Neural Networks (FCNN) are constituted by a sequence of completely connected layers that have the primary advantage of being "structure agnostic," meaning that no special assumptions about the input are required [41]. LSTM is a particularly unique type of Recurrent Neural Network (RNN) that performs significantly better than the normal version. It was introduced by Hochreiter and Schmidhuber [42], and several researchers have since improved and popularized it. LSTMs are specifically designed to prevent the long-term dependency problem [43]. CNN is renowned for its ability to recognize simple patterns in a multi-dimensional task, and as a result, it has had success processing 2D signals like images and video frames [25]. However, a 1D CNN model can also be used to process datasets with a one-dimensional structure [44]. In the following subsections, the experimental setup and data division are described, followed by the results and comparison.
A. EXPERIMENTAL SETUP
We built three DL-based models by using Python (3.8.11) with TensorFlow (2.9.1) and the Keras library (2.9.0) on Jupyter Notebook (6.4.3). The dataset was divided into three parts: training dataset, validation dataset, and test dataset. The training dataset is 80% of the original dataset, and the remaining 20% is the test dataset. Furthermore, 10% of the training dataset is used as a validation dataset, as shown in FIGURE 8.
Fully connected layers are usually used for classification. To build the FCNN model, it is essential to decide the number of layers, so we varied the number of layers to observe the changes in accuracy and loss on the validation dataset, as shown in FIGURE 9. When the number of layers rises, the accuracy rate and loss are basically flat, and the validation accuracy rate is at its highest (0.9403) when the number of layers is 3.
Overfitting occurs when the number of layers is 20 in FIGURE 10, which indicates that the model fits perfectly against its training data but fails to perform accurately against the unseen (test) dataset, violating its purpose.
FIGURE 10. Overfitting occurs in the 20-layers FCNN model.
We built our 3-layer FCNN model after determining the number of epochs by using early stopping (FIGURE 11). The final model is illustrated in FIGURE 12.
FIGURE 12. Our 3-layers FCNN model.
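The data division described in the experimental setup can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state explicitly: the validation set is carved out of the 80% training portion, and fractional sizes are rounded to the nearest integer. The function name and the seed are hypothetical.

```python
import random


def three_way_split(n_samples, test_frac=0.20, val_frac_of_train=0.10, seed=0):
    """Return disjoint index lists (train, validation, test).

    The test set takes `test_frac` of all samples; the validation set
    takes `val_frac_of_train` of what remains (assumption, see lead-in).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility

    n_test = round(n_samples * test_frac)
    test_idx = indices[:n_test]
    train_full = indices[n_test:]

    n_val = round(len(train_full) * val_frac_of_train)
    val_idx = train_full[:n_val]
    train_idx = train_full[n_val:]
    return train_idx, val_idx, test_idx


# 88647 is the number of instances reported for dataset_full.
train_idx, val_idx, test_idx = three_way_split(88647)
```

With the 88647 instances of dataset_full, this yields 63826 training, 7092 validation, and 17729 test instances under the stated rounding assumption, that is, roughly 72%, 8%, and 20% of the data.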
function declines. A large gap between training outputs and validation outputs is commonly considered a sign of overfitting, which typically happens when the model entirely memorizes data patterns, noise, and other random fluctuations, causing it to fit too closely to the training set [45]. This phenomenon is clearly visible for the CNN model in FIGURE 13.
Table 8 summarizes the evaluation results acquired from the experiments. Evaluation metrics consist of training time consumption, precision, recall, AUC, and accuracy. From the table, we observed the following phenomena that deserve emphasis. First, all the classifiers perform better when the data grows from dataset_small to dataset_full, which indicates that large datasets are typically necessary for AI to reach high accuracy. Second, it is surprising that RF outperforms the DL models with the highest testing accuracy rate of 96.94%, whereas those of CNN, FCNN, and LSTM are 91.38%, 90.13%, and 89.73%, respectively. This result casts a new light on the performance of the RF model. Third, the RF model has the lowest training time, which is sensible because the computational complexity of DL-based models is always high. Note that, for each individual model, we only record the training time cost of its best fine-tuned state. Furthermore, we also conducted an experiment to compare the performances of the selected features against the full features on dataset_full. Results showed that RF only experiences a minimal accuracy deterioration of 0.1% (96.94% to 96.84%) while achieving a massive reduction in the dataset. Compared to RF, DL models suffer from serious decreases in testing accuracy rate with selected features. FIGURE 14 also presents ROC Curves of the 4 classifiers, where lower plots are larger versions
FIGURE 14. ROC curves of RF, CNN, LSTM, and FCNN on three different datasets.
zooming in at the top left. The curves and Area Under the ROC Curve (AUC) values offer a more comprehensive insight into the performances of the models. In every graph, RF clearly shows curves unmatched by the DL models.
As a result, the evaluation results have validated that RF is advantageous and highly effective when working with selected features and real-time applications in distinguishing between legitimate and phishing websites. The implications of these findings are discussed in the following Section to highlight the sufficiency of ensemble ML methods in phishing detection and to outline future directions.
VI. DISCUSSION AND CONCLUSION
Previous sections have compared classification performances of various ML models and DL models. In this Section, we discuss the advantages and disadvantages of the two groups and draw our conclusion.
Deep Learning is considered to be the state-of-the-art solution to various problems, with the advantages over Machine Learning of dealing with big data and generating features automatically. However, model architecture design, manual parameter tuning, high training time costs, computational complexity, and deficient accuracy performance are the most prevalent problems with DL approaches, as discussed in Section V.
Ensemble ML techniques represented by RF are usually regarded as a crystallization of the wisdom of various ML methods. In ensemble methods, by combining different models, the risk of selecting an improper decision is reduced, and thus the forecast performance is improved. In our experiments, CART, RF, and Boosting methods obtain better performances in phishing classification. This is potentially because these ensemble methods benefit from the dynamic adjustment of the weight assigned to each instance during the iteration process, making them more robust and stable than traditional ML algorithms. For instance, AdaBoost’s basic principle is to concentrate on cases that were previously incorrectly classified when training a new inducer [40]. In the initial iteration, each instance is given the same weight, after which the weights of incorrectly classified instances increase and those of correctly classified instances decrease. Additionally, based on their total prediction performances, the individual base learners are also given voting weights. Hence, ensemble ML methods decrease both bias and variance of variable techniques while increasing the variance for stable classifiers, making them more suitable for classification tasks.
As a typical binary classification problem, ML-based phishing detection solutions are questioned on the ability to handle big data and extract features. Researchers believe that the process of feature selection relies on professional knowledge and repeated experiments, which is considered
to be tedious, labor-intensive, and susceptible to human [11] H. Tupsamudre, A. K. Singh, and S. Lodha, ‘‘Everything is in the name—
mistakes [4]. However, this problem can be effectively and A URL based approach for phishing detection,’’ in Cyber Security Cryp-
tography and Machine Learning (Lecture Notes in Computer Science).
efficiently resolved by utilizing automatic feature selec- Cham, Switzerland: Springer, 2019, pp. 231–248, doi: 10.1007/978-3-030-
tion methods, for example, our feature selection framework 20951-3_21.
achieves a remarkable 87.6% reduction in feature quantity [12] E. S. Aung and H. Yamana, ‘‘URL-based phishing detection using the
entropy of non-alphanumeric characters,’’ in Proc. 21st Int. Conf. Inf.
with suffering from only a 0.1% deterioration in detecting Integr. Web-based Appl. Services, New York, NY, USA, Dec. 2019,
accuracy, making it possible for up-date training and real- pp. 385–392, doi: 10.1145/3366030.3366064.
time detecting in a production environment. In another hand, [13] A. Butnaru, A. Mylonas, and N. Pitropakis, ‘‘Towards lightweight URL-
based phishing detection,’’ Future Internet, vol. 13, no. 6, p. 154, Jun. 2021,
phishers are also employing the latest schemes to execute doi: 10.3390/fi13060154.
attacks, phishing features are under evolution constantly. The [14] A. K. Jain and B. B. Gupta, ‘‘A machine learning based approach for phish-
phishing websites features cannot be generated once and ing detection using hyperlinks information,’’ J. Ambient Intell. Humanized
Comput., vol. 10, no. 5, pp. 2015–2028, May 2019, doi: 10.1007/s12652-
for all, conversely, it should be a continuous updating and 018-0798-z.
accumulating process, in which researchers are supposed to [15] U. Ozker and O. K. Sahingoz, ‘‘Content based phishing detection with
pay efforts. machine learning,’’ in Proc. Int. Conf. Electr. Eng. (ICEE), Sep. 2020,
pp. 1–6, doi: 10.1109/ICEE49691.2020.9249892.
To sum up, our experiments and discussions offer a signif- [16] A. K. Jain, S. Parashar, P. Katare, and I. Sharma, ‘‘PhishSKaPe: A content
icant insight into the sufficiency of ensemble ML methods based approach to escape phishing attacks,’’ Proc. Comput. Sci., vol. 171,
for anti-phishing techniques. As for future work, we will pp. 1102–1109, Jan. 2020, doi: 10.1016/j.procs.2020.04.118.
[17] M. M. Yadollahi, F. Shoeleh, E. Serkani, A. Madani, and H. Gharaee,
validate our conclusion on various datasets with more fea- ‘‘An adaptive machine learning based approach for phishing detection
tures and more instances. In addition, further efforts need to using hybrid features,’’ in Proc. 5th Int. Conf. Web Res. (ICWR), Apr. 2019,
be taken to avoid the inefficiency when detecting zero-day pp. 281–286, doi: 10.1109/ICWR.2019.8765265.
[18] R. Zaimi, M. Hafidi, and M. Lamia, ‘‘Survey paper: Taxonomy of
attacks. We plan to extract features of the latest phishing web- website anti-phishing solutions,’’ in Proc. 7th Int. Conf. Social
sites and train our ensemble ML method at intervals. Then, Netw. Anal., Manag. Secur. (SNAMS), Dec. 2020, pp. 1–8, doi:
by observing the variation trends in newly evolving phishing 10.1109/SNAMS52053.2020.9336559.
[19] K. L. Chiew, C. L. Tan, K. Wong, K. S. C. Yong, and W. K. Tiong, ‘‘A new
patterns, we would like to find a balanced renewal frequency hybrid ensemble feature selection framework for machine learning-based
for extracting features and training models to maintain high phishing detection system,’’ Inf. Sci., vol. 484, pp. 153–166, May 2019,
detection accuracy. Last but not least, as a practical tool, doi: 10.1016/j.ins.2019.01.064.
[20] Y. Wei and Y. Sekiya, ‘‘Feature selection approach for phishing detection
a phishing detection architecture is supposed to be deployed based on machine learning,’’ in Proc. Int. Conf. Appl. CyberSecurity (ACS),
in a real-world production environment (e.g. web browser) to 2021, pp. 61–70, doi: 10.1007/978-3-030-95918-0_7.
verify its effectiveness against phishing attacks eventually. [21] S. Al-Ahmadi. (2020). PDMLP: Phishing Detection Using Multilayer Per-
ceptron. Rochester, NY, USA. Accessed: Sep. 1, 2022. [Online]. Available:
https://papers.ssrn.com/abstract=3624621
REFERENCES
[1] N. Akdemir and S. Yenal, “How phishers exploit the coronavirus pandemic: A content analysis of COVID-19 themed phishing emails,” SAGE Open, vol. 11, no. 3, Jul. 2021, Art. no. 21582440211031880, doi: 10.1177/21582440211031879.
[2] A. F. Al-Qahtani and S. Cresci, “The COVID-19 scamdemic: A survey of phishing attacks and their countermeasures during COVID-19,” IET Inf. Secur., vol. 16, no. 5, pp. 324–345, Sep. 2022, doi: 10.1049/ise2.12073.
[3] APWG | Phishing Activity Trends Reports. Accessed: Sep. 28, 2022. [Online]. Available: https://apwg.org/trendsreports/
[4] N. Q. Do, A. Selamat, O. Krejcar, E. Herrera-Viedma, and H. Fujita, “Deep learning for phishing detection: Taxonomy, current challenges and future directions,” IEEE Access, vol. 10, pp. 36429–36463, 2022, doi: 10.1109/ACCESS.2022.3151903.
[5] Phishing Websites Features.pdf. Accessed: Sep. 28, 2022. [Online]. Available: http://eprints.hud.ac.uk/id/eprint/24330/6/MohammadPhishing14July2015.pdf
[6] A. K. Jain and B. B. Gupta, “A novel approach to protect against phishing attacks at client side using auto-updated white-list,” EURASIP J. Inf. Secur., vol. 2016, no. 1, pp. 1–11, May 2016, doi: 10.1186/s13635-016-0034-3.
[7] A. A. Zuraiq and M. Alkasassbeh, “Review: Phishing detection approaches,” in Proc. 2nd Int. Conf. new Trends Comput. Sci. (ICTCS), Oct. 2019, pp. 1–6, doi: 10.1109/ICTCS.2019.8923069.
[8] S. Abdelnabi, K. Krombholz, and M. Fritz, “VisualPhishNet: Zero-day phishing website detection by visual similarity,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., New York, NY, USA, Oct. 2020, pp. 1681–1698, doi: 10.1145/3372297.3417233.
[9] A. Ozcan, C. Catal, E. Donmez, and B. Senturk, “A hybrid DNN–LSTM model for detecting phishing URLs,” Neural Comput. Appl., vol. 33, pp. 1–17, Aug. 2021, doi: 10.1007/s00521-021-06401-z.
[10] V. Shahrivari, M. M. Darabi, and M. Izadi, “Phishing detection using machine learning techniques,” 2020, arXiv:2009.11116.
[22] A. Odeh, I. Keshta, and E. Abdelfattah. (2020). Efficient Detection of Phishing Websites Using Multilayer Perceptron. International Association of Online Engineering. Accessed: Sep. 30, 2022. [Online]. Available: https://www.learntechlib.org/p/217754/
[23] Q. Li, M. Cheng, J. Wang, and B. Sun, “LSTM based phishing detection for big email data,” IEEE Trans. Big Data, vol. 8, no. 1, pp. 278–288, Feb. 2022, doi: 10.1109/TBDATA.2020.2978915.
[24] M. A. Adebowale, K. T. Lwin, and M. A. Hossain, “Deep learning with convolutional neural network and long short-term memory for phishing detection,” in Proc. 13th Int. Conf. Softw., Knowl., Inf. Manag. Appl. (SKIMA), Aug. 2019, pp. 1–8, doi: 10.1109/SKIMA47702.2019.8982427.
[25] S. Y. Yerima and M. K. Alzaylaee, “High accuracy phishing detection based on convolutional neural networks,” in Proc. 3rd Int. Conf. Comput. Appl. Inf. Secur. (ICCAIS), Mar. 2020, pp. 1–6, doi: 10.1109/ICCAIS48893.2020.9096869.
[26] Y. Su, “Research on website phishing detection based on LSTM RNN,” in Proc. IEEE 4th Inf. Technol., Netw., Electron. Autom. Control Conf. (ITNEC), Jun. 2020, pp. 284–288, doi: 10.1109/ITNEC48623.2020.9084799.
[27] T. Feng and C. Yue, “Visualizing and interpreting RNN models in URL-based phishing detection,” in Proc. 25th ACM Symp. Access Control Models Technol., Jun. 2020, pp. 13–24, doi: 10.1145/3381991.3395602.
[28] Y. Lin. (2021). Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages. Accessed: Sep. 30, 2022. [Online]. Available: https://www.usenix.org/conference/usenixsecurity21/presentation/lin
[29] W. Wei, Q. Ke, J. Nowak, M. Korytkowski, R. Scherer, and M. Wozniak, “Accurate and fast URL phishing detector: A convolutional neural network approach,” Comput. Netw., vol. 178, Sep. 2020, Art. no. 107275, doi: 10.1016/j.comnet.2020.107275.
[30] S. Mahdavifar and A. A. Ghorbani, “DeNNeS: Deep embedded neural network expert system for detecting cyber attacks,” Neural Comput. Appl., vol. 32, no. 18, pp. 14753–14780, 2020, doi: 10.1007/s00521-020-04830-w.
[31] S. Mahdavifar and A. A. Ghorbani, “Application of deep learning to cybersecurity: A survey,” Neurocomputing, vol. 347, pp. 149–176, Jun. 2019, doi: 10.1016/j.neucom.2019.02.056.
[32] A. Das, S. Baki, A. El Aassal, R. Verma, and A. Dunbar, “SoK: A comprehensive reexamination of phishing research from the security perspective,” IEEE Commun. Surveys Tuts., vol. 22, no. 1, pp. 671–708, 1st Quart., 2020, doi: 10.1109/COMST.2019.2957750.
[33] UCI Machine Learning Repository: Phishing Websites Data Set. Accessed: Oct. 1, 2022. [Online]. Available: https://archive.ics.uci.edu/ml/datasets/phishing+websites
[34] C. L. Tan, “Phishing dataset for machine learning: Feature evaluation,” Mendeley Data, vol. 1, Mar. 2018, doi: 10.17632/h3cgnj8hft.1.
[35] G. Vrbancic, I. Fister, and V. Podgorelec, “Datasets for phishing websites detection,” Data Brief, vol. 33, Dec. 2020, Art. no. 106438, doi: 10.1016/j.dib.2020.106438.
[36] PhishTank | Join the Fight Against Phishing. Accessed: Oct. 1, 2022. [Online]. Available: https://phishtank.org/
[37] G. H. Lokesh and G. BoreGowda, “Phishing website detection based on effective machine learning approach,” J. Cyber Secur. Technol., vol. 5, no. 1, pp. 1–14, Jan. 2021, doi: 10.1080/23742917.2020.1813396.
[38] A. Lakshmanarao, P. S. P. Rao, and M. M. B. Krishna, “Phishing website detection using novel machine learning fusion approach,” in Proc. Int. Conf. Artif. Intell. Smart Syst. (ICAIS), Mar. 2021, pp. 1164–1169, doi: 10.1109/ICAIS50930.2021.9395810.
[39] T. G. Dietterich, “Ensemble methods in machine learning,” in Multiple Classifier Systems. Berlin, Germany: Springer, 2000, pp. 1–15, doi: 10.1007/3-540-45014-9_1.
[40] O. Sagi and L. Rokach, “Ensemble learning: A survey,” WIREs Data Mining Knowl. Discovery, vol. 8, no. 4, p. e1249, Jul. 2018, doi: 10.1002/widm.1249.
[41] T. N. Sainath, O. Vinyals, A. Senior, and H. Sak, “Convolutional, long short-term memory, fully connected deep neural networks,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2015, pp. 4580–4584, doi: 10.1109/ICASSP.2015.7178838.
[42] Long Short-Term Memory | Neural Computation | MIT Press. Accessed: Oct. 2, 2022. [Online]. Available: https://direct.mit.edu/neco/article/9/8/1735/6109/Long-Short-Term-Memory
[43] A. Sherstinsky, “Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network,” Phys. D, Nonlinear Phenomena, vol. 404, Mar. 2020, Art. no. 132306, doi: 10.1016/j.physd.2019.132306.
[44] S. Kiranyaz, O. Avci, O. Abdeljaber, T. Ince, M. Gabbouj, and D. J. Inman, “1D convolutional neural networks and applications: A survey,” Mech. Syst. Signal Process., vol. 151, Apr. 2021, Art. no. 107398, doi: 10.1016/j.ymssp.2020.107398.
[45] X. Ying, “An overview of overfitting and its solutions,” J. Phys., Conf., vol. 1168, Feb. 2019, Art. no. 022022, doi: 10.1088/1742-6596/1168/2/022022.

YI WEI received the B.E. degree from the College of Computer Science and Electronic Engineering, Hunan University, China, in 2018, and the M.E. degree in electrical engineering and information systems from The University of Tokyo, Tokyo, Japan, in 2021, where she is currently pursuing the Ph.D. degree in electrical engineering and information systems.
Since August 2021, she has been a Technical Assistant with the Security Informatics Education and Research Center and the Graduate School of Information Science and Technology, The University of Tokyo. Her research interests include phishing website detection using machine learning and deep learning, feature selection approaches for dimensionality reduction, and applications of future quantum machine learning algorithms in the cybersecurity field.

YUJI SEKIYA received the B.E. degree from Kyoto University, in 1997, and the M.E. degree and the Ph.D. degree in media and governance from Keio University, Tokyo, Japan, in 1999 and 2005, respectively.
From October 1999, he worked as a Visiting Researcher at USC/ISI for six months. Since 2002, he has been working at the Information Technology Center, The University of Tokyo, where he is currently a Professor at the Graduate School of Information Science and Technology and a member of the Security Informatics Education and Research Center. He has been working on DNS measurements and security, SDN, network virtualization, cloud computing, and cyber security. In his community activities, he is deeply involved in the WIDE Project, the M Root DNS server, the JP DNS servers, the Internet exchanges DIX-IE, PIX-IE, and NSPIXP-3, the NECOMA Project, and Interop Tokyo ShowNet. He has served as an Executive Advisor to the Japanese Government CIO since February 2020 and also as a Senior Network Engineer at the Digital Agency of the Japanese Government.