Received 4 November 2022, accepted 18 November 2022, date of publication 24 November 2022, date of current version 30 November 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3224781

Sufficiency of Ensemble Machine Learning Methods for Phishing Websites Detection
YI WEI¹ AND YUJI SEKIYA²
¹Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo, Tokyo 113-8656, Japan
²Security Informatics Education and Research Center, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-8656, Japan

Corresponding authors: Yi Wei and Yuji Sekiya

ABSTRACT Phishing is a widespread form of cybercrime that uses disguised websites to trick
users into downloading malware or providing sensitive personal information to attackers. With the
rapid development of artificial intelligence, more and more researchers in the cybersecurity field utilize
machine learning and deep learning algorithms to classify phishing websites. In order to compare the
performances of various machine learning and deep learning methods, several experiments are conducted in
this study. According to the experimental results, ensemble machine learning algorithms stand out among
other candidates in both detection accuracy and computational consumption. Furthermore, the ensemble
architectures still provide impressive capability when the number of features in the dataset decreases sharply.
Subsequently, the paper discusses why ensemble machine learning methods are more suitable for
the binary phishing classification challenge in up-to-date training and real-time detection environments, which
reflects the sufficiency of ensemble machine learning methods in anti-phishing techniques.

INDEX TERMS Phishing websites detection, machine learning, ensemble learning, deep learning.

I. INTRODUCTION
With the expansion of the Internet and the ubiquity of social media, data breaches have consequently emerged as one of the main concerns in cyber security fields. Most security problems and data breaches are usually caused by malicious criminals. Phishing is a common form of cybercrime in which hackers attempt to lure individuals into divulging private information, such as bank account details, credit card numbers, and even employee login credentials for use in unauthorized access to a specific company. To lure a victim, hackers create fraudulent messages that seem to come from a trustworthy person or entity but actually contain disguised links. Then, they send these fake messages to the targets by email or instant messages. If the victim is tricked by the malicious link, his or her confidential data will be stolen in this cyber fraud.

Since the coronavirus pandemic began and people were ordered to work remotely, Covid-19-themed phishing attacks have spiked. Phishers take advantage of the virus-related fear and anxiety of the public in the wake of the spread of the virus. Emails allegedly providing ways to stop the coronavirus outbreak were the most common kind of phishing emails employed [1]. In order to boost the likelihood of success, phishing attempts that occurred during the pandemic also had distinctive features; for instance, the registration of covid-related domains soared during the first months of the pandemic [2]. Threats on social media continued to escalate, with a 47% increase from Q1 to Q2 2022, according to a recent trends report by the APWG (Anti-Phishing Working Group) [3].

Artificial Intelligence (AI) is an emerging science, which has captured tremendous attention over the past decades. It investigates how to build intelligent machines that can creatively find solutions to problems without human intervention. Machine Learning (ML) is a branch of AI that gives machines the capability to automatically learn and make decisions from experience. As a subset of ML, deep learning (DL) employs neural networks with a structure resembling the human neural system to analyze a wide range of variables. Researchers in the cybersecurity domain have proposed various AI solutions to detect illegal phishing attacks.

The associate editor coordinating the review of this manuscript and approving it for publication was Li He.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 10, 2022 124103
Y. Wei, Y. Sekiya: Sufficiency of Ensemble Machine Learning Methods for Phishing Websites Detection

A typical AI-based phishing detection procedure is shown in FIGURE 1, in which AI techniques can learn and extract features to classify phishing attacks effectively and efficiently. Existing phishing detection methods usually choose ML or DL to detect unknown attacks. Due to its ability to automatically extract features, DL has recently been seen as a promising phishing detection tool [4]. However, our research found that, based on some generally recognized phishing website features [5], conventional ML methods achieve higher accuracy and a lower false-positive rate. Besides, DL techniques always suffer from computational constraints and time complexity. This study is intended to indicate the sufficiency of traditional ML algorithms for phishing URL detection.

In summary, this paper makes the following contributions:
• We evaluated multiple ML algorithms for phishing detection empirically and contrasted their performances.
• We implemented and evaluated a 3-layer fully connected neural network (FCNN) model, an LSTM model, and a CNN model on a dataset.
• We analyzed the performances of ML-based methods and DL-based methods. Moreover, we discussed the sufficiency of ML-based methods for phishing detection and provided suggestions for the phishing feature selection approach.

FIGURE 1. Phishing detection steps by applying AI solutions.

The rest of this paper is organized as follows: Section II presents the previous research employing, respectively, ML and DL. Section III introduces and compares three published datasets and features. Section IV provides the detection results by utilizing conventional ML algorithms. In Section V, we build several DL models and compare the results with ML. Finally, Section VI discusses ML methods' sufficiency for phishing detection and proposes future works in the phishing detection field.

II. LITERATURE REVIEW
Based on the methodologies used, phishing detection solutions can be categorized into many different groups including blacklist and whitelist [6], heuristic-based method [7], visual similarity [8], machine learning, deep learning, and hybrid [9]. This section mainly talks about two categories: ML-based phishing detection techniques and DL-based phishing detection approaches in the literature.

A. ML-BASED PHISHING DETECTION
There are supervised, semi-supervised, unsupervised, and reinforcement methods in Machine Learning; the most popular one used to detect phishing is the supervised method, where machines try to make intelligent decisions by learning certain features of phishing and legitimate sample datasets [10]. These kinds of solutions extract features from URLs [11], [12], [13], hyperlink information [14], webpage content [15], [16], hybrid features [17], and other resources. The performance of these methods typically depends on the quality of the dataset, the characteristics, and the algorithm employed in the approach [18]. The following are typical ML algorithms used in phishing detection methods: Support Vector Machine, Classification and Regression Tree, Random Forest, AdaBoost, Light Gradient Boosting Machine, etc.

A phishing detection engine using the features extracted from URLs was proposed by A. Butnaru et al. [13]. They also assessed how well phishing detection performed over time without model retraining. As a result, their solution works better than Google Safe Browsing (GSB), which is the default security tool in most popular web browsers. It is worth mentioning that the model performs well against phishing URLs even after one year. Although the methodology achieves good performance, the authors are concerned about the robustness against adversarial attacks, which are frequently exploited by malevolent entities even when the system produces good performance.

Jain and Gupta [14] presented a novel method that analyzes hyperlinks included in the HTML source code of websites to identify phishing attacks. In their feature selection process, six new features were proposed to increase the detection performance, which is also the key contribution of this work because both processing time and response time were thus reduced. Moreover, their approach is language-independent and can handle webpages in any textual language. However, the approach has certain restrictions because it is totally dependent on the website's source code. If the attackers change all the page resource references, their method will make a false prediction.

The performance of an ML-based system heavily depends on the feature sets. Useless features will increase the cost of storage, time, and power. Feature engineering is crucial since traditional ML techniques depend on human expertise for feature extraction and selection. K. L. Chiew et al. [19] introduced a Hybrid Ensemble Feature Selection (HEFS) framework for ML-based phishing detection systems, where major feature subsets are created using a novel Cumulative Distribution Function gradient (CDF-g) method. By using a function perturbation, they can get a set of baseline features. After integrating with Random Forest, the detection accuracy
can achieve 94.6% using only 20.8% of the original number of features.

The main agenda of our previous work [20] also focuses on the feature selection approach for phishing detection. In our proposed framework, the existing feature importance methods Mean Decrease in Impurity (MDI), Permutation, and SHapley Additive exPlanations (SHAP) are leveraged to obtain a ranking of the importance of features. By assigning different weights to evaluation metrics under various conditions, we can automatically generate the optimal feature subsets. According to experimental results, our feature selection framework outperforms HEFS [19] on the same dataset. Based on the top 10 features we select, detection accuracy achieves 96.83%, which is higher than their results (94.6%) with 10 baseline features. Both of the feature selection frameworks above can provide a fully automatic, flexible, and robust system to produce high-quality sub-feature sets. Furthermore, the framework can be applied to various datasets, which can provide a solution to the problem discussed in [4] that manual feature engineering is separated from classification tasks in conventional ML models.

B. DL-BASED PHISHING DETECTION
Precisely because of its capability to find hidden information in complicated datasets, DL has recently emerged as a viable substitute for traditional ML techniques. In order to enhance the effectiveness of phishing detection solutions, various DL-based approaches have been applied. Popular DL algorithms used in phishing detection include Multi-Layer Perceptron (MLP) [21], [22], Long Short-Term Memory (LSTM) [23], Convolutional Neural Network (CNN) [24], [25], Recurrent Neural Network (RNN) [26], [27], and hybrid models [28].

Yerima and Alzaylaee [25] presented a DL-based approach with high detection accuracy, where CNN is utilized to distinguish legitimate websites from phishing websites. A 1D-CNN model with two convolutional layers, two max-pooling layers, and one fully connected layer was constructed in their method. The model surpassed several popular machine learning classifiers, according to testing on a benchmark dataset of 4,898 instances from phishing websites and 6,157 instances from legitimate websites. However, to fine-tune the important influencing parameters (i.e., number of filters, filter lengths, and the number of fully connected units), they conducted a series of experiments. This time-consuming and labor-intensive procedure is frequently observed in DL-based methods [29], [30].

Li et al. [23] proposed an LSTM-based phishing detection method for big email data which consists of two important stages: a sample expansion stage and a testing stage. To meet deep learning's need for sufficient training samples, they merged KNN with K-Means in the sample expansion stage. Prior to testing, they preprocessed the data by generalizing, word segmenting, and creating word vectors. The LSTM model was then trained using the preprocessed data. Finally, they categorized phishing emails. The accuracy rate of their proposed phishing email detection method can approach 95%, according to experimental results. In their research, to make the detection system more efficient, they labeled a small amount of data manually. Based on this small dataset, they used KNN and K-Means to expand it into the final samples. It is commonly known that DL can manage large amounts of data and that DL performs better when the size of the dataset increases. However, it is difficult for researchers to find abundant and appropriate datasets to work with. At the same time, using a single processor to train DL models on such a large dataset is also a challenge.

In a recent comprehensive DL-based review in the phishing detection field [4], Do et al. indicated that each DL algorithm has unique properties that make it ideal for a specific application. For example, RNN is more appropriate for processing sequential data such as natural language and text. When analyzing two-dimensional data, such as images and videos, CNN produces better results. In addition, the main drawback is that supervised DL requires a massive amount of labeled instances, which adds a high level of computational complexity to the detection system [31]. Additionally, DL models are unable to justify the inferences they draw. It would be tough to comprehend the relationship between input attributes and output decisions [32].

III. DATASET AND FEATURES
Several high-quality phishing datasets are widely used by various authors in their research, such as UCI_2015 [33], Mendeley_2018 [34], and Mendeley_2020 [35]. Phishing instances are usually derived from PhishTank [36], which is a cooperative repository for data and information about phishing attacks on the Internet. Legitimate instances are from Alexa, DMOZ, and Common Crawl. Features used in phishing detection are usually extracted from URLs (protocol, domain, path, parameter shown in FIGURE 2) and other external resources. In this section, we will give an introduction and comparison of these three popular phishing datasets.

FIGURE 2. An example of URL structure.

A. UCI_2015
The University of California, Irvine (UCI) Machine Learning Repository is a common repository that contains both fraudulent and trustworthy website URLs, which is popular among phishing detection researchers [4], [37], [38]. The dataset was donated in 2015 and collected primarily from PhishTank and MillerSmiles archives. The dataset comprises 30 features and 11055 instances (6157 legitimate websites and
4898 phishing websites). The specific features are shown in Table 1. Although the UCI dataset is widely used, it is now too old to be used for developing modern phishing detection algorithms.

TABLE 1. Features in dataset UCI_2015.

TABLE 2. Features in dataset Mendeley_2018.

B. MENDELEY_2018
The Mendeley_2018 dataset contains 48 features and includes 5000 malicious and 5000 legitimate instances. The legitimate websites are derived from Alexa and Common Crawl, whereas phishing instances are from PhishTank and OpenPhish. Based on this dataset, K. L. Chiew et al. [19] proposed the HEFS framework mentioned in Section II. Table 2 shows a list of features in Mendeley_2018.

C. MENDELEY_2020
Dataset Mendeley_2020 is the primary dataset utilized in our research, which consists of two sub-datasets: dataset_full and dataset_small. There are 88647 instances in the full dataset and 58645 instances in the small dataset. Data were collected from PhishTank and Alexa ranking. This dataset contains 111 features; for better understanding, we redivided them into 8 groups. The two sub-datasets are illustrated in FIGURE 3, and the descriptions are explained in Table 3.

FIGURE 3. Dataset Mendeley_2020.
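
Section III notes that phishing features are usually extracted from URL components (protocol, domain, path, parameters). The sketch below illustrates that idea using only the Python standard library. The function name `url_features`, the feature names, and the example URL are hypothetical; they mimic the kind of per-component counts found in URL-based datasets and are not the 111 features of Mendeley_2020.

```python
from urllib.parse import urlsplit, parse_qsl


def url_features(url: str) -> dict:
    """Return a few illustrative lexical features of a URL.

    The feature names are assumptions made for this example only.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "uses_https": int(parts.scheme == "https"),        # protocol
        "url_length": len(url),
        "domain_length": len(host),                         # domain
        "dots_in_domain": host.count("."),
        "hyphens_in_domain": host.count("-"),
        "path_depth": len([s for s in parts.path.split("/") if s]),  # path
        "num_params": len(parse_qsl(parts.query, keep_blank_values=True)),  # parameters
        "has_at_symbol": int("@" in url),
    }


# Made-up URL used purely for demonstration.
example = url_features(
    "http://secure-login.example.com/account/verify.php?id=1&token=abc"
)
```

Each returned dictionary could serve as one row of a feature table, with a separate label column marking the instance as phishing or legitimate.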

D. COMPARISON
Comparisons among the three datasets are provided in Table 4 and FIGURE 4. As shown in Table 4, there are more instances in dataset Mendeley_2020, even eight times as many as in datasets UCI_2015 and Mendeley_2018. In addition, all features in dataset UCI_2015 were transformed into Boolean type based on specified rules, making further analysis difficult. Dataset Mendeley_2020 was selected in our research for its quantity of instances and features.

IV. ML-BASED PHISHING DETECTION RESULTS
In this section, we performed an empirical analysis of various traditional ML algorithms for phishing detection. First, traditional ML algorithms including K-Means Clustering (KMeans), Support Vector Machine (SVM), Naive Bayes Classifier (NB), K-Nearest Neighbor (KNN), Logistic Regression (LR), Linear Discriminant Analysis (LDA), Classification and Regression Tree (CART), and Random Forest (RF) were utilized to classify. Then, results by using ensemble ML methods including RF, AdaBoost, GBDT, XGBoost, and LightGBM were compared in the second sub-section. As in most studies [4], [14], performance was analyzed using Accuracy, Precision, Recall, F1 score, ROC Curve, and P-R Curve.
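
The metrics listed above follow directly from confusion-matrix counts and from the ranking of classifier scores. The sketch below is a standard-library illustration only; the experiments in this paper used scikit-learn. The function names `binary_metrics` and `roc_auc`, the convention that label 1 means phishing, and the toy labels and scores are assumptions made for the example.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve, computed as the probability that a random
    positive instance scores higher than a random negative one (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (1 = phishing, 0 = legitimate); not results from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

On this toy data there are 3 true positives, 1 false positive, 1 false negative, and 5 true negatives, so accuracy is 0.8 while precision, recall, and F1 are all 0.75.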

TABLE 3. Features in dataset Mendeley_2020.

FIGURE 4. Number of instances in three phishing datasets.

A. TRADITIONAL ML ALGORITHMS
All of the models were trained on Jupyter Notebook (6.4.3) using the scikit-learn (1.1.2) library with the Python (3.8.11) programming language. We used 10-fold cross-validation in our studies on the full dataset in Mendeley_2020. The performances are provided in Table 5, and ROC Curves and P-R Curves are illustrated in FIGURE 5 and FIGURE 6. As a result, RF shows the best performance on all metrics with a 97.01% accuracy rate. As can be seen from the graphs, the highest value of Area Under Curve (AUC) belongs to RF, which means that it can separate the positive class and negative class correctly. Besides, RF presents the ability to return accurate results (high precision), as well as most of the positive instances (high recall), at the same time in P-R Curves.

FIGURE 5. ROC curves of eight traditional ML classifiers.

FIGURE 6. P-R curves of eight traditional ML classifiers.

B. ENSEMBLE ML ALGORITHMS
The learning algorithms known as "ensemble ML methods" classify new data by performing a (weighted) vote on the predictions made by each classifier [39]. They are considered the state-of-the-art solutions for many ML challenges [40]. We implemented 5 ensemble ML methods on the dataset, including AdaBoost, Gradient Boosted Decision Trees (GBDT), LightGBM (version 3.3.3), Histogram-Based Gradient Boosting (HGB), and the most popular ensemble method, Random Forest (RF). In this experiment, we split the original dataset into two parts, using 70% for training and 30% for testing.

Performances are provided in Table 6 and ROC curves are illustrated in FIGURE 7, where RF outperforms other methods in both accuracy rate and AUC value. LightGBM shows its high efficiency with minimum training and testing time consumption. We can conclude that ensemble ML methods,
in particular the boosting methods, tend to achieve the best performance in phishing classification.

TABLE 4. Comparison of three popular phishing datasets.

TABLE 5. Performance metrics of various traditional ML algorithms.

FIGURE 7. ROC curves of five ensemble ML classifiers.

V. DL-BASED PHISHING DETECTION RESULTS
The goal of this section is to assess the performance of current popular DL-based methods including FCNN, LSTM, and CNN. Fully Connected Neural Networks (FCNN) consist of a sequence of fully connected layers and have the primary advantage of being "structure agnostic," meaning that no special assumptions about the input are required [41]. LSTM is a special type of Recurrent Neural Network (RNN) that performs significantly better than the standard version. It was introduced by Hochreiter and Schmidhuber [42], and several researchers have since improved and popularized it. LSTMs are specifically designed to prevent the long-term dependency problem [43]. CNN is renowned for its ability to recognize simple patterns in a multi-dimensional task, and as a result, it has had success processing 2D signals like images and video frames [25]. However, a 1D CNN model can also be used to process datasets with a one-dimensional structure [44]. In the following subsections, the experimental setup and data division are described, followed by the results and comparison.

A. EXPERIMENTAL SETUP
We built three DL-based models by using Python (3.8.11) with the TensorFlow (2.9.1) and Keras (2.9.0) libraries on Jupyter Notebook (6.4.3). The dataset was divided into three parts: training dataset, validation dataset, and test dataset. The training dataset is 80% of the original dataset, and the remaining 20% is the test dataset. Furthermore, 10% of the training dataset is used as a validation dataset, as shown in FIGURE 8.

FIGURE 8. Dataset is divided into three parts.

Fully connected layers are usually used for classification. In order to build the FCNN model, it is essential to decide the number of layers, so we tried different numbers of layers and observed the changes in accuracy and loss on the validation dataset, as shown in FIGURE 9. When the number of layers rises, the accuracy rate and loss are basically flat, and the validation accuracy rate is at its highest (0.9403) when the number of layers is 3.

Overfitting occurs when the number of layers is 20 in FIGURE 10, which indicates that the model fits its training data perfectly but fails to perform accurately on the unseen (test) dataset, defeating its purpose.

We built our 3-layer FCNN model after determining the number of epochs by using early stopping (FIGURE 11). The final model is illustrated in FIGURE 12.
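
The data division and the early-stopping rule described in this subsection can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' Keras code: the helper names `three_way_split` and `early_stopping_epoch`, the fixed seed, the `patience` value, and the validation-loss numbers are all hypothetical.

```python
import random


def three_way_split(n_samples, test_frac=0.2, val_frac_of_train=0.1, seed=0):
    """Shuffle indices, hold out test_frac for testing, then take
    val_frac_of_train of the remaining training indices for validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_frac)
    test, train_full = indices[:n_test], indices[n_test:]
    n_val = round(len(train_full) * val_frac_of_train)
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test


def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch with the best validation loss, stopping
    once the loss has failed to improve for `patience` epochs in a row."""
    best_loss, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


# 88647 is the number of instances in dataset_full.
train_idx, val_idx, test_idx = three_way_split(88647)
sizes = (len(train_idx), len(val_idx), len(test_idx))

# Hypothetical validation losses, one value per epoch.
best = early_stopping_epoch([0.40, 0.31, 0.27, 0.26, 0.27, 0.28, 0.29, 0.20])
```

With these fractions the split is roughly 72% training, 8% validation, and 20% test. In the loss sequence above, training stops after epoch 7 and epoch 4 is kept, so the lower loss at epoch 8 is never reached; this is the usual trade-off when choosing `patience`.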

TABLE 6. Performance metrics of various ensemble ML algorithms.

TABLE 7. Parameters settings for the three DL-Based models.

FIGURE 9. Accuracy and loss vs. number of layers in FCNN.

FIGURE 10. Overfitting occurs in the 20-layers FCNN model.

FIGURE 11. Accuracy vs. epochs in the 3-layers FCNN model.

FIGURE 12. Our 3-layers FCNN model.

The procedure from FIGURE 9 to FIGURE 12 can be seen as a basic example of parameter settings in DL-based methods. Parameters can differ between different DL models, such as the number of layers in the model, batch size, the number of epochs, type of optimizer, type of activation function in hidden layers and output layer, etc. [4]. Based on these steps, we built a 3-layer LSTM model with one dropout layer and one dense layer. In addition, a 6-layer CNN model was constructed in the research. Table 7 lists the parameter settings for these DL architectures.

B. RESULT AND COMPARISON
To increase the reliability of classifications, models including RF were tested on three datasets: dataset_small with 111 features, dataset_full with 111 features, and dataset_full with 14 features selected in our previous work. To show accuracy and loss during the training and validation processes, accuracy and loss curves are illustrated in FIGURE 13, where the upper graph shows accuracy and the lower graph shows the loss function. As the number of epochs increases, the accuracy appears to rise but the loss
function declines. A large gap between training outputs and validation outputs is commonly considered as overfitting, which typically happens when the model entirely memorizes data patterns, noise, and other random fluctuations, causing it to fit too closely to the training set [45]. This phenomenon appears visibly in the CNN model in FIGURE 13.

FIGURE 13. Accuracy and loss of FCNN, LSTM, and CNN.

TABLE 8. Performance metrics of RF, FCNN, LSTM, and CNN.

Table 8 summarizes the evaluation results acquired from the experiments. Evaluation metrics consist of training time consumption, precision, recall, AUC, and accuracy. From the table, we observed the following phenomena that need to be emphasized. First, all the classifiers perform better when the data grows from dataset_small to dataset_full, which indicates that large datasets are typically necessary for AI to reach high accuracy. Second, it is surprising that RF outperforms the DL models with the highest testing accuracy rate of 96.94%, whereas those of CNN, FCNN, and LSTM are 91.38%, 90.13%, and 89.73%, respectively. This result casts a new light on the performance of the RF model. Third, the RF model has the lowest training time, which is sensible because the computational complexity of DL-based models is always high. Note that we only record the training time cost of the best fine-tuned state for each individual model. Furthermore, we also conducted an experiment to compare the performances of the selected features against full features on dataset_full. Results showed that RF only experiences a minimal accuracy deterioration of 0.1 percentage points (96.94% to 96.84%) while achieving a massive reduction in the dataset. Compared to RF, DL models suffer from serious decreases in testing accuracy rate with selected features. FIGURE 14 also presents ROC Curves of the 4 classifiers, where lower plots are larger versions
zooming in at the top left. The curves and Area Under the ROC Curve (AUC) values offer a more comprehensive insight into the performances of the models. In every graph, the RF curve clearly dominates those of the DL models.

FIGURE 14. ROC curves of RF, CNN, LSTM, and FCNN on three different datasets.

As a result, the evaluation results have validated that RF is advantageous and highly effective when working with selected features and real-time applications in distinguishing between legitimate and phishing websites. The implications of these findings are discussed in the following section to highlight the sufficiency of ensemble ML methods in phishing detection and navigate the future directions.

VI. DISCUSSION AND CONCLUSION
Previous sections have compared classification performances of various ML models and DL models. In this section, we discuss the advantages and disadvantages of the two groups and draw our conclusion.

Deep Learning is considered to be the state-of-the-art solution to various problems, with the advantages over Machine Learning of dealing with big data and generating features automatically. However, model architecture design, manual parameter tuning, high training time costs, computational complexity, and deficient accuracy performance are the most prevalent problems with DL approaches, as discussed in Section V.

Ensemble ML techniques represented by RF are usually regarded as a crystallization of the wisdom of various ML methods. In ensemble methods, by combining different models, the risk of selecting an improper decision is reduced, and thus, the forecast performance is improved. In our experiments, CART, RF, and Boosting methods obtain better performances in phishing classification. This is potentially because these ensemble methods benefit from the dynamic changing of the weight assigned to each instance in the iteration process, making them more robust and stable than traditional ML algorithms. For instance, AdaBoost's basic principle is to concentrate on cases that were previously incorrectly classified when training a new inducer [40]. In the initial iteration, each instance is given the same weight, after which the weights of incorrectly categorized instances increase and those of correctly identified examples decrease. Additionally, based on their total prediction performances, the individual basic learners are also given voting weights. Hence, ensemble ML methods decrease both bias and variance of variable techniques while increasing the variance for stable classifiers, making them more suitable for classification tasks.

As a typical binary classification problem, ML-based phishing detection solutions are questioned on the ability to handle big data and extract features. Researchers believe that the process of feature selection relies on professional knowledge and repeated experiments, which is considered
to be tedious, labor-intensive, and susceptible to human mistakes [4]. However, this problem can be resolved effectively and efficiently with automatic feature selection methods. For example, our feature selection framework achieves an 87.6% reduction in the number of features at the cost of only a 0.1% drop in detection accuracy, which makes up-to-date training and real-time detection feasible in a production environment. On the other hand, phishers also employ the latest schemes to execute attacks, so phishing features evolve constantly. Phishing website features cannot be generated once and for all; rather, generating them should be a continuous process of updating and accumulation, to which researchers must devote effort.

To sum up, our experiments and discussions offer significant insight into the sufficiency of ensemble ML methods for anti-phishing techniques. As for future work, we will validate our conclusion on various datasets with more features and more instances. In addition, further effort is needed to avoid inefficiency when detecting zero-day attacks. We plan to extract features of the latest phishing websites and retrain our ensemble ML method at intervals. Then, by observing the trends in newly evolving phishing patterns, we aim to find a balanced renewal frequency for extracting features and training models that maintains high detection accuracy. Last but not least, as a practical tool, a phishing detection architecture should eventually be deployed in a real-world production environment (e.g., a web browser) to verify its effectiveness against phishing attacks.

REFERENCES
[1] N. Akdemir and S. Yenal, ‘‘How phishers exploit the coronavirus pandemic: A content analysis of COVID-19 themed phishing emails,’’ SAGE Open, vol. 11, no. 3, Jul. 2021, Art. no. 21582440211031880, doi: 10.1177/21582440211031879.
[2] A. F. Al-Qahtani and S. Cresci, ‘‘The COVID-19 scamdemic: A survey of phishing attacks and their countermeasures during COVID-19,’’ IET Inf. Secur., vol. 16, no. 5, pp. 324–345, Sep. 2022, doi: 10.1049/ise2.12073.
[3] APWG | Phishing Activity Trends Reports. Accessed: Sep. 28, 2022. [Online]. Available: https://apwg.org/trendsreports/
[4] N. Q. Do, A. Selamat, O. Krejcar, E. Herrera-Viedma, and H. Fujita, ‘‘Deep learning for phishing detection: Taxonomy, current challenges and future directions,’’ IEEE Access, vol. 10, pp. 36429–36463, 2022, doi: 10.1109/ACCESS.2022.3151903.
[5] Phishing Websites Features.pdf. Accessed: Sep. 28, 2022. [Online]. Available: http://eprints.hud.ac.uk/id/eprint/24330/6/MohammadPhishing14July2015.pdf
[6] A. K. Jain and B. B. Gupta, ‘‘A novel approach to protect against phishing attacks at client side using auto-updated white-list,’’ EURASIP J. Inf. Secur., vol. 2016, no. 1, pp. 1–11, May 2016, doi: 10.1186/s13635-016-0034-3.
[7] A. A. Zuraiq and M. Alkasassbeh, ‘‘Review: Phishing detection approaches,’’ in Proc. 2nd Int. Conf. new Trends Comput. Sci. (ICTCS), Oct. 2019, pp. 1–6, doi: 10.1109/ICTCS.2019.8923069.
[8] S. Abdelnabi, K. Krombholz, and M. Fritz, ‘‘VisualPhishNet: Zero-day phishing website detection by visual similarity,’’ in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., New York, NY, USA, Oct. 2020, pp. 1681–1698, doi: 10.1145/3372297.3417233.
[9] A. Ozcan, C. Catal, E. Donmez, and B. Senturk, ‘‘A hybrid DNN–LSTM model for detecting phishing URLs,’’ Neural Comput. Appl., vol. 33, pp. 1–17, Aug. 2021, doi: 10.1007/s00521-021-06401-z.
[10] V. Shahrivari, M. M. Darabi, and M. Izadi, ‘‘Phishing detection using machine learning techniques,’’ 2020, arXiv:2009.11116.
[11] H. Tupsamudre, A. K. Singh, and S. Lodha, ‘‘Everything is in the name—A URL based approach for phishing detection,’’ in Cyber Security Cryptography and Machine Learning (Lecture Notes in Computer Science). Cham, Switzerland: Springer, 2019, pp. 231–248, doi: 10.1007/978-3-030-20951-3_21.
[12] E. S. Aung and H. Yamana, ‘‘URL-based phishing detection using the entropy of non-alphanumeric characters,’’ in Proc. 21st Int. Conf. Inf. Integr. Web-based Appl. Services, New York, NY, USA, Dec. 2019, pp. 385–392, doi: 10.1145/3366030.3366064.
[13] A. Butnaru, A. Mylonas, and N. Pitropakis, ‘‘Towards lightweight URL-based phishing detection,’’ Future Internet, vol. 13, no. 6, p. 154, Jun. 2021, doi: 10.3390/fi13060154.
[14] A. K. Jain and B. B. Gupta, ‘‘A machine learning based approach for phishing detection using hyperlinks information,’’ J. Ambient Intell. Humanized Comput., vol. 10, no. 5, pp. 2015–2028, May 2019, doi: 10.1007/s12652-018-0798-z.
[15] U. Ozker and O. K. Sahingoz, ‘‘Content based phishing detection with machine learning,’’ in Proc. Int. Conf. Electr. Eng. (ICEE), Sep. 2020, pp. 1–6, doi: 10.1109/ICEE49691.2020.9249892.
[16] A. K. Jain, S. Parashar, P. Katare, and I. Sharma, ‘‘PhishSKaPe: A content based approach to escape phishing attacks,’’ Proc. Comput. Sci., vol. 171, pp. 1102–1109, Jan. 2020, doi: 10.1016/j.procs.2020.04.118.
[17] M. M. Yadollahi, F. Shoeleh, E. Serkani, A. Madani, and H. Gharaee, ‘‘An adaptive machine learning based approach for phishing detection using hybrid features,’’ in Proc. 5th Int. Conf. Web Res. (ICWR), Apr. 2019, pp. 281–286, doi: 10.1109/ICWR.2019.8765265.
[18] R. Zaimi, M. Hafidi, and M. Lamia, ‘‘Survey paper: Taxonomy of website anti-phishing solutions,’’ in Proc. 7th Int. Conf. Social Netw. Anal., Manag. Secur. (SNAMS), Dec. 2020, pp. 1–8, doi: 10.1109/SNAMS52053.2020.9336559.
[19] K. L. Chiew, C. L. Tan, K. Wong, K. S. C. Yong, and W. K. Tiong, ‘‘A new hybrid ensemble feature selection framework for machine learning-based phishing detection system,’’ Inf. Sci., vol. 484, pp. 153–166, May 2019, doi: 10.1016/j.ins.2019.01.064.
[20] Y. Wei and Y. Sekiya, ‘‘Feature selection approach for phishing detection based on machine learning,’’ in Proc. Int. Conf. Appl. CyberSecurity (ACS), 2021, pp. 61–70, doi: 10.1007/978-3-030-95918-0_7.
[21] S. Al-Ahmadi. (2020). PDMLP: Phishing Detection Using Multilayer Perceptron. Rochester, NY, USA. Accessed: Sep. 1, 2022. [Online]. Available: https://papers.ssrn.com/abstract=3624621
[22] A. Odeh, I. Keshta, and E. Abdelfattah. (2020). Efficient Detection of Phishing Websites Using Multilayer Perceptron. International Association of Online Engineering. Accessed: Sep. 30, 2022. [Online]. Available: https://www.learntechlib.org/p/217754/
[23] Q. Li, M. Cheng, J. Wang, and B. Sun, ‘‘LSTM based phishing detection for big email data,’’ IEEE Trans. Big Data, vol. 8, no. 1, pp. 278–288, Feb. 2022, doi: 10.1109/TBDATA.2020.2978915.
[24] M. A. Adebowale, K. T. Lwin, and M. A. Hossain, ‘‘Deep learning with convolutional neural network and long short-term memory for phishing detection,’’ in Proc. 13th Int. Conf. Softw., Knowl., Inf. Manag. Appl. (SKIMA), Aug. 2019, pp. 1–8, doi: 10.1109/SKIMA47702.2019.8982427.
[25] S. Y. Yerima and M. K. Alzaylaee, ‘‘High accuracy phishing detection based on convolutional neural networks,’’ in Proc. 3rd Int. Conf. Comput. Appl. Inf. Secur. (ICCAIS), Mar. 2020, pp. 1–6, doi: 10.1109/ICCAIS48893.2020.9096869.
[26] Y. Su, ‘‘Research on website phishing detection based on LSTM RNN,’’ in Proc. IEEE 4th Inf. Technol., Netw., Electron. Autom. Control Conf. (ITNEC), Jun. 2020, pp. 284–288, doi: 10.1109/ITNEC48623.2020.9084799.
[27] T. Feng and C. Yue, ‘‘Visualizing and interpreting RNN models in URL-based phishing detection,’’ in Proc. 25th ACM Symp. Access Control Models Technol., Jun. 2020, pp. 13–24, doi: 10.1145/3381991.3395602.
[28] Y. Lin. (2021). Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages. Accessed: Sep. 30, 2022. [Online]. Available: https://www.usenix.org/conference/usenixsecurity21/presentation/lin
[29] W. Wei, Q. Ke, J. Nowak, M. Korytkowski, R. Scherer, and M. Wozniak, ‘‘Accurate and fast URL phishing detector: A convolutional neural network approach,’’ Comput. Netw., vol. 178, Sep. 2020, Art. no. 107275, doi: 10.1016/j.comnet.2020.107275.
[30] S. Mahdavifar and A. A. Ghorbani, ‘‘DeNNeS: Deep embedded neural network expert system for detecting cyber attacks,’’ Neural Comput. Appl., vol. 32, no. 18, pp. 14753–14780, 2020, doi: 10.1007/s00521-020-04830-w.

[31] S. Mahdavifar and A. A. Ghorbani, ‘‘Application of deep learning to cybersecurity: A survey,’’ Neurocomputing, vol. 347, pp. 149–176, Jun. 2019, doi: 10.1016/j.neucom.2019.02.056.
[32] A. Das, S. Baki, A. El Aassal, R. Verma, and A. Dunbar, ‘‘SoK: A comprehensive reexamination of phishing research from the security perspective,’’ IEEE Commun. Surveys Tuts., vol. 22, no. 1, pp. 671–708, 1st Quart., 2020, doi: 10.1109/COMST.2019.2957750.
[33] UCI Machine Learning Repository: Phishing Websites Data Set. Accessed: Oct. 1, 2022. [Online]. Available: https://archive.ics.uci.edu/ml/datasets/phishing+websites
[34] C. L. Tan, ‘‘Phishing dataset for machine learning: Feature evaluation,’’ Mendeley Data, vol. 1, Mar. 2018, doi: 10.17632/h3cgnj8hft.1.
[35] G. Vrbancic, I. Fister, and V. Podgorelec, ‘‘Datasets for phishing websites detection,’’ Data Brief, vol. 33, Dec. 2020, Art. no. 106438, doi: 10.1016/j.dib.2020.106438.
[36] PhishTank | Join the Fight Against Phishing. Accessed: Oct. 1, 2022. [Online]. Available: https://phishtank.org/
[37] G. H. Lokesh and G. BoreGowda, ‘‘Phishing website detection based on effective machine learning approach,’’ J. Cyber Secur. Technol., vol. 5, no. 1, pp. 1–14, Jan. 2021, doi: 10.1080/23742917.2020.1813396.
[38] A. Lakshmanarao, P. S. P. Rao, and M. M. B. Krishna, ‘‘Phishing website detection using novel machine learning fusion approach,’’ in Proc. Int. Conf. Artif. Intell. Smart Syst. (ICAIS), Mar. 2021, pp. 1164–1169, doi: 10.1109/ICAIS50930.2021.9395810.
[39] T. G. Dietterich, ‘‘Ensemble methods in machine learning,’’ in Multiple Classifier Systems. Berlin, Germany: Springer, 2000, pp. 1–15, doi: 10.1007/3-540-45014-9_1.
[40] O. Sagi and L. Rokach, ‘‘Ensemble learning: A survey,’’ WIREs Data Mining Knowl. Discovery, vol. 8, no. 4, p. e1249, Jul. 2018, doi: 10.1002/widm.1249.
[41] T. N. Sainath, O. Vinyals, A. Senior, and H. Sak, ‘‘Convolutional, long short-term memory, fully connected deep neural networks,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2015, pp. 4580–4584, doi: 10.1109/ICASSP.2015.7178838.
[42] Long Short-Term Memory | Neural Computation | MIT Press. Accessed: Oct. 2, 2022. [Online]. Available: https://direct.mit.edu/neco/article/9/8/1735/6109/Long-Short-Term-Memory
[43] A. Sherstinsky, ‘‘Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network,’’ Phys. D, Nonlinear Phenomena, vol. 404, Mar. 2020, Art. no. 132306, doi: 10.1016/j.physd.2019.132306.
[44] S. Kiranyaz, O. Avci, O. Abdeljaber, T. Ince, M. Gabbouj, and D. J. Inman, ‘‘1D convolutional neural networks and applications: A survey,’’ Mech. Syst. Signal Process., vol. 151, Apr. 2021, Art. no. 107398, doi: 10.1016/j.ymssp.2020.107398.
[45] X. Ying, ‘‘An overview of overfitting and its solutions,’’ J. Phys., Conf., vol. 1168, Feb. 2019, Art. no. 022022, doi: 10.1088/1742-6596/1168/2/022022.

YI WEI received the B.E. degree from the College of Computer Science and Electronic Engineering, Hunan University, China, in 2018, and the M.E. degree in electrical engineering and information systems from The University of Tokyo, Tokyo, Japan, in 2021, where she is currently pursuing the Ph.D. degree in electrical engineering and information systems.
Since August 2021, she has been a Technical Assistant with the Security Informatics Education and Research Center and the Graduate School of Information Science and Technology, The University of Tokyo. Her research interests include phishing website detection using machine learning and deep learning, feature selection approaches for dimensionality reduction, and applications of future quantum machine learning algorithms in the cybersecurity field.

YUJI SEKIYA received the B.E. degree from Kyoto University, in 1997, and the M.E. degree and the Ph.D. degree in media and governance from Keio University, Tokyo, Japan, in 1999 and 2005, respectively.
Starting in October 1999, he worked as a Visiting Researcher at USC/ISI for six months. Since 2002, he has also been working at the Information Technology Center, The University of Tokyo, where he is currently a Professor at the Graduate School of Information Science and Technology and a member of the Security Informatics Education and Research Center. He has been working on DNS measurements and security, SDN, network virtualization, cloud computing, and cyber security. In terms of community activities, he is deeply involved in the WIDE Project, the M Root DNS server, JP DNS servers, the Internet exchanges DIX-IE, PIX-IE, and NSPIXP-3, the NECOMA Project, and Interop Tokyo ShowNet. He has served as an Executive Advisor to the Japanese Government CIO since February 2020, and also serves as a Senior Network Engineer at the Digital Agency of the Japanese Government.

