A New Metaheuristic Algorithm Based On Water Wave Optimization For Data Clustering
https://doi.org/10.1007/s12065-020-00562-x
RESEARCH PAPER
Received: 17 October 2020 / Revised: 20 December 2020 / Accepted: 30 December 2020 / Published online: 27 January 2021
© The Author(s), under exclusive licence to Springer-Verlag GmbH, DE part of Springer Nature 2021
Abstract
Data clustering is an important activity in the field of data analytics. It can be described as unsupervised learning that groups similar objects into clusters, where the similarity between objects is computed through a distance measure. Clustering has proven its significance for solving a wide range of real-world optimization problems. This work presents a water wave optimization (WWO) based metaheuristic algorithm for the clustering task. WWO is an effective algorithm for solving constrained and unconstrained optimization problems, but it sometimes cannot obtain promising solutions for complex optimization problems because its search equations lack a global best information component and it can converge to premature solutions. To address the absence of global best information and premature convergence, some improvements are incorporated into the WWO algorithm to make it more promising and efficient. These improvements consist of a modified search mechanism and a decay operator: the missing global best information component is supplied through the updated search mechanism, while premature convergence is addressed through the decay operator. The performance of the proposed WWO algorithm is evaluated on thirteen benchmark clustering datasets using the accuracy and F-score measures. The simulation results are compared with several state-of-the-art clustering algorithms, and the proposed WWO clustering algorithm achieves higher accuracy and F-score rates on most of the clustering datasets, improving accuracy and F-score by an average of 4% and 7% respectively over the existing clustering algorithms. Further, a statistical test is conducted, and its results confirm the efficacy of the proposed WWO algorithm in the clustering field.
Keywords Clustering · Data analysis · Meta-heuristic algorithms · Water wave optimization · Unsupervised learning
760 Evolutionary Intelligence (2022) 15:759–783
converged on premature solutions and also face difficulty in partitioning overlapping data [16]. Hierarchical clustering algorithms explore the data through a tree structure and do not require prior information about the number of clusters, but they are computationally more expensive than partitional clustering algorithms [17]. Clusters with arbitrary shapes are determined through density-based clustering algorithms, which are also efficient at finding outliers in data; still, these algorithms are less adequate for high-dimensional data and clusters with varying densities [18]. Graph-based clustering algorithms divide the vertices into k clusters by considering the edge structure of a given graph, such that there are many edges within each cluster and fewer edges between clusters [19]. These algorithms are not well suited to datasets with a large number of features, but work efficiently when the number of features is small [20]. At present, optimization-based clustering algorithms receive wide attention from the research community for solving clustering problems. These algorithms are more competitive than traditional algorithms and provide more attractive solutions for clustering problems [21-24]. A few of these are Tabu Search (TS) [25], Simulated Annealing (SA) [26], the Genetic Algorithm (GA) [27], Artificial Bee Colony (ABC) [28-30], the Arrhenius Artificial Bee Colony algorithm [31], Ant Colony Optimization (ACO) [32, 33], Particle Swarm Optimization (PSO) [34], Cuckoo Search (CS) [35], Cat Swarm Optimization (CSO) [36, 37], the Firefly Algorithm (FA) [38, 39], the Gravitational Search Algorithm (GSA) [40, 41], the Black Hole algorithm (BH) [42], the Charged System Search algorithm (CSS) [43, 44], Teaching-Learning-Based Optimization (TLBO) [45], Artificial Chemical Reaction Optimization (ACRO) [46] and the Big Bang-Big Crunch algorithm (BB-BC) [47, 48]. Furthermore, these algorithms contain several inbuilt mechanisms for refining promising solutions and explore the solution space through local and global search. The search process can be characterized by self-sustaining, dispersed and inhabitant behavior. These features make optimization algorithms more powerful than traditional algorithms and competent to solve diverse problems. Exploitation and exploration are the key foundations of population-based and meta-heuristic algorithms. Exploitation can be interpreted as discovering candidate solutions near the current solution, whereas exploration can be described as searching for new candidate solutions distant from the current solution's location. To obtain good solutions for optimization problems, the exploitation and exploration processes should be balanced [27]. Several other factors are also considered for balancing these processes, such as coverage of the problem space, conflicting search objectives, and the dominating factors of the search process. Several studies have been reported on effectively balancing the search processes of meta-heuristic and population-based algorithms [44].

The other recurrent issue of meta-heuristic algorithms is stagnation in local optima [45]. It can be described as no change in the fitness function or cluster allocation over successive iterations, and it typically occurs due to a lack of population diversity during the execution of the algorithm. Local optima can degrade the final solution and also lead to premature convergence. Several researchers have focused on this well-known local optima problem of clustering algorithms and presented intelligent and effective solutions and strategies to overcome it [48]. Initialization sensitivity can also be considered one of the important problems of clustering algorithms [49]: clustering algorithms, especially partitional ones, choose the initial cluster centers at random, and these selected centers have a great impact on the final solution. Empirical studies have also considered this issue and reported promising solutions for it [50, 51].

Recently, an algorithm inspired by water waves, called water wave optimization (WWO), was developed [52]. WWO has attracted the attention of the research community due to its ease of implementation and simplicity. The key features of the WWO algorithm are its diversity and adaptation mechanisms, which make it suitable for diverse applications. In turn, a wide range of optimization problems have been solved using WWO, such as scheduling problems [52, 53], radio cognitive systems [54], global optimization [55], multi-objective optimization [56], parameter optimization of neural networks [57], the reactive power dispatch problem [58], feature selection [59], and congestion control and quality of service in wireless sensor networks [60]. However, although WWO achieves at-par results for most optimization problems, its performance is sometimes affected by the absence of a global best information component and by convergence to premature solutions [61, 62].

1.1 Motivation and contribution of work

This research aims to address the premature convergence of the WWO algorithm and the absence of global best information in it. Due to these issues, WWO sometimes cannot attain the global optimal solution and converges on a local best solution [61]. It is also observed that premature convergence occurs due to a lack of balance between local and global searches, so this study also investigates the balancing factor between the two. To enhance this balance and to handle premature convergence, a decay operator is incorporated into the WWO algorithm. The aim of the decay operator is to maintain the balance between local and global searches and to explore the search space effectively, thereby addressing premature convergence. The absence of global best information is resolved through a Particle Swarm Optimization (PSO) based search mechanism.
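As a rough, illustrative sketch of the ideas above (the exact update rules of the proposed method are given later in the paper), a PSO-style move toward the global best, damped by a hypothetical decay factor, could be written as follows; the function name, `decay`, `c1` and `c2` are illustrative assumptions, not the paper's notation:

```python
import random

def pso_style_update(position, velocity, personal_best, global_best,
                     decay=0.99, c1=2.0, c2=2.0):
    """Illustrative PSO-style move toward the global best solution.

    `decay` is a hypothetical operator that shrinks the inherited
    velocity over iterations, shifting the search from exploration
    (large steps) toward exploitation (small steps near good regions).
    """
    new_velocity = [
        decay * v
        + c1 * random.random() * (pb - x)   # pull toward personal best
        + c2 * random.random() * (gb - x)   # pull toward global best
        for x, v, pb, gb in zip(position, velocity, personal_best, global_best)
    ]
    new_position = [x + v for x, v in zip(position, new_velocity)]
    return new_position, new_velocity
```

This is only a generic PSO-flavored sketch, not the WWO update itself; it illustrates how a global-best term injects the missing information while a decay factor tempers step sizes.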
The aim of this search mechanism is to guide the search towards the global optimal solution. Finally, the capabilities of the improved WWO are explored for solving data clustering problems. Several benchmark clustering datasets are taken into consideration for evaluating the performance of the WWO algorithm, and the simulation results are compared with a wide range of meta-heuristic clustering algorithms. The key points of this research work are summarized as:

• A decay operator is incorporated into the WWO algorithm for effectively balancing the local and global searches as well as handling premature convergence.
• The global search mechanism of WWO is improved through a PSO-based search mechanism; the algorithm adopts the decay operator to balance exploration and exploitation.
• The capability of the improved WWO algorithm is explored for solving data clustering.
• Thirteen benchmark clustering datasets are considered for evaluating the performance of the WWO algorithm.
• A statistical analysis is also performed to validate the proposed WWO algorithm.

1.2 Organization of paper

The remainder of the paper is organized as follows. Section 2 discusses related works on partitional clustering algorithms along with the research gaps. Section 3 describes the basic WWO algorithm. Section 4 describes the improvements and the proposed WWO algorithm with a flowchart. Section 5 demonstrates the experimental results of the proposed WWO clustering algorithm. Finally, the work is concluded with future scope in Sect. 6.

2 Literature review

2.1 Related works

This section discusses various recent related works on partitional clustering.

Singh [63] introduced the Harris hawks optimization (HHO) algorithm for solving data clustering problems in an efficient manner. Further, this work uses a chaotic number sequence to guide the search pattern of the HHO algorithm and thereby overcomes its dependency on random numbers. The performance of the HHO algorithm has been evaluated using twelve benchmark datasets and compared with six state-of-the-art techniques. The efficacy of the proposed HHO algorithm is validated using several performance measures and a statistical test; these tests confirm the effectiveness of the HHO algorithm for solving data clustering problems.

For improving the effectiveness and efficiency of clustering, Tsai et al. [64] developed a new meta-heuristic algorithm, called coral reef optimization with substrate layers (CRO-SL), for handling large amounts of data. The substrate layers concept integrates particle swarm optimization (PSO) and the genetic k-means algorithm (GKA) for refining the end results. The aim of the CRO-SL algorithm is to achieve better clustering results for big data analytics. Moreover, the proposed CRO-SL algorithm is implemented on a cloud platform to reduce the response time of data analytics. Several state-of-the-art clustering algorithms, such as k-means, GKA, PSO and simple coral reef optimization (SCRO), are picked for performance comparison, and seven benchmark and two artificial datasets are taken for implementing the CRO-SL algorithm. The simulation results showed that the proposed CRO-SL algorithm accelerates the clustering results as compared to the other algorithms on the cloud platform.

Kuwil et al. [65] designed a distance-based clustering algorithm, called the critical distance clustering algorithm. The proposed algorithm considers a new objective function for determining the similarity between data, devised using Euclidean distance and basic statistical operations. Moreover, the proposed algorithm works with quantitative data, not with qualitative or categorical data. The performance of the proposed algorithm is examined over twenty-six experiments. The simulation results proved that the proposed algorithm produces competitive clustering results as compared to k-means, DBSCAN and MST-based clustering. It is also claimed that outliers are successfully handled through the proposed distance-based algorithm.

Singh et al. [46] considered the slow convergence and local optima issues of clustering algorithms and developed a new heuristic algorithm, called the artificial chemical reaction optimization (ACRO) clustering algorithm. The convergence and local optima issues are addressed through a position-based operator and a neighborhood operator respectively. The performance of the ACRO algorithm is tested over five benchmark and two artificial datasets, and standard clustering algorithms are taken to compare the simulation results. The results showed that the ACRO algorithm obtains better data clustering results in terms of intra-cluster distance and F-measure. Furthermore, the Friedman statistical test is applied to validate the performance of ACRO; the results of the statistical test confirm the effectiveness of the proposed ACRO algorithm in the clustering field.

Baalamurugan and Bhanu [66] introduced an efficient stud krill herd clustering (ESKH-C) technique to address data clustering in a cloud environment. The objective of the ESKH-C technique is to compute the optimum locations of cluster centers.
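Most of the partitional methods surveyed here score a candidate set of cluster centers with an intra-cluster distance objective of roughly the following form (a generic sketch of the common fitness function, not any one paper's exact formulation; the function and argument names are illustrative):

```python
import math

def intra_cluster_distance(points, centers):
    """Sum of Euclidean distances from each point to its nearest center.

    Metaheuristic clustering typically treats this quantity as the
    fitness to minimize: each candidate solution in the population
    encodes one set of cluster centers.
    """
    total = 0.0
    for p in points:
        # assign the point to its closest center and accumulate that distance
        total += min(math.dist(p, c) for c in centers)
    return total
```

Lower values mean tighter clusters, which is why intra-cluster distance appears repeatedly as an evaluation criterion in the works above.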
Further, a stud selection and crossover (SSC) operator has been integrated into the krill herd clustering algorithm to make it more efficient. The SSC operator is inspired by the genetic reproduction process, and its aim is to improve the convergence rate. Seven experiments are conducted to measure the efficacy of the ESKH-C algorithm, and the simulation results are compared with the k-means, PSO, ant colony optimization (ACO) and bacterial foraging optimization (BFO) algorithms. The simulation results demonstrated that the ESKH-C algorithm works effectively with different cluster numbers, densities and multi-dimensional datasets.

Sharma and Chhabra [67] considered the lack of balance between the exploration and exploitation processes of clustering algorithms and developed a new clustering algorithm inspired by PSO and polygamous crossover, called PSOPC. Seven standard clustering datasets are taken to implement the PSOPC algorithm, and the simulation results are compared with PSO, the genetic algorithm (GA), differential evolution (DE), the firefly algorithm (FA) and grey wolf optimization (GWO). Cluster distance, cluster quality and convergence rate measures are adopted to evaluate the performance of the PSOPC algorithm. It is stated that the PSOPC algorithm outperforms the others in the context of the aforementioned measures.

Abdulwahab et al. [68] examined the exploration issue of clustering algorithms and designed an effective clustering algorithm, called Levy Flight Black Hole (LBH), which combines the Levy flight concept with the black hole optimization algorithm. The authors stated that the black hole (BH) algorithm provides superior results on benchmark datasets but lacks exploration capability for a few datasets; in turn, the BH algorithm cannot explore the search space away from the current black hole. Hence, this shortcoming of the BH algorithm is handled through the Levy flight concept, whose aim is to increase the step size of the stars' movement so that a larger search space can be explored. The performance of the LBH algorithm is evaluated using six standard datasets and compared with several clustering algorithms. The results indicated that the LBH algorithm produces robust clustering results.

Mustafa et al. [69] took into consideration the balance between the exploration and exploitation processes of clustering algorithms and developed an adaptive memetic differential evolution (ADME) optimization algorithm for data clustering. The ADME algorithm combines the advantages of the memetic algorithm and adaptive differential evolution (DE): an adaptive differential evolution mutation strategy is employed within the memetic algorithm for a better compromise between local and global searches. The experimental results indicated that the ADME algorithm can obtain more accurate clustering results than the ME and DE algorithms. Furthermore, a statistical test also validates the proposed ADME algorithm in the clustering field.

Tarkhaneh and Moser [70] developed an improved differential evolution (IDE) algorithm for data clustering, called adaptive differential evolution with neighborhood search (ADENS), which integrates the Archimedean spiral, the Mantegna Levy distribution and neighborhood search (NS) for effective cluster analysis. Twelve experiments are conducted to investigate the performance of the ADENS algorithm. The results showed that the ADENS algorithm achieves superior clustering results as compared to other algorithms. Further, the Wilcoxon and Friedman statistical tests are also considered to validate the ADENS algorithm, and the statistical results support the suitability of the ADENS algorithm for cluster analysis.

The random selection of initial cluster centers sometimes incurs premature convergence, especially in clustering algorithms. Agbaje et al. [50] investigated this issue by combining the firefly algorithm (FA) and the PSO algorithm, called FAPSO. In the proposed FAPSO algorithm, the FA algorithm is initially implemented to start the search, and the PSO algorithm is then employed for obtaining the optimal solution. The robustness of the proposed algorithm is assessed over twelve benchmark datasets and compared with four standard benchmark clustering algorithms. The simulation results illustrated that FAPSO has an advantage over other clustering methods in terms of the DB index and CS index.

Zhou et al. [49] investigated the dependency of the K-means algorithm on the initial solution and found that the algorithm can be stuck in local optima if the initial solution is not explored well. In this work, the authors propose an efficient algorithm inspired by symbiotic organism search (SOS) for data clustering. Ten experiments are conducted to examine the efficacy of the SOS algorithm, and the simulation results are compared with DE, cuckoo search (CS), the flower pollination algorithm (FPA), PSO, artificial bee colony (ABC), the multi-verse optimizer and k-means. The authors claimed that the SOS algorithm generates more stable clustering results.

Aljarah et al. [71] inspected the local optima trap and premature convergence issues of the grey wolf optimizer (GWO) clustering algorithm. The authors stated that the aforementioned problems occur due to the large number of variables. So, to address the local optima and premature convergence issues, a tabu search (TS) method is incorporated into the GWO algorithm, called TSGWO. The performance of the proposed TSGWO is evaluated using thirteen benchmark clustering datasets and compared with several existing clustering algorithms. The simulation results showed that TSGWO achieves a better convergence rate compared to the same class of algorithms and successfully overcomes the local optima issue.

Zhu et al. [72] considered the shortcomings of the bat algorithm, such as stagnation in local minima and limited accuracy, and developed an improved bat algorithm for effective cluster analysis.
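The Levy flight step that LBH-style methods use to enlarge the stars' movement is commonly generated with Mantegna's algorithm; a minimal sketch follows (the exponent `beta = 1.5` is an illustrative choice, not necessarily the setting used in the cited paper):

```python
import math
import random

def levy_step(beta=1.5):
    """Draw one Levy-flight step via Mantegna's algorithm.

    The heavy-tailed distribution occasionally produces very large
    jumps, which is how Levy-flight operators push candidate solutions
    far away from the current attractor (e.g., the black hole in LBH).
    """
    # Mantegna's scale parameter for the numerator Gaussian
    sigma = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = random.gauss(0.0, sigma)
    v = random.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)
```

Scaling a solution's displacement by such a step mixes many small local moves with rare long-range jumps, improving exploration.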
To alleviate the shortcomings of the bat algorithm, two improvements are incorporated, targeting its global and local optimization abilities. The global ability is improved using a Gaussian-based convergence factor and five different convergence factors, whereas the local ability is enhanced using whale-based optimization and a sine position-updating mechanism. Seven clustering datasets are adopted for evaluating the performance of the improved bat algorithm. The simulation results showed that the proposed improvements enhance the accuracy of the bat algorithm in a significant manner.

Kushwaha et al. [51] investigated the initial cluster choice issue of the k-means algorithm and propose an electromagnetic clustering algorithm (ELMC) to address this problem. ELMC is an improved version of the electromagnetic field optimization algorithm. Furthermore, the diversity of the ELMC algorithm is maintained through the concept of attraction-repulsion. A series of experiments are conducted to measure the efficiency of the proposed ELMC algorithm, and the simulation results are compared with the ACO, KFCM, KFC, PCM and standard k-means clustering algorithms. It is observed that the ELMC algorithm achieves more stable clustering results than the other algorithms.

Senthilnath et al. [73] adopted the flower pollination algorithm (FPA), which is inspired by the pollination process of flowers, for addressing the data clustering problem. The objective of the FPA algorithm is to compute the optimal cluster centers. The performance of FPA is evaluated using three clustering databases and compared with GA, PSO, CS, spider monkey optimization (SMO), GWO, DE, harmony search and the bat algorithm. The simulation results demonstrate that the FPA algorithm has a lower classification error than the rest of the algorithms. Furthermore, a statistical test is also employed to validate the effectiveness of the FPA clustering algorithm; the statistical test proved that FPA is an effective algorithm for dealing with the data clustering problem.

Mageshkumar et al. [74] developed a hybrid meta-heuristic algorithm for improving the efficacy of data clustering. The proposed hybrid algorithm, called ACO-ALO, integrates the important capabilities of the ant lion optimization (ALO) algorithm with the ant colony optimization (ACO) algorithm. Furthermore, the local minima problem is overridden through a Cauchy mutation operator. Four experiments are conducted to evaluate the performance of the ACO-ALO algorithm, and it is compared with the K-means and ACO clustering algorithms. The simulation results showed that the ACO-ALO algorithm obtains superior clustering results.

Kaur et al. [75] examined the local optima and slow convergence issues of the K-means algorithm, especially for larger datasets, and developed a new clustering algorithm based on chaos optimization and the flower pollination algorithm, called chaotic FPA (CFPA). Sixteen datasets are considered to test the performance of the CFPA algorithm, and the simulation results are compared with the FPA, CS, BH, BA, PSO, FA and ABC clustering algorithms. The efficacy of CFPA is measured using cluster integrity, execution time, number of iterations to converge (NIC) and stability measures. The simulation results showed that the CFPA algorithm achieves better clustering results than the other algorithms in terms of cluster integrity and execution time.

Xie et al. [76] investigated the sensitivity to initial clusters and the local optima problems of clustering algorithms and provided a solution by developing two variants of the FPA algorithm, called inward intensified exploration FPA (IIEFPA) and compound intensified exploration FPA (CIEFPA). Further, matrix-based search parameters and dispersing mechanisms are incorporated to improve the global and local searches of the proposed FPA variants. Fifteen datasets are adopted to assess the effectiveness of the FPA variants. The simulation results showed that the proposed FPA variants exhibit superior clustering results, with lower distances and higher accuracy rates than the other algorithms.

Huang et al. [77] developed a memetic clustering algorithm based on PSO and GSA, called the memetic particle gravitation optimization (MPGO) algorithm. The aim of this algorithm is to perform an efficient search with faster convergence; its important aspects are its hybrid operation and enhanced diversity mechanism. The performance of MPGO is evaluated on six benchmark clustering datasets and compared with K-means, PSO, GSA, the BH algorithm and the WOA algorithm. The simulation results stated that MPGO outperforms the other algorithms in terms of better fitness function values and a more accurate clustering rate.

Dinkar and Deep [78] addressed the local optima and slow convergence issues of clustering algorithms and developed an improved ant lion optimization (ALO) algorithm, called OB-C-ALO, for performing data clustering in an efficient manner. To make the algorithm more competitive, two amendments are integrated into the ALO algorithm: (i) a Cauchy mutation operator is employed for handling the local minima problem, and (ii) opposition-based learning is utilized for addressing the slow convergence rate. Six experiments are conducted to evaluate the performance of the OB-C-ALO clustering algorithm, and the simulation results are compared with the ALO and C-ALO clustering algorithms. It is noticed that the OB-C-ALO algorithm obtains more promising results in terms of distance than the others.

To improve the global search mechanism, Abualigah et al. [79] proposed a hybrid algorithm (H-KHA) for solving data clustering and text clustering problems. In this work, the krill herd (KH) algorithm is hybridized with harmony search (HS): the distance factor of HS has been adopted for improving the global search mechanism in KH. The performance has been evaluated on both seven data clustering datasets and six text document datasets.
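The chaotic sequences that CFPA-style methods substitute for uniform random numbers are typically produced by a simple map such as the logistic map; a minimal sketch follows (`x0` and `r` are illustrative values, not necessarily those used in the cited work):

```python
def logistic_map_sequence(n, x0=0.7, r=3.99):
    """Generate n values of the logistic map x <- r * x * (1 - x).

    Chaotic-initialization schemes use such deterministic but
    aperiodic sequences instead of uniform random numbers, aiming to
    spread the initial cluster centers more evenly over [0, 1].
    """
    xs, x = [], x0
    for _ in range(n):
        x = r * x * (1 - x)
        xs.append(x)
    return xs
```

Each coordinate of an initial center can then be set by rescaling one chaotic value from [0, 1] to the corresponding feature's range.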
It is noticed from the results that the proposed algorithm achieves state-of-the-art results in terms of accuracy and convergence rate. The statistical analysis shows the highest rank for H-KHA using F-measure compared to the other clustering and optimization algorithms in the comparison. The work can be extended by using other powerful local search approaches, handling different clustering problems, and evaluating H-KHA on benchmark function datasets.

A hybrid proposal distribution method for pattern recognition was developed by Zeng et al. [80]. The aim is to accurately segment and estimate the test and control lines from gold immunochromatographic strip (GICS) images. A new dynamic state-space model has been adopted to describe the relation between contour points on the upper and lower boundaries using a transition equation for the test and control lines. With a uniformity measure and class variance, a new observation equation has been developed. Further, a deep-belief-network-based particle filter (DBN-PF) has been adopted to find the initial recognition and regions of high likelihood. The experimentation has been done using an artificial dataset and GICS images. From the results, it is evaluated that the proposed approach results in superior performance for GICS images, and significant improvement has been shown in terms of several indices. Further, it can be used with other approaches to combat the problems of particle filters.

Zeng et al. [81] proposed a dynamic-neighborhood-based switching PSO (DNSPSO) algorithm for improving exploration ability. The proposed algorithm adopts a dynamic neighborhood strategy to adjust personal and global best positions and overcome the premature convergence problem. A new learning strategy has been developed to select the acceleration coefficients and update the velocity in order to search the complete search space. Further, a differential evolution method has been utilized for improving the diversity of PSO. The performance has been evaluated using fourteen benchmark functions. The experimental results have demonstrated that the proposed algorithm gives a 100% success rate and overall ranks second for success performance on all benchmark functions. DNSPSO can be applied to other research areas such as self-organizing RBF neural networks, moving horizon estimation, etc.

Abualigah [82] provided a comprehensive survey of the group search optimizer (GSO), discussing the various variants of GSO, their results and their applications from 2009 to 2020. From a set of candidate solutions, the GSO algorithm is able to find the best solution; it helps to determine the maximum or minimum of the objective function to solve the optimization problem. From the survey, it has been noticed that GSO is competent and gives promising results as compared to similar optimization algorithms. The study concludes that significant improvement can be achieved by enhancing GSO with different modifications, improvements and hybridizations, as per the needs of the problem. It can be employed for solving multi-objective problems and optimization problems unsolved by GSO, and it can be modified for real-world problems.

A comprehensive review of the multi-verse optimizer algorithm (MOA) from March 2015 till April 2019 was presented by Abualigah [83]. The various features, advantages, disadvantages and applications of MOA have been discussed. The study has also covered various variants of MOA, including binary, hybridized, modified, multi-objective and chaotic. The performance of MOA has been evaluated for unimodal and multimodal functions and compared to other related approaches. It has been depicted that MOA gives high exploration ability along with adjustable convergence rates. The study presents various future directions: MOA can be modified to solve real-world optimization problems and unsolved optimization problems, and it can be hybridized with other methods like hill climbing and differential evolution for significant improvement of results.

Zeng et al. [84] developed a framework for the diagnosis of Alzheimer's disease (AD) and mild cognitive impairment (MCI). The model consists of pre-processing magnetic resonance (MRI) images, feature extraction, principal component analysis, and a support vector machine (SVM). The proposed model adopts switching delayed PSO (SDPSO) in order to optimize the SVM parameters. The proposed model has been evaluated over MRI scans taken from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset for classification of AD and MCI. The results demonstrate that the developed framework gives classification accuracies of 69.2308% for stable MCI (sMCI) vs progressive MCI (pMCI), 81.25% for normal control (NC) vs AD, 76.9231% for NC vs sMCI, 85.7143% for NC vs pMCI, 71.2329% for sMCI vs AD and 57.1429% for pMCI vs AD. The proposed method can be investigated with deep learning methods, and positron emission tomography (PET) images can also be considered for AD diagnosis problems.

2.2 Research gaps

This subsection focuses on the research gaps in the existing studies. In the related works section, twenty recently published research articles, all published in journals of repute, are discussed to determine the research gaps. Through the literature review, it is observed that the data clustering problem attracts wide attention from the research community, and a large number of algorithms have been reported for solving it in an efficient manner. Further, Table 1 demonstrates the pros and cons of the metaheuristic algorithms as well as the clustering problem. Several points are noted regarding clustering algorithms and the data clustering problem, and can be summarized as: (1) recently, meta-heuristic algorithms have taken over from traditional algorithms for solving data clustering problems, (2) meta-heuristic
Evolutionary Intelligence (2022) 15:759–783

Table 1 Summarization of related works

Singh [63]
Problem identified: Dependency on random numbers; improving the performance; handling large data.
Methods/findings: Proposed the Harris hawks optimization (HHO) algorithm; adopted a chaotic sequence of numbers for guiding the search pattern of HHO.
Data sets: Jain, Flame, Compound, D31, R15, Aggregation, Path-based, Spiral, Wine, Glass, Iris, Yeast.
Future work: Real-world applications; a multi-objective version.

Tsai et al. [64]
Problem identified: Handling large data; better cluster results.
Methods/findings: Coral reef optimization with substrate layers (CRO-SL) was developed for handling a large amount of data; the substrate-layers concept integrates particle swarm optimization.
Data sets: Iris, Wine, Breast Cancer Wisconsin, HTRU2, Spambase, User Locations Finland, Abalone, c20d6n2000, c20d6n200000.
Future work: Dynamic parameter setting for convergence; hybridization with other metaheuristic algorithms; speed-up via graphics processing units.

Tarkhaneh and Moser [70]
Problem identified: Convergence speed; balance between exploration and exploitation.
Methods/findings: Proposed Adaptive Differential Evolution with Neighborhood Search (ADENS); adopted a new mutation strategy combining the Archimedean spiral (AS) with Mantegna Lévy flight for robust solutions; a self-adaptive strategy is applied for tuning control parameters; an initialization methodology is applied.
Data sets: Iris, Contraceptive Method Choice (CMC), Wisconsin breast cancer (WBC), Diabetes, Wine, Yeast, Vehicle, Letters, Liver, Ionosphere, Heart, and Cars.
Future work: Can be enhanced through quantum or chaotic theory; adopt a more diversified approach for solutions; consider other applications like text and image clustering.

Agbaje et al. [50]
Problem identified: Premature convergence.
Methods/findings: Combined the firefly algorithm (FA) and the PSO algorithm, called FAPSO; FA is employed for the initial search, and PSO is employed for obtaining the optimal solution.
Data sets: Breast, Compound, Flame, Glass, Iris, Jain, Path-based, Spiral, Statlog, Thyroid, Two moons, Wine, Yeast.
Future work: Can be enhanced using Lévy flight by reducing foraging time; local search approaches can be employed for solving real-world clustering problems; can be employed in other engineering optimization problems.

Zhou et al. [49]
Problem identified: Local optima; sensitivity to initial selection.
Methods/findings: The SOS algorithm is applied to solve the data clustering problem; new equations are given for the mutualism and commensalism phases; a parasite vector is adopted in the parasitism phase.
Data sets: Artificial Dataset One, Artificial Dataset Two, Iris, Wine, Seeds, Statlog, Breast Cancer Wisconsin (Original), CMC, Haberman's Survival, Balance Scale.
Future work: Can be extended to dynamically determine the optimal number of clusters; can be enhanced to solve complex clustering problems by combining with other approaches; can be employed for other engineering applications.

Aljarah et al. [71]
Problem identified: Trapped in local optima; premature convergence; balance between exploration and exploitation.
Methods/findings: Proposed a hybrid approach to enhance the efficiency and the balance between the exploratory and exploitative behaviors of the GWO algorithm; tabu search is employed as an operator in GWO.
Data sets: Iris, Blood, Breast cancer, Glass, Seeds, Wine, Australian, Diabetes, Haberman, Heart, Liver, Planning index, Tic-tac-toe.
Future work: Analyzing spatial data and other synthetic datasets; can be enhanced using a parallel approach for reducing run time.

Zhu et al. [72]
Problem identified: Trapped in local optima; improving the accuracy rate.
Methods/findings: Proposed an improved bat algorithm; to improve global search, a Gaussian-like convergence factor is added; five more convergence factors are proposed for global optimization; the hunting mechanism from the whale optimization algorithm (WOA) and the sine position-updating strategy are adopted for enhancing local optimization.
Data sets: Iris, Wine, Bupa, Seeds, Heart-statlog, WDBC, and Wisconsin breast cancer.
Future work: Can be enhanced for handling unstable search.

Kushwaha et al. [51]
Problem identified: Initial choice of clusters; local optima.
Methods/findings: Introduced an enhanced variant of electromagnetic field optimization (EFO), the electromagnetic clustering algorithm (ELMC), for clustering; utilizes the attraction–repulsion concept of the EFO algorithm to maintain the diversity of the population, making it less vulnerable to the initial choice of centroids.
Data sets: Iris, Vowel, Crude oil, Thyroid, IONO (Ionosphere database), GAS, Human Activity Recognition, and Contraceptive Method Choice (CMC).
Future work: Can be extended to the fields of image and text.

Senthilnath et al. [73]
Problem identified: Prior information for optimal cluster centers.
Methods/findings: A Flower Pollination Algorithm (FPA) based approach is developed for data clustering; FPA is used to minimize the objective function to obtain the optimal locations of cluster centers.
Data sets: Image segmentation, Vehicle, Glass, Crop Type, and synthetic datasets for test samples.
Future work: –

Kumar et al. [74]
Problem identified: Balance between exploration and exploitation; trapping in local minima; reducing the intra-cluster distance.
Methods/findings: Proposed a new hybrid ACO-ALO algorithm; employed the Cauchy mutation operator to avoid local minima.
Data sets: Zoo, Iris, Glass, Wine.
Future work: Can be hybridized with a neural-based algorithm for better parameter selection.

Kaur et al. [75]
Problem identified: Local optima; slow convergence for large datasets.
Methods/findings: Proposed a hybridized algorithm using chaos optimization and flower pollination over K-means.
Data sets: Iris, Wine, Breast Cancer, Glass, Balance, Dermatology, Haberman, Ecoli, Heart, Spambase, Ilpd, Leaf, Libras, Qualitative Bankruptcy, Synthetic.
Future work: Can be employed with other metaheuristics like the krill herd algorithm, spider monkey, etc.; can be enhanced for constraint-handling problems.

Xie et al. [76]
Problem identified: Initialization sensitivity; local optima trap.
Methods/findings: Proposed two variants of the firefly algorithm (FA): (i) inward intensified exploration FA (IIEFA) and (ii) compound intensified exploration FA (CIEFA); matrix-based search parameters and dispersing mechanisms are incorporated to enhance exploration and exploitation; adopted a minimum Redundancy Maximum Relevance (mRMR)-based feature selection method to reduce feature dimensionality.
Data sets: ALL-IDB2 database, a skin lesion data set, Sonar, Thyroid, Ozone, Iris, Wisconsin breast cancer diagnostic data set (Wbc1), Wine, Wisconsin breast cancer original data set (Wbc2), Balance, and E. coli.
Future work: Can be enhanced using other objective functions; can be used for other optimization problems like discriminative feature selection, evolving deep neural network generation, etc.

Huang et al. [77]
Problem identified: Convergence rate; accuracy.
Methods/findings: Proposed a hybrid memetic clustering algorithm based on PSO and GSA, named memetic particle gravitation optimization (MPGO); hybrid operation and diversity enhancement are important features of MPGO; PSO helps in the exchange of individuals, and GSA uses an enhancement operator to enhance the diversity of the system.
Data sets: Iris, Wine, Breast cancer, Car evaluation, Statlog, and Yeast; 52 benchmark functions; six images for image segmentation.
Future work: Computed tomography (CT) image enhancement using MPGO; improve computing performance by dimension or pattern reduction; can be enhanced using Lévy flight for improving diversity; MPGO can be implemented on a Raspberry Pi for automatic object tracking and text recognition.

Dinkar and Deep [78]
Problem identified: Local optima; slow convergence.
Methods/findings: Proposed opposition-based ALO using the Cauchy distribution (OB-C-ALO); adopted a random walk based on the Cauchy distribution for addressing the local optima problem; employed an opposition-based learning model along with an acceleration coefficient.
Data sets: Iris, Glass, Wine, CMC, LD, WBC, and 21 benchmark test functions.
Future work: –

Abualigah [79]
Problem identified: Exploration ability; premature convergence.
Methods/findings: Proposed a novel hybrid of the KH algorithm with the harmony search (HS) algorithm, H-KHA; adopted the distance factor from HS to improve the global search mechanism in KH.
Data sets: CMC, Iris, Vowel, Seeds, Cancer, Glass, Wine; text documents: Classic4, Reuters21578, 20Newsgroup.
Future work: Can be investigated on test functions; can be used for other clustering problems; can be hybridized with other local search methods.

Zeng et al. [81]
Problem identified: Exploration ability; premature convergence.
Methods/findings: Proposed a dynamic-neighborhood-based switching PSO (DNSPSO) algorithm for improving exploration ability; adopted a dynamic neighborhood strategy to adjust personal and global best positions to overcome the premature convergence problem; the differential evolution method is utilized for improving the diversity of PSO.
Data sets: Fourteen benchmark functions.
Future work: Can be employed in areas such as moving horizon estimation, self-organizing RBF neural networks, etc.
algorithms provide more effective and promising solutions for data clustering than traditional algorithms due to their inbuilt local and global search mechanisms, (3) hybridization of different algorithms improves the performance as compared to stand-alone algorithms, and (4) both simulation and statistical results are considered to validate the performance of clustering algorithms. Apart from these benefits, several issues are also associated with meta-heuristic algorithms, such as (1) the balance factor between local and global search, (2) stagnation in local minima, (3) sensitivity to the initial cluster selection, and (4) premature convergence due to lack of diversity in the population. These issues affect the performance of meta-heuristic algorithms, especially for solving data clustering problems. A lot of work has been reported on local minima and on effective balancing between the local and global search mechanisms. However, this work addresses the premature convergence and global best information issues of clustering algorithms. The premature convergence problem is alleviated through a decay operator, whereas global best information is incorporated through an improved solution search mechanism. This work introduces a water wave optimization (WWO) algorithm for solving the data clustering problem in an effective manner, and the capability of the WWO algorithm is improved through the decay operator and the improved solution search mechanism.

3 Water wave optimization

This section presents the basic Water Wave Optimization (WWO). WWO is a metaheuristic algorithm inspired by water wave theory for solving global optimization problems [52]. WWO considers the solution space as a seabed area where each solution represents a "wave" with a height (h) and a wavelength (λ). The fitness of each wave is measured using the seabed depth: a shorter distance to the still-water level represents higher fitness. The population of the WWO algorithm is described in terms of waves, and each wave is initialized with height hmax and wavelength λ equal to 0.5. In WWO, three operations are applied at each iteration to attain the global optimum: propagation, refraction, and breaking. In the propagation operation, a new wave (X′) is created by computing a displacement in each dimension (d) of a wave (X) and adding it to the original wave, as given in Eq. 1.

X′ = X + rand(−1, 1) × λ × Ld    (1)

where rand is a random function generating random numbers in the specified range and Ld is the length of the dth dimension of the search space. If the fitness f(X′) of the new wave (X′) is higher than the fitness f(X) of the old wave (X), the old wave (X) is replaced by the new wave (X′) and its height is reset to hmax; otherwise, the wave height is decreased by one.

Deep-water waves have low wave heights and long wavelengths, whereas shallow-water waves have high wave heights and short wavelengths. So, the wavelength decreases as a wave moves from deep water to shallow water. The wavelength (λ) of each wave is updated using Eq. 2.

λ = λ × α^(−(f(X) − fmin + ε)/(fmax − fmin + ε))    (2)

where fmax and fmin are the maximum and minimum fitness values within the current population, α is the wavelength reduction coefficient, ε is a small constant used to avoid division by zero, and f(X) is the fitness of wave X. This makes higher-fitness waves propagate within smaller ranges, i.e., with smaller wavelengths.

The refraction operator is adopted when a wave's height decreases to zero. The new wave (X′) is calculated using a Gaussian function described in terms of a mean and a standard deviation.

X′ = Gaussian(μ, σ)    (3)

In Eq. 3, μ is the mean and σ the standard deviation, computed using Eqs. 4 and 5.

μ = (Xbestd + Xd)/2    (4)

σ = |Xbestd − Xd|/2    (5)

The mean (μ) is computed from the present wave (X) and the best wave (Xbestd), and the standard deviation (σ) is half the difference between the best wave (Xbestd) and the present wave (X). Further, the wave height is reset to hmax, and the wavelength is set using Eq. 6.

λ′ = λ × f(X)/f(X′)    (6)

In Eq. 6, λ′ is the wavelength of the new wave, f(X′) is the fitness of the new wave (X′), f(X) is the fitness of the old wave (X), and λ is the previous wavelength. In WWO, the breaking operator breaks a wave (X) when it attains a better location than the current best solution (Xbest). The solitary wave (X′) is computed using Eq. 7.

X′ = X + Gaussian(0, 1) × β × Ld    (7)

where β denotes the breaking coefficient and Gaussian(0, 1) generates a normally distributed random number with mean 0 and standard deviation 1. If the wave X′ is better than X, it replaces X. The pseudocode of WWO is given in Algorithm 1.
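The propagation, wavelength-update, refraction, and breaking operators described above can be sketched in Python (an illustrative per-wave translation of Eqs. 1–7, not the authors' implementation; bounds handling and the fitness function are omitted):

```python
import random

def propagate(x, wavelength, lengths):
    # Eq. 1: X' = X + rand(-1, 1) * lambda * L_d in every dimension d
    return [xi + random.uniform(-1, 1) * wavelength * L
            for xi, L in zip(x, lengths)]

def update_wavelength(wavelength, fx, f_min, f_max, alpha=1.2, eps=1e-12):
    # Eq. 2: higher-fitness waves get shorter wavelengths
    # (alpha = 1.2 follows Table 3; epsilon avoids division by zero)
    return wavelength * alpha ** (-(fx - f_min + eps) / (f_max - f_min + eps))

def refract(x, x_best):
    # Eqs. 3-5: resample each dimension around the midpoint of X and X_best,
    # with spread equal to half their distance in that dimension
    return [random.gauss((b + xi) / 2, abs(b - xi) / 2)
            for xi, b in zip(x, x_best)]

def break_wave(x, beta, lengths):
    # Eq. 7: solitary wave generated near a newly found best position
    return [xi + random.gauss(0, 1) * beta * L
            for xi, L in zip(x, lengths)]
```

In a full loop, a wave whose propagated copy improves its fitness keeps the copy and resets its height to hmax; otherwise the height drops by one, and refraction (with the wavelength reset of Eq. 6) fires when the height reaches zero.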
4 Proposed WWO clustering algorithm

This section presents a water wave optimization-based clustering algorithm for data clustering problems and highlights the modifications incorporated in the WWO algorithm. Subsection 4.1 presents the proposed modifications in WWO, while the steps and flowchart of the WWO clustering algorithm are given in subsection 4.2.

4.1 Proposed modifications

WWO consists of three phases, i.e., propagation, refraction, and breaking. The WWO algorithm has a strong local search mechanism, but its global search mechanism is weak [62]. The literature reveals that the PSO algorithm has a strong global search mechanism [85]. Hence, to improve the global search mechanism of WWO and determine the optimal solution, an updated solution search equation inspired by PSO is designed for the propagation phase of the WWO algorithm. It is also observed that the WWO algorithm suffers from premature convergence due to the refraction phase [61]: wave heights decrease continuously and suddenly tend to zero, so the algorithm converges without obtaining the optimal solution. To overcome the premature convergence problem, a decay operator is proposed in the refraction phase of the WWO algorithm. The proposed improvements in the WWO algorithm, addressing the weak global search mechanism and premature convergence, are discussed below.

4.1.1 Decay operator

The main issue associated with the WWO algorithm is premature convergence. In the refraction phase, the height of the wave continuously decreases until it suddenly becomes zero; in turn, the algorithm converges without attaining the global best solution. Further, it is observed that the WWO algorithm does not explore the entire search space effectively due to its small step size, so the local and global search become imbalanced. Hence, to overcome the aforementioned problems, a decay operator is integrated into the location update equation of the wave in the refraction phase, which also increases the step size. The updated location search equation is given as:

X′ = N((X* + X)/2, ‖X* − X‖/2) × (1 − ρ) + ΔX    (8)

In Eq. 8, ρ denotes the decay operator, defined on ρ ∈ [0, 1], ΔX represents the difference between two consecutive waves, X represents the current wave, and X* denotes the local best wave.
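Read literally, the refraction update of Eq. 8 samples each dimension from a Gaussian centred between the current wave and the local best, damps it by (1 − ρ), and adds the momentum-like term ΔX. A minimal Python sketch (illustrative only; treating ΔX as the difference from the previous position is an assumption where the text is terse):

```python
import random

def refract_with_decay(x, x_local_best, x_prev, rho=0.5):
    # Eq. 8: X' = N((X* + X)/2, |X* - X|/2) * (1 - rho) + delta_X
    # rho in [0, 1] is the decay operator; delta_X is the difference
    # between two consecutive waves (assumed here as X - X_prev).
    assert 0.0 <= rho <= 1.0
    new = []
    for xi, bi, pi in zip(x, x_local_best, x_prev):
        mu = (bi + xi) / 2
        sigma = abs(bi - xi) / 2
        delta = xi - pi
        new.append(random.gauss(mu, sigma) * (1 - rho) + delta)
    return new
```

With ρ close to 1 the Gaussian term is suppressed and the step is driven by ΔX, which is what enlarges the step size relative to the plain refraction of Eq. 3.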
This research work applies the improved WWO for solving data clustering problems. In data clustering, a dataset consists of n data objects, (X = X1, X2, X3, …, Xn), each with d dimensions, i.e., X1 = (X1,1, X1,2, X1,3, …, X1,d). Each data object is represented as Xi,j, where Xi,j denotes the ith object in the jth dimension, so the vector representation of an object is Xi = (Xi,1, Xi,2, …, Xi,d). The objective of data clustering is to divide the dataset into K distinct clusters.

4.2 Computational steps of improved WWO algorithm

The algorithmic steps of the proposed WWO algorithm are mentioned in Algorithm 2, while the flowchart of the proposed WWO algorithm for cluster analysis is shown in Fig. 1.
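For clustering, a wave can encode the K cluster centroids as one flat vector of length K × d, and the fitness of a wave is then a clustering objective. A sketch under the common sum-of-distances (SSE-style) objective, which is an assumption here since the exact objective function is not restated in this section:

```python
import math

def clustering_fitness(wave, data, k):
    """Decode a wave into k centroids of dimension d and score it as the
    total distance of each data object to its nearest centroid
    (assumed SSE-style objective; lower is better)."""
    d = len(data[0])
    centroids = [wave[i * d:(i + 1) * d] for i in range(k)]
    total = 0.0
    for obj in data:
        total += min(math.dist(obj, c) for c in centroids)
    return total
```

Each wave in the population is one complete candidate clustering, so the propagation, refraction, and breaking operators all move entire centroid sets through the search space.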
Table 2 Description of the benchmark clustering datasets

No.  Dataset      Clusters  Instances  Attributes
1    Iris         3         150        4
2    Wine         3         178        13
3    Vowel        6         871        3
4    Balance      3         625        4
5    Glass        7         214        9
6    CMC          3         1473       9
7    Thyroid      3         215        5
8    Dermatology  6         358        34
9    BC           2         683        9
10   WDBC         2         569        30
11   LD           2         345        6
12   Heart        2         270        13
13   Diabetes     2         768        8

Table 3 Parameter setting for proposed WWO

Parameter  Value
hmax       12
α          1.2
kmax       12
β          Linearly decreases from 0.25 to 0.01
n          10
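The linearly decreasing breaking coefficient β of Table 3 can be scheduled per iteration as follows (a sketch; the exact iteration-indexing convention and the variable names are assumptions):

```python
# Fixed parameters from Table 3 (dictionary keys are hypothetical names)
PARAMS = {"h_max": 12, "alpha": 1.2, "k_max": 12, "n": 10}

def beta_at(t, t_max, beta_start=0.25, beta_end=0.01):
    # Breaking coefficient beta, linearly decreasing from 0.25 at
    # iteration 0 to 0.01 at the final iteration t_max (Table 3)
    return beta_start - (beta_start - beta_end) * t / t_max
```

A shrinking β narrows the solitary waves of Eq. 7 over time, shifting the breaking operator from exploration toward fine-grained exploitation.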
5 Experimental results and discussion

This section presents the experimental results of the proposed WWO clustering algorithm. The performance of the proposed algorithm is assessed over thirteen benchmark clustering datasets. These datasets were extracted from the UCI repository. Table 2 presents descriptions of these datasets. The proposed algorithm has been implemented on a Windows operating system in MATLAB 2010a on a Core i5 processor with 4 GB of RAM. Table 3 gives the parameter settings for the proposed WWO.
The parameter values for the compared algorithms are taken to be the same as reported in the corresponding literature. Further, the accuracy and F-score parameters are employed to evaluate the performance of the proposed algorithm. The experimental results are compared with various existing meta-heuristic clustering algorithms. Results are taken as an average of thirty independent runs.

5.1 Performance measures

This subsection describes the performance measures used to evaluate the proposed WWO clustering algorithm: accuracy and F-score.

Accuracy It determines the correctness of an algorithm as compared to the true class labels: the true label of an object "i" assigned to cluster "c" is matched with the cluster label using the map function. Clustering results are accurate when a high value of accuracy is obtained.

Accuracy = (1/n) ∑_{i=1}^{n} δ(Truelabel_i, map(c_i))    (11)

F-Score It is computed as the harmonic mean of precision and recall for testing the accuracy of the algorithm. Precision is the number of true positives divided by the number of all positive results (true positives plus false positives), while recall is the number of true positives divided by the number of all relevant results (true positives plus false negatives).
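Both measures can be sketched in Python; the `map` function of Eq. 11 is realized here as a majority-vote mapping from each cluster to its most frequent true label, which is one common choice and an assumption rather than the paper's stated construction:

```python
from collections import Counter

def clustering_accuracy(true_labels, cluster_ids):
    # Map each cluster to the majority true label inside it (the "map"
    # of Eq. 11), then count how many objects match their mapped label.
    mapping = {}
    for c in set(cluster_ids):
        members = [t for t, ci in zip(true_labels, cluster_ids) if ci == c]
        mapping[c] = Counter(members).most_common(1)[0][0]
    hits = sum(1 for t, c in zip(true_labels, cluster_ids) if mapping[c] == t)
    return hits / len(true_labels)

def f_score(tp, fp, fn):
    # Harmonic mean of precision (tp / (tp + fp)) and recall (tp / (tp + fn))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

An optimal one-to-one assignment (e.g., the Hungarian algorithm) is a stricter alternative for the cluster-to-label mapping when two clusters share the same majority label.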
Fig. 3 a–e Demonstrate clustering of data objects using WWO based clustering algorithm for Iris, Wine, Vowel, Balance, and Glass datasets
Table 8 Average ranking of proposed WWO and benchmark clustering algorithms using the accuracy parameter

FCM   PSO   K-means  GA    WWO   Proposed WWO
3.92  3.69  5.23     4.08  2.92  1.15

Table 11 Average ranking of proposed WWO and hybrid variants of clustering algorithms using the accuracy parameter

Fuzzy-PSO  KFCM  Fuzzy-MOC  PSO-GA  Proposed WWO
3.75       4.42  2.92       2.67    1.25

Table 9 Results of the Friedman test on benchmark clustering algorithms using the accuracy parameter

Method         Statistical value  p-Value   Degrees of freedom  Critical value  Hypothesis
Friedman test  25.625455          1.62E−06  5                   11.070504       Rejected

Wine, Vowel, and Glass datasets than other clustering algorithms. For the rest of the datasets, the proposed WWO algorithm does not obtain the most accurate results, but it is still competitive as compared to most clustering algorithms. The PSO-GA algorithm achieves a higher F-measure rate (0.837) than the other clustering algorithms for the thyroid dataset, but the proposed algorithm obtains the second-highest F-measure rate (0.812). For the iris and balance datasets, the Fuzzy-PSO and Fuzzy-MOC clustering algorithms obtain better F-measure rates (0.795 and 0.749, respectively), while the proposed WWO algorithm achieves 0.793 and 0.744. Hence, the proposed algorithm is also competitive for such datasets. It can be stated that the proposed WWO clustering algorithm is a robust, viable, and efficient technique for analyzing benchmark clustering datasets.

Figure 2a–h illustrates the categorization of healthcare data using the proposed WWO clustering algorithm. Figure 2a considers the diabetes dataset: the proposed clustering algorithm groups the data into two clusters, (i) diabetes and (ii) non-diabetes. Figures 2b, c illustrate the liver and heart disease datasets in various cluster groups. It is observed that the proposed algorithm is capable of determining the clusters of healthy and non-healthy patients. Figure 2d demonstrates the thyroid disease dataset. The proposed clustering algorithm significantly divides the thyroid disease dataset into three clusters, i.e., normal, hyperthyroidism, and hypothyroidism. Figure 2e illustrates the dermatology disease data, and it is observed that the proposed clustering algorithm divides it into six different clusters: (1) psoriasis, (2) seborrheic dermatitis, (3) lichen planus, (4) pityriasis rosea, (5) chronic dermatitis, and (6) pityriasis rubra pilaris. It is observed from the results that the proposed clustering algorithm separates the dermatology data into the six clusters well. Figure 2f shows the CMC data. It is observed that the proposed clustering algorithm determines all three clusters in the CMC dataset, i.e., (1) no use, (2) long term, and (3) short term. Figures 2g, h illustrate the cancer and breast cancer datasets. It is observed that the proposed WWO clustering algorithm effectively analyzes both datasets. Hence, it can be stated that the proposed WWO clustering algorithm is efficient for analyzing healthcare datasets.

Figure 3a–e demonstrates the clustering of data objects using the WWO based clustering algorithm on non-healthcare datasets. Figure 3a considers the iris dataset; the proposed WWO algorithm groups the data into three clusters: (1) setosa, (2) versicolour, and (3) virginica. Figure 3b, c depict the clustering of the wine and balance datasets. The proposed algorithm categorizes the wine dataset into three clusters, i.e., (1) A, (2) B, and (3) C. Data objects of the balance dataset are also divided into three clusters: B, L, and R. Figure 3d demonstrates the clustering results of the vowel dataset. The proposed WWO algorithm divides the data objects into six clusters, i.e., (1) /i/, (2) /e/, (3) /δ/, (4) /a/, (5) /o/, and (6) /u/. It is also observed that the proposed algorithm separates the data objects of the vowel dataset effectively. Figure 3e illustrates the clustering of the glass dataset. The proposed algorithm divides the data objects into seven different clusters. It is also observed that one cluster is linearly separable from the other six, whereas the rest are non-linearly separable. It is stated that the proposed algorithm effectively performs the clustering of data objects into different
Table 10 Results of post hoc test on benchmark clustering algorithms using the accuracy parameter

Techniques    FCM  PSO  K-means  GA  WWO  Proposed WWO
FCM           NA   =    +        =   =    +
PSO           =    NA   +        =   =    +
K-means       +    +    NA       +   +    +
GA            =    =    +        NA  =    +
WWO           =    =    +        =   NA   +
Proposed WWO  +    +    +        +   +    NA
clusters for non-healthcare datasets. Hence, it can be said that the proposed WWO based clustering algorithm is an effective algorithm for the clustering of data objects.

5.3 Statistical results

This subsection discusses the statistical analysis to validate the performance of the proposed WWO based clustering algorithm on the various benchmark datasets. Table 8 shows the average ranking of the proposed WWO based clustering algorithm and the other benchmark clustering algorithms in terms of the accuracy parameter. It is seen that the proposed WWO achieves the first rank, 1.15, among the clustering algorithms, while K-means achieves the lowest rank in the comparison, with a value of 5.23.

Table 9 presents the statistics of the Friedman test using the accuracy parameter on the benchmark clustering algorithms. The statistical value and critical value of the Friedman test are 25.625455 and 11.070504 respectively, with 5 degrees of freedom, and the p-value is 1.62E−06. Hence, the null hypothesis (H0) is rejected at the confidence level of 0.05: a significant difference occurs between the performance of the proposed WWO based clustering algorithm and the rest of the clustering algorithms.

Further, a post hoc test is conducted to find out where the differences actually come from. Table 10 shows the results of the post hoc test on the benchmark clustering algorithms using the accuracy parameter; the + symbol represents significantly different, = represents similar, and NA is not applicable. It is seen from the results that FCM, PSO, GA, and WWO yield similar results with respect to each other, while there is a significant difference in the results for K-means and the proposed WWO with respect to FCM, PSO, GA, and WWO. Thus, it is stated that the proposed WWO outperforms the state-of-the-art clustering algorithms. The clustering algorithms are divided into five groups on the basis of the post hoc test results: (FCM, PSO, GA and WWO), (K-means), (FCM, PSO, GA), (FCM, PSO, WWO), and (Proposed WWO). Algorithms within the same group have similar performance. It is stated that the proposed WWO algorithm is statistically different from the other algorithms.

Table 11 shows the average ranking of the proposed WWO based algorithm and the hybrid variants of clustering algorithms using the accuracy parameter. It is seen that the proposed WWO achieves the first rank, 1.25, among these clustering algorithms, while KFCM achieves the lowest rank among the hybrid variants, with a value of 4.42.

Table 12 presents the statistics of the Friedman test using the accuracy parameter on the hybrid variants of clustering algorithms. The statistical value and critical value of the Friedman test are 31.76834 and 9.487729 respectively, with 4 degrees of freedom, and the p-value is 2.13E−06. Hence, the null hypothesis (H0) is rejected at the confidence level of 0.05.

A significant difference occurs between the performance of the proposed WWO based clustering algorithm and the hybrid variants of clustering algorithms. Further, a post hoc test is conducted. Table 13 shows the results of the post hoc test on the hybrid clustering algorithms using the accuracy parameter. It is seen that Fuzzy-PSO gives performance similar to KFCM for accuracy, but Fuzzy-MOC, PSO-GA, and the proposed WWO give significantly different results; Fuzzy-MOC gives results similar to PSO-GA.

Table 12 Results of the Friedman test on hybrid clustering algorithms using the accuracy parameter

Method         Statistical value  p-Value   Degrees of freedom  Critical value  Hypothesis
Friedman test  31.76834           2.13E−06  4                   9.487729        Rejected

Table 13 Results of post hoc test on hybrid variants of clustering algorithms using the accuracy parameter

Techniques    Fuzzy-PSO  KFCM  Fuzzy-MOC  PSO-GA  Proposed WWO
Fuzzy-PSO     NA         =     +          +       +
KFCM          =          NA    +          +       +
Fuzzy-MOC     +          +     NA         =       +
PSO-GA        +          +     =          NA      +
Proposed WWO  +          +     +          +       NA

Table 14 Average ranking of proposed WWO and benchmark clustering algorithms using the F-score parameter

FCM   PSO   K-means  GA    WWO   Proposed WWO
4.12  3.38  4.35     5.15  2.77  1.23

Table 15 Results of the Friedman test on benchmark clustering algorithms using the F-score parameter

Method         Statistical value  p-Value   Degrees of freedom  Critical value  Hypothesis
Friedman test  35.462555          1.62E−06  5                   11.070504       Rejected
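The Friedman statistics reported in Tables 9 and 12 are computed from per-dataset ranks of the competing algorithms. A minimal implementation of the rank-based statistic (without the tie correction that the paper's exact values may include) is:

```python
def friedman_statistic(scores):
    """scores: one row per dataset, one column per algorithm
    (higher = better). Returns the Friedman chi-square statistic."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        # rank 1 = best; tied values receive their average rank
        order = sorted(range(k), key=lambda j: -row[j])
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # average of 1-based positions i..j
            for m in range(i, j + 1):
                ranks[order[m]] = avg
            i = j + 1
        for j in range(k):
            rank_sums[j] += ranks[j]
    # chi^2_F = 12 / (n k (k+1)) * sum(R_j^2) - 3 n (k+1)
    return (12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
            - 3.0 * n * (k + 1))
```

Comparing the resulting statistic with the chi-square critical value at k − 1 degrees of freedom (11.070504 for k = 6 at α = 0.05) reproduces the reject/accept decisions recorded in the tables.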
Table 16 Results of post hoc test on benchmark clustering algorithms using the F-score parameter

Techniques    FCM  PSO  K-means  GA  WWO  Proposed WWO
FCM           NA   =    =        +   +    +
PSO           =    NA   =        +   =    +
K-means       =    =    NA       =   +    +
GA            +    +    =        NA  +    +
WWO           +    =    +        +   NA   +
Proposed WWO  +    +    +        +   +    NA
It is observed that the proposed WWO clustering algorithm performs effectively with respect to the other hybrid variants of clustering algorithms. Further, the algorithms are grouped into three different categories as per the post hoc test results: (Fuzzy-PSO, KFCM), (Fuzzy-MOC, PSO-GA), and (Proposed WWO). The algorithms placed in the same category exhibit similar performance.

The results of the Friedman statistical test using the F-score parameter are reported in Tables 14, 15 and 16. Table 14 depicts the average ranking of the proposed WWO based clustering algorithm and the other clustering algorithms. It is seen that the proposed algorithm obtains the first rank, i.e., 1.23. It is also noted that the GA algorithm achieves the lowest rank, i.e., 5.15, among the other algorithms in the comparison. Table 15 presents the statistics of the Friedman test. The statistical value of the Friedman test is 35.462555 and the critical value is 11.070504, with 5 degrees of freedom, whereas the p-value is 1.62E−06. Hence, the null hypothesis (H0) is rejected at the confidence level of 0.05. It is stated that the performance of the proposed WWO algorithm significantly differs from the clustering algorithms in the comparison.

Further, a post hoc test is conducted using the F-score parameter. Table 16 shows the results of the post hoc test on the benchmark clustering algorithms. It is observed that PSO, FCM, K-means, and GA yield similar results. For most of the cases, GA, WWO, and the proposed WWO give significantly different results (denoted by the + symbol) for the F-score parameter. It is observed that the proposed WWO algorithm performs significantly better as compared to the benchmark clustering algorithms. As per the results of the post hoc test, the algorithms are divided into groups such as (FCM, PSO, K-means), (FCM, PSO, K-means, WWO), (FCM, PSO, K-means, GA), (K-means, GA, PSO, WWO), and (Proposed WWO). The proposed WWO algorithm is not grouped with other algorithms. Hence, it can be concluded that the proposed WWO algorithm exhibits dissimilar performance.

Table 17 shows the average ranking of the proposed WWO and the hybrid variants of clustering algorithms. It is seen that the proposed algorithm obtains the first rank, i.e., 1.46. It is also noted that KFCM achieves the lowest rank, i.e., 4.23, in contrast to the other algorithms. Table 18 presents the statistics of the Friedman test. The statistical value of the Friedman test is 25.930502 and the critical value is 9.487729, with 4 degrees of freedom, whereas the p-value is 3.27E−05. The null hypothesis (H0) is therefore rejected at the confidence level of 0.05.

Similarly, a post hoc test is conducted using the F-score parameter for the hybrid variants of clustering algorithms. Table 19 shows the results of the post hoc test on the hybrid variants of clustering algorithms. It is observed that Fuzzy-PSO and KFCM, Fuzzy-PSO and PSO-GA, and Fuzzy-MOC and PSO-GA

Table 17 Average ranking of proposed WWO and hybrid variants of clustering algorithms using the F-score parameter

Fuzzy-PSO  KFCM  Fuzzy-MOC  PSO-GA  Proposed WWO
3.85       4.23  2.54       3.00    1.46

Table 18 Results of the Friedman test on hybrid clustering algorithms using the F-score parameter

Method         Statistical value  p-Value   Degrees of freedom  Critical value  Hypothesis
Friedman test  25.930502          3.27E−05  4                   9.487729        Rejected

Table 19 Results of post hoc test on hybrid variants of clustering algorithms using the F-score parameter

Techniques    Fuzzy-PSO  KFCM  Fuzzy-MOC  PSO-GA  Proposed WWO
Fuzzy-PSO     NA         =     +          =       +
KFCM          =          NA    +          +       +
Fuzzy-MOC     +          +     NA         =       +
PSO-GA        =          +     =          NA      +
Proposed WWO  +          +     +          +       NA
yield similar results. It is seen that Fuzzy-MOC and the proposed WWO perform significantly differently in contrast to Fuzzy-PSO. Also, Fuzzy-MOC, PSO-GA and the proposed WWO give a significant difference compared to KFCM. With respect to PSO-GA, KFCM and the proposed WWO give a significant difference among the results, and the algorithms are grouped into five different categories such as (Fuzzy-PSO, KFCM, PSO-GA), (Fuzzy-PSO, KFCM), (Fuzzy-MOC, PSO-GA), (Fuzzy-PSO, Fuzzy-MOC, PSO-GA) and (Proposed WWO). Hence, the test reveals that the proposed WWO performs significantly better in contrast to the other hybrid variants of clustering algorithm for the F-score parameter.
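The Friedman figures quoted above can be cross-checked in a few lines of Python. The sketch below is illustrative and not taken from the paper: it implements the standard Friedman chi-square formula over per-dataset ranks and, since the test has 4 degrees of freedom, uses the closed-form chi-square tail for even degrees of freedom to reproduce the reported p-value (3.27E−05) and the 0.05 critical value (9.487729); the function names are hypothetical.

```python
import math

def chi2_sf_df4(x):
    # Chi-square survival function for 4 degrees of freedom. For even df = 2m,
    # S(x) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!; here m = 2.
    return math.exp(-x / 2) * (1 + x / 2)

def friedman_statistic(ranks, k):
    # ranks: one list of k algorithm ranks per dataset. Returns the Friedman
    # chi-square: 12 / (N*k*(k+1)) * sum(R_j^2) - 3*N*(k+1), R_j = rank sums.
    n = len(ranks)
    rank_sums = [sum(r[j] for r in ranks) for j in range(k)]
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)

# Cross-check the reported values: statistic 25.930502 with df = 4.
print(chi2_sf_df4(25.930502))  # ≈ 3.27e-05 (the reported p-value)
print(chi2_sf_df4(9.487729))   # ≈ 0.05 (so 9.487729 is the 0.05 critical value)
```

Since the p-value falls well below 0.05 (equivalently, the statistic exceeds the critical value), rejecting H0 at the 0.05 level follows directly.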
6 Conclusion

In this work, the WWO based clustering algorithm is proposed for solving data clustering problems. Prior to adoption, two improvements are incorporated in the WWO algorithm for generating more promising and efficient clustering results. These improvements are described in terms of an updated global search mechanism and a decay operator. The objective of the decay operator is to address the premature convergence issue of the WWO algorithm. Further, an updated global search mechanism based on the global best concept of the PSO algorithm is incorporated in WWO to improve the accuracy rate as well as guide the search towards the global optimum solution.
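The two improvements described above can be pictured with a minimal sketch. The exact update rule of the paper is not reproduced in this section, so everything below is an assumption: a PSO-style global-best attraction term added to the standard WWO propagation step, and a simple multiplicative decay on the wavelength to slow convergence; the function name, pull coefficient, and decay factor are all illustrative.

```python
import random

def propagate(x, gbest, wavelength, bounds, decay=0.99):
    """Illustrative gbest-guided WWO propagation step (assumed form).

    x, gbest   : current wave and global-best positions (lists of floats)
    wavelength : current wavelength of the wave
    bounds     : list of (low, high) limits per dimension
    decay      : hypothetical multiplicative decay applied to the wavelength
    """
    new_x = []
    for d, (lo, hi) in enumerate(bounds):
        step = random.uniform(-1.0, 1.0) * wavelength * (hi - lo)  # standard WWO move
        pull = random.random() * (gbest[d] - x[d])                 # gbest attraction (assumed)
        new_x.append(min(max(x[d] + step + pull, lo), hi))         # clamp to the search range
    return new_x, wavelength * decay  # shrinking wavelength counters premature convergence

random.seed(1)
pos, wl = propagate([0.5, 0.5], [0.2, 0.8], 0.5, [(0.0, 1.0), (0.0, 1.0)])
```

The gbest term biases each wave towards the best solution found so far, which is the information component plain WWO lacks, while the decaying wavelength gradually narrows the random step.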
The performance of the proposed WWO algorithm is assessed over thirteen benchmark datasets on the basis of the accuracy and F-score parameters. The simulation results of the proposed WWO based clustering algorithm are compared with state-of-the-art clustering algorithms from the literature. It is observed that the proposed clustering algorithm achieves higher accuracy and F-score rates in contrast to the other existing clustering algorithms reported in the literature. It is concluded that the proposed WWO clustering algorithm obtains better clustering results for most of the clustering datasets. Hence, it can be stated that the proposed WWO based clustering algorithm is a promising and efficient clustering algorithm for analyzing data. In future, the proposed WWO algorithm can be enhanced using a neighborhood method. Moreover, the applicability of the WWO algorithm will be evaluated on feature selection, parameter optimization and multi-objective optimization problems.
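The accuracy parameter used throughout the evaluation can be computed in the way commonly adopted for clustering: each cluster is mapped to the majority true label among its members, and the fraction of consistently labelled points is reported. The helper below is a minimal sketch of that convention, not code from the paper; the function name and the majority-mapping rule are illustrative (F-score is obtained analogously from per-class precision and recall after the same mapping).

```python
from collections import Counter

def clustering_accuracy(true_labels, cluster_ids):
    """Fraction of points whose cluster's majority true label matches their own."""
    correct = 0
    for c in set(cluster_ids):
        # true labels of the points assigned to cluster c
        members = [t for t, k in zip(true_labels, cluster_ids) if k == c]
        correct += Counter(members).most_common(1)[0][1]  # majority-label hits
    return correct / len(true_labels)

print(clustering_accuracy([0, 0, 0, 1, 1, 1], [1, 1, 2, 2, 2, 2]))  # ≈ 0.833 (5 of 6 points)
```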