K-Means Clustering Algorithm Applications in Data Mining and Pattern Recognition
Abstract: Clustering is the process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters, which
helps users understand the natural grouping or structure in a data set. Clustering has wide applications in economic science (especially
market research), document classification, pattern recognition, spatial data analysis, and image processing. This paper focuses on
clustering in data mining and image processing. The k-means algorithm is the clustering algorithm studied in this work. The paper includes
the algorithm and its implementation, and how to use it in data mining applications and in pattern recognition.
Input:
k: the number of clusters,
D: a data set containing n objects.
Output: A set of k clusters.
Method:
1) Arbitrarily choose k objects from D as the initial cluster centers;
2) Repeat
3) (Re)assign each object to the cluster to which the object is the most similar, based on the mean value of the objects in the cluster;
4) Update the cluster means, i.e., calculate the mean value of the objects for each cluster;
5) Until no change;

Figure 1: Flow chart of k-means algorithm

3.2 k-means example

The figures below show the steps of implementing the k-means algorithm in detail.
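
The Method steps 1) to 5) listed above can also be sketched in code. The following is a minimal Python illustration, not the implementation used in this paper. The function names, the use of squared Euclidean distance as the similarity measure, the max_iter safeguard, and the fixed random seed are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(data, k, max_iter=100, seed=0):
    """Partition `data` (a list of numeric tuples) into k clusters.

    Returns (labels, centers): the cluster index of each object and the
    final cluster means. `max_iter` and `seed` are assumptions added so
    the sketch always terminates and is reproducible.
    """
    rng = random.Random(seed)
    # Step 1: arbitrarily choose k objects from D as the initial centers.
    centers = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):  # Step 2: repeat
        # Step 3: (re)assign each object to the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in data
        ]
        # Step 5: stop when no assignment has changed.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: update each cluster mean from its current members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Hypothetical data set D with two visibly separated groups of objects.
D = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = k_means(D, k=2)
print(labels, centers)
```

On this small hypothetical data set the first three objects end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the arbitrary initial choice in step 1.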
After the k-means algorithm was run on that data set, the result was that K-Means is a more efficient algorithm for mining large
databases, and that cloud computing provides a solution for storing large databases at lower cost.

4.2 Using K-Means Algorithm for Efficient Customer Segmentation [9]

Data mining is the process of extracting meaningful information from a data set and presenting it in a human-understandable format for
the purpose of decision support. Data mining techniques intersect areas such as statistics, artificial intelligence, machine learning,
and database systems. The applications of data mining include, but are not limited to, bioinformatics, weather forecasting, fraud
detection, financial analysis, and customer segmentation. Customer segmentation is the subdivision of a business's customer base into
groups called customer segments, such that each customer segment consists of customers who share similar market characteristics. This
segmentation is based on factors that can directly or indirectly influence the market or business, such as product preferences or
expectations, locations, behaviors, and so on. The importance of customer segmentation includes, inter alia, the ability of a business
to customize market programs that will be suitable for each of its customer segments, and business decision support in risky situations
such as credit relationships with its customers.
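
Because k-means groups customers by distance between their feature values, a factor measured in large units (for example, spending) can dominate one measured in small units (for example, number of visits). A common preparation step, which is not described in this paper and is shown here only as an assumption, is to rescale every factor to the range [0, 1] before clustering. The function name min_max_scale and the customer records below are hypothetical.

```python
def min_max_scale(records):
    """Rescale each feature column of `records` to the range [0, 1]."""
    columns = list(zip(*records))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for rec in records:
        scaled.append(tuple(
            # A constant column carries no information; map it to 0.0.
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(rec, lows, highs)
        ))
    return scaled


# Hypothetical customers: (annual spend, store visits per year).
customers = [(200.0, 4), (250.0, 6), (5000.0, 5), (5200.0, 40)]
scaled_customers = min_max_scale(customers)
print(scaled_customers)
```

The scaled records can then be passed to any k-means routine, so that both factors contribute comparably when customers are assigned to segments.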
Figure 5: K-Means clustering with four clusters for an apple infected with apple scab disease: (a) the infected fruit image,
(b) first cluster, (c) second cluster, (d) third cluster, (e) fourth cluster, and (f) a single gray-scale image colored by
cluster index.
Figure 6: Defect segmentation results of apples (a) Images before segmentation, (b) Images after segmentation.
Among the steps taken for the purpose of leukemia detection,
segmentation of blood cells is a vital one. Several methods
can be used for this step. Here the k-means algorithm is
used, and the results show that segmentation based on
K-means clustering gives good results.
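
As an illustration of segmentation by k-means, the sketch below clusters the pixel intensities of a small gray-scale image and returns an image of cluster indices, similar in spirit to a result colored by cluster index. It is a simplified assumption, not the method used in the cited studies: it works on a single intensity channel, it starts from evenly spaced intensities instead of an arbitrary choice, and the function name and the tiny image are hypothetical.

```python
def segment_gray_image(image, k, max_iter=50):
    """Cluster the pixel intensities of a 2-D gray-scale image into k groups.

    `image` is a list of equal-length rows of intensities. Returns a label
    image of the same shape and the list of cluster mean intensities.
    """
    pixels = [v for row in image for v in row]
    lo, hi = min(pixels), max(pixels)
    # Assumption: initial means are k evenly spaced intensities in [lo, hi].
    means = [lo + (hi - lo) * (2 * j + 1) / (2 * k) for j in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign every pixel to the cluster with the closest mean intensity.
        new_labels = [min(range(k), key=lambda j: abs(v - means[j])) for v in pixels]
        if new_labels == labels:  # no change: converged
            break
        labels = new_labels
        # Recompute each cluster mean from the pixels assigned to it.
        for j in range(k):
            members = [v for v, lab in zip(pixels, labels) if lab == j]
            if members:
                means[j] = sum(members) / len(members)
    # Reshape the flat label list back into rows.
    width = len(image[0])
    label_image = [labels[r * width:(r + 1) * width] for r in range(len(image))]
    return label_image, means


# Hypothetical 3x3 image: dark background with a bright region.
image = [[10, 12, 200],
         [11, 205, 198],
         [9, 13, 202]]
label_image, means = segment_gray_image(image, k=2)
print(label_image)  # [[0, 0, 1], [0, 1, 1], [0, 0, 1]]
print(means)        # [11.0, 201.25]
```

Each entry of the label image is the cluster index of the corresponding pixel, so displaying it with one color per index yields the segmented view.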
References
[1] John A. Hartigan, Clustering Algorithms, John Wiley &
Sons, New York, London, Sydney, Toronto, 1975.
[2] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R.
Uthurusamy, Advances in Knowledge Discovery and
Data Mining, AAAI Press/The MIT Press, pp. 573-592,
1996.
[3] M. Jianliang, S. Haikun and B. Ling, "The Application
on Intrusion Detection Based on K-Means Cluster
Algorithm", IEEE International Conference on
Information Technology and Applications, 2009.
[4] Lei Li, De-Zhang Yang, Fang-Cheng Shen, A Novel
Rule-Based Intrusion Detection System Using Data
Mining, 978-1-4244-5539, IEEE, 2010.
[5] Alan Jose, S. Ravi and M. Sambath, Brain Tumor
Segmentation Using K-Means Clustering and Fuzzy
C-Means Algorithm and Its Area Calculation,
International Journal of Innovative Research in
Computer and Communication Engineering, Vol. 2,
Issue 2, March 2014.