Clustering in Data Mining

Clustering in data mining is a technique that groups similar data points together based on
their features and characteristics.

It can also be described as the process of grouping a set of objects so that objects in the
same group (called a cluster) are more similar to each other than to those in other groups
(clusters).

It is an unsupervised learning technique that aims to identify similarities and patterns in a
dataset.

Clustering algorithms typically require choosing the number of clusters, a similarity (or
distance) measure, and a clustering method.

These algorithms aim to group data points in a way that maximizes similarity within each
group and minimizes similarity between different groups.

A cluster can have the following properties -

 The data points within a cluster are similar to each other based on some pre-defined
criteria or similarity measures.
 The clusters are distinct from each other, and the data points in one cluster are
different from those in another cluster.
 The data points within a cluster are closely packed together.
 A cluster is often represented by a centroid or a center point that summarizes the
properties of the data points within the cluster.
 A cluster can have any number of data points, but a good cluster should not be too
small or too large.

Requirements of clustering in data mining:

Clustering is a critical technique in the data mining process, and a useful clustering
algorithm should meet the following requirements -
 Scalability
Clustering algorithms in data mining can handle large datasets efficiently, making it possible
to extract useful insights and knowledge from massive amounts of data.
 High Dimensionality
Clustering algorithms in data mining can efficiently handle high-dimensional datasets,
making it possible to find patterns and relationships that may not be apparent in lower
dimensions.
 Discovery of Clusters with Arbitrary Shape
Clustering algorithms in data mining can discover clusters that have different shapes and
sizes, making it possible to identify groups of data points that share common properties or
features.
 Interpretability
Clustering results can be easily interpreted by humans, making it possible to extract useful
insights and knowledge from the data.
 Ability to Deal with Different Kinds of Data
Clustering algorithms in data mining can handle different types of data, such as categorical,
numerical, and binary, making it possible to cluster a wide range of data types.

Clustering Methods
Clustering methods can be classified into the following categories −

 Partitioning Method
 Hierarchical Method
 Density-based Method
 Grid-Based Method
 Model-Based Method
 Constraint-based Method

Partitioning Method: This method makes partitions on the data in order to form clusters. If
"n" partitions are made on "p" objects of the database, then each partition is represented by
a cluster, where n ≤ p. Two conditions must be satisfied by this partitioning clustering
method:
 Each object must belong to exactly one group.
 Each group must contain at least one object.
The partitioning method includes a technique called iterative relocation, in which objects
are moved from one group to another to improve the partitioning.
Hierarchical Method: In this method, a hierarchical decomposition of the given set of data
objects is created. Hierarchical methods are classified on the basis of how the hierarchical
decomposition is formed. There are two approaches for creating the hierarchical
decomposition:
 Agglomerative Approach: The agglomerative approach is also known as the bottom-up
approach. Initially, each object forms its own separate group. The method then keeps
merging the objects or groups that are close to one another, i.e., that exhibit similar
properties. This merging process continues until the termination condition holds.
 Divisive Approach: The divisive approach is also known as the top-down approach. In
this approach, we start with all the data objects in the same cluster. The cluster is then
divided into smaller clusters by continuous iteration. The iteration continues until the
termination condition is met or until each cluster contains one object.

Density-Based Method: The density-based method mainly focuses on density. In this
method, a given cluster keeps growing as long as the density in its neighbourhood exceeds
some threshold, i.e., for each data point within a given cluster, the neighbourhood of a given
radius has to contain at least a minimum number of points.
Grid-Based Method: In the grid-based method, the object space is quantized into a finite
number of cells that form a grid structure. The major advantage of this method is its fast
processing time, which depends only on the number of cells in each dimension of the
quantized space, not on the number of data objects.
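To make the quantization idea concrete, here is a minimal sketch in Python; the 10x10 grid, the density threshold of 10 points, and the random toy data are all illustrative assumptions, and real grid-based algorithms (such as STING or CLIQUE) additionally store statistics per cell and merge dense neighbouring cells.

import numpy as np

# Toy 2-D dataset; any numeric object space would work the same way.
rng = np.random.default_rng(0)
points = rng.random((500, 2))

# Quantize the object space into a finite number of cells (a 10x10 grid).
counts, xedges, yedges = np.histogram2d(points[:, 0], points[:, 1], bins=10)

# Cells whose point count exceeds a density threshold are cluster candidates.
dense_cells = np.argwhere(counts >= 10)
print(len(dense_cells), "dense cells out of", counts.size)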

Model-Based Method: In the model-based method, a model is hypothesized for each of the
clusters in order to find the data that best fit that model. Clusters are located using a
density function, which reflects the spatial distribution of the data points. This method also
provides a way to automatically determine the number of clusters based on standard
statistics, taking outliers or noise into account, and therefore yields robust clustering
methods.
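As a hedged sketch (not from the original text), model-based clustering can be illustrated with a Gaussian mixture model in scikit-learn, where the BIC serves as one "standard statistic" for automatically choosing the number of clusters; the toy data and the candidate range 1-5 are assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(200, 2)                         # toy 2-D data
bics = []
for k in range(1, 6):                              # candidate numbers of clusters
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics.append(gm.bic(X))                         # lower BIC = better model fit
best_k = int(np.argmin(bics)) + 1                  # pick k with the lowest BIC
labels = GaussianMixture(n_components=best_k, random_state=0).fit_predict(X)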
Constraint-Based Method: Constraint-based clustering is performed by incorporating
application- or user-oriented constraints. A constraint expresses the user's expectations or
the properties of the desired clustering results. Constraints, which can be specified by the
user or by the application requirements, provide an interactive way of communicating with
the clustering process.

Applications Of Cluster Analysis:

 It is widely used in image processing, data analysis, and pattern
recognition.
 It helps marketers find distinct groups in their customer base and characterize those
groups by their purchasing patterns.
 It can be used in the field of biology, by deriving animal and plant
taxonomies and identifying genes with the same capabilities.
 It also helps in information discovery by classifying documents on the
web.

Advantages of Cluster Analysis:

1. It can help identify patterns and relationships within a dataset that may not be
immediately obvious.
2. It can be used for exploratory data analysis and can help with feature selection.
3. It can be used to reduce the dimensionality of the data.
4. It can be used for anomaly detection and outlier identification.
5. It can be used for market segmentation and customer profiling.

Disadvantages of Cluster Analysis:

1. It can be sensitive to the choice of initial conditions and the number of clusters.
2. It can be sensitive to the presence of noise or outliers in the data.
3. It can be difficult to interpret the results of the analysis if the clusters are not
well-defined.
4. It can be computationally expensive for large datasets.
5. The results of the analysis can be affected by the choice of clustering algorithm used.
6. It is important to note that the success of cluster analysis depends on the data, the
goals of the analysis, and the ability of the analyst to interpret the results.

Partitioning Method
 K-means Clustering
K-means clustering is a partitioning method that divides the data points into k
clusters, where k is a pre-defined number. It works by iteratively moving the centroid
of each cluster to the mean of the data points assigned to it until convergence. K-
means aims to minimize the sum of squared distances between each data point and its
assigned cluster centroid.

K-Means Clustering Algorithm-

The K-Means clustering algorithm involves the following steps:

Step 1: Choose the number of clusters, K.

Step 2: Randomly select K data points as cluster centers.

Step 3: Using the Euclidean distance formula, measure the distance between each data
point and each cluster center.

Step 4: Assign each data point to the cluster whose center is nearest to it.

Step 5: Re-compute the centers of the newly formed clusters. The center of a cluster is
computed by taking the mean of all the data points contained in that cluster.

Step 6: Keep repeating Steps 3 to 5 until any of the following stopping criteria is met -

 The data points remain in the same clusters

 The maximum number of iterations is reached

 The centers of the newly formed clusters do not change
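The steps above can be sketched in a few lines of Python with NumPy. This is a minimal illustration rather than a production implementation: the function name kmeans and its arguments are made up for this example, and it assumes Euclidean distance and that no cluster ever becomes empty.

import numpy as np

def kmeans(points, centers, max_iter=100):
    """Run K-means from the given initial centers; return (centers, labels)."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    for _ in range(max_iter):                      # Step 6: repeat Steps 3-5
        # Step 3: Euclidean distance from every point to every center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        # Step 4: assign each point to its nearest center
        labels = dists.argmin(axis=1)
        # Step 5: each new center is the mean of the points assigned to it
        new_centers = np.array([points[labels == k].mean(axis=0)
                                for k in range(len(centers))])
        if np.allclose(new_centers, centers):      # stop: centers unchanged
            break
        centers = new_centers
    return centers, labels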

Figure – K-means clustering flowchart
Example
Let's consider the data points P1(1,3), P2(2,2), P3(5,8), P4(8,5), P5(3,9), P6(10,7),
P7(3,3), P8(9,4), P9(3,7).

We take K = 3 and assume that the initial cluster centers are P7(3,3), P9(3,7), and P8(9,4),
labelled C1, C2, and C3. We will find the new centroids after 2 iterations for the above data
points.


Step 1
Find the distance between each data point and the centroids; each data point is then
assigned to the cluster whose centroid is nearest.

Iteration 1
Calculate the distance between the data points and the centers (C1, C2, C3).

For P1,

C1P1 =>(3,3)(1,3) => sqrt[(1–3)²+(3–3)²] => sqrt[4] =>2

C2P1 =>(3,7)(1,3)=> sqrt[(1–3)²+(3–7)²] => sqrt[20] =>4.5

C3P1 =>(9,4)(1,3) => sqrt[(1–9)²+(3–4)²]=> sqrt[65] =>8.1

For P2,

C1P2 =>(3,3)(2,2) => sqrt[(2–3)²+(2–3)²] => sqrt[2] =>1.4

C2P2 =>(3,7)(2,2)=> sqrt[(2–3)²+(2–7)²] => sqrt[26] =>5.1

C3P2 =>(9,4)(2,2) => sqrt[(2–9)²+(2–4)²]=> sqrt[53] =>7.3

For P3,

C1P3 =>(3,3)(5,8) => sqrt[(5–3)²+(8–3)²] => sqrt[29] =>5.4

C2P3 =>(3,7)(5,8)=> sqrt[(5–3)²+(8–7)²] => sqrt[5] =>2.2

C3P3 =>(9,4)(5,8) => sqrt[(5–9)²+(8–4)²]=> sqrt[32] =>5.7

Similarly for the other distances.


Cluster 1 => P1(1,3) , P2(2,2) , P7(3,3)

Cluster 2 => P3(5,8) , P5(3,9) , P9(3,7)

Cluster 3 => P4(8,5) , P6(10,7) , P8(9,4)

Now, we re-compute the new cluster centers; each new center is computed by taking the

mean of all the points contained in that particular cluster.

New center of Cluster 1 => ((1+2+3)/3 , (3+2+3)/3) => (2, 2.7)

New center of Cluster 2 => ((5+3+3)/3 , (8+9+7)/3) => (3.7, 8)

New center of Cluster 3 => ((8+10+9)/3 , (5+7+4)/3) => (9, 5.3)

Iteration 1 is over. Now, let us take the new center points and repeat the same steps:

calculate the distance between the data points and the new center points with the

Euclidean formula and find the cluster groups.


Iteration 2

Calculate the distance between the data points and the new centers (C1, C2, C3).

C1(2,2.7) , C2(3.7,8) , C3(9,5.3)

C1P1 =>(2,2.7)(1,3) => sqrt[(1–2)²+(3–2.7)²] => sqrt[1.09] =>1.0

C2P1 =>(3.7,8)(1,3)=> sqrt[(1–3.7)²+(3–8)²] => sqrt[32.29] =>5.7

C3P1 =>(9,5.3)(1,3) => sqrt[(1–9)²+(3–5.3)²]=> sqrt[69.29] =>8.3

Similarly for the other distances.

Cluster 1 => P1(1,3) , P2(2,2) , P7(3,3)

Cluster 2 => P3(5,8) , P5(3,9) , P9(3,7)

Cluster 3 => P4(8,5) , P6(10,7) , P8(9,4)


Center of Cluster 1 => ((1+2+3)/3 , (3+2+3)/3) => (2, 2.7)

Center of Cluster 2 => ((5+3+3)/3 , (8+9+7)/3) => (3.7, 8)

Center of Cluster 3 => ((8+10+9)/3 , (5+7+4)/3) => (9, 5.3)

We got the same centroids and cluster groups, which indicates that the algorithm has

converged. K-Means stops iterating because the same clusters repeat, so there is no need to

continue; the last iteration gives the final cluster groups for this dataset.
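As a quick check, the kmeans sketch from earlier can be run on these nine points with the same starting centers (this usage assumes that function and the numpy import are defined as above):

pts = [(1,3), (2,2), (5,8), (8,5), (3,9), (10,7), (3,3), (9,4), (3,7)]
init = [(3,3), (3,7), (9,4)]               # C1 = P7, C2 = P9, C3 = P8
centers, labels = kmeans(pts, init)
print(np.round(centers, 1))                # [[2.  2.7] [3.7 8. ] [9.  5.3]]
print(labels)                              # [0 0 1 2 1 2 0 2 1]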

Figure – Comparing iterations 1 and 2: the centroids move between the two iterations.

 Hierarchical Clustering
Hierarchical clustering in data mining is a method that builds a tree-like hierarchy of
clusters, either by merging smaller clusters into larger ones (agglomerative or bottom-
up) or by splitting larger clusters into smaller ones (divisive or top-down). It does not
require a pre-defined number of clusters.

Types of Hierarchical Clustering


Basically, there are two types of hierarchical clustering:
1. Agglomerative Clustering
2. Divisive clustering
1. Agglomerative Clustering
Initially, consider every data point as an individual cluster and, at every
step, merge the nearest pairs of clusters (it is a bottom-up method). At
first, every data point is considered an individual entity or cluster. At
every iteration, clusters merge with other clusters until one cluster is
formed.
The algorithm for Agglomerative Hierarchical Clustering is:
 Consider every data point as an individual cluster
 Calculate the similarity of each cluster with all the other clusters
(compute the proximity matrix)
 Merge the clusters that are highly similar or close to each other
 Recalculate the proximity matrix for the merged clusters
 Repeat Steps 3 and 4 until only a single cluster remains.
Let’s see a graphical representation of this algorithm using a
dendrogram.
Note: This is just a demonstration of how the algorithm works; no
calculation has been performed below, and all the proximities among the
clusters are assumed.
Let’s say we have six data points A, B, C, D, E, and F.

Agglomerative Hierarchical clustering

 Step-1: Consider each alphabet as a single cluster and calculate the
distance of one cluster from all the other clusters.
 Step-2: In the second step, comparable clusters are merged together to
form a single cluster. Let’s say cluster (B) and cluster (C) are very
similar to each other, so we merge them in the second step; similarly
for clusters (D) and (E). At last, we get the clusters [(A), (BC),
(DE), (F)].
 Step-3: We recalculate the proximity according to the algorithm and
merge the two nearest clusters ([(DE), (F)]) together to form new
clusters as [(A), (BC), (DEF)].
 Step-4: Repeating the same process, the clusters DEF and BC are
comparable and are merged together to form a new cluster. We’re now
left with the clusters [(A), (BCDEF)].
 Step-5: At last, the two remaining clusters are merged together to form
a single cluster [(ABCDEF)].
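For reference, SciPy's hierarchy module implements this bottom-up procedure. The sketch below assumes made-up 2-D coordinates for the six points A to F, so the merge order only approximately mirrors the walkthrough above.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical coordinates for the six points A, B, C, D, E, F.
X = np.array([[0, 0], [1, 0], [1.2, 0.3],
              [5, 5], [5.2, 5.1], [9, 9]])
Z = linkage(X, method='single')                   # merge nearest clusters, bottom-up
labels = fcluster(Z, t=3, criterion='maxclust')   # cut the dendrogram into 3 clusters
print(labels)                                     # A, B, C together; D, E together; F alone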
2. Divisive Hierarchical clustering
We can say that Divisive Hierarchical clustering is precisely
the opposite of Agglomerative Hierarchical clustering. In Divisive
Hierarchical clustering, we take all of the data points as a single
cluster and, in every iteration, separate out the data points that are
not comparable to the rest of their cluster. In the end, we are left
with N clusters, one for each data point.

Hierarchical clustering has several advantages over other clustering
methods
 The ability to handle non-convex clusters and clusters of different sizes
and densities.
 The ability to handle missing data and noisy data.
 The ability to reveal the hierarchical structure of the data, which can be
useful for understanding the relationships among the clusters.
Drawbacks of Hierarchical Clustering
 The need for a criterion to stop the clustering process and determine
the final number of clusters.
 The computational cost and memory requirements of the method can
be high, especially for large datasets.
 The results can be sensitive to the initial conditions, linkage criterion,
and distance metric used.
In summary, Hierarchical clustering is a method of data mining that
groups similar data points into clusters by creating a hierarchical
structure of the clusters.
 This method can handle different types of data and reveal the
relationships among the clusters. However, it can have a high
computational cost, and the results can be sensitive to the linkage
criterion and distance metric used.
Density-based clustering
Density-based clustering refers to a method that is based on a local cluster criterion, such
as density-connected points.
There are two parameters used in density-based clustering: Eps, the radius of the
neighbourhood around a point, and MinPts, the minimum number of points required within
that radius.
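A brief sketch with scikit-learn's DBSCAN, the classic density-based algorithm, shows both parameters in action; eps corresponds to the neighbourhood radius and min_samples to the minimum point count, and the toy points are illustrative.

import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [2, 2], [2, 3],             # a dense group
              [8, 7], [8, 8], [25, 80]])          # another group plus an outlier
labels = DBSCAN(eps=3, min_samples=2).fit_predict(X)
print(labels)                                     # [0 0 0 1 1 -1]; -1 marks noise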
