Clustering Analysis: What Is Cluster Analysis?
Types of Clustering
Partitioning and Hierarchical Clustering
Hierarchical Clustering
A set of nested clusters organized as a hierarchical tree
Partitioning Clustering
A division of data objects into non-overlapping subsets (clusters) such that each data
object is in exactly one subset
What is K-means?
Goal of K-means:
To find the best division of n entities into k groups, so that the total distance between
each group's members and its corresponding centroid, the representative of the group, is
minimized.
Details of K-means
Initial centroids are often chosen randomly
The centroid is the mean of the points in the cluster
Closeness is measured by Euclidean distance, cosine similarity,
correlation, etc.
K-means will converge for common similarity measures mentioned above
Most of the convergence happens in the first few iterations.
Euclidean Distance
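As a concrete reference, here is a minimal Python helper for the Euclidean distance between two n-dimensional points (the function name is illustrative):

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two n-dimensional points x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

For example, the distance between (0, 0) and (3, 4) is 5.0.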
Update Centroid
We use the following equation to calculate the n-dimensional centroid point from k
n-dimensional points: for each dimension d = 1, ..., n,

centroid_d = (1/k) * (x_{1,d} + x_{2,d} + ... + x_{k,d})

i.e., the centroid is the per-dimension mean of the points.
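The centroid update can be sketched in Python; `centroid` is an illustrative helper that averages k n-dimensional points dimension by dimension:

```python
def centroid(points):
    """Per-dimension mean of k n-dimensional points (tuples of equal length)."""
    k = len(points)
    return tuple(sum(dim) / k for dim in zip(*points))
```

For example, the centroid of (0, 0) and (2, 4) is (1.0, 2.0).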
For each point, the error is the distance to the nearest cluster
To get SSE, we square the errors and sum them
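A short sketch of the SSE computation described above, assuming points and centroids are tuples of equal dimension (the function name is illustrative):

```python
def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        # Squared Euclidean distance to the closest centroid.
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total
```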
How to choose K?
Scree plot (or Elbow Method): plot SSE against each candidate value of k and pick the
"elbow", the value of k beyond which adding more clusters yields only a small decrease in SSE.
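The elbow method can be illustrated with a toy sketch: run k-means for several values of k and inspect how the SSE curve flattens. The `kmeans_sse` helper below is an illustrative, self-contained implementation, not a library call:

```python
import random

def kmeans_sse(points, k, iters=20, seed=0):
    """Run a basic k-means and return the final SSE (toy helper for the elbow plot)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            buckets[j].append(p)
        # Recompute centroids; keep the old one if a bucket is empty.
        centroids = [tuple(sum(d) / len(b) for d in zip(*b)) if b else centroids[i]
                     for i, b in enumerate(buckets)]
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
               for p in points)

# SSE decreases as k grows; look for the "elbow" where the drop flattens.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
curve = [kmeans_sse(data, k) for k in (1, 2, 3)]
```

On this toy data the SSE drops sharply from k=1 to k=2 (the two obvious clusters) and only slightly after that, suggesting k=2.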
Lloyd's Algorithm:
Initially, k random observations are chosen to serve as the centroids of the k
clusters. Then the following steps occur in iterations until the centroids converge.
The Euclidean distance between each observation and the current centroids
is calculated
Each observation is assigned to the bucket of the centroid it is closest to,
giving k buckets
The mean of all the observations in each bucket serves as the new centroid
The new centroids replace the old centroids, and the iteration goes back to
step 1 if the old and new centroids have not converged
The conditions to converge are the following: the old and the new centroids are
exactly identical, the difference between the old and new centroids is small (on the order of 10^-3), or the maximum number of iterations (e.g., 10 or 100) is reached.
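The iteration described above (assignment, centroid update, convergence check) can be sketched as a self-contained Python function; the tolerance and iteration cap mirror the convergence conditions mentioned, and the names are illustrative:

```python
import random

def lloyd_kmeans(points, k, tol=1e-3, max_iter=100, seed=42):
    """Standard k-means: iterate assignment and centroid updates until the
    centroids move less than tol, or max_iter iterations are reached."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Step 1: assign each observation to its closest centroid (Euclidean).
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            buckets[j].append(p)
        # Step 2: the mean of each bucket becomes the new centroid.
        new_centroids = []
        for i, b in enumerate(buckets):
            if b:
                new_centroids.append([sum(d) / len(b) for d in zip(*b)])
            else:
                new_centroids.append(centroids[i])  # keep old centroid for empty bucket
        # Step 3: stop when old and new centroids are (nearly) identical.
        shift = max(sum((a - b) ** 2 for a, b in zip(c, n)) ** 0.5
                    for c, n in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, buckets
```

On well-separated data this typically converges within the first few iterations, consistent with the note above.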
MacQueen's Algorithm:
This is an online version in which the first k instances are chosen as the centroids
Each subsequent instance is placed in a bucket depending on which centroid is
closest to that instance, and that centroid is recalculated immediately
This step repeats until every instance is placed in the appropriate bucket
The algorithm makes only a single pass, looping once over the n instances
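MacQueen's single-pass update can be sketched as follows; after each assignment only the affected centroid is recalculated, via a running-mean update (function and variable names are illustrative):

```python
def macqueen_kmeans(points, k):
    """MacQueen's online k-means: the first k instances seed the centroids;
    each later instance is assigned to the nearest centroid, which is then
    updated immediately as a running mean. One pass over the data."""
    centroids = [list(p) for p in points[:k]]
    counts = [1] * k
    assignments = list(range(k))  # the seed instances define the first k buckets
    for p in points[k:]:
        j = min(range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
        counts[j] += 1
        # Running-mean update: c_new = c_old + (p - c_old) / n_j
        centroids[j] = [c + (a - c) / counts[j] for c, a in zip(centroids[j], p)]
        assignments.append(j)
    return centroids, assignments
```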
Hartigan- Wong Algorithm:
Assign all the points/instances to random buckets and calculate the
respective centroids
Starting from the first instance, find the nearest centroid and assign the
instance to that bucket. If the bucket changed, recalculate the two new centroids, i.e. the
centroid of the newly assigned bucket and the centroid of the old bucket,
as those are the two centroids affected by the change
Loop through all the points to get the new centroids
Do a second iteration of steps 2 and 3, which performs a sort of clean-up
operation and reassigns stray points to the correct buckets.
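A rough sketch of the procedure above; the `passes=2` default reflects the second clean-up iteration, and the names and details are illustrative rather than a reference implementation:

```python
import random

def hartigan_wong_kmeans(points, k, passes=2, seed=0):
    """Hartigan-Wong sketch: start from random bucket assignments, then move
    each point to its nearest centroid one at a time, recomputing only the
    two affected centroids after each move. A second pass reassigns strays."""
    rng = random.Random(seed)
    assignments = [rng.randrange(k) for _ in points]

    def bucket_mean(i):
        members = [p for p, a in zip(points, assignments) if a == i]
        if not members:
            return None  # empty bucket has no centroid
        return [sum(d) / len(members) for d in zip(*members)]

    centroids = [bucket_mean(i) for i in range(k)]
    for _ in range(passes):
        for idx, p in enumerate(points):
            live = [i for i in range(k) if centroids[i] is not None]
            j = min(live,
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            old = assignments[idx]
            if j != old:
                assignments[idx] = j
                # Only the old and new buckets' centroids are affected.
                centroids[old] = bucket_mean(old)
                centroids[j] = bucket_mean(j)
    return centroids, assignments
```

Updating only the two affected centroids is what makes this variant cheaper per move than recomputing every centroid each iteration.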