Unit-4
Contents
• Clustering
• Choosing distance metrics
• Different clustering approaches
• Hierarchical agglomerative clustering
• k-means (Lloyd’s algorithm)
• DBSCAN
• Relative merits of each method
• Clustering tendency and quality.
Clustering
Clustering: Applications
• Biology: taxonomy of living things: kingdom, phylum, class, order, family, genus and species
• Information retrieval: document clustering
• Land use: identification of areas of similar land use in an Earth observation database
• Marketing: help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
• City planning: identifying groups of houses according to their house type, value, and geographical location
• Earthquake studies: observed earthquake epicenters should be clustered along continental faults
• Climate: understanding Earth's climate; finding patterns in atmospheric and ocean data
• Economic science: market research
Measure the Quality of Clustering
• Dissimilarity/Similarity metric
o Similarity is expressed in terms of a distance function, typically a metric: d(i, j)
o The definitions of distance functions are usually rather different for interval-scaled, boolean, categorical, ordinal, ratio, and vector variables
o Weights should be associated with different variables based on applications and data semantics
• Quality of clustering:
o There is usually a separate “quality” function that measures the “goodness” of a cluster.
o It is hard to define “similar enough” or “good enough”
o The answer is typically highly subjective
Considerations for Cluster Analysis
• Partitioning criteria
o Single-level vs. hierarchical partitioning (often, multi-level hierarchical partitioning is desirable)
• Separation of clusters
o Exclusive (e.g., one customer belongs to only one region) vs. non-exclusive (e.g., one document may belong to more than one class)
• Similarity measure
o Distance-based (e.g., Euclidean, road network, vector) vs. connectivity-based (e.g., density or contiguity)
• Clustering space
o Full space (often when low-dimensional) vs. subspaces (often in high-dimensional clustering)
Major Clustering Approaches
• Partitioning approach:
o Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
o Typical methods: k-means, k-medoids, CLARANS
• Hierarchical approach:
o Create a hierarchical decomposition of the set of data (or objects) using some criterion
o Typical methods: DIANA, AGNES, BIRCH, CHAMELEON
• Density-based approach:
o Based on connectivity and density functions
o Typical methods: DBSCAN, OPTICS, DenClue
• Grid-based approach:
o Based on a multiple-level granularity structure
o Typical methods: STING, WaveCluster, CLIQUE
Partitioning Clustering
• Partitioning method: construct a partition of a database D of n objects into a set of k clusters.
• Given k, find a partition of k clusters that optimizes the chosen partitioning criterion
o Global optimum: exhaustively enumerate all possible partitions
o Heuristic methods: the k-means and k-medoids algorithms
• In this approach, the dataset is divided into a set of k groups, where k is the pre-defined number of groups. Cluster centers are chosen so that each data point is closer to its own cluster center than to the centers of the other clusters.
K-means clustering
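The k-means slides here rely on figures, so below is a minimal NumPy sketch of Lloyd's algorithm (the alternation of assignment and centroid-update steps). The function name lloyd_kmeans, the random initialization, and the iteration cap are illustrative assumptions rather than material from the slides.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random (an assumption).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the assignments will no longer change
        centroids = new_centroids
    return centroids, labels

# Example usage on the six 2-D objects used later in this unit:
# X = np.array([[1, 1], [1.5, 1.5], [5, 5], [3, 4], [4, 4], [3, 3.5]], dtype=float)
# centers, labels = lloyd_kmeans(X, k=2)
```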
Hierarchical Clustering
• The hierarchical clustering algorithm develops the hierarchy of clusters in the form of a tree, and this tree-shaped structure is known as the dendrogram.
• There is no requirement to predetermine the number of clusters, as there is in the k-means algorithm.
Agglomerative Hierarchical clustering
• It follows the bottom-up approach.
• This algorithm treats each data point as a single cluster at the beginning, and then starts combining the closest pair of clusters. It does this until all the clusters are merged into a single cluster that contains all the data points.
Step-1: Treat each data point as a single cluster. If there are N data points, the number of clusters will also be N.
Step-2: Take the two closest data points or clusters and merge them to form one cluster; there will now be N-1 clusters.
Step-3: Again, take the two closest clusters and merge them together to form one cluster; there will be N-2 clusters.
Step-4: Repeat Step-3 until only one cluster is left.
Step-5: Once all the clusters are combined into one big cluster, develop the dendrogram to divide the clusters as per the problem.
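As a rough illustration of Steps 1-5, here is a small NumPy sketch of the agglomerative merging loop. It assumes the single-linkage rule (minimum pairwise distance, introduced on the following slides) and simply records each merge; it is a didactic sketch, not an efficient implementation.

```python
import numpy as np

def agglomerative_single_linkage(X):
    """Repeatedly merge the two closest clusters until one cluster remains."""
    clusters = [[i] for i in range(len(X))]      # Step 1: every point is its own cluster
    merges = []                                  # record of (cluster_a, cluster_b, distance)
    while len(clusters) > 1:                     # Steps 2-4: repeat until one cluster is left
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: smallest distance between members of the two clusters.
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into cluster a
        del clusters[b]
    return merges
```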
There are various ways to calculate the distance between two clusters, and the chosen measure decides the rule for merging. These measures are called linkage methods.
1. Single Linkage: the shortest distance between the closest points of the two clusters.
2. Complete Linkage: the farthest distance between points of two different clusters. It is one of the popular linkage methods as it forms tighter clusters than single linkage.
3. Average Linkage: the distance between every pair of points (one from each cluster) is added up and then divided by the total number of pairs, giving the average distance between the two clusters.
4. Centroid Linkage: the distance between the centroids of the two clusters.
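To make the four linkage rules concrete, here is a small NumPy sketch that computes each of them for two clusters of points; the function name and the example values are illustrative assumptions.

```python
import numpy as np

def linkage_distances(cluster_a, cluster_b):
    """Distance between two clusters under the four linkage rules above."""
    A, B = np.asarray(cluster_a, float), np.asarray(cluster_b, float)
    # All pairwise Euclidean distances between members of the two clusters.
    pair = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return {
        "single":   pair.min(),    # closest pair of points
        "complete": pair.max(),    # farthest pair of points
        "average":  pair.mean(),   # mean over all pairs of points
        "centroid": np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)),  # distance between centroids
    }

# Example with two small, illustrative clusters:
# print(linkage_distances([[1, 1], [1.5, 1.5]], [[4, 4], [5, 5]]))
```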
Working of Dendrogram in Hierarchical clustering
The dendrogram is a tree-like structure that records each merge step performed by the hierarchical clustering algorithm. In the dendrogram plot, the y-axis shows the Euclidean distances between the data points (or clusters), and the x-axis shows all the data points of the given dataset.
Working of Dendrogram in Hierarchical clustering
• Firstly, the data points P2 and P3 combine together to form a cluster, and correspondingly a dendrogram is created which connects P2 and P3 with a rectangular link. The height of the link is decided by the Euclidean distance between the data points.
• In the next step, P5 and P6 form a cluster, and the corresponding dendrogram is created. It is taller than the previous one, as the Euclidean distance between P5 and P6 is a little greater than that between P2 and P3.
• Again, two new dendrograms are created: one combining P1, P2, and P3, and another combining P4, P5, and P6.
• At last, the final dendrogram is created, which combines all the data points together.
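A hedged SciPy sketch of how such a dendrogram is typically produced and then "cut" into flat clusters (Step-5). The points P1..P6 below, the choice of single linkage, and the cut height are assumptions for illustration, not the exact data behind the slide figures.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
import matplotlib.pyplot as plt

# Hypothetical 2-D data points P1..P6 (illustrative only).
P = np.array([[1.0, 1.0], [1.2, 1.1], [1.3, 0.9],
              [4.0, 4.0], [4.2, 4.1], [4.1, 3.9]])

Z = linkage(P, method="single")                 # merge history: (idx1, idx2, distance, size) per row
dendrogram(Z, labels=[f"P{i}" for i in range(1, 7)])  # y-axis: merge distances, x-axis: data points
plt.ylabel("merge distance")
plt.show()

# Cutting the dendrogram at a chosen height yields flat clusters.
labels = fcluster(Z, t=1.0, criterion="distance")
print(labels)
```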
Divisive Clustering
• Divisive clustering works in just the opposite way to agglomerative clustering. It starts by considering all the data points as one big cluster and then keeps splitting them into smaller clusters until every data point is in its own cluster. It is therefore good at identifying large clusters. It follows a top-down approach and, with an efficient splitting procedure, can be more efficient than agglomerative clustering.
Divisive Clustering
• The data points 1, 2, ..., 6 are first assigned to one large cluster.
• After calculating the proximity matrix, the points are split into separate clusters based on their dissimilarity.
• The proximity matrix is recomputed and the splitting repeated until each point is assigned to an individual cluster.
• The proximity matrix and linkage function follow the same procedure as in agglomerative clustering.
Agglomerative Hierarchical clustering: Example
Suppose we have 6 objects (named A, B, C, D, E and F) and each object has two measured features, X1 and X2. We can plot the features in a scatter plot to visualize the proximity between objects.

Object  X1   X2
A       1    1
B       1.5  1.5
C       5    5
D       3    4
E       4    4
F       3    3.5
Calculate the proximity matrix using single-linkage (Euclidean) distance:

Dist.  A     B     C     D     E     F
A      0     0.71  5.66  3.61  4.24  3.20
B      0.71  0     4.95  2.92  3.54  2.50
C      5.66  4.95  0     2.24  1.41  2.50
D      3.61  2.92  2.24  0     1.00  0.50
E      4.24  3.54  1.41  1.00  0     1.12
F      3.20  2.50  2.50  0.50  1.12  0

We have 6 objects and we put each object into its own cluster; thus, at the beginning we have 6 clusters. Our goal is to group these 6 clusters so that, at the end of the iterations, we have a single cluster consisting of all six original objects.
In this case, the closest pair of clusters is D and F, with the shortest distance of 0.50. Thus, we group D and F into cluster (D, F) and then update the distance matrix.
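The proximity matrix above can be reproduced with SciPy; this is a small sketch, assuming Euclidean distance and rounding to two decimals as in the slides.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# The six objects A..F with their features (X1, X2) from the example.
points = np.array([[1, 1], [1.5, 1.5], [5, 5], [3, 4], [4, 4], [3, 3.5]], dtype=float)
names = ["A", "B", "C", "D", "E", "F"]

D = squareform(pdist(points, metric="euclidean"))   # symmetric 6x6 distance matrix
print(np.round(D, 2))                               # e.g. D[0, 1] = 0.71 (A-B), D[3, 5] = 0.50 (D-F)
```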
The distances between ungrouped clusters do not change from the original distance matrix. The problem is how to calculate the distance between the newly grouped cluster (D, F) and the other clusters.
Using single linkage, we take the minimum distance between the original objects of the two clusters.
Using the input distance matrix, the distance between cluster (D, F) and cluster A is computed as:
dist((D, F), A) = min(dist(D, A), dist(F, A)) = min(3.61, 3.20) = 3.20
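The same min-based update can be written as a tiny helper; a self-contained sketch assuming the six objects A..F defined above.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

points = np.array([[1, 1], [1.5, 1.5], [5, 5], [3, 4], [4, 4], [3, 3.5]], dtype=float)
names = ["A", "B", "C", "D", "E", "F"]
D = squareform(pdist(points))                 # Euclidean distance matrix for A..F

merged = [3, 5]                               # indices of D and F, the newly merged cluster
for other in [0, 1, 2, 4]:                    # the remaining clusters A, B, C, E
    d = min(D[i, other] for i in merged)      # single linkage: min over the original objects
    print(names[other], round(d, 2))          # prints A 3.2, B 2.5, C 2.24, E 1.0
```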
The entries still to be computed are marked with "?":

Dist.  A     B     C     D,F   E
A      0     0.71  5.66  ?     4.24
B      0.71  0     4.95  ?     3.54
C      5.66  4.95  0     ?     1.41
D,F    3.20  ?     ?     0     ?
E      4.24  3.54  1.41  ?     0

Similarly:
dist((D, F), B) = min(dist(D, B), dist(F, B)) = min(2.92, 2.50) = 2.50
dist((D, F), C) = min(dist(D, C), dist(F, C)) = min(2.24, 2.50) = 2.24
dist((D, F), E) = min(dist(D, E), dist(F, E)) = min(1.00, 1.12) = 1.00

The updated distance matrix is:

Dist.  A     B     C     D,F   E
A      0     0.71  5.66  3.20  4.24
B      0.71  0     4.95  2.50  3.54
C      5.66  4.95  0     2.24  1.41
D,F    3.20  2.50  2.24  0     1.00
E      4.24  3.54  1.41  1.00  0
Looking at the lower triangle of the updated distance matrix, we find that the closest distance is now 0.71, between cluster A and cluster B. Thus, we group cluster A and cluster B into a single cluster named (A, B).

Dist.  A,B   C     D,F   E
A,B    0     ?     ?     ?
C      ?     0     2.24  1.41
D,F    ?     2.24  0     1.00
E      ?     1.41  1.00  0

Using the input distance matrix (size 6 by 6), the distances involving cluster (A, B) are computed as:
dist(C, (A, B)) = min(dist(C, A), dist(C, B)) = min(5.66, 4.95) = 4.95
dist((D, F), (A, B)) = min(dist(D, A), dist(D, B), dist(F, A), dist(F, B)) = min(3.61, 2.92, 3.20, 2.50) = 2.50
dist(E, (A, B)) = min(dist(E, A), dist(E, B)) = min(4.24, 3.54) = 3.54
The updated distance matrix is:

Dist.  A,B   C     D,F   E
A,B    0     4.95  2.50  3.54
C      4.95  0     2.24  1.41
D,F    2.50  2.24  0     1.00
E      3.54  1.41  1.00  0

Observing the updated distance matrix, we can see that the closest distance between clusters is between E and (D, F), at distance 1.00. Thus, we cluster them together into cluster ((D, F), E).
The distance between cluster ((D, F), E) and cluster (A, B) is calculated as:
dist(((D, F), E), (A, B)) = min(dist(D, A), dist(D, B), dist(F, A), dist(F, B), dist(E, A), dist(E, B)) = min(3.61, 2.92, 3.20, 2.50, 4.24, 3.54) = 2.50
The distance between cluster ((D, F), E) and cluster C is computed as:
dist(((D, F), E), C) = min(dist(D, C), dist(F, C), dist(E, C)) = min(2.24, 2.50, 1.41) = 1.41
Since 1.41 < 2.50, the closest pair is ((D, F), E) and C, so we merge them into a new cluster named (((D, F), E), C).
Only two clusters now remain. The distance between cluster (((D, F), E), C) and cluster (A, B) is computed as:
dist((((D, F), E), C), (A, B)) = min(dist(D, A), dist(D, B), dist(F, A), dist(F, B), dist(E, A), dist(E, B), dist(C, A), dist(C, B)) = min(3.61, 2.92, 3.20, 2.50, 4.24, 3.54, 5.66, 4.95) = 2.50

The updated distance matrix is:

Dist.           A,B    (((D,F),E),C)
A,B             0      2.50
(((D,F),E),C)   2.50   0

Now, if we merge the remaining two clusters, we get a single cluster containing all 6 objects. Thus, our computation is finished.
We summarize the results of the computation as follows:
1. Merge D and F at distance 0.50 → (D, F)
2. Merge A and B at distance 0.71 → (A, B)
3. Merge (D, F) and E at distance 1.00 → ((D, F), E)
4. Merge ((D, F), E) and C at distance 1.41 → (((D, F), E), C)
5. Merge (((D, F), E), C) and (A, B) at distance 2.50 → the single final cluster

The hierarchy is therefore ((((D, F), E), C), (A, B)). We can also plot the clustering hierarchy in the XY space. Using this information, we can now draw the final dendrogram; it is drawn based on the merge distances listed above.
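A hedged SciPy sketch that should reproduce the merge sequence and merge distances computed above (0.50, 0.71, 1.00, 1.41, 2.50) and draw the corresponding dendrogram; the variable names are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# Objects A..F with their features (X1, X2) from the worked example.
X = np.array([[1, 1], [1.5, 1.5], [5, 5], [3, 4], [4, 4], [3, 3.5]], dtype=float)

Z = linkage(X, method="single")   # single-linkage agglomerative clustering
print(np.round(Z, 2))             # each row: (cluster i, cluster j, merge distance, new size);
                                  # indices >= 6 refer to previously formed clusters (SciPy convention)

dendrogram(Z, labels=["A", "B", "C", "D", "E", "F"])
plt.ylabel("merge distance")
plt.show()
```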
DBSCAN
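The DBSCAN slides rely on figures, so as a minimal, hedged sketch of the idea (density-based clusters, with noise points labelled -1), here is a scikit-learn example. The eps and min_samples values and the make_moons data are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: a shape k-means handles poorly but DBSCAN handles well.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps: neighbourhood radius; min_samples: points required for a dense (core) region.
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

labels = db.labels_   # cluster index per point; -1 marks noise points
print("clusters:", len(set(labels) - {-1}), "noise points:", int(np.sum(labels == -1)))
```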
Clustering Tendency
• A big issue in cluster analysis is that clustering methods will return clusters even if the data does not contain any clusters. In other words, if you blindly apply a clustering method to a data set, it will divide the data into clusters, because that is what it is supposed to do.
• Before applying any clustering method to your data, it is important to evaluate whether the data set contains meaningful clusters (i.e., non-random structure) and, if so, how many clusters there are. This process is called assessing clustering tendency, or the feasibility of the clustering analysis.
Hopkins statistic
• The Hopkins statistic (Lawson and Jurs, 1990) is used to assess the clustering tendency of a data set by measuring the probability that the data set was generated by a uniform data distribution. In other words, it tests the spatial randomness of the data.
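A hedged NumPy/scikit-learn sketch of one common formulation of the Hopkins statistic; under this convention, values near 0.5 suggest uniform (random) data and values near 1 suggest clusterable structure. The sample size m, the bounding-box sampling, and the use of NearestNeighbors are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hopkins_statistic(X, m=None, seed=0):
    """One common form: H = sum(u) / (sum(u) + sum(w)), where u are nearest-neighbour
    distances from m uniformly sampled points to the data, and w are nearest-neighbour
    distances from m sampled real points to the rest of the data."""
    X = np.asarray(X, float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = m or max(1, n // 10)                      # small sample size (a common rule of thumb)

    nn = NearestNeighbors(n_neighbors=2).fit(X)

    # u: distances from uniform points (inside the data's bounding box) to their nearest data point.
    uniform = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    u = nn.kneighbors(uniform, n_neighbors=1)[0].ravel()

    # w: distances from sampled real points to their nearest *other* data point
    # (take the 2nd neighbour, since the 1st is the point itself).
    sample = X[rng.choice(n, size=m, replace=False)]
    w = nn.kneighbors(sample, n_neighbors=2)[0][:, 1]

    return u.sum() / (u.sum() + w.sum())
```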
Visual method
The algorithm of the visual assessment of cluster tendency (VAT) approach (Bezdek and Hathaway, 2002) is as follows:
• Compute the dissimilarity matrix (DM) between the objects in the data set using the Euclidean distance measure
• Reorder the DM so that similar objects are close to one another. This process creates an ordered dissimilarity matrix (ODM)
• The ODM is displayed as an ordered dissimilarity image (ODI), which is the visual output of VAT
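A minimal sketch of the VAT reordering step (a Prim's-algorithm-like ordering of the dissimilarity matrix), with the reordered matrix displayed as the ODI. The function name vat_order and the synthetic two-group data are illustrative assumptions, and the refinements of the full VAT algorithm are omitted.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
import matplotlib.pyplot as plt

def vat_order(D):
    """Reorder objects so that similar ones end up adjacent (Prim's-like selection)."""
    n = D.shape[0]
    # Start from one endpoint of the largest dissimilarity in the matrix.
    first = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order, remaining = [first], set(range(n)) - {first}
    while remaining:
        rem = list(remaining)
        sub = D[np.ix_(order, rem)]                          # distances selected -> unselected
        nxt = rem[int(np.unravel_index(np.argmin(sub), sub.shape)[1])]
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Example: dissimilarity matrix of two well-separated groups, reordered and shown as an ODI.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(15, 2)),
               rng.normal(3.0, 0.3, size=(15, 2))])
D = squareform(pdist(X))                                     # dissimilarity matrix (Euclidean)
idx = vat_order(D)
plt.imshow(D[np.ix_(idx, idx)], cmap="gray")                 # ordered dissimilarity image (ODI)
plt.title("Ordered dissimilarity image (ODI)")
plt.show()
```

Dark blocks along the diagonal of the ODI suggest the presence of clusters; a structureless image suggests no clustering tendency.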
Happy Learning