
AI20001

Essentials of Machine Learning
Unsupervised Learning
Koustav Rudra
25/08/2025
Example: Face Clustering
Example: Search result clustering
Example: Google News
A data set with clear cluster structure

What are some of the issues for clustering?

What clustering algorithms can we use?
Issues for clustering

• Representation for clustering
  • How do we represent an example? (features, etc.)
  • Similarity/distance between examples

• Flat clustering or hierarchical?

• How many clusters do you want to create?
  • Fixed a priori
  • Data driven
Major Types of Clustering Algorithms
• Flat algorithms
  • Usually start with a random partitioning
  • Refine it iteratively
  • Example: K-means clustering
  • Produce a disjoint set of groups

• Hierarchical algorithms
  • Bottom-up, agglomerative
  • Top-down, divisive
Hard vs. soft clustering

• Hard clustering: each example belongs to exactly one cluster

• Soft clustering:
  • An example can belong to more than one cluster (probabilistic)
  • A pair of sneakers may belong to two groups:
    • sports apparel and shoes
Flat Clustering: K-means Clustering Algorithm
K-means

• K-means is simple, efficient and widely used

• Main steps of k-means:

  STEP 1: Start with k initial cluster centers (that is why it is called k-means)

  STEP 2: Assign/cluster each member to the closest center

  STEP 3: Recalculate centers as the mean of the points in a cluster

  Iterative steps: go back to STEP 2

When to stop? Some possibilities:
  1. after a fixed # of iterations
  2. when the centers do not change
K-means: an example
K-means: Initialize centers randomly
K-means: assign points to nearest center
K-means: readjust centers
K-means: assign points to nearest center
K-means: readjust centers
K-means: assign points to nearest center
K-means: readjust centers
K-means: assign points to nearest center

No changes: Done
K-means

Iterate:
• Assign/cluster each example to closest center
• Recalculate centers as the mean of the points in a cluster

How do we do this?
K-means
Iterate:
• Assign/cluster each example to closest center
  • Iterate over each point:
    • get the distance to each cluster center
    • assign to the closest center (hard cluster)

• Recalculate centers as the mean of the points in a cluster

K-means
Iterate:
• Assign/cluster each example to closest center
  • Iterate over each point:
    • get the distance to each cluster center
    • assign to the closest center (hard cluster)

• Recalculate centers as the mean of the points in a cluster

What distance measure should we use?


Distance measure

Euclidean distance:

$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

x and y are n-dimensional vectors:

x = (x_1, x_2, ..., x_n)
y = (y_1, y_2, ..., y_n)
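A minimal NumPy sketch of this distance (the function name euclidean is ours, purely for illustration):

import numpy as np

def euclidean(x, y):
    """Euclidean distance between two n-dimensional vectors x and y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

print(euclidean([0, 0], [3, 4]))  # prints 5.0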
K-means

Iterate:
• Assign/cluster each example to closest center
• Recalculate centers as the mean of the points in a cluster

Where are the cluster centers?


K-means

Iterate:
• Assign/cluster each example to closest center
• Recalculate centers as the mean of the points in a cluster

How do we calculate these?


K-means

Iterate:
• Assign/cluster each example to closest center
• Recalculate centers as the mean of the points in a cluster

Mean of the points in the cluster:

$\mu(C) = \frac{1}{|C|} \sum_{x \in C} x$

where vector addition and the division by |C| are applied componentwise, i.e.
$x + y = (x_1 + y_1, \ldots, x_n + y_n)$ and $\frac{x}{|C|} = \left(\frac{x_1}{|C|}, \ldots, \frac{x_n}{|C|}\right)$
K-means loss function

K-means tries to minimize what is called the "k-means" loss function:

$loss = \sum_{i=1}^{n} d(x_i, \mu_k)^2$, where $\mu_k$ is the cluster center for $x_i$

That is, the sum of the squared distances from each point to the
associated cluster center.
K-means algorithm
Randomly initialize k cluster centroids

Repeat {
  Cluster assignment:
    for i = 1 to n
      c_i := index (from 1 to k) of the cluster centroid closest to x_i
  Move centroids:
    for j = 1 to k
      centroid μ_j := average (mean) of the points assigned to cluster j
}
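A minimal NumPy sketch of this loop, assuming X is an (n, d) array and centroids are initialized by picking k random data points; it is an illustration of the steps above, not the exact course implementation:

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain k-means sketch: X is an (n, d) array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # STEP 1: start with k initial centers picked randomly from the data
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # STEP 2: assign each point to its closest center (hard assignment)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # STEP 3: recalculate each center as the mean of the points assigned to it
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # stop when the centers no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def kmeans_loss(X, labels, centers):
    """k-means loss: sum of squared distances from each point to its assigned center."""
    return float(np.sum((X - centers[labels]) ** 2))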
Running time of K-means

• In every iteration

  • Assign data points to the closest cluster center
    • O(kn) time (k = # clusters, n = # data points)

  • Change each cluster center to the average of its assigned points
    • O(n)
K-means: Big Issues

• Value of k (# clusters)

• Convergence
  • A fixed number of iterations
  • Partitions unchanged
  • Cluster centers do not change

• Initial (seed) cluster centers


KMEANS: VALUE OF K
Elbow method
• Run k-means with different values of k
• Plot the k-means loss vs. k
• Choose k where the curve shows an elbow shape

[Plot: k-means loss vs. k; the elbow marks a good value of k]
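A short sketch of the elbow method using scikit-learn's KMeans (our choice here; the toy data is made up for illustration), whose inertia_ attribute is the k-means loss defined above:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))  # toy data standing in for real features

ks = range(1, 11)
losses = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(list(ks), losses, marker="o")
plt.xlabel("k")
plt.ylabel("k-means loss")
plt.show()  # choose k near the 'elbow' of this curve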
Silhouette Measure

• Key idea for good clustering:
  • Small within-cluster variance
  • Large between-cluster variance


Silhouette Measure

C_i is the cluster for data point i; d(i, j) = distance between i and j

Within-cluster measure: a(i) = average distance from i to the other points in C_i

Between-cluster measure: b(i) = minimum, over clusters C other than C_i, of the average distance from i to the points in C

Silhouette measure:

$s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}$, and s(i) = 0 if $|C_i| = 1$
Silhouette Plot
• Property of the Silhouette measure: a high score is better

Average Silhouette score:

$\frac{1}{n} \sum_{i=1}^{n} s(i)$

[Plot: average Silhouette score vs. k; the peak marks a desirable value of k]
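A hedged sketch of picking k with the average Silhouette score, using scikit-learn's silhouette_score on toy data:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.default_rng(0).normal(size=(300, 2))  # toy data

for k in range(2, 8):  # the Silhouette score needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, silhouette_score(X, labels))  # average s(i); higher is better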
Initial (seed) cluster centers
K-means: Initialize centers randomly

What would happen here?


K-means: Initialize centers randomly

Bad clustering
Choice of Initial Centroids

• Results can vary drastically based on random seed selection
  • Slow convergence
  • Convergence to a sub-optimal clustering

• Common heuristics
  • Random centers in the space
  • Randomly pick from the feature vectors
  • Points least similar to any existing center (furthest centers heuristic)
  • Try out multiple starting points
  • Initialize with the results of another clustering method
Furthest centers heuristic

• μ_1 = pick a random point (the first center)

• for i = 2 to k (# clusters):
  • μ_i = the point that is furthest from any previously chosen center
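A minimal sketch of this heuristic (the function name is ours), assuming X is an (n, d) NumPy array:

import numpy as np

def furthest_centers(X, k, seed=0):
    """Furthest-centers heuristic: first center is a random point, each later
    center is the point furthest from all centers chosen so far."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]                 # mu_1: random point
    for _ in range(2, k + 1):
        c = np.array(centers)
        # distance of every point to its closest already-chosen center
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2).min(axis=1)
        centers.append(X[d.argmax()])                   # mu_i: furthest from any previous center
    return np.array(centers)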
K-means: Initialize furthest from centers

Say, k = 3

• Pick a random point for the first center


• Which point will be chosen next?
• Next point?
K-means: Initialize furthest from centers

Furthest point from center

Any issues/concerns with this approach?


Furthest points concerns

If k = 4, which points will get chosen?

Doesn’t deal well with outliers


A Better Approach

• K-means++
  • Centers are initialized using a probabilistic approach
  • The other steps are exactly the same as in the standard k-means algorithm

Cluster center initialization:

1. Choose one center c_1 randomly from the data X

2. For each x ∈ X, compute D(x), the distance of x from the closest center already chosen

3. Select a point x from X as a new center with probability $\frac{D(x)^2}{\sum_{x \in X} D(x)^2}$

4. Repeat steps 2 and 3 until k centers are chosen


K-means++
• Cluster center initialization:
  1. Choose the first center randomly from the data X
  2. For each x ∈ X, compute D(x), the distance of x from the closest center already chosen
  3. Select a point x from X as a new center with probability $\frac{D(x)^2}{\sum_{x \in X} D(x)^2}$
  4. Repeat steps 2 and 3 until k centers are chosen
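A sketch of just the k-means++ initialization step (the rest of the algorithm is unchanged); the function name is ours and X is assumed to be an (n, d) array:

import numpy as np

def kmeanspp_init(X, k, seed=0):
    """k-means++ initialization: each new center is sampled with probability
    D(x)^2 / sum over X of D(x)^2, where D(x) is the distance to the closest chosen center."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]                 # step 1: choose the first center at random
    for _ in range(1, k):
        c = np.array(centers)
        D2 = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2).min(axis=1) ** 2  # step 2
        probs = D2 / D2.sum()                           # step 3: probability proportional to D(x)^2
        centers.append(X[rng.choice(len(X), p=probs)])  # sample the index of the next center
    return np.array(centers)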

Illustration: say we want to create 3 clusters

[Figure: data points with the first center, the most likely second center, and the most likely third center marked]

Quiz: given the first two centers, what will be D(x) for the point x:
the length of the black line or the red line?


Clustering Graph/Network Data
Graph/Network Data

• So far, we have treated data as n-dimensional points

• What about the following?
  • Facebook friendship network
  • Communities in question-answering websites like Quora
  • Protein interaction networks

Facebook Friendship Graph/Network
Problem You Want to Solve

• Find coherent groups
  • friend circles in Facebook

• Group similar objects

• So, essentially, it is again a grouping problem, just with a different type of data


Example: Protein Interaction Network
What is a Graph?

• A graph is composed of two things:
  1. A set of objects (called the nodes of the graph)
  2. A set of connections (called the edges of the graph)

[Figure: a small graph with one node and one edge labelled]
Types of Graph
• Un-weighted
  • Edges do not have weights
  • Edges simply say whether two objects are connected or not

• Weighted
  • Edges have weights
  • Examples:
    • Email communication network
      • How often do two persons exchange emails?
    • City and road network
      • Edge weight: the distance between two cities
Graph Data

• Graph data captures relationships between pairs of objects

• Therefore, a centroid is often not meaningful for a graph
  • What would a centroid mean in your Facebook friend network?
  • How do you define the center of a graph?


Clustering Un-weighted Graph
Graph Clustering: Minimum Cut (or Mincut)

Min cut of a graph:

• Partition the nodes into two groups S_1 and S_2 so that the # of edges between
S_1 and S_2 is minimized

Example
For the above graph, the min cut partition is {a, b, e, f} and {c, d, g, h}
Graph Clustering using Mincut

• Use a min cut algorithm to break a graph into two sets

• Use the min cut algorithm to further break the smaller graphs

• Continue until the stopping condition is satisfied


Karger's Min Cut Algorithm

The algorithm is based on edge contraction:

• Repeat until just two nodes remain:
  1. Pick an edge at random
  2. Collapse its two endpoints into a single node

Karger's Min Cut Algorithm: Example

• 14 edges to choose from: pick b-f with prob 1/14
• 13 edges to choose from: pick g-h with prob 1/13
• 12 edges to choose from: pick d-gh with prob 2/12
• 10 edges to choose from: pick a-e with prob 1/10
• 9 edges to choose from: pick ae-bf with prob 4/9
• 5 edges to choose from: pick c-dgh with prob 3/5
• DONE: just two nodes remain

Min cut value: the # of parallel edges in the final two-node graph

For this example: the min cut is 2


Karger's Min Cut Algorithm

• An important note
  • It is a randomized algorithm
  • Therefore, for a good result do the following:
    1. Run Karger's algorithm multiple times (it will produce multiple cuts)
    2. Take the cut with the minimum value
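A minimal sketch of Karger's contraction with repeated trials, on a connected multigraph given as a list of (u, v) edge pairs; it tracks supernode labels instead of literally rebuilding the graph, which yields the same cut values (function names are ours):

import random

def karger_min_cut(edges, seed=0):
    """One run of Karger's algorithm: contract random edges until two supernodes remain."""
    rng = random.Random(seed)
    nodes = {u for e in edges for u in e}
    parent = {u: u for u in nodes}          # each node starts as its own supernode

    def find(u):                            # follow labels to the current supernode
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    remaining = len(nodes)
    while remaining > 2:
        u, v = rng.choice(edges)            # pick an edge at random
        ru, rv = find(u), find(v)
        if ru == rv:                        # already contracted together (self-loop), skip
            continue
        parent[ru] = rv                     # collapse the two endpoints into a single supernode
        remaining -= 1
    # cut value = number of edges that still cross between the two supernodes
    return sum(1 for u, v in edges if find(u) != find(v))

def repeated_karger(edges, trials=100):
    """Karger is randomized, so run it several times and keep the smallest cut found."""
    return min(karger_min_cut(edges, seed=t) for t in range(trials))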


Clustering Weighted Graph
Hierarchical Clustering algorithms
• Agglomerative (bottom-up)
  • Start with each object being a single cluster
  • Gradually merge the two most similar clusters

• Divisive (top-down)
  • Start with all objects in the same cluster
  • Then, in each step of the algorithm:
    • Partition a cluster into two smaller clusters, maximizing the distance between them
Hierarchical Clustering: Important Notes
1. Does not require the number of clusters in advance

2. Needs a termination/readout condition
   • Could be a distance threshold
Rest of the Lecture

• Hierarchical Agglomerative Clustering (HAC)
  • And for simple reasons:
    • Simpler
    • Widely used
Hierarchical Agglomerative Clustering (HAC)
• Define a similarity function for determining the similarity of two instances

• Start with every instance in a separate cluster

• Then repeatedly join the two clusters that are most similar

• The history of merging forms a hierarchy (called a dendrogram)



Hierarchical Clustering

• The important question

How do you determine the “nearness” of clusters?


Closest pair of clusters

Many variants for defining the closest pair of clusters (a code sketch follows this list):

• Single-link
  • Distance of the "closest" points

• Complete-link
  • Distance of the "furthest" points

• Average-link
  • Average distance between pairs of elements
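A small SciPy sketch showing all three linkage variants; the distance threshold and the toy data are arbitrary choices for illustration:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(20, 2))   # toy data

# 'single' = closest pair, 'complete' = furthest pair, 'average' = average pairwise distance
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method, metric="euclidean")    # merge history (the dendrogram)
    labels = fcluster(Z, t=1.0, criterion="distance")    # cut the hierarchy at a distance threshold
    print(method, labels)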
Single Link Agglomerative Clustering

• Use the maximum similarity (minimum distance) of pairs:

$sim(c_i, c_j) = \max_{x \in c_i,\, y \in c_j} sim(x, y)$

[Figure: Cluster 1 and Cluster 2 with the distances between pairs of points marked]

Distance between Cluster 1 and 2: 3


Single Link Example
Problem with Single Link Clustering

• Chained or elongated clusters

• Two points can be far apart, yet end up in the same cluster
Complete Link Agglomerative Clustering
• Use the minimum similarity (maximum distance) of pairs:

$sim(c_i, c_j) = \min_{x \in c_i,\, y \in c_j} sim(x, y)$

[Figure: the same Cluster 1 and Cluster 2 with the distances between pairs of points marked]

Distance between Cluster 1 and 2: 10

• Makes "tighter," spherical clusters that are typically preferable.
Complete Link Example
Example in Detail: Single Link Clustering
Example in Detail: Single Link Clustering
Example in Detail: Single Link Clustering
Example in Detail: Single Link Clustering
Spectral Clustering
Spectral Clustering
Spectral Clustering: Examples
Spectral Clustering

• Group points based on the links in a graph

Creating a Graph from n-dimensional data

• Use a Gaussian kernel to compute the similarity between objects i and j (sketched below)

• Possible graphs:
  • Fully connected
  • Connect only the k nearest neighbours
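A sketch of building such a similarity graph with a Gaussian kernel; the symmetrization rule for the k-nearest-neighbour variant is one common choice, not the only one, and the function name is ours:

import numpy as np

def gaussian_affinity(X, sigma=1.0, k_neighbors=None):
    """Affinity w_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)); fully connected by default,
    or restricted to each point's k nearest neighbours if k_neighbors is given."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                          # no self-loops
    if k_neighbors is not None:
        nearest = np.argsort(-W, axis=1)[:, :k_neighbors]
        mask = np.zeros_like(W, dtype=bool)
        mask[np.arange(len(X))[:, None], nearest] = True
        W = np.where(mask | mask.T, W, 0.0)           # keep an edge if either endpoint selects it
    return W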
Image Segmentation
Why not min cut?
Graph Partitioning
Useful Terminologies
Graph Cut
Normalized cut
Solving Normalized Cut

NP Hard!
Solving Normalized Cut: Approximate Solution

The second smallest eigenvector is the real-valued solution to this (relaxed) problem!
Two-way Normalized Cut
Partitioning using the 2nd Eigenvector

• The second eigenvector takes continuous values

• It is difficult to find a clear threshold to split on

• How to choose the splitting threshold? (see the sketch after this list)
  • Pick the median value as the splitting point
  • Or look for the splitting point that has the minimum Normalized cut value:
    1. Choose n possible splitting points
    2. Compute the Normalized cut value for each
    3. Pick the minimum
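A sketch of the two-way split using the second smallest eigenvector of the symmetric normalized Laplacian and the median as the splitting point (searching several thresholds for the minimum Normalized cut value would refine this); the function name is ours, and the usage line reuses the affinity sketch above:

import numpy as np

def normalized_cut_split(W):
    """Approximate two-way normalized cut of a weighted graph with affinity matrix W."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # symmetric normalized Laplacian: I - D^(-1/2) W D^(-1/2)
    L_sym = np.eye(len(W)) - (d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :])
    eigvals, eigvecs = np.linalg.eigh(L_sym)
    second = eigvecs[:, 1]                     # second smallest eigenvector (continuous values)
    return second > np.median(second)          # median split into the two groups

# usage sketch: groups = normalized_cut_split(gaussian_affinity(X, sigma=0.5))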
THANK YOU
