Unit – III
Unsupervised Learning, Clustering, Support Vector Machines
• Parametric methods are used when the data is known to follow a distribution
• Use the data to estimate the parameters of the distribution
– Typically few in number
• May be too rigid in some cases – i.e., always assumes the same distribution
Clustering
• (Q) Supervised or unsupervised?
K-Means Clustering
• A method to assign each sample to a cluster
• Minimize the overall cost = reconstruction error = total distance between samples and their cluster “centers”
• Choosing a cluster for each sample: assign it to the nearest center (see the rule below)
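In the standard K-means notation (a conventional formulation; the symbols b_i^t, x^t, m_i are assumed here, not taken from the slide), the assignment rule is

b_i^t = 1 \text{ if } \lVert x^t - m_i \rVert = \min_j \lVert x^t - m_j \rVert, \text{ and } b_i^t = 0 \text{ otherwise}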
K-Means Clustering
• Once new centers are obtained, recalculate the assignments
• Repeat with the new means and re-assign samples
• Continue till there is no change in the centers (a sketch of the full loop follows)
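A minimal sketch of the loop described above, assuming 1-D data and Euclidean distance (the samples and k below are illustrative):

import random

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.1, 9.0, 8.8]   # illustrative samples
k = 3
centers = random.sample(data, k)              # choose a random set of cluster centers

while True:
    # assignment step: each sample goes to its nearest center
    clusters = [[] for _ in range(k)]
    for x in data:
        i = min(range(k), key=lambda j: abs(x - centers[j]))
        clusters[i].append(x)
    # update step: each center becomes the mean of its assigned samples
    new_centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    if new_centers == centers:                # stop when the centers no longer change
        break
    centers = new_centers

print(centers)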
K-Means Clustering
• Choose a random set of cluster centers
• The reconstruction error should be minimized – differentiate it with respect to each center and equate to zero
• This gives the new cluster centers – nothing but the mean of the samples assigned to each cluster (see the derivation below)
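In the same assumed notation as above, the reconstruction error and the resulting update are (a standard derivation, not copied from the slide):

E(\{m_i\}) = \sum_t \sum_i b_i^t \, \lVert x^t - m_i \rVert^2, \qquad \frac{\partial E}{\partial m_i} = 0 \;\Rightarrow\; m_i = \frac{\sum_t b_i^t \, x^t}{\sum_t b_i^t}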
Leader Clustering
• Some outliers may skew the means
• Add a parameter “t” – a maximum distance
– If a sample is not within t of any cluster, it becomes a new cluster head (a sketch follows this list)
• Recalculate with the new means and continue
• (Q) What is the value of t for which leader clustering becomes K-means?
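A minimal sketch of the leader step, assuming 1-D data (the samples and the threshold t are illustrative):

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.1, 9.0, 8.8, 25.0]   # last value is an outlier
t = 2.0                        # maximum allowed distance to an existing leader

leaders = []                   # cluster heads
clusters = []                  # samples assigned to each head
for x in data:
    # index of the nearest existing leader, if any
    nearest = min(range(len(leaders)), key=lambda j: abs(x - leaders[j]), default=None)
    if nearest is None or abs(x - leaders[nearest]) > t:
        leaders.append(x)      # not within t of any cluster: the sample starts a new cluster
        clusters.append([x])
    else:
        clusters[nearest].append(x)

# recalculate each leader as the mean of its cluster, and one could then continue iterating
leaders = [sum(c) / len(c) for c in clusters]
print(leaders)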
EM Algorithm
Expectation Maximization (EM) Algorithm
• Unsupervised data – only data, no class labels
• Hence, the data consists of two parts – observables X and unknowns Z
• E-step: estimate the unknowns Z given our current knowledge of the components
• M-step: update the component parameters given the estimates from the E-step
– (Q) Is K-Means also a type of EM algorithm?
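In the usual EM formulation (standard notation, assumed here rather than taken from the slide), with parameters \theta:

\mathcal{Q}(\theta \mid \theta^{(k)}) = \mathbb{E}_{Z \mid X, \theta^{(k)}}\left[\log p(X, Z \mid \theta)\right] \quad \text{(E-step)}, \qquad \theta^{(k+1)} = \arg\max_{\theta} \mathcal{Q}(\theta \mid \theta^{(k)}) \quad \text{(M-step)}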
EM Example
• Example: missing normally distributed data
• Assume a Gaussian variable
• {5, 11, x, x}
• Start: choose a random value for x
• E-step: calculate the mean of the data using the current value of x
• M-step: replace x with the new mean from the previous step
• Continue till convergence (will it converge?? – see the sketch below)
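A minimal sketch of this iteration (assuming both missing entries are imputed with the current mean; the starting guess is arbitrary):

observed = [5.0, 11.0]
x = 100.0                            # arbitrary starting value for the missing entries
for step in range(20):
    data = observed + [x, x]
    mean = sum(data) / len(data)     # E-step: mean under the current imputation
    x = mean                         # M-step: re-impute the missing value with that mean
    print(step, round(x, 4))
# x converges to 8.0, which is the mean of the observed values {5, 11}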
Coin Example
Sample EM Algorithm
• Consider the six samples and the true likelihoods of the two classes
• Problem setting in EM
– What we have: K (i.e., the number of classes), the form of the distribution, and the data points
– What we don’t have: the parameters of the distributions and the class labels
• How to get the two class means?
Sample EM Algorithm
• (Q) Outline the application of the EM algorithm for
Gaussian Mixtures.
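A hedged sketch of EM for a two-component 1-D Gaussian mixture, in answer to the outline question above (the six sample values are illustrative assumptions, not the values from the slide):

import math

samples = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]    # hypothetical data
K = 2
weights = [0.5, 0.5]                         # initial mixing weights
mu = [1.0, 9.0]                              # initial means
var = [1.0, 1.0]                             # initial variances

def gauss(x, m, v):
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

for _ in range(50):
    # E-step: responsibility of each component for each sample
    resp = []
    for x in samples:
        p = [weights[k] * gauss(x, mu[k], var[k]) for k in range(K)]
        s = sum(p)
        resp.append([pk / s for pk in p])
    # M-step: re-estimate weights, means and variances from the responsibilities
    for k in range(K):
        nk = sum(r[k] for r in resp)
        weights[k] = nk / len(samples)
        mu[k] = sum(r[k] * x for r, x in zip(resp, samples)) / nk
        var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, samples)) / nk

print(mu, var, weights)                      # the two class means, variances and weights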
Hierarchical Clustering
Agglomerative Clustering
• Agglomerate – collect into a group
• Idea: start with each sample as a separate cluster
– Bottom-up approach
• Start by combining samples that are close to each other
– Needs a distance, e.g., for documents, map words to vectors and use Euclidean distance
• Each iteration – combine the two closest clusters (i.e., agglomerate them)
• Continue until only one group is left
Agglomerative Clustering
• Consider the second iteration of agglomerative clustering – i.e., suppose all groups have two elements
• How to combine two groups – what is the distance between two groups with two elements each?
– Single link clustering – calculate all four (how?) pairwise distances; the smallest is the distance between the groups
– Complete link clustering – the distance between clusters is the largest distance over all pairs (see the formulas below)
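In standard linkage notation (a conventional formulation, not copied from the slide), for clusters A and B:

d_{\text{single}}(A, B) = \min_{a \in A,\; b \in B} d(a, b), \qquad d_{\text{complete}}(A, B) = \max_{a \in A,\; b \in B} d(a, b)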
Agglomerative Clustering
• Perform agglomerative clustering and show the intermediate steps on the following pairwise distance matrix (a sketch for checking the steps follows the table):

      1   2   3   4
  1   0   7   4   8
  2       0   2   1
  3           0   8
  4               0
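A hedged SciPy sketch for checking the intermediate steps (assuming the table above is a symmetric distance matrix over samples 1–4):

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# full symmetric distance matrix built from the exercise above
D = np.array([[0, 7, 4, 8],
              [7, 0, 2, 1],
              [4, 2, 0, 8],
              [8, 1, 8, 0]], dtype=float)

condensed = squareform(D)                  # condensed (upper-triangular) form
Z = linkage(condensed, method='single')    # single-link agglomerative clustering
print(Z)                                   # each row: the two clusters merged, merge distance, new size

With single link, the merges should come out as {2, 4} at distance 1, then {2, 3, 4} at distance 2, then all four samples at distance 4.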
Dendrogram
• Used to visualize the clusters
• Clusters are joined at a “height” – based on the linkage type
• Two clusters combined into one – an agglomeration
• Can be “cut” at a height h so that samples are not grouped beyond that distance
• (Q) Draw the dendrogram using single link clustering and cut it at height 0.55. What are the resulting clusters? (A sketch for checking this follows.)
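A hedged SciPy sketch of drawing and cutting a dendrogram (the data for this question is not shown, so the distance matrix from the earlier exercise is reused; with that matrix, a cut at 0.55 leaves every sample in its own cluster, since the smallest merge distance is 1):

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# condensed distances d(1,2), d(1,3), d(1,4), d(2,3), d(2,4), d(3,4) from the exercise above
condensed = [7.0, 4.0, 8.0, 2.0, 1.0, 8.0]
Z = linkage(condensed, method='single')    # single-link merges

dendrogram(Z, labels=['1', '2', '3', '4']) # merge distances become the joining heights in the tree
plt.axhline(y=0.55, linestyle='--')        # the cut height from the question
plt.show()

clusters = fcluster(Z, t=0.55, criterion='distance')   # flat clusters below the cut height
print(clusters)                            # one cluster label per sample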
Kernelised SVM
Kernels
• Example: image kernels for feature extraction (a small sketch follows)
– Types for feature extraction: Identity, Blur, Edge Detection, Sharpening
– Operations: Convolution, Pooling, Dilation, Erosion, Cross-correlation
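A hedged sketch applying two of these kernel types by convolution (the 3×3 kernel values are the commonly used ones, assumed here rather than taken from the slide):

import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(8, 8)                       # toy grayscale image

identity = np.array([[0, 0, 0],
                     [0, 1, 0],
                     [0, 0, 0]], dtype=float)      # identity kernel: leaves the image unchanged
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)    # common sharpening kernel

out_identity = convolve2d(image, identity, mode='same')
out_sharpen = convolve2d(image, sharpen, mode='same')   # emphasizes local contrast

print(np.allclose(out_identity, image))            # True: convolving with the identity is a no-op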