Mean Shift Clustering
• Mean shift is a non-parametric clustering algorithm that offers more
flexibility in identifying clusters than centroid-based methods such as
k-means and does not require prior knowledge of the number of clusters.
• Mean shift clustering is a density-based clustering algorithm that
identifies the modes of a density function, which represent the
clusters.
• In other words, it finds the areas of the dataset where the probability
density function is the highest and clusters the data points in those
areas together.
• This method of clustering can be useful for identifying clusters of data
that may not be easily separated using other methods.
• In many cases, mean shift clustering finds a good clustering of the
data, and it is often used in practice for this reason.
How does Mean Shift Clustering work?
• The algorithm starts by initializing a window or kernel around each data
point.
• The kernel can be any type of function that decreases in value as the
distance from the center of the kernel increases.
• The most common kernel function used in mean shift clustering is the
Gaussian kernel.
• The algorithm then computes the mean shift vector for each data point.
• The mean shift vector represents the direction in which the density
function is increasing the most, and its magnitude represents the rate
of increase.
• The mean shift vector is computed as follows:

  m(xᵢ) = [ Σ_{xⱼ ∈ N(xᵢ)} K(xⱼ − xᵢ) · xⱼ ] / [ Σ_{xⱼ ∈ N(xᵢ)} K(xⱼ − xᵢ) ] − xᵢ

• where m(xᵢ) is the mean shift vector for data point xᵢ, K is the kernel
function, and N(xᵢ) is the set of data points within the window or
kernel centered at xᵢ.
• The mean shift algorithm then updates the position of each data
point by shifting it in the direction of the mean shift vector:

  xᵢ ← xᵢ + m(xᵢ)
• Once the algorithm converges, the clusters are defined as the sets of
data points that converge to the same mode of the density function
(a code sketch of this procedure follows this list).
• Hyperparameter of the Mean Shift algorithm:
• Bandwidth: It determines the size of the kernel (or window) used for
density estimation. A larger bandwidth will result in a smoother and
more spread-out density estimate, while a smaller bandwidth will
result in a more localized density estimate.
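A minimal Python sketch of this procedure is given below, assuming a Gaussian kernel; the function names (gaussian_kernel, mean_shift, group_modes), the stopping tolerance, and the mode-merging threshold are illustrative choices rather than part of the original material.

```python
import numpy as np

def gaussian_kernel(dist_sq, bandwidth):
    # Unnormalized Gaussian weight for a squared distance and bandwidth h
    return np.exp(-dist_sq / (2.0 * bandwidth ** 2))

def mean_shift(X, bandwidth=1.0, max_iter=100, tol=1e-4):
    # Repeatedly replace each point by the kernel-weighted mean of the data
    # around its current position (i.e., x_i <- x_i + m(x_i)) until the
    # largest shift falls below tol.
    X = np.asarray(X, dtype=float)
    points = X.copy()
    for _ in range(max_iter):
        new_points = np.empty_like(points)
        for i, x in enumerate(points):
            dist_sq = np.sum((X - x) ** 2, axis=1)         # squared distances to all data
            weights = gaussian_kernel(dist_sq, bandwidth)  # kernel weights K
            new_points[i] = weights @ X / weights.sum()    # kernel-weighted mean
        largest_shift = np.linalg.norm(new_points - points, axis=1).max()
        points = new_points
        if largest_shift < tol:
            break
    return points  # converged positions approximate the modes of the density

def group_modes(converged, merge_tol=0.5):
    # Points whose converged positions lie within merge_tol of each other
    # are treated as having reached the same mode and share a cluster label.
    modes, labels = [], []
    for p in converged:
        for k, m in enumerate(modes):
            if np.linalg.norm(p - m) < merge_tol:
                labels.append(k)
                break
        else:
            modes.append(p)
            labels.append(len(modes) - 1)
    return np.array(modes), np.array(labels)
```

Each update moves a point directly to the kernel-weighted mean of the data around it, which is the same as adding the mean shift vector m(xᵢ) to its current position.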
Choosing the Right Kernel
• Problem: Consider the following 2D data points:
• X={(1, 1), (1.5, 1.5), (2, 2), (5, 5), (5.5, 5.5), (6, 6)}. We will use a
Gaussian Kernel with a bandwidth (h) of 1.5 to compute the PDF and
perform Mean Shift clustering.
• Step 1: Decide Kernel Function: Gaussian Kernel
• Given n data points xᵢ, i = 1, …, n in d-dimensional space Rᵈ, the
multivariate kernel density estimate obtained with kernel K(x) and
window radius or bandwidth h is

  f̂(x) = (1 / (n·hᵈ)) Σᵢ₌₁ⁿ K((x − xᵢ) / h)
• Mean Shift vector:

  m(x) = [ Σᵢ₌₁ⁿ xᵢ · K((x − xᵢ) / h) ] / [ Σᵢ₌₁ⁿ K((x − xᵢ) / h) ] − x

• The mean shift vector always points toward the direction of the maximum
increase in the density.
• The mean shift procedure, obtained by successively computing the mean shift
vector and translating the kernel window by that vector, is guaranteed to
converge to a point where the gradient of the density function is zero.
• In Mean Shift Clustering, every data point iteratively moves towards the
centroid (or mode) of its local neighborhood, and eventually, all points
converge to high-density regions. These high-density regions represent the
final centroids of the clusters.
• The final positions of the points are the final centroids of the clusters.
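To connect this to the worked example, the sketch below applies the mean_shift and group_modes functions from the earlier sketch to the six given points with h = 1.5; with this bandwidth the three points near (1.5, 1.5) and the three near (5.5, 5.5) would be expected to converge to two separate modes (two clusters), though the exact converged coordinates depend on the kernel and the stopping tolerance.

```python
import numpy as np

# The six example points and the bandwidth h = 1.5 from the problem statement
X = np.array([[1.0, 1.0], [1.5, 1.5], [2.0, 2.0],
              [5.0, 5.0], [5.5, 5.5], [6.0, 6.0]])

converged = mean_shift(X, bandwidth=1.5)   # mean_shift from the earlier sketch
modes, labels = group_modes(converged)

print("Converged positions:\n", np.round(converged, 3))
print("Estimated modes (final centroids):\n", np.round(modes, 3))
print("Cluster labels:", labels)
```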
Step-by-Step Explanation of the problem
How Final Centroids Are Obtained
Conclusion
Advantages of Mean Shift Clustering over K-Means Clustering
• No need to specify the number of clusters (k) as a hyperparameter.
The algorithm automatically learns the number of clusters from the
data (see the short example after the disadvantages below).
• Mean-Shift can find clusters of arbitrary shapes. It can handle
complex cluster structures. So, we don’t need to make any
assumptions on the shape of clusters.
• Mean-Shift tends to be more robust to noise and outliers in the data.
Unlike K-Means, it does not rely on distances to the centroids of the
clusters. Instead, it relies on the density of the data points.
• In K-Means, we assume all clusters are roughly the same size. Mean
Shift can handle clusters of varying sizes because it focuses on the
density of points.
• Mean Shift can handle clusters that are not well separated by linear
decision boundaries.
• The output does not depend on the random initialization of clusters.
Disadvantages of Mean-Shift Clustering
• Mean-Shift is computationally expensive (roughly O(n²) per iteration).
• We still need to choose the bandwidth, i.e., the radius of the region
searched around each point when shifting data points toward cluster modes.
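As a brief illustration of the first advantage listed above (no need to specify k), the scikit-learn sketch below lets mean shift discover the number of clusters on synthetic data; the make_blobs settings and the quantile passed to estimate_bandwidth are illustrative assumptions, and scikit-learn's MeanShift implementation uses a flat kernel rather than a Gaussian one.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated blobs; the algorithm is never told k
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Bandwidth is the only hyperparameter; estimate_bandwidth picks one from the data
bandwidth = estimate_bandwidth(X, quantile=0.2)

ms = MeanShift(bandwidth=bandwidth).fit(X)
print("Number of clusters found:", len(np.unique(ms.labels_)))
print("Cluster centers:\n", ms.cluster_centers_)
```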
Scenarios in Which to Use Mean Shift and DBSCAN
Comparison of K-Means, Hierarchical, DBSCAN, and Mean Shift Clustering