
Clustering in ML

 Clustering, or cluster analysis, is a machine learning technique that groups an unlabeled dataset.
 It is an unsupervised learning method, since it works with unlabeled data.
 It can be defined as "a way of grouping the data points into different clusters, each consisting of similar data points. Objects with possible similarities remain in a group that has few or no similarities with any other group."
 It works by finding similar patterns in the unlabeled dataset, such as shape, size, color, or behavior, and divides the data according to the presence or absence of those patterns.
 After applying a clustering technique, each cluster or group is assigned a cluster ID, which an ML system can use to simplify the processing of large and complex datasets.
Hard clustering - each data point belongs to exactly one cluster.
Soft clustering - each data point can belong to more than one cluster.
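The distinction can be seen side by side in a short sketch (assuming scikit-learn is available; any library with both hard and soft clusterers would do): K-Means assigns each point a single label, while a Gaussian mixture returns a membership probability for every cluster.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Two well-separated blobs of points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

# Hard clustering: each point gets exactly one cluster label.
hard_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Soft clustering: each point gets a probability for every cluster.
soft_probs = GaussianMixture(n_components=2, random_state=0).fit(X).predict_proba(X)

print(hard_labels[:3])          # one label per point
print(soft_probs[:3].round(3))  # per-point probabilities that sum to 1
```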
• Partitioning Clustering : It is a type of clustering that divides the data into non-hierarchical groups. It is also known as the
centroid-based method. The most common example of partitioning clustering is the K-Means Clustering algorithm.

• The density-based clustering method connects highly dense areas into clusters, so arbitrarily shaped clusters can form as long as the dense regions are connected. The algorithm identifies dense regions in the dataset and joins areas of high density into clusters; points in sparse regions are treated as noise. DBSCAN is the most common example.
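A minimal sketch with scikit-learn's DBSCAN (an assumed library choice): dense neighborhoods become clusters, and an isolated point is labeled -1 (noise).

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
# Two dense blobs plus one isolated outlier.
X = np.vstack([rng.normal(0, 0.3, (40, 2)),
               rng.normal(6, 0.3, (40, 2)),
               [[3.0, 3.0]]])

# eps: neighborhood radius; min_samples: density threshold for a core point.
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
print(sorted(set(labels)))  # noise points are labeled -1
```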

• In the distribution model-based clustering method, the data is divided based on the probability that a data point belongs to a particular distribution. The grouping is done by assuming some distribution, most commonly the Gaussian distribution.

• An example of this type is the Expectation-Maximization clustering algorithm, which uses Gaussian Mixture Models (GMM).
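As a sketch (assuming scikit-learn), `GaussianMixture` fits a GMM by Expectation-Maximization and recovers the component distributions from the data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Data drawn from two 1-D Gaussians with different means.
X = np.vstack([rng.normal(-3, 1.0, (100, 1)),
               rng.normal(3, 1.0, (100, 1))])

# EM alternates between assigning soft responsibilities (E-step)
# and re-estimating component parameters (M-step).
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(sorted(gmm.means_.ravel().round(1)))  # recovered means near -3 and 3
probs = gmm.predict_proba(X)                # soft cluster memberships
```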

• In hierarchical clustering, the dataset is divided into clusters to create a tree-like structure, also called a dendrogram. Any number of clusters can be selected by cutting the tree at the appropriate level. The most common example of this method is the agglomerative hierarchical algorithm.
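A sketch using SciPy's hierarchical-clustering routines (an assumption about the toolchain): `linkage` builds the agglomerative merge tree, and `fcluster` "cuts" it at a chosen number of clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.4, (30, 2)),
               rng.normal(5, 0.4, (30, 2))])

# Bottom-up (agglomerative) merge tree; 'ward' minimizes within-cluster variance.
Z = linkage(X, method="ward")

# Cut the dendrogram so that at most 2 flat clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")
print(sorted(set(labels)))  # e.g. [1, 2]
```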

• Fuzzy clustering is a soft method in which a data object may belong to more than one group or cluster. Each data point has a set of membership coefficients that reflect its degree of membership in each cluster. The fuzzy c-means algorithm is an example of this type of clustering; it is sometimes also known as the fuzzy k-means algorithm.
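Fuzzy c-means is not in scikit-learn, so the update rules can be sketched directly in NumPy (a minimal illustration; the function name and defaults are my own, with the usual fuzzifier m = 2):

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means sketch: returns (centroids, membership matrix U)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)         # memberships sum to 1 per point
    for _ in range(n_iter):
        W = U ** m                             # fuzzified memberships
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        d = np.fmax(d, 1e-10)                  # avoid division by zero
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))
        U = 1.0 / ratio.sum(axis=2)
    return centroids, U

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.5, (40, 2)),
               rng.normal(5, 0.5, (40, 2))])
centroids, U = fuzzy_c_means(X, c=2)
print(U[0].round(3))  # memberships of the first point across both clusters
```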
K-Means clustering

• K-Means is the clustering technique that tries to minimize the distance between the points in a cluster and their centroid.
• K-Means clustering is an unsupervised, iterative clustering technique.
• It partitions the given dataset into k predefined, distinct clusters.
• A cluster is defined as a collection of data points exhibiting certain similarities.
• K-Means is a centroid-based (distance-based) algorithm: distances are calculated to assign each point to a cluster, and each cluster is associated with a centroid.
• The main objective of the K-Means algorithm is to minimize the sum of distances between the points and their respective cluster centroids.
K-Means ….
It partitions the data set such that:
• Each data point belongs to the cluster with the nearest mean.
• Data points belonging to the same cluster have a high degree of similarity.
• Data points belonging to different clusters have a high degree of dissimilarity.
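These properties can be seen in a short scikit-learn sketch (an assumed library choice); `inertia_` is the objective above, the sum of squared distances from each point to its cluster centroid.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_.round(1))  # one centroid per cluster
print(km.inertia_)                   # sum of squared distances to centroids
```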
Stopping Criteria for K-Means Clustering

• There are essentially three stopping criteria that can be adopted to stop the K-means algorithm:
1. Centroids of newly formed clusters do not change

2. Points remain in the same cluster

3. Maximum number of iterations is reached
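All three criteria appear in a from-scratch NumPy sketch (illustrative only, not a production implementation): the loop ends when assignments stop changing, when centroids stop moving, or when the iteration budget runs out.

```python
import numpy as np

def kmeans(X, k=2, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]  # random initial centroids
    labels = np.full(len(X), -1)
    for _ in range(max_iter):                            # criterion 3: max iterations
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        new_labels = d.argmin(axis=1)                    # assign to nearest centroid
        new_centroids = np.array([X[new_labels == j].mean(axis=0)
                                  if np.any(new_labels == j) else centroids[j]
                                  for j in range(k)])
        if np.array_equal(new_labels, labels):           # criterion 2: points stay put
            break
        if np.allclose(new_centroids, centroids):        # criterion 1: centroids fixed
            break
        labels, centroids = new_labels, new_centroids
    return centroids, labels

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])
centroids, labels = kmeans(X)
```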
