Clustering in Machine Learning
Clustering is widely used across many tasks. Some of the most common
uses of this technique are:
o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.
Apart from these general uses, Amazon uses clustering in its
recommendation system to suggest products based on a user's past
searches. Netflix also uses this technique to recommend
movies and web series to its users based on their watch history.
The diagram below illustrates how a clustering algorithm works: different
fruits are divided into several groups with similar properties.
Clustering methods are broadly divided into the following types:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
Partitioning Clustering
Partitioning clustering divides the data into non-hierarchical groups. It
is also known as the centroid-based method. The most common example
of partitioning clustering is the K-Means clustering algorithm.
In this type, the dataset is divided into a set of K groups, where K is the
number of groups, chosen in advance. The cluster centers are placed so that
each data point is closer to the centroid of its own cluster than to the
centroid of any other cluster.
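The assign-then-update idea behind K-Means can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the function name kmeans, the sample points, and the choice to seed the centroids with the first K points are all made up for this example (real implementations usually pick the starting centroids more carefully).

```python
import math

def kmeans(points, k, iterations=100):
    """Minimal K-Means (Lloyd's algorithm) on a list of coordinate tuples."""
    # Assumption for this sketch: the first k points seed the centroids.
    centroids = [points[i] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Centroids stopped moving, so the assignment is stable.
        centroids = new_centroids
    return labels, centroids

points = [(1.0, 1.0), (5.0, 5.0), (1.0, 2.0), (5.0, 6.0), (2.0, 1.0), (6.0, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

After the run, every point is closer to its own centroid than to the other one, which is the property described above.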
Density-Based Clustering
The density-based clustering method connects highly dense areas into
clusters, so clusters of arbitrary shape can form as long as the
dense regions can be connected. The dense areas in the data space are
separated from each other by sparser areas.
These algorithms can have difficulty clustering the data points if the
dataset has varying densities and high dimensionality.
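One way to see how dense regions get connected is a small sketch of DBSCAN, a well-known density-based algorithm. The text above does not name a specific algorithm, so DBSCAN is chosen here only as an illustration; the function name dbscan, the parameters eps and min_pts, and the sample points are assumptions for this example.

```python
import math

NOISE = -1  # Label used for points that lie in sparse areas.

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns one cluster label per point."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # Already visited.
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # Not dense enough to start a cluster.
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # Border point reached from a dense point.
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is dense too, so keep expanding.
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point (5, 5) sits in a sparse area between the two dense groups, so it is labeled as noise rather than forced into a cluster.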
Distribution Model-Based Clustering
In the distribution model-based clustering method, the data is divided based
on the probability that a data point belongs to a particular distribution. The
grouping is done by assuming some distribution, most commonly the Gaussian
distribution.
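As a rough illustration, the sketch below fits a mixture of one-dimensional Gaussian distributions with the expectation-maximization (EM) procedure and then assigns each value to the distribution it most probably belongs to. The text above does not prescribe a fitting procedure, so EM is an assumption here, as are the function names, the sample data, the starting means, and the initial variance of 1.0.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, initial_means, iterations=50):
    """Minimal EM sketch for a 1-D Gaussian mixture (no safeguards for bad starts)."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k      # Assumed starting variance.
    weights = [1.0 / k] * k    # Start with equally likely components.
    resp = []
    for _ in range(iterations):
        # E-step: probability that each value belongs to each component.
        resp = []
        for x in data:
            dens = [weights[c] * gaussian_pdf(x, means[c], variances[c])
                    for c in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted values.
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            weights[c] = n_c / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / n_c
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / n_c
            variances[c] = max(var, 1e-6)  # Floor keeps the variance positive.
    return weights, means, variances, resp

data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
weights, means, variances, resp = fit_gmm_1d(data, initial_means=[0.0, 6.0])

# Hard assignment: pick the most probable component for each value.
labels = [max(range(len(r)), key=lambda c: r[c]) for r in resp]
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Unlike K-Means, the model keeps a probability for every point and component (the resp rows), and the hard labels are only derived from them at the end.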
Fuzzy Clustering
Fuzzy clustering is a soft clustering method in which a data object may belong
to more than one group or cluster. Each data object has a set of membership
coefficients, which express its degree of membership in each
cluster. The Fuzzy C-means algorithm is an example of this type of clustering;
it is sometimes also known as the Fuzzy K-means algorithm.
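The membership coefficients can be made concrete with a minimal Fuzzy C-means sketch using the standard update rules. The function name fuzzy_c_means, the fuzzifier value m = 2, the starting centers, and the sample points are assumptions chosen for this example.

```python
import math

def fuzzy_c_means(points, initial_centers, m=2.0, iterations=50):
    """Minimal Fuzzy C-means sketch: returns membership rows and centers."""
    centers = [tuple(c) for c in initial_centers]
    memberships = []
    for _ in range(iterations):
        # Membership update: closer centers receive a larger coefficient.
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if 0.0 in dists:
                # The point sits exactly on a center: full membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
            else:
                row = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(row)
            memberships.append([v / total for v in row])
        # Center update: mean of all points weighted by membership ** m.
        centers = []
        for c in range(len(initial_centers)):
            w = [row[c] ** m for row in memberships]
            centers.append(tuple(
                sum(wi * x for wi, x in zip(w, axis)) / sum(w)
                for axis in zip(*points)
            ))
    return memberships, centers

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (5, 5)]
memberships, centers = fuzzy_c_means(points, initial_centers=[(0, 0), (10, 10)])

for p, row in zip(points, memberships):
    print(p, [round(u, 3) for u in row])
```

Each row of coefficients sums to 1. Points near one group get a coefficient close to 1 for that cluster, while the in-between point (5, 5) keeps a substantial membership in both.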
Clustering Algorithms
Clustering algorithms can be grouped according to the models
explained above. Many clustering algorithms have been
published, but only a few are commonly used. The choice of algorithm
depends on the kind of data being used. For example, some algorithms
require the number of clusters in the dataset to be specified or estimated
in advance, whereas others work by finding the minimum distance between
the observations in the dataset.
The following are the most popular clustering algorithms that are widely
used in machine learning:
Applications of Clustering
Below are some commonly known applications of the clustering technique in
machine learning: