Clustering in Machine Learning - Javatpoint

This document discusses clustering in machine learning. Clustering groups unlabeled data points into clusters with similar characteristics. It is an unsupervised learning technique. The main types of clustering are partitioning, density-based, model-based, hierarchical, and fuzzy clustering. Popular clustering algorithms include k-means, mean-shift, DBSCAN, expectation maximization, and agglomerative hierarchical clustering. Clustering has applications in areas like cancer research, search engines, customer segmentation, and biology.

Uploaded by

mangotwin22


Clustering in Machine Learning

Clustering, or cluster analysis, is a machine learning technique that groups an unlabelled
dataset. It can be defined as "a way of grouping the data points into different clusters, consisting
of similar data points. The objects with the possible similarities remain in a group that has less or
no similarities with another group."

It does this by finding similar patterns in the unlabelled dataset, such as shape, size, color, or
behavior, and dividing the data points according to the presence or absence of those patterns.

It is an unsupervised learning method: no supervision is provided to the algorithm, and it
works with an unlabelled dataset.

After clustering is applied, each cluster or group is given a cluster ID. An ML system can use
this ID to simplify the processing of large and complex datasets.

The clustering technique is commonly used for statistical data analysis.

Note: Clustering is somewhat similar to classification; the difference is the type of dataset
used. In classification we work with a labelled dataset, whereas in clustering we work with an
unlabelled dataset.

Example: Consider a real-world shopping mall. When we visit a mall, we can observe that items with
similar uses are grouped together: t-shirts are in one section and trousers in another, and in the
fruit and vegetable section, apples, bananas, mangoes, etc., are kept in separate groups so that
we can find things easily. The clustering technique works in the same way. Another example of
clustering is grouping documents according to topic.

Clustering is used in a wide range of tasks. Some of the most common uses of this
technique are:

Market Segmentation

Statistical data analysis

Social network analysis

Image segmentation

Anomaly detection, etc.

Apart from these general uses, Amazon uses it in its recommendation system to recommend
products based on past searches, and Netflix uses it to recommend movies and web series to
its users based on their watch history.

[Figure: the working of a clustering algorithm. A mixed set of fruits is divided into several
groups with similar properties.]

Types of Clustering Methods

Clustering methods are broadly divided into hard clustering (each data point belongs to only one
group) and soft clustering (a data point can belong to more than one group). Several other
approaches to clustering also exist. Below are the main clustering methods used in
machine learning:

1. Partitioning Clustering

2. Density-Based Clustering

3. Distribution Model-Based Clustering

4. Hierarchical Clustering

5. Fuzzy Clustering

Partitioning Clustering
It is a type of clustering that divides the data into non-hierarchical groups. It is also known as the
centroid-based method. The most common example of partitioning clustering is the K-Means
Clustering algorithm.

In this type, the dataset is divided into k groups, where k is the number of groups, chosen in
advance. The cluster centers are placed so that each data point is closer to the centroid of its
own cluster than to the centroid of any other cluster.
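
The assign-then-update loop behind centroid-based partitioning can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `kmeans`, its parameters, and the sample points are assumptions made for the example, and the starting centroids are passed in by the caller to keep the run deterministic (practical implementations usually pick them randomly).

```python
import math

def kmeans(points, centroids, iterations=100):
    """Minimal k-means sketch: points and centroids are equal-length tuples."""
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:  # nothing moved, so the result is stable
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

The result depends on the starting centroids; a poor start can leave the algorithm in a worse grouping, which is why it is commonly run several times with different starts.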

Density-Based Clustering

The density-based clustering method connects highly dense areas into clusters, so clusters of
arbitrary shape can form as long as the dense regions can be connected. The dense areas in
the data space are separated from each other by sparser areas.

These algorithms can have difficulty clustering the data points if the dataset has varying
densities or high dimensionality.
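
As a rough illustration of how dense areas are connected into clusters, here is a compact sketch in the style of DBSCAN. The names `dbscan`, `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a dense point) follow common usage but are assumptions of this example, as is the sample data. Points in sparse areas are labelled -1 (noise).

```python
import math

def dbscan(points, eps, min_pts):
    """Return a cluster id (0, 1, ...) per point, or -1 for noise."""
    def neighbors(i):
        # All points within eps of point i (the point itself is included).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1  # point i is dense enough to start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise next to a dense point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is dense too: keep expanding
                queue.extend(reachable)
    return labels

# Hypothetical sample data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

No number of clusters is passed in; the two clusters emerge from the density parameters alone, and the isolated point is left out as noise.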

Distribution Model-Based Clustering

In the distribution model-based clustering method, the data is divided based on the probability
that each data point belongs to a particular distribution. The grouping is done by assuming some
distribution, most commonly the Gaussian distribution.

An example of this type is the Expectation-Maximization clustering algorithm, which uses
Gaussian Mixture Models (GMM).
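
To make the idea of "probability of belonging to a distribution" concrete, the following is a small Expectation-Maximization sketch for a one-dimensional Gaussian mixture. The function names, the starting parameters, and the sample data are all assumptions chosen for illustration; real data usually needs more careful initialization and numerical safeguards.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture; returns parameters and per-point responsibilities."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    resp = []
    for _ in range(iterations):
        # E-step: probability that each component generated each point.
        resp = []
        for x in data:
            dens = [weights[c] * gaussian_pdf(x, means[c], variances[c])
                    for c in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its responsibility-weighted points.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / nc
            variances[c] = max(var, 1e-6)  # floor keeps the variance from collapsing to 0
            weights[c] = nc / len(data)
    return means, variances, weights, resp

# Hypothetical sample data drawn around two centers (about 1 and about 5).
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
means, variances, weights, resp = em_gmm_1d(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
hard_labels = [max(range(2), key=lambda c: r[c]) for r in resp]
print(hard_labels)
```

Each row of `resp` is a soft assignment; taking the most probable component per point, as in `hard_labels`, turns it into a hard clustering.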

Hierarchical Clustering

Hierarchical clustering can be used as an alternative to partitioning clustering, as there is no
requirement to pre-specify the number of clusters to be created. In this technique, the dataset
is divided into clusters to create a tree-like structure, which is also called a dendrogram. Any
desired number of clusters can be obtained by cutting the tree at the appropriate level.
The most common example of this method is the Agglomerative Hierarchical algorithm.
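
The bottom-up merging that builds such a tree can be sketched as follows. This example assumes single linkage (the distance between two clusters is the distance between their closest members), which is only one of several possible linkage rules; the function name `agglomerative` and the sample points are likewise illustrative. Stopping at `n_clusters` plays the role of cutting the tree at a chosen level.

```python
import math

def agglomerative(points, n_clusters):
    """Single-linkage bottom-up clustering; clusters are lists of point indices."""
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []  # (cluster_a, cluster_b, distance): the recorded merge history

    def linkage(a, b):
        # Single linkage: distance between the two closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters that are currently closest to each other.
        pairs = ((x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters)))
        a, b = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        merges.append((clusters[a], clusters[b], linkage(clusters[a], clusters[b])))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters, merges

# Hypothetical sample data: two close pairs and one distant point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(points, n_clusters=3)
print(sorted(sorted(c) for c in clusters))
```

The `merges` list records which clusters were joined and at what distance, which is the information a dendrogram displays.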

Fuzzy Clustering

Fuzzy clustering is a type of soft method in which a data object may belong to more than one
group or cluster. Each data point has a set of membership coefficients, which express its degree
of membership in each cluster. The Fuzzy C-means algorithm is an example of this type of
clustering; it is sometimes also known as the Fuzzy k-means algorithm.
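
The membership coefficients can be illustrated with a short Fuzzy C-means sketch. It assumes the commonly used membership rule, in which a point's membership in a cluster falls off with its relative distance to that cluster's center, controlled by a fuzzifier `m` (here 2.0). The function names, starting centers, and data are assumptions for the example.

```python
import math

def memberships(points, centers, m):
    """Membership of every point in every cluster; each row sums to 1."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        # Clamp distances so a point lying exactly on a center cannot divide by zero.
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        u.append([1.0 / sum((d[i] / d[j]) ** power for j in range(len(centers)))
                  for i in range(len(centers))])
    return u

def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    dims = len(points[0])
    for _ in range(iterations):
        u = memberships(points, centers, m)
        new_centers = []
        for c in range(len(centers)):
            # Each center is the mean of all points, weighted by membership**m.
            w = [row[c] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * p[dim] for wi, p in zip(w, points)) / total
                for dim in range(dims)))
        centers = new_centers
    return memberships(points, centers, m), centers

# Hypothetical 1-D sample data: two groups plus one point halfway between them.
points = [(1.0,), (2.0,), (3.0,), (5.5,), (8.0,), (9.0,), (10.0,)]
u, centers = fuzzy_c_means(points, centers=[(0.0,), (11.0,)])
for p, row in zip(points, u):
    print(p, [round(value, 2) for value in row])
```

Points deep inside a group get a membership close to 1 for that group, while the point between the groups gets roughly equal membership in both, which is exactly what a hard method cannot express.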

Clustering Algorithms
Clustering algorithms can be grouped according to the models explained above. Many
clustering algorithms have been published, but only a few are commonly used. The
choice of algorithm depends on the kind of data being used. For example, some algorithms
need the number of clusters to be specified or estimated in advance, whereas others work from
the minimum distance between observations in the dataset.

Here we discuss the popular clustering algorithms that are widely used in machine
learning:

1. K-Means algorithm: The k-means algorithm is one of the most popular clustering
algorithms. It partitions the dataset by dividing the samples into clusters of roughly equal
variance. The number of clusters must be specified in advance. It is fast and requires few
computations: the cost of each iteration grows linearly with the number of samples, O(n).

2. Mean-shift algorithm: The mean-shift algorithm tries to find the dense areas in a smooth
density of data points. It is an example of a centroid-based model, which works by updating
candidate centroids to be the mean of the points within a given region.

3. DBSCAN algorithm: It stands for Density-Based Spatial Clustering of Applications with
Noise. It is an example of a density-based model similar to mean-shift, but with some
notable advantages. In this algorithm, areas of high density are separated by
areas of low density. Because of this, clusters of any arbitrary shape can be found.

4. Expectation-Maximization Clustering using GMM: This algorithm can be used as an
alternative to the k-means algorithm, or in cases where k-means may fail. In
GMM, it is assumed that the data points are drawn from a mixture of Gaussian distributions.

5. Agglomerative Hierarchical algorithm: The agglomerative hierarchical algorithm performs
bottom-up hierarchical clustering. Each data point is treated as a single cluster
at the outset, and clusters are then successively merged. The cluster hierarchy can be
represented as a tree structure.

6. Affinity Propagation: It differs from many other clustering algorithms in that it does not
require the number of clusters to be specified. Messages are exchanged between
pairs of data points until convergence. It has O(N²T) time complexity, where N is the number
of samples and T is the number of iterations, which is the main drawback of this algorithm.
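
The centroid update described for mean-shift (item 2 above) can be sketched for one-dimensional data. This sketch assumes a flat window: every point within `bandwidth` of the current candidate counts equally. The function name `mean_shift_1d`, the `bandwidth` value, and the data are illustrative assumptions.

```python
def mean_shift_1d(data, bandwidth, iterations=100, tol=1e-6):
    """Shift each point toward the mean of its window, then group shared modes."""
    modes = []
    for x in data:
        for _ in range(iterations):
            # Candidate centroid moves to the mean of the points inside its window.
            window = [p for p in data if abs(p - x) <= bandwidth]
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)

    # Points whose candidates ended up at (almost) the same place share a cluster.
    centers, labels = [], []
    for mode in modes:
        for idx, center in enumerate(centers):
            if abs(mode - center) < bandwidth:
                labels.append(idx)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return labels, centers

# Hypothetical sample data with two dense areas, around 1 and around 5.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels, centers = mean_shift_1d(data, bandwidth=1.0)
print(labels, centers)
```

As with DBSCAN, the number of clusters is not given up front; it follows from the window size.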

Applications of Clustering
Below are some commonly known applications of the clustering technique in machine learning:

In Identification of Cancer Cells: Clustering algorithms are widely used for the
identification of cancerous cells. They separate cancerous and non-cancerous data points into
different groups.

In Search Engines: Search engines also work on the clustering technique. The search results
appear based on the objects closest to the search query. This is done by grouping similar data
objects in one group that is far from other, dissimilar objects. The accuracy of the results for a
query depends on the quality of the clustering algorithm used.

Customer Segmentation: It is used in market research to segment customers based on
their choices and preferences.

In Biology: It is used in biology to classify different species of plants and
animals using image recognition techniques.

In Land Use: The clustering technique is used to identify areas of similar land use in
a GIS database. This can be very useful for finding the purpose for which a particular piece
of land is most suitable.
