Unsupervised Learning: Clustering Techniques

The document discusses unsupervised learning techniques, specifically clustering. It describes different types of clustering methods like partitioning, hierarchical and density-based clustering. It also explains the widely used K-means clustering algorithm and how it works through iterative steps of assigning data points to centroids and recalculating the centroids.

ADITYA COLLEGE OF ENGINEERING & TECHNOLOGY

Machine Learning
UNIT-4
Unsupervised Learning Techniques
By
B Manikyala Rao M.Tech(Ph.d)
Senior Assistant Professor
Dept of Computer Science & Engineering
Aditya College of Engineering & Technology
Surampalem
Clustering in Machine Learning
➢Clustering is a way of grouping data points into clusters of similar data
points. Objects that are similar to one another stay in the same group, which
has little or no similarity to the other groups.
➢It is an unsupervised learning method: no supervision is provided to the
algorithm, and it works on an unlabeled dataset.
➢After clustering, each cluster or group is given a cluster ID. An ML system
can use this ID to simplify the processing of large and complex datasets.
Example: Consider a shopping mall. Items with similar uses are grouped
together: t-shirts are in one section and trousers in another, and in the
produce section apples, bananas, mangoes, etc. each have their own area, so
that shoppers can find things easily. Clustering works in the same way.
Clustering

Clustering is used in a wide range of tasks. Some of the most common uses
are:
• Market segmentation
• Statistical data analysis
• Social network analysis
• Image segmentation
• Anomaly detection, etc.
Apart from these general uses, Amazon uses it in its recommendation system to
recommend products based on past searches, and Netflix uses it to recommend
movies and web series based on watch history.


Types of Clustering Methods

• Partitioning Clustering: It divides the data into non-hierarchical groups
and is also known as the centroid-based method. The most common example of
partitioning clustering is the K-Means clustering algorithm.
• Density-Based Clustering
• Distribution Model-Based Clustering
• Hierarchical Clustering
• Fuzzy Clustering


Applications of Clustering
• Identification of cancer cells: Clustering algorithms are widely used to
identify cancerous cells. They divide cancerous and non-cancerous data into
different groups.
• Search engines: Search engines also rely on clustering. Results are returned
based on the objects closest to the search query, which is done by grouping
similar data objects together, far from dissimilar ones. The accuracy of the
results depends on the quality of the clustering algorithm used.
• Customer segmentation: It is used in market research to segment customers
based on their choices and preferences.
• Biology: It is used to classify different species of plants and animals
using image recognition techniques.
• Land use: Clustering is used to identify areas of similar land use in a GIS
database. This helps to determine the purpose for which a particular piece of
land is most suitable.


K-Means Clustering Algorithm

• K-Means Clustering is an unsupervised learning algorithm that groups an
unlabeled dataset into clusters. Here K is the number of predefined clusters to
be created: if K=2 there will be two clusters, for K=3 there will be three
clusters, and so on.
• It is an iterative algorithm that divides the unlabeled dataset into K
clusters in such a way that each data point belongs to only one group of points
with similar properties.
• The algorithm takes the unlabeled dataset as input, divides it into K
clusters, and repeats the process until the clusters stop changing. The value
of K must be chosen in advance.
The k-means clustering algorithm mainly performs two tasks:
• Determines the best positions for the K center points, or centroids, by an iterative process.
• Assigns each data point to its closest centroid. The data points nearest to
a particular centroid form a cluster.


How does the K-Means Algorithm Work?

The K-Means algorithm works in the following steps:

• Step-1: Choose the number of clusters, K.
• Step-2: Select K random points as initial centroids. (They need not come from the input dataset.)
• Step-3: Assign each data point to its closest centroid, which forms the K clusters.
• Step-4: Compute the mean of the points in each cluster and place that cluster's new centroid there.
• Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid.
• Step-6: If any reassignment occurred, go to Step-4; otherwise go to FINISH.
• Step-7: The model is ready.
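The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the function name kmeans, the n_iter cap, the seed argument, and the two made-up blobs of points are all assumptions introduced here, and the initial centroids are drawn from the data points themselves (one of the options Step-2 allows).

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means on an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Step 3 / Step 5: assign every point to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 6: stop as soon as no point changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated blobs of made-up 2-D points.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)
```

On data this cleanly separated, the two centroids should end up near the centers of the two blobs; on harder data the result can depend on the random starting points.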


Suppose we have two variables, M1 and M2, shown as an x-y scatter plot.
(Figure: scatter plot of M1 against M2)

• Let's take K=2, i.e., we will try to group the data points into two
clusters.
• We need to choose K random points as initial centroids. These can be points
from the dataset or any other points. Here we select two points that are not
part of our dataset.
(Figure: scatter plot with the two initial centroids)


Algorithm
Now we assign each data point of the scatter plot to its closest centroid, using the usual formula for the
distance between two points. With two centroids, this amounts to drawing a dividing line between them (the
perpendicular bisector of the segment joining the centroids).
(Figure: scatter plot with the dividing line between the two centroids)

Points to the left of the line are nearer to K1, the blue centroid, and points to the right of the line are
nearer to the yellow centroid. Let's color them blue and yellow for clear visualization.


To find the best clusters, we repeat the process with new centroids. To
choose them, we compute the center of gravity (mean) of the data points in
each cluster and move that cluster's centroid there.
(Figure: the new centroids)

Next, we reassign each data point to its closest new centroid. For this, we
again draw the dividing line (perpendicular bisector) between the two
centroids.
(Figure: the new dividing line)


• One yellow point now lies on the left side of the line, and two blue points lie on the right. These
three points are reassigned to the other cluster.

• Since reassignment has taken place, we go back to Step-4 and find new centroids.
• We again compute the center of gravity of the points in each cluster to obtain the new centroids.
(Figure: the updated centroids)


With the new centroids, we again draw the dividing line between them and reassign the data points.
(Figure: data points and the updated dividing line)

No data point is now on the wrong side of the line, so no reassignment occurs and the model is formed.

As the model is ready, we can remove the assumed centroids, leaving the two final clusters.
(Figure: the two final clusters)


Limits of K Means
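The most important limitations of simple k-means are:
• The user has to specify k (the number of clusters) in advance.
• k-means can only handle numerical data.
• k-means assumes that the clusters are spherical and that each cluster has
roughly the same number of observations.
Implementation:
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
# Toy data so the snippet runs on its own (an assumption; use your own X).
X, _ = make_blobs(n_samples=500, centers=5, random_state=42)
k = 5
kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
y_pred = kmeans.fit_predict(X)  # cluster index (0 to k-1) for each row of X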


Semi-Supervised Cluster Analysis

• Semi-supervised clustering partitions unlabeled data by making use of domain
knowledge, generally expressed as pairwise constraints between instances or as an
additional set of labeled instances.
• The quality of unsupervised clustering can be improved substantially by some weak
form of supervision, for instance pairwise constraints (i.e., pairs of objects labeled as
belonging to the same or to different clusters). A clustering procedure that depends on
such user feedback or guidance constraints is known as semi-supervised clustering.
Methods for semi-supervised clustering can be divided into two classes:
• Constraint-based semi-supervised clustering − It relies on user-provided labels or
constraints to guide the algorithm toward a more appropriate data partitioning. This
includes modifying the objective function based on the constraints, or initializing and
constraining the clustering process based on the labeled objects.
• Distance-based semi-supervised clustering − It employs an adaptive distance measure
that is trained to satisfy the labels or constraints in the supervised data. Several
adaptive distance measures have been used, such as string-edit distance trained using
Expectation-Maximization (EM), and Euclidean distance modified by a shortest
distance algorithm.
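As a rough illustration of the constraint-based idea of "initializing the clustering process based on the labeled objects", the sketch below seeds k-means with the means of a handful of labeled points. It is only one simple possibility, not a method taken from these slides: the toy data, the choice of 5 labeled points per group, and the use of scikit-learn's KMeans with a fixed init array are all assumptions made here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data: 300 points in 3 groups; pretend only 5 points per group are labeled.
X, y_true = make_blobs(n_samples=300, centers=[[0, 0], [5, 5], [0, 5]],
                       cluster_std=0.8, random_state=0)
rng = np.random.default_rng(0)
labeled_idx = np.concatenate(
    [rng.choice(np.flatnonzero(y_true == c), size=5, replace=False)
     for c in range(3)])
y_labeled = y_true[labeled_idx]

# Seed centroid c with the mean of the labeled points of class c.
seeds = np.vstack([X[labeled_idx][y_labeled == c].mean(axis=0)
                   for c in range(3)])

# n_init=1 because the starting centroids are fixed, not random.
kmeans = KMeans(n_clusters=3, init=seeds, n_init=1).fit(X)

# Cluster c started from class c, so cluster IDs can be compared to the labels.
agreement = (kmeans.labels_ == y_true).mean()
print(f"agreement with the true groups: {agreement:.2f}")
```

The labeled points only influence the starting position here; nothing prevents k-means from later moving a labeled point to another cluster, which is why stricter constraint-based variants also constrain the assignment step.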


• CLTree (CLustering based on decision TREEs) is an interesting clustering method
that integrates unsupervised clustering with the concept of supervised
classification. It is an instance of constraint-based semi-supervised clustering. It
turns a clustering task into a classification task by treating the set of points
to be clustered as belonging to one class, labeled “Y,” and adding a set of
relatively uniformly distributed “nonexistence points” with a different class label,
“N.”
• The problem of partitioning the data space into data (dense) regions and empty
(sparse) regions then becomes a classification problem: the original points form
the set of “Y” points, and a collection of uniformly distributed “N” points
(drawn as “o” points in the figure) is added to them.
• The original clustering problem is thus transformed into a classification problem,
which works out a scheme that distinguishes “Y” and “N” points. A decision tree
induction method can be used to partition the two-dimensional space. In the
example, two clusters are identified, which come from the “Y” points only.
• Adding a large number of “N” points to the original data can introduce
unnecessary overhead in the computation. Moreover, it is unlikely that the added
points would be truly uniformly distributed in a very high-dimensional space, as
this would require an exponential number of points.
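The "Y"/"N" reformulation can be tried out directly with an ordinary decision tree. The sketch below is a simplified illustration of the idea only, not the CLTree algorithm itself: it physically adds the "N" points (with the overhead the last bullet warns about), and the two made-up blobs, the number of "N" points, and max_depth=4 are assumptions chosen for the demonstration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# "Y" points: the data to be clustered (two dense made-up blobs in 2-D).
Y_points = np.vstack([rng.normal([2.0, 2.0], 0.3, size=(200, 2)),
                      rng.normal([7.0, 7.0], 0.3, size=(200, 2))])

# "N" points: uniformly distributed over the bounding box of the data.
lo, hi = Y_points.min(axis=0), Y_points.max(axis=0)
N_points = rng.uniform(lo, hi, size=(len(Y_points), 2))

X = np.vstack([Y_points, N_points])
labels = np.array(["Y"] * len(Y_points) + ["N"] * len(N_points))

# A shallow tree splits the plane into axis-aligned rectangles;
# rectangles predicted "Y" are the dense regions, i.e. candidate clusters.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, labels)

# Query the two blob centers and a point in the empty middle of the box.
pred = tree.predict([[2.0, 2.0], [7.0, 7.0], [4.5, 4.5]])
print(pred)
```

The leaves that predict "Y" play the role of clusters, while leaves that predict "N" mark the empty regions between them.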
Using clustering for image segmentation
• Image segmentation is the task of partitioning an image into multiple segments. In semantic segmentation, all pixels
that are part of the same object type get assigned to the same segment.
• For example, in a self-driving car’s vision system, all pixels that are part of a pedestrian’s image might be assigned to
the “pedestrian” segment.
• Here, we are going to do something much simpler: color segmentation. We will simply assign pixels to the same
segment if they have a similar color. In some applications, this may be sufficient, for example if you want to analyze
satellite images to measure how much total forest area there is in a region, color segmentation may be just fine.
• First, let’s load the image using Matplotlib’s imread() function:
from matplotlib.image import imread
image = imread("path")  # "path" stands for the location of the image file
image.shape  # (533, 800, 3)
• The image is represented as a 3D array: the first dimension’s size is the height, the second is the width, and the third is
the number of color channels, in this case red, green and blue (RGB).
• The following code reshapes the array to get a long list of RGB colors, then clusters these colors using K-Means. For
example, it may identify a color cluster for all shades of green. Next, for each pixel's color (e.g., dark green), it looks up
the mean color of that pixel’s color cluster.
from sklearn.cluster import KMeans
X = image.reshape(-1, 3)
kmeans = KMeans(n_clusters=8).fit(X)
segmented_img = kmeans.cluster_centers_[kmeans.labels_]
segmented_img = segmented_img.reshape(image.shape)
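To try the same recipe without an image file, the sketch below builds a small synthetic image instead. The 40×60 two-color image, the noise level, and n_clusters=2 are assumptions made purely so that the example is self-contained; the reshape, fit, and look-up steps are the ones described above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 40x60 RGB image: left half greenish, right half bluish, plus noise.
rng = np.random.default_rng(0)
image = np.zeros((40, 60, 3))
image[:, :30] = [0.1, 0.6, 0.2]
image[:, 30:] = [0.2, 0.3, 0.8]
image = np.clip(image + rng.normal(0.0, 0.05, image.shape), 0.0, 1.0)

X = image.reshape(-1, 3)                      # one row per pixel
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Replace every pixel by the center of the color cluster it belongs to.
segmented_img = kmeans.cluster_centers_[kmeans.labels_].reshape(image.shape)

n_colors = len(np.unique(segmented_img.reshape(-1, 3), axis=0))
print(segmented_img.shape, n_colors)
```

After segmentation the image contains only as many distinct colors as there are clusters, which is what makes the segments easy to count or measure.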

Using Clustering for Preprocessing

• Clustering can be an efficient approach to dimensionality reduction, in particular as a
preprocessing step before a supervised learning algorithm.
• Let’s tackle the digits dataset, a simple MNIST-like dataset containing 1,797
grayscale 8×8 images representing digits 0 to 9. First, let’s load the dataset:
from sklearn.datasets import load_digits
X_digits, y_digits = load_digits(return_X_y=True)
Now, let’s split it into a training set and a test set:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_digits, y_digits)
Next, let’s fit a Logistic Regression model:
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression(random_state=42)
log_reg.fit(X_train, y_train)
Let’s evaluate its accuracy on the test set:
log_reg.score(X_test, y_test)
0.9666666666666667
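The score above is the baseline without any clustering. One way the preprocessing step itself might look is sketched below: K-Means is placed in front of the classifier in a pipeline, so that each image is replaced by its distances to the cluster centroids. The choice of 50 clusters, the raised max_iter, and the fixed random_state values are assumptions made here for a repeatable run, and the resulting score will differ from the baseline shown on the slide.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X_digits, y_digits = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X_digits, y_digits, random_state=42)

pipeline = Pipeline([
    # KMeans.transform() maps each image to its distances to the 50 centroids.
    ("kmeans", KMeans(n_clusters=50, n_init=10, random_state=42)),
    # max_iter is raised so the solver has room to converge.
    ("log_reg", LogisticRegression(max_iter=5000, random_state=42)),
])
pipeline.fit(X_train, y_train)
score = pipeline.score(X_test, y_test)
print(score)
```

The number of clusters is a hyperparameter of the whole pipeline, so it can be tuned like any other, for example with a cross-validated grid search.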

DBSCAN
• For each instance, the algorithm counts how many instances are located within a small
distance ε (epsilon) from it. This region is called the instance’s ε-neighborhood.
• If an instance has at least min_samples instances in its ε-neighborhood (including
itself), then it is considered a core instance. In other words, core instances are those that
are located in dense regions.
• All instances in the neighborhood of a core instance belong to the same cluster. This may
include other core instances, therefore a long sequence of neighboring core instances
forms a single cluster.
• Any instance that is not a core instance and does not have one in its neighborhood is
considered an anomaly.
• Let’s test it on the moons dataset:
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
X, y = make_moons(n_samples=1000, noise=0.05)
dbscan = DBSCAN(eps=0.05, min_samples=5)
dbscan.fit(X)
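After fitting, the result can be inspected through the labels_ and core_sample_indices_ attributes of the fitted DBSCAN object. The sketch below does that for the eps value used above and, as an assumption added here for comparison, for a larger eps of 0.2; random_state=42 is also added only to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=1000, noise=0.05, random_state=42)

results = {}
for eps in (0.05, 0.2):
    dbscan = DBSCAN(eps=eps, min_samples=5).fit(X)
    labels = dbscan.labels_                      # -1 marks an anomaly
    n_clusters = len(set(labels.tolist()) - {-1})
    n_core = len(dbscan.core_sample_indices_)
    n_anomalies = int(np.sum(labels == -1))
    results[eps] = (n_clusters, n_core, n_anomalies)
    print(f"eps={eps}: {n_clusters} clusters, {n_core} core instances, "
          f"{n_anomalies} anomalies")
```

A small eps tends to fragment each moon into several clusters and to flag many points as anomalies, while a larger neighborhood lets the core instances chain together along each moon.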


Dimensionality Reduction
• The number of input features, variables, or columns in a dataset is known
as its dimensionality, and the process of reducing these features is called
dimensionality reduction.
• In many cases a dataset contains a huge number of input features, which
makes the predictive modeling task more complicated. Because it is very
difficult to visualize, or to train a model on, a dataset with a large number
of features, dimensionality reduction techniques are needed in such cases.
• It is a way of converting a higher-dimensional dataset into a
lower-dimensional one while ensuring that it conveys similar information.


The Curse of Dimensionality

• Handling high-dimensional data is very difficult in practice, a problem commonly known as
the curse of dimensionality. As the dimensionality of the input dataset increases, any
machine learning algorithm and model becomes more complex. As the number of
features increases, the number of samples needed to cover the feature space grows rapidly, and
the chance of overfitting also increases. A machine learning model trained on
high-dimensional data with too few samples tends to overfit and to perform poorly on new data.
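One symptom of the curse of dimensionality can be seen with a small experiment: for random points in a unit hypercube, the nearest and the farthest neighbor of a query point become almost equally far away as the number of dimensions grows. The sketch below is only an illustration; the sample size of 500, the list of dimensions, and the "contrast" measure are assumptions chosen here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 500

contrast = {}
for d in (2, 10, 100, 1000):
    X = rng.random((n_points, d))            # uniform points in the unit hypercube
    query = rng.random(d)
    dists = np.linalg.norm(X - query, axis=1)
    # Relative gap between the farthest and the nearest neighbor of the query.
    contrast[d] = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  nearest={dists.min():.3f}  farthest={dists.max():.3f}  "
          f"contrast={contrast[d]:.2f}")
```

When the contrast shrinks, a fixed number of samples no longer gives any point a meaningfully "close" neighbor, which is one reason distance-based methods need far more data in high dimensions.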
Benefits of applying dimensionality reduction:
• Reducing the number of features also reduces the space required to store the dataset.
• Less computation and training time is required with fewer features.
• A dataset with fewer features is easier to visualize.
• It removes redundant features (if present) by taking care of multicollinearity.
Disadvantages of dimensionality reduction:
• Some information may be lost due to dimensionality reduction.
• In the PCA dimensionality reduction technique, the number of principal components
to keep is sometimes not known in advance.

1. Feature Selection: Feature selection is the process of selecting a subset of the relevant features and leaving out the irrelevant features
in a dataset in order to build a model of high accuracy. In other words, it is a way of selecting the optimal features from the input dataset.
Three methods are used for feature selection:
A. Filter Methods: The dataset is filtered, and a subset that contains only the relevant features is taken. Some common filter techniques
are:
• Correlation
• Chi-Square Test
• ANOVA
• Information Gain, etc.
B. Wrapper Methods: The wrapper method has the same goal as the filter method, but it uses a machine learning model for its evaluation.
Some features are fed to the ML model and its performance is evaluated. The performance decides whether to add or remove
features to increase the accuracy of the model. This method is more accurate than filtering but more complex and costly to run.
Some common wrapper techniques are:
• Forward Selection
• Backward Selection
• Bi-directional Elimination
C. Embedded Methods: Embedded methods evaluate the importance of each feature during the training iterations of the machine
learning model itself. Some common embedded techniques are:
• LASSO
• Elastic Net
• Ridge Regression, etc.
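The three families can be compared side by side with scikit-learn's feature_selection module. The sketch below is one possible illustration, not a recommendation: the diabetes dataset, the F-test score function for the filter, a linear regression as the wrapped model, keeping 4 features, and alpha=0.5 for LASSO are all assumptions chosen here.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import (SelectFromModel, SelectKBest,
                                       SequentialFeatureSelector, f_regression)
from sklearn.linear_model import Lasso, LinearRegression

X, y = load_diabetes(return_X_y=True)        # 442 samples, 10 features

# A. Filter: score each feature on its own (univariate F-test), keep the top 4.
X_filter = SelectKBest(score_func=f_regression, k=4).fit_transform(X, y)

# B. Wrapper: forward selection, judged by a cross-validated model.
sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=4,
                                direction="forward")
X_wrapper = sfs.fit_transform(X, y)

# C. Embedded: LASSO drives some coefficients to zero while it trains;
#    features with a non-zero coefficient are kept.
embedded = SelectFromModel(Lasso(alpha=0.5)).fit(X, y)
X_embedded = embedded.transform(X)

print(X_filter.shape, X_wrapper.shape, X_embedded.shape)
```

The filter never trains a model, the wrapper retrains one for every candidate feature, and the embedded method gets its selection as a by-product of a single training run, which mirrors the cost differences described above.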


2. Feature Extraction: Feature extraction is the process of transforming
a space with many dimensions into a space with fewer dimensions. This
approach is useful when we want to keep as much of the information as
possible but use fewer resources while processing it.
Some common feature extraction techniques are:
• Principal Component Analysis
• Linear Discriminant Analysis
• Kernel PCA
• Quadratic Discriminant Analysis


Approaches of Dimension Reduction

There are two main approaches to reducing dimensionality:
1. Projection
2. Manifold Learning
1. Projection:
• In most real-world problems, training instances are not spread out
uniformly across all dimensions.
• Many features are almost constant, while others are highly
correlated. As a result, all training instances actually lie within (or
close to) a much lower-dimensional subspace of the high-dimensional
space.


• Notice that all training instances lie close to a plane: this is a
lower-dimensional (2D) subspace of the high-dimensional (3D) space. Now
if we project every training instance perpendicularly onto this
subspace (as represented by the short lines connecting the instances
to the plane), we get the new 2D dataset.
• We have just reduced the dataset’s dimensionality from 3D to 2D.
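The perpendicular projection described above is a single matrix product once an orthonormal basis of the plane is known. The sketch below assumes such a basis (u1, u2) is given and uses made-up points scattered close to the plane; the basis vectors, the noise level, and the variable names are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthonormal basis (u1, u2) of a tilted plane through the origin in 3-D.
u1 = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)
u2 = np.array([0.0, 1.0, 0.0])
W = np.column_stack([u1, u2])                 # shape (3, 2)

# Made-up 3-D points: on the plane, plus a little off-plane noise.
coords = rng.normal(size=(200, 2))
X = coords @ W.T + rng.normal(scale=0.05, size=(200, 3))

X_2d = X @ W             # perpendicular projection, in plane coordinates
X_back = X_2d @ W.T      # the projected points, expressed again in 3-D

print(X_2d.shape)
print(np.abs(X - X_back).max())   # how far the points were from the plane
```

The difference X - X_back corresponds to the short connecting lines in the figure: it is perpendicular to the plane and is exactly the information discarded by the projection.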


• Projection is not always the best approach to dimensionality
reduction. In many cases the subspace may twist and turn, such as in
the famous Swiss roll toy dataset.


2. Manifold Learning:
• The Swiss roll is an example of a 2D manifold. Put simply, a 2D
manifold is a 2D shape that can be bent and twisted in a
higher-dimensional space.
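scikit-learn can generate the Swiss roll with make_swiss_roll, which returns both the 3D points and each point's position t along the rolled-up sheet. The sketch below uses that to show that two numbers per point are enough to describe the data; noise=0.0 and the variable name X_unrolled are assumptions made here so that the relationship is exact.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll

# X holds the 3-D points; t is each point's position along the rolled-up sheet.
X, t = make_swiss_roll(n_samples=1000, noise=0.0, random_state=42)

# With no noise, the first and third coordinates are determined by t alone,
# so every point is described by two numbers: t and the second axis X[:, 1].
print(np.allclose(X[:, 0], t * np.cos(t)))
print(np.allclose(X[:, 2], t * np.sin(t)))

# "Unrolling" the manifold: use (t, X[:, 1]) as the 2-D coordinates.
X_unrolled = np.column_stack([t, X[:, 1]])
print(X_unrolled.shape)
```

Simply dropping one of the three axes would instead squash different layers of the roll on top of each other, which is why projection does poorly on this dataset.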

Principal Component Analysis

• Principal Component Analysis (PCA) is an unsupervised learning algorithm
used for dimensionality reduction in machine learning.
• It is a statistical procedure that converts observations of correlated
features into a set of linearly uncorrelated features by means of an
orthogonal transformation. These new transformed features are
called the principal components.
• PCA tries to find a lower-dimensional surface onto which to project
the high-dimensional data.


Steps for PCA algorithm

• Getting the dataset
First, we take the input dataset and divide it into two subparts X and Y,
where X is the training set and Y is the validation set.
• Representing the data in a structure
We represent the independent variables X as a two-dimensional matrix. Each row
corresponds to a data item and each column to a feature. The number of
columns is the dimensionality of the dataset.
• Standardizing the data
In this step, we standardize the dataset. Without standardization, features with
high variance count as more important than features with lower variance.
If the importance of a feature should not depend on its variance, we subtract the
column mean from each data item and divide it by the standard deviation of the
column. We name the resulting matrix Z.
• Calculating the covariance of Z
To calculate the covariance of Z, we transpose the matrix Z and multiply the
transpose by Z. Because the columns of Z have zero mean, the product ZᵀZ, divided
by n − 1 for n data items, is the covariance matrix of Z.

• Calculating the eigenvalues and eigenvectors
Now we calculate the eigenvalues and eigenvectors of the covariance matrix of Z.
The eigenvectors of the covariance matrix are the directions of the axes that carry the
most information (variance), and each eigenvalue gives the amount of variance along its eigenvector.
• Sorting the eigenvectors
We sort the eigenvalues in decreasing order, from largest to smallest, and reorder the
eigenvectors, which form the columns of matrix P, in the same way. The resulting matrix
is named P*.
• Calculating the new features, or principal components
To calculate the new features, we multiply Z by P*. In the resulting matrix Z* = ZP*,
each observation is a linear combination of the original features, and the columns of Z*
are uncorrelated with one another.
• Removing less important features from the new dataset
With the new feature set in hand, we decide what to keep and what to remove: only the
relevant or important features are kept in the new dataset, and the unimportant ones
are removed.
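The PCA steps can be followed line by line in NumPy. The sketch below is a minimal illustration under stated assumptions: the made-up three-feature dataset, the choice of keeping k = 2 components, and the use of np.linalg.eigh for the eigen-decomposition are introduced here; the names Z, P_star, and Z_star mirror Z, P*, and Z* from the steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 200 samples, 3 features, the first two strongly correlated.
a = rng.normal(size=200)
X = np.column_stack([a,
                     2.0 * a + rng.normal(scale=0.3, size=200),
                     rng.normal(size=200)])

# Standardize: zero mean and unit standard deviation per column.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Covariance matrix of Z.
cov = Z.T @ Z / (len(Z) - 1)

# Eigen-decomposition (eigh is meant for symmetric matrices).
eigenvalues, P = np.linalg.eigh(cov)

# Sort eigenvalues, and the matching eigenvector columns, largest first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, P_star = eigenvalues[order], P[:, order]

# New features: each column of Z_star is one principal component.
Z_star = Z @ P_star

# Keep only the first k components.
k = 2
Z_reduced = Z_star[:, :k]

print(eigenvalues / eigenvalues.sum())   # share of variance per component
```

The printed shares indicate how much variance each component carries, which is the usual basis for deciding how many components to keep in the last step.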
