Distance and Rule-Based Models in ML

It is a PPT of ML Unit 5, Branch IT, SPPU University. It contains a detailed explanation of ML.

Uploaded by rishighodake2405
© All Rights Reserved

Machine Learning [314443]

UNIT-V

DISTANCE AND RULE BASED MODELS
Syllabus
UNIT V : DISTANCE AND RULE BASED MODELS
• Distance Based Models: Distance Metrics (Euclidean,
Manhattan, Hamming, Minkowski Distance Metric), Neighbors
and Examples, K-Nearest Neighbour for Classification and
Regression, Clustering as Learning task: K-means clustering
Algorithm-with example, k-medoid algorithm-with example,
Hierarchical Clustering, Divisive Dendrogram for hierarchical
clustering, Performance Measures
• Association Rule Mining: Introduction, Rule learning for
subgroup discovery, Apriori Algorithm, Performance Measures
– Support, Confidence, Lift.
Distance Based Models: Distance Metrics
Introduction to Distance Metrics:
• Distance metrics are a key part of several machine learning
algorithms.
• These distance metrics are used in both supervised and
unsupervised learning, generally to calculate the similarity
between data points.
• An effective distance metric improves the performance of our
machine learning model, whether that’s for classification tasks or
clustering.
Distance Based Models: Distance Metrics
Types of Distance Metrics:
1. Euclidean Distance
2. Manhattan Distance
3. Minkowski distance
4. Hamming Distance
Distance Metrics
• Euclidean Distance: The Euclidean distance is the length of the straight line segment joining two points. For points (x1, y1) and (x2, y2) it is sqrt[ (x2 − x1)² + (y2 − y1)² ].
Distance Metrics

• Manhattan Distance: Manhattan Distance is the sum of absolute differences between points across all the dimensions; in two dimensions it is |x2 − x1| + |y2 − y1|.
Distance Metrics

• Minkowski Distance: Minkowski Distance is the generalized form of Euclidean and Manhattan Distance.
• The formula for Minkowski Distance is given as:

  D(x, y) = ( Σ |xi − yi|^p )^(1/p)

Some common values of 'p' are:-

p = 1, Manhattan Distance
p = 2, Euclidean Distance
Distance Metrics
Hamming Distance
• Hamming Distance measures the similarity between two strings
of the same length.
• The Hamming Distance between two strings of the same length
is the number of positions at which the corresponding
characters are different.
• Ex. “euclidean” and “manhattan”
• Since the length of these strings is equal, we can calculate the Hamming Distance. We will go character by character and match the strings. The first character of both strings ('e' and 'm' respectively) is different. Similarly, the second character of both strings ('u' and 'a') is different, and so on.
• Seven characters are different, whereas two characters (the last two) are the same.
• So the Hamming distance is 7.
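The four metrics above can be sketched in a few lines of plain Python (a minimal sketch; the example calls reuse values from this unit):

```python
import math

def euclidean(p, q):
    # Straight-line length of the segment joining p and q
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # Sum of absolute differences across all dimensions
    return sum(abs(a - b) for a, b in zip(p, q))

def minkowski(p, q, r):
    # Generalized form: r = 1 gives Manhattan, r = 2 gives Euclidean
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

def hamming(s, t):
    # Number of positions at which equal-length strings differ
    assert len(s) == len(t)
    return sum(c1 != c2 for c1, c2 in zip(s, t))

print(euclidean((2, 2), (1, 1)))          # 1.414..., i.e. sqrt(2)
print(manhattan((2, 2), (1, 1)))          # 2
print(minkowski((2, 2), (1, 1), 2))       # same as Euclidean
print(hamming("euclidean", "manhattan"))  # 7
```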
Clustering

• K-Means Clustering is an unsupervised machine learning algorithm which groups an unlabeled dataset into different clusters.
• It can be defined as "a way of grouping the data points into different clusters consisting of similar data points; the objects with possible similarities remain in a group that has few or no similarities with another group."
Clustering Algorithms

1. K-Means algorithm:
• The k-means algorithm is one of the most popular
clustering algorithms.
• It classifies the dataset by dividing the samples into
different clusters of equal variances.
• The number of clusters must be specified in this algorithm.
• It is fast, requiring relatively few computations, with linear complexity O(n).
Clustering Algorithms

• K-means algorithm uses Partitioning Clustering. It is also known as the centroid-based method.
• It is a type of clustering that divides the data into non-hierarchical groups.
Clustering Algorithms

2. Agglomerative Hierarchical algorithm:


• The Agglomerative hierarchical algorithm performs the
bottom-up hierarchical clustering.
• In this, each data point is treated as a single cluster at the
outset and then successively merged.
• The cluster hierarchy can be represented as a tree structure.
Hierarchical Clustering

• Hierarchical clustering is another unsupervised machine learning algorithm, which is used to group unlabeled datasets into clusters; it is also known as hierarchical cluster analysis or HCA.
• In this algorithm, we develop the hierarchy of clusters in the
form of a tree, and this tree-shaped structure is known as
the dendrogram.
• Sometimes the results of K-means clustering and hierarchical clustering may look similar, but the two differ in how they work: hierarchical clustering has no requirement to predetermine the number of clusters, as we did in the K-Means algorithm.
Hierarchical Clustering

• The hierarchical clustering technique has two approaches:
• Agglomerative: Agglomerative is a bottom-up approach, in which the algorithm starts with taking all data points as single clusters and merging them until one cluster is left.
• Divisive: Divisive algorithm is the reverse of the agglomerative
algorithm as it is a top-down approach.
K- Means Clustering Algorithms

• K-Means Clustering is an unsupervised learning algorithm that is used to solve clustering problems in machine learning or data science.
• Here K defines the number of pre-defined clusters that
need to be created in the process.
• It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group of points with similar properties.
• It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids.
K- Means Clustering Algorithms

K-Means Clustering Algorithm involves the following steps-


Step-01:
• Choose the number of clusters K.
Step-02:
• Randomly select any K data points as cluster centers.
• Select cluster centers in such a way that they are as far apart from each other as possible.
Step-03:
• Calculate the distance between each data point and each
cluster center.
• The distance may be calculated either by using a given distance function or by using the Euclidean distance formula.
K- Means Clustering Algorithms

Step-04:
• Assign each data point to some cluster.
• A data point is assigned to that cluster whose center is
nearest to that data point.

Step-05:
• Re-compute the center of newly formed clusters.
• The center of a cluster is computed by taking mean of all
the data points contained in that cluster.
K- Means Clustering Algorithms

Step-06:
• Keep repeating the procedure from Step-03 to Step-05 until
any of the following stopping criteria is met-

o Centers of newly formed clusters do not change
o Data points remain present in the same cluster
o Maximum number of iterations is reached
K- Means Clustering Algorithms

Problem:
Use K-Means Algorithm to create two clusters-
K- Means Clustering Algorithms

Solution:
Assume A(2, 2) and C(1, 1) are centers of the two clusters.
Iteration-01:
o We calculate the distance of each point from each of the
center of the two clusters.
o The distance is calculated by using the euclidean distance
formula.
K- Means Clustering Algorithms

Calculating Distance Between A(2, 2) and C(1, 1)-

= sqrt[ (x2 – x1)² + (y2 – y1)² ]

= sqrt[ (1 – 2)² + (1 – 2)² ]

= sqrt[ 1 + 1 ]

= sqrt[ 2 ]

= 1.41
K- Means Clustering Algorithms

In the similar manner, we calculate the distance of other


points from each of the center of the two clusters.

Points | Distance from center (2, 2) of Cluster-01 | Distance from center (1, 1) of Cluster-02 | Point belongs to Cluster
A(2, 2) | 0 | 1.41 | C1
B(3, 2) | 1 | 2.24 | C1
C(1, 1) | 1.41 | 0 | C2
D(3, 1) | 1.41 | 2 | C1
E(1.5, 0.5) | 1.58 | 0.71 | C2
K- Means Clustering Algorithms

From here, the new clusters are-

Cluster-01 contains points: A(2, 2), B(3, 2), D(3, 1)
Cluster-02 contains points: C(1, 1), E(1.5, 0.5)
K- Means Clustering Algorithms

Now, we re-compute the new cluster centers.
The new cluster center is computed by taking the mean of all the points contained in that cluster.

For Cluster-01:
Center of Cluster-01
= ((2 + 3 + 3)/3, (2 + 2 + 1)/3)
= (2.67, 1.67)

For Cluster-02:
Center of Cluster-02
= ((1 + 1.5)/2, (1 + 0.5)/2)
= (1.25, 0.75)

This completes Iteration-01.


K- Means Clustering Algorithms

Next, we go to Iteration-02, Iteration-03, and so on until the centers do not change anymore.

Points | Distance from center (2.67, 1.67) of Cluster-01 | Distance from center (1.25, 0.75) of Cluster-02 | Point belongs to Cluster
A(2, 2) | 0.746860094 | 1.457738 | C1
B(3, 2) | 0.466690476 | 2.150581 | C1
C(1, 1) | 1.799388785 | 0.353553 | C2
D(3, 1) | 0.746860094 | 1.767767 | C1
E(1.5, 0.5) | 1.654629868 | 0.353553 | C2
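The two iterations above can be reproduced with a short pure-Python sketch of the algorithm, using the same five points and the same initial centers (a minimal sketch; it assumes every cluster stays non-empty, which holds here):

```python
def kmeans(points, centers, max_iter=100):
    for _ in range(max_iter):
        # Steps 3-4: assign each point to the nearest center (squared Euclidean)
        clusters = [[] for _ in centers]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        # Step 5: re-compute each center as the mean of its cluster's points
        new_centers = [tuple(sum(v) / len(cl) for v in zip(*cl)) for cl in clusters]
        # Step 6: stop when the centers no longer change
        if new_centers == centers:
            break
        centers = new_centers
    return centers

points = [(2, 2), (3, 2), (1, 1), (3, 1), (1.5, 0.5)]  # A, B, C, D, E
print(kmeans(points, [(2, 2), (1, 1)]))
# converges to (2.67, 1.67) and (1.25, 0.75), matching the worked example
```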
K-means Clustering

• K-means clustering is an unsupervised learning algorithm that groups data based on centroids.
• The centroids are defined by the means of all points that are
in the same cluster.
• The algorithm first chooses random points as centroids and
then iterates adjusting them until full convergence.
Clustering using Scikit-learn

import matplotlib.pyplot as plt

X = [4, 5, 10, 4, 3, 11, 14, 10, 12, 5, 12, 4, 11, 12]
y = [18, 19, 14, 17, 16, 24, 24, 21, 21, 17, 23, 15, 15, 16]

plt.scatter(X, y)
plt.show()
Clustering using Scikit-learn
Now we utilize the elbow method to visualize the inertia for different values of K:

from sklearn.cluster import KMeans

data = list(zip(X, y))
inertias = []

for i in range(2, 15):
    kmeans = KMeans(n_clusters=i, random_state=42)
    kmeans.fit(data)
    inertias.append(kmeans.inertia_)

Inertia is the sum of squared distances between each data point and the
centroid of its assigned cluster. It’s also known as the Within-Cluster Sum of
Squares (WCSS).
Clustering using Scikit-learn
Now we plot the inertia for the different values of K:

plt.plot(range(2, 15), inertias, marker='o')
plt.title('Elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

The elbow method shows that 3 is a good value for K, so we retrain and visualize
the result.
Clustering using Scikit-learn
We retrain with K = 3 and visualize the result:

kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(data)

plt.scatter(X, y, c=kmeans.labels_)
plt.show()
Elbow method

• The performance of the K-means clustering algorithm depends on the quality of the clusters it forms, but choosing the optimal number of clusters is a big task.
• The Elbow method is one of the most popular ways to find
the optimal number of clusters.
• This method uses the concept of WCSS value. WCSS stands
for Within Cluster Sum of Squares, which defines the total
variations within a cluster.
Elbow method

• The formula to calculate the value of WCSS (for 3 clusters) is given below:

  WCSS = Σ(Pi in Cluster1) distance(Pi, C1)² + Σ(Pi in Cluster2) distance(Pi, C2)² + Σ(Pi in Cluster3) distance(Pi, C3)²

• In the above formula of WCSS, Σ(Pi in Cluster1) distance(Pi, C1)² is the sum of the squares of the distances between each data point and its centroid within Cluster1, and the same for the other two terms.

• To measure the distance between data points and centroids, we can use any method such as Euclidean distance or Manhattan distance.
Elbow method

• To find the optimal number of clusters, the elbow method follows the below steps:

o It executes K-means clustering on a given dataset for different K values (e.g., ranging from 2 to 10).
o For each value of K, it calculates the WCSS value.
o It plots a curve between the calculated WCSS values and the number of clusters K.
o The sharp point of bend, where the plot looks like an arm, is considered the best value of K.
o (Look for the "elbow" point where the WCSS starts to flatten.)
Elbow plot

• Since the graph shows a sharp bend that looks like an elbow, the technique is known as the elbow method. The graph for the elbow method looks like the below image:
Silhouette score

• The Silhouette Score is a metric used to evaluate the quality of clustering.
• It measures how well each data point fits within its cluster
compared to other clusters.
• For each point:
- How close is it to its own cluster (cohesion)?
- How far is it from the next closest cluster (separation)?
Silhouette score

Silhouette Score Formula

- For a data point i:

  s(i) = (b(i) − a(i)) / max(a(i), b(i))

Where:
• a(i) = average distance of point i to all other points in the same cluster
• b(i) = average distance of point i to points in the nearest neighboring cluster (the one it's not part of)
Silhouette score

Score Range: s(i) ∈ [−1, 1]

Silhouette Score | Interpretation
~ 1 (approximately 1) | Point is well matched to its own cluster and far from others
~ 0 (approximately 0) | Point is on the border between clusters
~ −1 (approximately −1) | Point is likely in the wrong cluster

Select the value of k that gives the highest silhouette score.
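The definition can be checked by hand on a toy example (a minimal sketch with two made-up, well-separated clusters):

```python
import math

def silhouette(point, own_cluster, other_cluster):
    # a(i): average distance to the other points in the same cluster
    a = sum(math.dist(point, p) for p in own_cluster if p != point) / (len(own_cluster) - 1)
    # b(i): average distance to the points of the nearest other cluster
    b = sum(math.dist(point, p) for p in other_cluster) / len(other_cluster)
    return (b - a) / max(a, b)

c1 = [(0, 0), (0, 1)]
c2 = [(5, 5), (5, 6)]
s = silhouette((0, 0), c1, c2)
print(s)  # close to 1: the point sits firmly in its own cluster
```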


Clustering using Scikit-learn
Silhouette Score to evaluate the quality of clustering.

from sklearn.metrics import silhouette_score

silhouettes = []
for i in range(2, 14):
    kmeans = KMeans(n_clusters=i, random_state=42)
    labels = kmeans.fit_predict(data)
    score = silhouette_score(data, labels)
    silhouettes.append(score)
Clustering using Scikit-learn
Silhouette Score to evaluate the quality of clustering.
plt.plot(range(2, 14), silhouettes, marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Silhouette Score')
plt.show()
K-Medoid Algorithm
• K-medoids, also known as partitioning around medoids (PAM), is a popular clustering algorithm that groups data points into k clusters by selecting k representative objects (medoids) within a dataset.
• Clustering is a robust unsupervised machine-learning
algorithm that establishes patterns by identifying clusters or
groups of data points with similar characteristics within a
specific dataset.
• The k-means clustering algorithm uses centroids. K-medoids is
an alternative clustering algorithm that uses medoids instead.
K-Medoid Algorithm
Medoids-
• A medoid can be defined as the point in a cluster whose sum of distances to the other points in that cluster is minimal.
• It is the data point in a cluster characterized by the lowest dissimilarity with the other data points.
• The k-means algorithm is sensitive to outliers. The k-medoids
algorithm, on the other hand, mitigates that sensitivity by
eliminating reliance on centroids.
• The k-medoids algorithm aims to group data points
into k clusters, where each data point is assigned to a medoid,
and the sum of distances between data points and their assigned
medoid is minimized.
• The algorithm iteratively assigns each data point to the closest
medoid and swaps the medoid of each cluster until convergence.
K-Medoid Algorithm
Medoids-

Manhattan distance-
The distance between each data point from both medoids is
calculated using the Manhattan distance formula. It is also known
as the cost.

Distance=∣x2−x1∣+∣y2−y1∣
K-Medoids algorithm
K-Medoids algorithm steps:
1. Initialize: select k random points out of the n data points as
the medoids.
2. Associate each data point to the closest medoid by using any
common distance metric methods.
3. While the cost decreases:
   - For each medoid m and for each data point o which is not a medoid:
     • Swap m and o, associate each data point to the closest medoid, and recompute the cost.
     • If the total cost is more than that in the previous step, undo the swap.
The cost in the K-medoids algorithm is the total distance of every point to its assigned medoid:
  Cost = Σ (over each cluster Ci) Σ (over each point Pi in Ci) dist(Pi, Mi), where Mi is the medoid of Ci.
K-Medoids Example

Step 1:
Select k = 2, and let the two randomly selected medoids be C1 = (4, 5) and C2 = (8, 5).
K-Medoids Example

Step 2: Calculating cost.


The dissimilarity (Manhattan distance) of each non-medoid point with the medoids is calculated and tabulated:
(The distance table was an image in the original slides; only its Cluster column survives. Cluster assignments for points 0-9: C2, C1, C1, C2, -, C1, C2, C2, C2, -, where '-' marks the two medoids themselves.)

Each point is assigned to the cluster of that medoid whose dissimilarity is less.
The points 1, 2, 5 go to cluster C1 and 0, 3, 6, 7, 8 go to cluster C2.
The Cost = (3 + 4 + 4) + (2 + 2 + 3 + 1 + 1) = 20
Step 3: Randomly select one non-medoid point and recalculate
the cost.
Let the randomly selected point be (8, 4). The dissimilarity of
each non-medoid point with the medoids – C1 (4, 5) and C2 (8,
4) is calculated and tabulated.
(The distance table was an image in the original slides; only its Cluster column survives. Cluster assignments for points 0-9 with medoids C1 (4, 5) and C2 (8, 4): C2, C1, C1, C2, C2, C1, C2, -, C2, -, where '-' marks the two medoids themselves.)

Each point is assigned to that cluster whose dissimilarity is less.
So, the points 1, 2, 5 go to cluster C1 and 0, 3, 4, 6, 8 go to cluster C2.
The New cost = (3 + 4 + 4) + (3+ 3 + 1 + 2 + 2) = 22
Swap Cost = New Cost – Previous Cost = 22 – 20 = 2, and 2 > 0.
• As the swap cost is not less than zero, we undo the swap.
• If the recalculated cost had been less, we would keep the swap and repeat the process (swap a non-medoid point with a medoid point and recalculate the cost).
• After the algorithm completes, we will have k medoid
points with their clusters.
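The cost arithmetic above can be checked in code. The coordinate table in the original slides was an image, so the ten points below are an assumed reconstruction chosen to be consistent with every cost shown (point 4 = C2 (8, 5), point 9 = C1 (4, 5), point 7 = (8, 4)); treat the dataset as illustrative:

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def total_cost(points, medoids):
    # Each non-medoid point pays the Manhattan distance to its nearest medoid
    return sum(min(manhattan(p, m) for m in medoids)
               for p in points if p not in medoids)

# Assumed coordinates (points 0-9), reconstructed to match the slide's costs
points = [(8, 7), (3, 7), (4, 9), (9, 6), (8, 5),
          (5, 8), (7, 3), (8, 4), (7, 5), (4, 5)]

print(total_cost(points, [(4, 5), (8, 5)]))  # 20: medoids C1, C2
print(total_cost(points, [(4, 5), (8, 4)]))  # 22: after swapping C2 with (8, 4)
# Swap cost = 22 - 20 = 2 > 0, so the swap is undone
```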
Advantages:
• It is simple to understand and easy to implement.
• Partitioning Around Medoids is less sensitive to outliers than
other partitioning algorithms.
Disadvantages:
• The main disadvantage of the K-medoid algorithm is that it is not suitable for clustering non-spherical (arbitrarily shaped) groups of objects. This is because it relies on minimizing the distances between the non-medoid objects and the medoid (the cluster centre); briefly, it uses compactness as the clustering criterion instead of connectivity.
• It may obtain different results for different runs on the same
dataset because the first k medoids are chosen randomly.
Comparison between K-Means and K-Medoids

Feature | K-Means | K-Medoids
Cluster center | Mean (centroid) | Actual data point (medoid)
Robust to outliers | No | Yes
Speed | Fast | Slow (PAM)
Distance metric | Usually Euclidean | Any (e.g., Manhattan, Euclidean)
Cluster shape assumption | Spherical, equally sized | Spherical (less strict)
Datasets | Suitable for large datasets | Suitable for small to medium datasets
Why Hierarchical Clustering?

• K-means clustering has some challenges: it requires a predetermined number of clusters, and it always tries to create clusters of the same size.
• To solve these two challenges, we can opt for the hierarchical
clustering algorithm because, in this algorithm, we don't need
to have knowledge about the predefined number of clusters.
Hierarchical Clustering

• Hierarchical clustering is an unsupervised learning technique used to group similar data points into clusters by building a hierarchy (tree-like structure).
• Unlike flat clustering such as k-means, hierarchical clustering does not require specifying the number of clusters in advance.
• There are two main types of hierarchical clustering.
1. Agglomerative Clustering (bottom-up approach)
2. Divisive clustering (top-down approach)
Dendrogram

• A dendrogram is like a family tree for clusters.


• It shows how individual data points or groups of data merge
together.
• The bottom shows each data point as its own group and as we
move up, similar groups are combined.
• The lower the merge point, the more similar the groups are. It
helps us see how things are grouped step by step.
Dendrogram

• At the bottom of the dendrogram the points P, Q, R, S and T are all separate.
• As we move up, the closest points are
merged into a single group.
• The lines connecting the points show
how they are progressively merged
based on similarity.
• The height at which they are connected shows how similar the points are to each other; the shorter the line, the more similar they are.
Hierarchical Agglomerative clustering

• It is also known as the bottom-up approach or hierarchical agglomerative clustering (HAC).
• Bottom-up algorithms treat each data point as a singleton cluster at the outset and then successively agglomerate pairs of clusters until all clusters have been merged into a single cluster that contains all the data.
Workflow for Hierarchical Agglomerative clustering

1. Start with individual points


2. Calculate distances between clusters
3. Merge the closest clusters
4. Update distance matrix: Recalculate the distances between the
new cluster and the remaining clusters.
5. Repeat steps 3 and 4: Keep merging the closest clusters and
updating the distance matrix until we have only one cluster
left.
6. Create a dendrogram
Hierarchical Divisive clustering

• Divisive clustering is also known as a top-down approach.


• Top-down clustering requires a method for splitting a cluster
that contains the whole data and proceeds by splitting clusters
recursively until individual data have been split into singleton
clusters.
Workflow for Hierarchical Divisive clustering

• Start with all data points in one cluster


• Split the cluster: Divide the cluster into two smaller clusters.
The division is typically done by finding the two most dissimilar
points in the cluster and using them to separate the data into
two parts.
• Repeat the process: For each of the new clusters, repeat the
splitting process: Choose the cluster with the most dissimilar
points and split it again into two smaller clusters.
• Stop when each data point is in its own cluster: Continue this
process until every data point is its own cluster or the stopping
condition (such as a predefined number of clusters) is met.
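The splitting rule described above (seed the two halves with the most dissimilar pair of points, then assign each point to the nearer seed) can be sketched for a single split (a minimal illustration on made-up points):

```python
import math
from itertools import combinations

def split_cluster(cluster):
    # Find the two most dissimilar (farthest-apart) points to use as seeds
    s1, s2 = max(combinations(cluster, 2), key=lambda pair: math.dist(*pair))
    left, right = [], []
    # Separate the data by assigning every point to the nearer seed
    for p in cluster:
        (left if math.dist(p, s1) <= math.dist(p, s2) else right).append(p)
    return left, right

data = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(split_cluster(data))  # two tight groups, one around each seed
```

Repeating this split on each resulting cluster, until every point stands alone or a stopping condition is met, yields the divisive hierarchy.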
How Does Agglomerative Hierarchical Clustering Work?
• The working of the AHC algorithm can be explained using the
below steps:
Step-1: Create each data point as a single cluster. Let's say there
are N data points, so the number of clusters will also be N.
Agglomerative Hierarchical clustering

• Step-2: Take two closest data points or clusters and merge them
to form one cluster. So, there will now be N-1 clusters.
Agglomerative Hierarchical clustering

• Step-3: Again, take the two closest clusters and merge them
together to form one cluster. There will be N-2 clusters.
Agglomerative Hierarchical clustering

• Step-4: Repeat Step 3 until only one cluster is left. So, we will get the following clusters. Consider the below images:
Agglomerative Hierarchical clustering

• Step-5: Once all the clusters are combined into one big cluster,
develop the dendrogram to divide the clusters as per the
problem.
Measure for the distance between two clusters

• As we have seen, the closest distance between the two clusters is crucial for hierarchical clustering.
• There are various ways to calculate the distance between two
clusters, and these ways decide the rule for clustering. These
measures are called Linkage methods.
Measure for the distance between two clusters

• Single Linkage: It is the shortest distance between the closest points of the clusters.
• Single linkage is also known as the Minimum Linkage (MIN) method.
• Consider the below image:
Measure for the distance between two clusters

• Complete Linkage: It is the farthest distance between the two points of two different clusters.
• The complete Linkage method is also known as the Maximum
Linkage (MAX) method.
• It is one of the popular linkage methods as it forms tighter
clusters than single-linkage.
Measure for the distance between two clusters

• Average Linkage: It is the linkage method in which the distance between each pair of points (one from each cluster) is added up and then divided by the total number of pairs to calculate the average distance between two clusters. It is also one of the most popular linkage methods.
Measure for the distance between two clusters

• Centroid Linkage: It is the linkage method in which the distance between the centroids of the clusters is calculated. Consider the below image:

From the above-given approaches, we can apply any of them according to the type of problem or business requirement.
Measure for the distance between two clusters

• Ward linkage is a method used in hierarchical clustering to determine the distance between two clusters before merging them.
• It computes the sum of variances for each cluster separately and then computes the variance for the cluster that would be formed by merging the two.
• The goal of Ward linkage is to minimize the increase in variance within the newly formed cluster when two clusters are combined.
• Ward's linkage supports only the Euclidean distance.
Working of Dendrogram in Hierarchical clustering

• The working of the dendrogram can be explained using the below diagram:

• In the above diagram, the left part shows how clusters are created in agglomerative clustering, and the right part shows the corresponding dendrogram.
Working of Dendrogram in Hierarchical clustering

The working of the dendrogram can be explained using the below diagram:
• Firstly, the data points P2 and P3 combine and form a cluster; correspondingly, a dendrogram is created that connects P2 and P3 with a rectangular shape. The height is decided according to the Euclidean distance between the data points.
• In the next step, P5 and P6 form a cluster, and the corresponding dendrogram is created. It is higher than the previous one, as the Euclidean distance between P5 and P6 is a little greater than that between P2 and P3.
Working of Dendrogram in Hierarchical clustering

The working of the dendrogram can be explained using the below diagram:
• Again, two new dendrograms are created that combine P1, P2,
and P3 in one dendrogram, and P4, P5, and P6, in another
dendrogram.
• At last, the final dendrogram is created that combines all the data
points together.

We can cut the dendrogram tree structure at any level as per our
requirement.
Implementation of Agglomerative Hierarchical
Clustering

Steps for implementation of HAC using Python:


• The steps for implementation will be the same as for k-means clustering, except for some changes, such as the method used to find the number of clusters. Below are the steps:
1. Data Pre-processing
2. Finding the optimal number of clusters using the Dendrogram
3. Training the hierarchical clustering model
4. Visualizing the clusters
Implementation

# Importing the libraries
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage

X = [4, 5, 10, 4, 3, 11, 14, 10, 12, 5, 12, 4, 11, 12]
y = [18, 19, 14, 17, 16, 24, 24, 21, 21, 17, 23, 15, 15, 16]

data = list(zip(X, y))

Finding the optimal number of clusters using the
Dendrogram

# Step 1: Plot Dendrogram
linked = linkage(data, method='ward')
plt.figure(figsize=(10, 6))
dendrogram(linked)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Data Points')
plt.ylabel('Distance')
plt.show()
• Output:
• By executing the above lines of code, we will get the below
output:
Using this Dendrogram, we can determine the optimal number
of clusters for our model. For this, we will find the maximum
vertical distance that does not cut any horizontal bar. Consider
the below diagram:
# Step 2: Agglomerative Clustering with chosen number of
# clusters, say 3
model = AgglomerativeClustering(n_clusters=3, metric='euclidean', linkage='ward')
labels = model.fit_predict(data)

# Step 3: Visualize clusters
plt.figure(figsize=(8, 6))
plt.scatter(X, y, c=labels, cmap='rainbow')
plt.title('Agglomerative Clustering Result')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)
plt.show()
Difference between K Means and Hierarchical clustering
Feature | K-Means | Hierarchical Clustering
Number of clusters | Must specify k beforehand | Can decide after (by cutting the dendrogram)
Approach | Iterative assignment & centroid update | Builds a tree by merging or splitting clusters
Cluster shape | Best for spherical, equally sized clusters | Can find clusters of arbitrary shape
Scalability | Scales well for large datasets | Computationally expensive for large datasets
Initialization | Sensitive to initial centroids | No initialization needed
Output | Cluster labels only | Dendrogram + cluster labels
Robustness to outliers | Sensitive to outliers | More robust, depending on linkage method
Applications of Clustering

Clustering has a large number of applications spread across various domains. Some of the most popular applications of clustering are:
• Market segmentation
• Social network analysis
• Search result grouping
• Medical imaging
• Image segmentation
• Anomaly detection
• Recommendation engines

Association Rule

• Association rule learning is a type of unsupervised learning technique that checks for the dependency of one data item on another data item and maps them accordingly so that the rules can be more profitable.
• For example, if a customer buys bread, he most likely can
also buy butter, eggs, or milk, so these products are stored
within a shelf or mostly nearby.
• Association rule learning can be divided into three types of
algorithms:
o Apriori
o Eclat
o F-P Growth Algorithm
Association Rule

How does Association Rule Learning work?


• Association rule learning works on the concept of If-Then statements, such as: if A, then B.
• To measure the associations between thousands of data
items, there are several metrics. These metrics are given
below:

o Support
o Confidence
o Lift
Association Rule
Support: Support is the frequency of A, or how frequently an item appears in the dataset:

  Support(A) = Freq(A) / N,  where N is the total number of transactions

Confidence: Confidence indicates how often the rule has been found to be true:

  Confidence(A → B) = Support(A ∪ B) / Support(A)

Lift: It is the strength of a rule, which can be defined by the formula below:

  Lift(A → B) = Support(A ∪ B) / (Support(A) × Support(B))
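The three metrics can be computed directly from a list of transactions (a minimal sketch; the tiny basket data is made up for illustration):

```python
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
N = len(transactions)

def support(itemset):
    # Fraction of transactions that contain every item of the itemset
    return sum(itemset <= t for t in transactions) / N

def confidence(A, B):
    # How often the rule A -> B holds among transactions containing A
    return support(A | B) / support(A)

def lift(A, B):
    # > 1: A and B occur together more often than if they were independent
    return confidence(A, B) / support(B)

A, B = {"bread"}, {"butter"}
print(support(A | B))    # 3/5 = 0.6
print(confidence(A, B))  # 0.6 / 0.8 = 0.75
print(lift(A, B))        # 0.75 / 0.6 = 1.25
```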
Apriori Algorithm

– The Apriori algorithm uses frequent itemsets to generate association rules, and it is designed to work on databases that contain transactions.
– With the help of these association rules, it determines how strongly or how weakly two objects are connected.
– This algorithm uses a breadth-first search and a Hash Tree to calculate the itemset associations efficiently.
– It is an iterative process for finding the frequent itemsets in a large dataset.
Apriori Algorithm

Steps for Apriori Algorithm:

Step-1: Determine the support of the itemsets in the transactional database, and select the minimum support and confidence.

Step-2: Take all itemsets in the transactions with a support value higher than the minimum (selected) support value.

Step-3: Find all the rules of these subsets that have a higher confidence value than the threshold (minimum confidence).

Step-4: Sort the rules in decreasing order of lift.
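The frequent-itemset part of these steps can be sketched with a brute-force level-wise search (a minimal illustration, not the optimized hash-tree Apriori; the six transactions are made up):

```python
from itertools import combinations

transactions = [
    {"A", "B", "C"}, {"A", "B"}, {"A", "C"},
    {"A"}, {"B", "C"}, {"B"},
]
min_support = 2  # minimum absolute transaction count

def frequent_itemsets(transactions, min_support):
    items = sorted(set().union(*transactions))
    frequent = {}
    # Grow candidate itemsets level by level (size 1, 2, 3, ...)
    for k in range(1, len(items) + 1):
        level = {}
        for cand in combinations(items, k):
            count = sum(set(cand) <= t for t in transactions)
            if count >= min_support:
                level[cand] = count
        if not level:  # no frequent itemset of size k, so none larger either
            break
        frequent.update(level)
    return frequent

for itemset, count in frequent_itemsets(transactions, min_support).items():
    print(itemset, count)  # all singletons and pairs survive; {A, B, C} does not
```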


Apriori Algorithm

Example: Suppose we have the following dataset of various transactions; from this dataset, we need to find the frequent itemsets and generate the association rules using the Apriori algorithm.
Apriori Algorithm

Step-1: Calculating C1(Candidate set) and L1(frequent itemset)


C1

L1
Apriori Algorithm

Step-2: Candidate generation C2 and L2


C2

L2
Apriori Algorithm

Step-3: Candidate generation C3 and L3


C3

L3
Apriori Algorithm

Step-4: Finding the association rules for the subsets


To generate the association rules, we first create a new table with the possible rules from the frequent combination {A, B, C}.
For each rule, we calculate the Confidence using the formula confidence(A → B) = sup(A ^ B) / sup(A).
After calculating the confidence value for all rules, we exclude the rules that have less confidence than the minimum threshold (50%).
Apriori Algorithm

Step-4: Finding the association rules for the subsets


Rules        Support   Confidence
A ^ B → C    2         sup{(A ^ B) ^ C} / sup(A ^ B) = 2/4 = 0.5 = 50%
B ^ C → A    2         sup{(B ^ C) ^ A} / sup(B ^ C) = 2/4 = 0.5 = 50%
A ^ C → B    2         sup{(A ^ C) ^ B} / sup(A ^ C) = 2/4 = 0.5 = 50%
C → A ^ B    2         sup{C ^ (A ^ B)} / sup(C)     = 2/5 = 0.4 = 40%
A → B ^ C    2         sup{A ^ (B ^ C)} / sup(A)     = 2/6 = 0.33 = 33.33%
B → A ^ C    2         sup{B ^ (A ^ C)} / sup(B)     = 2/7 = 0.29 = 28.57%
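The confidence column above can be reproduced programmatically. A minimal sketch: the support counts below are taken directly from the slide's table (e.g. sup(A) = 6, sup(A ^ B) = 4); the underlying transaction list itself is not reproduced here.

```python
# Support counts as given in the slide's example
support = {
    frozenset("A"): 6, frozenset("B"): 7, frozenset("C"): 5,
    frozenset("AB"): 4, frozenset("BC"): 4, frozenset("AC"): 4,
    frozenset("ABC"): 2,
}

def confidence(antecedent, consequent):
    """confidence(X -> Y) = sup(X union Y) / sup(X)."""
    x = frozenset(antecedent)
    return support[x | frozenset(consequent)] / support[x]

min_confidence = 0.5
rules = [("AB", "C"), ("BC", "A"), ("AC", "B"),
         ("C", "AB"), ("A", "BC"), ("B", "AC")]
for ante, cons in rules:
    c = confidence(ante, cons)
    verdict = "keep" if c >= min_confidence else "drop"
    print(f"{ante} -> {cons}: confidence = {c:.2%} ({verdict})")
```

Running this keeps the first three rules (50% each) and drops the last three, matching the filtering described above.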
K-Nearest Neighbor(KNN) Algorithm

• K-Nearest Neighbors (KNN) is a supervised machine
learning algorithm generally used for classification, but it
can also be used for regression tasks.
• It works by finding the "k" closest data points (neighbors) to a
given input and makes a prediction based on the majority
class (for classification) or the average value (for regression).
K-Nearest Neighbor(KNN) Algorithm

The following two properties would define KNN well


• Non-parametric learning algorithm − KNN is a non-parametric
learning algorithm because it doesn't assume anything
about the underlying data distribution.

• Lazy learning algorithm − K-Nearest Neighbors is also called
a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the entire dataset
and performs computations only at the time of classification.
K-Nearest Neighbor(KNN) Algorithm

• For example, consider the following table of data points


containing two features:

• The new point is classified as Category 2 because most of its


closest neighbors are blue squares. KNN assigns the category
based on the majority of nearby points.
How Does K-Nearest Neighbors Algorithm Work?

1. Choose the number of neighbors (k) – how many neighbors


to consider.
2. Calculate distance between the new data point and all the
training points.
3. Sort the distances and pick the k closest data points.
4. Vote for the most common class (in classification) or
average the values (in regression).
5. Return the prediction.
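The five steps above can be sketched in plain Python. This is a minimal sketch; the tiny training set is made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the query to every training point
    dists = [(math.dist(query, x), label) for x, label in zip(train_X, train_y)]
    # Step 3: sort by distance and keep the k closest
    k_nearest = sorted(dists, key=lambda d: d[0])[:k]
    # Step 4: majority vote among the k neighbors
    votes = Counter(label for _, label in k_nearest)
    # Step 5: return the prediction
    return votes.most_common(1)[0][0]

# Made-up training data: two well-separated clusters
train_X = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
train_y = ["red", "red", "red", "blue", "blue", "blue"]
print(knn_predict(train_X, train_y, (2, 2), k=3))  # all 3 neighbors are red
```

For regression, step 4 would instead average the neighbors' target values.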
Distance Metrics Used in KNN Algorithm

KNN uses distance metrics to identify the nearest neighbors;
these neighbors are then used for the classification or
regression task. To identify the nearest neighbors we use the
distance metrics below:
– Euclidean Distance: √((x2 - x1)² + (y2 - y1)² + ...)
– Manhattan Distance: |x1 - x2| + |y1 - y2|
– Minkowski Distance (generalized form): (Σ|xi - yi|^p)^(1/p)
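Each metric, written as a minimal Python function:

```python
def euclidean(a, b):
    # sqrt of the sum of squared coordinate differences
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    # sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p):
    # generalized form: p = 1 gives Manhattan, p = 2 gives Euclidean
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

p1, p2 = (1, 2), (4, 6)
print(euclidean(p1, p2))     # 5.0
print(manhattan(p1, p2))     # 7
print(minkowski(p1, p2, 2))  # 5.0 (same as Euclidean)
```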
K-Nearest Neighbor(KNN) Algorithm

Example
Suppose we have a dataset which can be plotted as follows −
K-Nearest Neighbor(KNN) Algorithm
Example
Now, we need to classify the new data point with the black dot
(at point 60,60) into the blue or red class. We are assuming
K = 3, i.e. the algorithm would find the three nearest data
points. This is shown in the next diagram −
K-Nearest Neighbor(KNN) Algorithm
Example
We can see in the above diagram the three nearest neighbors
of the data point. Among those three, two lie in the red
class, hence the black dot will also be assigned to the red class.
How to choose the value of k for KNN Algorithm?

– The value of k in KNN decides how many neighbors the
algorithm looks at when making a prediction.
– Choosing the right k is important for good results.
– If k is too small, the model becomes sensitive to noise and
outliers; this is called overfitting.
– If the data has lots of noise or outliers, using a larger k can
make the predictions more stable.
– But if k is too large, the model may become too simple and
miss important patterns; this is called underfitting.
– So k should be picked carefully based on the data.
How to choose the value of k for KNN Algorithm?

• Cross-Validation: Use cross-validation to evaluate different k


values and select the one that maximizes model
performance, ensuring good generalization to unseen data.
• Elbow Method: Plot error rate or accuracy against k values
and choose the point where improvements slow down—the
“elbow”—as a suitable k.
• Square Root of N Rule: A quick heuristic to pick an initial k
by setting it to the square root of the total number of data
points (N), especially when no prior knowledge or tuning is
available.
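A sketch of the Square Root of N rule and cross-validation for picking k, using scikit-learn's bundled iris dataset purely for illustration:

```python
import math
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Square Root of N rule: a quick starting point for k
k_start = round(math.sqrt(len(X)))  # N = 150 here

# Cross-validation: score a range of k values and keep the best
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X, y, cv=5).mean()
          for k in range(1, 21)}
best_k = max(scores, key=scores.get)
print(f"sqrt(N) suggests k = {k_start}; cross-validation picks k = {best_k}")
```

The elbow method would plot `scores` against k and pick the point where improvement levels off.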
Building a K Nearest Neighbors Model

We can follow the below steps to build a KNN model −


1. Load the data − The first step is to load the dataset into
memory. This can be done using various libraries such as
pandas or numpy.
2. Split the data − The next step is to split the data into
training and test sets. The training set is used to train the
KNN algorithm, while the test set is used to evaluate its
performance.
3. Normalize the data − Before training the KNN algorithm, it
is essential to normalize the data to ensure that each
feature contributes equally to the distance metric
calculation.
Building a K Nearest Neighbors Model

4. Calculate distances − Once the data is normalized, the KNN


algorithm calculates the distances between the test data point
and each data point in the training set.
5. Select k-nearest neighbors − The KNN algorithm selects the
k-nearest neighbors based on the distances calculated in the
previous step.
6. Make a prediction − For classification problems, the KNN
algorithm assigns the test data point to the class that appears
most frequently among the k-nearest neighbors. For regression
problems, the KNN algorithm assigns the test data point the
average of the k-nearest neighbors' values.
7. Evaluate performance − Finally, the KNN algorithm's
performance is evaluated using various metrics such as
accuracy, precision, recall, and F1-score.
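The seven steps above map onto scikit-learn calls as follows. This is a sketch using the bundled iris dataset, not the slide's data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                     # step 1: load the data
X_train, X_test, y_train, y_test = train_test_split(  # step 2: split the data
    X, y, test_size=0.3, random_state=42)

scaler = StandardScaler().fit(X_train)                # step 3: normalize
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5)             # steps 4-6: distances,
knn.fit(X_train, y_train)                             # neighbors, prediction
y_pred = knn.predict(X_test)

print("accuracy:", accuracy_score(y_test, y_pred))    # step 7: evaluate
```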
Building a K Nearest Neighbors Model

import matplotlib.pyplot as plt

x = [4, 5, 10, 4, 3, 11, 14, 8, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
classes = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

plt.scatter(x, y, c=classes)
plt.show()
Building a K Nearest Neighbors Model

from sklearn.neighbors import KNeighborsClassifier

data = list(zip(x, y))

knn = KNeighborsClassifier(n_neighbors=5)

knn.fit(data, classes)

new_x = 8
new_y = 21
new_point = [(new_x, new_y)]

prediction = knn.predict(new_point)
Building a K Nearest Neighbors Model

plt.scatter(x + [new_x], y + [new_y], c=classes + [prediction[0]])

plt.text(x=new_x-1.7, y=new_y-0.7,
         s=f"new point, class: {prediction[0]}")
plt.show()
Pros and Cons of KNN

Pros
• It is a very simple algorithm to understand and interpret.
• It is very useful for nonlinear data because the algorithm
makes no assumptions about the data.
• It is a versatile algorithm, as we can use it for classification as
well as regression.
• It has relatively high accuracy, but there are much better
supervised learning models than KNN.
Pros and Cons of KNN

Cons
• It is a computationally expensive algorithm because it
stores all the training data.
• High memory storage is required compared to other
supervised learning algorithms.
• Prediction is slow when N (the number of training samples)
is large.
• It is very sensitive to the scale of the data as well as to
irrelevant features.
