Distance and Rule-Based Models in ML

It is a PPT of ML Unit 5, Branch IT, SPPU University. It contains a detailed explanation of ML.

Uploaded by rishighodake2405
© All Rights Reserved

Machine Learning [314443]

UNIT-V

DISTANCE AND RULE BASED MODELS
Syllabus
UNIT V : DISTANCE AND RULE BASED MODELS
• Distance Based Models: Distance Metrics (Euclidean,
Manhattan, Hamming, Minkowski Distance Metric), Neighbors
and Examples, K-Nearest Neighbour for Classification and
Regression, Clustering as Learning task: K-means clustering
Algorithm-with example, k-medoid algorithm-with example,
Hierarchical Clustering, Divisive Dendrogram for hierarchical
clustering, Performance Measures
• Association Rule Mining: Introduction, Rule learning for
subgroup discovery, Apriori Algorithm, Performance Measures
– Support, Confidence, Lift.
Distance Based Models: Distance Metrics
Introduction to Distance Metrics:
• Distance metrics are a key part of several machine learning
algorithms.
• These distance metrics are used in both supervised and
unsupervised learning, generally to calculate the similarity
between data points.
• An effective distance metric improves the performance of our
machine learning model, whether that’s for classification tasks or
clustering.
Distance Based Models: Distance Metrics
Types of Distance Metrics:
1. Euclidean Distance
2. Manhattan Distance
3. Minkowski distance
4. Hamming Distance
Distance Metrics
• Euclidean Distance: The Euclidean distance is the length of the straight line segment joining two points. For points (x1, y1) and (x2, y2) it is sqrt[ (x2 − x1)² + (y2 − y1)² ].
Distance Metrics

• Manhattan Distance: Manhattan Distance is the sum of absolute differences between points across all the dimensions; in two dimensions it is |x2 − x1| + |y2 − y1|.
Distance Metrics

• Minkowski Distance: Minkowski Distance is the generalized form of Euclidean and Manhattan Distance.
• The formula for Minkowski Distance is given as:

  D(x, y) = ( Σ |xi − yi|^p )^(1/p)

Some common values of 'p' are:-

p = 1, Manhattan Distance
p = 2, Euclidean Distance
Distance Metrics
Hamming Distance
• Hamming Distance measures the similarity between two strings
of the same length.
• The Hamming Distance between two strings of the same length
is the number of positions at which the corresponding
characters are different.
• Ex. “euclidean” and “manhattan”
• Since the length of these strings is equal, we can calculate the Hamming Distance. We will go character by character and match the strings. The first character of both strings ('e' and 'm' respectively) is different. Similarly, the second character of both strings ('u' and 'a') is different, and so on.
• Seven characters are different, whereas two characters (the last two) are the same.
• So the Hamming distance is 7.
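The four metrics above can be sketched in a few lines of plain Python (a minimal sketch; the example calls reuse values from this unit):

```python
import math

def euclidean(p, q):
    # Straight-line length of the segment joining p and q
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # Sum of absolute differences across all dimensions
    return sum(abs(a - b) for a, b in zip(p, q))

def minkowski(p, q, r):
    # Generalized form: r = 1 gives Manhattan, r = 2 gives Euclidean
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

def hamming(s, t):
    # Number of positions at which equal-length strings differ
    assert len(s) == len(t)
    return sum(c1 != c2 for c1, c2 in zip(s, t))

print(euclidean((2, 2), (1, 1)))          # 1.414..., i.e. sqrt(2)
print(manhattan((2, 2), (1, 1)))          # 2
print(minkowski((2, 2), (1, 1), 2))       # same as Euclidean
print(hamming("euclidean", "manhattan"))  # 7
```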
Clustering

• K-Means Clustering is an unsupervised machine learning algorithm which groups an unlabeled dataset into different clusters.
• It can be defined as "a way of grouping the data points into different clusters consisting of similar data points; the objects with possible similarities remain in a group that has few or no similarities with another group."
Clustering Algorithms

1. K-Means algorithm:
• The k-means algorithm is one of the most popular
clustering algorithms.
• It classifies the dataset by dividing the samples into
different clusters of equal variances.
• The number of clusters must be specified in this algorithm.
• It is fast, requiring relatively few computations, with linear complexity O(n).
Clustering Algorithms

• K-means algorithm uses Partitioning Clustering. It is also known as the centroid-based method.
• It is a type of clustering that divides the data into non-hierarchical groups.
Clustering Algorithms

2. Agglomerative Hierarchical algorithm:


• The Agglomerative hierarchical algorithm performs the
bottom-up hierarchical clustering.
• In this, each data point is treated as a single cluster at the
outset and then successively merged.
• The cluster hierarchy can be represented as a tree structure.
Hierarchical Clustering

• Hierarchical clustering is another unsupervised machine learning algorithm, which is used to group unlabeled datasets into clusters; it is also known as hierarchical cluster analysis or HCA.
• In this algorithm, we develop the hierarchy of clusters in the
form of a tree, and this tree-shaped structure is known as
the dendrogram.
• Sometimes the results of K-means clustering and hierarchical clustering may look similar, but the two differ in how they work: hierarchical clustering has no requirement to predetermine the number of clusters, as we did in the K-Means algorithm.
Hierarchical Clustering

• The hierarchical clustering technique has two approaches:
• Agglomerative: Agglomerative is a bottom-up approach, in which the algorithm starts with taking all data points as single clusters and merging them until one cluster is left.
• Divisive: Divisive algorithm is the reverse of the agglomerative
algorithm as it is a top-down approach.
K- Means Clustering Algorithms

• K-Means Clustering is an unsupervised learning algorithm that is used to solve clustering problems in machine learning or data science.
• Here K defines the number of pre-defined clusters that
need to be created in the process.
• It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group of points with similar properties.
• It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids.
K- Means Clustering Algorithms

K-Means Clustering Algorithm involves the following steps-


Step-01:
• Choose the number of clusters K.
Step-02:
• Randomly select any K data points as cluster centers.
• Select cluster centers in such a way that they are as far apart from each other as possible.
Step-03:
• Calculate the distance between each data point and each
cluster center.
• The distance may be calculated either by using a given distance function or by using the Euclidean distance formula.
K- Means Clustering Algorithms

Step-04:
• Assign each data point to some cluster.
• A data point is assigned to that cluster whose center is
nearest to that data point.

Step-05:
• Re-compute the center of newly formed clusters.
• The center of a cluster is computed by taking mean of all
the data points contained in that cluster.
K- Means Clustering Algorithms

Step-06:
• Keep repeating the procedure from Step-03 to Step-05 until
any of the following stopping criteria is met-

o Centers of newly formed clusters do not change
o Data points remain present in the same cluster
o Maximum number of iterations is reached
K- Means Clustering Algorithms

Problem:
Use K-Means Algorithm to create two clusters-
K- Means Clustering Algorithms

Solution:
Assume A(2, 2) and C(1, 1) are centers of the two clusters.
Iteration-01:
o We calculate the distance of each point from each of the
center of the two clusters.
o The distance is calculated by using the euclidean distance
formula.
K- Means Clustering Algorithms

Calculating Distance Between A(2, 2) and C(1, 1)-

= sqrt[ (x2 – x1)² + (y2 – y1)² ]

= sqrt[ (1 – 2)² + (1 – 2)² ]

= sqrt[ 1 + 1 ]

= sqrt[ 2 ]

= 1.41
K- Means Clustering Algorithms

In the similar manner, we calculate the distance of other


points from each of the center of the two clusters.

Points | Distance from center (2, 2) of Cluster-01 | Distance from center (1, 1) of Cluster-02 | Point belongs to Cluster
A(2, 2) | 0 | 1.41 | C1
B(3, 2) | 1 | 2.24 | C1
C(1, 1) | 1.41 | 0 | C2
D(3, 1) | 1.41 | 2 | C1
E(1.5, 0.5) | 1.58 | 0.71 | C2
K- Means Clustering Algorithms

From here, the new clusters are-

Cluster-01 contains points: A(2, 2), B(3, 2), D(3, 1)
Cluster-02 contains points: C(1, 1), E(1.5, 0.5)
K- Means Clustering Algorithms

Now, we re-compute the new cluster centers.
The new cluster center is computed by taking the mean of all the points contained in that cluster.

For Cluster-01:
Center of Cluster-01
= ((2 + 3 + 3)/3, (2 + 2 + 1)/3)
= (2.67, 1.67)

For Cluster-02:
Center of Cluster-02
= ((1 + 1.5)/2, (1 + 0.5)/2)
= (1.25, 0.75)

This completes Iteration-01.


K- Means Clustering Algorithms

Next, we go to Iteration-02, Iteration-03, and so on until the centers do not change anymore.

Points | Distance from center (2.67, 1.67) of Cluster-01 | Distance from center (1.25, 0.75) of Cluster-02 | Point belongs to Cluster
A(2, 2) | 0.746860094 | 1.457738 | C1
B(3, 2) | 0.466690476 | 2.150581 | C1
C(1, 1) | 1.799388785 | 0.353553 | C2
D(3, 1) | 0.746860094 | 1.767767 | C1
E(1.5, 0.5) | 1.654629868 | 0.353553 | C2
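The two iterations above can be reproduced with a short pure-Python sketch of the algorithm, using the same five points and the same initial centers (a minimal sketch; it assumes every cluster stays non-empty, which holds here):

```python
def kmeans(points, centers, max_iter=100):
    for _ in range(max_iter):
        # Steps 3-4: assign each point to the nearest center (squared Euclidean)
        clusters = [[] for _ in centers]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        # Step 5: re-compute each center as the mean of its cluster's points
        new_centers = [tuple(sum(v) / len(cl) for v in zip(*cl)) for cl in clusters]
        # Step 6: stop when the centers no longer change
        if new_centers == centers:
            break
        centers = new_centers
    return centers

points = [(2, 2), (3, 2), (1, 1), (3, 1), (1.5, 0.5)]  # A, B, C, D, E
print(kmeans(points, [(2, 2), (1, 1)]))
# converges to (2.67, 1.67) and (1.25, 0.75), matching the worked example
```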
K-means Clustering

• K-means clustering is an unsupervised learning algorithm that groups data based on centroids.
• The centroids are defined by the means of all points that are
in the same cluster.
• The algorithm first chooses random points as centroids and
then iterates adjusting them until full convergence.
Clustering using Scikit-learn

import matplotlib.pyplot as plt

X = [4, 5, 10, 4, 3, 11, 14, 10, 12, 5, 12, 4, 11, 12]
y = [18, 19, 14, 17, 16, 24, 24, 21, 21, 17, 23, 15, 15, 16]

plt.scatter(X, y)
plt.show()
Clustering using Scikit-learn
Now we utilize the elbow method to visualize the inertia for different values of K:

from sklearn.cluster import KMeans

data = list(zip(X, y))
inertias = []

for i in range(2, 15):
    kmeans = KMeans(n_clusters=i, random_state=42)
    kmeans.fit(data)
    inertias.append(kmeans.inertia_)

Inertia is the sum of squared distances between each data point and the
centroid of its assigned cluster. It’s also known as the Within-Cluster Sum of
Squares (WCSS).
Clustering using Scikit-learn
Now we plot the inertia for the different values of K:

plt.plot(range(2, 15), inertias, marker='o')
plt.title('Elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

The elbow method shows that 3 is a good value for K, so we retrain and visualize
the result.
Clustering using Scikit-learn
We retrain with K = 3 and visualize the result:

kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(data)

plt.scatter(X, y, c=kmeans.labels_)
plt.show()
Elbow method

• The performance of the K-means clustering algorithm depends on the quality of the clusters it forms, but choosing the optimal number of clusters is a big task.
• The Elbow method is one of the most popular ways to find
the optimal number of clusters.
• This method uses the concept of WCSS value. WCSS stands
for Within Cluster Sum of Squares, which defines the total
variations within a cluster.
Elbow method

• The formula to calculate the value of WCSS (for 3 clusters) is given below:

  WCSS = Σ(Pi in Cluster1) distance(Pi, C1)² + Σ(Pi in Cluster2) distance(Pi, C2)² + Σ(Pi in Cluster3) distance(Pi, C3)²

• In the above formula of WCSS, Σ(Pi in Cluster1) distance(Pi, C1)² is the sum of the squares of the distances between each data point and its centroid within Cluster1, and the same for the other two terms.

• To measure the distance between data points and centroids, we can use any method such as Euclidean distance or Manhattan distance.
Elbow method

• To find the optimal number of clusters, the elbow method follows the below steps:

o It executes K-means clustering on a given dataset for different K values (e.g., ranging from 2 to 10).
o For each value of K, it calculates the WCSS value.
o It plots a curve between the calculated WCSS values and the number of clusters K.
o The sharp point of bend, where the plot looks like an arm, is considered the best value of K.
o (Look for the "elbow" point where the WCSS starts to flatten.)
Elbow plot

• Since the graph shows a sharp bend that looks like an elbow, the technique is known as the elbow method. The graph for the elbow method looks like the below image:
Silhouette score

• The Silhouette Score is a metric used to evaluate the quality of clustering.
• It measures how well each data point fits within its cluster
compared to other clusters.
• For each point:
- How close is it to its own cluster (cohesion)?
- How far is it from the next closest cluster (separation)?
Silhouette score

Silhouette Score Formula

- For a data point i:

  s(i) = (b(i) − a(i)) / max(a(i), b(i))

Where:
• a(i) = average distance of point i to all other points in the same cluster
• b(i) = average distance of point i to points in the nearest neighboring cluster (the one it's not part of)
Silhouette score

Score Range: s(i) ∈ [−1, 1]

Silhouette Score | Interpretation
~ 1 (approximately 1) | Point is well matched to its own cluster and far from others
~ 0 (approximately 0) | Point is on the border between clusters
~ −1 (approximately −1) | Point is likely in the wrong cluster

Select the value of k that gives the highest silhouette score.
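The definition can be checked by hand on a toy example (a minimal sketch with two made-up, well-separated clusters):

```python
import math

def silhouette(point, own_cluster, other_cluster):
    # a(i): average distance to the other points in the same cluster
    a = sum(math.dist(point, p) for p in own_cluster if p != point) / (len(own_cluster) - 1)
    # b(i): average distance to the points of the nearest other cluster
    b = sum(math.dist(point, p) for p in other_cluster) / len(other_cluster)
    return (b - a) / max(a, b)

c1 = [(0, 0), (0, 1)]
c2 = [(5, 5), (5, 6)]
s = silhouette((0, 0), c1, c2)
print(s)  # close to 1: the point sits firmly in its own cluster
```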


Clustering using Scikit-learn
Silhouette Score to evaluate the quality of clustering.

from sklearn.metrics import silhouette_score

silhouettes = []
for i in range(2, 14):
    kmeans = KMeans(n_clusters=i, random_state=42)
    labels = kmeans.fit_predict(data)
    score = silhouette_score(data, labels)
    silhouettes.append(score)
Clustering using Scikit-learn
Silhouette Score to evaluate the quality of clustering.
plt.plot(range(2, 14), silhouettes, marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Silhouette Score')
plt.show()
K-Medoid Algorithm
• K-medoids, also known as partitioning around medoids (PAM), is a popular clustering algorithm that groups data points into k clusters by selecting k representative objects (medoids) within a dataset.
• Clustering is a robust unsupervised machine-learning
algorithm that establishes patterns by identifying clusters or
groups of data points with similar characteristics within a
specific dataset.
• The k-means clustering algorithm uses centroids. K-medoids is
an alternative clustering algorithm that uses medoids instead.
K-Medoid Algorithm
Medoids-
• A medoid can be defined as the point in a cluster whose sum of distances to the other points in that cluster is minimal.
• It is the data point in a cluster characterized by the lowest dissimilarity with the other data points.
• The k-means algorithm is sensitive to outliers. The k-medoids
algorithm, on the other hand, mitigates that sensitivity by
eliminating reliance on centroids.
• The k-medoids algorithm aims to group data points
into k clusters, where each data point is assigned to a medoid,
and the sum of distances between data points and their assigned
medoid is minimized.
• The algorithm iteratively assigns each data point to the closest
medoid and swaps the medoid of each cluster until convergence.
K-Medoid Algorithm
Medoids-

Manhattan distance-
The distance between each data point from both medoids is
calculated using the Manhattan distance formula. It is also known
as the cost.

Distance=∣x2−x1∣+∣y2−y1∣
K-Medoids algorithm
K-Medoids algorithm steps:
1. Initialize: select k random points out of the n data points as
the medoids.
2. Associate each data point to the closest medoid by using any
common distance metric methods.
3. While the cost decreases:
   - For each medoid m and for each data point o which is not a medoid:
     • Swap m and o, associate each data point to the closest medoid, and recompute the cost.
     • If the total cost is more than that in the previous step, undo the swap.
The cost in the K-medoids algorithm is the total distance of every point to its assigned medoid:
  Cost = Σ (over each cluster Ci) Σ (over each point Pi in Ci) dist(Pi, Mi), where Mi is the medoid of Ci.
K-Medoids Example

Step 1:
Select k = 2, and let the two randomly selected medoids be C1 = (4, 5) and C2 = (8, 5).
K-Medoids Example

Step 2: Calculating cost.


The dissimilarity (Manhattan distance) of each non-medoid point with the medoids is calculated and tabulated:
(The distance table was an image in the original slides; only its Cluster column survives. Cluster assignments for points 0-9: C2, C1, C1, C2, -, C1, C2, C2, C2, -, where '-' marks the two medoids themselves.)

Each point is assigned to the cluster of that medoid whose dissimilarity is less.
The points 1, 2, 5 go to cluster C1 and 0, 3, 6, 7, 8 go to cluster C2.
The Cost = (3 + 4 + 4) + (2 + 2 + 3 + 1 + 1) = 20
Step 3: Randomly select one non-medoid point and recalculate
the cost.
Let the randomly selected point be (8, 4). The dissimilarity of
each non-medoid point with the medoids – C1 (4, 5) and C2 (8,
4) is calculated and tabulated.
(The distance table was an image in the original slides; only its Cluster column survives. Cluster assignments for points 0-9 with medoids C1 (4, 5) and C2 (8, 4): C2, C1, C1, C2, C2, C1, C2, -, C2, -, where '-' marks the two medoids themselves.)

Each point is assigned to that cluster whose dissimilarity is less.
So, the points 1, 2, 5 go to cluster C1 and 0, 3, 4, 6, 8 go to cluster C2.
The New cost = (3 + 4 + 4) + (3+ 3 + 1 + 2 + 2) = 22
Swap Cost = New Cost – Previous Cost = 22 – 20 = 2, and 2 > 0.
• As the swap cost is not less than zero, we undo the swap.
• If the recalculated cost had been less, we would keep the swap and repeat the process (swap a non-medoid point with a medoid point and recalculate the cost).
• After the algorithm completes, we will have k medoid
points with their clusters.
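The cost arithmetic above can be checked in code. The coordinate table in the original slides was an image, so the ten points below are an assumed reconstruction chosen to be consistent with every cost shown (point 4 = C2 (8, 5), point 9 = C1 (4, 5), point 7 = (8, 4)); treat the dataset as illustrative:

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def total_cost(points, medoids):
    # Each non-medoid point pays the Manhattan distance to its nearest medoid
    return sum(min(manhattan(p, m) for m in medoids)
               for p in points if p not in medoids)

# Assumed coordinates (points 0-9), reconstructed to match the slide's costs
points = [(8, 7), (3, 7), (4, 9), (9, 6), (8, 5),
          (5, 8), (7, 3), (8, 4), (7, 5), (4, 5)]

print(total_cost(points, [(4, 5), (8, 5)]))  # 20: medoids C1, C2
print(total_cost(points, [(4, 5), (8, 4)]))  # 22: after swapping C2 with (8, 4)
# Swap cost = 22 - 20 = 2 > 0, so the swap is undone
```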
Advantages:
• It is simple to understand and easy to implement.
• Partitioning Around Medoids is less sensitive to outliers than
other partitioning algorithms.
Disadvantages:
• The main disadvantage of the K-medoid algorithm is that it is not suitable for clustering non-spherical (arbitrarily shaped) groups of objects. This is because it relies on minimizing the distances between the non-medoid objects and the medoid (the cluster centre); briefly, it uses compactness as the clustering criterion instead of connectivity.
• It may obtain different results for different runs on the same
dataset because the first k medoids are chosen randomly.
Comparison between K-Means and K-Medoids

Feature | K-Means | K-Medoids
Cluster center | Mean (centroid) | Actual data point (medoid)
Robust to outliers | No | Yes
Speed | Fast | Slow (PAM)
Distance metric | Usually Euclidean | Any (e.g., Manhattan, Euclidean)
Cluster shape assumption | Spherical, equally sized | Spherical (less strict)
Datasets | Suitable for large datasets | Suitable for small to medium datasets
Why Hierarchical Clustering?

• K-means clustering has some challenges: it requires a predetermined number of clusters, and it always tries to create clusters of the same size.
• To solve these two challenges, we can opt for the hierarchical
clustering algorithm because, in this algorithm, we don't need
to have knowledge about the predefined number of clusters.
Hierarchical Clustering

• Hierarchical clustering is an unsupervised learning technique used to group similar data points into clusters by building a hierarchy (tree-like structure).
• Unlike flat clustering such as k-means, hierarchical clustering does not require specifying the number of clusters in advance.
• There are two main types of hierarchical clustering.
1. Agglomerative Clustering (bottom-up approach)
2. Divisive clustering (top-down approach)
Dendrogram

• A dendrogram is like a family tree for clusters.


• It shows how individual data points or groups of data merge
together.
• The bottom shows each data point as its own group and as we
move up, similar groups are combined.
• The lower the merge point, the more similar the groups are. It
helps us see how things are grouped step by step.
Dendrogram

• At the bottom of the dendrogram the points P, Q, R, S and T are all separate.
• As we move up, the closest points are
merged into a single group.
• The lines connecting the points show
how they are progressively merged
based on similarity.
• The height at which they are connected shows how similar the points are to each other; the shorter the line, the more similar they are.
Hierarchical Agglomerative clustering

• It is also known as the bottom-up approach or hierarchical agglomerative clustering (HAC).
• Bottom-up algorithms treat each data point as a singleton cluster at the outset and then successively agglomerate pairs of clusters until all clusters have been merged into a single cluster that contains all the data.
Workflow for Hierarchical Agglomerative clustering

1. Start with individual points


2. Calculate distances between clusters
3. Merge the closest clusters
4. Update distance matrix: Recalculate the distances between the
new cluster and the remaining clusters.
5. Repeat steps 3 and 4: Keep merging the closest clusters and
updating the distance matrix until we have only one cluster
left.
6. Create a dendrogram
Hierarchical Divisive clustering

• Divisive clustering is also known as a top-down approach.


• Top-down clustering requires a method for splitting a cluster
that contains the whole data and proceeds by splitting clusters
recursively until individual data have been split into singleton
clusters.
Workflow for Hierarchical Divisive clustering

• Start with all data points in one cluster


• Split the cluster: Divide the cluster into two smaller clusters.
The division is typically done by finding the two most dissimilar
points in the cluster and using them to separate the data into
two parts.
• Repeat the process: For each of the new clusters, repeat the
splitting process: Choose the cluster with the most dissimilar
points and split it again into two smaller clusters.
• Stop when each data point is in its own cluster: Continue this
process until every data point is its own cluster or the stopping
condition (such as a predefined number of clusters) is met.
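The splitting rule described above (seed the two halves with the most dissimilar pair of points, then assign each point to the nearer seed) can be sketched for a single split (a minimal illustration on made-up points):

```python
import math
from itertools import combinations

def split_cluster(cluster):
    # Find the two most dissimilar (farthest-apart) points to use as seeds
    s1, s2 = max(combinations(cluster, 2), key=lambda pair: math.dist(*pair))
    left, right = [], []
    # Separate the data by assigning every point to the nearer seed
    for p in cluster:
        (left if math.dist(p, s1) <= math.dist(p, s2) else right).append(p)
    return left, right

data = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(split_cluster(data))  # two tight groups, one around each seed
```

Repeating this split on each resulting cluster, until every point stands alone or a stopping condition is met, yields the divisive hierarchy.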
How Does Agglomerative Hierarchical Clustering Work?
• The working of the AHC algorithm can be explained using the
below steps:
Step-1: Create each data point as a single cluster. Let's say there
are N data points, so the number of clusters will also be N.
Agglomerative Hierarchical clustering

• Step-2: Take two closest data points or clusters and merge them
to form one cluster. So, there will now be N-1 clusters.
Agglomerative Hierarchical clustering

• Step-3: Again, take the two closest clusters and merge them
together to form one cluster. There will be N-2 clusters.
Agglomerative Hierarchical clustering

• Step-4: Repeat Step 3 until only one cluster is left. So, we will get the following clusters. Consider the below images:
Agglomerative Hierarchical clustering

• Step-5: Once all the clusters are combined into one big cluster,
develop the dendrogram to divide the clusters as per the
problem.
Measure for the distance between two clusters

• As we have seen, the closest distance between the two clusters is crucial for hierarchical clustering.
• There are various ways to calculate the distance between two
clusters, and these ways decide the rule for clustering. These
measures are called Linkage methods.
Measure for the distance between two clusters

• Single Linkage: It is the shortest distance between the closest points of the clusters.
• Single linkage is also known as the Minimum Linkage (MIN) method.
• Consider the below image:
Measure for the distance between two clusters

• Complete Linkage: It is the farthest distance between the two points of two different clusters.
• The complete Linkage method is also known as the Maximum
Linkage (MAX) method.
• It is one of the popular linkage methods as it forms tighter
clusters than single-linkage.
Measure for the distance between two clusters

• Average Linkage: It is the linkage method in which the distance between each pair of points (one from each cluster) is added up and then divided by the total number of pairs to calculate the average distance between two clusters. It is also one of the most popular linkage methods.
Measure for the distance between two clusters

• Centroid Linkage: It is the linkage method in which the distance between the centroids of the clusters is calculated. Consider the below image:

From the above-given approaches, we can apply any of them according to the type of problem or business requirement.
Measure for the distance between two clusters

• Ward linkage is a method used in hierarchical clustering to determine the distance between two clusters before merging them.
• It computes the sum of variances for each cluster separately and then computes the variance for the cluster that would be formed by merging the two.
• The goal of Ward linkage is to minimize the increase in variance within the newly formed cluster when two clusters are combined.
• Ward's linkage supports only the Euclidean distance.
Working of Dendrogram in Hierarchical clustering

• The working of the dendrogram can be explained using the below diagram:

• In the above diagram, the left part shows how clusters are created in agglomerative clustering, and the right part shows the corresponding dendrogram.
Working of Dendrogram in Hierarchical clustering

The working of the dendrogram can be explained using the below diagram:
• Firstly, the data points P2 and P3 combine and form a cluster; correspondingly, a dendrogram is created that connects P2 and P3 with a rectangular shape. The height is decided according to the Euclidean distance between the data points.
• In the next step, P5 and P6 form a cluster, and the corresponding dendrogram is created. It is higher than the previous one, as the Euclidean distance between P5 and P6 is a little greater than that between P2 and P3.
Working of Dendrogram in Hierarchical clustering

The working of the dendrogram can be explained using the below diagram:
• Again, two new dendrograms are created that combine P1, P2,
and P3 in one dendrogram, and P4, P5, and P6, in another
dendrogram.
• At last, the final dendrogram is created that combines all the data
points together.

We can cut the dendrogram tree structure at any level as per our
requirement.
Implementation of Agglomerative Hierarchical
Clustering

Steps for implementation of HAC using Python:


• The steps for implementation will be the same as for k-means clustering, except for some changes, such as the method used to find the number of clusters. Below are the steps:
1. Data Pre-processing
2. Finding the optimal number of clusters using the Dendrogram
3. Training the hierarchical clustering model
4. Visualizing the clusters
Implementation

# Importing the libraries
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage

X = [4, 5, 10, 4, 3, 11, 14, 10, 12, 5, 12, 4, 11, 12]
y = [18, 19, 14, 17, 16, 24, 24, 21, 21, 17, 23, 15, 15, 16]

data = list(zip(X, y))

Finding the optimal number of clusters using the
Dendrogram

# Step 1: Plot Dendrogram
linked = linkage(data, method='ward')
plt.figure(figsize=(10, 6))
dendrogram(linked)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Data Points')
plt.ylabel('Distance')
plt.show()
• Output:
• By executing the above lines of code, we will get the below
output:
Using this Dendrogram, we can determine the optimal number
of clusters for our model. For this, we will find the maximum
vertical distance that does not cut any horizontal bar. Consider
the below diagram:
# Step 2: Agglomerative Clustering with chosen number of
# clusters, say 3
model = AgglomerativeClustering(n_clusters=3, metric='euclidean', linkage='ward')
labels = model.fit_predict(data)

# Step 3: Visualize clusters
plt.figure(figsize=(8, 6))
plt.scatter(X, y, c=labels, cmap='rainbow')
plt.title('Agglomerative Clustering Result')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)
plt.show()
Difference between K Means and Hierarchical clustering
Feature | K-Means | Hierarchical Clustering
Number of clusters | Must specify k beforehand | Can decide after (by cutting the dendrogram)
Approach | Iterative assignment & centroid update | Builds a tree by merging or splitting clusters
Cluster shape | Best for spherical, equally sized clusters | Can find clusters of arbitrary shape
Scalability | Scales well for large datasets | Computationally expensive for large datasets
Initialization | Sensitive to initial centroids | No initialization needed
Output | Cluster labels only | Dendrogram + cluster labels
Robustness to outliers | Sensitive to outliers | More robust, depending on linkage method
Applications of Clustering

Clustering has a large number of applications spread across various domains. Some of the most popular applications of clustering are:
• Market segmentation
• Social network analysis
• Search result grouping
• Medical imaging
• Image segmentation
• Anomaly detection
• Recommendation engines

Association Rule

• Association rule learning is a type of unsupervised learning technique that checks for the dependency of one data item on another data item and maps them accordingly so that the rules can be more profitable.
• For example, if a customer buys bread, he most likely can
also buy butter, eggs, or milk, so these products are stored
within a shelf or mostly nearby.
• Association rule learning can be divided into three types of
algorithms:
o Apriori
o Eclat
o F-P Growth Algorithm
Association Rule

How does Association Rule Learning work?


• Association rule learning works on the concept of If-Then statements, such as: if A, then B.
• To measure the associations between thousands of data
items, there are several metrics. These metrics are given
below:

o Support
o Confidence
o Lift
Association Rule
Support: Support is the frequency of A, or how frequently an item appears in the dataset:

  Support(A) = Freq(A) / N,  where N is the total number of transactions

Confidence: Confidence indicates how often the rule has been found to be true:

  Confidence(A → B) = Support(A ∪ B) / Support(A)

Lift: It is the strength of a rule, which can be defined by the formula below:

  Lift(A → B) = Support(A ∪ B) / (Support(A) × Support(B))
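The three metrics can be computed directly from a list of transactions (a minimal sketch; the tiny basket data is made up for illustration):

```python
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
N = len(transactions)

def support(itemset):
    # Fraction of transactions that contain every item of the itemset
    return sum(itemset <= t for t in transactions) / N

def confidence(A, B):
    # How often the rule A -> B holds among transactions containing A
    return support(A | B) / support(A)

def lift(A, B):
    # > 1: A and B occur together more often than if they were independent
    return confidence(A, B) / support(B)

A, B = {"bread"}, {"butter"}
print(support(A | B))    # 3/5 = 0.6
print(confidence(A, B))  # 0.6 / 0.8 = 0.75
print(lift(A, B))        # 0.75 / 0.6 = 1.25
```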
Apriori Algorithm

– The Apriori algorithm uses frequent itemsets to generate association rules, and it is designed to work on databases that contain transactions.
– With the help of these association rules, it determines how strongly or how weakly two objects are connected.
– This algorithm uses a breadth-first search and a Hash Tree to calculate the itemset associations efficiently.
– It is an iterative process for finding the frequent itemsets in a large dataset.
Apriori Algorithm

Steps for Apriori Algorithm:

Step-1: Determine the support of the itemsets in the transactional database, and select the minimum support and confidence.

Step-2: Take all itemsets in the transactions with a support value higher than the minimum (selected) support value.

Step-3: Find all the rules of these subsets that have a higher confidence value than the threshold (minimum confidence).

Step-4: Sort the rules in decreasing order of lift.
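The frequent-itemset part of these steps can be sketched with a brute-force level-wise search (a minimal illustration, not the optimized hash-tree Apriori; the six transactions are made up):

```python
from itertools import combinations

transactions = [
    {"A", "B", "C"}, {"A", "B"}, {"A", "C"},
    {"A"}, {"B", "C"}, {"B"},
]
min_support = 2  # minimum absolute transaction count

def frequent_itemsets(transactions, min_support):
    items = sorted(set().union(*transactions))
    frequent = {}
    # Grow candidate itemsets level by level (size 1, 2, 3, ...)
    for k in range(1, len(items) + 1):
        level = {}
        for cand in combinations(items, k):
            count = sum(set(cand) <= t for t in transactions)
            if count >= min_support:
                level[cand] = count
        if not level:  # no frequent itemset of size k, so none larger either
            break
        frequent.update(level)
    return frequent

for itemset, count in frequent_itemsets(transactions, min_support).items():
    print(itemset, count)  # all singletons and pairs survive; {A, B, C} does not
```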


Apriori Algorithm

Example: Suppose we have the following dataset of various transactions; from this dataset, we need to find the frequent itemsets and generate the association rules using the Apriori algorithm.
Apriori Algorithm

Step-1: Calculating C1(Candidate set) and L1(frequent itemset)


C1

L1
Apriori Algorithm

Step-2: Candidate generation C2 and L2


C2

L2
Apriori Algorithm

Step-3: Candidate generation C3 and L3


C3

L3
Apriori Algorithm

Step-4: Finding the association rules for the subsets


To generate the association rules, we first create a new table with the possible rules from the frequent combination {A, B, C}.
For each rule, we calculate the Confidence using the formula confidence(A → B) = sup(A ^ B) / sup(A).
After calculating the confidence value for all rules, we exclude the rules that have less confidence than the minimum threshold (50%).
Apriori Algorithm

Step-4: Finding the association rules for the subsets


Rules        Support   Confidence
A ^ B → C    2         sup{(A ^ B) ^ C} / sup(A ^ B) = 2/4 = 0.5 = 50%
B ^ C → A    2         sup{(B ^ C) ^ A} / sup(B ^ C) = 2/4 = 0.5 = 50%
A ^ C → B    2         sup{(A ^ C) ^ B} / sup(A ^ C) = 2/4 = 0.5 = 50%
C → A ^ B    2         sup{C ^ (A ^ B)} / sup(C)     = 2/5 = 0.4 = 40%
A → B ^ C    2         sup{A ^ (B ^ C)} / sup(A)     = 2/6 = 0.33 = 33.33%
B → A ^ C    2         sup{B ^ (A ^ C)} / sup(B)     = 2/7 = 0.29 = 28.57%
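The confidence column above can be reproduced programmatically. A minimal sketch: the support counts below are taken directly from the slide's table (e.g. sup(A) = 6, sup(A ^ B) = 4); the underlying transaction list itself is not reproduced here.

```python
# Support counts as given in the slide's example
support = {
    frozenset("A"): 6, frozenset("B"): 7, frozenset("C"): 5,
    frozenset("AB"): 4, frozenset("BC"): 4, frozenset("AC"): 4,
    frozenset("ABC"): 2,
}

def confidence(antecedent, consequent):
    """confidence(X -> Y) = sup(X union Y) / sup(X)."""
    x = frozenset(antecedent)
    return support[x | frozenset(consequent)] / support[x]

min_confidence = 0.5
rules = [("AB", "C"), ("BC", "A"), ("AC", "B"),
         ("C", "AB"), ("A", "BC"), ("B", "AC")]
for ante, cons in rules:
    c = confidence(ante, cons)
    verdict = "keep" if c >= min_confidence else "drop"
    print(f"{ante} -> {cons}: confidence = {c:.2%} ({verdict})")
```

Running this keeps the first three rules (50% each) and drops the last three, matching the filtering described above.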
K-Nearest Neighbor(KNN) Algorithm

• K-Nearest Neighbors (KNN) is a supervised machine
learning algorithm generally used for classification, but it
can also be used for regression tasks.
• It works by finding the "k" closest data points (neighbors) to a
given input and makes a prediction based on the majority
class (for classification) or the average value (for regression).
K-Nearest Neighbor(KNN) Algorithm

The following two properties would define KNN well


• Non-parametric learning algorithm − KNN is a non-parametric
learning algorithm because it doesn't assume anything
about the underlying data distribution.

• Lazy learning algorithm − K-Nearest Neighbors is also called
a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the entire dataset
and performs computations only at the time of classification.
K-Nearest Neighbor(KNN) Algorithm

• For example, consider the following table of data points


containing two features:

• The new point is classified as Category 2 because most of its


closest neighbors are blue squares. KNN assigns the category
based on the majority of nearby points.
How Does K-Nearest Neighbors Algorithm Work?

1. Choose the number of neighbors (k) – how many neighbors


to consider.
2. Calculate distance between the new data point and all the
training points.
3. Sort the distances and pick the k closest data points.
4. Vote for the most common class (in classification) or
average the values (in regression).
5. Return the prediction.
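The five steps above can be sketched in plain Python. This is a minimal sketch; the tiny training set is made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the query to every training point
    dists = [(math.dist(query, x), label) for x, label in zip(train_X, train_y)]
    # Step 3: sort by distance and keep the k closest
    k_nearest = sorted(dists, key=lambda d: d[0])[:k]
    # Step 4: majority vote among the k neighbors
    votes = Counter(label for _, label in k_nearest)
    # Step 5: return the prediction
    return votes.most_common(1)[0][0]

# Made-up training data: two well-separated clusters
train_X = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
train_y = ["red", "red", "red", "blue", "blue", "blue"]
print(knn_predict(train_X, train_y, (2, 2), k=3))  # all 3 neighbors are red
```

For regression, step 4 would instead average the neighbors' target values.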
Distance Metrics Used in KNN Algorithm

KNN uses distance metrics to identify the nearest neighbors;
these neighbors are then used for the classification or
regression task. To identify the nearest neighbors we use the
distance metrics below:
– Euclidean Distance: √((x2 - x1)² + (y2 - y1)² + ...)
– Manhattan Distance: |x1 - x2| + |y1 - y2|
– Minkowski Distance (generalized form): (Σ|xi - yi|^p)^(1/p)
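Each metric, written as a minimal Python function:

```python
def euclidean(a, b):
    # sqrt of the sum of squared coordinate differences
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    # sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p):
    # generalized form: p = 1 gives Manhattan, p = 2 gives Euclidean
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

p1, p2 = (1, 2), (4, 6)
print(euclidean(p1, p2))     # 5.0
print(manhattan(p1, p2))     # 7
print(minkowski(p1, p2, 2))  # 5.0 (same as Euclidean)
```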
K-Nearest Neighbor(KNN) Algorithm

Example
Suppose we have a dataset which can be plotted as follows −
K-Nearest Neighbor(KNN) Algorithm
Example
Now, we need to classify the new data point with the black dot
(at point 60,60) into the blue or red class. We are assuming
K = 3, i.e. the algorithm would find the three nearest data
points. This is shown in the next diagram −
K-Nearest Neighbor(KNN) Algorithm
Example
We can see in the above diagram the three nearest neighbors
of the data point. Among those three, two lie in the red
class, hence the black dot will also be assigned to the red class.
How to choose the value of k for KNN Algorithm?

– The value of k in KNN decides how many neighbors the
algorithm looks at when making a prediction.
– Choosing the right k is important for good results.
– If k is too small, the model becomes sensitive to noise and
outliers; this is called overfitting.
– If the data has lots of noise or outliers, using a larger k can
make the predictions more stable.
– But if k is too large, the model may become too simple and
miss important patterns; this is called underfitting.
– So k should be picked carefully based on the data.
How to choose the value of k for KNN Algorithm?

• Cross-Validation: Use cross-validation to evaluate different k


values and select the one that maximizes model
performance, ensuring good generalization to unseen data.
• Elbow Method: Plot error rate or accuracy against k values
and choose the point where improvements slow down—the
“elbow”—as a suitable k.
• Square Root of N Rule: A quick heuristic to pick an initial k
by setting it to the square root of the total number of data
points (N), especially when no prior knowledge or tuning is
available.
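A sketch of the Square Root of N rule and cross-validation for picking k, using scikit-learn's bundled iris dataset purely for illustration:

```python
import math
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Square Root of N rule: a quick starting point for k
k_start = round(math.sqrt(len(X)))  # N = 150 here

# Cross-validation: score a range of k values and keep the best
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X, y, cv=5).mean()
          for k in range(1, 21)}
best_k = max(scores, key=scores.get)
print(f"sqrt(N) suggests k = {k_start}; cross-validation picks k = {best_k}")
```

The elbow method would plot `scores` against k and pick the point where improvement levels off.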
Building a K Nearest Neighbors Model

We can follow the below steps to build a KNN model −


1. Load the data − The first step is to load the dataset into
memory. This can be done using various libraries such as
pandas or numpy.
2. Split the data − The next step is to split the data into
training and test sets. The training set is used to train the
KNN algorithm, while the test set is used to evaluate its
performance.
3. Normalize the data − Before training the KNN algorithm, it
is essential to normalize the data to ensure that each
feature contributes equally to the distance metric
calculation.
Building a K Nearest Neighbors Model

4. Calculate distances − Once the data is normalized, the KNN


algorithm calculates the distances between the test data point
and each data point in the training set.
5. Select k-nearest neighbors − The KNN algorithm selects the
k-nearest neighbors based on the distances calculated in the
previous step.
6. Make a prediction − For classification problems, the KNN
algorithm assigns the test data point to the class that appears
most frequently among the k-nearest neighbors. For regression
problems, the KNN algorithm assigns the test data point the
average of the k-nearest neighbors' values.
7. Evaluate performance − Finally, the KNN algorithm's
performance is evaluated using various metrics such as
accuracy, precision, recall, and F1-score.
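The seven steps above map onto scikit-learn calls as follows. This is a sketch using the bundled iris dataset, not the slide's data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                     # step 1: load the data
X_train, X_test, y_train, y_test = train_test_split(  # step 2: split the data
    X, y, test_size=0.3, random_state=42)

scaler = StandardScaler().fit(X_train)                # step 3: normalize
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5)             # steps 4-6: distances,
knn.fit(X_train, y_train)                             # neighbors, prediction
y_pred = knn.predict(X_test)

print("accuracy:", accuracy_score(y_test, y_pred))    # step 7: evaluate
```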
Building a K Nearest Neighbors Model

import matplotlib.pyplot as plt

x = [4, 5, 10, 4, 3, 11, 14, 8, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
classes = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

plt.scatter(x, y, c=classes)
plt.show()
Building a K Nearest Neighbors Model

from sklearn.neighbors import KNeighborsClassifier

data = list(zip(x, y))

knn = KNeighborsClassifier(n_neighbors=5)

knn.fit(data, classes)

new_x = 8
new_y = 21
new_point = [(new_x, new_y)]

prediction = knn.predict(new_point)
Building a K Nearest Neighbors Model

plt.scatter(x + [new_x], y + [new_y], c=classes + [prediction[0]])

plt.text(x=new_x-1.7, y=new_y-0.7,
         s=f"new point, class: {prediction[0]}")
plt.show()
Pros and Cons of KNN

Pros
• It is a very simple algorithm to understand and interpret.
• It is very useful for nonlinear data because the algorithm
makes no assumptions about the data.
• It is a versatile algorithm, as we can use it for classification as
well as regression.
• It has relatively high accuracy, but there are much better
supervised learning models than KNN.
Pros and Cons of KNN

Cons
• It is a computationally expensive algorithm because it
stores all the training data.
• High memory storage is required compared to other
supervised learning algorithms.
• Prediction is slow when N (the number of training samples)
is large.
• It is very sensitive to the scale of the data as well as to
irrelevant features.
