
AMT305 – INTRODUCTION TO MACHINE LEARNING

MODULE 5 (UNSUPERVISED LEARNING): Ensemble methods, Voting, Bagging, Boosting. Unsupervised Learning - Clustering methods - Similarity measures, K-means clustering, Expectation-Maximization for soft clustering, Hierarchical clustering methods, Density-based clustering

MODULE 5—PART I
Ensemble methods, Voting, Bagging, Boosting. Unsupervised Learning - Clustering methods - Similarity measures, K-means clustering



ENSEMBLE METHODS
• Ensemble methods are a powerful class of techniques in machine learning that aim to improve model performance by combining multiple models to make predictions.
• Ensemble methods aggregate the predictions of several models, which can lead to more accurate, robust, and stable predictions.
• The key idea is that a group of weak models, when combined, can produce a strong model.
• Weak Learner: A model that performs only slightly better than random guessing.
• Strong Learner: A model that achieves high accuracy and generalizes well to unseen data.



Contd…

Why Use Ensemble Methods?


•Improved Accuracy: Ensemble methods tend to have better predictive performance than individual models, as they reduce variance, reduce bias, or improve generalization.
•Robustness: Combining multiple models reduces the risk of poor generalization, as the ensemble averages out the weaknesses of individual models.
•Reduction in Overfitting: Some ensemble techniques, such as bagging, can reduce overfitting in high-variance models like decision trees.



VOTING
• Voting combines predictions from multiple models to make a final decision.
• In voting ensembles, each model makes a prediction, and the final prediction is made based on the majority vote for classification or averaging for regression.

Linear combination (regression):

$$y = \sum_{j=1}^{L} w_j d_j, \qquad w_j \ge 0, \quad \sum_{j=1}^{L} w_j = 1$$

Classification:

$$y_i = \sum_{j=1}^{L} w_j d_{ji}$$

where $d_j$ is the output of base learner $j$ ($d_{ji}$ its vote for class $i$) and $w_j$ is the weight given to learner $j$.


Contd…
Types of Voting:
Hard Voting: Uses the majority vote (most frequent class label).
Soft Voting: Uses the average of the predicted probabilities and selects the class
with the highest average probability.
Advantages:
Simple to implement and works well when models are diverse.
Reduces the variance in predictions, leading to improved generalization (see the sketch below).
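A minimal sketch of the two voting types above, assuming scikit-learn is available; the base estimators and synthetic dataset are arbitrary placeholders, not part of the slides:

```python
# Illustrative sketch: hard vs. soft voting with scikit-learn
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(max_depth=3)),
    ("nb", GaussianNB()),
]

hard = VotingClassifier(estimators, voting="hard")   # majority vote on predicted class labels
soft = VotingClassifier(estimators, voting="soft")   # average of predicted class probabilities

hard.fit(X, y)
soft.fit(X, y)
print(hard.predict(X[:5]), soft.predict(X[:5]))
```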



BAGGING (Bootstrap Aggregating)

• Goal: reduce variance and prevent overfitting.
• In bagging, multiple instances of the same base learner (e.g., decision trees) are trained on different random subsets of the training data created via bootstrapping (sampling with replacement).
• Use bootstrapping to generate L training sets and train one base learner on each (Breiman, 1996).
• Each model makes predictions independently, and the final prediction is made by averaging (for regression) or majority voting (for classification).
• Example: Random Forest is a popular bagging method in which many decision trees are trained on different bootstrapped samples of the data, and their outputs are averaged (for regression) or voted on (for classification), as sketched below.
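A minimal bagging sketch, assuming scikit-learn; the dataset, tree depth, and number of estimators are placeholders chosen only for illustration:

```python
# Bagging sketch: decision trees trained on bootstrapped samples
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Generic bagging: L = 50 trees, each fit on a bootstrap sample of the training data
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        bootstrap=True, random_state=0).fit(X, y)

# Random Forest: bagging of decision trees plus random feature selection at each split
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

print(bag.score(X, y), rf.score(X, y))
```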
Contd…
• A Random Forest is a collection of decision trees, each trained on a random subset of the data.
• Each tree provides a prediction, and the final prediction is made by averaging (regression) or voting (classification).
• Reduces overfitting by averaging multiple decision trees.
• Advantages:
  • Reduces variance by averaging predictions.
  • Robust against overfitting, especially with high-variance models like decision trees.
BOOSTING
• Goal: reduce bias and improve predictive accuracy.
• In boosting, models are trained sequentially, where each new model focuses on correcting the mistakes of the previous one.
• Boosting gives more weight to misclassified examples, so each subsequent model tries to improve upon the errors made by the previous ones.
• Examples: AdaBoost (Adaptive Boosting) and Gradient Boosting.

AdaBoost:
• Each weak learner is trained on the data, and misclassified examples are given higher weights so that the next model pays more attention to these difficult examples.
• Final predictions are made by a weighted majority vote of the weak learners (see the sketch below).
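A minimal AdaBoost sketch, assuming scikit-learn's AdaBoostClassifier, with decision stumps as the weak learners; the dataset and parameters are placeholders:

```python
# AdaBoost sketch: decision stumps reweighted toward previously misclassified examples
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

ada = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # weak learner: a decision stump
    n_estimators=100,
    learning_rate=0.5,
    random_state=0,
).fit(X, y)

print(ada.score(X, y))
```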
Contd…

• Gradient Boosting:
• Models are built sequentially, but instead of adjusting example weights, gradient boosting fits the next model to the residuals (the errors of the previous model). This effectively minimizes the error at each iteration. A toy sketch of this residual-fitting loop follows.
• Advantages:
• Reduces bias and increases accuracy.
• Works well with weak learners (e.g., shallow decision trees).
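To make the residual-fitting idea concrete, here is a toy regression sketch (not from the slides) in which each new shallow tree is fit to the residuals of the running ensemble; the data and hyperparameters are arbitrary:

```python
# Toy gradient boosting for squared loss: each tree fits the residuals of the current prediction
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

learning_rate, n_rounds = 0.1, 100
pred = np.zeros_like(y)          # start from a zero prediction
trees = []                       # the ensemble of fitted weak learners

for _ in range(n_rounds):
    residuals = y - pred                          # negative gradient of the squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    trees.append(tree)
    pred += learning_rate * tree.predict(X)       # shrink each tree's contribution

print("training MSE:", np.mean((y - pred) ** 2))
```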



Contd…
AdaBoost

Generate a sequence of base learners, each focusing on the previous one's errors (Freund and Schapire, 1996).



UNSUPERVISED LEARNING: Similarity measures - Minkowski distance measures (Manhattan, Euclidean), Cosine Similarity

Minkowski Distance

• The Minkowski distance is a generalization of both the Euclidean and Manhattan distances and can be defined for different values of the parameter p.
• The Minkowski distance between two points A = (x1, x2, …, xn) and B = (y1, y2, …, yn) in an n-dimensional space is given by the formula:
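Written out in standard form:

$$D_p(A, B) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$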



Contd…
For different values of p, we obtain different distance metrics:

1. Manhattan Distance (L1 Norm, p=1)
• Also known as city block distance or taxicab distance, it calculates the distance between two points along the axes of the space.
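In the same notation as above:

$$D_1(A, B) = \sum_{i=1}^{n} |x_i - y_i|$$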



Contd…
2. Euclidean Distance (L2 Norm, p=2)
• Measures the straight-line (as-the-crow-flies) distance between two points in space.
• Used in tasks where the shortest physical distance is meaningful, such as in k-nearest neighbors (k-NN) classification and clustering (e.g., k-means).
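Again in the same notation:

$$D_2(A, B) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$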



Contd…
• The Minkowski distance is the general form for all p ≥ 1.
•Behavior for different p:
p=1: Manhattan distance.
p=2: Euclidean distance.
As p→∞: it approaches the Chebyshev distance, where only the largest difference across any dimension matters (see the sketch below).
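A small NumPy sketch showing how the metric changes with p; the two vectors are arbitrary examples, not from the slides:

```python
# Minkowski distance for several values of p, computed directly with NumPy
import numpy as np

def minkowski(a, b, p):
    """(sum_i |a_i - b_i|^p)^(1/p); p = 1 is Manhattan, p = 2 is Euclidean."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

A = np.array([1.0, 2.0, 3.0])
B = np.array([4.0, 0.0, 3.0])

print(minkowski(A, B, 1))        # Manhattan: 3 + 2 + 0 = 5
print(minkowski(A, B, 2))        # Euclidean: sqrt(9 + 4 + 0) ~ 3.606
print(minkowski(A, B, 100))      # large p approaches the Chebyshev distance
print(np.max(np.abs(A - B)))     # Chebyshev: largest per-dimension difference = 3
```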



Cosine Similarity
• Cosine Similarity measures the cosine of the angle between two non-zero vectors in an inner product space.
• Cosine similarity focuses on the direction of the vectors and is often used when the magnitude of the vectors is not important.
• The formula for cosine similarity between two vectors A = (x1, x2, …, xn) and B = (y1, y2, …, yn) is:
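Written out in standard form:

$$\cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$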

Where:
• A · B is the dot product of vectors A and B.
• ‖A‖ and ‖B‖ are the magnitudes (Euclidean norms) of vectors A and B.
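A quick NumPy sketch of this formula; the vectors are arbitrary examples chosen to illustrate the behavior:

```python
# Cosine similarity: dot product normalized by the vector magnitudes
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

A = np.array([1.0, 2.0, 3.0])
B = np.array([2.0, 4.0, 6.0])     # same direction as A, different magnitude

print(cosine_similarity(A, B))    # 1.0: identical direction
print(cosine_similarity(A, -B))   # -1.0: exactly opposite
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 0.0: orthogonal
```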



Contd…
Range: Cosine similarity ranges from −1 to 1, where:
•1 means the vectors are identical in direction.
•0 means the vectors are orthogonal (completely dissimilar).
•−1 means the vectors are exactly opposite.



Contd…
Key Differences Between Cosine Similarity and Minkowski Distance

• Cosine similarity measures the angle between vectors and is concerned with the orientation rather than the magnitude of the data points. It is commonly used in text analysis, information retrieval, and recommendation systems, where the presence or absence of features (e.g., words in a document) is more important than their absolute values.

• Minkowski distance measures the absolute difference between vectors and takes into account both the direction and magnitude of the data points. It is used in spatial data analysis, where the physical distance or difference between data points matters.



K-MEANS CLUSTERING
• Clustering or cluster analysis is the task of grouping a set of objects such that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
• K-means is one of the simplest unsupervised learning algorithms for solving the clustering problem.
• Start by choosing k points arbitrarily as the "centres" of the clusters, one for each cluster.
• Then associate each of the given data points with the nearest centre.
• Now take the average of the data points associated with each centre and replace that centre with the average; this is done for each of the centres.
• Repeat the process until the centres converge to some fixed points (see the sketch below).
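The steps above can be turned into a toy NumPy implementation; this is a sketch for illustration (the synthetic two-blob data are placeholders), not the algorithm slide or the worked problem that follow:

```python
# Toy k-means: assign points to the nearest centre, recompute centres as means, repeat
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]   # choose k data points arbitrarily as centres
    for _ in range(n_iters):
        # associate each data point with the nearest centre (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # replace each centre with the average of the points associated with it
        new_centres = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centres, centres):                 # stop once the centres converge
            return new_centres, labels
        centres = new_centres
    return centres, labels

# Synthetic two-blob data just to exercise the sketch
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.5, size=(20, 2)),
               rng.normal([4, 4], 0.5, size=(20, 2))])
centres, labels = kmeans(X, k=2)
print(centres)
```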



k-means clustering - Algorithm



k-means clustering – Problem 1

Hint: The distance between the points (x1, x2) and (y1, y2) will be calculated using the familiar distance formula of elementary analytical geometry:
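In standard form:

$$d = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$$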



Solution
1. In the problem, the required number of clusters is 2 and we take k = 2.
2. We choose two points arbitrarily as the initial cluster centres. Let us
choose arbitrarily

3. We compute the distances of the given data points from the cluster
centers.



Contd…



Contd…



Contd…
4. The cluster centers are recalculated as follows:

5. We compute the distances of the given data points from the new cluster
centers.






6. The cluster centres are recalculated as follows: (2, 1.75) and (4.5, 4).

7. We compute the distances of the given data points from the new cluster centres.

Contd…
8. The cluster centers are recalculated as follows:

9. This divides the data into two clusters as follows:



Contd…
10. The cluster centres are recalculated as follows:

11. These are identical to the cluster centres calculated in Step 8. So there
will be no reassignment of data points to different clusters and hence the
computations are stopped here.
12. Conclusion: The k-means clustering algorithm with k = 2 applied to the dataset yields the following clusters and the associated cluster centres:




MODULE 5 – PART I ENDS

