K-Nearest Neighbor (kNN): Class or Value

K-Nearest Neighbors (kNN) is a simple machine learning algorithm that classifies new data points based on the majority class of the k closest training examples in the feature space. It requires storing all available cases and classifying new cases based on similarity measured by a distance function. The value of k determines how many nearest neighbors to use for classification, with larger values reducing the effect of noise but increasing the chance of misclassifying outliers.

K-Nearest Neighbor (kNN)..

• kNN is a non-parametric, instance-based (memory-based) method for predicting the class or value of a given query.
• Classification is computed from a simple majority vote of the nearest neighbors of each point (see the sketch after this list).
• A new data point is assigned the class held by the majority of its nearest neighbors.
• Suited for complex and multiple relationships between the features and targets.
• Not suitable if the data is noisy and the target classes have no clear demarcation in terms of attribute values.
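As a minimal sketch of this majority-vote classification, here is scikit-learn's off-the-shelf classifier applied to the toy data used later in these slides (assuming scikit-learn is installed):

```python
from sklearn.neighbors import KNeighborsClassifier

X_train = [[7, 7], [7, 4], [3, 4], [1, 4]]   # stored feature vectors
y_train = ["BAD", "BAD", "GOOD", "GOOD"]     # their class labels

clf = KNeighborsClassifier(n_neighbors=3)    # k = 3
clf.fit(X_train, y_train)                    # "training" only stores the data
print(clf.predict([[3, 7]]))                 # majority vote -> ['GOOD']
```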



kNN requires three things:
• A set of stored records.
• A distance metric to compute the distance between records.
• The value of k, the number of nearest neighbors to retrieve.



1. Similarity is measured by the distance between points, typically the Euclidean distance: d(p, q) = sqrt(Σᵢ (pᵢ − qᵢ)²).
2. Other distance measures: Manhattan, Minkowski, cosine similarity, Chebyshev, etc. (a sketch of these follows below).
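A sketch of these distance measures in plain NumPy (p is the Minkowski order; the sample vectors are illustrative):

```python
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    return np.sum(np.abs(a - b))

def minkowski(a, b, p=3):
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

def chebyshev(a, b):
    return np.max(np.abs(a - b))

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a, b = np.array([3.0, 7.0]), np.array([7.0, 7.0])
print(euclidean(a, b))   # 4.0
```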

Example..

Points   X1 (shape)   X2 (colour)   Y (classification)
P1       7            7             BAD
P2       7            4             BAD
P3       3            4             GOOD
P4       1            4             GOOD


Points   X1 (shape)   X2 (colour)   Y (classification)
P1       7            7             BAD
P2       7            4             BAD
P3       3            4             GOOD
P4       1            4             GOOD
P5       3            7             ?



Visualize..



Euclidean Distance..

Euclidean distance of P5 (3, 7) from each stored point, computed with d(p, q) = sqrt(Σᵢ (pᵢ − qᵢ)²):

           P1 (7,7)   P2 (7,4)   P3 (3,4)   P4 (1,4)
Distance   4.00       5.00       3.00       3.61


           P1 (7,7)   P2 (7,4)   P3 (3,4)   P4 (1,4)
Distance   4.00       5.00       3.00       3.61
Class      BAD        BAD        GOOD       GOOD

With k = 3, the three nearest neighbors of P5 are P3 (GOOD), P4 (GOOD) and P1 (BAD), so the majority vote classifies P5 as GOOD.
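These distances can be verified in a few lines of Python (math.dist is the standard-library Euclidean distance; variable names are mine):

```python
from math import dist

p5 = (3, 7)
neighbors = {"P1": (7, 7), "P2": (7, 4), "P3": (3, 4), "P4": (1, 4)}
for name, xy in neighbors.items():
    print(name, round(dist(p5, xy), 2))   # P1 4.0, P2 5.0, P3 3.0, P4 3.61
```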



Variation in kNN..



Different values of ‘k’..



kNN Classification Algorithm..
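The algorithm itself appears as an image in the original slides; the following from-scratch Python sketch (function and variable names are my own) implements the steps described so far: store the records, compute distances, retrieve the k nearest, and take a majority vote.

```python
import math
from collections import Counter

def knn_classify(query, records, labels, k=3):
    """Classify `query` by majority vote among its k nearest stored records."""
    # 1. Distance from the query to every stored record (Euclidean).
    distances = [math.dist(query, record) for record in records]
    # 2. Indices of the k smallest distances.
    nearest = sorted(range(len(records)), key=lambda i: distances[i])[:k]
    # 3. Majority vote over the labels of those k neighbors.
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

records = [(7, 7), (7, 4), (3, 4), (1, 4)]
labels = ["BAD", "BAD", "GOOD", "GOOD"]
print(knn_classify((3, 7), records, labels, k=3))  # -> GOOD
```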

Choosing the value of ‘k’..
• If 'k' is chosen too small, the decision becomes unstable and overly sensitive to noise in the training data.
• If 'k' is too large, the neighborhood may include points from other classes: it suppresses the impact of noise but is prone to the majority class dominating.



Choosing the value of ‘k’..
• The choice of the k value in a kNN classifier is critical.
• A small value of k implies a higher influence of noise on the result, whereas a large value makes classification more computationally intensive.
• Some heuristics suggest choosing k ≈ sqrt(N)/2, where N is the size of the training dataset.
• An odd value of k (3, 5, 7, ...) helps to avoid ties between predicted classes (see the sketch below).
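A hedged sketch of that heuristic (the helper below is illustrative, not taken from the slides):

```python
import math

def heuristic_k(n_samples):
    """k ≈ sqrt(N)/2, nudged to the nearest odd value to avoid ties."""
    k = max(1, round(math.sqrt(n_samples) / 2))
    return k if k % 2 == 1 else k + 1

print(heuristic_k(200))  # sqrt(200)/2 ≈ 7.07 -> 7
```

In practice, k is more often tuned by cross-validation instead, e.g. with scikit-learn's GridSearchCV over n_neighbors.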



… Normalize all the features using the z-score: z = (x − μ) / σ, where μ is the feature mean and σ its standard deviation.
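A minimal NumPy sketch of this normalization (features as columns of X; data reused from the toy example):

```python
import numpy as np

X = np.array([[7.0, 7.0], [7.0, 4.0], [3.0, 4.0], [1.0, 4.0]])

mu = X.mean(axis=0)        # per-feature mean μ
sigma = X.std(axis=0)      # per-feature standard deviation σ
X_norm = (X - mu) / sigma  # z-score: each column now has mean 0, std 1
print(X_norm.round(2))
```

scikit-learn's sklearn.preprocessing.StandardScaler performs the same transformation.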



Pros & Cons..
Pros-
• The k-nearest neighbor algorithm is very simple to implement, and it is robust when the error ratio is small.
• No explicit model building.
• It makes no assumption about the distribution of classes and can work with multiple classes simultaneously.
Cons-
• It computes distances for every new point, so it becomes computationally expensive (it is a lazy learner).
• The method is not effective when the class distributions overlap with each other.
• Fixing an optimal value of k is the main challenge in kNN.



Summary
• kNN solves both classification and regression problems.
• Its working principle is based on feature similarity in both the classification and the regression case (a regression sketch follows below).
• A kNN classifier differs from probabilistic classifiers, where the model comprises a learning step that computes probabilities from a training sample and uses them for future predictions on test samples.
• In contrast, kNN involves no learning step; instead, the dataset is stored in memory and used to classify each test query on the fly.
• kNN is also known as a lazy learner because it does not build a model from the training set in advance. It is one of the simplest methods of classification.
• In kNN, the term 'k' is a parameter that refers to the number of nearest neighbors.
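For the regression case, the prediction is typically the mean of the k nearest targets; a minimal scikit-learn sketch (the toy data is made up for illustration):

```python
from sklearn.neighbors import KNeighborsRegressor

X = [[1], [2], [3], [4], [5]]
y = [1.0, 2.0, 3.0, 4.0, 5.0]

reg = KNeighborsRegressor(n_neighbors=2).fit(X, y)
print(reg.predict([[2.5]]))  # mean of the 2 nearest targets -> [2.5]
```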