K-Nearest Neighbors (KNN)
Algorithm
By Tharuka Vishwajith
What is KNN?
KNN is a supervised ML algorithm that can be used for both
classification and regression predictive problems. However,
in industry it is mainly used for classification.
• Lazy learning algorithm
• Non-parametric learning algorithm
Idea:
• Similar examples have similar labels.
• Classify new examples like similar training examples.
Algorithm:
• Given some new example x for which we need to predict its class y
• Find most similar training examples
• Classify x “like” these most similar examples
Questions:
• How to determine similarity?
• How many similar training examples to consider?
• How to resolve inconsistencies among the training examples?
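The three steps above (and one answer to each of the three questions: Euclidean distance for similarity, a fixed k for how many neighbors, and majority vote to resolve inconsistencies) can be sketched in a few lines of Python. The function name and defaults are illustrative, not from the slides:

```python
from collections import Counter
import math

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x from its k nearest training examples.

    Similarity: Euclidean distance (smaller = more similar).
    Inconsistencies among neighbors are resolved by majority vote.
    """
    # Distance from x to every training example
    dists = [math.dist(x, xi) for xi in X_train]
    # Indices of the k closest training examples
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    # Most frequent label among those k neighbors wins
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Note there is no training step at all: the "model" is just the stored data, which is why KNN is called a lazy learner.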
Instance-Based Learning
Nearest Neighbor
One of the simplest of all machine learning classifiers
Simple idea: label a new point the same as the closest known point
Label it red.
K Nearest Neighbor
Label it red, when k = 3
Label it blue, when k = 7
Generalizes 1-NN to smooth away noise in the labels
A new point is now assigned the most frequent label of its k nearest neighbors
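A small illustration of this smoothing effect, with a made-up 1-D data set containing one mislabeled point:

```python
from collections import Counter

# 1-D toy set with one "noisy" label at x = 3
points = [0, 1, 2, 3, 4, 5, 6]
labels = ['red', 'red', 'red', 'blue', 'red', 'red', 'red']

def vote(x, k):
    # Take the k training points closest to x and majority-vote their labels
    nearest = sorted(points, key=lambda p: abs(p - x))[:k]
    return Counter(labels[points.index(p)] for p in nearest).most_common(1)[0][0]

print(vote(3.1, 1))  # 'blue' -- 1-NN copies the noisy label
print(vote(3.1, 5))  # 'red'  -- a larger k outvotes the noise
```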
1 - Nearest Neighbor
K = 1
Forms a Voronoi tessellation of the instance space
Distance Metrics
Different metrics can change the decision surface
Dist(a,b) = (a₁ – b₁)² + (a₂ – b₂)²
Dist(a,b) = (a₁ – b₁)² + (3a₂ – 3b₂)²
K = 1
Distance Metrics
• Euclidean distance
• Manhattan distance
• Hamming distance (for discrete data)
• Others (e.g., normal, cosine)
When different units are used for each dimension, normalize each
dimension by its standard deviation.
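These metrics, and the per-dimension standardization just described, are short enough to write out directly. This is a minimal sketch (function names are my own):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    # For discrete data: the number of positions where a and b differ
    return sum(x != y for x, y in zip(a, b))

def standardize(rows):
    """Normalize each dimension by its standard deviation so features
    measured in different units contribute comparably to the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(r, means, stds))
            for r in rows]
```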
More info about distance calculation: https://youtu.be/n_1I2HerDvc
Pros and Cons of KNN
Pros
• Simple algorithm to understand and interpret.
• Useful for nonlinear data, because the algorithm makes no
assumptions about the data.
• Versatile algorithm as we can use it for classification as well as
regression.
• Relatively high accuracy, although there are much better
supervised learning models than KNN.
Pros and Cons of KNN
Cons
• Computationally somewhat expensive, because it stores all of
the training data.
• Requires more memory than other supervised learning
algorithms.
• Prediction is slow when the number of training examples N is large.
• Sensitive to the scale of data as well as irrelevant features.
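The sensitivity to scale is easy to demonstrate with a made-up example: when one feature's units are much larger than another's, it dominates the distance entirely.

```python
import math

# (age in years, annual income in dollars) -- wildly different scales
a = (25, 50_000)   # hypothetical customer A
b = (60, 51_000)   # customer B: very different age, similar income
c = (26, 80_000)   # customer C: similar age, very different income

# Unscaled, income dominates the Euclidean distance, so B looks
# far "closer" to A than C does, despite a 35-year age gap.
print(math.dist(a, b) < math.dist(a, c))  # True
```

Standardizing each feature before computing distances (as described in the distance-metrics slide) removes this effect; irrelevant features still hurt, though, and usually call for feature selection.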
Applications of KNN
Banking System
KNN can be used in a banking system to predict whether an individual is
fit for loan approval: does that individual have characteristics similar
to those of defaulters?
Calculating Credit Ratings
KNN algorithms can be used to find an individual’s credit rating by
comparing them with people who have similar traits.
Applications of KNN
Politics
With the help of KNN algorithms, we can classify a potential voter into
various classes like “Will Vote”, “Will Not Vote”, “Will Vote for Party
‘Congress’”, and “Will Vote for Party ‘BJP’”.
Other areas
Speech Recognition, Handwriting Detection, Image Recognition and Video
Recognition.