CSC354
Machine Learning
Dr Muhammad Sharjeel
04
K-Nearest Neighbours
What we have learnt (so far)
Inductive step: Construction of model from data
Deductive step: Applying the (derived) model to unseen data
In decision tree (DT) or rule-based classifiers, the model is constructed as soon as the
training data is provided
Such models are called eager learners
They intend to learn as soon as possible (based on the training data)
Eager learning: given a set of training instances, construct a model before
receiving new (i.e., test) data
Lazy learners delay the process of generalising (learning) from the training
instances until a prediction is needed for the test (unseen) instance(s)
Also known as instance-based learners
Simply store the training instances and wait until given the test instance(s)
Generalise only when provided with the test instance(s)
Do less work when a training instance is presented
More work when making a prediction
One of the most famous types of lazy learner is the K-NN (K-Nearest Neighbours) classifier
Nearest Neighbour
Are based on learning by analogy
Compare a given test instance with the training instance(s) that are relatively similar to
the test instance
These training instances are called nearest neighbours of the test instance
The test instance is classified according to the class of its neighbours
K-Nearest Neighbour
When given a test instance, a K-NN classifier searches the pattern space
(training data) for the k training instances that are closest to the test
instance
These k training instances are the k-nearest neighbours of the test instance
If it looks like a duck, swims like a duck, and quacks like a duck, then it
probably is a duck
K-NN requires three things:
Set of training instances (input + output)
Distance metric, to compute the distance between instances
The value of k, the number of nearest neighbours to retrieve
K-NN can be used for classification as well as regression
For classification:
The test instance is assigned the most common class among its k-nearest neighbours
For regression:
The test instance is assigned the average target value of its k-nearest neighbours (a sketch of both uses follows)
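A minimal sketch of both uses with scikit-learn (assuming it is available); the tiny data set below is made up purely for illustration and is not from the lecture.

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X_train = [[1, 1], [2, 1], [8, 8], [9, 7]]   # training instances (input)
y_class = ["A", "A", "B", "B"]               # class labels, for classification
y_value = [10.0, 12.0, 40.0, 42.0]           # numeric targets, for regression

# Classification: most common class among the k nearest neighbours
clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_class)
print(clf.predict([[8, 9]]))                 # -> ['B']

# Regression: average target value of the k nearest neighbours
reg = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_value)
print(reg.predict([[8, 9]]))                 # -> average of the 3 closest targets
```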
Algorithm:
Let k be the number of nearest neighbours and D be the set of training instances
1. For each test instance z = (x', y') do
2. Compute d(x', x), the distance between z and every example (x, y) ∈ D
3. Select Dz ⊆ D, the set of k closest training instances to z
4. y' = argmax_v Σ_{(xi, yi) ∈ Dz} I(v = yi)
5. End for
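A from-scratch sketch of the steps above (Euclidean distance plus majority vote); the function and the small data set below are my own illustration, not from the slides.

```python
import math
from collections import Counter

def knn_classify(D, z, k):
    """D: list of (attributes, label) training pairs; z: test attribute vector; k: neighbours."""
    # Step 2: compute d(x', x) between z and every training instance (x, y) in D
    distances = [(math.dist(z, x), y) for x, y in D]
    # Step 3: select D_z, the k closest training instances to z
    D_z = sorted(distances)[:k]
    # Step 4: majority vote, i.e., the class v supported by most neighbours in D_z
    return Counter(y for _, y in D_z).most_common(1)[0][0]

# Hypothetical usage
D = [((1, 2), "A"), ((2, 1), "A"), ((6, 5), "B"), ((7, 6), "B")]
print(knn_classify(D, (6, 6), k=3))   # -> 'B'
```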
To classify a test (unseen) instance, a K-NN classifier:
Computes the distance to the training instances
Identifies the k nearest neighbours
Uses the class labels (target variable) of the nearest neighbours to determine the class label of
the test instance (e.g., by taking a majority vote)
The k-nearest neighbours of an instance X are the data points that have the k smallest
distances to X
Choosing the value of k:
If too small, sensitive to noise points
If too large, neighbourhood may include points from other classes
Choose experimentally
Try a range of values (e.g., 1 to n) and select the most suitable (the one that gives the highest
accuracy or the lowest error rate)
To avoid a tie, in binary classification use an odd value for k
If there is a tie, the decision can be based on weighted distance
Not all neighbours are at an equal distance
Closer neighbours are stronger contenders than neighbours farther away
Wd = 1 / d² (a short sketch of weighted voting follows)
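A small sketch of distance-weighted voting with Wd = 1/d² (names are illustrative; it assumes the neighbour distances are already computed and non-zero).

```python
from collections import defaultdict

def weighted_vote(neighbours):
    """neighbours: list of (distance, label) pairs for the k nearest neighbours."""
    scores = defaultdict(float)
    for d, label in neighbours:
        scores[label] += 1.0 / (d ** 2)   # Wd = 1 / d^2: closer neighbours contribute more
    return max(scores, key=scores.get)

# One very close 'Yes' outweighs two farther 'No' neighbours
print(weighted_vote([(0.5, "Yes"), (2.0, "No"), (2.0, "No")]))   # -> 'Yes'
```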
“Closeness” with neighbours is defined in terms of a distance metric
Many distance metrics have been proposed
Euclidean distance
d(p, q) = √(Σ_{i=1..n} (pi − qi)²)
Manhattan distance
d(p, q) = Σ_{i=1..n} |pi − qi|
Hamming distance
Used for categorical attributes
For each attribute, if pi and qi are the same the distance is 0, otherwise 1 (summed over the attributes)
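Illustrative implementations of the three metrics for attribute vectors p and q of equal length (my own sketch).

```python
import math

def euclidean(p, q):
    # Square root of the sum of squared attribute differences
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def manhattan(p, q):
    # Sum of absolute attribute differences
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def hamming(p, q):
    # For categorical attributes: 0 per attribute if equal, 1 otherwise, summed
    return sum(0 if pi == qi else 1 for pi, qi in zip(p, q))

print(euclidean((7, 7), (3, 7)))                      # 4.0
print(manhattan((7, 7), (3, 7)))                      # 4
print(hamming(("red", "large"), ("red", "small")))    # 1
```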
Heterogeneous attributes may have to be scaled to prevent distance measures
from being dominated by one of the attributes
Examples:
Height of a person may vary from 1.5m to 1.8m
Weight of a person may vary from 60 KG to 100 KG
Income of a person may vary from Rs 10K to Rs 200K
Min-Max normalisation
Scale the values of a feature to a range between 0 and 1 by subtracting the
minimum value of the feature from each value, and then dividing by the range
of the feature
Can be used to transform a value v of a numeric attribute A to v' in the
range [0, 1]
v' = (v − min(A)) / (max(A) − min(A))
For a target range other than 0 to 1, for example 10 to 100
v' = ((v − min(A)) / (max(A) − min(A))) × (new_max(A) − new_min(A)) + new_min(A)
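A short sketch of min-max normalisation following the formulas above; the income values are illustrative only.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    # Assumes max(values) > min(values)
    lo, hi = min(values), max(values)
    # v' = (v - min) / (max - min), then rescaled to [new_min, new_max]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

incomes = [10_000, 50_000, 200_000]                 # illustrative values
print(min_max(incomes))                             # scaled to [0, 1]
print(min_max(incomes, new_min=10, new_max=100))    # scaled to [10, 100]
```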
Normalisation methods
z-score normalisation
Scale the values of a feature to have a mean of 0 and a standard
deviation of 1, by subtracting the mean of the feature from each value,
and then dividing by the standard deviation
Decimal scaling
Scale the values of a feature by dividing the values of a feature by a
power of 10
Logarithmic transformation
Apply a logarithmic transformation to the values of a feature
Root transformation
Apply a square root transformation to the values of a feature
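Rough one-line versions of these methods, assuming positive numeric values (the log and root transforms need non-negative input); these are illustrative sketches, not prescribed implementations.

```python
import math
import statistics

values = [60, 75, 80, 100]                      # illustrative weights in kg

# z-score: subtract the mean, divide by the standard deviation
mean, sd = statistics.mean(values), statistics.stdev(values)
z_scores = [(v - mean) / sd for v in values]

# Decimal scaling: divide by a power of 10 large enough to bring all values below 1
j = len(str(int(max(abs(v) for v in values))))  # digits in the largest absolute value
decimal_scaled = [v / 10 ** j for v in values]

# Logarithmic and square-root transformations
log_transformed = [math.log(v) for v in values]
root_transformed = [math.sqrt(v) for v in values]
```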
Example:
Training data
No.  x1  x2  Y
1    7   7   No
2    7   4   No
3    3   4   Yes
4    1   4   Yes
Test data
No.  x1  x2  Y
1    3   7   ?
K = 3
Distance metric = Euclidean
Compute the Euclidean distance from the test instance (3, 7) to every training instance, and rank them:
No.  Y    Distance to (3, 7)                 Rank
1    No   √((7−3)² + (7−7)²) = √16 = 4       3
2    No   √((7−3)² + (4−7)²) = √25 = 5       4
3    Yes  √((3−3)² + (4−7)²) = √9 = 3        1
4    Yes  √((1−3)² + (4−7)²) = √13 ≈ 3.6     2
With K = 3, the nearest neighbours are training instances 3, 4, and 1, with classes Yes, Yes, and No
Majority vote among these classes gives Yes
Test data (classified)
No.  x1  x2  Y
1    3   7   Yes
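For completeness, a short script (my own, not part of the slides) that reproduces the distances and the majority vote of this example.

```python
import math

train = [((7, 7), "No"), ((7, 4), "No"), ((3, 4), "Yes"), ((1, 4), "Yes")]
test, k = (3, 7), 3

ranked = sorted((math.dist(test, x), y) for x, y in train)
for d, y in ranked:
    print(f"distance = {d:.2f}, class = {y}")   # 3.00 Yes, 3.61 Yes, 4.00 No, 5.00 No

votes = [y for _, y in ranked[:k]]
print(max(set(votes), key=votes.count))         # -> 'Yes'
```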
Advantages:
No need to build a model (the training data itself serves as the model)
No need to rebuild the model when new data is added
Interpretable, since the neighbours that determined the prediction are known
Good at predicting numeric values
More flexible (non-rectilinear) decision boundaries
Disadvantages:
Computationally expensive at prediction time
Irrelevant or correlated attributes have a high impact
Example: Apply K-NN to classify the two test instances given below using K = 3
Age Income Credit_Rating Response
35 35,000 3 No
22 50,000 2 Yes
63 200,000 1 No
59 170,000 1 No
25 40,000 4 Yes
37 50,000 2 ?
42 75,000 1 ?
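One possible way to attempt this exercise in code (an illustrative sketch with scikit-learn, not the prescribed solution): min-max scale the attributes first, so that Income does not dominate the Euclidean distance, then classify with k = 3.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

X_train = [[35, 35_000, 3], [22, 50_000, 2], [63, 200_000, 1],
           [59, 170_000, 1], [25, 40_000, 4]]       # Age, Income, Credit_Rating
y_train = ["No", "Yes", "No", "No", "Yes"]          # Response
X_test = [[37, 50_000, 2], [42, 75_000, 1]]         # the two unlabelled instances

scaler = MinMaxScaler().fit(X_train)                # scale each attribute to [0, 1]
clf = KNeighborsClassifier(n_neighbors=3).fit(scaler.transform(X_train), y_train)
print(clf.predict(scaler.transform(X_test)))
```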
Example: K-NN Regression
Test instance = point (7, 8)
The three points closest to (7, 8) are
(7, 9), (5, 8), and (9, 7), with costs of 33, 30, and 32
Take the average: (33 + 30 + 32) / 3 = 31.67, which becomes the predicted value for the test instance
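The averaging step expressed in code, using only the three neighbour costs given above.

```python
neighbour_costs = [33, 30, 32]       # costs of the 3 nearest neighbours of (7, 8)
prediction = sum(neighbour_costs) / len(neighbour_costs)
print(round(prediction, 2))          # 31.67
```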
Thanks