
CSC354

Machine Learning
Dr Muhammad Sharjeel
04
K-Nearest Neighbours
 What we have learnt (so far)
 Inductive step: Construction of model from data
 Deductive step: Applying the (derived) model to unseen data
 In DT (or rule-based classifiers), the model is constructed as soon as the
training data is provided
 Such models are called eager learners
 They intend to learn as soon as possible (based on the training data)
 Eager learning: given a set of training instances, construct a model before
receiving new (i.e., test) data

 Lazy learners delay the process of generalising (learning) using the training
instances until it is needed to predict the test (unseen) instance(s)
 Also known as instance-based learners
 Simply store the training instances and wait until given the test instance(s)
 Generalise only when provided with the test instance(s)
 Do less work when a training instance is presented
 More work when making a prediction
 One of the best-known lazy learner classifiers is K-NN (K-Nearest Neighbours)

 Nearest Neighbour
 Are based on learning by analogy
 Compare a given test instance with the training instance(s) that are relatively similar to
the test instance
 These training instances are called nearest neighbours of the test instance
 The test instance is classified according to the class of its neighbours
 K-Nearest Neighbour
 When given a test instance, a K-NN classifier searches the pattern space
(training data) for the k training instances that are closest to the test
instance
 These k training instances are the k-nearest neighbours of the test instance

 If it looks like a duck, swims like a duck, and quacks like a duck, then it
probably is a duck

 K-NN requires three things
 Set of training instances (input + output)
 Distance metric, to compute distance between records
 The value of k, the number of nearest neighbours to retrieve

 K-NN can be used for classification as well as regression
 For classification:
 Test instance is assigned the most common class among its k-nearest neighbours
 For regression:
 Compute the average value associated with the k-nearest neighbours of the test
instance

 Algorithm:
 Let k be the number of nearest neighbours and D be the set of training instances
1. For each test instance z = (x', y') do
2.   Compute d(x', x), the distance between z and every example (x, y) ∈ D
3.   Select D_z ⊆ D, the set of the k closest training instances to z
4.   y' = argmax_v Σ_{(x_i, y_i) ∈ D_z} I(v = y_i)
5. End for
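 The steps above can be written as a short Python sketch (a minimal
illustration, not library code; the function and variable names are assumptions):

    import math
    from collections import Counter

    def knn_classify(test_x, training_data, k):
        # training_data is a list of (features, label) pairs
        # 1-2. compute d(x', x) between the test instance and every training example
        distances = [(math.dist(x, test_x), y) for x, y in training_data]
        # 3. select D_z, the k closest training instances
        nearest = sorted(distances, key=lambda pair: pair[0])[:k]
        # 4. majority vote over the labels of the k nearest neighbours
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

    # The training data used in the worked example later in these slides:
    train = [((7, 7), 'No'), ((7, 4), 'No'), ((3, 4), 'Yes'), ((1, 4), 'Yes')]
    print(knn_classify((3, 7), train, k=3))   # -> 'Yes'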

 To classify a test (unseen) instance, a K-NN classifier
 Computes the distance to the training instances
 Identifies the k nearest neighbours
 Uses the class labels (target variable) of the nearest neighbours to determine the class label of
the test instance (e.g., by taking majority vote)

 The k-nearest neighbours of an instance X are the data points that have the k smallest
distances to X

 Choosing the value of k:
 If too small, sensitive to noise points
 If too large, neighbourhood may include points from other classes
 Choose experimentally
 Try a range of values (e.g., 1-n), select the most suitable (the one that gives highest
accuracy or minimum error rate)
 To avoid a tie in binary classification, use an odd value for k
 If there is a tie, the decision can be based on weighted distance (see the sketch below)
 Not all neighbours are at an equal distance
 Closer neighbours are stronger contenders than neighbours farther away
 w_d = 1 / d²
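 A minimal sketch of distance-weighted voting with w_d = 1/d², assuming each
neighbour is given as a (distance, label) pair (names are illustrative):

    from collections import defaultdict

    def weighted_vote(neighbours, eps=1e-9):
        # neighbours: list of (distance, label) pairs for the k nearest neighbours
        scores = defaultdict(float)
        for d, label in neighbours:
            scores[label] += 1.0 / (d ** 2 + eps)   # w_d = 1 / d^2: closer -> larger weight
        return max(scores, key=scores.get)

    # Two 'No' votes at distance 4 tie with two 'Yes' votes at distance 2;
    # the weighted vote breaks the tie in favour of the closer neighbours:
    print(weighted_vote([(4, 'No'), (4, 'No'), (2, 'Yes'), (2, 'Yes')]))   # -> 'Yes'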

 “Closeness” with neighbours is defined in terms of a distance metric
 Many distance metrics have been proposed
 Euclidean distance
 d(p, q) = √( Σ_{i=1}^{n} (p_i − q_i)² )
 Manhattan distance
 d(p, q) = Σ_{i=1}^{n} |p_i − q_i|
 Hamming distance
 Used for categorical attributes
 For each attribute, the distance is 0 if p and q match, otherwise 1
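 The three metrics above written as small Python functions (a sketch; the
function names and tuple inputs are assumptions):

    import math

    def euclidean(p, q):
        return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

    def manhattan(p, q):
        return sum(abs(pi - qi) for pi, qi in zip(p, q))

    def hamming(p, q):
        # categorical attributes: 0 where the values match, 1 where they differ
        return sum(0 if pi == qi else 1 for pi, qi in zip(p, q))

    print(euclidean((7, 7), (3, 7)))                     # 4.0
    print(manhattan((7, 7), (3, 7)))                     # 4
    print(hamming(('red', 'large'), ('red', 'small')))   # 1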

 Heterogeneous attributes may have to be scaled to prevent distance measures
from being dominated by one of the attributes
 Examples:
 Height of a person may vary from 1.5 m to 1.8 m
 Weight of a person may vary from 60 kg to 100 kg
 Income of a person may vary from Rs 10K to Rs 200K

 Min-Max normalisation
 Scale the values of a feature to a range between 0 and 1 by subtracting the
minimum value of the feature from each value, and then dividing by the range
of the feature
 Can be used to transform a value v of a numeric attribute A to v′ in the
range [0, 1]
 v′ = (v − min(A)) / (max(A) − min(A))
 To scale to a different range, for example 10 to 100, use the new minimum and maximum:
 v′ = ((v − min(A)) / (max(A) − min(A))) × (new_max(A) − new_min(A)) + new_min(A)
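 Both forms of min-max scaling as a quick sketch (hypothetical helper names,
using the 1.5–1.8 m height range from the earlier example):

    def min_max(v, a_min, a_max):
        # scale v from [a_min, a_max] to [0, 1]
        return (v - a_min) / (a_max - a_min)

    def min_max_range(v, a_min, a_max, new_min, new_max):
        # scale v from [a_min, a_max] to [new_min, new_max]
        return (v - a_min) / (a_max - a_min) * (new_max - new_min) + new_min

    print(min_max(1.65, 1.5, 1.8))                  # ~0.5
    print(min_max_range(1.65, 1.5, 1.8, 10, 100))   # ~55.0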

 Normalisation methods
 z-score normalization
 Scale the values of a feature to have a mean of 0 and a standard
deviation of 1, by subtracting the mean of the feature from each value,
and then dividing by the standard deviation
 Decimal scaling
 Scale the values of a feature by dividing the values of a feature by a
power of 10
 Logarithmic transformation
 Apply a logarithmic transformation to the values of a feature
 Root transformation
 Apply a square root transformation to the values of a feature
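 For comparison, z-score normalisation and decimal scaling in a few lines
(illustrative helper names, using Python's statistics module):

    import statistics

    def z_score(values):
        mu = statistics.mean(values)
        sigma = statistics.stdev(values)   # sample standard deviation
        return [(v - mu) / sigma for v in values]

    def decimal_scaling(values):
        # divide by the smallest power of 10 that brings all values into [-1, 1]
        j = len(str(int(max(abs(v) for v in values))))
        return [v / (10 ** j) for v in values]

    incomes = [10_000, 40_000, 50_000, 170_000, 200_000]
    print(z_score(incomes))
    print(decimal_scaling(incomes))   # [0.01, 0.04, 0.05, 0.17, 0.2]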

 Example:
 Training data
No. x1 x2 Y
1 7 7 No
2 7 4 No
3 3 4 Yes
4 1 4 Yes

 Test data
No. x1 x2 Y
1 3 7 ?

 Example:
 Training data
No. x1 x2 Y
1 7 7 No
2 7 4 No
3 3 4 Yes
4 1 4 Yes
K = 3
Distance metric = Euclidean

 Test data
No. x1 x2 Y
1 3 7 ?

 Example:
 Training data (K = 3, distance metric = Euclidean)
No. x1 x2 Y    Distance to test instance          Rank
1   7  7  No   √((7−3)² + (7−7)²) = √16 = 4        3
2   7  4  No   √((7−3)² + (4−7)²) = √25 = 5        4
3   3  4  Yes  √((3−3)² + (4−7)²) = √9 = 3         1
4   1  4  Yes  √((1−3)² + (4−7)²) = √13 ≈ 3.6      2

 Test data
No. x1 x2 Y
1   3  7  ?

 Example:
 Training data (K = 3, distance metric = Euclidean)
No. x1 x2 Y
1   7  7  No
2   7  4  No
3   3  4  Yes
4   1  4  Yes

 The K = 3 closest instances (instance 2 is dropped):
Instance 1: √((7−3)² + (7−7)²) = √16 = 4      rank 3
Instance 3: √((3−3)² + (4−7)²) = √9 = 3       rank 1
Instance 4: √((1−3)² + (4−7)²) = √13 ≈ 3.6    rank 2

 Test data
No. x1 x2 Y
1   3  7  ?

 Example:
 Training data (K = 3, distance metric = Euclidean)
No. x1 x2 Y
1   7  7  No
2   7  4  No
3   3  4  Yes
4   1  4  Yes

 Class labels of the K = 3 nearest neighbours:
Instance 1: √((7−3)² + (7−7)²) = √16 = 4      rank 3    No
Instance 3: √((3−3)² + (4−7)²) = √9 = 3       rank 1    Yes
Instance 4: √((1−3)² + (4−7)²) = √13 ≈ 3.6    rank 2    Yes

 Test data
No. x1 x2 Y
1   3  7  ?

 Example:
 Training data (K = 3, distance metric = Euclidean)
No. x1 x2 Y
1   7  7  No
2   7  4  No
3   3  4  Yes
4   1  4  Yes

 Class labels of the K = 3 nearest neighbours:
Instance 1: √((7−3)² + (7−7)²) = √16 = 4      rank 3    No
Instance 3: √((3−3)² + (4−7)²) = √9 = 3       rank 1    Yes
Instance 4: √((1−3)² + (4−7)²) = √13 ≈ 3.6    rank 2    Yes

 Test data (majority vote of the neighbours: Yes)
No. x1 x2 Y
1   3  7  Yes
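 A quick numeric check of the worked example in Python (math.dist is the
Euclidean distance, Python 3.8+):

    import math

    train = {1: ((7, 7), 'No'), 2: ((7, 4), 'No'), 3: ((3, 4), 'Yes'), 4: ((1, 4), 'Yes')}
    test = (3, 7)
    for i, (x, y) in train.items():
        print(i, round(math.dist(x, test), 2), y)
    # 1 4.0 No | 2 5.0 No | 3 3.0 Yes | 4 3.61 Yes
    # K = 3 nearest: instances 3, 4, 1 -> Yes, Yes, No -> majority vote: Yes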

 Advantages:
 No need to build a model (the training data itself is the model)
 No need to rebuild the model when adding new data
 Interpretable, since the neighbours behind a prediction are known
 Good at predicting numeric values
 More flexible (non-rectilinear) decision boundaries
 Disadvantages:
 Computationally expensive at prediction time
 Irrelevant or correlated attributes have a high impact

 Example: Apply K-NN to classify the two test instances given below using K = 3

Age Income Credit_Rating Response


35 35,000 3 No
22 50,000 2 Yes
63 200,000 1 No
59 170,000 1 No
25 40,000 4 Yes
37 50,000 2 ?
42 75,000 1 ?
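 Because Income is on a much larger scale than Age and Credit_Rating, the
attributes should be normalised before computing distances. One possible setup,
assuming scikit-learn is available (MinMaxScaler and KNeighborsClassifier are
standard scikit-learn classes; the predictions are left for the exercise):

    from sklearn.preprocessing import MinMaxScaler
    from sklearn.neighbors import KNeighborsClassifier

    X_train = [[35, 35_000, 3], [22, 50_000, 2], [63, 200_000, 1],
               [59, 170_000, 1], [25, 40_000, 4]]
    y_train = ['No', 'Yes', 'No', 'No', 'Yes']
    X_test  = [[37, 50_000, 2], [42, 75_000, 1]]

    scaler = MinMaxScaler()                      # min-max scale each attribute to [0, 1]
    X_train_s = scaler.fit_transform(X_train)
    X_test_s  = scaler.transform(X_test)

    knn = KNeighborsClassifier(n_neighbors=3)    # K = 3, Euclidean distance by default
    knn.fit(X_train_s, y_train)
    print(knn.predict(X_test_s))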

 Example: K-NN Regression

 Test instance = point (7, 8)

 Three points closest to point (7, 8) are
 (7, 9), (5, 8), and (9, 7) with costs of 33, 30, and 32
 Take the average (33+30+32)/3 = 31.67
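 A minimal K-NN regression sketch matching this example (the point/cost pairs
are taken from the slide; the helper name is an assumption):

    import math

    def knn_regress(test_x, training_data, k):
        # training_data: list of (point, value) pairs; predict the mean of the k nearest values
        nearest = sorted(training_data, key=lambda item: math.dist(item[0], test_x))[:k]
        return sum(value for _, value in nearest) / k

    train = [((7, 9), 33), ((5, 8), 30), ((9, 7), 32)]   # the three closest points
    print(knn_regress((7, 8), train, k=3))               # (33 + 30 + 32) / 3 ≈ 31.67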

Thanks
