CSC354
Machine Learning
Dr Muhammad Sharjeel
04
K-Nearest Neighbours
What we have learnt (so far)
Inductive step: Construction of model from data
Deductive step: Applying the (derived) model to unseen data
In decision tree (DT) or rule-based classifiers, the model is constructed as soon as the
training data is provided
Such models are called eager learners
They intend to learn as soon as possible (based on the training data)
Eager learning: given a set of training instances, construct a model before
receiving new (i.e., test) data
Lazy learners delay the process of generalising (learning) from the training
instances until a prediction is needed for the test (unseen) instance(s)
Also known as instance-based learners
Simply store the training instances and wait until given the test instance(s)
Generalise only when provided with the test instance(s)
Do less work when a training instance is presented
More work when making a prediction
One of the most famous types of lazy learner is the K-NN (K-Nearest Neighbours) classifier
Nearest Neighbour
Are based on learning by analogy
Compare a given test instance with the training instance(s) that are relatively similar to
the test instance
These training instances are called nearest neighbours of the test instance
The test instance is classified according to the class of its neighbours
K-Nearest Neighbour
When given a test instance, a K-NN classifier searches the pattern space
(training data) for the k training instances that are closest to the test
instance
These k training instances are the k-nearest neighbours of the test instance
If it looks like a duck, swims like a duck, and quacks like a duck, then it
probably is a duck
K-NN requires three things:
Set of training instances (input + output)
Distance metric, to compute the distance between instances
The value of k, the number of nearest neighbours to retrieve
K-NN can be used for classification as well as regression
For classification:
The test instance is assigned the most common class among its k-nearest neighbours
For regression:
The test instance is assigned the average target value of its k-nearest neighbours (a sketch of both uses follows)
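A minimal sketch of both uses with scikit-learn (assuming it is available); the tiny data set below is made up purely for illustration and is not from the lecture.

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X_train = [[1, 1], [2, 1], [8, 8], [9, 7]]   # training instances (input)
y_class = ["A", "A", "B", "B"]               # class labels, for classification
y_value = [10.0, 12.0, 40.0, 42.0]           # numeric targets, for regression

# Classification: most common class among the k nearest neighbours
clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_class)
print(clf.predict([[8, 9]]))                 # -> ['B']

# Regression: average target value of the k nearest neighbours
reg = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_value)
print(reg.predict([[8, 9]]))                 # -> average of the 3 closest targets
```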
Algorithm:
Let k be the number of nearest neighbours and D be the set of training instances
1. For each test instance z = (x', y') do
2. Compute d(x', x), the distance between z and every example (x, y) ∈ D
3. Select Dz ⊆ D, the set of k closest training instances to z
4. y' = argmax_v Σ_{(xi, yi) ∈ Dz} I(v = yi)
5. End for
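A from-scratch sketch of the steps above (Euclidean distance plus majority vote); the function and the small data set below are my own illustration, not from the slides.

```python
import math
from collections import Counter

def knn_classify(D, z, k):
    """D: list of (attributes, label) training pairs; z: test attribute vector; k: neighbours."""
    # Step 2: compute d(x', x) between z and every training instance (x, y) in D
    distances = [(math.dist(z, x), y) for x, y in D]
    # Step 3: select D_z, the k closest training instances to z
    D_z = sorted(distances)[:k]
    # Step 4: majority vote, i.e., the class v supported by most neighbours in D_z
    return Counter(y for _, y in D_z).most_common(1)[0][0]

# Hypothetical usage
D = [((1, 2), "A"), ((2, 1), "A"), ((6, 5), "B"), ((7, 6), "B")]
print(knn_classify(D, (6, 6), k=3))   # -> 'B'
```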
To classify a test (unseen) instance, a K-NN classifier:
Computes the distance to the training instances
Identifies the k nearest neighbours
Uses the class labels (target variable) of the nearest neighbours to determine the class label of
the test instance (e.g., by taking a majority vote)
The k-nearest neighbours of an instance X are the data points that have the k smallest
distances to X
Choosing the value of k:
If too small, sensitive to noise points
If too large, neighbourhood may include points from other classes
Choose experimentally
Try a range of values (e.g., 1 to n) and select the most suitable (the one that gives the highest
accuracy or the lowest error rate)
To avoid a tie, in binary classification use an odd value for k
If there is a tie, the decision can be based on weighted distance
Not all neighbours are at an equal distance
Closer neighbours are stronger contenders than neighbours farther away
Wd = 1 / d² (a short sketch of weighted voting follows)
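A small sketch of distance-weighted voting with Wd = 1/d² (names are illustrative; it assumes the neighbour distances are already computed and non-zero).

```python
from collections import defaultdict

def weighted_vote(neighbours):
    """neighbours: list of (distance, label) pairs for the k nearest neighbours."""
    scores = defaultdict(float)
    for d, label in neighbours:
        scores[label] += 1.0 / (d ** 2)   # Wd = 1 / d^2: closer neighbours contribute more
    return max(scores, key=scores.get)

# One very close 'Yes' outweighs two farther 'No' neighbours
print(weighted_vote([(0.5, "Yes"), (2.0, "No"), (2.0, "No")]))   # -> 'Yes'
```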
“Closeness” with neighbours is defined in terms of a distance metric
Many distance metrics have been proposed
Euclidean distance
d(p, q) = √(Σ_{i=1..n} (pi − qi)²)
Manhattan distance
d(p, q) = Σ_{i=1..n} |pi − qi|
Hamming distance
Used for categorical attributes
For each attribute, if pi and qi are the same the distance is 0, otherwise 1 (summed over the attributes)
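Illustrative implementations of the three metrics for attribute vectors p and q of equal length (my own sketch).

```python
import math

def euclidean(p, q):
    # Square root of the sum of squared attribute differences
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def manhattan(p, q):
    # Sum of absolute attribute differences
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def hamming(p, q):
    # For categorical attributes: 0 per attribute if equal, 1 otherwise, summed
    return sum(0 if pi == qi else 1 for pi, qi in zip(p, q))

print(euclidean((7, 7), (3, 7)))                      # 4.0
print(manhattan((7, 7), (3, 7)))                      # 4
print(hamming(("red", "large"), ("red", "small")))    # 1
```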
Heterogeneous attributes may have to be scaled to prevent distance measures
from being dominated by one of the attributes
Examples:
Height of a person may vary from 1.5m to 1.8m
Weight of a person may vary from 60 KG to 100 KG
Income of a person may vary from Rs 10K to Rs 200K
Min-Max normalisation
Scale the values of a feature to a range between 0 and 1 by subtracting the
minimum value of the feature from each value, and then dividing by the range
of the feature
Can be used to transform a value v of a numeric attribute A to v' in the
range [0, 1]
v' = (v − min(A)) / (max(A) − min(A))
For a target range other than 0 to 1, for example 10 to 100
v' = ((v − min(A)) / (max(A) − min(A))) × (new_max(A) − new_min(A)) + new_min(A)
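A short sketch of min-max normalisation following the formulas above; the income values are illustrative only.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    # Assumes max(values) > min(values)
    lo, hi = min(values), max(values)
    # v' = (v - min) / (max - min), then rescaled to [new_min, new_max]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

incomes = [10_000, 50_000, 200_000]                 # illustrative values
print(min_max(incomes))                             # scaled to [0, 1]
print(min_max(incomes, new_min=10, new_max=100))    # scaled to [10, 100]
```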
Normalisation methods
z-score normalisation
Scale the values of a feature to have a mean of 0 and a standard
deviation of 1, by subtracting the mean of the feature from each value,
and then dividing by the standard deviation
Decimal scaling
Scale the values of a feature by dividing the values of a feature by a
power of 10
Logarithmic transformation
Apply a logarithmic transformation to the values of a feature
Root transformation
Apply a square root transformation to the values of a feature
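Rough one-line versions of these methods, assuming positive numeric values (the log and root transforms need non-negative input); these are illustrative sketches, not prescribed implementations.

```python
import math
import statistics

values = [60, 75, 80, 100]                      # illustrative weights in kg

# z-score: subtract the mean, divide by the standard deviation
mean, sd = statistics.mean(values), statistics.stdev(values)
z_scores = [(v - mean) / sd for v in values]

# Decimal scaling: divide by a power of 10 large enough to bring all values below 1
j = len(str(int(max(abs(v) for v in values))))  # digits in the largest absolute value
decimal_scaled = [v / 10 ** j for v in values]

# Logarithmic and square-root transformations
log_transformed = [math.log(v) for v in values]
root_transformed = [math.sqrt(v) for v in values]
```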
Example:
Training data
No.  x1  x2  Y
1    7   7   No
2    7   4   No
3    3   4   Yes
4    1   4   Yes
Test data
No.  x1  x2  Y
1    3   7   ?
K = 3
Distance metric = Euclidean
Compute the Euclidean distance from the test instance (3, 7) to every training instance, and rank them:
No.  Y    Distance to (3, 7)                 Rank
1    No   √((7−3)² + (7−7)²) = √16 = 4       3
2    No   √((7−3)² + (4−7)²) = √25 = 5       4
3    Yes  √((3−3)² + (4−7)²) = √9 = 3        1
4    Yes  √((1−3)² + (4−7)²) = √13 ≈ 3.6     2
With K = 3, the nearest neighbours are training instances 3, 4, and 1, with classes Yes, Yes, and No
Majority vote among these classes gives Yes
Test data (classified)
No.  x1  x2  Y
1    3   7   Yes
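For completeness, a short script (my own, not part of the slides) that reproduces the distances and the majority vote of this example.

```python
import math

train = [((7, 7), "No"), ((7, 4), "No"), ((3, 4), "Yes"), ((1, 4), "Yes")]
test, k = (3, 7), 3

ranked = sorted((math.dist(test, x), y) for x, y in train)
for d, y in ranked:
    print(f"distance = {d:.2f}, class = {y}")   # 3.00 Yes, 3.61 Yes, 4.00 No, 5.00 No

votes = [y for _, y in ranked[:k]]
print(max(set(votes), key=votes.count))         # -> 'Yes'
```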
Advantages:
No need to build a model (the training data itself serves as the model)
No need to rebuild the model when new data is added
Interpretable, since the neighbours that determined the prediction are known
Good at predicting numeric values
More flexible (non-rectilinear) decision boundaries
Disadvantages:
Computationally expensive at prediction time
Irrelevant or correlated attributes have a high impact
Example: Apply K-NN to classify the two test instances given below using K = 3
Age Income Credit_Rating Response
35 35,000 3 No
22 50,000 2 Yes
63 200,000 1 No
59 170,000 1 No
25 40,000 4 Yes
37 50,000 2 ?
42 75,000 1 ?
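One possible way to attempt this exercise in code (an illustrative sketch with scikit-learn, not the prescribed solution): min-max scale the attributes first, so that Income does not dominate the Euclidean distance, then classify with k = 3.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

X_train = [[35, 35_000, 3], [22, 50_000, 2], [63, 200_000, 1],
           [59, 170_000, 1], [25, 40_000, 4]]       # Age, Income, Credit_Rating
y_train = ["No", "Yes", "No", "No", "Yes"]          # Response
X_test = [[37, 50_000, 2], [42, 75_000, 1]]         # the two unlabelled instances

scaler = MinMaxScaler().fit(X_train)                # scale each attribute to [0, 1]
clf = KNeighborsClassifier(n_neighbors=3).fit(scaler.transform(X_train), y_train)
print(clf.predict(scaler.transform(X_test)))
```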
Example: K-NN Regression
Test instance = point (7, 8)
The three points closest to (7, 8) are
(7, 9), (5, 8), and (9, 7), with costs of 33, 30, and 32
Take the average: (33 + 30 + 32) / 3 = 31.67, which becomes the predicted value for the test instance
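The averaging step expressed in code, using only the three neighbour costs given above.

```python
neighbour_costs = [33, 30, 32]       # costs of the 3 nearest neighbours of (7, 8)
prediction = sum(neighbour_costs) / len(neighbour_costs)
print(round(prediction, 2))          # 31.67
```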
Thanks