K-Nearest Neighbor Classifier: This Slide Is Modified From Dr. Tan's Slides. Thanks To Dr. Tan

The document discusses the k-nearest neighbor classifier, an instance-based machine learning algorithm. The k-NN classifier stores all available cases from the training data and classifies new cases based on a similarity measure (usually distance) to the k most similar cases in the training data, where k is a positive integer. It assigns the new case to the class that is most common among its k nearest neighbors. The k-NN classifier is considered a lazy learning algorithm since it does not explicitly build a model from the training data but instead simply stores instances of the training data.

K-Nearest Neighbor Classifier


This slide is modified from Dr. Tan's slides. Thanks to Dr. Tan.
Instance-Based Classifiers

[Figure: a set of stored cases, each with attributes Atr1 ... AtrN and a class label (A, B, or C), next to an unseen case with attributes Atr1 ... AtrN but no label.]

Store the training records.
Use the training records to predict the class label of unseen cases.
Instance-based classifiers: use specific training instances to make predictions without having to maintain a model derived from the data.

Examples of lazy learners:
- Rote-learner: memorizes the entire training data and performs classification only if the attributes of a record match one of the training examples exactly.
- Nearest neighbor: uses the k closest points (nearest neighbors) to perform classification.

Nearest Neighbor Classifiers

Basic idea: "If it walks like a duck and quacks like a duck, then it's probably a duck."

[Figure: given the training records and a test record, compute the distance from the test record to each training record, then choose the k nearest records.]
Nearest-Neighbor Classifiers

Requires three things:
- The set of stored records
- A distance metric to compute the distance between records
- The value of k, the number of nearest neighbors to retrieve

To classify an unknown record:
1. Compute its distance to the other training records.
2. Identify the k nearest neighbors.
3. Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote).
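The three steps above can be sketched as a minimal Python implementation (the function and variable names here are illustrative, not from the slides):

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two numeric attribute vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(train, test_point, k=3):
    """Classify test_point by majority vote among its k nearest
    training records. `train` is a list of (attributes, label) pairs."""
    # 1. Compute distance to every training record
    dists = [(euclidean(x, test_point), label) for x, label in train]
    # 2. Identify the k nearest neighbors
    dists.sort(key=lambda t: t[0])
    neighbors = dists[:k]
    # 3. Majority vote over the neighbors' class labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.9, 5.1), "B")]
print(knn_classify(train, (1.1, 0.9), k=3))  # → A
```

Note that no model is built: all work happens at classification time, which is exactly what makes k-NN a lazy learner.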
Definition of Nearest Neighbor

[Figure: three panels showing (a) the 1-nearest neighbor, (b) the 2-nearest neighbors, and (c) the 3-nearest neighbors of a test point X.]

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
Nearest Neighbor Classification

Compute the distance between two points, e.g. the Euclidean distance:

    d(p, q) = sqrt( sum_i (p_i - q_i)^2 )

Determine the class from the nearest-neighbor list:
- Take the majority vote of class labels among the k nearest neighbors.
- Optionally, weigh each vote according to distance, with weight factor w = 1/d^2.
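The distance-weighted vote with w = 1/d^2 can be sketched as follows (a minimal illustration; the zero-distance tie-break is an assumption, not from the slides):

```python
import math
from collections import defaultdict

def weighted_knn(train, test_point, k=3):
    """Distance-weighted k-NN: each of the k nearest neighbors
    contributes w = 1 / d**2 votes to its own class."""
    dists = sorted((math.dist(x, test_point), label) for x, label in train)
    scores = defaultdict(float)
    for d, label in dists[:k]:
        if d == 0:
            return label  # exact match: assumed to dominate the vote
        scores[label] += 1.0 / d ** 2
    return max(scores, key=scores.get)

# One close "A" outweighs two distant "B"s, although a plain
# majority vote over the same k = 3 neighbors would return "B".
train = [((0.0, 0.0), "A"), ((3.0, 0.0), "B"), ((0.0, 3.0), "B")]
print(weighted_knn(train, (0.5, 0.0), k=3))  # → A
```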
Nearest Neighbor Classification

Choosing the value of k:
- If k is too small, the classifier is sensitive to noise points.
- If k is too large, the neighborhood may include points from other classes.
Nearest Neighbor Classification

Scaling issues: attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes. Example:
- The height of a person may vary from 1.5 m to 1.8 m.
- The weight of a person may vary from 90 lb to 300 lb.
- The income of a person may vary from $10K to $1M.
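One common remedy (an illustrative choice, not prescribed by the slides) is min-max normalization, which rescales every attribute to [0, 1] before computing distances:

```python
def min_max_scale(records):
    """Rescale each attribute column to [0, 1] so that no single
    attribute (e.g. income in dollars vs. height in metres)
    dominates the Euclidean distance."""
    cols = list(zip(*records))          # one tuple per attribute column
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) if h > l else 0.0
                  for v, l, h in zip(row, lo, hi))
            for row in records]

# (height m, weight lb, income $): raw income swamps the other attributes
people = [(1.5, 90, 10_000), (1.6, 200, 500_000), (1.8, 300, 1_000_000)]
print(min_max_scale(people))
```

After scaling, each attribute contributes on a comparable footing to the distance computation.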
Nearest Neighbor Classification

k-NN classifiers are lazy learners:
- They do not build models explicitly, unlike eager learners such as decision tree induction and rule-based systems.
- Classifying unknown records is relatively expensive.
- They are quite susceptible to noise, since predictions are based only on local information.
- Computing time differs: lazy learners spend little time on training but more on classification, while eager learners do the reverse.
