K-Nearest Neighbor Classifier: This Slide Is Modified From Dr. Tan's Slides. Thanks To Dr. Tan
The document discusses the k-nearest neighbor classifier, an instance-based machine learning algorithm. The k-NN classifier stores all available cases from the training data and classifies new cases based on a similarity measure (usually distance) to the k most similar cases in the training data, where k is a positive integer. It assigns the new case to the class that is most common among its k nearest neighbors. The k-NN classifier is considered a lazy learning algorithm since it does not explicitly build a model from the training data but instead simply stores instances of the training data.
Instance-Based Classifiers
[Figure: a set of stored cases (attributes Atr1 ... AtrN plus a class label) and an unseen case to be classified.]
- Store the training records.
- Use the training records to predict the class label of unseen cases.
Instance-based classifiers use specific training instances to make predictions, without having to maintain a model derived from the data.
Lazy learners. Examples:
- Rote learner: memorizes the entire training data and performs classification only if the attributes of a record exactly match one of the training examples.
- Nearest neighbor: uses the k closest points (nearest neighbors) to perform classification.
Nearest Neighbor Classifiers
Basic idea: if it walks like a duck and quacks like a duck, then it's probably a duck.
Given the training records and a test record, compute the distance from the test record to each training record, then choose the k nearest records.
A nearest-neighbor classifier requires three things:
- The set of stored records
- A distance metric to compute the distance between records
- The value of k, the number of nearest neighbors to retrieve
To classify an unknown record:
- Compute its distance to the training records.
- Identify the k nearest neighbors.
- Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote).
Definition of nearest neighbor: the k nearest neighbors of a record x are the data points that have the k smallest distances to x (illustrated for 1-, 2-, and 3-nearest neighbors).
Nearest Neighbor Classification
Compute the distance between two points using, for example, Euclidean distance.
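The classification steps above can be sketched directly in Python using only the standard library. This is a minimal illustration, not the slides' own code; the function names and the toy data are my own.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(training, unknown, k=3):
    """Classify `unknown` by majority vote among its k nearest training records.

    `training` is a list of (attributes, class_label) pairs. Note that all
    work happens here, at classification time: lazy learning builds no model.
    """
    # Sort the stored records by distance to the unknown record, keep the k nearest.
    neighbors = sorted(training, key=lambda rec: euclidean(rec[0], unknown))[:k]
    # Majority vote over the class labels of the k nearest neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

For example, with records of class 'A' clustered near the origin and class 'B' near (5, 5), a query close to the origin is assigned 'A'.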
Determine the class from the nearest-neighbor list:
- Take the majority vote of the class labels among the k nearest neighbors.
- Optionally weigh each vote according to distance, using the weight factor w = 1/d².
d(p, q) = sqrt( Σᵢ (pᵢ − qᵢ)² )

Nearest Neighbor Classification
Choosing the value of k:
- If k is too small, the classifier is sensitive to noise points.
- If k is too large, the neighborhood may include points from other classes.

Nearest Neighbor Classification
Scaling issues:
- Attributes may have to be scaled to prevent the distance measure from being dominated by one of the attributes.
- Example: the height of a person may vary from 1.5 m to 1.8 m, the weight of a person from 90 lb to 300 lb, and the income of a person from $10K to $1M.

Nearest Neighbor Classification
k-NN classifiers are lazy learners:
- They do not build models explicitly, unlike eager learners such as decision-tree induction and rule-based systems.
- Classifying unknown records is relatively expensive, since the work is deferred to classification time.
- They are quite susceptible to noise, because classification is based on local information.
Computing time for lazy learners and eager learners.