
KNN Algorithm Process and Implementation

The document describes the process and parameters for the K-Nearest Neighbors (KNN) machine learning algorithm. KNN is a supervised learning algorithm that can be used for classification or regression tasks. It works by finding the K closest training examples in the feature space and predicting the class based on a majority vote of its neighbors. The document outlines the steps for using KNN, including importing libraries, instantiating variables, fitting the model to training data, and using the model to make predictions on new data. It also describes different distance metrics and parameters that can be configured for the KNN model.

Uploaded by

Joe1

PROCESS OF ML CODE/ALGORITHM

Import → Instantiate → Fit → Transform/Predict

 Import the required/necessary libraries/modules
 Instantiate the required variable (the model)
 Fit the input/independent variables to the output/dependent variables
 Transform/Predict on new, unseen data using the fitted model
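The four steps above can be sketched with scikit-learn's KNeighborsClassifier (the tiny two-class dataset here is made up purely for illustration):

```python
# Step 1: Import the required libraries/modules
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: two features, two classes
X_train = [[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]]
y_train = [0, 0, 1, 1]

# Step 2: Instantiate the required variable (the model)
knn = KNeighborsClassifier(n_neighbors=3)

# Step 3: Fit the input/independent variables with the output/dependent variables
knn.fit(X_train, y_train)

# Step 4: Predict on new data (a point near the class-1 cluster)
print(knn.predict([[5.5, 5.0]]))
```

The same import → instantiate → fit → predict pattern applies to most scikit-learn estimators, which is what makes the process generic.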

KNN Algorithm

 Supervised learning algorithm
 Lazy learning algorithm – it builds no explicit model during training; it simply stores the training data and defers computation until prediction time
 The KNN classifier sets the benchmark for more complex classifiers such as artificial neural networks (ANN) and support vector machines (SVM)
 It is slow at prediction time, since each query must be compared against the training data
 A good domain expert on the team is a must in order to use KNN, so that an appropriate set of features/training samples can be selected
 KNN can be used for classification as well as regression
 Whenever we have a new point to classify, we find the K nearest neighbors from the training data set
 The distance calculation in KNN can be done in three ways:
 Euclidean distance
 Minkowski distance
 Mahalanobis distance
 The output of KNN classification is the predicted class
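The three distance metrics can be computed by hand with plain numpy (a minimal sketch; the two feature vectors are made up, and the identity matrix stands in for the inverse covariance matrix that Mahalanobis distance normally estimates from the data):

```python
import numpy as np

# Two hypothetical feature vectors
a = np.array([2.0, 4.0])
b = np.array([5.0, 8.0])

# Euclidean distance: sqrt of the sum of squared differences
euclidean = np.sqrt(np.sum((a - b) ** 2))

# Minkowski distance with power p (p=2 reduces to Euclidean, p=1 to Manhattan)
p = 3
minkowski = np.sum(np.abs(a - b) ** p) ** (1 / p)

# Mahalanobis distance: scales differences by the inverse covariance matrix VI,
# so correlated features are not double-counted (identity used here for illustration)
VI = np.eye(2)
diff = a - b
mahalanobis = np.sqrt(diff @ VI @ diff)

print(euclidean)    # 5.0 (a 3-4-5 triangle)
print(mahalanobis)  # equals Euclidean when VI is the identity
```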

KNN TYPE I – INPUT TEST SAMPLE METHOD:

The entire dataset is fed to the model as the training dataset → a single test sample (input/feature vector) provided by the user is given to the model → the test sample is matched against each training data point → a class is predicted for the test sample based on the comparison and the (nearest) distances
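This flow can be sketched as a brute-force computation in numpy, where the whole (hypothetical) dataset is the training data and one test sample is matched against every training point:

```python
import numpy as np

# Entire (hypothetical) dataset used as the training data
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
y_train = np.array([0, 0, 1, 1])

# A single test sample, e.g. supplied by the user
test_sample = np.array([5.5, 5.0])
k = 3

# Match the test sample against each training point (Euclidean distance)
distances = np.sqrt(np.sum((X_train - test_sample) ** 2, axis=1))

# Take the k nearest neighbors and predict the majority class among them
nearest = np.argsort(distances)[:k]
votes = y_train[nearest]
predicted_class = np.bincount(votes).argmax()
print(predicted_class)
```

This is essentially what algorithm='brute' does inside KNeighborsClassifier; the tree-based algorithms below exist to avoid scanning every training point.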

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=3, p=2, weights='uniform')
 algorithm: {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional

Algorithm used to compute the nearest neighbors:

 ‘ball_tree’ will use BallTree
 ‘kd_tree’ will use KDTree
 ‘brute’ will use a brute-force search.
 ‘auto’ will attempt to decide the most appropriate algorithm based on the
values passed to the fit method.

 leaf_size : int, optional (default = 30)

Leaf size passed to BallTree or KDTree. This can affect the speed of the
construction and query, as well as the memory required to store the tree. The optimal
value depends on the nature of the problem.

 metric : string or callable, default ‘minkowski’

The distance metric to use for the tree. The default metric is minkowski, and with p=2 is
equivalent to the standard Euclidean metric.

 metric_params : dict, optional (default = None)

Additional keyword arguments for the metric function.

 n_jobs : int, optional (default = 1)

The number of parallel jobs to run for the neighbors search. If -1, then the number of jobs
is set to the number of CPU cores. Does not affect the fit method.

 n_neighbors : int, optional (default = 5)

Number of neighbors to use by default for kneighbors queries.

An odd value is preferred in binary classification to avoid ties in the majority vote.

 p : integer, optional (default = 2)

Power parameter for the Minkowski metric. When p = 1, this is equivalent to using
manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p,
minkowski_distance (l_p) is used.
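The relationship between p and the named distances can be verified directly (a minimal numpy sketch with made-up vectors):

```python
import numpy as np

u = np.array([0.0, 0.0])
v = np.array([3.0, 4.0])

def minkowski_distance(u, v, p):
    # l_p distance: (sum over i of |u_i - v_i|^p) ** (1/p)
    return np.sum(np.abs(u - v) ** p) ** (1 / p)

manhattan = minkowski_distance(u, v, 1)  # l1: |3| + |4| = 7.0
euclid = minkowski_distance(u, v, 2)     # l2: sqrt(9 + 16) = 5.0
print(manhattan, euclid)
```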

 weights : str or callable, optional (default = ‘uniform’)


Weight function used in prediction. Possible values:

 ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
 ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors
of a query point will have a greater influence than neighbors which are further away.
 [callable] : a user-defined function which accepts an array of distances, and returns an
array of the same shape containing the weights.
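The difference between ‘uniform’ and ‘distance’ weighting shows up on a small contrived example, where one very close neighbor is outvoted under uniform weights but wins under distance weights:

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data: one class-0 point near the query, two class-1 points farther away
X = [[0.0, 0.0], [3.0, 0.0], [3.1, 0.0]]
y = [0, 1, 1]
query = [[0.5, 0.0]]

# 'uniform': all 3 neighbors vote equally, so the two class-1 points win
uniform_knn = KNeighborsClassifier(n_neighbors=3, weights='uniform').fit(X, y)

# 'distance': the much closer class-0 point carries weight 1/0.5 = 2.0,
# outweighing the two class-1 points (1/2.5 + 1/2.6 ≈ 0.78)
distance_knn = KNeighborsClassifier(n_neighbors=3, weights='distance').fit(X, y)

print(uniform_knn.predict(query))   # class 1
print(distance_knn.predict(query))  # class 0
```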
