
K-Nearest Neighbours (KNN): An Introduction
In this presentation, we'll explore the K-Nearest Neighbours
algorithm, a simple yet powerful tool for both classification and
regression tasks. KNN is non-parametric, meaning it makes no
assumptions about the underlying data distribution. It's also a lazy
learner, as it doesn't explicitly learn a model during the training
phase. Instead, it stores the entire dataset and performs
calculations at the time of prediction. Join us as we unravel the
intricacies of KNN and discover its potential!
How KNN Works
1 Store the Dataset
The algorithm begins by storing the entire dataset, which will serve as its
reference point for future predictions.

2 Calculate Distances
When a new data point arrives, KNN calculates the distance between this point
and every other point in the existing dataset.

3 Select K-Nearest Neighbours
Based on the calculated distances, the algorithm selects the 'K' nearest
neighbours to the new data point.

4 Assign Class or Predict Value
For classification, KNN assigns the class label based on the majority class among
the K neighbours. For regression, it predicts the value based on the average (or
weighted average) of the K neighbours.

For example, imagine you have a new customer. KNN helps decide which customer
segment they belong to based on the characteristics of their nearest neighbours in the
dataset.
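To make these four steps concrete, here is a minimal from-scratch sketch of KNN classification in Python. The customer features, labels, and function name are illustrative assumptions, not part of the original slides:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest neighbours."""
    # Step 2: distance from the new point to every stored point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step 3: indices of the k closest points
    nearest = np.argsort(distances)[:k]
    # Step 4: majority class among those neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Hypothetical customer segments: features are [age, monthly spend]
X_train = np.array([[25, 200], [30, 250], [45, 800], [50, 900]])
y_train = np.array(["budget", "budget", "premium", "premium"])
print(knn_predict(X_train, y_train, np.array([28, 220])))  # -> "budget"
```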
Distance Metrics: Measuring Distances

• Euclidean Distance: The straight-line distance between two points in Euclidean space.
• Manhattan Distance: The sum of the absolute differences between the coordinates of two points.
• Minkowski Distance: A generalization of both Euclidean and Manhattan distances.
• Hamming Distance: The number of positions at which the corresponding symbols are different (for categorical data).

Choosing the right distance metric is crucial for KNN's performance,
and it depends on the nature of your data. Each metric captures
distance in a unique way, impacting the algorithm's decision-making
process.
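As a quick illustration of how the metrics differ, the sketch below computes each one for a pair of sample points using SciPy; the vectors and categorical values are made up for demonstration:

```python
import numpy as np
from scipy.spatial import distance

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

print(distance.euclidean(a, b))       # straight-line distance
print(distance.cityblock(a, b))       # Manhattan: sum of absolute differences
print(distance.minkowski(a, b, p=3))  # p=1 gives Manhattan, p=2 gives Euclidean

# Hamming distance: count of mismatched positions (categorical data)
s1 = ["red", "small", "cotton"]
s2 = ["red", "large", "wool"]
print(sum(x != y for x, y in zip(s1, s2)))  # -> 2
```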
Choosing the Right 'K': How Many Neighbours?
Small K
Sensitive to noise, leading to a complex and potentially overfitting decision boundary. With too few neighbors, even a
single noisy data point can significantly influence the classification.

Large K
Smoother, more generalized decision boundary, but may misclassify data points, especially in regions with local
variations. It can mask minority classes.

Optimal K
Balances noise sensitivity and misclassification potential, creating a robust and accurate model. This is the sweet spot.

Finding the right 'K' is essential for KNN's success. As a rule of thumb, K can be set to the square root of the number of data points.
Techniques like cross-validation (such as k-fold cross-validation) and the Elbow method can help you find the optimal K for your
specific dataset. For example, if you're dealing with sparse customer data with many outliers, a slightly larger 'K' might be better to
reduce the impact of those outliers. Conversely, for dense datasets where local patterns are important, a smaller 'K' could be more
appropriate.
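One common way to search for the optimal K is k-fold cross-validation. The sketch below scores a range of K values with scikit-learn; the Iris dataset and the range 1 to 25 are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validation accuracy for each candidate K
scores = {}
for k in range(1, 26):
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"best K = {best_k}, accuracy = {scores[best_k]:.3f}")
```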
KNN Implementation: Practical Considerations
Data Pre-processing
Scaling and normalisation are critical to prevent features with larger ranges from dominating the distance calculation.

Categorical Features
Use one-hot encoding to convert categorical features into numerical data.

Missing Values
Employ imputation techniques to handle missing values in your dataset. This maintains the quality of the data and ensures the algorithm gives correct results.

Distance Metrics
Experiment with different metrics like Euclidean, Manhattan, or Minkowski distance to find the one that best fits your data.
The choice of distance metric is very important for the performance of KNN.

Keep in mind the computational complexity of KNN, which can be slow for large datasets. Consider KD-trees and ball trees for
optimisation.
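The considerations above can be wired into a single scikit-learn pipeline. This is one possible arrangement on a hypothetical customer table; the column names and values are invented for the example:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: one missing numeric value, one categorical column
df = pd.DataFrame({
    "income": [30000, 45000, None, 80000],
    "age": [25, 32, 40, 51],
    "city": ["Delhi", "Mumbai", "Delhi", "Pune"],
})
y = ["budget", "budget", "premium", "premium"]

preprocess = ColumnTransformer(
    transformers=[
        # Impute missing numbers, then scale so large-range features
        # (income) do not dominate the distance calculation
        ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
                          ("scale", StandardScaler())]), ["income", "age"]),
        # One-hot encode categorical features into numeric columns
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0.0,  # force dense output for the KD-tree below
)

knn = Pipeline([
    ("prep", preprocess),
    # A KD-tree index speeds up neighbour search on larger datasets
    ("model", KNeighborsClassifier(n_neighbors=3, algorithm="kd_tree")),
])
knn.fit(df, y)
```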
Comparison with Other Algorithms

KNN
Pros: Simple to implement, versatile, no training phase, adapts to new data.
Cons: Computationally expensive, sensitive to irrelevant features, requires large memory for storing training data.

Decision Tree
Pros: Easy to interpret, handles non-linear data, can capture complex relationships, requires minimal data preparation.
Cons: Prone to overfitting, high variance, can be unstable, biased towards dominant classes.

SVM
Pros: Effective in high-dimensional spaces, memory efficient, robust to outliers, good generalization capabilities.
Cons: Difficult to interpret, parameter tuning required, can be computationally intensive, sensitive to kernel selection.

Logistic Regression
Pros: Simple, efficient, easy to interpret, provides probability estimates, computationally inexpensive.
Cons: Assumes linearity, sensitive to outliers, can underperform with complex data, requires careful feature engineering.
Evaluation Metrics for Classification

• Accuracy: Overall correctness of the model.
• Precision: Ability to predict positive outcomes correctly.
• Recall: Ability to identify all positive outcomes.
• F1-Score: Harmonic mean of precision and recall.

Use a Confusion Matrix to visualize correct and incorrect predictions.
In real-world scenarios, such as predicting loan defaults, focus on
recall to minimize false negatives. These metrics provide valuable
insights into your model's performance.
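The sketch below computes these four metrics plus the confusion matrix with scikit-learn, using made-up labels where 1 stands for a loan default:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

# Illustrative labels: 1 = loan default, 0 = repaid
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))  # minimise missed defaults
print("f1-score :", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```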
Evaluation Metrics for Regression

• MSE: Mean Squared Error.
• RMSE: Root Mean Squared Error.
• MAE: Mean Absolute Error.

R-squared, also known as the coefficient of determination, measures the
explained variance in your regression model. If you're predicting house prices,
RMSE tells you the average prediction error in rupees, helping you understand
the model's accuracy in a practical context. These metrics provide a
comprehensive evaluation of regression model performance.
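Here is a short worked example of these metrics, with invented house prices (in lakh rupees) standing in for real predictions:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual vs predicted house prices (lakh rupees)
y_true = np.array([50.0, 72.0, 65.0, 90.0])
y_pred = np.array([48.0, 75.0, 60.0, 93.0])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # RMSE is in the same units as the target (rupees)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)  # coefficient of determination
print(f"MSE={mse:.2f}  RMSE={rmse:.2f}  MAE={mae:.2f}  R^2={r2:.3f}")
```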
KNN in Action: Use Cases in India

1. Healthcare: Disease prediction based on patient data.
2. Finance: Credit risk assessment and fraud detection.
3. E-commerce: Recommendation systems and customer segmentation.
4. Agriculture: Crop yield prediction and soil classification.

Specific to India, KNN can be used to predict monsoon patterns based on historical weather data, assisting farmers
in making informed decisions about planting and harvesting. Its adaptability makes it a valuable tool across diverse
sectors.
Strengths and Weaknesses of KNN
Advantages
• Simple to understand and implement.
• Versatile: can be used for both classification and regression.
• No training period required.

Disadvantages
• Computationally expensive for large datasets.
• Sensitive to irrelevant features.
• Optimal value of K is data-dependent.
• Performance suffers with imbalanced data.

Mitigate weaknesses with data preprocessing and optimizations like KD-trees and feature selection. Acknowledging
these limitations is crucial for effective application.
Limitations and Key Takeaways
K-Nearest Neighbours (KNN) faces several limitations that must be addressed for effective real-world application, especially within the Indian context.

• High Computational Cost: KNN's prediction time grows linearly with dataset size because it calculates distances to every point.
For large Indian datasets like census data or nationwide transaction records, this becomes impractical without optimizations.
• Memory Intensive: Storing the entire training dataset is memory-intensive. High-dimensional datasets such as images require significant memory.
• Sensitivity to Feature Scaling: Features with larger scales dominate distance calculations. Inconsistent units can skew results;
standardize features such as income (in INR) and expenditure before calculating distances.

Key Takeaways:

• KNN is intuitive for classification and regression but requires careful tuning.
• Choose K based on your dataset size and validation performance. Different distance metrics suit different data types (Euclidean
for continuous, Hamming for categorical).
• Preprocess data: scale features, handle missing values, and reduce dimensionality.
• KNN is powerful when used correctly, but awareness of limitations and mitigation strategies are vital for successful deployment.
Team Members

• HRISHIKA BHATNAGAR – BTF/22/141

• VANSH SHARMA – BTF/22/161

• ISHAAN GARG – BTF/22/151

• ARMAAN SINGH – BTF/22/156

• AMRIT KUMAR RAO – BTF/22/160

• KESHAV MITTAL – BTF/22/157
