07 - KNN & Naive Bayes

The document explains two machine learning models: K-Nearest Neighbors (K-NN) and Naive Bayes. K-NN classifies new data points based on the majority label of their nearest neighbors, while Naive Bayes uses Bayes' Theorem to predict class probabilities under an assumption of feature independence. Both models are simple yet effective for classification tasks, with K-NN relying on distance metrics and Naive Bayes on prior probabilities and likelihoods.


K-Nearest Neighbor Model
&
Naive Bayes Model
Nearest Neighbor
➢ One of the simplest of all machine learning classifiers
➢ Label a new point the same as its closest known point

(Figure: a new point whose nearest known point is red, so we label it red.)
Nearest Neighbor
How Does the K-Nearest Neighbors Algorithm Work?

The K-NN algorithm compares a new data entry to the values in a given data
set (with different classes or categories). Based on its closeness or similarity
to a chosen number (K) of neighbors, the algorithm assigns the new data entry
to a class or category in the data set (training data).
Nearest Neighbor
Step #1 - Assign a value to K.

Step #2 - Calculate the distance between the new data entry and all other
existing data entries (you'll learn how to do this shortly). Arrange them in
ascending order.

Step #3 - Find the K nearest neighbors to the new entry based on the
calculated distances.

Step #4 - Assign the new data entry to the majority class among those
nearest neighbors.
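To make these four steps concrete, here is a minimal from-scratch sketch in Python. The toy points and the `euclidean` / `knn_classify` names are illustrative assumptions, not code from the slides.

from collections import Counter
import math

def euclidean(a, b):
    # straight-line distance between two feature vectors
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(new_point, training_data, k=3):
    # Step 1: K is supplied as the parameter k.
    # Step 2: distance from the new entry to every existing entry, ascending order.
    distances = sorted((euclidean(new_point, x), label) for x, label in training_data)
    # Step 3: keep the K nearest neighbors.
    nearest = distances[:k]
    # Step 4: assign the majority class among those neighbors.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

training_data = [((1, 2), "red"), ((2, 1), "red"), ((6, 7), "blue"), ((7, 8), "blue")]
print(knn_classify((2, 2), training_data, k=3))   # -> "red"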
Nearest Neighbor

K-nearest neighbor: predict based on the K closest training samples.

(Figures: scatter plots of two classes, x and o, over features x1 and x2, showing how a new point + is classified by its 1 nearest neighbor, 3 nearest neighbors, and 5 nearest neighbors.)
k – Nearest Neighbor
◼ Generalizes 1-NN to smooth away noise in the labels
◼ A new point is now assigned the most frequent label of its k nearest neighbors

Label it red when k = 3.
Label it blue when k = 7.
Distance Metrics
◼ Different metrics can change the decision surface

Dist(a,b) = (a1 – b1)^2 + (a2 – b2)^2        Dist(a,b) = (a1 – b1)^2 + (3a2 – 3b2)^2

◼ Standard Euclidean distance metric:
◼ Two-dimensional: Dist(a,b) = sqrt((a1 – b1)^2 + (a2 – b2)^2)
◼ Multivariate: Dist(a,b) = sqrt(∑i (ai – bi)^2)
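A tiny Python sketch of the point above: re-weighting one coordinate (as in the 3·a2 metric shown) can change which stored point is nearest. The sample points and the `dist` helper are made up for illustration.

import math

def dist(a, b, w2=1.0):
    # weighted Euclidean distance; w2 rescales the second coordinate
    return math.sqrt((a[0] - b[0]) ** 2 + (w2 * (a[1] - b[1])) ** 2)

query, p, q = (0.0, 0.0), (2.0, 0.0), (0.0, 1.5)
print(dist(query, p), dist(query, q))              # 2.0 vs 1.5 -> q is nearer
print(dist(query, p, w2=3), dist(query, q, w2=3))  # 2.0 vs 4.5 -> p is nearer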
Three Aspects of an
Instance-Based Learner:

❑ A distance metric
❑ How many nearby neighbors to look at?
❑ How to fit with the local points?

1-NN’s Three Aspects of an
Instance-Based Learner:
❑ A distance metric
  ❑ Euclidean
❑ How many nearby neighbors to look at?
  ❑ One
❑ How to fit with the local points?
  ❑ Just predict the same output as the nearest neighbor.
Example on classification
◼ First, we calculate the distance for each point.

◼ Then, we assume k = 4.

◼ Then, to predict E, we find the least distance among the 4 points, which gives A.

◼ So the prediction for E will be Bad.
Another example on classification
(Figure: Euclidean distance formula and the 5 nearest neighbors of the new entry.)

As shown above, the majority class within the 5 nearest neighbors to the new
entry is Red. Therefore, we'll classify the new entry as Red.
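A minimal scikit-learn version of this 5-NN majority vote might look like the following sketch; the coordinates and labels are invented for illustration and are not the data from the figure.

from sklearn.neighbors import KNeighborsClassifier

# four Red points near the origin and four Blue points far away (hypothetical data)
X = [[1, 1], [2, 1], [1, 2], [3, 3], [8, 8], [9, 8], [8, 9], [9, 9]]
y = ["Red", "Red", "Red", "Red", "Blue", "Blue", "Blue", "Blue"]

clf = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
clf.fit(X, y)
print(clf.predict([[2, 2]]))   # majority of the 5 nearest neighbors -> "Red"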
Example on Regression
◼ First, we calculate the distance for each point.

◼ Then, we assume k = 3.

◼ Then, to predict the value at 48, take the average HPI of the 3 points with
the least distance.
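As a rough sketch of this averaging step in Python (the age/HPI pairs below are placeholders, not the values from the slide's table):

def knn_regress(query, data, k=3):
    # data: list of (feature_value, target_value) pairs
    nearest = sorted(data, key=lambda pair: abs(pair[0] - query))[:k]
    return sum(target for _, target in nearest) / k

data = [(25, 135), (35, 256), (45, 231), (52, 255), (60, 395)]  # (age, HPI), hypothetical
print(knn_regress(48, data, k=3))   # average HPI of the 3 nearest ages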
Bayesian Algorithm
Naive Bayes Model
What is a Bayesian Algorithm?
➢ A Bayesian algorithm is a classification technique based on Bayes’
Theorem with an assumption of independence among the predictors.

In simple terms, a Naïve Bayes classifier assumes that the presence of a
particular feature in a class is unrelated to the presence of any other
feature.
What is Naive Bayes Classifier?
➢ Naive Bayes is a statistical classification technique based on
Bayes’ Theorem.
➢ It is one of the simplest supervised learning algorithms.
➢ The Naive Bayes classifier is a fast, accurate and reliable algorithm.
➢ Naïve Bayes can be used for both classification and regression.
➢ Naive Bayes classifiers achieve high accuracy and speed on large
datasets.
➢ The Naive Bayes classifier assumes that the effect of a particular
feature in a class is independent of the other features.
Naive Bayes Classifier (Bayes theorem)

P(class | features) = P(features | class) × P(class) / P(features)

i.e., posterior = (likelihood × prior) / evidence
How Does the Naive Bayes Classifier Work? (Multinomial Naive Bayes classifier)

The Naive Bayes classifier calculates the probability of an event in the
following steps:
• Step 1: Calculate the prior probability for the given class labels.
• Step 2: Find the likelihood probability of each attribute for each class.
• Step 3: Put these values into the Bayes formula and calculate the posterior
probability.
• Step 4: See which class has the higher posterior probability; the input is
assigned to that class.
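As a sketch of how these four steps look with a library, here is a tiny example using scikit-learn's MultinomialNB; the word-count matrix and labels are made up for illustration and are not from the slides.

import numpy as np
from sklearn.naive_bayes import MultinomialNB

# hypothetical word counts per document, for three tracked words
X = np.array([[3, 0, 1],
              [2, 1, 0],
              [0, 4, 2],
              [1, 3, 3]])
y = np.array(["normal", "normal", "spam", "spam"])

clf = MultinomialNB()                  # Steps 1-2: priors and likelihoods are learned in fit()
clf.fit(X, y)
print(clf.predict_proba([[0, 2, 2]]))  # Step 3: posterior probabilities for a new document
print(clf.predict([[0, 2, 2]]))        # Step 4: the class with the higher posterior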
Example

▶ In the learning phase, calculate the probabilities based on the training
table.

▶ In the test phase, try all cases, then decide in favour of the class with the
higher probability.
Naive Bayes Classifier

Example: Filtering Spam Emails

Initially, we have 15 normal emails and 5 spam emails.

Category     | Mails | Probability | Word  | Occurrences | Probability
-------------|-------|-------------|-------|-------------|------------
Normal mails | 15    |             | See   |             |
             |       |             | Meet  |             |
             |       |             | Free  |             |
             |       |             | Cash  |             |
             |       |             | Total |             |
Spam mails   | 5     |             | See   |             |
             |       |             | Meet  |             |
             |       |             | Free  |             |
             |       |             | Cash  |             |
             |       |             | Total |             |
Total        | 20    |             |       |             |
Naive Bayes Classifier
Category     | Mails | Probability | Word  | Occurrences | Probability
-------------|-------|-------------|-------|-------------|------------
Normal mails | 15    | 0.75        | See   |             |
             |       |             | Meet  |             |
             |       |             | Free  |             |
             |       |             | Cash  |             |
             |       |             | Total |             |
Spam mails   | 5     | 0.25        | See   |             |
             |       |             | Meet  |             |
             |       |             | Free  |             |
             |       |             | Cash  |             |
             |       |             | Total |             |
Total        | 20    |             |       |             |

Then compute the proportion of normal and spam emails in the sample.
For example, the proportion of normal emails in the sample is 15/20 = 0.75.
Naive Bayes Classifier

Category     | Mails | Probability | Word  | Occurrences | Probability
-------------|-------|-------------|-------|-------------|------------
Normal mails | 15    | 0.75        | See   | 9           |
             |       |             | Meet  | 8           |
             |       |             | Free  | 2           |
             |       |             | Cash  | 6           |
             |       |             | Total | 25          |
Spam mails   | 5     | 0.25        | See   | 4           |
             |       |             | Meet  | 1           |
             |       |             | Free  | 10          |
             |       |             | Cash  | 5           |
             |       |             | Total | 20          |
Total        | 20    |             |       |             |

In reality, we could analyse every single word in the emails, but for
simplicity we analyse only these 4 words in this activity.

Count the occurrences of the different words and record them.


Naive Bayes Classifier

Category     | Mails | Probability | Word  | Occurrences | Probability
-------------|-------|-------------|-------|-------------|------------
Normal mails | 15    | 0.75        | See   | 9           | 0.36
             |       |             | Meet  | 8           | 0.32
             |       |             | Free  | 2           | 0.08
             |       |             | Cash  | 6           | 0.24
             |       |             | Total | 25          | 1
Spam mails   | 5     | 0.25        | See   | 4           | 0.20
             |       |             | Meet  | 1           | 0.05
             |       |             | Free  | 10          | 0.50
             |       |             | Cash  | 5           | 0.25
             |       |             | Total | 20          | 1
Total        | 20    |             |       |             |

Afterwards, compute the probability of occurrence for each word.
We now have the probability distribution tables for normal and spam emails.
Naive Bayes Classifier

(Same table as above.)

Given an email containing the words “Free” and “Cash”, the probability of this
email being normal = (0.75)(0.08)(0.24) = 0.0144.
Naive Bayes Classifier
(Same table as above.)

Given an email containing the words “Free” and “Cash”, the probability of this
email being spam = (0.25)(0.5)(0.25) = 0.03125 > 0.0144.


Naive Bayes Classifier

(Same table as above.)

Conclusion:

Given an email containing the words “Free” and “Cash”, it is most probably
a spam email.
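The arithmetic above can be reproduced in a few lines of Python. This is only an illustrative sketch of the slides' calculation (no smoothing, only the four tracked words); the names below are my own.

# word-occurrence counts taken from the table above
normal_counts = {"See": 9, "Meet": 8, "Free": 2, "Cash": 6}   # total 25
spam_counts   = {"See": 4, "Meet": 1, "Free": 10, "Cash": 5}  # total 20
p_normal, p_spam = 15 / 20, 5 / 20                            # priors 0.75 and 0.25

def score(words, prior, counts):
    # prior times the product of per-word likelihoods
    total = sum(counts.values())
    s = prior
    for w in words:
        s *= counts[w] / total
    return s

email = ["Free", "Cash"]
print(score(email, p_normal, normal_counts))  # 0.75 * 0.08 * 0.24 = 0.0144
print(score(email, p_spam, spam_counts))      # 0.25 * 0.5  * 0.25 = 0.03125 -> spam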
Naive Bayes Classifier

Advantages
1. This algorithm works very fast and can easily predict the class of a test dataset.
2. You can use it to solve multi-class prediction problems, for which it works quite well.
3. The Naive Bayes classifier performs better than other models with less training data if the
assumption of feature independence holds.

Disadvantages
1. If your test data set has a categorical variable with a category that wasn’t present in the
training data set, the Naive Bayes model will assign it zero probability and won’t be able to
make a prediction for it (this “zero-frequency problem” is commonly mitigated with Laplace
smoothing).
2. It assumes that all the features are independent. While that might sound great in theory, in
real life you’ll hardly find a set of truly independent features.
Example
We will generate synthetic data using scikit-learn, then train and evaluate the
Gaussian Naive Bayes algorithm.

Generating the Dataset

Scikit-learn provides a machine learning ecosystem in which you can generate
datasets and evaluate various machine learning algorithms.

In our case, we create a dataset with six features, three classes, and
800 samples using the `make_classification` function.
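The code for this example appears on the slides only as screenshots. A minimal sketch of the setup described above (800 samples, six features, three classes, Gaussian Naive Bayes) might look like the following; the n_informative, test_size, and random_state values are assumptions, not taken from the slides.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# 800 samples, six features, three classes (n_informative chosen so 3 classes fit)
X, y = make_classification(n_samples=800, n_features=6, n_informative=4,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=42)

model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))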
Decision tree example with 3 output classes
▶ When building a decision tree with three classes using entropy, the base of
the logarithm does not depend on the number of classes; you can use any
base (e.g., base 2, base 3, or base 10). However, base 2 (log2) is commonly
used for entropy calculations in decision trees.
▶ Using log3 instead of log2 would scale the entropy values, but it does not
affect the relative comparisons of information gain, so the decision tree
structure remains the same (see the short sketch below).
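A quick numeric check of this point, using an assumed 3-class label distribution (4/3/3); the entropy helper below is illustrative only.

from math import log
from collections import Counter

def entropy(labels, base=2.0):
    # H = -sum over classes of p * log_base(p)
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * log(c / n, base) for c in counts.values())

labels = ["A"] * 4 + ["B"] * 3 + ["C"] * 3   # hypothetical node with three classes
print(entropy(labels, base=2))   # ≈ 1.571 (bits)
print(entropy(labels, base=3))   # ≈ 0.991 (trits) = the value above / log2(3)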
Decision tree example with 3 output classes
Example: Decision Tree with Three Classes Using Entropy (Base 2 and Base 3)
Let’s consider a dataset:
(The dataset and the entropy / information-gain calculations in base 2 and
base 3 are shown as figures on the remaining slides.)
