
PEC-CS701E

Naive Bayes’ Classifier


Subhas Halder
Department of Computer Science and Engineering
Classification Techniques
• A number of classification techniques are known; they can be broadly grouped
into the following categories:

1. Statistical-Based Methods
• Regression
• Bayesian Classifier

2. Distance-Based Classification
• K-Nearest Neighbours

3. Decision Tree-Based Classification
• ID3, C4.5, CART

4. Classification using Machine Learning (SVM)

5. Classification using Neural Network (ANN)
Bayesian Classifier

Bayesian Classifier
• A statistical classifier
• Performs probabilistic prediction, i.e., predicts class membership probabilities

• Foundation
• Based on Bayes’ Theorem.

• Assumptions
1. The classes are mutually exclusive and exhaustive.
2. The attributes are independent given the class.

• Called a “Naïve” classifier because of the attribute-independence assumption (assumption 2).


• Empirically proven to be useful.
• Scales very well.

Bayesian Classifier
• In many applications, the relationship between the attribute set and the class
variable is non-deterministic.
• In other words, a test record cannot be assigned a class label with certainty.

• In such a situation, the classification can be achieved probabilistically.

• The Bayesian classifier is an approach for modelling probabilistic relationships
between the attribute set and the class variable.

• More precisely, the Bayesian classifier uses Bayes’ Theorem of Probability for
classification.

• Before discussing the Bayesian classifier, let us take a quick look at the theory
of probability and then at Bayes’ Theorem.
Bayes’ Theorem of Probability

Simple Probability

Definition : Simple Probability

If there are n elementary events associated with a random experiment and m of
them are favourable to an event A, then the probability of occurrence of A is

P(A) = m / n
Simple Probability
• Suppose, A and B are any two events and P(A), P(B) denote the probabilities
that the events A and B will occur, respectively.

• Mutually Exclusive Events:
• Two events are mutually exclusive if the occurrence of one precludes the
occurrence of the other.
Example: Tossing a coin (two mutually exclusive elementary events)
Rolling a ludo cube, i.e., a die (six mutually exclusive elementary events)
Simple Probability
• Independent Events: Two events are independent if the occurrence of one does
not alter the probability of occurrence of the other.

Example: Tossing a coin and rolling a ludo cube together.
(How many elementary events are there?)
Joint Probability

Definition : Joint Probability

If P(A) and P(B) are the probabilities of two events A and B, then

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

If A and B are mutually exclusive, then P(A ∩ B) = 0

If A and B are independent events, then P(A ∩ B) = P(A) · P(B)

Thus, for mutually exclusive events

P(A ∪ B) = P(A) + P(B)
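Illustrative example (not on the original slide): draw one card from a standard 52-card
deck; let A = “the card is a heart” and B = “the card is a king”. Then P(A) = 13/52,
P(B) = 4/52 and P(A ∩ B) = 1/52 (the king of hearts), so
P(A ∪ B) = 13/52 + 4/52 − 1/52 = 16/52 = 4/13.
Here A and B are independent (P(A ∩ B) = P(A) · P(B)) but not mutually exclusive.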
Conditional Probability

Definition : Conditional Probability

If events are dependent, then their probability is expressed by conditional
probability. The probability that A occurs given that B has occurred is denoted
by P(A|B).

Suppose, A and B are two events associated with a random experiment. The
probability of A under the condition that B has already occurred and P(B) ≠ 0 is
given by

P(A|B) = (Number of events in B which are favourable to A) / (Number of events in B)

       = (Number of events favourable to A ∩ B) / (Number of events favourable to B)

       = P(A ∩ B) / P(B)
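Illustrative example (not on the original slide): roll a fair die once; let
A = “the outcome is even” and B = “the outcome is greater than 3”. The events in B
favourable to A are {4, 6} and the events in B are {4, 5, 6}, so
P(A|B) = 2/3, which also equals P(A ∩ B) / P(B) = (2/6) / (3/6).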
Conditional Probability
Corollary : Conditional Probability

P(A ∩ B) = P(A) · P(B|A), if P(A) ≠ 0
or P(A ∩ B) = P(B) · P(A|B), if P(B) ≠ 0

For three events A, B and C

P(A ∩ B ∩ C) = P(A) · P(B|A) · P(C|A ∩ B)

For n events A1, A2, …, An, if all the events are mutually independent of each other, then

P(A1 ∩ A2 ∩ ⋯ ∩ An) = P(A1) · P(A2) ⋯ P(An)

Note:
P(A|B) = 0 if A and B are mutually exclusive
P(A|B) = P(A) if A and B are independent
In general, P(A|B) · P(B) = P(B|A) · P(A), since P(A ∩ B) = P(B ∩ A)
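Illustrative example (not on the original slide): draw two cards one after another
without replacement; let A = “the first card is an ace” and B = “the second card is
an ace”. Then P(A ∩ B) = P(A) · P(B|A) = (4/52) · (3/51) = 1/221.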
Conditional Probability
• Generalization of Conditional Probability:

P(A|B) = P(A ∩ B) / P(B) = P(B ∩ A) / P(B)

       = P(B|A) · P(A) / P(B)        [∵ P(A ∩ B) = P(B|A) · P(A) = P(A|B) · P(B)]

By the law of total probability: P(B) = P((B ∩ A) ∪ (B ∩ Aᶜ)), where Aᶜ denotes the
complement of event A. Thus,

P(A|B) = P(B|A) · P(A) / P((B ∩ A) ∪ (B ∩ Aᶜ))

       = P(B|A) · P(A) / [ P(B|A) · P(A) + P(B|Aᶜ) · P(Aᶜ) ]
Conditional Probability

In general, if A, B and C are mutually exclusive and exhaustive events,

P(A|D) = P(A) · P(D|A) / [ P(A) · P(D|A) + P(B) · P(D|B) + P(C) · P(D|C) ]
Total Probability
Definition : Total Probability

Let E1, E2, …, En be n mutually exclusive and exhaustive events associated with a
random experiment. If A is any event which occurs with E1 or E2 or … or En, then

P(A) = P(E1) · P(A|E1) + P(E2) · P(A|E2) + ⋯ + P(En) · P(A|En)
Bayes’ Theorem

Theorem : Bayes’ Theorem

Let E1, E2, …, En be n mutually exclusive and exhaustive events associated with a
random experiment. If A is any event which occurs with E1 or E2 or … or En, then

P(Ei|A) = P(Ei) · P(A|Ei) / [ P(E1) · P(A|E1) + P(E2) · P(A|E2) + ⋯ + P(En) · P(A|En) ]
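Illustrative example (numbers assumed for demonstration, not from the slides): a factory
has two machines; E1 produces 60% of the items and E2 produces 40%, while 2% of E1’s items
and 5% of E2’s items are defective. If a randomly chosen item is found to be defective
(event A), then by Bayes’ Theorem

P(E1|A) = P(E1) · P(A|E1) / [ P(E1) · P(A|E1) + P(E2) · P(A|E2) ]
        = (0.6 × 0.02) / (0.6 × 0.02 + 0.4 × 0.05) = 0.012 / 0.032 = 0.375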
Probability Basics
• Prior, conditional and joint probability
– Prior probability: P(X)
– Conditional probability: P(X1|X2), P(X2|X1)
– Joint probability: X = (X1, X2), P(X) = P(X1, X2)
– Relationship: P(X1, X2) = P(X2|X1) · P(X1) = P(X1|X2) · P(X2)
– Independence: P(X2|X1) = P(X2), P(X1|X2) = P(X1), P(X1, X2) = P(X1) · P(X2)
• Bayesian Rule

P(C|X) = P(X|C) · P(C) / P(X),   i.e.,   Posterior = (Likelihood × Prior) / Evidence
Prior and Posterior Probabilities
• P(A) and P(B) are called prior probabilities
• P(A|B), P(B|A) are called posterior probabilities

Example 8.6: Prior versus Posterior Probabilities
• The table below shows that the event Y has two outcomes, namely A and B, which
depend on another event X with outcomes x1, x2 and x3.

     X    Y
     x1   A
     x2   A
     x3   B
     x3   A
     x2   B
     x1   A
     x1   B
     x3   B
     x2   B
     x2   A

• Case 1: Suppose we have no information about the event X. Then, from the given
sample space, we can calculate P(Y = A) = 5/10 = 0.5.

• Case 2: Now, suppose we want to calculate P(X = x2 | Y = A) = 2/5 = 0.4.

The latter is the conditional or posterior probability, whereas the former is the
prior probability.
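A minimal Python sketch (illustrative, not part of the original slides) that recomputes
these two probabilities directly from the ten (X, Y) pairs in the table above:

    # The ten (X, Y) observations from the table on this slide.
    samples = [("x1", "A"), ("x2", "A"), ("x3", "B"), ("x3", "A"), ("x2", "B"),
               ("x1", "A"), ("x1", "B"), ("x3", "B"), ("x2", "B"), ("x2", "A")]

    # Prior probability P(Y = A): fraction of all samples with Y = A.
    prior_A = sum(1 for _, y in samples if y == "A") / len(samples)

    # Posterior probability P(X = x2 | Y = A): among the samples with Y = A,
    # the fraction having X = x2.
    x_given_A = [x for x, y in samples if y == "A"]
    posterior_x2_given_A = x_given_A.count("x2") / len(x_given_A)

    print(prior_A)               # 0.5
    print(posterior_x2_given_A)  # 0.4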
Naïve Bayesian Classifier
• Suppose Y is a class variable and X = (X1, X2, …, Xn) is a set of attributes; each
training record is an instance of X together with its class label, an instance of Y.

     INPUT (X)            CLASS (Y)
     …                    …
     x1, x2, …, xn        yi
     …                    …

• The classification problem can then be expressed as computing the class-conditional
probability

P(Y = yi | X1 = x1 AND X2 = x2 AND … AND Xn = xn)
Naïve Bayesian Classifier
• The Naïve Bayesian classifier calculates this posterior probability using Bayes’
theorem, as follows.

• From Bayes’ theorem on conditional probability, we have

P(Y|X) = P(X|Y) · P(Y) / P(X)

       = P(X|Y) · P(Y) / [ P(X|Y = y1) · P(Y = y1) + ⋯ + P(X|Y = yk) · P(Y = yk) ]

where
P(X) = Σ(i = 1 to k) P(X|Y = yi) · P(Y = yi)

Note:
▪ P(X) is called the evidence (also the total probability) and it is a constant for a
given instance X.

▪ The probability P(Y|X) (also called the class-conditional probability) is therefore
proportional to P(X|Y) · P(Y).

▪ Thus, P(Y|X) can be taken as a measure of the plausibility of Y given X:

P(Y|X) ∝ P(X|Y) · P(Y)
Naïve Bayesian Classifier
• Suppose, for a given instance of X, say x = (X1 = x1 AND … AND Xn = xn),

• consider any two class-conditional probabilities, namely P(Y = yi | X = x) and
P(Y = yj | X = x).

• If P(Y = yi | X = x) > P(Y = yj | X = x), then we say that yi is stronger than yj
for the instance X = x.

• The strongest yi is the classification for the instance X = x.
Example
• Example: Play Tennis

[Play Tennis training data table: 14 records with the categorical attributes Outlook,
Temperature, Humidity and Wind, and the class label Play (Yes/No); the conditional
probabilities derived from it are summarized on the next slide.]
Example

Outlook      Play=Yes   Play=No        Temperature   Play=Yes   Play=No
Sunny        2/9        3/5            Hot           2/9        2/5
Overcast     4/9        0/5            Mild          4/9        2/5
Rain         3/9        2/5            Cool          3/9        1/5

Humidity     Play=Yes   Play=No        Wind          Play=Yes   Play=No
High         3/9        4/5            Strong        3/9        3/5
Normal       6/9        1/5            Weak          6/9        2/5

P(Play=Yes) = 9/14        P(Play=No) = 5/14
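Using these tables, consider classifying the illustrative test instance (not given on the
slide) x = (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong):

P(Yes) · P(Sunny|Yes) · P(Cool|Yes) · P(High|Yes) · P(Strong|Yes)
     = (9/14)(2/9)(3/9)(3/9)(3/9) ≈ 0.0053

P(No) · P(Sunny|No) · P(Cool|No) · P(High|No) · P(Strong|No)
     = (5/14)(3/5)(1/5)(4/5)(3/5) ≈ 0.0206

Since 0.0206 > 0.0053, the Naïve Bayes classifier predicts Play = No for this instance.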
Naïve Bayesian Classifier
Algorithm: Naïve Bayesian Classification

Input: Given a set of k mutually exclusive and exhaustive classes C = {c1, c2, …, ck},
which have prior probabilities P(c1), P(c2), …, P(ck).

There is an n-attribute set A = {A1, A2, …, An}, which for a given instance has the
values A1 = a1, A2 = a2, …, An = an.

Step: For each ci ∈ C, calculate the class-conditional values pi, i = 1, 2, …, k:

pi = P(ci) × Π(j = 1 to n) P(Aj = aj | ci)

px = max(p1, p2, …, pk)

Output: cx is the classification.

Note: Σ pi ≠ 1, because the pi are not probabilities but values proportional to the
posterior probabilities.
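A short Python sketch of this algorithm for categorical attributes (an illustrative
implementation, not code from the lecture; the toy records in the usage example at the
end are assumed purely for demonstration):

    from collections import Counter, defaultdict

    def train_naive_bayes(records, labels):
        """Estimate the priors P(ci) and the conditionals P(Aj = aj | ci)
        from categorical training data by relative frequencies."""
        n = len(records)
        class_counts = Counter(labels)
        prior = {c: class_counts[c] / n for c in class_counts}
        # cond[c][j][a] will hold P(Aj = a | C = c)
        cond = {c: defaultdict(Counter) for c in class_counts}
        for record, c in zip(records, labels):
            for j, a in enumerate(record):
                cond[c][j][a] += 1
        for c in cond:
            for j in cond[c]:
                for a in cond[c][j]:
                    cond[c][j][a] /= class_counts[c]
        return prior, cond

    def classify(record, prior, cond):
        """Return the class cx maximizing pi = P(ci) * prod_j P(Aj = aj | ci)."""
        scores = {}
        for c in prior:
            p = prior[c]
            for j, a in enumerate(record):
                p *= cond[c][j].get(a, 0.0)   # unseen attribute value -> probability 0
            scores[c] = p                     # note: the pi do not sum to 1
        return max(scores, key=scores.get)

    # Tiny usage example on assumed toy data (two attributes, two classes):
    X = [("sunny", "hot"), ("sunny", "cool"), ("rain", "cool"), ("rain", "hot")]
    y = ["no", "yes", "yes", "no"]
    prior, cond = train_naive_bayes(X, y)
    print(classify(("sunny", "cool"), prior, cond))   # -> "yes"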
Naïve Bayesian Classifier
Pros and Cons
• The Naïve Bayes approach is very popular and often works well.

• However, it has a number of potential problems:

• It relies on all attributes being categorical.

• If the training data are scarce, the probability estimates become poor (for
example, a zero count such as P(Outlook = Overcast | Play = No) = 0/5 makes the
whole product for that class zero).