2. Conditional Probability
• P(R|A) = (number of red balls in bag A) / (total number of balls in bag A)
• Example: bag A contains 5 red balls and 2 white balls, i.e. 7 balls in total, so
  P(R|A) = 5/7
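A minimal Python sketch of this calculation, assuming the bag contents (5 red, 2 white) shown above; the function name is illustrative:

from fractions import Fraction

def conditional_probability(favourable, total):
    # P(event | condition): favourable outcomes over all outcomes under the condition
    return Fraction(favourable, total)

balls_in_A = {"red": 5, "white": 2}   # contents of bag A from the slide
p_red_given_A = conditional_probability(balls_in_A["red"], sum(balls_in_A.values()))
print(p_red_given_A)                  # 5/7, roughly 0.714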
Bayes' Theorem
• The probability that the randomly selected item was from machine B, given
that it is defective, is given by P (E2|X).
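For reference, Bayes' theorem in the form used for this example (a standard statement, written in LaTeX; E1, E2, ... denote the machines and X the event that the selected item is defective):

P(E_i \mid X) = \frac{P(X \mid E_i)\, P(E_i)}{\sum_{j} P(X \mid E_j)\, P(E_j)}

So P(E2|X) weighs machine B's prior probability P(E2) by its defect rate P(X|E2) and normalizes over all machines.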
Classifier
• In machine learning, a classifier is an algorithm that automatically assigns
data points to one of a set of categories or classes. Within the classifier category,
there are two main models: supervised and unsupervised.
In the supervised model, classifiers are trained on labeled data; this training
allows them to recognize patterns and ultimately classify new, unlabeled data on
their own.
Unsupervised algorithms instead use pattern recognition to group unlabeled
datasets, progressively becoming more accurate.
History of naive Bayes classifier
• A naive Bayes classifier is a simple probabilistic
classifier based on applying Bayes' theorem with
strong (naive) independence assumptions.
• Bayes' theorem was named after the Reverend
Thomas Bayes (1702–61), who studied how to
compute a distribution for the probability parameter
of a binomial distribution. After Bayes' death, his
friend Richard Price edited and presented this work in
1763, as An Essay towards solving a Problem in the
Doctrine of Chances.
• So it is safe to say that Bayes classifiers have been
around since the 2nd half of the 18th century.
Naïve Bayes Classifier Algorithm
• The Naïve Bayes algorithm is a supervised learning
algorithm based on Bayes' theorem and used for
solving classification problems.
Naïve Bayes
• The probability that the randomly selected item was from machine A,
given that it is defective, is given by P (E1|X).
Classification Is to Derive the Maximum A Posteriori
• Let D be a training set of tuples and their associated class labels,
and each tuple is represented by an n-D attribute vector X = (x1,
x2, …, xn)
• Suppose there are m classes C1, C2, …, Cm.
• Classification is to derive the maximum a posteriori, i.e., the
maximal P(Ci|X)
• This can be derived from Bayes' theorem: P(Ci|X) = P(X|Ci) P(Ci) / P(X)
• Since P(X) is the same for all classes, only P(X|Ci) P(Ci)
needs to be maximized
Naïve Bayes Classifier
• A simplified assumption: attributes are conditionally
independent given the class (i.e., no dependence relation between attributes),
so P(X|Ci) is the product of the individual P(xk|Ci)
• Each P(xk|Ci) is estimated from the training tuples of class Ci, as written below
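Written in LaTeX, the independence assumption and the counting estimate for each factor are:

P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i), \qquad
P(x_k \mid C_i) = \frac{\left|\{\text{tuples of class } C_i \text{ with value } x_k\}\right|}{\left|\{\text{tuples of class } C_i\}\right|}

(This counting estimate is for categorical attributes; continuous attributes are commonly modeled with a Gaussian.)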
Working of Naïve Bayes' Classifier
• The working of the Naïve Bayes classifier can be understood
with the help of the example below:
• Suppose we have a dataset of weather conditions and a
corresponding target variable "Play". Using this dataset,
we need to decide whether we should play or not on a
particular day, according to the weather conditions. To
solve this problem, we follow the steps below (a code
sketch follows the list):
• Convert the given dataset into frequency tables.
• Generate the likelihood table by finding the probabilities of
the given features.
• Use Bayes' theorem to calculate the posterior
probability for each class.
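A minimal Python sketch of these three steps, assuming categorical attributes and no smoothing; the dataset, names, and values below are illustrative only, not taken from the slides:

from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    # Step 1: frequency tables - count class labels and (attribute, value, class) triples
    class_counts = Counter(labels)
    value_counts = defaultdict(int)
    for row, cls in zip(rows, labels):
        for attr, value in enumerate(row):
            value_counts[(attr, value, cls)] += 1
    return class_counts, value_counts

def posterior_scores(x, class_counts, value_counts):
    # Step 2: likelihoods P(xk|Ci) from the frequency tables
    # Step 3: Bayes' theorem up to the constant P(X): score = P(Ci) * prod_k P(xk|Ci)
    n = sum(class_counts.values())
    scores = {}
    for cls, cls_count in class_counts.items():
        score = cls_count / n
        for attr, value in enumerate(x):
            score *= value_counts[(attr, value, cls)] / cls_count
        scores[cls] = score
    return scores

# Illustrative toy data: (outlook, windy) -> play
rows   = [("sunny", "false"), ("rain", "true"), ("overcast", "false"), ("sunny", "true")]
labels = ["no", "yes", "yes", "no"]
model  = fit_naive_bayes(rows, labels)
print(posterior_scores(("sunny", "false"), *model))   # the class with the higher score is predicted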
Workflow
Naïve Bayes Classifier: Training Dataset
Class:
C1:buys_computer = ‘yes’
C2:buys_computer = ‘no’
Data to be classified:
X = (age <= 30,
Income = medium,
Student = yes,
Credit_rating = fair)
Naïve Bayes Classifier: An Example
• P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14= 0.357
• Compute P(X|Ci) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
• X = (age <= 30 , income = medium, student = yes, credit_rating = fair)
P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore, X belongs to class (“buys_computer = yes”)
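The arithmetic above can be checked with a few lines of Python (probability values copied from the slide):

p_yes, p_no = 9/14, 5/14
likelihood_yes = 0.222 * 0.444 * 0.667 * 0.667   # P(X | buys_computer = yes)
likelihood_no  = 0.6 * 0.4 * 0.2 * 0.4           # P(X | buys_computer = no)
print(round(likelihood_yes, 3), round(likelihood_no, 3))                   # 0.044 0.019
print(round(likelihood_yes * p_yes, 3), round(likelihood_no * p_no, 3))    # 0.028 0.007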
S.NO Outlook Temperature Humidity Windy Class
1 Sunny Hot High FALSE N
2 Sunny Hot High TRUE N
3 Overcast Hot High FALSE P
4 Rain Mild High FALSE P
5 Rain Cool Normal FALSE P
6 Rain Cool Normal TRUE N
7 Overcast Cool Normal TRUE P
8 Sunny Mild High FALSE N
9 Sunny Cool Normal FALSE P
10 Rain Mild Normal FALSE P
11 Sunny Mild Normal TRUE P
12 Overcast Mild High TRUE P
13 Overcast Hot Normal FALSE P
14 Rain Mild High TRUE P
• Consider the tuple X to classify.
X= (outlook = sunny, temperature = cool, humidity = high,
windy = false)
Each data tuple is described by the attributes outlook,
temperature, humidity, and windy.
The class label has two distinct values, P and N;
C1 corresponds to class P and C2 corresponds to class N.
We need to calculate the posterior probability for each class, i.e. compare
P(X|Ci) P(Ci) for i = 1, 2
P(class = P) = 10 /14 = 0.714
P(class = N) = 4 /14 = 0.285
• To compute P(X|Ci) for i=1,2 we compute the following conditional
probabilities.
P(outlook = sunny | class = P) = 2/10 = 0.2
P(outlook = sunny | class = N) = 3/4 = 0.75
P(temperature = cool | class = P) = 3/10 = 0.3
P(temperature = cool | class = N) = 1/4 = 0.25
P(humidity = high | class = P) = 4/10 = 0.4
P(humidity = high | class = N) = 3/4 = 0.75
P(windy = false | class = P) = 6/10 = 0.6
P(windy = false | class = N) = 2/4 = 0.5
P(X | class = P)
= P(outlook = sunny | class = P) * P(temperature = cool | class = P) *
  P(humidity = high | class = P) * P(windy = false | class = P)
= 0.2 * 0.3 * 0.4 * 0.6
= 0.0144
P(X | class = N)
= P(outlook = sunny | class = N) * P(temperature = cool | class = N) *
  P(humidity = high | class = N) * P(windy = false | class = N)
= 0.75 * 0.25 * 0.75 * 0.5
= 0.0703
To find the class Ci that maximizes P(X|Ci) P(Ci), we compute
P(X | class = P) P(class = P) = 0.0144 * 0.714 = 0.01028
P(X | class = N) P(class = N) = 0.0703 * 0.285 = 0.0200
Since 0.0200 > 0.01028, the Naive Bayes classifier predicts class N for
tuple X
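The same calculation as a short Python check (counts taken from the table above):

p_P, p_N = 10/14, 4/14
likelihood_P = 0.2 * 0.3 * 0.4 * 0.6      # P(X | class = P)
likelihood_N = 0.75 * 0.25 * 0.75 * 0.5   # P(X | class = N)
print(round(likelihood_P * p_P, 4))       # 0.0103 (the slide's 0.01028 rounds the prior to 0.714 first)
print(round(likelihood_N * p_N, 4))       # 0.0201 -> larger, so class N is predicted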
• A Bayesian network can be used for building
models from data and experts' opinions, and it
consists of two parts:
• a directed acyclic graph (DAG), and
• a table of conditional probabilities for each node.
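A minimal Python sketch of how those two parts can be represented, using a tiny hypothetical two-node network (Rain -> WetGrass); the structure, names, and numbers are illustrative only:

parents = {"Rain": [], "WetGrass": ["Rain"]}   # Part 1: the directed acyclic graph, as parent lists

cpt = {                                        # Part 2: one conditional probability table per node
    "Rain":     {(): 0.2},                     # P(Rain = true)
    "WetGrass": {(True,): 0.9, (False,): 0.1}, # P(WetGrass = true | Rain)
}

def joint(rain, wet_grass):
    # Joint probability = product over nodes of P(node | its parents)
    p_rain = cpt["Rain"][()] if rain else 1 - cpt["Rain"][()]
    p_wet = cpt["WetGrass"][(rain,)] if wet_grass else 1 - cpt["WetGrass"][(rain,)]
    return p_rain * p_wet

print(round(joint(rain=True, wet_grass=True), 3))   # 0.2 * 0.9 = 0.18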
Bayesian Belief Network in artificial intelligence