
Probabilistic reasoning in Artificial Intelligence
Causes of uncertainty:

• The following are some leading causes of uncertainty in the real world:
• Information obtained from unreliable sources
• Experimental errors
• Equipment faults
• Temperature variation
• Climate change
Sources of uncertainty
• Uncertain inputs
– Missing data
– Noisy data
• Uncertain knowledge
– Multiple causes lead to multiple effects
– Incomplete enumeration of conditions or effects
– Incomplete knowledge of causality in the domain
– Probabilistic/stochastic effects
• Uncertain outputs
– Abduction and induction are inherently uncertain
– Default reasoning, even in deductive fashion, is uncertain
– Incomplete deductive inference may be uncertain
Probabilistic reasoning only gives probabilistic results
(summarizes uncertainty from various sources)
Probabilistic reasoning:

• Probabilistic reasoning is a way of knowledge representation where we apply the concept of probability to indicate the uncertainty in knowledge.
• In probabilistic reasoning, we combine probability theory with logic to handle the uncertainty.
Need for probabilistic reasoning in AI:

• When there are unpredictable outcomes.
• When the specifications or possibilities of predicates become too large to handle.
• When an unknown error occurs during an experiment.
• In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
• Bayes' rule
• Bayesian statistics
Probability:
• Probability can be defined as the chance that an uncertain event will occur. It is the numerical measure of the likelihood that an event will occur. The value of a probability always lies between 0 and 1, which represent the ideal limits of uncertainty and certainty.
• 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
• P(A) = 0 indicates total uncertainty in event A (the event will not occur).
• P(A) = 1 indicates total certainty in event A (the event will surely occur).
• The probability of the complementary event is given by the formula below.
• P(¬A) = probability of event A not happening.
• P(¬A) + P(A) = 1.
• Event: Each possible outcome of a variable is called an event.
• Sample space: The collection of all possible events is called the sample space.
• Random variables: Random variables are used to represent events and objects in the real world.
• Prior probability: The prior probability of an event is the probability computed before observing new information.
• Posterior probability: The probability that is calculated after all evidence or information has been taken into account. It is a combination of the prior probability and the new information.
Conditional probability:

• Conditional probability is the probability of an event occurring, given that another event has already happened.
• Suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the condition B". It can be written as:

  P(A|B) = P(A⋀B) / P(B)

• Where P(A⋀B) = joint probability of A and B
• P(B) = marginal probability of B.
• If the probability of A is given and we need to find the probability of B, then it is given as:

  P(B|A) = P(A⋀B) / P(A)
• In a class, 70% of the students like English and 40% of the students like both English and Mathematics. What percentage of the students who like English also like Mathematics?
• Solution:
• Let A be the event that a student likes Mathematics, and B be the event that a student likes English.
• P(A|B) = P(A⋀B) / P(B) = 0.4 / 0.7 = 0.57
• Hence, 57% of the students who like English also like Mathematics.
1. Normal probability: choosing between two boxes A and B, P(A) = 1/2 and P(B) = 1/2.

2. Conditional probability: P(R|A) = number of red balls in box A / total number of balls in box A. If box A contains 5 red and 2 white balls, then P(R|A) = 5/7.
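The two calculations above can be checked with a minimal Python sketch; the variable names below are illustrative, and the numbers are taken directly from the slides.

# Conditional probability P(R|A): box A holds 5 red and 2 white balls
red, white = 5, 2
p_red_given_A = red / (red + white)                      # 5/7 ~ 0.714

# "Likes Mathematics given likes English": P(A|B) = P(A and B) / P(B)
p_english = 0.70                                         # P(B)
p_english_and_math = 0.40                                # P(A and B)
p_math_given_english = p_english_and_math / p_english    # ~ 0.57

print(round(p_red_given_A, 3), round(p_math_given_english, 3))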
Bayes' Theorem

• Bayes' theorem is also known as Bayes' rule or Bayes' law. It is used to determine the probability of a hypothesis with prior knowledge, and it depends on conditional probability.
• The formula for Bayes' theorem is given as:

  P(H|E) = P(E|H) · P(H) / P(E)

– Let X be a data sample ("evidence"): the class label is unknown.
– Let H be a hypothesis that X belongs to class C.
• Where,
• P(H|E) is the posterior probability: the probability of hypothesis H given the observed evidence E.
• P(E|H) is the likelihood: the probability of the evidence given that hypothesis H is true.
• P(H) is the prior probability: the probability of the hypothesis before observing the evidence.
• P(E) is the marginal probability: the probability of the evidence.
Applying Bayes' rule:

• Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A). This is very useful when we have good estimates of these three terms and want to determine the fourth one. Suppose we perceive the effect of some unknown cause and want to compute that cause; then Bayes' rule becomes:

  P(cause|effect) = P(effect|cause) · P(cause) / P(effect)
• Question: From a standard deck of playing cards, a single card is drawn. The probability that the card is a king is 4/52. Calculate the posterior probability P(King|Face), i.e., the probability that the drawn card is a king given that it is a face card.
• Solution:
• P(King): probability that the card is a king = 4/52 = 1/13
• P(Face): probability that the card is a face card = 12/52 = 3/13
• P(Face|King): probability that the card is a face card given that it is a king = 1
• Putting all the values into Bayes' rule:

  P(King|Face) = P(Face|King) · P(King) / P(Face) = (1 × 1/13) / (3/13) = 1/3 ≈ 0.33
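The same substitution can be written as a short Python sketch; the three inputs come straight from the question.

# Bayes' rule: P(King|Face) = P(Face|King) * P(King) / P(Face)
p_king = 4 / 52            # = 1/13
p_face = 12 / 52           # = 3/13 (J, Q, K in each of the four suits)
p_face_given_king = 1.0    # every king is a face card

p_king_given_face = p_face_given_king * p_king / p_face
print(p_king_given_face)   # 0.333... = 1/3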
• A factory has two machines A and B. Past records show that machine A produced 60% of the items and machine B produced 40%. Further, 2% of the items produced by machine A and 1% of the items produced by machine B were defective. All the items are put into one stockpile, one item is chosen at random from it, and it is found to be defective. What is the probability that it was produced by machine B?
• Probability that an item was produced by machine A: P(E1) = 0.6

• Probability that an item was produced by machine B: P(E2) = 0.4

• Probability that machine A produced a defective item: P(X|E1) = 0.02

• Probability that machine B produced a defective item: P(X|E2) = 0.01

• The probability that the randomly selected item came from machine B, given that it is defective, is P(E2|X):

  P(E2|X) = P(X|E2) P(E2) / (P(X|E1) P(E1) + P(X|E2) P(E2)) = 0.004 / 0.016 = 0.25
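A hedged Python sketch of the same calculation; the helper name posterior is illustrative, not part of any library. The same function with machine A's figures answers the second version of this question, which appears again later in the deck.

def posterior(prior_target, like_target, prior_other, like_other):
    # P(target | defective): Bayes' rule, with the law of total probability in the denominator
    evidence = prior_target * like_target + prior_other * like_other   # P(X)
    return prior_target * like_target / evidence

# Machine B: P(E2) = 0.4, P(X|E2) = 0.01; Machine A: P(E1) = 0.6, P(X|E1) = 0.02
p_B_given_defect = posterior(0.4, 0.01, 0.6, 0.02)   # = 0.25
p_A_given_defect = posterior(0.6, 0.02, 0.4, 0.01)   # = 0.75
print(p_B_given_defect, p_A_given_defect)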
Classifier
• In machine learning, a classifier is an algorithm that automatically assigns data points to a set of categories or classes. Within the classifier category, there are two main models: supervised and unsupervised.
In the supervised model, classifiers are trained on labelled data; this training allows them to recognize patterns and ultimately classify new, unlabelled data on their own.
Unsupervised algorithms use pattern recognition to group unlabelled datasets, progressively becoming more accurate.
History of naive Bayes classifier
• A naive Bayes classifier is a simple probabilistic
classifier based on applying Bayes' theorem with
strong (naive) independence assumptions.
• Bayes' theorem was named after the Reverend
Thomas Bayes (1702–61), who studied how to
compute a distribution for the probability parameter
of a binomial distribution. After Bayes' death, his
friend Richard Price edited and presented this work in
1763, as An Essay towards solving a Problem in the
Doctrine of Chances.
• So it is safe to say that Bayes classifiers have been
around since the 2nd half of the 18th century.
Naïve Bayes Classifier Algorithm
• The Naïve Bayes algorithm is a supervised learning algorithm which is based on Bayes' theorem and is used for solving classification problems.
Naïve Bayes

• The name Naïve Bayes is made up of the two words Naïve and Bayes, which can be described as:
• Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of the other features.
• For example, if a fruit is identified on the basis of colour, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Each feature individually contributes to identifying it as an apple, without depending on the others.
• Bayes: It is called Bayes because it relies on the principle of Bayes' theorem.
Bayes classification methods
• Bayes classifiers are statistical classifiers.
• They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class.
• A factory has two machines A and B. Past records show that machine A produced 60% of the items and machine B produced 40%. Further, 2% of the items produced by machine A and 1% of the items produced by machine B were defective. All the items are put into one stockpile, one item is chosen at random from it, and it is found to be defective. What is the probability that it was produced by machine A?
• Probability that an item was produced by machine A: P(E1) = 0.6

• Probability that an item was produced by machine B: P(E2) = 0.4

• Probability that machine A produced a defective item: P(X|E1) = 0.02

• Probability that machine B produced a defective item: P(X|E2) = 0.01

• The probability that the randomly selected item came from machine A, given that it is defective, is P(E1|X):

  P(E1|X) = P(X|E1) P(E1) / (P(X|E1) P(E1) + P(X|E2) P(E2)) = 0.012 / 0.016 = 0.75
Classification Is to Derive the Maximum Posteriori
• Let D be a training set of tuples and their associated class labels, where each tuple is represented by an n-dimensional attribute vector X = (x1, x2, …, xn).
• Suppose there are m classes C1, C2, …, Cm.
• Classification derives the maximum posteriori, i.e., the maximal P(Ci|X).
• This can be derived from Bayes' theorem:

  P(Ci|X) = P(X|Ci) P(Ci) / P(X)

• Since P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be maximized.

Naïve Bayes Classifier
• A simplifying assumption: the attributes are conditionally independent given the class (i.e., there is no dependence relation between attributes):

  P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)

• This greatly reduces the computation cost: only the class distribution needs to be counted.
• If Ak is categorical, P(xk|Ci) is the number of tuples in Ci having value xk for Ak, divided by |Ci,D| (the number of tuples of Ci in D).
• If Ak is continuous-valued, P(xk|Ci) is usually computed from a Gaussian distribution with mean μ and standard deviation σ:

  g(x, μ, σ) = (1 / (√(2π) σ)) · exp(−(x − μ)² / (2σ²)),  and  P(xk|Ci) = g(xk, μCi, σCi)
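A minimal sketch (plain Python, no ML library) of the two estimates described above: a categorical attribute uses relative counts within a class, and a continuous attribute uses the Gaussian density with the class's mean and standard deviation. The function names are illustrative.

import math

def p_categorical(value, column, class_rows):
    # P(xk|Ci) for a categorical attribute: tuples in Ci with value xk / |Ci,D|
    return sum(1 for row in class_rows if row[column] == value) / len(class_rows)

def p_gaussian(x, mu, sigma):
    # P(xk|Ci) for a continuous attribute, modelled as a Gaussian g(x, mu, sigma)
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)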
Working of the Naïve Bayes Classifier
• The working of the Naïve Bayes classifier can be understood with the help of the example below.
• Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether or not we should play on a particular day, according to the weather conditions. To solve this problem, we follow the steps below:
• Convert the given dataset into frequency tables.
• Generate a likelihood table by finding the probabilities of the given features.
• Use Bayes' theorem to calculate the posterior probability for each class.
Workflow
Naïve Bayes Classifier: Training Dataset

Classes:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’

Data to be classified:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
Naïve Bayes Classifier: An Example
• P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14= 0.357
• Compute P(X|Ci) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
• X = (age <= 30 , income = medium, student = yes, credit_rating = fair)
P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore, X belongs to class (“buys_computer = yes”)
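The arithmetic in this example can be reproduced with the conditional probabilities listed above; the dictionaries below simply restate those numbers, and the variable names are illustrative.

priors = {"yes": 9/14, "no": 5/14}
likelihoods = {
    "yes": [2/9, 4/9, 6/9, 6/9],   # age<=30, income=medium, student=yes, credit=fair
    "no":  [3/5, 2/5, 1/5, 2/5],
}

scores = {}
for c in priors:
    p = priors[c]
    for v in likelihoods[c]:
        p *= v                      # naive (conditional independence) product
    scores[c] = p                   # P(X|Ci) * P(Ci)

print(scores)                       # {'yes': ~0.028, 'no': ~0.007}
print(max(scores, key=scores.get))  # 'yes'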
S.NO Outlook Temperature Humidity Windy Class
1 Sunny Hot High FALSE N
2 Sunny Hot High TRUE N
3 Overcast Hot High FALSE P
4 Rain Mild High FALSE P
5 Rain Cool Normal FALSE P
6 Rain Cool Normal TRUE N
7 Overcast Cool Normal TRUE P
8 Sunny Mild High FALSE N
9 Sunny Cool Normal FALSE P
10 Rain Mild Normal FALSE P
11 Sunny Mild Normal TRUE P
12 Overcast Mild High TRUE P
13 Overcast Hot Normal FALSE P
14 Rain Mild High TRUE P
• Consider the tuple X to classify:
X = (outlook = sunny, temperature = cool, humidity = high, windy = false)
The data tuples are described by the attributes outlook, temperature, humidity, and windy.
The class label has 2 distinct values, P and N.
C1 corresponds to class P and C2 corresponds to class N.
We need to calculate the posterior probability P(X|Ci) P(Ci) for i = 1, 2.
P(class = P) = 10/14 = 0.714
P(class = N) = 4/14 = 0.286
• To compute P(X|Ci) for i = 1, 2 we compute the following conditional probabilities:
P(outlook = sunny | class = P) = 2/10 = 0.2
P(outlook = sunny | class = N) = 3/4 = 0.75
P(temperature = cool | class = P) = 3/10 = 0.3
P(temperature = cool | class = N) = 1/4 = 0.25
P(humidity = high | class = P) = 4/10 = 0.4
P(humidity = high | class = N) = 3/4 = 0.75
P(windy = false | class = P) = 6/10 = 0.6
P(windy = false | class = N) = 2/4 = 0.5
P(X | class = P)
= P(outlook = sunny | class = P) × P(temperature = cool | class = P) × P(humidity = high | class = P) × P(windy = false | class = P)
= 0.2 × 0.3 × 0.4 × 0.6
= 0.0144
• P(X | class = N)
= P(outlook = sunny | class = N) × P(temperature = cool | class = N) × P(humidity = high | class = N) × P(windy = false | class = N)
= 0.75 × 0.25 × 0.75 × 0.5
= 0.0703
To find the class Ci that maximizes P(X|Ci) P(Ci), we compute:
P(X | class = P) P(class = P) = 0.0144 × 0.714 = 0.0103
P(X | class = N) P(class = N) = 0.0703 × 0.286 = 0.0201
Since 0.0201 > 0.0103, the Naïve Bayes classifier predicts class N for tuple X.
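The whole worked example can be reproduced from the 14-row table above; the sketch below (plain Python, not a library implementation) builds the class-wise frequency counts and applies the naive Bayes product exactly as in the steps shown.

rows = [  # (outlook, temperature, humidity, windy, class), copied from the 14-row table
    ("Sunny", "Hot", "High", False, "N"), ("Sunny", "Hot", "High", True, "N"),
    ("Overcast", "Hot", "High", False, "P"), ("Rain", "Mild", "High", False, "P"),
    ("Rain", "Cool", "Normal", False, "P"), ("Rain", "Cool", "Normal", True, "N"),
    ("Overcast", "Cool", "Normal", True, "P"), ("Sunny", "Mild", "High", False, "N"),
    ("Sunny", "Cool", "Normal", False, "P"), ("Rain", "Mild", "Normal", False, "P"),
    ("Sunny", "Mild", "Normal", True, "P"), ("Overcast", "Mild", "High", True, "P"),
    ("Overcast", "Hot", "Normal", False, "P"), ("Rain", "Mild", "High", True, "P"),
]
x = ("Sunny", "Cool", "High", False)   # tuple X to classify

def score(label):
    class_rows = [r for r in rows if r[4] == label]
    p = len(class_rows) / len(rows)    # prior P(Ci)
    for i, value in enumerate(x):      # multiply the per-attribute conditional probabilities
        p *= sum(1 for r in class_rows if r[i] == value) / len(class_rows)
    return p

print(score("P"), score("N"))          # ~0.0103 vs ~0.0201 -> predict N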
• A Bayesian network can be used for building models from data and expert opinions, and it consists of two parts:
• A directed acyclic graph
• A table of conditional probabilities.
Bayesian Belief Network in Artificial Intelligence

• A Bayesian belief network is a key computer technology for dealing with probabilistic events and for solving problems that involve uncertainty. We can define a Bayesian network as:
• "A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph."
• It is also called a Bayes network, belief network, decision network, or Bayesian model.
• Bayesian networks are probabilistic, because
these networks are built from a probability
distribution, and also use probability theory
for prediction and anomaly detection
• Real world applications are probabilistic in
nature, and to represent the relationship
between multiple events, we need a Bayesian
network. It can also be used in various tasks
including prediction, anomaly detection,
diagnostics, automated insight, reasoning,
time series prediction, and decision making
under uncertainty.
Bayesian network
• A Bayesian network graph is made up of nodes and arcs (directed links), where:
• Each node corresponds to a random variable, and a variable can be continuous or discrete.
• Arcs or directed arrows represent the causal relationships or conditional dependencies between random variables. These directed links or arrows connect pairs of nodes in the graph.
• These links represent that one node directly influences the other node; if there is no directed link, the nodes are independent of each other.
• In the diagram on this slide, A, B, C, and D are random variables represented by the nodes of the network graph.
• If we consider node B, which is connected to node A by a directed arrow, then node A is called the parent of node B.
• Node C is independent of node A.
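A Bayesian network is therefore just a directed acyclic graph plus one conditional probability table (CPT) per node. A minimal sketch of that data structure in plain Python dictionaries: only the A-to-B edge and the independence of C from A follow the slide; every other edge and every probability below is an illustrative placeholder, not a value from the document.

# DAG: each node lists its parents (A -> B is from the slide; D's parents are placeholders)
parents = {"A": [], "B": ["A"], "C": [], "D": ["B", "C"]}

# One CPT per node: probability that the node is true, for each combination of parent values
cpt = {
    "A": {(): 0.3},                                 # P(A = true)  (placeholder value)
    "B": {(True,): 0.8, (False,): 0.1},             # P(B = true | A)
    "C": {(): 0.5},                                 # C has no parents (independent of A)
    "D": {(True, True): 0.9, (True, False): 0.6,    # P(D = true | B, C)
          (False, True): 0.4, (False, False): 0.05},
}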
