
Uncertainty:

So far, we have represented knowledge using first-order logic and propositional logic with certainty, which means we were sure about the predicates. With this kind of representation we might write A→B, which means that if A is true then B is true. But consider a situation where we are not sure whether A is true or not; then we cannot express this statement. This situation is called uncertainty.

So to represent uncertain knowledge, where we are not sure about the predicates, we need uncertain reasoning or probabilistic reasoning.

Causes of uncertainty:
Following are some leading causes of uncertainty in the real world:

1. Information obtained from unreliable sources.
2. Experimental errors.
3. Equipment faults.
4. Temperature variation.
5. Climate change.

Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation where
we apply the concept of probability to indicate the uncertainty in
knowledge. In probabilistic reasoning, we combine probability theory
with logic to handle the uncertainty.

We use probability in probabilistic reasoning because it provides a way to handle the uncertainty that results from laziness and ignorance.

In the real world, there are many scenarios where the certainty of something is not confirmed, such as "It will rain today," "the behavior of someone in a given situation," or "the outcome of a match between two teams or two players." These are probable statements for which we can assume that they may happen, but we are not sure about them, so here we use probabilistic reasoning.

Need of probabilistic reasoning in AI:

o When there are unpredictable outcomes.
o When the specifications or possibilities of predicates become too large to handle.
o When an unknown error occurs during an experiment.

In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:

o Bayes' rule
o Bayesian statistics

Note: We will learn the above two rules in later chapters.

Since probabilistic reasoning uses probability and related terms, let's first understand some common terms before studying probabilistic reasoning:

Probability: Probability can be defined as the chance that an uncertain event will occur. It is a numerical measure of the likelihood that an event will occur. The value of a probability always lies between 0 and 1:

1. 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
2. P(A) = 0 indicates total uncertainty in an event A.
3. P(A) = 1 indicates total certainty in an event A.

The probability of an event and the probability of its complement are related by the following formula:

o P(¬A) = probability of event A not happening.
o P(¬A) + P(A) = 1.

Event: Each possible outcome of a variable is called an event.

Sample space: The collection of all possible events is called the sample space.

Random variables: Random variables are used to represent the events and objects in the real world.

Prior probability: The prior probability of an event is the probability computed before observing new information.

Posterior probability: The probability that is calculated after all evidence or information has been taken into account. It is a combination of the prior probability and the new information.

Conditional probability:
Conditional probability is the probability of an event occurring given that another event has already happened.

Suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the condition B". It can be written as:

P(A|B) = P(A⋀B) / P(B)

Where P(A⋀B) = joint probability of A and B
P(B) = marginal probability of B.

If the probability of A is given and we need to find the probability of B, then it will be given as:

P(B|A) = P(A⋀B) / P(A)

This can be explained using a Venn diagram: once event B has occurred, the sample space is reduced to the set B, and we can calculate the probability of event A given B by dividing P(A⋀B) by P(B).

Example:

In a class, 70% of the students like English and 40% of the students like both English and Mathematics. What percent of the students who like English also like Mathematics?

Solution:

Let A be the event that a student likes Mathematics and B be the event that a student likes English.

P(A|B) = P(A⋀B) / P(B) = 0.40 / 0.70 = 0.57 (approximately)

Hence, about 57% of the students who like English also like Mathematics.
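The following is a minimal Python sketch (not part of the original text) that checks this calculation numerically; the helper name conditional_probability is only illustrative.

# P(A | B) = P(A and B) / P(B)
def conditional_probability(p_joint: float, p_condition: float) -> float:
    """Return P(A | B) given the joint P(A and B) and the marginal P(B)."""
    if p_condition == 0:
        raise ValueError("P(B) must be non-zero to condition on B")
    return p_joint / p_condition

p_english = 0.70            # P(B): student likes English
p_english_and_math = 0.40   # P(A and B): student likes both subjects

p_math_given_english = conditional_probability(p_english_and_math, p_english)
print(f"P(Math | English) = {p_math_given_english:.2f}")  # ~0.57, i.e. about 57%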

Bayes' theorem in Artificial intelligence

Bayes' theorem:
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, and it determines the probability of an event with uncertain knowledge.

In probability theory, it relates the conditional probabilities and marginal probabilities of two random events.

Bayes' theorem was named after the British mathematician Thomas Bayes. Bayesian inference is an application of Bayes' theorem, which is fundamental to Bayesian statistics.

It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).


Bayes' theorem allows us to update the probability prediction of an event by observing new information about the real world.

Example: If the probability of cancer is related to one's age, then by using Bayes' theorem we can determine the probability of cancer more accurately with the help of age.

Bayes' theorem can be derived using the product rule and the conditional probability of event A with known event B:

From the product rule we can write:

1. P(A ⋀ B) = P(A|B) P(B)

Similarly, for the probability of event B with known event A:

2. P(A ⋀ B) = P(B|A) P(A)

Equating the right-hand sides of both equations, we get:

P(A|B) = P(B|A) P(A) / P(B)    ........ (a)

The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems for probabilistic inference.

It shows the simple relationship between joint and conditional probabilities. Here,

P(A|B) is known as the posterior, which we need to calculate. It is read as the probability of hypothesis A given that evidence B has occurred.

P(B|A) is called the likelihood: assuming the hypothesis is true, it is the probability of the evidence.

P(A) is called the prior probability: the probability of the hypothesis before considering the evidence.

P(B) is called the marginal probability: the probability of the evidence on its own.

In equation (a), in general, we can write P(B) = Σi P(Ai) P(B|Ai); hence Bayes' rule can be written as:

P(Ai|B) = P(Ai) P(B|Ai) / [ P(A1) P(B|A1) + P(A2) P(B|A2) + ... + P(An) P(B|An) ]

Where A1, A2, A3, ......, An is a set of mutually exclusive and exhaustive events.

Applying Bayes' rule:

Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A). This is very useful in cases where we have good estimates of these three terms and want to determine the fourth one. Suppose we want to perceive the effect of some unknown cause and want to compute that cause; then Bayes' rule becomes:

P(cause|effect) = P(effect|cause) P(cause) / P(effect)
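As a minimal illustration (not from the original text), Bayes' rule can be expressed as a small Python helper; the function and parameter names below are only assumptions made for this sketch.

def bayes_rule(likelihood: float, prior: float, evidence: float) -> float:
    """Posterior P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence == 0:
        raise ValueError("P(B) must be non-zero")
    return likelihood * prior / evidence

def bayes_rule_partition(likelihoods, priors, i):
    """Posterior P(Ai|B) when A1..An are mutually exclusive and exhaustive:
    P(Ai|B) = P(B|Ai) P(Ai) / sum over k of P(B|Ak) P(Ak)."""
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return likelihoods[i] * priors[i] / evidence

The examples that follow plug concrete numbers into this same formula.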

Example-1:

Question: What is the probability that a patient has the disease meningitis, given a stiff neck?

Given Data:

A doctor is aware that the disease meningitis causes a patient to have a stiff neck 80% of the time. He is also aware of some more facts, which are given as follows:

o The known probability that a patient has meningitis is 1/30,000.
o The known probability that a patient has a stiff neck is 2%.

Let a be the proposition that the patient has a stiff neck and b be the proposition that the patient has meningitis, so we can write:

P(a|b) = 0.8

P(b) = 1/30000

P(a) = 0.02

Applying Bayes' rule:

P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.00133

Hence, we can assume that about 1 patient out of 750 patients with a stiff neck has meningitis.
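A quick numerical check of this result (an illustrative Python sketch; the values are taken from the example above):

p_stiff_given_men = 0.8        # P(a|b): stiff neck given meningitis
p_meningitis = 1 / 30000       # P(b): prior probability of meningitis
p_stiff_neck = 0.02            # P(a): probability of a stiff neck

p_men_given_stiff = p_stiff_given_men * p_meningitis / p_stiff_neck
print(p_men_given_stiff)       # ~0.001333, i.e. roughly 1 in 750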

Example-2:

Question: From a standard deck of playing cards, a single card is drawn. The probability that the card is a king is 4/52. Calculate the posterior probability P(King|Face), i.e. the probability that the drawn card is a king given that it is a face card.

Solution:

P(King): probability that the card is a king = 4/52 = 1/13

P(Face): probability that the card is a face card = 12/52 = 3/13

P(Face|King): probability that the card is a face card given that it is a king = 1

Putting all values into equation (a), we get:

P(King|Face) = P(Face|King) P(King) / P(Face) = (1 × 1/13) / (3/13) = 1/3
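The same calculation, done with exact fractions as a small illustrative sketch:

from fractions import Fraction

p_king = Fraction(4, 52)            # P(King) = 1/13
p_face = Fraction(12, 52)           # P(Face) = 3/13 (jack, queen, king of each suit)
p_face_given_king = Fraction(1)     # every king is a face card

p_king_given_face = p_face_given_king * p_king / p_face
print(p_king_given_face)            # 1/3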


Application of Bayes' theorem in Artificial intelligence:
Following are some applications of Bayes' theorem:

o It is used to calculate the next step of a robot when the already executed step is given.
o Bayes' theorem is helpful in weather forecasting.
o It can solve the Monty Hall problem.

Bayesian Belief Network in artificial intelligence

A Bayesian belief network is a key computer technology for dealing with probabilistic events and for solving problems that involve uncertainty.

We can define a Bayesian network as:

"A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph."

It is also called a Bayes network, belief network, decision network, or Bayesian model.

Bayesian networks are probabilistic because these networks are built from a probability distribution, and they also use probability theory for prediction and anomaly detection.


Real-world applications are probabilistic in nature, and to represent the relationships among multiple events we need a Bayesian network. It can be used in various tasks including prediction, anomaly detection, diagnostics, automated insight, reasoning, time series prediction, and decision making under uncertainty.

A Bayesian network can be used for building models from data and experts' opinions, and it consists of two parts:

o Directed Acyclic Graph
o Table of conditional probabilities.

The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an Influence diagram.

A Bayesian network graph is made up of nodes and arcs (directed links), where:

o Each node corresponds to a random variable, and a variable can be continuous or discrete.
o Arcs or directed arrows represent the causal relationships or conditional probabilities between random variables. These directed links or arrows connect pairs of nodes in the graph. A link represents that one node directly influences the other node; if there is no directed link, the nodes are independent of each other.
o For example, consider a graph in which A, B, C, and D are random variables represented by the nodes of the network.
o If node B is connected with node A by a directed arrow from A to B, then node A is called the parent of node B.
o Node C is independent of node A.

Note: The Bayesian network graph does not contain any cycles. Hence, it is known as a directed acyclic graph, or DAG.

The Bayesian network has mainly two components:

o Causal component
o Actual numbers

Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which quantifies the effect of the parents on that node.

A Bayesian network is based on the joint probability distribution and conditional probability. So let's first understand the joint probability distribution:

Joint probability distribution:

If we have variables x1, x2, x3, ....., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the joint probability distribution.

By the chain rule, P[x1, x2, x3, ....., xn] can be written as follows in terms of conditional probabilities:

= P[x1 | x2, x3, ....., xn] P[x2, x3, ....., xn]

= P[x1 | x2, x3, ....., xn] P[x2 | x3, ....., xn] .... P[xn-1 | xn] P[xn].

In general, for each variable Xi in a Bayesian network we can write:

P(Xi | Xi-1, ........., X1) = P(Xi | Parents(Xi))
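As an illustrative sketch (the numbers below are made up for this example, not taken from the text), consider a tiny chain network x1 -> x2 -> x3, where Parents(x2) = {x1} and Parents(x3) = {x2}. The joint probability then factors as P(x1) P(x2|x1) P(x3|x2), and the factored distribution still sums to 1 over all value combinations:

from itertools import product

p_x1 = {True: 0.3, False: 0.7}            # P(x1)
p_x2_given_x1 = {True: 0.9, False: 0.2}   # P(x2 = True | x1)
p_x3_given_x2 = {True: 0.6, False: 0.1}   # P(x3 = True | x2)

def joint(x1: bool, x2: bool, x3: bool) -> float:
    """P(x1, x2, x3) using the parent factorization P(x1) P(x2|x1) P(x3|x2)."""
    p2 = p_x2_given_x1[x1] if x2 else 1 - p_x2_given_x1[x1]
    p3 = p_x3_given_x2[x2] if x3 else 1 - p_x3_given_x2[x2]
    return p_x1[x1] * p2 * p3

# Summing over all 2**3 assignments gives 1.0, as a valid joint distribution must.
print(sum(joint(a, b, c) for a, b, c in product([True, False], repeat=3)))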

Explanation of Bayesian network:

Let's understand the Bayesian network through an example by creating a directed acyclic graph.

Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm responds reliably to a burglary, but it also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he confuses the telephone ringing with the alarm and calls then too. On the other hand, Sophia likes to listen to loud music, so sometimes she misses the alarm. Here we would like to compute the probability of the burglary alarm event.

Problem:

Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both David and Sophia called Harry.

Solution:

o The Bayesian network for the above problem is given below. The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off, while David's and Sophia's calls depend only on the alarm.
o The network represents our assumptions that David and Sophia do not directly perceive the burglary, do not notice minor earthquakes, and do not confer before calling.
o The conditional distributions for each node are given as a conditional probability table, or CPT.
o Each row in a CPT must sum to 1 because the entries in the row represent an exhaustive set of cases for the variable.
o In a CPT, a boolean variable with k boolean parents contains 2^k probabilities. Hence, if there are two parents, the CPT will contain 4 probability values.

List of all events occurring in this network:

o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)

We can write the events of the problem statement in the form of the probability P[D, S, A, B, E], and we can rewrite this probability using the joint probability distribution:

P[D, S, A, B, E]= P[D | S, A, B, E]. P[S, A, B, E]

=P[D | S, A, B, E]. P[S | A, B, E]. P[A, B, E]

= P [D| A]. P [ S| A, B, E]. P[ A, B, E]

= P[D | A]. P[ S | A]. P[A| B, E]. P[B, E]

= P[D | A ]. P[S | A]. P[A| B, E]. P[B |E]. P[E]


Let's take the observed probabilities for the Burglary and Earthquake components:

P(B = True) = 0.002, which is the probability of a burglary.

P(B = False) = 0.998, which is the probability of no burglary.

P(E = True) = 0.001, which is the probability of a minor earthquake.

P(E = False) = 0.999, which is the probability that an earthquake has not occurred.

We can provide the conditional probabilities as per the below tables:

Conditional probability table for Alarm A:

The conditional probability of Alarm A depends on Burglary and Earthquake:

B        E        P(A = True)    P(A = False)
True     True     0.94           0.06
True     False    0.95           0.05
False    True     0.31           0.69
False    False    0.001          0.999

Conditional probability table for David Calls:

The conditional probability that David will call depends on the probability of Alarm:

A        P(D = True)    P(D = False)
True     0.91           0.09
False    0.05           0.95


Conditional probability table for Sophia Calls:

The conditional probability that Sophia calls depends on its parent node "Alarm":

A        P(S = True)    P(S = False)
True     0.75           0.25
False    0.02           0.98

From the formula of the joint distribution, we can write the problem statement in the form of a probability distribution:

P(S, D, A, ¬B, ¬E) = P(S|A) * P(D|A) * P(A|¬B ⋀ ¬E) * P(¬B) * P(¬E)

= 0.75 * 0.91 * 0.001 * 0.998 * 0.999

= 0.00068045

Hence, a Bayesian network can answer any query about the domain by using the joint distribution.
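The following is a minimal Python sketch of this network (the variable names are illustrative; the CPT values are taken from the tables above). It evaluates the same query P(S, D, A, ¬B, ¬E) by multiplying the factors of the joint distribution:

p_burglary = 0.002
p_earthquake = 0.001

# P(Alarm = True | Burglary, Earthquake)
p_alarm = {
    (True, True): 0.94,
    (True, False): 0.95,
    (False, True): 0.31,
    (False, False): 0.001,
}

p_david_given_alarm = {True: 0.91, False: 0.05}    # P(D = True | A)
p_sophia_given_alarm = {True: 0.75, False: 0.02}   # P(S = True | A)

def joint(d: bool, s: bool, a: bool, b: bool, e: bool) -> float:
    """Joint probability P(D, S, A, B, E) from the chain-rule factorization."""
    pb = p_burglary if b else 1 - p_burglary
    pe = p_earthquake if e else 1 - p_earthquake
    pa = p_alarm[(b, e)] if a else 1 - p_alarm[(b, e)]
    pd = p_david_given_alarm[a] if d else 1 - p_david_given_alarm[a]
    ps = p_sophia_given_alarm[a] if s else 1 - p_sophia_given_alarm[a]
    return pd * ps * pa * pb * pe

# Alarm sounded, no burglary, no earthquake, and both David and Sophia called:
print(joint(d=True, s=True, a=True, b=False, e=False))   # ~0.00068045

Summing this joint function over the unobserved variables would answer other queries about the domain in the same way.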

The semantics of Bayesian Network:

There are two ways to understand the semantics of a Bayesian network, which are given below:

1. To understand the network as a representation of the joint probability distribution.

This view is helpful for understanding how to construct the network.

2. To understand the network as an encoding of a collection of conditional independence statements.

This view is helpful for designing inference procedures.


Representing Knowledge in an Uncertain Domain

The full joint probability distribution can answer any question about the domain, but it can become intractably large as the number of variables grows. Furthermore, specifying probabilities for possible worlds one by one is unnatural and tedious. We also saw that independence and conditional independence relationships among variables can greatly reduce the number of probabilities that need to be specified in order to define the full joint distribution. This section introduces a data structure called a Bayesian network to represent the dependencies among variables. Bayesian networks can represent essentially any full joint probability distribution and in many cases can do so very concisely.

A Bayesian network is a directed graph in which each node is annotated with quantitative probability information. The full specification is as follows:

1. Each node corresponds to a random variable, which may be discrete or continuous.
2. A set of directed links or arrows connects pairs of nodes. If there is an arrow from node X to node Y, X is said to be a parent of Y. The graph has no directed cycles (and hence is a directed acyclic graph, or DAG).
3. Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node.

The topology of the network—the set of nodes and links—specifies the conditional independence relationships that hold in the domain, in a way that will be made precise shortly. The intuitive meaning of an arrow is typically that X has a direct influence on Y, which suggests that causes should be parents of effects. It is usually easy for a domain expert to decide what direct influences exist in the domain—much easier, in fact, than actually specifying the probabilities themselves. Once the topology of the Bayesian network is laid out, we need only specify a conditional probability distribution for each variable, given its parents. We will see that the combination of the topology and the conditional distributions suffices to specify (implicitly) the full joint distribution for all the variables.

Recall the simple world described in Chapter 13, consisting of the variables Toothache, Cavity, Catch, and Weather. We argued that Weather is independent of the other variables; furthermore, we argued that Toothache and Catch are conditionally independent, given Cavity. These relationships are represented by the Bayesian network structure shown in Figure 14.1. Formally, the conditional independence of Toothache and Catch, given Cavity, is indicated by the absence of a link between Toothache and Catch. Intuitively, the network represents the fact that Cavity is a direct cause of Toothache and Catch, whereas no direct causal relationship exists between Toothache and Catch.

Now consider the following example, which is just a little more complex. You have a new burglar alarm installed at home. It is fairly reliable at detecting a burglary, but also responds on occasion to minor earthquakes. (This example is due to Judea Pearl, a resident of Los Angeles—hence the acute interest in earthquakes.) You also have two neighbors, John and Mary, who have promised to call you at work when they hear the alarm. John nearly always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then, too. Mary, on the other hand, likes rather loud music and often misses the alarm altogether. Given the evidence of who has or has not called, we would like to estimate the probability of a burglary. A Bayesian network for this domain appears in Figure 14.2. The network structure shows that burglary and earthquakes directly affect the probability of the alarm's going off, but whether John and Mary call depends only on the alarm. The network thus represents our assumptions that they do not perceive burglaries directly, they do not notice minor earthquakes, and they do not confer before calling.

The conditional distributions in Figure 14.2 are shown as a conditional probability table, or CPT. (This form of table can be used for discrete variables; other representations, including those suitable for continuous variables, are described in Section 14.2.) Each row in a CPT contains the conditional probability of each node value for a conditioning case. A conditioning case is just a possible combination of values for the parent nodes—a miniature possible world, if you like. Each row must sum to 1, because the entries represent an exhaustive set of cases for the variable. For Boolean variables, once you know that the probability of a true value is p, the probability of false must be 1 − p, so we often omit the second number, as in Figure 14.2. In general, a table for a Boolean variable with k Boolean parents contains 2^k independently specifiable probabilities. A node with no parents has only one row, representing the prior probabilities of each possible value of the variable.

Notice that the network does not have nodes corresponding to Mary's currently listening to loud music or to the telephone ringing and confusing John. These factors are summarized in the uncertainty associated with the links from Alarm to JohnCalls and MaryCalls. This shows both laziness and ignorance in operation: it would be a lot of work to find out why those factors would be more or less likely in any particular case, and we have no reasonable way to obtain the relevant information anyway. The probabilities actually summarize a potentially infinite set of circumstances in which the alarm might fail to go off (high humidity, power failure, dead battery, cut wires, a dead mouse stuck inside the bell, etc.) or John or Mary might fail to call and report it (out to lunch, on vacation, temporarily deaf, passing helicopter, etc.). In this way, a small agent can cope with a very large world, at least approximately. The degree of approximation can be improved if we introduce additional relevant information.
