cs188 Su24 Lec07
Bayesian Networks
o Temperature:
  T     P
  hot   0.5
  cold  0.5

o Weather:
  W       P
  sun     0.6
  rain    0.1
  fog     0.3
  meteor  0.0
Joint Distributions
o A joint distribution over a set of random variables X1, …, Xn specifies a real number for each assignment (or outcome): P(x1, …, xn)
o Must obey:
  o Non-negativity: P(x1, …, xn) ≥ 0
  o Normalization: the entries sum to 1

  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

o Size of distribution if n variables with domain sizes d? d^n entries
o For all but the smallest distributions, impractical to write out!
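The two constraints above are easy to check in code. A minimal sketch (Python; the dictionary layout is my own choice, not course code):

```python
from itertools import product

# Joint distribution P(T, W) from the table above,
# stored as a dict from assignments to probabilities.
P = {
    ("hot",  "sun"):  0.4,
    ("hot",  "rain"): 0.1,
    ("cold", "sun"):  0.2,
    ("cold", "rain"): 0.3,
}

# Non-negativity: every entry is >= 0.
assert all(p >= 0 for p in P.values())

# Normalization: the entries sum to 1.
assert abs(sum(P.values()) - 1.0) < 1e-9

# Size: n variables with domain size d give d**n entries (here 2**2 = 4).
domains = {"T": ["hot", "cold"], "W": ["sun", "rain"]}
assert len(list(product(*domains.values()))) == len(P)
```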
Probability
[Venn diagram: events h and s inside sample space U, with complements ~h and ~s]

  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3
Marginal Distributions
o Marginal distributions are sub-tables which eliminate variables
o Marginalization (summing out): Combine collapsed rows by
adding
T P
hot 0.5
T W P
cold 0.5
hot sun 0.4
hot rain 0.1
cold sun 0.2 W P
cold rain 0.3 sun 0.6
rain 0.4
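Marginalization as described (combine collapsed rows by adding) can be sketched as follows (Python; `marginalize` is a hypothetical helper, not course code):

```python
from collections import defaultdict

# Joint P(T, W) from the table above.
P_TW = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}

def marginalize(joint, keep):
    """Sum out every variable except the one at position `keep`."""
    marginal = defaultdict(float)
    for assignment, p in joint.items():
        marginal[assignment[keep]] += p   # collapsed rows add up
    return dict(marginal)

P_T = marginalize(P_TW, 0)  # hot: 0.5, cold: 0.5
P_W = marginalize(P_TW, 1)  # sun: 0.6, rain: 0.4
```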
Probability
[Venn diagram: events h and s inside sample space U]

  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

P(h) = P(h, s) + P(h, ~s)
P(s | h) = P(s, h) / P(h)
Conditional Distributions
o Conditional distributions are probability distributions over some variables given fixed values of others

Joint Distribution P(T, W):
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

Conditional Distributions:
P(W | T = hot):
  W     P
  sun   0.8
  rain  0.2

P(W | T = cold):
  W     P
  sun   0.4
  rain  0.6
Normalization Trick

Joint P(T, W):
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

Select the T = cold rows and normalize → P(W | T = cold):
  W     P
  sun   0.4
  rain  0.6
Normalization Trick
o Example

Selected rows:
  W     P
  sun   0.2
  rain  0.3

Normalize with Z = 0.2 + 0.3 = 0.5:
  W     P
  sun   0.4
  rain  0.6
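The trick above, sketched in code (Python; the variable names are mine, not from the slides):

```python
# Joint P(T, W) from earlier slides.
P_TW = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}

# Step 1: select the entries consistent with T = cold.
selected = {w: p for (t, w), p in P_TW.items() if t == "cold"}

# Step 2: normalize by their sum Z.
Z = sum(selected.values())                       # Z = 0.2 + 0.3 = 0.5
P_W_given_cold = {w: p / Z for w, p in selected.items()}
# sun: 0.2 / 0.5 = 0.4,  rain: 0.3 / 0.5 = 0.6
```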
Probabilistic Inference
o Probabilistic inference: compute a desired probability from other known probabilities (e.g. conditional from joint)
o P(W)?

  S       T     W     P
  summer  hot   sun   0.30
  summer  hot   rain  0.05
  summer  cold  sun   0.10
  summer  cold  rain  0.05
  winter  hot   sun   0.10
  winter  hot   rain  0.05
  winter  cold  sun   0.15
  winter  cold  rain  0.20
Inference by Enumeration
o P(W)?

  S       T     W     P
  summer  hot   sun   0.30
  summer  hot   rain  0.05
  summer  cold  sun   0.10
  summer  cold  rain  0.05
  winter  hot   sun   0.10
  winter  hot   rain  0.05
  winter  cold  sun   0.15
  winter  cold  rain  0.20

P(sun) = .30 + .10 + .10 + .15 = .65
P(rain) = 1 - .65 = .35
Inference by Enumeration
o P(W | winter, hot)?

  S       T     W     P
  summer  hot   sun   0.30
  summer  hot   rain  0.05
  summer  cold  sun   0.10
  summer  cold  rain  0.05
  winter  hot   sun   0.10
  winter  hot   rain  0.05
  winter  cold  sun   0.15
  winter  cold  rain  0.20

P(sun | winter, hot) ∝ .10
P(rain | winter, hot) ∝ .05
Normalize: P(sun | winter, hot) = 2/3, P(rain | winter, hot) = 1/3
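The whole select / sum-out / normalize recipe can be sketched as one function (Python; `enumerate_query` is a hypothetical name, not course code):

```python
from collections import defaultdict

# Joint P(S, T, W) from the table above.
P = {
    ("summer", "hot", "sun"): 0.30, ("summer", "hot", "rain"): 0.05,
    ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
    ("winter", "hot", "sun"): 0.10, ("winter", "hot", "rain"): 0.05,
    ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20,
}

def enumerate_query(joint, query, evidence):
    """query: index of the query variable; evidence: {index: value}."""
    unnormalized = defaultdict(float)
    for assignment, p in joint.items():
        # Step 1: keep only entries consistent with the evidence.
        if all(assignment[i] == v for i, v in evidence.items()):
            # Step 2: sum out the hidden variables.
            unnormalized[assignment[query]] += p
    # Step 3: normalize.
    Z = sum(unnormalized.values())
    return {x: p / Z for x, p in unnormalized.items()}

P_W = enumerate_query(P, 2, {})                          # sun 0.65, rain 0.35
P_W_wh = enumerate_query(P, 2, {0: "winter", 1: "hot"})  # sun 2/3, rain 1/3
```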
Inference by Enumeration
o General case:
  o Evidence variables: E1 … Ek = e1 … ek
  o Query* variable: Q
  o Hidden variables: H1 … Hr
  (All variables)
▪ We want: P(Q | e1 … ek)
▪ Step 1: Select the entries consistent with the evidence
▪ Step 2: Sum out H to get joint of Query and evidence
▪ Step 3: Normalize
Independence
o Two variables X and Y are independent if: for all x, y: P(x, y) = P(x) P(y)
o This says that their joint distribution factors into a product of two simpler distributions
o Another form: for all x, y: P(x | y) = P(x)
o We write: X ⫫ Y

[Venn diagrams: events s and ~s under the two distributions]
Recall:
P(s | h) = P(s, h) / P(h)
P(s, h) = P(s | h) P(h)
Example: Independence?

Marginals:
  T     P
  hot   0.5
  cold  0.5

  W     P
  sun   0.6
  rain  0.4

P1(T, W):
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

P2(T, W):
  T     W     P
  hot   sun   0.3
  hot   rain  0.2
  cold  sun   0.3
  cold  rain  0.2
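A quick way to answer the question in the title is to compare each joint entry against the product of its marginals (Python sketch; `is_independent` is a hypothetical helper):

```python
def is_independent(joint, tol=1e-9):
    """Check P(t, w) == P(t) * P(w) for every entry of a 2-variable joint."""
    P_T, P_W = {}, {}
    for (t, w), p in joint.items():
        P_T[t] = P_T.get(t, 0.0) + p     # marginal over T
        P_W[w] = P_W.get(w, 0.0) + p     # marginal over W
    return all(abs(p - P_T[t] * P_W[w]) < tol
               for (t, w), p in joint.items())

# First table: NOT independent (e.g. 0.4 != 0.5 * 0.6).
P1 = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
      ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}
# Second table: independent (every entry equals P(T) * P(W)).
P2 = {("hot", "sun"): 0.3, ("hot", "rain"): 0.2,
      ("cold", "sun"): 0.3, ("cold", "rain"): 0.2}

assert not is_independent(P1)
assert is_independent(P2)
```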
Example: Independence
o N fair, independent coin flips: P(X1), P(X2), …, P(Xn)
o Trivial decomposition: P(X1, X2, …, Xn) = P(X1) P(X2) … P(Xn)

[Diagram: nodes X1, X2, …, Xn with no arcs]

Graphical Model Notation
o Arcs: interactions
o MAY indicate influence between variables
o Formally: encode conditional independence relationships (more later)

[Diagram: model 1 has R and T as unconnected nodes; model 2 has an arc R → T]
o Why is an agent using model 2 better?
Bayes Net: DAG + CPTs
Example: Alarm Network
o Variables
  o B: Burglary
  o A: Alarm goes off
  o M: Mary calls
  o J: John calls
  o E: Earthquake!

[Diagram: Burglary → Alarm ← Earthquake; Alarm → John calls; Alarm → Mary calls]
Example: Humans
o G: human’s goal / human’s reward parameters
o S: state of the physical world
o A: human’s action
Example: Traffic II
o Variables
o T: Traffic
o R: It rains
o L: Low pressure
o D: Roof drips
o B: Ballgame
o C: Cavity
Bayesian Network Semantics
Probabilities in BNs
o Bayes nets implicitly encode joint distributions
o As a product of local conditional distributions
o To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:
  P(x1, x2, …, xn) = ∏i P(xi | parents(Xi))
o Example:
Probabilities in BNs
o Why are we guaranteed that setting P(x1, …, xn) = ∏i P(xi | parents(Xi)) results in a proper joint distribution?
→ Consequence:
  o Not every BN can represent every joint distribution
  o The topology enforces certain conditional independencies
Example: Coin Flips
[Diagram: independent nodes X1, X2, …, Xn]
P(h, h, t, h) = P(h) P(h) P(t) P(h)

Example: Traffic
P(R):
  +r  1/4
  -r  3/4

P(T | R):
  +r  +t  3/4
  +r  -t  1/4
  -r  +t  1/2
  -r  -t  1/2

P(+r, -t) = P(+r) P(-t | +r) = 1/4 · 1/4
Example: Alarm Network

[Diagram: Burglary → Alarm ← Earthqk; Alarm → John calls; Alarm → Mary calls]

  B    P(B)
  +b   0.001
  -b   0.999

  E    P(E)
  +e   0.002
  -e   0.998

  B    E    A    P(A|B,E)
  +b   +e   +a   0.95
  +b   +e   -a   0.05
  +b   -e   +a   0.94
  +b   -e   -a   0.06
  -b   +e   +a   0.29
  -b   +e   -a   0.71
  -b   -e   +a   0.001
  -b   -e   -a   0.999

  A    J    P(J|A)
  +a   +j   0.9
  +a   -j   0.1
  -a   +j   0.05
  -a   -j   0.95

  A    M    P(M|A)
  +a   +m   0.7
  +a   -m   0.3
  -a   +m   0.01
  -a   -m   0.99

Joint: P(B, E, A, J, M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)
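Multiplying the relevant conditionals for a full assignment, as the semantics slide prescribes, can be sketched like this (Python; the boolean CPT encoding is my own choice, not course code):

```python
# CPTs from the alarm network above; True stands for +, False for -.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A_given_BE = {  # P(+a | B, E); P(-a | B, E) is the complement
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J_given_A = {True: 0.9, False: 0.05}   # P(+j | A)
P_M_given_A = {True: 0.7, False: 0.01}   # P(+m | A)

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) = P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
    pa = P_A_given_BE[(b, e)] if a else 1 - P_A_given_BE[(b, e)]
    pj = P_J_given_A[a] if j else 1 - P_J_given_A[a]
    pm = P_M_given_A[a] if m else 1 - P_M_given_A[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# Sanity check: the implied joint sums to 1 over all 32 assignments.
total = sum(joint(b, e, a, j, m)
            for b in (True, False) for e in (True, False)
            for a in (True, False) for j in (True, False)
            for m in (True, False))
assert abs(total - 1.0) < 1e-9
```

For instance, `joint(False, False, True, True, True)` computes P(-b, -e, +a, +j, +m) = 0.999 · 0.998 · 0.001 · 0.9 · 0.7.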
Example: Traffic
o Causal direction

P(R):
  +r  1/4
  -r  3/4

P(T | R):
  +r  +t  3/4
  +r  -t  1/4
  -r  +t  1/2
  -r  -t  1/2

Joint P(R, T):
  +r  +t  3/16
  +r  -t  1/16
  -r  +t  6/16
  -r  -t  6/16
Example: Reverse Traffic
o Reverse causality?

P(T):
  +t  9/16
  -t  7/16

P(R | T):
  +t  +r  1/3
  +t  -r  2/3
  -t  +r  1/7
  -t  -r  6/7

Joint P(T, R) (unchanged):
  +r  +t  3/16
  +r  -t  1/16
  -r  +t  6/16
  -r  -t  6/16
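The reverse factorization can be recovered mechanically from the joint (Python sketch; the names are mine, not from the slides):

```python
# Joint P(R, T) in sixteenths, as in the table above.
joint = {("+r", "+t"): 3/16, ("+r", "-t"): 1/16,
         ("-r", "+t"): 6/16, ("-r", "-t"): 6/16}

# Marginal P(T): sum out R.
P_T = {}
for (r, t), p in joint.items():
    P_T[t] = P_T.get(t, 0.0) + p          # +t: 9/16, -t: 7/16

# Conditional P(R | T) = P(R, T) / P(T).
P_R_given_T = {(r, t): p / P_T[t] for (r, t), p in joint.items()}
# P(+r | +t) = (3/16) / (9/16) = 1/3
# P(+r | -t) = (1/16) / (7/16) = 1/7
```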
Causality?
o When Bayes’ nets reflect the true causal patterns:
o Often simpler (nodes have fewer parents)
o Often easier to think about
o Often easier to elicit from experts
Inference by Enumeration in Bayes' Net
o Given unlimited time, inference in BNs is easy

[Diagram: B → A ← E, with A → J and A → M]

o First combine B, E, and A (using B and E independent):
  P(A | B, E) P(B) P(E) = P(A | B, E) P(B, E) = P(A, B, E)

o Then bring in J and M (using J, M independent of B, E given A):
  P(J | A) P(M | A) P(A, B, E)
    = P(J, M | A) P(A, B, E)
    = P(J, M | A, B, E) P(A, B, E)
    = P(J, M, A, B, E)
Example: Traffic Domain
o R: It rains
o T: Traffic
o L: Late for class!

[Diagram: R → T → L]

P(T | R):
  +r  +t  0.8
  +r  -t  0.2
  -r  +t  0.1
  -r  -t  0.9

P(L | T):
  +t  +l  0.3
  +t  -l  0.7
  -t  +l  0.1
  -t  -l  0.9
Inference by Enumeration: Procedural Outline
o Track objects called factors
o Initial factors are local CPTs (one per node)
o Procedure: Join all factors, then sum out all hidden variables
Operation 1: Join Factors
o First basic operation: joining factors
o Combining factors:
  o Just like a database join
  o Get all factors over the joining variable
  o Build a new factor over the union of the variables involved
o Example: Join on R

P(R):
  +r  0.1
  -r  0.9

P(T | R):
  +r  +t  0.8
  +r  -t  0.2
  -r  +t  0.1
  -r  -t  0.9

Join on R → P(R, T):
  +r  +t  0.08
  +r  -t  0.02
  -r  +t  0.09
  -r  -t  0.81

o Computation for each entry: pointwise products, e.g. P(+r, +t) = P(+r) P(+t | +r) = 0.1 · 0.8 = 0.08
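The join, sketched as pointwise products (Python; `join_on_R` is a hypothetical helper specialized to this example, not course code):

```python
# P(R) and P(T | R) from the factors above.
P_R = {"+r": 0.1, "-r": 0.9}
P_T_given_R = {("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
               ("-r", "+t"): 0.1, ("-r", "-t"): 0.9}

def join_on_R(f_R, f_TR):
    """Pointwise products over the union of variables: P(r, t) = P(r) P(t|r)."""
    return {(r, t): f_R[r] * p for (r, t), p in f_TR.items()}

P_RT = join_on_R(P_R, P_T_given_R)
# e.g. P(+r, +t) = 0.1 * 0.8 = 0.08 and P(-r, -t) = 0.9 * 0.9 = 0.81
```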
Example: Multiple Joins

P(R):
  +r  0.1
  -r  0.9

P(T | R):
  +r  +t  0.8
  +r  -t  0.2
  -r  +t  0.1
  -r  -t  0.9

P(L | T):
  +t  +l  0.3
  +t  -l  0.7
  -t  +l  0.1
  -t  -l  0.9

Join R → P(R, T):
  +r  +t  0.08
  +r  -t  0.02
  -r  +t  0.09
  -r  -t  0.81

Join T → P(R, T, L):
  +r  +t  +l  0.024
  +r  +t  -l  0.056
  +r  -t  +l  0.002
  +r  -t  -l  0.018
  -r  +t  +l  0.027
  -r  +t  -l  0.063
  -r  -t  +l  0.081
  -r  -t  -l  0.729
Operation 2: Eliminate
o Second basic operation: marginalization
  o Take a factor and sum out a variable
  o Shrinks a factor to a smaller one
  o A projection operation
o Example: sum out R

P(R, T):
  +r  +t  0.08
  +r  -t  0.02
  -r  +t  0.09
  -r  -t  0.81

→ P(T):
  +t  0.17
  -t  0.83
Multiple Elimination

P(R, T, L):
  +r  +t  +l  0.024
  +r  +t  -l  0.056
  +r  -t  +l  0.002
  +r  -t  -l  0.018
  -r  +t  +l  0.027
  -r  +t  -l  0.063
  -r  -t  +l  0.081
  -r  -t  -l  0.729

Sum out R → P(T, L):
  +t  +l  0.051
  +t  -l  0.119
  -t  +l  0.083
  -t  -l  0.747

Sum out T → P(L):
  +l  0.134
  -l  0.866
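Summing out a variable drops one position from each assignment tuple and adds up the collided rows (Python sketch; `sum_out` is a hypothetical helper, not course code):

```python
from collections import defaultdict

def sum_out(factor, index):
    """Eliminate the variable at position `index` of each assignment tuple."""
    out = defaultdict(float)
    for assignment, p in factor.items():
        rest = assignment[:index] + assignment[index + 1:]
        out[rest] += p           # collided rows add up
    return dict(out)

# Joint factor over (R, T, L) from the table above.
P_RTL = {
    ("+r", "+t", "+l"): 0.024, ("+r", "+t", "-l"): 0.056,
    ("+r", "-t", "+l"): 0.002, ("+r", "-t", "-l"): 0.018,
    ("-r", "+t", "+l"): 0.027, ("-r", "+t", "-l"): 0.063,
    ("-r", "-t", "+l"): 0.081, ("-r", "-t", "-l"): 0.729,
}

P_TL = sum_out(P_RTL, 0)   # e.g. (+t, +l): 0.024 + 0.027 = 0.051
P_L = sum_out(P_TL, 0)     # (+l,): 0.134, (-l,): 0.866
```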
Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)
Recap: Inference by Enumeration
o General case:
  o Evidence variables: E1 … Ek = e1 … ek
  o Query* variable: Q
  o Hidden variables: H1 … Hr
  (All variables)
▪ We want: P(Q | e1 … ek)
  * Works fine with multiple query variables, too
▪ Step 1: Select the entries consistent with the evidence
▪ Step 2: Sum out H to get joint of Query and evidence
▪ Step 3: Normalize
Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)
▪ [Step 3: Normalize]
Inference by Enumeration vs. Variable Elimination
o Why is inference by enumeration slow?
  o You join up the whole joint distribution before you sum out the hidden variables
▪ Idea: interleave joining and marginalizing!
  ▪ Called "Variable Elimination"
  ▪ Still NP-hard, but usually much faster than inference by enumeration
Traffic Domain
[Diagram: R → T → L]

Inference by enumeration:
  Join on r
  Join on t
  Eliminate r
  Eliminate t

Variable elimination:
  Join on r
  Eliminate r
  Join on t
  Eliminate t

Analogy: (5a) + (5b) vs. 5(a + b)
Marginalizing Early (Variable Elimination)
Variable Elimination
Evidence
o If evidence, start with factors that select that evidence
o No evidence uses these initial factors

Normalize the final factor P(+r, L) to obtain P(L | +r):
  +r  +l  0.026        +l  0.26
  +r  -l  0.074        -l  0.74
o That's it!
General Variable Elimination
o Query:
All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy +vxz = (u+v)(w+x)(y+z) to improve computational efficiency!
Example
▪ Choose A
▪ Choose E
▪ Finish with B
▪ Normalize
Variable Elimination Example
Variable Elimination Ordering
o For the query P(Xn|y1,…,yn) work through the following two different
orderings as done in previous slide: Z, X1, …, Xn-1 and X1, …, Xn-1, Z.
What is the size of the maximum factor generated for each of the
orderings?
o The elimination ordering can greatly affect the size of the largest
factor.
o E.g., previous slide's example: 2^n vs. 2