Probabilistic Graphical Models
Lecture 2
Milos Hauskrecht
[email protected]
5329 Sennott Square, x4-8845
http://www.cs.pitt.edu/~milos/courses/cs3710/
Uncertainty
To make diagnostic inference possible we need to represent
knowledge (axioms) that relate symptoms and diagnoses.

[Diagram: knowledge relating symptom variables to the Pneumonia diagnosis]
Modeling uncertainty with probabilities
Random variables:
- Binary: Pneumonia takes one of two values, True or False
- Multi-valued: Pain takes one of {Nopain, Mild, Moderate, Severe}
- Continuous: HeartRate takes a value in the interval [0, 250]
Probabilities
- Quantify how likely each outcome of a random variable is
- Unconditional probabilities (prior probabilities)
Probability distribution
Defines probability for all possible value assignments
Example 1:
P(Pneumonia = True) = 0.001
P(Pneumonia = False) = 0.999
P(Pneumonia = True) + P(Pneumonia = False) = 1
Probabilities sum to 1 !!!

Pneumonia   P(Pneumonia)
True        0.001
False       0.999

Example 2:
P(WBCcount = high) = 0.005
P(WBCcount = normal) = 0.993
P(WBCcount = low) = 0.002

WBCcount   P(WBCcount)
high       0.005
normal     0.993
low        0.002
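As a quick aside (not from the original slides), such a distribution is naturally stored as a table mapping each value to its probability; a minimal Python sketch using the WBCcount numbers above:

# A discrete probability distribution as a dictionary: value -> probability.
p_wbc = {"high": 0.005, "normal": 0.993, "low": 0.002}

# A valid distribution assigns a probability to every possible value,
# and those probabilities must sum to 1.
assert abs(sum(p_wbc.values()) - 1.0) < 1e-12

print(p_wbc["high"])   # P(WBCcount = high) -> 0.005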
Joint probabilities
Marginalization
- Reduces the dimension of the joint distribution
- Sums variables out: P(X) = Σ_y P(X, Y = y)
Example: P(Pneumonia, WBCcount) is a 2 x 3 matrix; marginalization
(here summing of columns or rows) yields P(WBCcount) and P(Pneumonia).
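A small numpy sketch of this marginalization. The joint entries below are hypothetical (the slide's joint table did not survive extraction); they are chosen only so that the marginals match the distributions used earlier:

import numpy as np

# Hypothetical joint P(Pneumonia, WBCcount): rows = Pneumonia (True, False),
# columns = WBCcount (high, normal, low).
joint = np.array([
    [0.0009, 0.0001, 0.0000],   # Pneumonia = True
    [0.0041, 0.9929, 0.0020],   # Pneumonia = False
])

# Marginalization: sum a variable out of the joint distribution.
p_pneumonia = joint.sum(axis=1)   # sum out WBCcount  -> [0.001, 0.999]
p_wbc = joint.sum(axis=0)         # sum out Pneumonia -> [0.005, 0.993, 0.002]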
Conditional probabilities
Conditional probability distribution
- Defines probabilities of outcomes of a variable, given a fixed
assignment of values to some other variables, e.g.
P(Pneumonia = true | WBCcount = high)
Conditional probability is defined in terms of the joint probability:
P(A | B) = P(A, B) / P(B),  provided P(B) > 0
Example:
P(pneumonia = true | WBCcount = high)
  = P(pneumonia = true, WBCcount = high) / P(WBCcount = high)
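With the hypothetical joint from the marginalization sketch above, the conditional is just a ratio of a joint entry to a marginal:

# Hypothetical values, consistent with the sketch above:
p_true_high = 0.0009      # P(pneumonia = true, WBCcount = high)
p_high = 0.005            # P(WBCcount = high)

# P(A | B) = P(A, B) / P(B), provided P(B) > 0:
print(p_true_high / p_high)   # P(pneumonia = true | WBCcount = high) = 0.18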
Product rule. Joint probability can be expressed in terms of
conditional probabilities:
P(A, B) = P(A | B) P(B)
Bayes rule
Conditional probability:
P(A | B) = P(A, B) / P(B)    with    P(A, B) = P(B | A) P(A)
Bayes rule:
P(A | B) = P(B | A) P(A) / P(B)
When is it useful?
- When we are interested in computing a diagnostic query from
causal probabilities:
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
- Reason: It is often easier to assess the causal probability,
e.g. the probability of fever given pneumonia,
vs. the probability of pneumonia given fever.
Bayes rule
Assume a variable A with multiple values a_1, a_2, ..., a_k.
Bayes rule can be rewritten as:
P(A = a_j | B = b) = P(B = b | A = a_j) P(A = a_j) / P(B = b)
                   = P(B = b | A = a_j) P(A = a_j) / Σ_{i=1..k} P(B = b | A = a_i) P(A = a_i)
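A minimal Python sketch of this normalized form; the prior and likelihood numbers below are hypothetical, for a cause A with k = 3 values:

# Hypothetical prior P(A) and likelihoods P(B = b | A = a_i).
prior = {"a1": 0.2, "a2": 0.5, "a3": 0.3}
likelihood = {"a1": 0.9, "a2": 0.1, "a3": 0.4}

# Denominator: P(B = b) = sum over i of P(B = b | A = a_i) P(A = a_i).
p_b = sum(likelihood[a] * prior[a] for a in prior)

# Bayes rule for every value a_j; the posterior is normalized to sum to 1.
posterior = {a: likelihood[a] * prior[a] / p_b for a in prior}
print(posterior)   # {'a1': 0.514..., 'a2': 0.142..., 'a3': 0.342...}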
Probabilistic inference
Various inference tasks:
- Diagnostic task (from effect to cause):
P(Pneumonia | Fever = T)
- Prediction task (from cause to effect):
P(Fever | Pneumonia = T)
- Other probabilistic queries (queries on joint distributions):
P(Fever)
P(Fever, ChestPain)
Inference
Any query can be computed from the full joint distribution !!!
- Joint over a subset of variables is obtained through
marginalization:
P(A = a, C = c) = Σ_i Σ_j P(A = a, B = b_i, C = c, D = d_j)
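A small numpy sketch of such a query; the full joint here is a randomly generated (illustrative) table over four binary variables A, B, C, D:

import numpy as np

rng = np.random.default_rng(0)

# An arbitrary full joint P(A, B, C, D) over binary variables,
# stored as a 2x2x2x2 array normalized to sum to 1.
joint = rng.random((2, 2, 2, 2))
joint /= joint.sum()

# P(A = a, C = c): sum out B (axis 1) and D (axis 3).
p_ac = joint.sum(axis=(1, 3))
print(p_ac[0, 1])   # e.g. P(A = 0, C = 1)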
Inference
Any query can be computed from the full joint distribution !!!
- Any joint probability can be expressed as a product of
conditionals via the chain rule:
P(X_1, X_2, ..., X_n) = P(X_n | X_1, ..., X_{n-1}) P(X_1, ..., X_{n-1})
                      = P(X_n | X_1, ..., X_{n-1}) P(X_{n-1} | X_1, ..., X_{n-2}) P(X_1, ..., X_{n-2})
                      = Π_{i=1..n} P(X_i | X_1, ..., X_{i-1})
E.g., for n = 3: P(X_1, X_2, X_3) = P(X_3 | X_1, X_2) P(X_2 | X_1) P(X_1).
Modeling uncertainty with probabilities
Defining the full joint distribution makes it possible to
represent and reason with uncertainty in a uniform way
- We are able to handle an arbitrary inference problem
Problems:
- Space complexity. To store a full joint distribution we
need to remember O(d^n) numbers, where n is the number of random
variables and d is the number of values per variable. (For example,
20 binary variables already require 2^20, about a million, entries.)
- Inference (time) complexity. Computing some queries
requires O(d^n) steps.
- Acquisition problem. Who is going to define all of the
probability entries?
Graphical models
Aim: alleviate the representational and computational
bottlenecks
Idea: take advantage of the structure, in particular, the
independences and conditional independences that hold
among random variables, e.g.:
P(A, B | C) = P(A | C) P(B | C)
P(A | C, B) = P(A | C)
Alarm system example
Assume your house has an alarm system against burglary.
You live in a seismically active area and the alarm system
can occasionally get set off by an earthquake. You have two
neighbors, Mary and John, who do not know each other. If
they hear the alarm they call you, but this is not guaranteed.
We want to represent the probability distribution of events:
Burglary, Earthquake, Alarm, MaryCalls and JohnCalls

[Diagram: Burglary -> Alarm <- Earthquake; Alarm -> JohnCalls, Alarm -> MaryCalls]
Bayesian belief network
1. Directed acyclic graph encodes dependencies among the variables
2. Local conditional distributions
relate variables and their parents

[Diagram: the alarm network, with P(B) at Burglary, P(E) at Earthquake,
P(A|B,E) at Alarm, P(J|A) at JohnCalls, P(M|A) at MaryCalls]

P(B):                 P(E):
B = T    0.001        E = T    0.002
B = F    0.999        E = F    0.998

P(A | B, E):
B  E     A = T    A = F
T  T     0.95     0.05
T  F     0.94     0.06
F  T     0.29     0.71
F  F     0.001    0.999

P(J | A):                 P(M | A):
A    J = T    J = F       A    M = T    M = F
T    0.90     0.10        T    0.70     0.30
F    0.05     0.95        F    0.01     0.99
Bayesian belief networks (general)
Two components: B = (S, Θ)
1. Directed acyclic graph S
- Nodes correspond to random variables
- (Missing) links encode independences
2. Parameters Θ
- Local conditional probability distributions
for every variable-parent configuration:
P(X_i | pa(X_i))
where pa(X_i) stands for the parents of X_i
(e.g. the table P(A | B, E) above)

The network defines the full joint distribution:
P(X_1, X_2, ..., X_n) = Π_{i=1..n} P(X_i | pa(X_i))

Example:
Assume the following assignment of values to the random variables:
B = T, E = T, A = T, J = T, M = F
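A compact Python sketch (an illustration, not from the slides) of this factorization for the alarm network, using the CPT values from the tables above. Note that the network needs only 1 + 1 + 4 + 2 + 2 = 10 free parameters instead of the 2^5 - 1 = 31 of an explicit full joint:

# Probabilities of the "True" outcome for each variable, from the CPT tables.
P_B = 0.001                                    # P(B = T)
P_E = 0.002                                    # P(E = T)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A = T | B, E)
P_J = {True: 0.90, False: 0.05}                # P(J = T | A)
P_M = {True: 0.70, False: 0.01}                # P(M = T | A)

def bern(p_true, value):
    # Probability of a binary outcome, given P(True) = p_true.
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    # P(B, E, A, J, M) = P(B) P(E) P(A | B, E) P(J | A) P(M | A)
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

print(joint(True, True, True, True, False))    # -> 5.13e-07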
Bayesian belief networks (BBNs)
Bayesian belief networks
- Represent the full joint distribution over the variables more
compactly using the product of local conditionals.
But how did we get to local parameterizations?
Answer:
Graphical structure encodes conditional and marginal
independences among random variables
- A and B are independent: P(A, B) = P(A) P(B)
- A and B are conditionally independent given C:
P(A | C, B) = P(A | C)
P(A, B | C) = P(A | C) P(B | C)
The graph structure implies the decomposition !!!
Independences in BBNs
3 basic independence structures:
1. Serial connection (chain): Burglary -> Alarm -> JohnCalls
JohnCalls is independent of Burglary given Alarm.
2. Common effect (v-structure): Burglary -> Alarm <- Earthquake
Burglary and Earthquake are marginally independent,
but become dependent once Alarm is observed.
3. Common cause: JohnCalls <- Alarm -> MaryCalls
JohnCalls is independent of MaryCalls given Alarm.
Independences in BBNs
A BBN distribution models many conditional independence
relations among distant variables and sets of variables
- These are defined in terms of the graphical criterion called
d-separation
D-separation and independence
- Let X, Y and Z be three sets of nodes
- If X and Y are d-separated by Z, then X and Y are
conditionally independent given Z
D-separation:
- A is d-separated from B given C if every undirected path
between them is blocked by C
Path blocking
- 3 cases that expand on the three basic independence structures
Undirected path blocking
A is d-separated from B given C if every undirected path
between them is blocked. A path is blocked if it contains a
node Z for which one of the following three cases holds:
1. Serial connection (X in A -> Z -> Y in B, or the reverse)
and Z is in C
2. Common cause (X in A <- Z -> Y in B) and Z is in C
3. Common effect (X in A -> Z <- Y in B) and neither Z nor
any of its descendants is in C
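This criterion can be checked mechanically. Below is a minimal Python sketch (not from the slides) of the standard reachability formulation of the d-separation test: B is d-separated from A given C exactly when B is not reachable from A along any active path. The graph encoding (a dictionary of parents) and the function names are illustrative choices.

def reachable(parents, x, z):
    # Nodes reachable from x along active paths, given evidence set z.
    # `parents` maps every node to the list of its parents.
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)

    # Evidence nodes and their ancestors: observing a descendant
    # activates a common-effect (v-structure) node.
    anc, stack = set(), list(z)
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents[n])

    visited, result = set(), set()
    fringe = [(x, "up")]                 # leave x toward its parents first
    while fringe:
        n, d = fringe.pop()
        if (n, d) in visited:
            continue
        visited.add((n, d))
        if n not in z:
            result.add(n)
        if d == "up" and n not in z:     # arrived from a child
            fringe += [(p, "up") for p in parents[n]]
            fringe += [(c, "down") for c in children[n]]
        elif d == "down":                # arrived from a parent
            if n not in z:               # serial connection stays open
                fringe += [(c, "down") for c in children[n]]
            if n in anc:                 # v-structure activated by evidence
                fringe += [(p, "up") for p in parents[n]]
    return result

def d_separated(parents, a, b, c):
    return b not in reachable(parents, a, set(c))

# The alarm network:
parents = {"Burglary": [], "Earthquake": [],
           "Alarm": ["Burglary", "Earthquake"],
           "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"]}

print(d_separated(parents, "Burglary", "Earthquake", []))          # True
print(d_separated(parents, "Burglary", "Earthquake", ["Alarm"]))   # False
print(d_separated(parents, "JohnCalls", "MaryCalls", ["Alarm"]))   # True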
Independences in BBNs
Example: extend the alarm network with a variable RadioReport
(Earthquake -> RadioReport): an earthquake may be reported on the radio.

[Diagram: Burglary -> Alarm <- Earthquake; Alarm -> JohnCalls,
Alarm -> MaryCalls; Earthquake -> RadioReport]

D-separation on this network gives, for instance:
- Burglary and RadioReport are marginally independent: the only path
between them is blocked at the unobserved common-effect node Alarm.
- JohnCalls and RadioReport are conditionally independent given Alarm:
the path JohnCalls <- Alarm <- Earthquake -> RadioReport is blocked
at the observed node Alarm.
- Burglary and RadioReport become dependent once Alarm is observed:
the common-effect node Alarm is activated and the path through
Earthquake opens up.
Bayesian belief networks (BBNs)
Bayesian belief networks
- Represent the full joint distribution over the variables more
compactly using the product of local conditionals:
P(X_1, X_2, ..., X_n) = Π_{i=1..n} P(X_i | pa(X_i))
So how did we get to local parameterizations? Consider the example
assignment from above and compute
P(B = T, E = T, A = T, J = T, M = F)
Full joint distribution in BBNs
Rewrite the full joint probability using the product rule and the
conditional independences encoded in the graph:
P(B = T, E = T, A = T, J = T, M = F)
= P(J = T | B = T, E = T, A = T, M = F) P(B = T, E = T, A = T, M = F)
= P(J = T | A = T) P(B = T, E = T, A = T, M = F)
= P(J = T | A = T) P(M = F | B = T, E = T, A = T) P(B = T, E = T, A = T)
= P(J = T | A = T) P(M = F | A = T) P(B = T, E = T, A = T)
= P(J = T | A = T) P(M = F | A = T) P(A = T | B = T, E = T) P(B = T, E = T)
= P(J = T | A = T) P(M = F | A = T) P(A = T | B = T, E = T) P(B = T) P(E = T)
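Plugging in the CPT values from the tables above (this matches the output of the joint() sketch earlier):
P(B = T, E = T, A = T, J = T, M = F) = 0.9 × 0.3 × 0.95 × 0.001 × 0.002 = 5.13 × 10^-7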