Lecture Slides for
Graphical Models
Also known as Bayesian networks or probabilistic networks
Nodes are hypotheses (random variables), and the probability of a node corresponds to our belief in the truth of that hypothesis
Arcs are direct influences between hypotheses
The structure is represented as a directed acyclic graph (DAG)
The parameters are the conditional probabilities in the arcs (Pearl, 1988, 2000; Jensen, 1996; Lauritzen, 1996)
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 3
Causes and Bayes’ Rule
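Diagnostic inference: Knowing that the grass is wet, what is the probability that rain is the cause?
(Figure: R → W, with the causal direction along the arc and the diagnostic direction against it.)
$$P(R \mid W) = \frac{P(W \mid R)\,P(R)}{P(W)} = \frac{P(W \mid R)\,P(R)}{P(W \mid R)\,P(R) + P(W \mid \sim R)\,P(\sim R)} = \frac{0.9 \times 0.4}{0.9 \times 0.4 + 0.2 \times 0.6} = 0.75$$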
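The diagnostic computation can be checked in a few lines. This is a minimal sketch; the function name `posterior_rain` is hypothetical, and the probabilities are the ones on this slide.

```python
def posterior_rain(p_w_given_r: float, p_w_given_not_r: float, p_r: float) -> float:
    """P(R|W) by Bayes' rule, with P(W) expanded over R and ~R."""
    joint_r = p_w_given_r * p_r                # P(W, R) = P(W|R) P(R)
    joint_not_r = p_w_given_not_r * (1 - p_r)  # P(W, ~R) = P(W|~R) P(~R)
    p_w = joint_r + joint_not_r                # P(W), marginalizing over R
    return joint_r / p_w


print(round(posterior_rain(p_w_given_r=0.9, p_w_given_not_r=0.2, p_r=0.4), 2))  # 0.75
```

Expanding P(W) over R and ~R is what lets the posterior be computed from the causal quantities P(W|R), P(W|~R) and the prior P(R) alone.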
Conditional Independence
X and Y are independent if
P(X,Y)=P(X)P(Y)
X and Y are conditionally independent given Z if
P(X,Y|Z)=P(X|Z)P(Y|Z)
or
P(X|Y,Z)=P(X|Z)
Three canonical cases: head-to-tail, tail-to-tail, head-to-head
Case 1: Head-to-Tail
P(X,Y,Z)=P(X)P(Y|X)P(Z|Y)
X and Z are independent given Y: P(Z|X,Y)=P(Z|Y)
Example, cloudy → rain → wet grass (C → R → W):
P(W|C)=P(W|R)P(R|C)+P(W|~R)P(~R|C)
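A sketch of the marginalization over the middle node R in P(W|C). The values P(W|R) = 0.9 and P(W|~R) = 0.2 are reused from the Bayes' rule slide; P(R|C) = 0.8 is a hypothetical value chosen only for illustration, and the function name is also hypothetical.

```python
def p_w_given_c(p_w_given_r: float, p_w_given_not_r: float, p_r_given_c: float) -> float:
    """Head-to-tail chain C -> R -> W: sum out the middle node R."""
    # W depends on C only through R, so P(W|R,C) = P(W|R).
    return p_w_given_r * p_r_given_c + p_w_given_not_r * (1 - p_r_given_c)


# P(R|C) = 0.8 is an assumed value, not taken from the slides.
print(p_w_given_c(0.9, 0.2, 0.8))
```

Because of the head-to-tail independence, only the two local tables P(R|C) and P(W|R) are needed; the joint over all three variables is never built.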
Case 2: Tail-to-Tail
P(X,Y,Z)=P(X)P(Y|X)P(Z|X)
Y and Z are independent given X: P(Y,Z|X)=P(Y|X)P(Z|X)
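The tail-to-tail case can be verified numerically by building the joint from its factorization and checking both conditional and marginal independence. A sketch, assuming binary variables; all CPT values and helper names (`bern`, `prob`) are hypothetical.

```python
from itertools import product

# Hypothetical CPTs for binary variables (1 = true, 0 = false), X -> Y and X -> Z.
P_X1 = 0.3
P_Y1_GIVEN_X = {1: 0.9, 0: 0.2}   # P(Y=1 | X=x)
P_Z1_GIVEN_X = {1: 0.8, 0: 0.1}   # P(Z=1 | X=x)


def bern(p_true, value):
    """P(V = value) for a binary V with P(V = 1) = p_true."""
    return p_true if value == 1 else 1 - p_true


# Tail-to-tail factorization: P(X,Y,Z) = P(X) P(Y|X) P(Z|X)
joint = {
    (x, y, z): bern(P_X1, x) * bern(P_Y1_GIVEN_X[x], y) * bern(P_Z1_GIVEN_X[x], z)
    for x, y, z in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Marginal probability of a partial assignment, e.g. prob(x=1, z=0)."""
    return sum(
        p for (x, y, z), p in joint.items()
        if all({"x": x, "y": y, "z": z}[k] == v for k, v in fixed.items())
    )


# Given X, Y and Z are independent: P(Y,Z|X) = P(Y|X) P(Z|X)
cond_gap = max(
    abs(prob(x=x, y=y, z=z) / prob(x=x)
        - prob(x=x, y=y) / prob(x=x) * prob(x=x, z=z) / prob(x=x))
    for x, y, z in product((0, 1), repeat=3)
)
# Without conditioning on X they are dependent: P(Y,Z) != P(Y) P(Z)
marg_gap = max(
    abs(prob(y=y, z=z) - prob(y=y) * prob(z=z))
    for y, z in product((0, 1), repeat=2)
)
print(cond_gap < 1e-12, marg_gap)
```

The common cause X induces a dependence between Y and Z that disappears once X is known.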
Case 3: Head-to-Head
P(X,Y,Z)=P(X)P(Y)P(Z|X,Y)
X and Y are independent, P(X,Y)=P(X)P(Y), but they become dependent once Z is known
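The same enumeration shows the opposite behavior in the head-to-head case: independence without evidence, dependence once the common child is observed. A sketch with binary variables; the CPT values and helper names are hypothetical.

```python
from itertools import product

# Hypothetical CPTs for binary X -> Z <- Y (1 = true, 0 = false).
P_X1 = 0.3
P_Y1 = 0.5
P_Z1_GIVEN_XY = {(1, 1): 0.99, (1, 0): 0.8, (0, 1): 0.7, (0, 0): 0.05}


def bern(p_true, value):
    """P(V = value) for a binary V with P(V = 1) = p_true."""
    return p_true if value == 1 else 1 - p_true


# Head-to-head factorization: P(X,Y,Z) = P(X) P(Y) P(Z|X,Y)
joint = {
    (x, y, z): bern(P_X1, x) * bern(P_Y1, y) * bern(P_Z1_GIVEN_XY[(x, y)], z)
    for x, y, z in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Marginal probability of a partial assignment, e.g. prob(x=1, z=1)."""
    return sum(
        p for (x, y, z), p in joint.items()
        if all({"x": x, "y": y, "z": z}[k] == v for k, v in fixed.items())
    )


# Marginally independent: P(X,Y) = P(X) P(Y)
marg_gap = abs(prob(x=1, y=1) - prob(x=1) * prob(y=1))
# Dependent once Z is observed: P(X=1|Y=1,Z=1) differs from P(X=1|Z=1)
p_x_given_yz = prob(x=1, y=1, z=1) / prob(y=1, z=1)
p_x_given_z = prob(x=1, z=1) / prob(z=1)
print(marg_gap < 1e-12, round(p_x_given_yz, 3), round(p_x_given_z, 3))
```

Learning that Y = 1 lowers the probability of X = 1 given Z = 1, because Y already accounts for Z.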
Causal vs Diagnostic Inference
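Causal inference: If the sprinkler is on, what is the probability that the grass is wet?
R and S are independent (head-to-head at W, with W unobserved), so P(R|S) = P(R):
$$P(W \mid S) = P(W \mid R,S)\,P(R \mid S) + P(W \mid \sim R,S)\,P(\sim R \mid S) = P(W \mid R,S)\,P(R) + P(W \mid \sim R,S)\,P(\sim R) = 0.95 \times 0.4 + 0.9 \times 0.6 = 0.92$$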
Diagnostic inference: If the grass is wet, what is the probability that the sprinkler is on?
P(S|W) = 0.35 > 0.2 = P(S)
P(S|R,W) = 0.21
Explaining away: Knowing that it has rained decreases the probability that the sprinkler is on (from 0.35 to 0.21).
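All three numbers on this slide can be reproduced by brute-force enumeration of the joint P(R)P(S)P(W|R,S). P(R) = 0.4, P(S) = 0.2, P(W|R,S) = 0.95 and P(W|~R,S) = 0.9 come from the slide; P(W|R,~S) = 0.9 and P(W|~R,~S) = 0.1 are assumptions, chosen because they reproduce the slide's 0.35 and 0.21. The `query` helper is hypothetical and only practical for tiny networks.

```python
from itertools import product

# P(R), P(S), P(W|R,S), P(W|~R,S) appear on the slide; the two entries with
# ~S are assumed values that reproduce the slide's P(S|W) and P(S|R,W).
P_R = 0.4
P_S = 0.2
P_W_GIVEN_RS = {(1, 1): 0.95, (0, 1): 0.9, (1, 0): 0.9, (0, 0): 0.1}


def bern(p_true, value):
    return p_true if value == 1 else 1 - p_true


def joint(r, s, w):
    """P(R,S,W) = P(R) P(S) P(W|R,S) for the sprinkler network R -> W <- S."""
    return bern(P_R, r) * bern(P_S, s) * bern(P_W_GIVEN_RS[(r, s)], w)


def query(target, evidence):
    """P(target | evidence) by enumeration; both are dicts like {"s": 1}."""
    num = den = 0.0
    for r, s, w in product((0, 1), repeat=3):
        a = {"r": r, "s": s, "w": w}
        if all(a[k] == v for k, v in evidence.items()):
            p = joint(r, s, w)
            den += p
            if all(a[k] == v for k, v in target.items()):
                num += p
    return num / den


p_w_given_s = query({"w": 1}, {"s": 1})            # causal
p_s_given_w = query({"s": 1}, {"w": 1})            # diagnostic
p_s_given_rw = query({"s": 1}, {"r": 1, "w": 1})   # explaining away
print(round(p_w_given_s, 2), round(p_s_given_w, 2), round(p_s_given_rw, 2))  # 0.92 0.35 0.21
```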
Causes
Causal inference:
P(W|C) = P(W|R,S) P(R,S|C) +
P(W|~R,S) P(~R,S|C) +
P(W|R,~S) P(R,~S|C) +
P(W|~R,~S) P(~R,~S|C)
and use the fact that R and S are independent given C (tail-to-tail):
P(R,S|C) = P(R|C) P(S|C)
Diagnostic inference: P(C|W) = ?
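A sketch of both directions for the network C → R, C → S, R → W, S → W: the causal sum over R and S given above, then the diagnostic P(C|W) by Bayes' rule. Every number here is assumed for illustration (the W table extends the values used for the sprinkler example with two assumed entries), and the function names are hypothetical.

```python
from itertools import product

# All numbers are assumed for illustration.
P_C = 0.5
P_R_GIVEN_C = {1: 0.8, 0: 0.1}   # P(R=1 | C=c)
P_S_GIVEN_C = {1: 0.1, 0: 0.5}   # P(S=1 | C=c)
P_W_GIVEN_RS = {(1, 1): 0.95, (0, 1): 0.9, (1, 0): 0.9, (0, 0): 0.1}


def bern(p_true, value):
    return p_true if value == 1 else 1 - p_true


def p_w_given_c(c):
    """Causal: P(W=1|C=c) = sum over r, s of P(W|r,s) P(r|c) P(s|c)."""
    return sum(
        P_W_GIVEN_RS[(r, s)] * bern(P_R_GIVEN_C[c], r) * bern(P_S_GIVEN_C[c], s)
        for r, s in product((0, 1), repeat=2)
    )


def p_c_given_w():
    """Diagnostic: P(C=1|W=1) by Bayes' rule, with P(W) summed over C."""
    num = p_w_given_c(1) * P_C
    return num / (num + p_w_given_c(0) * (1 - P_C))


print(round(p_w_given_c(1), 4), round(p_c_given_w(), 4))
```

The factorization P(R,S|C) = P(R|C)P(S|C) is what lets `p_w_given_c` multiply the two per-cause tables instead of storing a joint table over R and S.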
Classification
Bayes’ rule inverts the arc, giving the diagnostic direction:
$$P(C \mid x) = \frac{p(x \mid C)\,P(C)}{p(x)}$$
Naive Bayes’ Classifier
Given C, xj are independent:
p(x|C) = p(x1|C) p(x2|C) ... p(xd|C)
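A sketch of the naive Bayes posterior for binary features, assuming each p(xj|C) is Bernoulli. The class names, priors and feature probabilities are hypothetical. Log probabilities are summed to avoid underflow when d is large, and the normalization over classes plays the role of p(x).

```python
import math

# Hypothetical model: two classes, three binary features.
PRIORS = {"C1": 0.4, "C2": 0.6}
# P(x_j = 1 | C) for each class and feature j.
FEATURE_PROBS = {"C1": [0.8, 0.6, 0.1], "C2": [0.2, 0.3, 0.5]}


def naive_bayes_posterior(x):
    """P(C|x) proportional to P(C) * prod_j p(x_j|C), normalized over classes."""
    log_scores = {}
    for c, prior in PRIORS.items():
        s = math.log(prior)
        for xj, pj in zip(x, FEATURE_PROBS[c]):
            s += math.log(pj if xj == 1 else 1 - pj)  # Bernoulli p(x_j | C)
        log_scores[c] = s
    m = max(log_scores.values())                      # subtract max for stability
    unnorm = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(unnorm.values())                          # proportional to p(x)
    return {c: u / z for c, u in unnorm.items()}


print(naive_bayes_posterior([1, 1, 0]))
```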
Linear Regression
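$$p(r' \mid x', r, X) = \int p(r' \mid x', w)\, p(w \mid X, r)\, dw = \int p(r' \mid x', w)\, \frac{p(r \mid X, w)\, p(w)}{p(r)}\, dw \propto \int p(r' \mid x', w) \prod_t p(r^t \mid x^t, w)\, p(w)\, dw$$
The last step uses the fact that the r^t are independent given w and that p(r) does not depend on w.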
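For a single weight, the predictive integral can be evaluated numerically, which makes each factor visible. A sketch under stated assumptions: a scalar weight w with no intercept, Gaussian noise so that p(r^t|x^t,w) has mean w·x^t and variance `noise_var`, and a Gaussian prior p(w) with mean 0 and variance `prior_var`. The data points and function names are hypothetical, and grid quadrature is a swapped-in numerical method. For this conjugate Gaussian case a closed form also exists, so both are computed for comparison.

```python
import math


def normal_pdf(v, mean, var):
    return math.exp(-(v - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def predictive_grid(r_new, x_new, xs, rs, prior_var=1.0, noise_var=0.25,
                    lo=-5.0, hi=5.0, n=4001):
    """integral of p(r'|x',w) prod_t p(r^t|x^t,w) p(w) dw / p(r), on a grid over w."""
    num = den = 0.0
    for i in range(n):
        w = lo + (hi - lo) * i / (n - 1)
        unnorm_post = normal_pdf(w, 0.0, prior_var)         # p(w)
        for x, r in zip(xs, rs):
            unnorm_post *= normal_pdf(r, w * x, noise_var)  # p(r^t | x^t, w)
        num += normal_pdf(r_new, w * x_new, noise_var) * unnorm_post
        den += unnorm_post        # proportional to p(r); the grid step cancels in num/den
    return num / den


def predictive_closed_form(r_new, x_new, xs, rs, prior_var=1.0, noise_var=0.25):
    """Same density in closed form, valid only for this Gaussian prior and noise."""
    precision = 1 / prior_var + sum(x * x for x in xs) / noise_var
    mean_w = sum(x * r for x, r in zip(xs, rs)) / noise_var / precision
    return normal_pdf(r_new, x_new * mean_w, noise_var + x_new ** 2 / precision)


xs, rs = [0.5, 1.0, 1.5], [1.1, 1.9, 3.2]
print(predictive_grid(3.5, 2.0, xs, rs), predictive_closed_form(3.5, 2.0, xs, rs))
```

Averaging p(r'|x',w) over the posterior of w, rather than plugging in one estimate of w, is what widens the predictive variance beyond the noise variance.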
Junction Trees
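If X does not separate E+ and E- (the evidence reaching X through its parents and through its children), for example because the graph contains loops, we convert it into a junction tree and then apply the polytree algorithm
(Figure: a tree whose nodes are the cliques of the moralized graph.)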
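The first step of the conversion, moralization, connects the parents of each common child and drops arc directions. A minimal sketch, assuming the DAG is given as a dict mapping each node to its list of parents; the function name is hypothetical, and the example is the C, R, S, W network from the earlier slides.

```python
from itertools import combinations


def moralize(parents):
    """Moral graph of a DAG {node: [parents]}: marry co-parents, drop directions."""
    edges = set()
    for child, pas in parents.items():
        for p in pas:
            edges.add(frozenset((p, child)))  # keep each arc, now undirected
        for a, b in combinations(pas, 2):
            edges.add(frozenset((a, b)))      # connect parents of a common child
    return edges


# C -> S, C -> R, S -> W, R -> W
dag = {"C": [], "S": ["C"], "R": ["C"], "W": ["S", "R"]}
moral = moralize(dag)
print(sorted(tuple(sorted(e)) for e in moral))
# [('C', 'R'), ('C', 'S'), ('R', 'S'), ('R', 'W'), ('S', 'W')]
```

Here the moral graph has the two cliques {C, R, S} and {R, S, W}, which become the nodes of the junction tree, linked through the separator {R, S}. In general a triangulation step is also needed before the cliques are extracted; this small graph is already triangulated.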
Undirected Graphs: Markov Random Fields
In a Markov random field, dependencies are symmetric,
for example, pixels in an image
In an undirected graph, A and B are independent given C if removing the nodes in C leaves A and B unconnected.
The potential function $\psi_C(X_C)$ shows how favorable the particular configuration $X_C$ is over the clique C
The joint is defined in terms of the clique potentials
$$p(X) = \frac{1}{Z} \prod_C \psi_C(X_C)$$
where the normalizer is
$$Z = \sum_X \prod_C \psi_C(X_C)$$
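For a tiny field the normalizer Z can be computed by summing over every configuration. A sketch with three binary pixels in a chain, whose maximal cliques are the two edges; the potential `psi` that favors equal neighbors and its `coupling` parameter are hypothetical choices for illustration.

```python
import math
from itertools import product


def psi(a, b, coupling=1.0):
    """Hypothetical pairwise potential: neighbors taking the same value are favored."""
    return math.exp(coupling) if a == b else 1.0


CLIQUES = [(0, 1), (1, 2)]   # chain x0 - x1 - x2: the maximal cliques are the edges


def unnormalized(x):
    """Product of clique potentials psi_C(X_C) over all cliques."""
    result = 1.0
    for i, j in CLIQUES:
        result *= psi(x[i], x[j])
    return result


configs = list(product((0, 1), repeat=3))
Z = sum(unnormalized(x) for x in configs)       # sum over all configurations
p = {x: unnormalized(x) / Z for x in configs}   # p(X) = (1/Z) prod_C psi_C(X_C)
print(round(Z, 4), round(p[(0, 0, 0)], 4), round(p[(0, 1, 0)], 4))
```

The exhaustive sum grows exponentially with the number of variables, which is why Z is the expensive part of working with undirected models.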
Factor Graphs
Define new factor nodes and write the joint in terms of
them
$$p(X) = \frac{1}{Z} \prod_S f_S(X_S)$$
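A Bayesian network can be written as a factor graph by making each conditional probability a factor, in which case Z = 1. A sketch for R → W ← S, with each factor stored as a (scope, function) pair; this representation is a hypothetical choice. P(R) = 0.4, P(S) = 0.2, P(W|R,S) = 0.95 and P(W|~R,S) = 0.9 come from the sprinkler slide, and the two entries with ~S are assumed.

```python
from itertools import product

# Factors are the CPTs of R -> W <- S, so Z should come out as 1.
P_W1 = {(1, 1): 0.95, (0, 1): 0.9, (1, 0): 0.9, (0, 0): 0.1}  # ~S entries assumed
FACTORS = [
    (("R",), lambda r: 0.4 if r else 0.6),
    (("S",), lambda s: 0.2 if s else 0.8),
    (("R", "S", "W"), lambda r, s, w: P_W1[(r, s)] if w else 1 - P_W1[(r, s)]),
]
VARS = ("R", "S", "W")


def unnormalized(assign):
    """prod_S f_S(X_S) for one full assignment {var: value}."""
    result = 1.0
    for scope, f in FACTORS:
        result *= f(*(assign[v] for v in scope))
    return result


assignments = [dict(zip(VARS, vals)) for vals in product((0, 1), repeat=len(VARS))]
Z = sum(unnormalized(a) for a in assignments)
p_w1 = sum(unnormalized(a) for a in assignments if a["W"] == 1) / Z
print(round(Z, 6), round(p_w1, 4))
```

The same code handles arbitrary non-negative factors; only their scopes and values change, and Z then no longer equals 1.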
Learning a Graphical Model
Learning the conditional probabilities, either as tables (for the discrete case with a small number of parents) or as parametric functions
Learning the structure of the graph: doing a state-space search over a score function that combines goodness of fit to the data with a measure of model complexity
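In the table case with fully observed data, the maximum-likelihood estimate of each conditional probability is a ratio of counts. A sketch for a single arc R → W; the samples and the function name are hypothetical, and the optional `alpha` pseudo-count (Laplace-style smoothing) is an extra not on the slide.

```python
from collections import Counter

# Hypothetical fully observed samples of (R, W) as 0/1 pairs.
data = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0), (0, 0), (1, 1)]


def learn_cpt(samples, alpha=0.0):
    """Estimate P(W=1 | R=r) from counts; alpha > 0 adds pseudo-counts."""
    counts = Counter(samples)
    cpt = {}
    for r in (0, 1):
        n_r = counts[(r, 0)] + counts[(r, 1)]
        cpt[r] = (counts[(r, 1)] + alpha) / (n_r + 2 * alpha)
    return cpt


print(learn_cpt(data))  # {0: 0.25, 1: 0.75}
```

With many parents the table needs one row per parent configuration, and most rows see few samples. This is where smoothing or a parametric form becomes useful.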
Influence Diagrams
(Figure: an influence diagram with a decision node, a chance node, and a utility node.)
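An influence diagram is evaluated by choosing the decision with the highest expected utility over the chance nodes. A minimal sketch, assuming one chance node R with P(R) = 0.4, a binary decision D, and a utility node U(D, R); the decision names and utility values are made up for illustration.

```python
# Hypothetical influence diagram: chance node R (P(R) = 0.4), decision node D,
# utility node U(D, R). The utility values are invented for illustration.
P_RAIN = 0.4
UTILITY = {
    ("umbrella", 1): 70, ("umbrella", 0): 20,
    ("no_umbrella", 1): 0, ("no_umbrella", 0): 100,
}


def expected_utility(decision):
    """EU(d) = sum over r of P(r) U(d, r); here R does not depend on the decision."""
    return P_RAIN * UTILITY[(decision, 1)] + (1 - P_RAIN) * UTILITY[(decision, 0)]


best = max(("umbrella", "no_umbrella"), key=expected_utility)
print(best, expected_utility("umbrella"), expected_utility("no_umbrella"))
```

If the decision could be made after observing some evidence, the chance-node probabilities above would be replaced by the posterior given that evidence.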