DRP Proposal - 210103
Jennah Gosciak
January 3, 2021
1 Introduction
The goal of the belief propagation algorithm is to compute the marginal probability of a random variable, conditional on observed characteristics [8]. This marginal probability corresponds to a node in a graphical model, either a Bayesian network or a Markov Random Field.
A Bayesian network is a directed acyclic graph that represents the conditional dependencies
of each node through arrows. Often the Bayes network tells a story with a clear direction, e.g., a person's attitude to health affects their smoking and drinking habits, which in turn affect the amount of tar in their lungs. A Markov Random Field is an undirected graph, cyclic or acyclic, that represents conditional dependencies satisfying the following Markov Property for every node $v \in V = \{$set of all vertices$\}$:
$$P(X_v \mid X_{V \setminus \{v\}}) = P(X_v \mid X_{N(v)}),$$
where $N(v)$ denotes the set of neighbors of $v$.
2 Background
Belief propagation is an algorithm that allows for efficient approximation of marginal distri-
butions. The output of this algorithm is a belief function that is an approximation of the
marginal distribution. Consider the following equation, in which $\psi$ and $\phi$ are cost functions:
$$P(X = x \mid Y = y) = \frac{1}{Z} \prod_{(i,j)} \psi_{ij}(x_i, x_j) \prod_i \phi_i(x_i, y_i) \tag{1}$$
While the relationships between $X_i$ and $X_j$ imply the structure and dependence relationships of a pairwise Markov Random Field, the steps of the algorithm are generalizable to other graphical models.
Step 1: Compute the messages passed from node $j$ to node $i$, denoted $m_{ji}(x_i)$. Here $N(j) = \{$the set of all vertices that share an edge with $j\}$:
$$m_{ji}(x_i) \propto \sum_{x_j} \phi_j(x_j)\, \psi_{ji}(x_j, x_i) \prod_{k \in N(j) \setminus i} m_{kj}(x_j) \tag{2}$$
Step 2: For each node, we calculate the belief associated with it. The belief at $X_i$ is proportional to the evidence $Y_i$ associated with it; we denote this relationship $\phi_i(x_i)$. For a belief at node $i$:
$$b_i(x_i) = k\, \phi_i(x_i) \prod_{j \in N(i)} m_{ji}(x_i), \tag{3}$$
where $k$ is a normalization constant.
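To make the two steps concrete, here is a minimal Python sketch of updates (2) and (3) for a discrete pairwise MRF; the data structures (dictionaries of potentials and messages, keyed by nodes and directed edges) are our own illustrative choices rather than anything prescribed in the references.

```python
import numpy as np

def update_message(j, i, phi, psi, messages, neighbors):
    """Eq. (2): m_{ji}(x_i) propto sum_{x_j} phi_j(x_j) psi(x_j, x_i)
    times the product of messages into j from all neighbors except i."""
    prod = phi[j].copy()
    for k in neighbors[j]:
        if k != i:
            prod *= messages[(k, j)]
    m = psi[(j, i)].T @ prod      # psi[(j, i)][x_j, x_i]; the @ sums over x_j
    return m / m.sum()            # normalize to keep the recursion stable

def belief(i, phi, messages, neighbors):
    """Eq. (3): b_i(x_i) = k * phi_i(x_i) * product of incoming messages."""
    b = phi[i].copy()
    for j in neighbors[i]:
        b *= messages[(j, i)]
    return b / b.sum()            # k is absorbed by normalizing to sum to 1
```

On a tree, sweeping update_message from the leaves inward computes each message exactly once.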
It is not difficult to show that for a tree graph, the beliefs exactly represent the marginal distribution of some node $x_i$. As an example, we will prove this result for a path graph with 3 nodes.
Proposition 1. Consider a path graph with three nodes and the corresponding probability
distribution, conditional on observed characteristics.
$$P(X \mid Y) = \frac{1}{Z} \prod_{i \in V} \phi_i(x_i) \prod_{(i,j) \in E} \psi_{ij}(x_i, x_j)$$
Proof. The messages are
$$m_{12}(x_2) = \sum_{x_1} \phi_1(x_1)\,\psi(x_1, x_2), \qquad m_{21}(x_1) = \sum_{x_2} \phi_2(x_2)\,\psi(x_2, x_1) \sum_{x_3} \phi_3(x_3)\,\psi(x_3, x_2),$$
$$m_{32}(x_2) = \sum_{x_3} \phi_3(x_3)\,\psi(x_3, x_2), \qquad m_{23}(x_3) = \sum_{x_2} \phi_2(x_2)\,\psi(x_2, x_3) \sum_{x_1} \phi_1(x_1)\,\psi(x_1, x_2).$$
Figure 2: Here is an example of an unwrapped network [7].
The corresponding beliefs are
$$b_2(x_2) = k\,\phi_2(x_2)\,m_{32}(x_2)\,m_{12}(x_2) = k\,\phi_2(x_2) \sum_{x_3} \phi_3(x_3)\,\psi(x_3, x_2) \sum_{x_1} \phi_1(x_1)\,\psi(x_1, x_2) = P(x_2 \mid y),$$
$$b_3(x_3) = k\,\phi_3(x_3)\,m_{23}(x_3) = k\,\phi_3(x_3) \sum_{x_2} \phi_2(x_2)\,\psi(x_2, x_3) \sum_{x_1} \phi_1(x_1)\,\psi(x_1, x_2) = P(x_3 \mid y).$$
By the definition of marginalization, the beliefs are equivalent to the marginal probabilities.
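Proposition 1 is also easy to check numerically. The sketch below uses random potentials on the path $1$-$2$-$3$ (the state count and variable names are arbitrary choices) and compares the BP beliefs to brute-force marginals.

```python
import numpy as np

rng = np.random.default_rng(0)
S = 4                                          # states per node (arbitrary)
phi = {i: rng.random(S) for i in (1, 2, 3)}
psi = {(1, 2): rng.random((S, S)),             # psi[(i, j)][x_i, x_j]
       (2, 3): rng.random((S, S))}

# Brute-force joint P(x1, x2, x3 | y), normalized by Z.
joint = np.einsum('a,b,c,ab,bc->abc',
                  phi[1], phi[2], phi[3], psi[(1, 2)], psi[(2, 3)])
joint /= joint.sum()

# The messages from the proof above.
m12 = phi[1] @ psi[(1, 2)]                     # sum_{x1} phi_1(x1) psi(x1, x2)
m32 = psi[(2, 3)] @ phi[3]                     # sum_{x3} phi_3(x3) psi(x2, x3)
m23 = (phi[2] * m12) @ psi[(2, 3)]             # message 2 -> 3 absorbs m12

b2 = phi[2] * m12 * m32
b2 /= b2.sum()
b3 = phi[3] * m23
b3 /= b3.sum()
print(np.allclose(b2, joint.sum(axis=(0, 2))))   # True: b2 equals the x2 marginal
print(np.allclose(b3, joint.sum(axis=(0, 1))))   # True: b3 equals the x3 marginal
```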
This is true for any tree graph, i.e. a graph where any two vertices are only connected by
one path [8]. For non-tree graphs, such as loopy or cyclic graphs where at least one pair
of vertices is connected by multiple paths, the BP algorithm does not compute the exact
marginal probabilities. It is possible to run the BP algorithm on a loopy graph, but it does
not always converge. One outcome is that the messages circulate indefinitely. However,
there are also cases of loopy graphs where the BP algorithm works well. It is not always
clear why BP converges. Empirical evidence shows that when BP converges, the beliefs are
a good approximation for the marginals [4].
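As an illustration of that observation, the following sketch runs loopy BP with parallel updates on the smallest cycle, $C_3$, with random potentials; the sweep budget and tolerance are arbitrary choices, and on this example the converged beliefs typically land close to, but not exactly on, the brute-force marginals.

```python
import numpy as np

rng = np.random.default_rng(1)
S, nodes = 3, [0, 1, 2]
neighbors = {0: [1, 2], 1: [2, 0], 2: [0, 1]}
phi = {i: rng.random(S) for i in nodes}
psi = {}
for a, b in [(0, 1), (1, 2), (2, 0)]:
    psi[(a, b)] = rng.random((S, S))          # psi[(a, b)][x_a, x_b]
    psi[(b, a)] = psi[(a, b)].T

msg = {(j, i): np.full(S, 1 / S) for j in nodes for i in neighbors[j]}
for sweep in range(500):                      # parallel ("flooding") updates
    new = {}
    for (j, i) in msg:
        prod = phi[j].copy()
        for k in neighbors[j]:
            if k != i:
                prod *= msg[(k, j)]
        m = psi[(j, i)].T @ prod              # sum over x_j
        new[(j, i)] = m / m.sum()
    converged = all(np.allclose(new[e], msg[e], atol=1e-12) for e in msg)
    msg = new
    if converged:
        break

# Brute-force marginal of node 0 for comparison.
joint = np.einsum('a,b,c,ab,bc,ca->abc', phi[0], phi[1], phi[2],
                  psi[(0, 1)], psi[(1, 2)], psi[(2, 0)])
joint /= joint.sum()
b0 = phi[0] * msg[(1, 0)] * msg[(2, 0)]
print(b0 / b0.sum())           # loopy BP belief at node 0
print(joint.sum(axis=(1, 2)))  # exact marginal: close, but not identical
```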
One way to understand why BP converges on loopy graphs is to consider an unwrapped graph. Let us look at the example $G = C_3$. Without loss of generality, let the root be $B$ (see Fig. 2). At step $t = 0$, the unwrapped graph is the 3-node path $A$-$B$-$C$. At step $t = 1$, it is the 5-node path $C''$-$A$-$B$-$C$-$A'$. At step $t = 2$, it is the 7-node path $B''$-$C''$-$A$-$B$-$C$-$A'$-$B'$.
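This unwrapping is mechanical enough to script. The toy sketch below reproduces the paths listed above for any number of steps (the node names and the single/double priming convention simply follow the example):

```python
def unwrap_c3(t):
    """Unwrapped path for C3 rooted at B after t steps: left-hand copies
    get a double prime, right-hand copies a single prime, as above."""
    left = {"A": "C", "B": "A", "C": "B"}     # predecessor on the cycle A->B->C->A
    right = {"A": "B", "B": "C", "C": "A"}    # successor on the cycle
    path, lend, rend = ["A", "B", "C"], "A", "C"
    for _ in range(t):
        lend, rend = left[lend], right[rend]
        path = [lend + "''"] + path + [rend + "'"]
    return path

print(unwrap_c3(2))   # ["B''", "C''", 'A', 'B', 'C', "A'", "B'"]
```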
Definition 1. Let $\tilde{b}_1^{\,\tau}$ be the belief of the root of the unwrapped graph at step $\tau$, and let $b_1^{\,t}$ be the belief of $X_1 \in C_3$ after $t$ iterations of loopy BP. Then after $\tau$ iterations of BP, $b_1^{\,\tau} = \tilde{b}_1^{\,\tau}$.
Since the unwrapped graph is acyclic and singly connected, BP on a loopy graph is equivalent to BP on the unwrapped graph. If the BP algorithm converges after $N$ iterations, then the beliefs are the marginal probabilities of the unwrapped graph of length $N$. In
comparing the probability distribution of the unwrapped graph to the original problem, we find that the algorithm will give us the most likely sequence of states, but that the marginal distributions for each node have no relationship to the marginal distributions of $G = C_3$ [7]. For example, consider the cyclic graph with joint distribution
$$P(X \mid Y) = \frac{1}{Z} \exp\!\left(-\frac{1}{T} E(X)\right),$$
where $E(X) = \sum_i g_i(x_i) + \sum_{(i,j)} h_{ij}(x_i, x_j)$ is an energy function. If we want to maximize the probability distribution to find the most likely sequence of states for random variables $X_i$, then we
want to minimize $E(X)$. Computation over the unwrapped graph as opposed to the cyclic graph changes the value of $T$, but if we're interested in the sequence that maximizes the joint distribution, the new temperature $T'$ doesn't change the result: the maximizer of $\exp(-E(X)/T)$ is the minimizer of $E(X)$ for any $T > 0$. This scaling, however,
does not translate to finding the marginal distribution. The beliefs for the unwrapped graph
give us the marginal distribution for the unwrapped graph, but not for the original cyclic
graph.
Convergence occurs when the center nodes in an unwrapped graph are independent of the nodes on the boundary: after a finite number of iterations $n$, additional iterations produce no change in the probabilities of the original center nodes [7].
3 Proposed Methodology
We will begin by learning how to implement the BP algorithm for simple graphs, such as a
path graph. We will first program the BP algorithm for the 3-node path graph, and then we will program the BP algorithm for a 3-node cyclic graph, to experiment with the optimal
parameters for convergence [8]. We will explore damping, which has been used to improve
BP when it doesn't converge or converges slowly. Damping turns the messages sent between nodes into a weighted average of the old and new estimates [6], as in the sketch below.
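A minimal sketch of that rule, with a hypothetical damping weight alpha (alpha = 1 recovers plain, undamped BP):

```python
import numpy as np

def damped(m_old: np.ndarray, m_new: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Damped BP: replace each freshly computed message with a weighted
    average of the new estimate and the previous one, then renormalize."""
    m = alpha * m_new + (1.0 - alpha) * m_old
    return m / m.sum()
```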
A significant amount of time will be spent on applications of BP, specifically in image restoration. Once we've coded the BP algorithm for simple graphs, we will test it out on an example, such as a blurry image. We can best model image restoration using a pairwise Markov Random Field (MRF). In the structure of a grid, we assign each pixel or group of pixels a position $i$ corresponding to a random variable $X_i$. Each $X_i$ has evidence $Y_i$ that is observed. We can model the relationship between unobserved and observed qualities with $\phi_i(x_i)$, with the assumption that $x_i$ should be similar to $y_i$ [8]. It is also reasonable to expect that neighboring pixels will be similar. This gives us the function $\psi_{ij}(x_i, x_j)$, which represents the dependence between $x_i$ and $x_j$. If we return to (1), the joint probability of the image conditional on observed qualities is
$$P(X = x \mid Y = y) = \frac{1}{Z} \prod_{(i,j)} \psi_{ij}(x_i, x_j) \prod_i \phi_i(x_i, y_i).$$
Conventionally, $\phi_i$ and $\psi_{ij}$ are defined through energy functions, where $\phi_i(x_i) = e^{-g(x_i, y_i)}$ and $\psi_{ij} = e^{-h(x_i, x_j)}$. Since we want to maximize the probability distribution, we want to minimize $g(x_i, y_i)$ and $h(x_i, x_j)$. To do so, it is useful to represent them as cost functions, most commonly with the $\ell_1$ or the $\ell_2$ norm. For our programming purposes, $g(x_i, y_i) = \|x_i - y_i\|_1$ and $h(x_i, x_j) = \|x_i - x_j\|_1$. We will experiment with different cost functions and norms, to see if this changes the convergence rate of BP on the pairwise MRF.
References
[1] C. Furtlehner, J.-M. Lasgouttes, and A. de La Fortelle, A belief propagation
approach to traffic prediction using probe vehicles, in 2007 IEEE Intelligent Transporta-
tion Systems Conference, IEEE, 2007, pp. 1022–1027.
[2] F. Lafarge and C. Mallet, Building large urban environments from unstructured
point data, in 2011 International Conference on Computer Vision, IEEE, 2011, pp. 1068–
1075.
[3] M. Mézard and A. Montanari, Information, physics, and computation, Oxford University Press, 2009.
[4] K. P. Murphy, Y. Weiss, and M. I. Jordan, Loopy belief propagation for approximate inference: An empirical study, in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 1999, pp. 467-475.
[6] P. Som and A. Chockalingam, Damped belief propagation based near-optimal equalization of severely delay-spread UWB MIMO-ISI channels, in 2010 IEEE International Conference on Communications, IEEE, 2010, pp. 1-5.
[7] Y. Weiss, Belief propagation and revision in networks with loops, (1997).