
Applications of Belief Propagation on Loopy Graphs

Proposal submitted to the Applied Mathematics Directed Reading Program

Jennah Gosciak
January 3, 2021

1 Introduction
The goal of the belief propagation algorithm is to compute the marginal probability of a random variable, conditional on observed characteristics [8]. This marginal probability corresponds to a node in a graphical model, either a Bayesian network or a Markov Random Field. A Bayesian network is a directed acyclic graph that represents the conditional dependencies of each node through arrows. Often the Bayesian network tells a story with a clear direction; for example, a person's attitude toward health affects their smoking and drinking habits, which in turn affect the amount of tar in their lungs. A Markov Random Field is an undirected graph, cyclic or acyclic, that represents conditional dependencies satisfying the following Markov property for every node $v \in V = \{\text{set of all vertices}\}$:

$$P(X_v = x_v \mid \text{all other variables}) = P(X_v = x_v \mid \text{neighbors of } v)$$

Figure 1: Examples of a directed and an undirected graph [8].


Belief propagation is an algorithm used for inference on graphical models, often with ap-
plications in artificial intelligence and information theory. Graphical models are a useful tool
when analyzing complex probability distributions. They model the conditional dependence
relations among the random variables in a multivariate distribution. Each node in a graph-
ical model represents a random variable. The edges between nodes represent dependence
relationships.
There are no arrows between the nodes in a Markov Random Field (MRF), and direct dependence relationships can only be inferred as follows: for all $i, j \in V$, an edge between $x_j$ and $x_i$ implies that $x_j$ and $x_i$ are dependent. We can infer independence when there are no edges between two subsets of vertices: for $A, B \subseteq V$, $x_j \in A$ is independent of $x_i \in B$ only when $A$ and $B$ are completely disconnected, i.e., there are no edges between any $x_n \in A$ and $x_m \in B$.
Computing marginal probabilities directly is computationally difficult: the workload grows exponentially with the number of random variables, i.e., unobserved nodes. For a state space $X$ with $N$ nodes, the brute-force calculation of the marginal probabilities has computational complexity $O(|X|^N)$ [3]. With belief propagation, however, the cost of computing the marginal probabilities, known as beliefs, grows only linearly.
Belief Propagation (BP) represents an advancement in computational applications of information theory, computer vision, and artificial intelligence. Problems that seem difficult or complicated, with many nodes and loops, become simpler when handled with the BP algorithm. Its cost grows linearly, and it is more efficient than even Monte Carlo approximation [8].
The BP algorithm was first proposed in 1982 by mathematician Judea Pearl. His paper outlined a more formal system for computing beliefs and messages based on evidence, which had previously been calculated inefficiently on an ad-hoc basis. The paper, which focuses on trees, demonstrated that Bayesian inference extends to multi-valued variables. It also showed that on trees the beliefs computed by BP are exactly the marginal probabilities [5]. More recent research focuses on undirected, cyclic graphs. Even outside Pearl's tree and polytree examples (a polytree is a directed acyclic graph whose underlying undirected graph is a tree), loopy BP often converges, and the resulting beliefs are a relatively good approximation of the marginals [4].
We propose to first study the basic structure of the belief propagation algorithm for a
three node path graph. Then we will extend the belief propagation algorithm to a simple
loopy graph, and analyze the challenges associated with convergence among loopy graphs.
We will focus on coding the belief propagation algorithm in both of these cases: a simple path
graph and a simple loopy graph. The coded product will allow for interactive manipulation
of parameters. Finally, we will consider useful applications of the BP algorithm. Primarily, we will examine how BP is useful for image restoration and computer vision problems [8]. However, we also want to think of new applications related to the problems facing cities. For instance, researchers have applied the BP algorithm to real-time prediction of traffic conditions and to modeling cities from unstructured data points [1, 2].

2 Background
Belief propagation is an algorithm that allows for efficient approximation of marginal distributions. The output of this algorithm is a belief function that approximates the marginal distribution. Consider the following equation, in which $\psi$ and $\phi$ are cost functions:

$$P(X = x \mid Y = y) = \frac{1}{Z} \prod_{(i,j)} \psi_{ij}(x_i, x_j) \prod_i \phi_i(x_i, y_i) \qquad (1)$$

While the relationships between $X_i$ and $X_j$ here reflect the structure and dependence relationships of a pairwise Markov Random Field, the steps of the algorithm generalize to other graphical models.

Step 1: Compute the messages passed from node j to node i, denoted $m_{ji}(x_i)$. Here $N(v)$ denotes the set of all nodes that share an edge with $v$:

$$m_{ji}(x_i) \leftarrow \sum_{x_j} \phi_j(x_j)\, \psi_{ij}(x_j, x_i) \prod_{k \in N(j) \setminus i} m_{kj}(x_j) \qquad (2)$$

Step 2: For each node, we calculate the belief associated with it. The belief at node i is proportional to the evidence $y_i$ observed there; we denote this relationship $\phi_i(x_i)$. For a belief at node i:

$$b_i(x_i) = k\, \phi_i(x_i) \prod_{j \in N(i)} m_{ji}(x_i), \qquad (3)$$

where k is a normalization constant.
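To make the update rules concrete, below is a minimal sketch of how we might code (2) and (3) for a discrete pairwise MRF. The representation is our own illustrative choice, not fixed by the proposal: each variable takes one of n_states values, phi[i] is a vector, psi[(j, i)] is a matrix indexed by (x_j, x_i), and messages live in a dictionary keyed by directed edges.

```python
import numpy as np

def update_message(j, i, neighbors, phi, psi, messages, n_states):
    """Equation (2): m_ji(x_i) = sum over x_j of phi_j(x_j) psi(x_j, x_i)
    times the product of messages flowing into j from everyone except i."""
    incoming = np.ones(n_states)
    for k in neighbors[j]:
        if k != i:
            incoming *= messages[(k, j)]
    # (phi[j] * incoming) is a vector over x_j; multiplying by the
    # (x_j, x_i) matrix psi[(j, i)] sums out x_j, leaving a vector over x_i.
    return (phi[j] * incoming) @ psi[(j, i)]

def belief(i, neighbors, phi, messages):
    """Equation (3): b_i(x_i) = k phi_i(x_i) prod_{j in N(i)} m_ji(x_i),
    with k chosen so the belief sums to one."""
    b = phi[i].copy()
    for j in neighbors[i]:
        b *= messages[(j, i)]
    return b / b.sum()
```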

It is not difficult to show that for a tree graph, the beliefs exactly represent the marginal distribution of each node $x_i$. As an example, we will prove this result for a path graph with three nodes.
Proposition 1. Consider a path graph with three nodes and the corresponding probability distribution, conditional on observed characteristics:

$$P(X \mid Y) = \frac{1}{Z} \prod_{i \in V} \phi_i(x_i) \prod_{(i,j) \in E} \psi_{ij}(x_i, x_j)$$

We want to show that $\forall x_i$, $b_i(x_i) = P(x_i \mid y)$.


Proof. To prove this statement, we compute the messages between each node:

$$m_{12}(x_2) = \sum_{x_1} \phi_1(x_1)\, \psi(x_1, x_2) \qquad m_{21}(x_1) = \sum_{x_2} \phi_2(x_2)\, \psi(x_2, x_1) \sum_{x_3} \phi_3(x_3)\, \psi(x_3, x_2)$$
$$m_{32}(x_2) = \sum_{x_3} \phi_3(x_3)\, \psi(x_3, x_2) \qquad m_{23}(x_3) = \sum_{x_2} \phi_2(x_2)\, \psi(x_2, x_3) \sum_{x_1} \phi_1(x_1)\, \psi(x_1, x_2)$$

Following the BP algorithm, we arrive at:

$$b_1(x_1) = k\, \phi_1(x_1)\, m_{21}(x_1) = k\, \phi_1(x_1) \sum_{x_2} \phi_2(x_2)\, \psi(x_2, x_1) \sum_{x_3} \phi_3(x_3)\, \psi(x_3, x_2) = P(x_1 \mid y)$$
$$b_2(x_2) = k\, \phi_2(x_2)\, m_{32}(x_2)\, m_{12}(x_2) = k\, \phi_2(x_2) \sum_{x_3} \phi_3(x_3)\, \psi(x_3, x_2) \sum_{x_1} \phi_1(x_1)\, \psi(x_1, x_2) = P(x_2 \mid y)$$
$$b_3(x_3) = k\, \phi_3(x_3)\, m_{23}(x_3) = k\, \phi_3(x_3) \sum_{x_2} \phi_2(x_2)\, \psi(x_2, x_3) \sum_{x_1} \phi_1(x_1)\, \psi(x_1, x_2) = P(x_3 \mid y).$$

By the definition of marginalization, the beliefs are equivalent to the marginal probabilities. This is true for any tree graph, i.e., a graph in which any two vertices are connected by only one path [8]. For non-tree graphs, such as loopy or cyclic graphs where at least one pair of vertices is connected by multiple paths, the BP algorithm does not compute the exact marginal probabilities. It is possible to run the BP algorithm on a loopy graph, but it does not always converge; one outcome is that the messages circulate indefinitely. However, there are also cases of loopy graphs where the BP algorithm works well, and it is not always clear why BP converges. Empirical evidence shows that when BP converges, the beliefs are a good approximation of the marginals [4].
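As a first test of our coded product, we can check Proposition 1 numerically before experimenting with loops. The sketch below, reusing the illustrative update_message and belief helpers from above, compares the BP beliefs on the 3-node path against brute-force marginalization of the joint distribution with random positive potentials:

```python
# Sanity check of Proposition 1 on the path graph 1 - 2 - 3, reusing the
# illustrative helpers sketched earlier.
import numpy as np

rng = np.random.default_rng(0)
n_states = 4
neighbors = {1: [2], 2: [1, 3], 3: [2]}
phi = {i: rng.random(n_states) for i in neighbors}
psi = {}
for (i, j) in [(1, 2), (2, 3)]:
    m = rng.random((n_states, n_states))
    psi[(i, j)], psi[(j, i)] = m, m.T  # symmetric pairwise potentials

# On a tree, two sweeps suffice: leaves toward the center, then back out.
messages = {(j, i): np.ones(n_states) for i in neighbors for j in neighbors[i]}
for (j, i) in [(1, 2), (3, 2), (2, 1), (2, 3)]:
    messages[(j, i)] = update_message(j, i, neighbors, phi, psi, messages, n_states)

# Brute-force marginal of node 1 from the full joint distribution.
joint = np.einsum('a,b,c,ab,bc->abc', phi[1], phi[2], phi[3],
                  psi[(1, 2)], psi[(2, 3)])
brute = joint.sum(axis=(1, 2)) / joint.sum()
assert np.allclose(belief(1, neighbors, phi, messages), brute)
```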
One way to understand why BP converges on loopy graphs is to consider an unwrapped graph. Let us look at the example $G = C_3$. Without loss of generality, let the root be B (see Fig. 2). At step t = 0, the unwrapped graph is the 3-node path A B C. At step t = 1, it is the 5-node path C'' A B C A'. At step t = 2, it is the 7-node path B'' C'' A B C A' B'.

Figure 2: An example of an unwrapped network [7].

Definition 1. Let $\tilde{b}_1^{\tau}$ be the belief of the root of the unwrapped graph after $\tau$ steps, and let $b_1^{\tau}$ be the belief of the corresponding node $X_1 \in C_3$. After $\tau$ iterations of BP, $b_1^{\tau} = \tilde{b}_1^{\tau}$.
Since the unwrapped graph is acyclic and singly connected, BP on a loopy graph is equivalent to BP on the unwrapped graph. If the BP algorithm converges after N iterations, then the beliefs are the marginal probabilities of the unwrapped graph of length N. In comparing the probability distribution of the unwrapped graph to the original problem, we find that the algorithm gives us the most likely sequence of states, but that the marginal distributions for each node of the unwrapped graph have no direct relationship to the marginal distributions of $G = C_3$ [7]. For example, consider the cyclic graph with joint distribution
$$P(X \mid Y) = \frac{1}{Z} \exp\left(-\frac{1}{T} E(X)\right),$$

where $E(X)$ is an energy function equal to $\sum_i g_i(x_i) + \sum_{(i,j)} h_{ij}(x_i, x_j)$. If we want to maximize the probability distribution to find the most likely sequence of states for the random variables $X_i$, then we want to minimize $E(X)$. Computation over the unwrapped graph as opposed to the cyclic graph changes the value of T, but if we are interested in the sequence that maximizes the joint distribution, the new temperature $T'$ does not change the result. This scaling, however, does not translate to finding the marginal distribution: the beliefs for the unwrapped graph give us the marginal distribution for the unwrapped graph, but not for the original cyclic graph.
Convergence occurs when the center nodes in an unwrapped graph are independent of the nodes on the boundary: after a finite number of iterations n, additional iterations produce no change in the probabilities of the original center nodes [7].
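To make the unwrapping construction tangible, here is a short sketch, our own illustration rather than code from [7], that generates the node sequence of the unwrapped path for $G = C_3$ after t steps and reproduces the sequences listed above:

```python
def unwrap_c3(root, t):
    """Unwrap the 3-cycle A-B-C around the given root for t steps:
    walk the cycle forward to the right of the root and backward to
    the left, so each step adds one copied node on each side."""
    cycle = ['A', 'B', 'C']
    r = cycle.index(root)
    right = [cycle[(r + k) % 3] for k in range(t + 2)]        # root, then forward
    left = [cycle[(r - k) % 3] for k in range(t + 1, 0, -1)]  # backward copies
    return left + right

print(unwrap_c3('B', 0))  # ['A', 'B', 'C']
print(unwrap_c3('B', 1))  # ['C', 'A', 'B', 'C', 'A']
print(unwrap_c3('B', 2))  # ['B', 'C', 'A', 'B', 'C', 'A', 'B']
```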

3 Proposed Methodology
We will begin by learning how to implement the BP algorithm for simple graphs, such as a path graph. We will first program the BP algorithm for the 3-node path graph, and then we will program the BP algorithm for a 3-node cyclic graph, to experiment with the optimal parameters for convergence [8]. We will explore damping, which has been used to improve BP when it does not converge or converges slowly. Damping turns the messages sent between nodes into a weighted average of old estimates and new estimates [6].
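As a hedged sketch of what damping would look like in our code, using the illustrative message dictionary from Section 2 and an assumed damping weight lam:

```python
# Damped update: a weighted average of the fresh message and the old one.
# lam = 1 recovers standard BP; smaller lam slows the updates down, which
# can help on loopy graphs where plain BP oscillates.
lam = 0.5  # illustrative damping weight; we plan to tune this experimentally
for (j, i) in list(messages):
    new = update_message(j, i, neighbors, phi, psi, messages, n_states)
    messages[(j, i)] = lam * new + (1 - lam) * messages[(j, i)]
```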
A significant amount of time will be spent on applications of BP, specifically in image restoration. Once we have coded the BP algorithm for simple graphs, we will test it on an example, such as a blurry image. We can best model image restoration using a pairwise Markov Random Field (MRF). In the structure of a grid, we assign each pixel or group of pixels a position i corresponding to a random variable $X_i$. Each $X_i$ has evidence $Y_i$ that is observed. We can model the relationship between unobserved and observed qualities with $\phi_i(x_i)$, with the assumption that $x_i$ should be similar to $y_i$ [8]. It is also reasonable to expect that neighboring pixels will be similar. This gives us the function $\psi(x_i, x_j)$, which represents the dependence between $x_i$ and $x_j$. If we return to (1), the joint probability of the image conditional on observed qualities is

$$P(X = x \mid Y = y) = \frac{1}{Z} \prod_{(i,j)} \psi_{ij}(x_i, x_j) \prod_i \phi_i(x_i, y_i).$$

Conventionally, $\phi_i$ and $\psi_{ij}$ are defined through energy functions, with $\phi_i(x_i) = e^{-g(x_i, y_i)}$ and $\psi_{ij} = e^{-h(x_i, x_j)}$. Since we want to maximize the probability distribution, we want to minimize $g(x_i, y_i)$ and $h(x_i, x_j)$. To do so, it is useful to represent them as cost functions, most commonly with the $\ell_1$ or $\ell_2$ norm. For our programming purposes, $g(x_i, y_i) = \|x_i - y_i\|_1$ and $h(x_i, x_j) = \|x_i - x_j\|_1$. We will experiment with different cost functions and norms to see if this changes the convergence rate of BP on the pairwise MRF.
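To connect these cost functions back to the potentials in (1), the following minimal sketch builds $\phi_i$ and $\psi_{ij}$ from the $\ell_1$ costs above; the discrete intensity levels are an illustrative placeholder for whatever state space we settle on:

```python
import numpy as np

levels = np.arange(8)  # illustrative discrete intensity levels

def phi_from_observation(y_i):
    """phi_i(x_i) = exp(-g(x_i, y_i)) with g(x_i, y_i) = |x_i - y_i|."""
    return np.exp(-np.abs(levels - y_i))

def psi_pairwise():
    """psi_ij(x_i, x_j) = exp(-h(x_i, x_j)) with h(x_i, x_j) = |x_i - x_j|."""
    return np.exp(-np.abs(levels[:, None] - levels[None, :]))
```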

References

[1] C. Furtlehner, J.-M. Lasgouttes, and A. de La Fortelle, A belief propagation approach to traffic prediction using probe vehicles, in 2007 IEEE Intelligent Transportation Systems Conference, IEEE, 2007, pp. 1022–1027.

[2] F. Lafarge and C. Mallet, Building large urban environments from unstructured point data, in 2011 International Conference on Computer Vision, IEEE, 2011, pp. 1068–1075.

[3] M. Mezard and A. Montanari, Information, Physics, and Computation, Oxford University Press, 2009.

[4] K. P. Murphy, Y. Weiss, and M. I. Jordan, Loopy belief propagation for approximate inference: An empirical study, in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 1999, pp. 467–475.

[5] J. Pearl, Reverend Bayes on inference engines: A distributed hierarchical approach, Cognitive Systems Laboratory, School of Engineering and Applied Science, 1982.

[6] P. Som and A. Chockalingam, Damped belief propagation based near-optimal equalization of severely delay-spread UWB MIMO-ISI channels, in 2010 IEEE International Conference on Communications, IEEE, 2010, pp. 1–5.

[7] Y. Weiss, Belief propagation and revision in networks with loops, 1997.

[8] J. S. Yedidia, W. T. Freeman, and Y. Weiss, Understanding belief propagation and its generalizations, Exploring Artificial Intelligence in the New Millennium, 8 (2003), pp. 236–239.
