
Efficient Inference in Fully Connected CRFs with
Gaussian Edge Potentials
Philipp Krähenbühl
Computer Science Department
Stanford University
Vladlen Koltun
Computer Science Department
Stanford University
Abstract
Most state-of-the-art techniques for multi-class image segmentation and labeling
use conditional random fields defined over pixels or image regions. While
region-level models often feature dense pairwise connectivity, pixel-level models are
considerably larger and have only permitted sparse graph structures. In this paper, we
consider fully connected CRF models defined on the complete set of pixels in an
image. The resulting graphs have billions of edges, making traditional inference
algorithms impractical. Our main contribution is a highly efficient approximate
inference algorithm for fully connected CRF models in which the pairwise edge
potentials are defined by a linear combination of Gaussian kernels. Our
experiments demonstrate that dense connectivity at the pixel level substantially improves
segmentation and labeling accuracy.
1 Introduction
Multi-class image segmentation and labeling is one of the most challenging and actively studied
problems in computer vision. The goal is to label every pixel in the image with one of several
predetermined object categories, thus concurrently performing recognition and segmentation of multiple
object classes. A common approach is to pose this problem as maximum a posteriori (MAP)
inference in a conditional random field (CRF) defined over pixels or image patches [8, 12, 18, 19, 9].
The CRF potentials incorporate smoothness terms that maximize label agreement between similar
pixels, and can integrate more elaborate terms that model contextual relationships between object
classes.
Basic CRF models are composed of unary potentials on individual pixels or image patches and
pairwise potentials on neighboring pixels or patches [19, 23, 7, 5]. The resulting adjacency CRF
structure is limited in its ability to model long-range connections within the image and generally results
in excessive smoothing of object boundaries. In order to improve segmentation and labeling
accuracy, researchers have expanded the basic CRF framework to incorporate hierarchical connectivity
and higher-order potentials defined on image regions [8, 12, 9, 13]. However, the accuracy of these
approaches is necessarily restricted by the accuracy of unsupervised image segmentation, which is
used to compute the regions on which the model operates. This limits the ability of region-based
approaches to produce accurate label assignments around complex object boundaries, although
significant progress has been made [9, 13, 14].
In this paper, we explore a different model structure for accurate semantic segmentation and labeling.
We use a fully connected CRF that establishes pairwise potentials on all pairs of pixels in the image.
Fully connected CRFs have been used for semantic image labeling in the past [18, 22, 6, 17], but the
complexity of inference in fully connected models has restricted their application to sets of hundreds
of image regions or fewer. The segmentation accuracy achieved by these approaches is again limited
by the unsupervised segmentation that produces the regions. In contrast, our model connects all