
1057-7149 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2019.2903318, IEEE
Transactions on Image Processing
IEEE TRANSACTIONS ON , VOL. **, NO. *, JANUARY 2019 2
have achieved high accuracy in non-occluded regions. However,
when occluded regions are included in the evaluation, the errors
almost double for most error metrics. Therefore, accurate
estimation near occluded regions is still a challenging problem.
In addition, all of the top-ten methods take more than 120 s to
run, which makes them difficult to apply in computationally
intensive applications. To tackle these problems, we develop
a stereo matching method with occlusion handling that achieves
higher accuracy at a lower computational cost.
The proposed method directly refines the WTA disparity
map computed from the raw matching cost, guided by the
color reference image. The reference image is segmented
into superpixels, and the proposed method operates at the
superpixel level, which makes it highly efficient. First, a
front-parallel disparity map is obtained by estimating the mean
disparities of the superpixels. Then a slanted-surfaces disparity
map is obtained by assigning each superpixel a plane. The
front-parallel to slanted-surfaces framework is realized by a
two-layer optimization. In the global optimization layer, the
front-parallel disparity map is estimated by MRF optimization. In
the local optimization layer, the slanted-surfaces disparity map
is refined by RANSAC plane fitting and a probability-based
disparity plane refinement. The two layers are connected
by two constraints: (i) the slanted-surfaces disparity map cannot
deviate far from the front-parallel disparity map, and (ii) the
two disparity maps share the same depth discontinuities. The first
constraint helps remove outliers and deal with degeneracy in
the RANSAC plane fitting. The second constraint is embedded
in a 3D neighborhood system and contributes to occlusion
handling. The proposed method is evaluated on the Middlebury
2014 and KITTI 2015 datasets and compared with state-of-the-art
disparity refinement methods. Experimental results
demonstrate its accuracy, efficiency, and robustness.
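As a minimal sketch of the starting point of this pipeline, the
winner-takes-all (WTA) step can be written as a per-pixel argmin
over a cost volume. The function name, the array layout (H, W, D)
and the toy cost volume below are assumptions made for
illustration only; they are not taken from the paper.

```python
import numpy as np


def wta_disparity(cost_volume: np.ndarray) -> np.ndarray:
    """Winner-takes-all: per pixel, pick the disparity with the lowest cost.

    cost_volume has shape (H, W, D); entry [y, x, d] is the raw matching
    cost of assigning disparity d to pixel (x, y). This layout is an
    assumption of the sketch.
    """
    return np.argmin(cost_volume, axis=2)


# Toy cost volume (hypothetical data): the cost is the absolute distance
# between each candidate disparity and a known ground-truth disparity.
H, W, D = 4, 6, 8
true_disp = np.tile(np.arange(W) % D, (H, 1))       # shape (H, W)
candidates = np.arange(D).reshape(1, 1, D)          # shape (1, 1, D)
cost_volume = np.abs(candidates - true_disp[:, :, None]).astype(np.float32)

wta = wta_disparity(cost_volume)                    # shape (H, W)
```

On real data the raw cost is noisy, so the WTA map contains
outliers; this is the map that the refinement takes as input.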
In summary, the main contributions of this paper are: (1) A
pure disparity refinement method that directly refines the WTA
disparity map with the guidance of the color reference image
and achieves state-of-the-art performance with occlusion
handling. (2) A 1D label MRF formulation with a novel data
term based on disparity distributions, together with a
theoretical analysis proving that the 1D label MRF cannot model
highly slanted surfaces. (3) A front-parallel to slanted-surfaces
framework, with a disparity plane refinement based on Bayesian
inference and Bayesian prediction, that makes the 1D label
approach robust to slanted surfaces.
II. RELATED WORK
This section focuses on MRF stereo methods and
segment-based stereo methods, which are most closely related to
our work. We refer readers to [2] and [17] for more comprehensive
reviews.
A. MRF Stereo Methods
Markov Random Field (MRF) stereo methods formulate
stereo matching as a labeling problem in which the goal is to
minimize a global energy function that measures the quality of
the labeling.
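For illustration, a generic pairwise MRF energy of this kind can
be written as below. The notation is an assumption introduced
here and is not taken from the paper: P is the set of sites
(pixels or superpixels), N the set of neighboring site pairs,
f_p the label of site p, D_p a data term, V_pq a smoothness term,
and lambda a weight balancing the two.

```latex
E(f) = \sum_{p \in \mathcal{P}} D_p(f_p)
     + \lambda \sum_{(p,q) \in \mathcal{N}} V_{pq}(f_p, f_q)
```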
Conventional MRF stereo methods [18], [19] assign each
pixel a 1D discrete disparity label. Optimization techniques
such as graph cuts [20], [21], belief propagation [22], [23] and
TRW [24] can be used to minimize the energy function. Graph-cut
based expansion moves and swap moves [18] have been shown to
perform well. These moves can update the labels of all
pixels simultaneously, and therefore the optimization is less
likely to be trapped in local minima. The drawback of 1D label
stereo methods is that they have difficulty modeling highly
slanted surfaces. 3D label stereo methods [25]–[27] have been
proposed to model the scene more accurately. These methods can
not only model highly slanted surfaces but also enforce
second-order smoothness constraints [25], [28], [29]. Therefore,
they usually have better accuracy than 1D label methods. However,
performing the global optimization at the pixel level makes the
computation costly.
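The difference between the two label spaces can be sketched as
follows. The symbols are assumptions chosen for illustration:
(u, v) are the image coordinates of pixel p, d_p its disparity,
and d_max the largest candidate disparity.

```latex
% 1D label: one discrete disparity value per pixel
d_p = f_p, \qquad f_p \in \{0, 1, \dots, d_{\max}\}

% 3D label: one disparity plane per pixel
d_p = a_p u + b_p v + c_p, \qquad f_p = (a_p, b_p, c_p) \in \mathbb{R}^3
```

With a 3D label, a slanted surface is described by a single label
shared by all of its pixels, whereas a 1D label must change from
pixel to pixel along the slant.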
Our method performs 1D label MRF inference at the superpixel
level. Since an image contains far fewer superpixels than
pixels, the inference is much faster.
Unlike previous work, ours takes the discrete mean disparity
of each superpixel as the label. Even on highly slanted surfaces,
the mean disparities can still be estimated correctly.
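The last point can be illustrated with a small sketch: on a
planar (slanted) surface, the mean disparity of a superpixel
equals the plane evaluated at the superpixel centroid, so the
mean stays well defined however steep the slant. The function
name, the plane coefficients and the 4x4 blocks used as stand-in
superpixels are assumptions for illustration; real superpixels
would come from an over-segmentation of the reference image.

```python
import numpy as np


def superpixel_mean_disparity(disp, labels):
    """Mean disparity per superpixel; labels are integers 0..K-1."""
    k = int(labels.max()) + 1
    sums = np.bincount(labels.ravel(), weights=disp.ravel(), minlength=k)
    counts = np.bincount(labels.ravel(), minlength=k)
    return sums / np.maximum(counts, 1)   # guard against empty labels


# Hypothetical slanted surface d = a*x + b*y + c on an 8x8 image.
a, b, c = 0.5, 0.25, 10.0
ys, xs = np.mgrid[0:8, 0:8]
disp = a * xs + b * ys + c

# Stand-in "superpixels": four 4x4 blocks, labeled 0..3 in row-major order.
labels = (ys // 4) * 2 + (xs // 4)

means = superpixel_mean_disparity(disp, labels)
# Discrete 1D label per superpixel: the mean rounded to the nearest integer.
discrete_labels = np.rint(means).astype(int)
```

For the top-left block the centroid is (1.5, 1.5), and the plane
gives 0.5 * 1.5 + 0.25 * 1.5 + 10 = 11.125, which is the mean
computed by the sketch.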
B. Segment-based Stereo Methods
Segment-based stereo methods [30]–[32] assume the scene
structure to be piecewise planar, so that estimating the
disparity map reduces to assigning a 3D disparity plane to each
segment. First, these methods segment the reference image
into regions with homogeneous color. Then an initial disparity
map is computed by a known stereo matching method and
candidate disparity planes are generated by plane fitting to the
disparity map. Finally, a global optimization, e.g., graph cuts
and belief propagation, is utilized to assign each segment an
optimal plane label. The final results rely on the quality of
the segmentation. To relax the segment constraints, several
improvements have been proposed. Over-segmentation is a
common solution to ensure that depth discontinuities only
occur at the boundaries of segments. Bleyer et al. [33] proposed
a pixel-wise MRF formulation that incorporated soft segment
constraints. Joint segmentation and disparity computation [34],
[35] can improve the segmentation quality during optimization.
However, these methods share a common drawback: the final
plane label of a segment is chosen from the candidate label
set. This finite set may not contain the correct label of the
segment, in which case the estimated disparity plane
is wrong and the error cannot be corrected. In the work of
Wang and Zheng [36], the total energy function is optimized
by cooperative optimization and is decomposed into the sum
of sub-target energy functionals which are locally optimized.
The optimization is performed iteratively, and false plane
labels can be corrected by the local optimization. Therefore,
their method is robust to the initial plane-fitting result.
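The plane-fitting step that generates candidate disparity planes
can be sketched with a generic RANSAC fit of d = a*x + b*y + c to
the initial disparities of one segment. This is a textbook RANSAC
loop under assumed parameters (iteration count, inlier threshold,
toy data), not the procedure of any of the cited methods nor the
constrained variant used in the proposed method.

```python
import numpy as np


def fit_plane(x, y, d):
    """Least-squares fit of d = a*x + b*y + c; returns array [a, b, c]."""
    A = np.column_stack([x, y, np.ones_like(x)])
    params, *_ = np.linalg.lstsq(A, d, rcond=None)
    return params


def ransac_plane(x, y, d, n_iters=200, threshold=1.0, seed=0):
    """Generic RANSAC plane fit; returns (plane parameters, inlier mask)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(x.size, dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(x.size, size=3, replace=False)
        A = np.column_stack([x[idx], y[idx], np.ones(3)])
        if abs(np.linalg.det(A)) < 1e-9:      # collinear sample: skip
            continue
        a, b, c = np.linalg.solve(A, d[idx])  # plane through the 3 samples
        inliers = np.abs(a * x + b * y + c - d) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("no valid plane found")
    # Refit on all inliers of the best hypothesis.
    plane = fit_plane(x[best_inliers], y[best_inliers], d[best_inliers])
    return plane, best_inliers


# Toy segment (hypothetical data): a 10x10 patch on a known plane,
# with 20 of the 100 disparities corrupted by a +15 offset.
ys, xs = np.mgrid[0:10, 0:10]
x = xs.ravel().astype(float)
y = ys.ravel().astype(float)
true_plane = np.array([0.3, -0.2, 20.0])
d = true_plane[0] * x + true_plane[1] * y + true_plane[2]
outlier_idx = np.random.default_rng(1).choice(x.size, size=20, replace=False)
d[outlier_idx] += 15.0

plane, inliers = ransac_plane(x, y, d)
```

In this toy case the corrupted disparities are rejected and the
refit recovers the generating plane. If a segment contains too
few reliable disparities, no sampled hypothesis is close to the
true plane, which is the failure mode the candidate-set methods
above cannot recover from.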
In contrast to previous works, which optimize a global
energy function, our method refines the disparity plane by
a local optimization and constrains smoothness implicitly.
Moreover, the disparity refinement requires no correlation
information between the images of the stereo pair and can thus
be performed on a single view.