Segment-based Disparity Refinement with Occlusion Handling for Stereo Matching


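The following key points can be drawn from the file information provided:

### Key points from the title

- **Occlusion handling in stereo matching**: Stereo matching is a key step in 3D reconstruction. It computes disparity by matching the projections of objects in two rectified color images. Occlusion handling is an important part of stereo matching and addresses the problem of the foreground occluding the background.
- **Disparity refinement**: Disparity refinement is the process of optimizing an existing disparity map to improve its accuracy. The paper presents a segment-based disparity refinement method.

### Key points from the description

- **Segment-based disparity refinement**: The core of the method is to refine the winner-take-all (WTA) disparity map directly, improving disparity accuracy by exploring the statistical significance of that map. The WTA disparity map is computed directly from the raw matching cost of the stereo image pair, so it may contain errors.
- **Improved random sample consensus (RANSAC)**: To fit a disparity plane to each superpixel, the paper uses an improved RANSAC. RANSAC is a robust parameter estimation method that is widely used in computer vision to reject outliers.
- **Two-layer optimization**: A two-layer optimization refines the disparity planes. In the global optimization, Markov Random Fields (MRF) inference estimates the mean disparity of each superpixel, and a 3D neighborhood system is derived from the mean disparities to handle occlusion. In the local optimization, a probability model based on Bayesian inference and Bayesian prediction implicitly achieves second-order smoothness among 3D neighbors.

### Key points from the tags

- **Stereo matching**: Stereo matching is a research area in computer vision that aims to recover depth information from images taken from different viewpoints.
- **MRF (Markov Random Fields)**: An MRF is a statistical model that describes the mutual dependence between pixels at different positions in a random field. In image processing, MRFs can be used for denoising, segmentation, and post-processing tasks such as the disparity map optimization described here.

### Key points from the content

- **Superpixel**: A superpixel is a small, homogeneous, visually consistent image region. In stereo matching, dividing the reference image into superpixels helps improve both efficiency and accuracy.
- **Disparity plane fitting**: Disparity plane fitting finds, for each superpixel, a plane that describes its disparities in order to refine the stereo matching result.
- **Bayesian inference and prediction**: Bayesian inference is a method of probabilistic inference based on Bayes' theorem that combines prior knowledge with new evidence. Here it is applied to disparity refinement to obtain more accurate stereo matching.
- **3D neighborhood system**: The 3D neighborhood system is a neighborhood structure built on 3D spatial relationships in the stereo images. It is essential for occlusion handling and disparity map optimization.
- **Low computational cost solution**: The "matching cost computation + disparity refinement" framework proposed in the paper is a possible solution for producing accurate disparity maps at low computational cost.

### Application scenarios

- **3D reconstruction**: Stereo matching is a key step in 3D reconstruction. Depth can be reconstructed directly from the matched disparity map and the camera parameters.
- **Autonomous driving**: In the vision system of a self-driving car, stereo matching is used to compute the 3D structure of the surroundings, which is especially valuable for obstacle detection and avoidance.
- **Robot navigation**: With stereo matching, a robot can build a 3D map of its working environment, which supports path planning and obstacle avoidance.

### Conclusion

- **High-accuracy stereo matching**: By combining global and local optimization, the proposed segment-based disparity refinement method handles occluded regions effectively and produces accurate stereo matching results at low computational cost.
- **Practicality and efficiency**: Experimental results on the Middlebury and KITTI datasets demonstrate the accuracy of the method and its ability to handle occlusion, and show its potential as a low-cost solution.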


1057-7149 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2019.2903318, IEEE Transactions on Image Processing

IEEE TRANSACTIONS ON , VOL. **, NO. *, JANUARY 2019 1

Segment-based Disparity Refinement with Occlusion Handling for Stereo Matching

Tingman Yan, Yangzhou Gan, Member, IEEE, Zeyang Xia, Senior Member, IEEE, and Qunfei Zhao

Abstract—In this paper, we propose a disparity refinement method that directly refines the winner-take-all (WTA) disparity map by exploring its statistical significance. Following the primary steps of segment-based stereo matching, the reference image is over-segmented into superpixels and a disparity plane is fitted to each superpixel by an improved random sample consensus (RANSAC). We design a two-layer optimization to refine the disparity planes. In the global optimization, the mean disparities of superpixels are estimated by Markov Random Fields (MRF) inference, and a 3D neighborhood system is then derived from the mean disparities for occlusion handling. In the local optimization, a probability model exploiting Bayesian inference and Bayesian prediction is adopted, which implicitly achieves second-order smoothness among 3D neighbors. The two-layer optimization is a pure disparity refinement method because no correlation information between the stereo image pair is required during the refinement. Experimental results on the Middlebury and KITTI datasets demonstrate that the proposed method performs accurate stereo matching at a faster speed and handles occlusion effectively. This indicates that the “matching cost computation + disparity refinement” framework is a possible solution for producing accurate disparity maps at low computational cost.

Index Terms—Stereo vision, disparity refinement, Markov random fields, RANSAC, Bayesian inference

I. INTRODUCTION

Stereo matching is a key step in 3D reconstruction. It takes two rectified color images as input and matches object projections in the image domain to compute disparities. Depth can be directly reconstructed from disparity and camera parameters. Foreground-background occlusion, which is almost inevitable, makes the matching difficult because the occluded regions are visible in only one view. Matching is also ambiguous in scenes with low or repetitive textures. Other challenges include imperfect rectification and radiometric differences. The Middlebury 2014 benchmark [1] provides an evaluation that contains all of these challenges; researchers can upload their results for fair comparison. Besides accuracy, fast computation is also required for real-time applications.
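As a minimal sketch of how depth follows from disparity and camera parameters, the standard relation for a rectified pinhole stereo pair, Z = f·B/d, can be written as below. The function name and the parameters `focal_px` and `baseline_m` are illustrative assumptions and do not come from the paper.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres of a point observed with the given disparity in a rectified pair.

    focal_px is the focal length in pixels and baseline_m the camera baseline in
    metres (both hypothetical parameter names chosen for this sketch).
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative values are
        # treated the same way here for simplicity.
        return float("inf")
    return focal_px * baseline_m / disparity_px
```

For instance, with an assumed focal length of 700 px and a baseline of 0.5 m, a disparity of 35 px corresponds to a depth of 10 m.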

T. Yan and Q. Zhao are with the Department of Automation, Shanghai Jiao Tong University, Shanghai, 200240 China. (E-mail: [email protected]; [email protected])

Y. Gan and Z. Xia are with Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China, and also with CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Shenzhen 518055, China. (E-mail: [email protected]; zy[email protected]).

Stereo matching methods usually consist of (subsets of) four steps [2]: matching cost computation, cost aggregation, disparity computation/optimization, and disparity refinement. The matching cost measures the pixel-wise or patch-wise similarity between image locations. Common measures include absolute differences (AD), the sampling-insensitive measure (BT) [3], normalized cross-correlation (NCC), census and rank transforms [4], and combinations of these such as AD-Census [5]. Recently, powerful convolutional neural networks (CNNs) have been applied to matching cost computation. Zbontar and LeCun [6] developed the MC-CNN method, which learns the similarity measure between image patches. A fast network architecture that is able to produce an accurate result within one second was proposed by Luo et al. [7].
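To make one of the matching costs listed above concrete, here is a small sketch of a 3x3 census transform with a Hamming-distance cost. It is a generic textbook formulation, not the cost used in the paper; the window size, the border clamping, and the penalty for out-of-range candidates are assumptions made for this example.

```python
def census_3x3(img, y, x):
    """Return the 8-bit census code of pixel (y, x) in a 2D list image."""
    h, w = len(img), len(img[0])
    center = img[y][x]
    code = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            # Clamp coordinates so border pixels reuse the nearest valid neighbor.
            ny = min(max(y + dy, 0), h - 1)
            nx = min(max(x + dx, 0), w - 1)
            # Each bit records whether the neighbor is darker than the center.
            code = (code << 1) | (1 if img[ny][nx] < center else 0)
    return code


def census_cost(left, right, y, x, d):
    """Hamming distance between the codes of left (y, x) and right (y, x - d)."""
    if x - d < 0:
        return 8  # assumption: out-of-range candidates get the maximum cost
    diff = census_3x3(left, y, x) ^ census_3x3(right, y, x - d)
    return bin(diff).count("1")
```

Because the code depends only on intensity ordering, the cost is unaffected by monotonic brightness changes between the two views, which is why census-style costs are popular under radiometric differences.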

Cost aggregation and disparity computation/optimization are two key steps that determine the accuracy of stereo methods. Local stereo methods average the matching costs, with or without weights [8], in a fixed-size window, and disparities are computed by applying the winner-take-all (WTA) operation to the cost volume. Yang [9] proposed a non-local cost aggregation on the minimum spanning tree (MST) structure. This idea was extended to 3D non-local cost aggregation on a 3D-multiple-MST structure [10]. Global stereo methods usually omit the cost aggregation step. Instead, a global energy function that penalizes depth discontinuities is optimized on the cost volume to compute disparity. Although the global method is much more accurate than the local one, it is far more computationally expensive.
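The WTA operation mentioned above can be sketched in a few lines. The layout of the cost volume (a nested list indexed as `cost_volume[y][x][d]`) is an assumption made for this example.

```python
def winner_take_all(cost_volume):
    """Pick, for every pixel, the disparity index with the lowest matching cost.

    cost_volume[y][x] is a list of costs indexed by candidate disparity.
    Ties are resolved in favor of the smallest disparity.
    """
    return [
        [min(range(len(costs)), key=costs.__getitem__) for costs in row]
        for row in cost_volume
    ]
```

Because each pixel is decided independently, the resulting map is cheap to compute but noisy, which is what makes a subsequent refinement stage worthwhile.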

Disparity refinement is designed to further improve the results in hard regions, such as occluded regions and low-texture regions. Most refinement methods follow a detection-and-filling scheme, followed by a filtering step. The left-right consistency check (LRC) [11] is commonly used to detect outliers. Jang and Ho [12] proposed an energy function to detect occlusions and classified them into leftmost occlusions and inner occlusions. Banno and Ikeuchi [13] labeled pixels that failed the LRC as low confidence and introduced a directed anisotropic diffusion to refine these pixels. Huang and Zhang [14] proposed a fast refinement that includes belief aggregation for outlier detection and belief propagation for filling. In the work of Mei et al. [5], outliers were detected and classified into occlusions and mismatches, and an iterative region voting was then applied to interpolate these outliers accordingly. Filters such as the bilateral filter and the weighted median filter [15] have also been employed for disparity refinement. It has been shown [16] that multi-step and iterative refinement strategies can yield competitive results. However, in large occluded regions, inner occlusions cannot be directly refined by these strategies and cumulative error may be introduced.
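The left-right consistency check described above can be sketched as follows. The sketch assumes integer disparities, the common convention that a left pixel at column x matches the right pixel at column x - d, and a hypothetical tolerance parameter `tol`; none of these details are taken from the paper.

```python
def left_right_check(disp_left, disp_right, tol=1):
    """Return a mask that is True where the left disparity fails the LRC.

    A left pixel (y, x) with disparity d is projected to (y, x - d) in the right
    view. It is flagged when the projection leaves the image or when the right
    disparity there differs from d by more than tol.
    """
    h, w = len(disp_left), len(disp_left[0])
    outlier = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = disp_left[y][x]
            xr = x - d
            if xr < 0 or xr >= w or abs(disp_right[y][xr] - d) > tol:
                outlier[y][x] = True
    return outlier
```

Flagged pixels are the ones a detection-and-filling scheme would subsequently interpolate from reliable neighbors.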

In the Middlebury 2014 benchmark [1], top-ranked methods have achieved high accuracy in non-occluded regions. However, the errors evaluated over regions that include occlusions are almost doubled for most error metrics. Therefore, accurate estimation near occluded regions is still a challenging problem. In addition, all of the top-ten methods run for more than 120 s, which makes them hard to apply in computationally constrained applications. To tackle these problems, we developed a stereo matching method with occlusion handling that achieves higher accuracy at lower computational cost.

The proposed method directly refines the WTA disparity map computed from the raw matching cost with the guidance of the color reference image. The reference image is segmented into superpixels and the proposed method operates at the superpixel level, which makes it highly efficient. First, a front-parallel disparity map is obtained by estimating the mean disparities of superpixels. Then a slanted-surfaces disparity map is refined by assigning a plane to each superpixel. The front-parallel to slanted-surfaces framework is achieved by a two-layer optimization. In the global optimization layer, the front-parallel disparity map is estimated by MRF optimization. In the local optimization layer, the slanted-surfaces disparity map is refined by the RANSAC plane fitting and the probability-based disparity plane refinement. The two layers are connected by two constraints: the slanted-surfaces disparity map cannot deviate far from the front-parallel disparity map, and the two disparity maps share the same depth discontinuities. The first constraint helps remove outliers and deal with degeneracy in the RANSAC plane fitting. The second constraint is embedded in a 3D neighborhood system and contributes to occlusion handling. The proposed method is evaluated on the Middlebury 2014 and KITTI 2015 datasets and compared with state-of-the-art disparity refinement methods. Experimental results demonstrate its accuracy, efficiency, and robustness.
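To illustrate the plane-fitting step, the following is a sketch of plain RANSAC that fits a disparity plane d = a·x + b·y + c to the (x, y, d) samples of one superpixel. It is the textbook algorithm, not the improved variant the paper proposes, whose details are not given in this excerpt. The function names, the iteration count, the inlier threshold, and the fixed seed are all assumptions.

```python
import random


def fit_plane(p1, p2, p3):
    """Plane d = a*x + b*y + c through three (x, y, d) samples, or None if collinear."""
    (x1, y1, d1), (x2, y2, d2), (x3, y3, d3) = p1, p2, p3
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if abs(det) < 1e-12:
        return None  # the three image positions are collinear (degenerate sample)
    a = ((d2 - d1) * (y3 - y1) - (d3 - d1) * (y2 - y1)) / det
    b = ((x2 - x1) * (d3 - d1) - (x3 - x1) * (d2 - d1)) / det
    c = d1 - a * x1 - b * y1
    return a, b, c


def ransac_plane(points, iterations=200, threshold=0.5, seed=0):
    """Return the plane with the most inliers and its inlier count."""
    rng = random.Random(seed)
    best_model, best_count = None, -1
    for _ in range(iterations):
        model = fit_plane(*rng.sample(points, 3))
        if model is None:
            continue
        a, b, c = model
        # An inlier is a sample whose disparity lies within threshold of the plane.
        count = sum(1 for x, y, d in points if abs(a * x + b * y + c - d) <= threshold)
        if count > best_count:
            best_model, best_count = model, count
    return best_model, best_count
```

In a WTA disparity map, a superpixel may contain many mismatched pixels, so the consensus step is what keeps a handful of gross outliers from tilting the fitted plane.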

In summary, the main contributions of this paper are: (1) A pure disparity refinement method that directly refines the WTA disparity map with the guidance of the color reference image and achieves state-of-the-art performance with occlusion handling. (2) A 1D label MRF formulation with a novel data term based on disparity distributions, together with a theoretical analysis proving that the 1D label MRF cannot model highly slanted surfaces. (3) A front-parallel to slanted-surfaces framework with a disparity plane refinement based on Bayesian inference and Bayesian prediction that makes the 1D label approach robust to slanted surfaces.

II. RELATED WORK

This section mainly focuses on MRF stereo methods and segment-based stereo methods, which are most closely related to our work. We refer readers to [2] and [17] for more comprehensive reviews.

A. MRF Stereo Methods

Markov Random Fields (MRF) stereo methods formalize stereo matching as a labeling problem, and the goal is to optimize a global energy function that measures the quality of the labeling.
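As a rough illustration of such an energy, the sketch below evaluates a generic pairwise MRF energy: a sum of per-node data costs plus a weighted smoothness penalty over neighboring nodes. The truncated linear penalty, the weight `lam`, and the truncation `trunc` are common choices assumed for this example and are not the energy defined in the paper.

```python
def mrf_energy(labels, data_cost, edges, lam=1.0, trunc=2):
    """Energy of a labeling: data term plus lam times a truncated linear smoothness term.

    labels[i] is the discrete label of node i, data_cost[i][l] the cost of giving
    node i the label l, and edges a list of (i, j) neighbor pairs.
    """
    data = sum(data_cost[i][labels[i]] for i in range(len(labels)))
    smooth = sum(min(abs(labels[i] - labels[j]), trunc) for i, j in edges)
    return data + lam * smooth
```

An optimizer such as graph cuts or belief propagation searches for the labeling that minimizes this quantity; the function above only scores a given labeling.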

Conventional MRF stereo methods [18], [19] assign each pixel a 1D discrete disparity label. Optimization techniques such as graph cuts [20], [21], belief propagation [22], [23], and TRW [24] can be used to minimize the energy function. Graph-cuts-based expansion moves and swap moves [18] have been shown to perform well. These moves can update the labels of all pixels simultaneously, so the optimization is less likely to be trapped in local minima. The drawback of 1D label stereo methods is their difficulty in modeling highly slanted surfaces. 3D label stereo methods [25]–[27] have been proposed to model the scene more accurately. These methods can not only model highly slanted surfaces but also enforce second-order smoothness constraints [25], [28], [29]. Therefore, they are usually more accurate than 1D label methods. However, the global optimization at the pixel level complicates the computation.

Our method performs 1D label MRF inference at the superpixel level. Since the number of superpixels in an image is much smaller than the number of pixels, the inference is much faster. Unlike previous work, ours takes the discrete mean disparity of a superpixel as the label. Even on highly slanted surfaces, the mean disparities can still be correctly estimated.
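The paper's MRF data term is described as being based on disparity distributions, but its exact form is not given in this excerpt. The sketch below therefore only shows how a per-superpixel disparity histogram and a plain average might be gathered from a WTA disparity map and a superpixel label map. The function name and data layout are assumptions, and the plain average is an illustration rather than the paper's estimator.

```python
from collections import Counter, defaultdict


def superpixel_disparity_stats(disparity, segments):
    """Collect a disparity histogram and the mean disparity for every superpixel.

    disparity and segments are 2D lists of the same shape; segments[y][x] is the
    superpixel label of pixel (y, x).
    """
    hist = defaultdict(Counter)
    for d_row, s_row in zip(disparity, segments):
        for d, s in zip(d_row, s_row):
            hist[s][d] += 1
    mean = {
        s: sum(d * n for d, n in h.items()) / sum(h.values())
        for s, h in hist.items()
    }
    return hist, mean
```

A plain average is sensitive to mismatched pixels, which is one reason an inference step over the whole distribution is more attractive than the mean alone.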

B. Segment-based Stereo Methods

Segment-based stereo methods [30]–[32] assume the scene structure to be piece-wise planar, so the estimation of the disparity map becomes the assignment of a 3D disparity plane to each segment. First, these methods segment the reference image into regions of homogeneous color. Then an initial disparity map is computed by a known stereo matching method and candidate disparity planes are generated by fitting planes to the disparity map. Finally, a global optimization, e.g., graph cuts or belief propagation, is used to assign each segment an optimal plane label. The final results rely on the quality of the segmentation. To relax the segment constraints, several improvements have been proposed. Over-segmentation is a common solution to ensure that depth discontinuities occur only at the boundaries of segments. Bleyer et al. [33] proposed a pixel-wise MRF formulation that incorporated soft segment constraints. Joint segmentation and disparity computation [34], [35] can improve the segmentation quality during optimization. However, these methods have a common drawback. The final plane label of a segment is assigned from the candidate label set. This finite set may not contain the correct label of the segment, in which case the estimated disparity plane is wrong and the error cannot be corrected. In the work of Wang and Zheng [36], the total energy function is optimized by cooperative optimization and is decomposed into a sum of sub-target energy functionals that are locally optimized. The optimization is performed iteratively and a false plane label can be corrected by the local optimization. Therefore, their method is robust to the initial plane-fitting result.
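The piece-wise planar assumption can be made concrete with a short sketch that turns one plane per segment into a dense disparity map, with each pixel (x, y) in a segment receiving a·x + b·y + c. The dictionary layout of `planes` and the function name are assumptions for this example.

```python
def render_planes(segments, planes):
    """Build a disparity map in which pixel (x, y) of segment s gets a*x + b*y + c.

    segments[y][x] is the segment label and planes[s] the (a, b, c) parameters
    assigned to segment s.
    """
    disparity = []
    for y, row in enumerate(segments):
        out_row = []
        for x, s in enumerate(row):
            a, b, c = planes[s]
            out_row.append(a * x + b * y + c)
        disparity.append(out_row)
    return disparity
```

A plane with a = b = 0 gives every pixel of the segment the same disparity, while nonzero a or b describes a slanted surface.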

In contrast to previous works that optimize a global energy function, our method refines the disparity plane by a local optimization and constrains smoothness implicitly. Moreover, the disparity refinement method requires no correlation information between the stereo image pair and thus can be processed on a single view.
