Fast Range Image Segmentation and Smooth
I. INTRODUCTION
As robots and autonomous systems move away from laboratory setups towards complex real-world scenarios, both the perception capabilities of these systems and their abilities to acquire and model semantic information must become more powerful. A key issue is the extraction of semantic information from sensory data and its decomposition into parts of interest that are relevant for the tasks of the robot. For mobile manipulation in domestic environments, for example, the perception of objects and their surroundings is a key prerequisite. A common approach [1] in 3D perception is to
1) detect horizontal support planes,
2) extract and cluster points on top of these planes, and
3) perform further processing, e.g., recognizing, classifying or tracking of the found clusters.
Here, one of the fundamental problems is to segment the 3D data into planes and other geometric primitives or, more generally, into regions of local surface continuity.
In this paper, we address the problem of segmenting range images and organized point clouds in real-time on domestic service robots. The central idea of our approach is to approximately reconstruct the surface and segment the range image by growing regions using the resulting local mesh neighborhoods. By means of easily exchangeable components, our generalized region growing approach allows for different region models (e.g., planes) that are segmented in the data. We present models for segmenting planes, regions of local surface continuity, and simple geometric primitives at high frame rates (see Fig. 1).
Furthermore, we use the same mesh neighborhoods to 1) efficiently compute local surface normals and curvature estimates, as well as 2) smooth both the 3D measurements and the computed normals using a bilateral filter.

Fig. 1. Example of surface reconstruction and (plane) segmentation on a RGB-D point cloud: (a) input cloud; (b) constructed mesh; (c) result of segmenting planes in the mesh (red points are assigned to multiple planes); (d) polygonalization as a collection of alpha shapes. Using the Microsoft Kinect camera at a resolution of 160 × 120, we can compute full polygonalizations at roughly 30 Hz on a standard dual-core notebook.

This paper is organized as follows: After giving a brief overview of related work in range image and 3D plane segmentation in Section II, we present our approach in Section III and discuss how to detect geometric primitives using initial segmentations. We use multiple data sets for evaluating the efficiency and robustness of our approach and summarize the results in Section IV.

This research has been partially funded by the FP7 ICT-2007.2.2 project ECHORD (grant agreement 231143) experiment ActReMa.
D. Holz and S. Behnke are with the Autonomous Intelligent Systems Group, University of Bonn, Germany. Contact: [email protected].

II. RELATED WORK
Research on computer and robot vision has produced a wide variety of approaches to range image segmentation, and to plane segmentation in particular. Hoover et al. [2] compiled a survey and performed an evaluation of early work. Three different types of approaches can be distinguished according to the underlying working principle: methods using random sample consensus (RANSAC), 3D Hough transforms, and region growing. In the context of segmenting 3D laser range scans, another popular approach is to first detect lines in planar cuts, and to merge neighboring lines into local plane patches (see Vosselman et al. [3] for an overview).
A. Segmentation based on Sample Consensus
RANSAC-based approaches try to find models for geometric primitives that best explain a set of points and the set of inliers supporting it. For segmenting a complete range image, Lee et al. [4] sequentially remove inliers from the original data set, and continue the segmentation with the residual points. Silva et al. [5] first identify connected regions and apply RANSAC region-wise. Gotardo et al. [6] compute an edge map for pre-segmentation and use a variant based on the M-estimator sample consensus (MSAC) to fit model parameters.
Another efficient solution to segmenting even unorganized point clouds and detecting simple geometric primitives such as planes, cylinders and spheres has been proposed by Schnabel et al. [7]. They decompose unorganized point clouds using an octree subdivision and apply RANSAC only to subsets of the original point cloud.
In previous work [8], we adapted the perception scheme from Section I as well as the techniques from [1] and [7], and made them applicable to the measurements of time-of-flight (ToF) cameras. We presented techniques to cope with the specific error sources of the cameras, and to speed up processing by exploiting the image-like data organization. After detecting the most dominant plane, we applied the octree-based primitive detection of Schnabel et al. [7] only to already extracted and segmented points above that plane. In [9], we further sped up the segmentation process by using integral images for computing local surface normals more efficiently, and using the index neighborhood underlying the 3D data to extract and track segments of points and object candidates, respectively. The overall approach is applicable in real time on a Microsoft Kinect camera and has been used for real-time object tracking and grasp planning [10].

B. Hough-based Plane Segmentation
The Hough transform is the de-facto standard for finding lines and circles in 2D images. Various extensions to 3D exist that try to find planes as maxima in histograms over the space of possible plane orientations and distances. For an overview and an evaluation of Hough-based segmentation approaches, we refer to the works of Vosselman et al. [3] and Borrmann et al. [11].
RANSAC- and Hough-based segmentation share a common disadvantage: points belonging to the same segment do not necessarily lie on connected components. Both approaches will merge plane segments if they share a common orientation and distance to the origin. In addition, Hough-based segmentation may suffer from discretization effects.
In [9], we present a fast plane segmentation approach that uses a similar parameter space as the Hough transform. We pre-cluster points and segment planes first in normal space and then, for each cluster, in distance space to obtain individual planes. We compensate for discretization effects by conducting a post-processing step in which neighboring segments are merged if their parameters do not considerably deviate. Still, unconnected planar patches may get merged into the same cluster.

C. Segmentation using Region Growing
The idea of region growing-based segmentation is to exploit the image-like data structure. Hähnel et al. [12] connect neighboring points in 3D laser range scans to a mesh-like structure. The scans are then segmented recursively by merging connected patches that are likely to lie on the same planar surface. Poppinga et al. [13] apply the same approach to Time-of-Flight cameras and re-formulate the algorithm in an incremental fashion. They grow planar regions by adding neighboring points whose distance to the plane lies below a threshold. Centroid and covariance matrix for estimating the plane parameters are thereby incrementally updated.
Here, we follow a similar approach. Instead of incrementally computing the covariance matrix, however, we compute the normals for all points beforehand and simply average local surface normals to obtain an estimate of the plane's normal. That is, we only store and incrementally update the centroids in both Cartesian and normal space.
Other popular region-growing approaches to range image segmentation make use of local surface curvature. Regions are grown until points with a considerably larger curvature are reached. Just like Gotardo et al. [6] for RANSAC-based segmentation, Harati et al. [14] first compute an edge map to find connected regions of local surface continuity. Rabbani et al. [15] approximate local surface curvature by first fitting planar segments to local point neighborhoods and then computing, for each point, the distance to that plane. Recently, Cupec et al. [16] followed a similar approach. They first apply 2.5D Delaunay triangulation on a range image to obtain an initial triangular mesh and then use the maximum distance of an examined point to all triangles in a region to determine whether or not the point is added.
Here, we deduce a surface reconstruction directly from the image-like data structure, and use the local ring neighborhood around vertices to 1) efficiently compute local surface normals and curvature estimates, and 2) efficiently smooth the depth measurements using a bilateral filter. Our framework allows for using different types of models for region growing, including the approaches of Poppinga et al. [13], Harati et al. [14], Rabbani et al. [15] and Cupec et al. [16], as well as our fast approximation (see Section III).

III. FAST MESH CONSTRUCTION AND SEGMENTATION
In this section, we describe our approaches to approximate surface reconstruction and segmentation based on region growing. The overall processing pipeline is composed of the following components:
1) Deduce approximate mesh from image neighborhood.
2) Use mesh neighborhood to compute approximate local surface normals and curvature estimates.
3) Bilateral filtering to smooth both points and normals.
4) Segmentation based on region growing.
All components described in the above list use the same fixed neighborhoods. These neighborhoods either come directly from the mesh structure or, if the input data is unstructured, can be pre-computed using a variety of search trees.
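To make the first pipeline step and the shared fixed neighborhoods concrete, the following is a minimal sketch in Python with NumPy, not the implementation used in this work. It triangulates an organized cloud by cutting each quad along its shorter diagonal and then derives, for every vertex, the set of vertices connected to it by a mesh edge. The function names, the `max_edge` parameter and its value are assumptions made for illustration. As a simplification, a quad is skipped entirely when any corner is invalid or any edge is too long, and only a length tolerance is checked.

```python
import numpy as np
from collections import defaultdict


def triangulate_organized(cloud, max_edge=0.1):
    """Adaptive triangulation of an organized cloud with shape (H, W, 3).

    Simplification for illustration: a quad is skipped completely if any
    corner is invalid (NaN/inf) or any of its four edges exceeds max_edge.
    """
    h, w, _ = cloud.shape
    faces = []
    for r in range(h - 1):
        for c in range(w - 1):
            # Quad corners in order: R(u,v), R(u,v+1), R(u+1,v+1), R(u+1,v).
            corners = [(r, c), (r, c + 1), (r + 1, c + 1), (r + 1, c)]
            pts = np.array([cloud[i, j] for i, j in corners], dtype=float)
            if not np.all(np.isfinite(pts)):
                continue  # invalid depth measurement in this quad
            edges = np.linalg.norm(pts - np.roll(pts, -1, axis=0), axis=1)
            if np.any(edges > max_edge):
                continue  # treat overly long edges as depth discontinuities
            a, b, cc, d = (i * w + j for i, j in corners)
            # Cut the quad along the shorter of its two diagonals.
            if np.linalg.norm(pts[0] - pts[2]) <= np.linalg.norm(pts[1] - pts[3]):
                faces += [(a, b, cc), (a, cc, d)]
            else:
                faces += [(a, b, d), (b, cc, d)]
    return faces


def one_ring_neighborhoods(faces):
    """Map each vertex index to the vertices that share a mesh edge with it."""
    neighbors = defaultdict(set)
    for face in faces:
        for k in range(len(face)):
            u, v = face[k], face[(k + 1) % len(face)]
            neighbors[u].add(v)
            neighbors[v].add(u)
    return dict(neighbors)
```

Because the neighborhoods are computed once from the faces, later steps such as normal estimation, filtering and region growing can reuse the same dictionary without any search structure.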
Fig. 2. Fast approximate meshing using a quad mesh (a) and different triangulations (b-d): (a) quad; (b) adaptive; (c) left cut; (d) right cut. Compared to the adaptive approach (b), triangulations using only left cuts (c) or only right cuts (d) can be obtained slightly faster.

A. Exploiting Structure for Fast Approximate Meshing
The central idea of our surface reconstruction approximation is to deduce the desired mesh structure directly from the image-like organization of measurements. In fact, the algorithms presented in the following could easily be applied on local index neighborhoods in range images. However, an approximate mesh allows for 1) applying a wide variety of sophisticated algorithms from the field of computer graphics, as well as 2) storing certain edge weights, e.g., the difference vectors for integral image-based normal computation from our previous work [9].
We traverse a given range image $R$ once and check for every point $p_i = R(u, v)$:
• if $R(u, v)$ and its neighbors $R(u, v + 1)$, $R(u + 1, v + 1)$, and $R(u + 1, v)$ (in the next row and the next column) are valid depth measurements, as well as
• if all edges between $R(u, v)$ and these three neighbors are not occluded.
The first check is necessary because of the structure in the sensory data that we exploit. If the sensor cannot acquire a valid depth measurement for a certain pixel, it has to store an invalid one, in order to keep the structure organized.
The latter occlusion checks can be easily done by examining the difference vectors between $p_i$ and its three neighbors. If a difference vector falls into a common line of sight with the viewpoint from where the measurements have been taken, one of the underlying surfaces occludes the other. The condition for the validity of an edge between point $p_i$ and its neighbor $p_j$ can be formulated as
$$\mathrm{valid} = \left( p_i \cdot p_j \le \cos \varepsilon_\theta \right) \wedge \left( \| p_i - p_j \|^2 \le \varepsilon_d^2 \right), \tag{1}$$
where $\varepsilon_\theta$ and $\varepsilon_d$ denote maximum angular and length tolerances, respectively.
If all checks are passed, $R(u, v)$ and its neighbors are used to extend the mesh built so far. Otherwise, holes arise. Referring to Fig. 2, we distinguish four types of meshes:
1) Quad meshes are formed by connecting pixel $R(u, v)$ to $R(u, v + 1)$, $R(u + 1, v + 1)$, and $R(u + 1, v)$.
2) Fixed left and right cut meshes are formed by cutting quads either from top right to bottom left (left cut) or from top left to bottom right (right cut).
3) Adaptive triangulation cuts the quad along the shorter diagonal.
For triangulations, a single invalid neighbor results in only one triangle being added. After construction, we simplify the resulting mesh by removing all vertices that are not used in any polygon. An example triangulation is shown in Fig. 3(b).

B. Fast Computation of Surface Normals and Curvature
We compute the local surface normal $n_i$ for point $p_i$ as the weighted average of the plane normals of the faces surrounding $p_i$. Using the cross product between the difference vectors of the bounding vertices to compute the face normals, and choosing the weights to be proportional to the area of the triangles, removes the need to normalize the face normals beforehand. Thus, we can obtain $n_i$ as:
$$n_i = \frac{\sum_{j=0}^{N_T} (p_{j,a} - p_{j,b}) \times (p_{j,a} - p_{j,c})}{\left\| \sum_{j=0}^{N_T} (p_{j,a} - p_{j,b}) \times (p_{j,a} - p_{j,c}) \right\|}, \tag{2}$$
where $p_{j,a}$, $p_{j,b}$ and $p_{j,c}$ form triangle $j$. In the actual implementation, we simply iterate over the faces, compute the difference vectors and their cross product, and add them to the normals of the involved points. Finally, we normalize all point normals at once. An example of computed local surface normals (color coded) can be seen in Fig. 3(b).

C. Bilateral Filtering
Naturally, sensor measurements are affected by noise. Since this noise can hinder further processing like segmentation, we apply a bilateral filter for smoothing both the points and their normals while preserving edges in the sensed geometric structures. Again, instead of searching, we directly extract a point's neighborhood from the mesh. That is, we filter both a point $p_i$ and its normal $n_i$ over its 1-ring neighborhood $N_i$, i.e., all points that are directly connected to $p_i$ by an edge in the mesh:
$$p_i = \frac{\sum_{j \in N_i} w_{ij}\, p_j}{\sum_{j \in N_i} w_{ij}}, \qquad n_i = \frac{\sum_{j \in N_i} w_{ij}\, n_j}{\sum_{j \in N_i} w_{ij}}, \tag{3}$$
$$w_{ij} = \underbrace{e^{\| p_i - p_j \|}}_{\text{distance term}} \; \underbrace{e^{\| n_i - n_j \|_1}}_{\text{normal term}} \; \underbrace{e^{(I_i - I_j)/c_I}}_{\text{intensity term}}, \tag{4}$$
where the optional intensity term is only evaluated for colored point clouds and for range images for which an intensity image is also available. The normalization constant $c_I$ is used to scale the intensity differences to lie in the interval [0, 1]. An example of filtering an input mesh can be seen in Fig. 3(c).

D. Region Growing-based Segmentation
Despite the generalization over different neighborhood searches and region models, the implementation of our segmentation algorithm does not considerably deviate from other region growing algorithms in the literature. Given a set of seed points (and a priority queue of seeds), the algorithm proceeds as follows.
Outer loop, until all points are processed:
1) select the next seed point,
2) initialize the region model of interest, and
3) put the seed point onto the empty processing queue.
Inner loop, while the processing queue is not empty:
4) take the next point from the processing queue,
5) check its compatibility with the region model,
6) add it to the region in case of compatibility, and
7) add the point's neighbors to the processing queue (again, only if they are compatible).
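The region growing loop, combined with the approximate plane model that keeps only centroids in Cartesian and normal space, might be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the implementation evaluated here: the class name `ApproxPlaneModel`, the function `grow_regions`, and the thresholds `max_dist` and `max_angle_deg` are hypothetical, and compatibility is tested when a point is taken from the queue rather than before it is enqueued.

```python
from collections import deque
import numpy as np


class ApproxPlaneModel:
    """Plane model storing only running sums of points and normals."""

    def __init__(self, max_dist=0.02, max_angle_deg=15.0):
        # Both thresholds are illustrative values, not taken from the paper.
        self.max_dist = max_dist
        self.min_cos = np.cos(np.radians(max_angle_deg))
        self.count = 0
        self.sum_p = np.zeros(3)
        self.sum_n = np.zeros(3)

    def compatible(self, p, n):
        if self.count == 0:
            return True  # a seed is always accepted by an empty model
        centroid = self.sum_p / self.count
        normal = self.sum_n / np.linalg.norm(self.sum_n)
        close = abs(np.dot(p - centroid, normal)) <= self.max_dist
        aligned = np.dot(n, normal) >= self.min_cos
        return bool(close and aligned)

    def add(self, p, n):
        # Incrementally update the centroids in Cartesian and normal space.
        self.count += 1
        self.sum_p += p
        self.sum_n += n


def grow_regions(points, normals, neighbors, seeds=None, model_factory=ApproxPlaneModel):
    """Label each point with a region index using generic region growing."""
    labels = np.full(len(points), -1, dtype=int)
    seeds = range(len(points)) if seeds is None else seeds
    region = 0
    for seed in seeds:                      # outer loop over seed points
        if labels[seed] != -1:
            continue
        model = model_factory()             # initialize the region model
        queue, queued = deque([seed]), {seed}
        while queue:                        # inner loop over the queue
            i = queue.popleft()
            if labels[i] != -1 or not model.compatible(points[i], normals[i]):
                continue
            model.add(points[i], normals[i])
            labels[i] = region
            for j in neighbors.get(i, ()):
                if labels[j] == -1 and j not in queued:
                    queue.append(j)
                    queued.add(j)
        region += 1
    return labels
```

Other region models (for example, one based on local curvature) could be swapped in through `model_factory`, as long as they offer the same `compatible` and `add` methods.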
(a) Input cloud (b) Initial mesh (c) Filtered mesh (d) Segmented mesh
Fig. 3. Approximate surface reconstruction, bilateral filtering and segmentation on an example point cloud (a). The filtered mesh (c) is considerably
smoother than the initially approximated mesh (b) and allows for cleanly segmenting regions of local surface continuity (d). The meshes in (b) and (c) are
colored w.r.t. the vertices’ normal orientation (mapped to RGB space, object-space normal map). In (d) segments are colored randomly.
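The vertex normals used to color the meshes in Fig. 3(b) and (c) are area-weighted averages of face normals. A minimal Python sketch of that accumulation is given below; the function name `vertex_normals` is an assumption for illustration, and the sketch assumes that all triangles are wound consistently so that their cross products point to the same side of the surface.

```python
import numpy as np


def vertex_normals(points, faces):
    """Area-weighted vertex normals for a triangle mesh.

    points: (N, 3) array of vertex positions.
    faces:  iterable of (a, b, c) vertex index triples, consistently wound.
    """
    points = np.asarray(points, dtype=float)
    normals = np.zeros_like(points)
    for a, b, c in faces:
        # The cross product's length is twice the triangle area, so adding
        # unnormalized face normals weights each face by its area.
        face_n = np.cross(points[a] - points[b], points[a] - points[c])
        normals[[a, b, c]] += face_n
    # Normalize all vertex normals at once; unused vertices stay zero.
    lengths = np.linalg.norm(normals, axis=1, keepdims=True)
    np.divide(normals, lengths, out=normals, where=lengths > 0)
    return normals
```

For a flat square split into two counterclockwise triangles in the xy-plane, every vertex normal comes out as the unit vector along +z.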
We have presented a simple, yet efficient approach to segmenting range images and organized point clouds. Using
1 The latest stable release of the Point Cloud Library (PCL) is available at http://pointclouds.org.
(a) 10 / 14, 4 misses (b) 18 / 24, 6 misses (c) 10 / 13, 2 over, 1 miss (d) 12 / 13, 1 over (e) 22 / 22
Fig. 5. Plane segmentation using the SegComp PERCEPTRON training data set (segments randomly colored). In total, 109 of 126 planes were correctly
segmented (approx. 86.5%). Not correctly found are very small plane segments, e.g., the inner parts of the objects in (a) and (b). In addition, some planes
are oversegmented due to noise, e.g., the support plane in (c). The estimated plane normals deviate from ground truth by roughly (2.5 ± 1.6)°.
TABLE II
DETAILED BENCHMARKING RESULTS ON THE SEGCOMP DATA SETS FOR PLANE SEGMENTATION
[10] J. Stückler, R. Steffens, D. Holz, and S. Behnke, “Real-time 3D perception and efficient grasp planning for everyday manipulation tasks,” in Proc. of the European Conference on Mobile Robots (ECMR), Örebro, Sweden, 2011, pp. 177–182.
[11] D. Borrmann, J. Elseberg, K. Lingemann, and A. Nuechter, “The 3D Hough transform for plane detection in point clouds: A review and a new accumulator design,” 3D Research, vol. 2, pp. 1–13, 2011.
[12] D. Hähnel, W. Burgard, and S. Thrun, “Learning compact 3D models of indoor and outdoor environments with a mobile robot,” Robotics and Autonomous Systems, vol. 44, no. 1, pp. 15–27, 2003.
[13] J. Poppinga, N. Vaskevicius, A. Birk, and K. Pathak, “Fast plane detection and polygonalization in noisy 3D range images,” in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Nice, France, 2008, pp. 3378–3383.
[14] A. Harati, S. Gächter, and R. Siegwart, “Fast range image segmentation for indoor 3D-SLAM,” in Proc. of the IFAC Symposium on Intelligent Autonomous Vehicles (IAV), Toulouse, France, 2006.
[15] T. Rabbani, F. van den Heuvel, and G. Vosselman, “Segmentation of point clouds using smoothness constraint,” International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 36, pp. 248–253, 2006.
[16] R. Cupec, E. K. Nyarko, and D. Filko, “Fast 2.5D mesh segmentation to approximately convex surfaces,” in Proc. of the European Conference on Mobile Robots (ECMR), Örebro, Sweden, 2011, pp. 49–54.
[17] B. Oehler, J. Stückler, J. Welle, D. Schulz, and S. Behnke, “Efficient multi-resolution plane segmentation of 3D point clouds,” in Proc. of the International Conference on Intelligent Robotics and Applications (ICIRA), Aachen, Germany, 2011, pp. 145–156.
[18] D. Anderson, H. Herman, and A. Kelly, “Experimental characterization of commercial flash ladar devices,” in International Conference of Sensing and Technology, Palmerston North, New Zealand, 2005.