
two kinds of fake keypoints may exist: one comes from layered structures between the foreground and the background, and the other lies on the edge of a curved surface, such as the rim of a bowl, as shown in Fig. 3.
It is well known that some unstable or ambiguous keypoints can be removed in the feature matching step by checking their geometric consistency with respect to the estimated transformation. However, fake keypoints can hardly be filtered out by RANSAC (Fischler and Bolles, 1981) or other feature matching algorithms when the view change in the application is small, because they can be continuously tracked and are therefore treated as inliers. In this case, the fake keypoints contaminate the matching process and decrease the accuracy of the transform estimation. On the other hand, when the views change dramatically, the fake keypoints may cause even worse problems. Even if RANSAC can filter them out because their spatial positions vary, these outliers raise the risk that RANSAC fails to converge, which will be discussed in the experimental section.
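For illustration, here is a minimal sketch of such a RANSAC-based matching filter in Python with OpenCV; the ORB detector, the image file names, and the 3-pixel reprojection threshold are our illustrative choices, not part of the method discussed here:

```python
import cv2
import numpy as np

# Hypothetical input: two views of the same scene, loaded as grayscale.
img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

# Detect and match keypoints (ORB here, purely as an example detector).
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC keeps matches consistent with one homography. Under a small view
# change, a fake keypoint moves almost like a real one, so its reprojection
# error stays below the threshold and it survives as an "inlier".
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
inliers = [m for m, keep in zip(matches, mask.ravel()) if keep]
```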
To the best of our knowledge, it is impossible to distinguish or remove fake keypoints in a single 2D image, where the spatial information is incomplete. In this work, we filter out fake keypoints in RGB-D images during the feature transform process with the help of the depth information.
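As a minimal sketch of this idea (not the paper's exact criterion): both layered structures and curved-surface rims produce a sharp depth discontinuity around the keypoint, so a large depth range inside a small window flags a candidate fake keypoint. The window size and threshold below are illustrative assumptions.

```python
import numpy as np

def is_fake_candidate(depth, u, v, win=5, jump_thresh=0.05):
    """Flag a keypoint at pixel (u, v) whose neighborhood spans a depth
    discontinuity. `depth` is an HxW depth map in meters; `win` and
    `jump_thresh` are illustrative values, not the paper's parameters."""
    h, w = depth.shape
    r = win // 2
    patch = depth[max(v - r, 0):min(v + r + 1, h),
                  max(u - r, 0):min(u + r + 1, w)]
    valid = patch[patch > 0]          # zero depth = missing measurement
    if valid.size < 2:
        return True                   # unreliable depth: treat as fake
    # A large depth range across a tiny window indicates either a
    # foreground/background layering or the rim of a curved surface.
    return (valid.max() - valid.min()) > jump_thresh
```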
2.3. The perspective projection issue
Although a stable keypoint maintains its spatial location, its appearance in the 2D image may still change with the viewpoint. One reason is the well-known perspective transform of the feature surface, which has been widely studied; another is the variation of the background, especially when the keypoint is a spatial corner.
With the help of the depth data, it is now possible to filter out the background information before computing the feature descriptor. In the proposed method, the background of the keypoint is filtered out and a perspective invariant feature transform is applied.
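As a minimal sketch of such a background filter, assuming the foreground surface lies within a fixed depth band around the keypoint's own depth (the band width and the hard masking rule are our simplifications, not necessarily the paper's segmentation):

```python
import numpy as np

def mask_background(patch_rgb, patch_depth, center_depth, band=0.03):
    """Zero out pixels whose depth differs from the keypoint's depth by
    more than `band` meters, so the descriptor is computed on the
    foreground surface only. `band` is an illustrative tolerance."""
    fg = np.abs(patch_depth - center_depth) <= band
    fg &= patch_depth > 0              # drop invalid (zero) depth pixels
    out = patch_rgb.copy()
    out[~fg] = 0                       # suppress the background
    return out, fg
```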
3. Invariant feature patch extraction
As mentioned earlier, we adopt multi-scale FAST, the 2D keypoint detector proposed and applied in the BRISK and ORB features, on the color image. It detects keypoints across multiple scales and is computationally efficient. After keypoint detection, an image patch around each keypoint is extracted to describe the characteristics of its neighborhood. In our work, this feature patch is designed to be invariant to the perspective projection. To achieve this, we extract the feature patch in two steps. First, with the help of the depth information, we remove the background from the image patch to make the feature patch stable under different views; this is done via a depth-based segmentation of the image patch. Second, we assume that the normal of the feature patch is invariant to perspective projection in 3D space, and we project the feature patch onto its spatial tangent plane.
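A compact sketch of this second step, under simplifying assumptions: the normal comes from a least-squares plane fit to the back-projected 3D patch points, and the projection is realized as the pure-rotation homography H = K R K^-1 that renders the tangent plane fronto-parallel. The pinhole intrinsics K, the patch size, and the look-at construction are our illustrative choices, not necessarily the paper's implementation.

```python
import cv2
import numpy as np

def patch_normal(points):
    """Least-squares plane normal of an (N, 3) cloud of back-projected
    patch points: the singular vector of least variance."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    n = vt[-1]
    return n if n[2] < 0 else -n       # orient the normal toward the camera

def fronto_parallel_patch(img, K, n, kp_uv, size=32):
    """Rotate a virtual camera so its optical axis is anti-parallel to the
    patch normal; the induced warp H = K R K^-1 is exact for a pure
    rotation and makes the tangent plane fronto-parallel."""
    z_new = -n / np.linalg.norm(n)     # new optical axis faces the plane
    x_new = np.cross([0.0, 1.0, 0.0], z_new)
    x_new /= np.linalg.norm(x_new)
    y_new = np.cross(z_new, x_new)
    R = np.stack([x_new, y_new, z_new])   # rows are the new camera axes
    H = K @ R @ np.linalg.inv(K)
    warped = cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))
    # Map the keypoint into the warped image and crop the feature patch.
    u, v, w = H @ np.array([kp_uv[0], kp_uv[1], 1.0])
    u, v, r = int(u / w), int(v / w), size // 2
    return warped[v - r:v + r, u - r:u + r]
```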
Fig. 2. The two registration results together with the original images. The black pixels indicate the non-dense issue, which mostly appears along the edge of an object.
Fig. 3. Examples of stable keypoints (green) and fake keypoints (red) in a scene. The green keypoints keep their spatial locations invariant to different views, while the red keypoints do not. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)