LIO-SAM++: A Lidar-Inertial Semantic SLAM with Association Optimization and Keyframe Selection
1 National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China;
[email protected] (B.S.); [email protected] (X.P.); [email protected] (X.Q.);
[email protected] (Z.G.)
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Hangzhou Institute for Advanced Study, Chinese Academy of Sciences, Hangzhou 310024, China
* Correspondence: [email protected]
Abstract: Current lidar-inertial SLAM algorithms mainly rely on the geometric features of the lidar
point cloud for alignment. Incorrect feature associations arise because the matching process is
susceptible to influences such as dynamic objects, occlusion, and environmental changes.
To address this issue, we present a lidar-inertial SLAM system based on the LIO-SAM framework,
combining semantic and geometric constraints for association optimization and keyframe selection.
Specifically, we mitigate the impact of erroneous matching points on pose estimation by comparing
the consistency of normal vectors in the surrounding region. Additionally, we incorporate semantic
information to establish semantic constraints, further enhancing matching accuracy. Furthermore, we
propose an adaptive selection strategy based on semantic differences between frames to improve the
reliability of keyframe generation. Experimental results on the KITTI dataset indicate that our
system achieves significantly higher pose estimation accuracy than comparable systems.
1. Introduction
SLAM (simultaneous localization and mapping) technology involves a robot constructing environment
maps and estimating its position in an unknown environment by collecting data from its surroundings
using its mounted sensors. Due to the continuous development of intelligent technology, SLAM has
gained widespread attention and application in robotics and autonomous driving. SLAM can be
categorized into lidar-based and visual-based approaches, depending on the type of sensor used for
environmental perception. Visual SLAM captures abundant visual data through cameras, providing
color and texture information of the scene; however, it is susceptible to changes in illumination.
Conversely, lidar (light detection and ranging) is capable of accurately measuring distance and
remains unaffected by varying illumination conditions, rendering it more commonly utilized in
large-scale environments. Notably, laser-based multi-sensor fusion can further enhance environmental
perception and serve as a crucial method for improving the performance of SLAM systems.
In recent years, many solutions have been proposed for lidar-inertial SLAM, such as
LeGO-LOAM (Lightweight and Ground-Optimized Lidar Odometry and Mapping) [1],
LIO-SAM (Lidar Inertial Odometry via Smoothing and Mapping) [2], and FAST-LIO (a fast,
robust LiDAR-inertial odometry) [3]. These methods perform well in some scenes
but may be disturbed in the real world. Most of these works select the associated points
in the feature-matching process based only on nearest-neighbor distance information.
However, this way of association does not consider the effects of point positions and
densities between point cloud frames, which results in a lack of robustness in the pose
transformation. In addition, the point cloud maps of the environment generated by these
algorithms only contain geometry information and lack semantic descriptions of the objects
in the scene, making it difficult to understand the scene in depth. As neural networks
advance in the field of point cloud semantic segmentation, researchers have integrated
semantic information into LIDAR-based SLAM frameworks. Semantic-aided LiDAR SLAM,
such as SUMA++ (Efficient LiDAR-based Semantic SLAM) [4] and SA-LOAM (Semantic-
aided LiDAR SLAM) [5], use semantic segmentation networks to obtain the semantic labels
of the scene, which provides additional constraints for the point cloud alignment process
and reduces the dependence on the environment geometry. Although these methods
have made some progress, their performance can still degrade when facing
environmental changes in the scene.
In this paper, we propose a lidar-inertial SLAM with association optimization and
keyframe selection. We optimize the feature association by combining geometric and seman-
tic information and reduce trajectory drift by adaptively generating keyframes according
to scene changes. The main contributions presented in this work can be summarized
as follows:
(1) We propose a feature association method based on neighborhood normal vector con-
sistency, which uses local region geometric information to filter the nearest neighbor
feature points to reduce the problem of inaccurate feature point association in point
cloud registration.
(2) We propose a semantic-assisted point cloud registration method for pose estimation,
which constructs a weighted cost function based on semantic attributes to achieve
more reliable odometry data association and improve pose estimation accuracy.
(3) We propose a semantic-based keyframe optimization method that adaptively generates
keyframes based on the differences in semantic information between frames, which
reduces the probability of losing valid point cloud frames.
2. Related Work
2.1. Feature Association
In SLAM systems, scan registration obtains the pose transformation by aligning two
successive point cloud frames, or the current point cloud frame with the
established map. Geometry-based association methods often employ geometric primitives,
such as points, lines, planes, or curvature features, to facilitate rapid matching. In LOAM [6],
point cloud curvature is computed to categorize points as either surface or corner features.
Residual equations are then formulated from the distances of corner points to lines and
surface points to planes, which are utilized to optimize pose estimation via the Levenberg-
Marquardt method. Many subsequent algorithms continue to adopt this feature-matching
approach, such as LeGO-LOAM [1], which reduces the complexity of computation by
segmenting the ground points of the point cloud frames and then clustering them to
obtain labels, and using the label information as the constraints for inter-frame matching.
LIO-SAM [2] extended LeGO-LOAM by removing the frame-to-frame matching part and
incorporating IMU pre-integration and GPS measurement factors within the factor graph
optimization. FAST-LIO [3] also extracts corner and surface point features and proposes
a new Kalman gain formula to achieve efficient computation. In an effort to further
improve the reliability of solving pose transformation, some works began to focus on
the categorization of features. Guo et al. [7] introduced an approach using Principal
Components Analysis (PCA) to distinguish between corner and surface points utilizing
the properties of the points. Similarly, MULLS [8] extracted various features based on
PCA, which proposed a multi-metric linear least squares method for iterative optimization
of pose estimation. Meng et al. [9] proposed an adaptive feature extraction method for
different scenes, which adjusts planar and linear feature thresholds according to local
environmental properties and constructs multiclass cost functions for distinct categories
of features.
The above research enhanced the precision of feature point association by categorizing
feature points and searching within the same category. However, we consider
that feature points of the same class are still associated by nearest-neighbor search, which
affects the accuracy of the association in cases where point cloud frames and maps do not
overlap completely. Therefore, unlike these feature association optimization methods, our
approach improves the reliability of feature association by expanding the scope of the
nearest-neighbor search and combining it with normal vector consistency for filtering.
3. Method
The architecture of the proposed lidar-inertial semantic SLAM LIO-SAM++ is depicted
in Figure 1. The input is point cloud data from the LiDAR and high-frequency pose
data from the IMU, and the output is the state estimation and an environment map with
semantic information. This process has several stages, including IMU pre-integration, lidar
odometry, keyframe selection, loop closure detection and graph optimization.
(1) During the IMU pre-integration phase, the IMU measurements between two frames
of LiDAR data are pre-integrated;
(2) During the lidar odometry phase, semantic segmentation is applied to the point cloud
frames via a pretrained network to obtain semantic labels. Subsequently, corner points and
surface points are extracted from these frames, with the IMU pre-integration providing
an initial pose estimate. The pose of the current frame is obtained by minimizing the
residuals with semantics and geometry. For details, refer to Section 3.1;
(3) During the keyframe selection phase, keyframes are generated based on the scene
semantic changes and estimated pose transformation for back-end graph optimization.
For details, refer to Section 3.2;
(4) During the loop closure detection phase, potential loop closure relationships are found
by detecting historical keyframes to establish loop closure constraints;
(5) During the graph optimization phase, lidar odometry factors, IMU pre-integration
factors, and loop closure detection factors are constructed. These factors are used
to update the system state estimate through factor graph optimization and to output
accurate pose estimates at the IMU rate. Meanwhile, globally consistent point cloud
maps are generated by transforming the keyframe collection into the world coordinate
system and fusing it with the constructed global map.
In the next sections, we present detailed information about each module.
where $m_t^j$ and $p_t^i$ are the corresponding points between the local map and the point cloud frame.
$$ c = \frac{1}{|S| \cdot \lVert r_i \rVert} \left\lVert \sum_{j \in S,\, j \neq i} (r_i - r_j) \right\rVert_2 \qquad (2) $$

where $S$ represents the collection of points located on either side of the current point along
the same lidar scan line, and $r_i$ and $r_j$ denote the vectors from the origin of the coordinate
system to the points $p_i$ and $p_j$. A point is designated as a corner feature if its curvature
exceeds a predefined threshold, whereas a point below this threshold is classified as a
surface feature.
Ultimately, corner features $F_i^e$ and surface features $F_i^s$ are extracted from the point
cloud of each frame and combined into a feature set $F_i = \{F_i^e, F_i^s\}$. This set represents
the current frame and is utilized for subsequent registration of point clouds.
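As an illustration of Equation (2), the following minimal Python sketch computes the smoothness value along one scan line and classifies points as corner or surface features; the window size and curvature threshold are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def curvature(points, i, half_window=5):
    """Smoothness of point i along one lidar scan line (Equation (2)).

    points: (N, 3) array of points ordered along the scan line.
    half_window: neighbors taken on each side of i (assumed value).
    """
    S = list(range(i - half_window, i)) + list(range(i + 1, i + half_window + 1))
    r_i = points[i]
    diff_sum = np.sum(r_i - points[S], axis=0)  # sum over j of (r_i - r_j)
    return np.linalg.norm(diff_sum) / (len(S) * np.linalg.norm(r_i))

def classify_features(points, threshold=0.1, half_window=5):
    """Corner if the curvature exceeds the threshold, surface otherwise
    (the threshold value is an assumption)."""
    labels = {}
    for i in range(half_window, len(points) - half_window):
        c = curvature(points, i, half_window)
        labels[i] = "corner" if c > threshold else "surface"
    return labels
```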
positional variance of these neighboring points from the mean, we calculate the covariance
matrix C. Finally, we perform an eigendecomposition of C to extract its eigenvalues λ and
corresponding eigenvectors. The eigenvector corresponding to the smallest eigenvalue is
the normal vector of the local region. The unit normal vector n̂ is defined by the formula:
$$ \bar{f} = \frac{1}{N} \sum_{i=1}^{N} f_i \qquad (3) $$

$$ C = \frac{1}{N} \sum_{i=1}^{N} (f_i - \bar{f})(f_i - \bar{f})^{T} \qquad (4) $$

$$ C\,n = \lambda_{\min}\, n \qquad (5) $$

$$ \hat{n} = \frac{n}{\lVert n \rVert} \qquad (6) $$
where $f_i$ represents the coordinates of the $i$th point in the neighborhood, $N$ is the total
count of points in the neighborhood, $\lambda_{\min}$ is the smallest eigenvalue, and $n$
denotes the corresponding normal vector.
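A minimal NumPy sketch of this normal estimation (Equations (3)–(6)):

```python
import numpy as np

def unit_normal(neighborhood):
    """Local unit normal from the eigendecomposition of the covariance matrix.

    neighborhood: (N, 3) array of points around the query point.
    """
    f_bar = neighborhood.mean(axis=0)              # Equation (3)
    centered = neighborhood - f_bar
    C = centered.T @ centered / len(neighborhood)  # Equation (4)
    eigvals, eigvecs = np.linalg.eigh(C)           # eigenvalues in ascending order
    n = eigvecs[:, 0]                              # eigenvector of lambda_min, Equation (5)
    return n / np.linalg.norm(n)                   # Equation (6)
```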
According to Equation (6), we compute the local unit normal vector n̂1 for the points
in the point cloud scan and the local unit normal vector n̂2 for each candidate matching
point in the map. Subsequently, we compare the difference between n̂1 and n̂2 . If the local
normal vector direction of a point in a point cloud scan is consistent with that of a matching
point in the local map, it indicates that the two points may lie on the same plane, and the
match is retained. The consistency of normal vectors is defined by the angular difference
between the vectors:
$$ \theta = \arccos(\hat{n}_1 \cdot \hat{n}_2) \qquad (7) $$
where $\hat{n}_1$ and $\hat{n}_2$ are the unit normal vectors corresponding to the point cloud scan and the
local map, respectively. The normal vectors are considered consistent if the angle $\theta$ is below
a predetermined threshold. By checking the local unit normal vector consistency between
the point cloud frame and the corresponding points of the local map, five more accurate
matching points are selected from the 15 candidate matching points. These five matching
points are then used to construct geometric residual and semantic attribute constraints to
improve the reliability of the matching results.
The feature association strategy based on normal vector consistency is more robust than
the traditional approach that relies only on nearest-neighbor search, which may produce
false matches due to noise. In contrast, the proposed method can effectively filter out
incorrect matches and preserve points with similar geometric features, thus improving
the accuracy of the matches.
The process of the association point selection method is shown in Algorithm 1.
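A minimal Python sketch of this selection procedure, assuming 15 distance-sorted candidates per feature point and an illustrative angular threshold of 15°; taking the absolute value of the dot product, so that oppositely oriented normals still count as consistent, is an implementation choice rather than a detail stated above:

```python
import numpy as np

def estimate_normal(pts):
    """Unit normal of a small neighborhood: eigenvector of the smallest
    eigenvalue of the covariance matrix, as in Equations (3)-(6)."""
    _, vecs = np.linalg.eigh(np.cov(pts.T, bias=True))
    return vecs[:, 0]

def select_matches(scan_normal, candidates, candidate_neighborhoods,
                   angle_thresh_deg=15.0, keep=5):
    """Keep up to `keep` of the 15 nearest-neighbor candidates whose local
    normals are consistent with the scan point's normal (Equation (7)).

    candidates: (15, 3) candidate matching points, sorted by distance.
    candidate_neighborhoods: list of (Ni, 3) arrays around each candidate.
    """
    kept = []
    for pt, nb in zip(candidates, candidate_neighborhoods):
        n_map = estimate_normal(nb)
        cos = np.clip(abs(np.dot(scan_normal, n_map)), 0.0, 1.0)
        theta = np.degrees(np.arccos(cos))
        if theta < angle_thresh_deg:
            kept.append(pt)
        if len(kept) == keep:
            break
    return np.array(kept)  # fewer than `keep` survivors are possible
```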
The method flow is as follows: firstly, the distance residuals are constructed separately
for corner and surface feature points. Next, the semantic consistency probability is calcu-
lated and the residuals are weighted using this probability. Finally, the weighted residuals
are optimized as the cost function of the lidar odometry.
For corner points, the residual is defined as the distance from the feature point of the
current frame to the line formed by the two nearest neighbors among the corresponding
five points in the local map. For surface points, the residual is defined as the distance
from the feature point of the current frame to the plane formed by three non-collinear
points among the corresponding five points in the local map. The corner point residuals
are calculated as follows:
$$ d_e = \frac{\left| (f_{i,k}^{e} - f_{u,m}^{e}) \times (f_{i,k}^{e} - f_{v,m}^{e}) \right|}{\left| f_{u,m}^{e} - f_{v,m}^{e} \right|} \qquad (8) $$

where $d_e$ denotes the distance from the corner feature point to the line, $f_{i,k}^{e}$ is the corner
point in the current point cloud frame, and $f_{u,m}^{e}$ and $f_{v,m}^{e}$ are the two nearest-neighbor
corner points in the local map.
The surface point residuals are calculated as follows:

$$ d_s = \frac{\left| (f_{i,k}^{s} - f_{u,m}^{s}) \cdot \left( (f_{u,m}^{s} - f_{v,m}^{s}) \times (f_{u,m}^{s} - f_{w,m}^{s}) \right) \right|}{\left| (f_{u,m}^{s} - f_{v,m}^{s}) \times (f_{u,m}^{s} - f_{w,m}^{s}) \right|} \qquad (9) $$

where $d_s$ denotes the distance from the surface point to the plane, $f_{i,k}^{s}$ is the surface point
in the current point cloud frame, and $f_{u,m}^{s}$, $f_{v,m}^{s}$, and $f_{w,m}^{s}$ are the three
nearest-neighbor non-collinear points in the local map.
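A direct NumPy transcription of Equations (8) and (9):

```python
import numpy as np

def corner_residual(p, a, b):
    """Point-to-line distance, Equation (8): p is the corner point from the
    current frame (in map coordinates); a and b are the two nearest corner
    points in the local map."""
    return np.linalg.norm(np.cross(p - a, p - b)) / np.linalg.norm(a - b)

def surface_residual(p, a, b, c):
    """Point-to-plane distance, Equation (9): a, b, c are three non-collinear
    surface points in the local map."""
    n = np.cross(a - b, a - c)  # plane normal spanned by the map points
    return abs(np.dot(p - a, n)) / np.linalg.norm(n)
```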
For semantics, the traditional approach is to reduce the error caused by mismatching
by directly discarding points whose semantic attributes are inconsistent with those of the
current point. However, considering the possible inaccuracy of semantic segmentation,
especially at object edges, instead of constructing constraints only from points with the
same semantic category [5], we weight the residuals according to semantic consistency.
Specifically, a semantic weight is first obtained by calculating the probability of semantic
consistency between the current point and its five matches selected by the normal vector
consistency-based feature association method. Then, this semantic weight is used to
weight the residuals to mitigate the effect of outliers. The semantic label $l_i^c$ of the current
point $f_i$ has the semantic distribution $MS_i = \{l_k^m\}_{k=1}^{N}$ corresponding to the set of
matching points $MD_i$, and the semantic consistency judgment is expressed as follows:
$$ n_{ik} = \begin{cases} 1 & \text{if } l_i^c = l_k^m \\ 0 & \text{otherwise} \end{cases} \qquad (10) $$
The weighted corner point residuals $d_e'$ and surface point residuals $d_s'$ are formulated
as follows:

$$ d_e' = d_e \times \frac{\sum_{k=1}^{N} n_{ik}}{N} \qquad (11) $$

$$ d_s' = d_s \times \frac{\sum_{k=1}^{N} n_{ik}}{N} \qquad (12) $$
where $d_e$ and $d_s$ denote the distance residuals of corner and surface points, respectively,
$\sum_{k=1}^{N} n_{ik}$ is the number of matched points whose semantic attributes are consistent
with those of the current point, and $N$ denotes the total number of matched points.
The introduction of semantic weights changes the objective of the residual computation,
enabling the optimization to depend not only on geometric distances but also on the
consistency of semantic attributes. This method effectively reduces the adverse effect of
geometrically similar but semantically inconsistent points on the optimization and thus
improves the accuracy of the solution. When semantic inconsistencies occur, the
contribution of wrong matches to the optimization objective is reduced because their
residuals are multiplied by weights smaller than one.
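A short sketch of the weighting in Equations (10)–(12), assuming labels are plain class identifiers:

```python
def semantic_weight(label_current, labels_matched):
    """Fraction of matched points sharing the current point's semantic label:
    the sum over k of n_ik (Equation (10)) divided by N."""
    n_consistent = sum(1 for l in labels_matched if l == label_current)
    return n_consistent / len(labels_matched)

# Weighted residuals, Equations (11) and (12):
# d_e_weighted = corner_residual(p, a, b) * semantic_weight(l_c, match_labels)
# d_s_weighted = surface_residual(p, a, b, c) * semantic_weight(l_c, match_labels)
```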
Ultimately, the optimal transformation matrix is solved by constructing a cost function
with the weighted residuals of corner and surface point distances and then optimizing this
cost function through the Gauss-Newton optimization method. The optimization formula
is as follows:
$$ T^{*} = \arg\min_{T} \left\{ \sum_{f_{i,k}^{e} \in F_i^{e}} d_e' + \sum_{f_{i,k}^{s} \in F_i^{s}} d_s' \right\} \qquad (13) $$
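As a sketch of Equation (13), the weighted residuals can be stacked and minimized over a 6-DoF pose. The paper uses Gauss-Newton; SciPy's general least-squares solver serves here as a stand-in, and the axis-angle pose parameterization is an assumption:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residual_vector(x, corner_terms, surface_terms):
    """Stacked weighted residuals of Equation (13).
    x = [rx, ry, rz, tx, ty, tz]: axis-angle rotation plus translation."""
    R = Rotation.from_rotvec(x[:3]).as_matrix()
    t = x[3:]
    res = []
    for p, a, b, w in corner_terms:      # (scan point, 2 map points, weight)
        q = R @ p + t
        res.append(w * np.linalg.norm(np.cross(q - a, q - b)) / np.linalg.norm(a - b))
    for p, a, b, c, w in surface_terms:  # (scan point, 3 map points, weight)
        q = R @ p + t
        n = np.cross(a - b, a - c)
        res.append(w * abs(np.dot(q - a, n)) / np.linalg.norm(n))
    return np.asarray(res)

# sol = least_squares(residual_vector, x0=np.zeros(6),
#                     args=(corner_terms, surface_terms))
```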
dynamic objects, selecting more stable keyframes can make the pose estimation process
smoother, reduce erroneous associations, and reduce pose jumps caused by dynamic objects.
Therefore, given the limitations of heuristics at the level of understanding the environ-
ment, we propose to introduce a semantic description of the scene. Usually, the distribution
of objects in the same scene region is relatively stable, and the semantic class of objects
can be used to characterize the region. Since the semantic changes between neighboring
point cloud frames reflect apparent differences in the environment, they enable the
SLAM system to capture environmental changes more accurately in complex dynamic
environments and rapidly changing scenes. Therefore, we introduce semantic label
information in the keyframe selection stage to optimize the keyframe selection strategy
through semantic change detection and semantic stability detection. Compared with
heuristics, our approach better reflects the scene characteristics.
Before describing our keyframe selection method in detail, the relevant statistical
background is first provided. The KL (Kullback-Leibler) divergence is derived from the
concept of information entropy, which is used to describe the amount of information that is
available when generating a random variable from a distribution. For a random variable x
given a probability distribution P( x ), the information entropy H ( P) is defined as follows:
$$ H(P) = - \sum_{x} P(x) \log P(x) \qquad (14) $$
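The semantic change between the current frame and the previous frame is then measured with the standard KL divergence between their semantic label distributions:

$$ D_{KL}(P_i^c \,\Vert\, P_i^l) = \sum_{x} P_i^c(x) \log \frac{P_i^c(x)}{P_i^l(x)} $$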
where $P_i^c$ and $P_i^l$ are the frequencies of each semantic class in the current and previous
frames, respectively. A larger KL divergence indicates a larger semantic difference between
the current frame and the previous frame, making the current frame more likely to be
selected as a keyframe. Compared with keyframe selection relying only on pose changes,
modeling semantic information with the KL divergence captures changes in object
categories and distributions in the scene more acutely, especially when the scene changes
drastically but the pose changes are not obvious.
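A minimal sketch of this semantic change measure, assuming integer class labels per point; the epsilon smoothing that guards against empty classes is an implementation assumption:

```python
import numpy as np

def semantic_distribution(labels, num_classes):
    """Normalized frequency of each semantic class in one frame."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts / counts.sum()

def kl_divergence(p_curr, p_last, eps=1e-8):
    """D_KL(P_curr || P_last) between two semantic label distributions."""
    p = p_curr + eps
    q = p_last + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# d = kl_divergence(semantic_distribution(curr_labels, C),
#                   semantic_distribution(prev_labels, C))
```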
Secondly, we exploit semantic information to select point cloud frames that contain a
larger number of absolutely static object labels. The semantic stability score $SS$ of the
current frame is obtained by calculating the proportion of static objects relative to
dynamic objects:

$$ SS = \frac{n_s}{n_d + 1} \qquad (18) $$
where $n_s$ is the number of absolutely static objects and $n_d$ is the number of dynamic
objects. The semantic stability score reflects how static the current frame is. It effectively
reduces the interference caused by dynamic objects by favoring frames with higher static
content as keyframes.
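A sketch of Equation (18); the static and dynamic class sets below are hypothetical placeholders for the paper's actual partition of the semantic classes:

```python
# hypothetical class sets, for illustration only
STATIC_CLASSES = {"building", "road", "pole", "vegetation"}
DYNAMIC_CLASSES = {"car", "person", "bicyclist"}

def stability_score(object_labels):
    """Semantic stability score SS of a frame, Equation (18)."""
    n_s = sum(1 for l in object_labels if l in STATIC_CLASSES)
    n_d = sum(1 for l in object_labels if l in DYNAMIC_CLASSES)
    return n_s / (n_d + 1)
```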
Next, based on the calculated semantic distribution probability and stability score,
different semantic weights $W_{se}$ are defined to adjust the translation and rotation
thresholds of keyframe selection used in the traditional geometric method. The equations
for calculating the translation and rotation thresholds are based on the level $m$:

$$ m = \begin{cases} 0 & \text{if } SS \leq t_{ss} \\ 1 & \text{if } t_{ss} < SS < 1.2\, t_{ss} \\ 2 & \text{if } SS \geq 1.2\, t_{ss} \end{cases} \qquad (19) $$
$$ N_i = \sum_{j=1}^{k} n_j \qquad (23) $$

where $n_j$ denotes the number of times the point with matching point index $i$ is selected
in the $j$th feature association process, and $k$ denotes the total number of feature
association processes.
Figures 2 and 3 show how often each candidate matching point index was selected in
sequences 01 and 06, respectively. Figures 2a and 3a show the statistics of the traditional
method, and Figures 2b and 3b those of the proposed method. The horizontal axis
indicates the index of the candidate matching point, and the vertical axis shows the
number of times that candidate was selected as a matching point in the feature association
phase. It can be observed that the traditional method only selects the five nearest-neighbor
points as matching points, whereas the proposed algorithm is not limited to the first five
nearest neighbors: the 6th to 15th nearest neighbors are also used as matching points.
Although these matching points are used less frequently than the first five, they still
account for a considerable proportion of the feature associations.
To further validate the effectiveness of the method, we conducted comparative exper-
iments on pose estimation accuracy for sequences 01 and 06. The results show that the
accuracy of the proposed method is improved by 2.77% and 2.86% in these two sequences,
respectively. Therefore, by introducing normal vector information, the associated points
can be selected more accurately, thus improving the accuracy of feature point matching.
Figure 2. The number of times the correlation points of sequence 01 were used. (a) Traditional
method. (b) Proposed method.

Figure 3. The number of times the correlation points of sequence 06 were used. (a) Traditional
method. (b) Proposed method.
Figure 4. Comparison of the keyframe sequence numbers selected by the traditional pose-change
method and by the method using scene semantic information for the same sequence. (a) Sequence 06.
(b) Sequence 07.
Table 3. Comparison of absolute trajectory error on the KITTI datasets (RMSE of ATE [m]).

Sequence | LeGO-LOAM | LIO-SAM | FAST-LIO | SUMA++ | LIO-SAM++ (Ours)
01       | -         | 20.31   | 14.96    | -      | 19.76
02       | -         | 9.21    | 18.87    | 30.25  | 6.25
04       | 0.39      | 0.25    | 0.16     | 0.94   | 0.25
05       | 3.29      | 0.81    | 3.14     | 1.11   | 0.87
06       | 26.84     | 14.51   | 4.90     | 1.14   | 13.99
07       | 2.02      | 0.43    | 0.84     | 0.93   | 0.44
09       | 11.76     | 9.73    | 14.08    | 14.65  | 8.95
10       | 2.12      | 2.17    | 4.94     | 8.81   | 1.97
Average  | 7.55      | 7.18    | 7.74     | 8.14   | 6.56
For a more in-depth analysis, we chose sequences 02, 09 and 10 from the KITTI dataset
for trajectory comparison, which cover diverse scenarios such as urban and rural areas. We
show the trajectory comparisons of different algorithms and the deviation from the ground
truth, respectively.
Figure 5 depicts the comparison of trajectories, relative positions, and relative rotations
for each algorithm in sequence 02. Sequence 02 is a 5067 m long urban sequence containing
a large number of curves. In this scene, the trajectories generated by the LIO-SAM++
algorithm are very close to the ground truth, especially in the curved sections, where it
significantly outperforms the other algorithms, whose trajectories show larger deviations.
This is because the keyframe thresholds set by LIO-SAM++ through adaptive understanding
of the environment can better adapt to scene changes, avoiding the omission of valid
keyframes when traveling through curves and thus enabling more accurate trajectory
estimation.
In sequence 09, the vehicle navigates a rural road environment with many objects such as
vegetation, buildings, and roads, as well as significant elevation changes along the route.
Figure 6 displays the trajectory comparison for this sequence, where FAST-LIO shows
significant trajectory errors. In contrast, the trajectories obtained by LeGO-LOAM and
LIO-SAM largely coincide, and the LIO-SAM++ trajectory is closer to the ground truth.
Sequence 10 also covers complex rural road conditions and contains a large number of
dynamic objects. Figure 7 illustrates that LIO-SAM++ surpasses the other methods, with
its trajectory aligning almost exactly with the real trajectory, especially in the error control
of translational and rotational poses. This is due to the more stable and accurate features
provided by the proposed feature association method, which ensures the robustness of the
pose estimation through normal vector and semantic consistency constraints, thus reducing
the system drift.
In addition, from the intermediate subplots (labeled ‘b’) of Figures 5–7, it can be seen
that the different algorithms produce fairly consistent trajectory estimates in the x-axis and
z-axis directions but differ significantly in the y-axis direction. This is due to the lower
measurement resolution of LiDAR in the y-axis direction and the limited observation
information from the ground, resulting in insufficient y-axis constraints. Nevertheless,
the proposed LIO-SAM++ still shows strong robustness in the y-axis.
Overall, the experiment results verify the effectiveness of the proposed algorithm in
various scenes and demonstrate that it can improve the accuracy of the SLAM system.
Figure 5. Comparative analysis of the estimated trajectories against the GT in sequence 02.
(a) Trajectories. (b) Position. (c) Rotation.

Figure 6. Comparative analysis of the estimated trajectories against the GT in sequence 09.
(a) Trajectories. (b) Position. (c) Rotation.

Figure 7. Comparative analysis of the estimated trajectories against the GT in sequence 10.
(a) Trajectories. (b) Position. (c) Rotation.
5. Conclusions
In this paper, we propose a lidar-inertial semantic SLAM algorithm with association
optimization and keyframe selection. The algorithm obtains more accurate feature associa-
tions by geometrically modeling the local point cloud and filtering out associated points
that are not in the same plane. Then, semantic consistency is added to the estimation
problem using the semantic label as a constraint weighting term, and this approach takes
semantic inaccuracies fully into account. In the optimization process, the distribution of
semantic labels is used to calculate the inter-frame differences and the degree of stability of
the current frame to focus on the changing characteristics of the scene, thus achieving a
more reasonable keyframe selection.
Experimental validation on the KITTI Odometry dataset demonstrates that our algorithm
enhances the precision of pose estimation and simultaneously generates clearly structured
semantic point cloud maps. Even though the proposed framework provides reliable results,
semantic mapping still faces some challenges. Currently, the only open-source dataset with
semantic labels for LiDAR frames is SemanticKITTI, which limits the generality of the point
cloud segmentation network used. In addition, adaptability in dynamic environments has
not been fully validated. In subsequent research, we aim to further explore the potential of
semantic information to handle dynamic objects better and, consequently, to deal with
more complex real-world situations.
Author Contributions: Conceptualization, B.S. and W.X.; methodology, B.S.; software, B.S.; valida-
tion, B.S.; formal analysis, B.S., Z.G. and X.Q.; investigation, B.S. and Z.G.; resources, W.X.; data
curation, B.S.; writing—original draft preparation, B.S.; writing—review and editing, B.S. and W.X.;
visualization, B.S.; supervision, X.P. All authors have read and agreed to the published version of the
manuscript.
Funding: This research was funded by the Cultivation Fund of the Chinese Academy of Sciences
(No. KGFZD-145-24-33).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The dataset used in the paper is the public KITTI Odometry, which
can be downloaded at: https://www.cvlibs.net/datasets/kitti/eval_odometry.php (accessed on 24
November 2024).
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Shan, T.; Englot, B. LeGO-LOAM: Lightweight and Ground-Optimized Lidar Odometry and Mapping on Variable Terrain.
In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain,
1–5 October 2018; pp. 4758–4765.
2. Shan, T.; Englot, B.; Meyers, D.; Wang, W.; Ratti, C.; Rus, D. LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing
and Mapping. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas,
NV, USA, 25–29 October 2020; pp. 5135–5142.
3. Xu, W.; Zhang, F. FAST-LIO: A fast robust LiDAR-inertial odometry package by tightly-coupled iterated Kalman filter. IEEE
Robot. Autom. Lett. 2021, 6, 3317–3324. [CrossRef]
4. Chen, X.; Milioto, A.; Palazzolo, E.; Giguère, P.; Behley, J.; Stachniss, C. SuMa++: Efficient LiDAR-based Semantic SLAM.
In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China,
4–8 November 2019; pp. 4530–4537.
5. Li, L.; Kong, X.; Zhao, X.; Li, W.; Wen, F.; Zhang, H.; Liu, Y. SA-LOAM: Semantic-aided LiDAR SLAM with Loop Closure. In
Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021;
pp. 7627–7634.
6. Zhang, J.; Singh, S. LOAM: Lidar Odometry and Mapping in Real-time. Robot. Sci. Syst. 2014, 2, 9.
7. Guo, S.; Rong, Z.; Wang, S.; Wu, Y. A LiDAR SLAM With PCA-Based Feature Extraction and Two-Stage Matching. IEEE Trans.
Instrum. Meas. 2022, 71, 8501711. [CrossRef]
8. Pan, Y.; Xiao, P.; He, Y.; Shao, Z.; Li, Z. MULLS: Versatile LiDAR SLAM via Multi-metric Linear Least Square. In Proceedings of
the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 11633–11640.
9. Xu, M.; Lin, S.; Wang, J.; Chen, Z. A LiDAR SLAM System With Geometry Feature Group-Based Stable Feature Selection and
Three-Stage Loop Closure Optimization. IEEE Trans. Instrum. Meas. 2023, 72, 8504810. [CrossRef]
10. Zhao, Z.; Zhang, W.; Gu, J. Lidar mapping optimization based on lightweight semantic segmentation. IEEE Trans. Intell. Veh.
2019, 4, 353–362. [CrossRef]
11. Du, S.; Li, Y.; Li, X.; Wu, M. LiDAR Odometry and Mapping Based on Semantic Information for Outdoor Environment. Remote
Sens. 2021, 13, 2864. [CrossRef]
12. Forster, C.; Pizzoli, M.; Scaramuzza, D. SVO: Fast semi-direct monocular visual odometry. In Proceedings of the 2014 IEEE
International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 15–22.
13. Mur-Artal, R.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Trans. Robot.
2015, 31, 1147–1163. [CrossRef]
14. Engel, J.; Koltun, V.; Cremers, D. Direct Sparse Odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 611–625. [CrossRef]
[PubMed]
15. Qin, T.; Li, P.; Shen, S. VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator. IEEE Trans. Robot. 2018, 34,
1004–1020. [CrossRef]
16. Kuo, J.; Muglikar, M.; Zhang, Z.; Scaramuzza, D. Redesigning SLAM for Arbitrary Multi-Camera Systems. In Proceedings of the
2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020.
17. Jiao, J.; Ye, H.; Zhu, Y.; Liu, M. Robust Odometry and Mapping for Multi-LiDAR Systems with Online Extrinsic Calibration. IEEE
Trans. Robot. 2022, 38, 351–371. [CrossRef]
18. Lin, Y.; Dong, H.; Ye, W.; Dong, X.; Xu, S. InfoLa-SLAM: Efficient Lidar-Based Lightweight Simultaneous Localization and
Mapping with Information-Based Keyframe Selection and Landmarks Assisted Relocalization. Remote Sens. 2023, 15, 4627.
[CrossRef]
19. Milioto, A.; Vizzo, I.; Behley, J.; Stachniss, C. RangeNet ++: Fast and Accurate LiDAR Semantic Segmentation. In Proceedings
of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 4–8 November 2019;
pp. 4213–4220.
20. Behley, J.; Garbade, M.; Milioto, A.; Quenzel, J.; Behnke, S.; Stachniss, C. SemanticKITTI: A Dataset for Semantic Scene
Understanding of LiDAR Sequences. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV),
Seoul, Republic of Korea, 27 October–2 November 2019.
21. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.