IndoorGS: Enhanced Indoor Scene Reconstruction

The paper presents IndoorGS, a novel method for indoor scene reconstruction that enhances the performance of 3D Gaussian Splatting (3DGS) by leveraging geometric cues such as 2D lines and 3D planes. It introduces a geometric-cue-guided adaptive density control strategy to optimize the reconstruction process, resulting in improved accuracy and rendering quality compared to existing methods. Extensive experiments demonstrate that IndoorGS outperforms traditional 3DGS approaches in texture-rich and textureless indoor environments.


This CVPR paper is the Open Access version, provided by the Computer Vision Foundation. Except for this watermark, it is identical to the accepted version; the final published version of the proceedings is available on IEEE Xplore.

IndoorGS: Geometric Cues Guided Gaussian Splatting for Indoor Scene Reconstruction

Cong Ruan    Yuesong Wang*    Tao Guan
School of Computer Science & Technology, Huazhong University of Science and Technology, China
{rc22, yuesongwang, qd gt}@[Link]

Bin Zhang
Shenzhen Smart City Technology Development Group Co., Ltd., China
zhangbin@[Link]

Lili Ju
Department of Mathematics, University of South Carolina, USA
ju@[Link]

Figure 1. Our IndoorGS achieves state-of-the-art (SOTA) performance in indoor scene reconstruction. Compared to other methods, which produce poor geometric results, our approach generates well-structured geometries while preserving all details. We achieve superior rendering quality thanks to our precise geometry estimation.

Abstract

3D Gaussian Splatting (3DGS) has shown impressive performance in scene reconstruction, offering high rendering quality and rapid rendering speed with short training time. However, it often yields unsatisfactory results when applied to indoor scenes due to its poor ability to learn geometries without enough textural information. In this paper, we propose a new 3DGS-based method, "IndoorGS", that leverages the commonly found yet important geometric cues in indoor scenes to improve the reconstruction quality. Specifically, we first extract 2D lines from the input images and fuse them into 3D line cues via feature-based matching, which can provide a structural understanding of the target scene. We then apply statistical outlier removal to refine the Structure-from-Motion (SfM) points, ensuring robust cues in texture-rich areas. Based on these two types of cues, we further extract reliable 3D plane-like cues for textureless regions. Such geometric information is utilized not only for initialization but also in the realization of a geometric-cue-guided adaptive density control (ADC) strategy. The proposed ADC approach is grounded in the principle of divide-and-conquer and optimizes the use of each type of geometric cue to enhance overall reconstruction performance. Extensive experiments on multiple indoor datasets show that our method can deliver much more accurate geometry and higher rendering quality for indoor scenes than existing 3DGS approaches.

* Corresponding author. Contact him at yuesongwang@[Link].

1. Introduction

Indoor scene reconstruction is one of the most common tasks in computer vision, with broad applications in virtual reality, gaming, autonomous driving, robotics, and so on [25, 51]. Recently, 3D Gaussian Splatting (3DGS) [15] has emerged as a promising scene reconstruction method, which introduces a hybrid optimization scheme with point-cloud-like Gaussian primitives and combines differentiable backpropagation with adaptive density control. This method has demonstrated superior efficacy, exhibiting an expedited training process, accelerated rendering speed, and high-quality view synthesis, particularly in highly textured outdoor environments. However, it often performs poorly when applied to indoor environments, which is mainly attributed to the fact that 3DGS is not capable of reconstructing correct geometry in textureless regions, as there are too many potential combinations of geometry and appearance that can satisfy the training images.

Although a large body of research [5, 9, 10, 14, 47, 48] has been devoted to improving the geometry inference ability of 3DGS-based methods, their reconstruction quality is still unsatisfactory on indoor scenes, resulting in holes or floaters in the rendered images, as shown in Fig. 1. There are many local optima during the training process, and without strong constraints and correct guidance, 3DGS can easily be trapped by them. What is even worse, the initialization of 3DGS relies entirely on SfM points, which are inherently sparse and noisy in regions with weak textures; the lack of reliable seeds leads to blind exploration of 3DGS in these regions, further exacerbating the likelihood of being trapped in local optima. To improve the performance of 3DGS for indoor scene reconstruction, we need to, on the one hand, provide better, reliable seeds and, on the other hand, add strong geometric constraints to help the optimization move towards the global optimum.

To obtain better initial seeds, we propose to extract more reliable geometric cues commonly found in indoor scenes and use them to help initialize the 3DGS. Specifically, we first extract 2D line segments and perform feature-based matching on these 2D lines to obtain 3D line cues. Matching only line segments drastically reduces the computational complexity and the matching ambiguity, allowing us to get accurate and noise-free 3D line cues that help 3DGS understand the scene's structure. While SfM [26, 32, 34] points can provide good geometric priors in texture-rich regions, they usually behave as outliers in textureless or repeating-textured areas and can also trouble the reconstruction. Therefore, we filter the SfM points via Statistical Outlier Removal (SOR) and use the remaining points as reliable geometric cues in texture-rich regions. We further extend these cues to the weakly textured planar-like regions commonly found in indoor scenes, such as walls, floors, ceilings, and furniture surfaces. We first extract the weakly textured regions from the 2D images and back-project them into 3D space using aligned monocular depths. We then analyze the spatial relationship between the extracted 3D lines and these back-projected planes, allowing us to extract reliable 3D plane-like cues. The above three types of geometric cues are fed to 3DGS for initialization.

Based on the distinctive characteristics of these geometric cues, we further divide the Gaussians into three types, namely line Gaussians, plane-like Gaussians, and SfM Gaussians, and design a geometric-cue-guided adaptive density control strategy to push the optimization of 3DGS towards the global optimum. As 3D lines are typically located at object edges with high spatial accuracy, we apply a tangent space densification method to these line Gaussians, encouraging them to expand and complete the object. For plane-like Gaussians, since the initial seeds from 3D plane-like region cues may introduce spatial offsets and redundancies, we disable the splitting and cloning operations for these Gaussians, focusing solely on optimizing Gaussian parameters and pruning. For SfM geometric cues, because SfM points are often noisier than 3D lines, we implement a densification strategy that prioritizes line Gaussians over SfM Gaussians.

In summary, our main contributions are as follows:
• We propose a geometric cue extraction method for obtaining different types of reliable geometric priors from indoor scenes to guide the reconstruction.
• We classify Gaussian primitives according to their geometric types and design a geometric-cue-guided adaptive density control strategy to better utilize the unique properties of each type of geometric cue.
• Our proposed IndoorGS demonstrates superior performance in both rendering quality and accuracy of extracted geometry across various indoor datasets.

2. Related Work

3DGS has become a prominent 3D reconstruction technique in the past couple of years due to its training efficiency, rendering performance, and impressive results across various 3D tasks [24, 33, 37, 41]. The core of 3DGS lies in a carefully designed rasterization [53] method that efficiently leverages GPU computation by representing scenes as collections of Gaussian primitives. Despite the remarkable success of 3DGS and its subsequent modifications [23, 35, 46, 50] in novel view synthesis, it faces significant challenges in surface reconstruction, mainly due to the discrete and unstructured nature of its Gaussian primitives. Recent research has sought to enhance 3DGS-based surface reconstruction [5, 9, 10, 14, 47, 48], yielding improvements in geometrically rich environments with strong textures. However, these methods still struggle in weakly textured settings, such as indoor scenes, which has driven growing interest in 3DGS reconstruction for indoor environments. GaussianRoom [42] integrates neural signed distance functions (SDF) with 3DGS, incorporating monocular normal and edge priors to enhance indoor scene details. Kim et al. [16] proposed an approach that employs 3D Gaussians for object reconstruction while using meshes to capture the room layout. 360-GS [1] leverages four panoramic images to initialize 3D Gaussians based on room layout and depth maps derived from panoramas for comprehensive indoor reconstruction. FreeSplat [39] applies 2D backbones and a cost volume for multi-view aggregation, while MGSLAM [21] utilizes the Manhattan World hypothesis to improve geometric accuracy and completeness in real-time SLAM [2–4] reconstructions. Despite these contributions, such methods usually require specialized inputs or are inefficient, exhibiting limitations in texture-poor environments. Indoor scenes are rich in geometric cues like lines and planes, which are useful for 3DGS-based reconstruction. However, SfM often encounters issues in these environments due to large textureless surfaces and repetitive elements. This is where 3D line map reconstruction approaches [12, 13, 20] show natural advantages in overcoming the weak and repetitive textures common in indoor settings. By fully leveraging these geometric cues, 3DGS is expected to better handle the unique challenges of indoor scene reconstruction.

Figure 2. Overview of the proposed IndoorGS. (a) The left part illustrates the extraction process of geometric cues, i.e., line cues, plane-like cues, and SfM cues, which are then used to initialize 3DGS. (b) The right part depicts the optimization process, where the 3DGS is further optimized using geometric-cue-guided adaptive density control and supervised by a series of loss functions.

3. Preliminaries

3.1. 3D Gaussian Splatting

3DGS [15] explicitly represents a 3D scene with a set of Gaussian primitives. Each of them is parameterized via a 3D covariance matrix Σ and mean µ:

    G(p) = \exp\left(-\tfrac{1}{2}(p - \mu)^\top \Sigma^{-1} (p - \mu)\right),    (1)

where Σ can be further factorized into R S S^\top R^\top with a scaling matrix S and a rotation matrix R. The blending coefficient α_i for a Gaussian with center µ_i in screen space is

    \alpha_i = o_i \cdot \exp\left(-\tfrac{1}{2}(p - \mu_i)^\top (\Sigma'_i)^{-1} (p - \mu_i)\right),    (2)

where o_i is the Gaussian opacity. The 3D Gaussian is transformed into image coordinates with the world-to-camera transform matrix W and projected onto pixel space with the Jacobian J of the affine approximation of the projection, so the 2D covariance matrix Σ' is computed as Σ' = J W Σ W^\top J^\top. The rendering result is computed using point-based alpha-blending along each ray:

    \hat{C} = \sum_{i \in N} c_i \alpha_i T_i, \quad \text{where } T_i = \prod_{j=1}^{i-1} (1 - \alpha_j),    (3)

where T_i is the accumulated transmittance at pixel location p, c_i is the view-dependent color defined by spherical harmonics (SH) coefficients, and N is the set of Gaussian primitives. A photometric loss function optimizes the Gaussian parameters to reduce the discrepancies between the rendered images and the actual observational data.

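As a concrete illustration of the alpha-blending in Eq. (3), the per-ray compositing can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the paper's CUDA rasterizer; the function and variable names are ours.

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha compositing along one ray, as in Eq. (3):
    C_hat = sum_i c_i * alpha_i * T_i, with T_i = prod_{j<i} (1 - alpha_j).
    colors: (N, 3) view-dependent colors c_i, sorted near-to-far.
    alphas: (N,) blending coefficients alpha_i in [0, 1]."""
    alphas = np.asarray(alphas, dtype=float)
    colors = np.asarray(colors, dtype=float)
    # Accumulated transmittance T_i before each Gaussian is composited.
    T = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * T
    return (weights[:, None] * colors).sum(axis=0)
```

The same weights α_i T_i also render depth (Eq. (5)) and normals (Eq. (7)) when `colors` is replaced by per-Gaussian depths or normals.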
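Similarly, the screen-space covariance Σ' = J W Σ W^⊺ J^⊺ used in Eq. (2) is a simple matrix sandwich. The sketch below assumes W is the rotation part of the world-to-camera transform and J the 2×3 Jacobian of the affine projection approximation; the function name is ours.

```python
import numpy as np

def project_covariance(Sigma, W, J):
    """2D screen-space covariance of a splatted Gaussian (Sec. 3.1):
    Sigma' = J W Sigma W^T J^T.
    Sigma: (3, 3) world-space covariance; W: (3, 3) camera rotation;
    J: (2, 3) Jacobian of the affine projection approximation."""
    return J @ W @ Sigma @ W.T @ J.T
```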
3.2. Rasterizing Depth and Normal

Following RaDeGS [48], the center µ_i of a Gaussian G_i is first projected into camera space as µ'_i. The depth of each pixel covered by the Gaussian is computed as

    d = z'_i + \mathbf{p}^\top \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}, \quad \mu'_i = \begin{pmatrix} x'_i \\ y'_i \\ z'_i \end{pmatrix} = W \mu_i + t,    (4)

where z'_i represents the depth of the Gaussian center, ∆x = x'_i − x and ∆y = y'_i − y denote the relative pixel positions, the vector p is determined by the Gaussian parameters, and [W, t] ∈ R^{3×4} is the world-to-camera transform. Per-pixel z-depth estimates D̂ are rendered using the discrete volume rendering approximation, similar to color values:

    \hat{D} = \sum_{i \in N} d_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j),    (5)

where d_i is the i-th Gaussian's depth according to Eq. (4). The normal direction of each projected Gaussian is computed as

    \hat{n} = -\frac{z'_i}{z_i}\, J^\top \begin{pmatrix} \mathbf{p} \\ 1 \end{pmatrix}.    (6)

Normals are then alpha-blended, resulting in a single, unified per-pixel normal estimate:

    \hat{N} = \sum_{i \in N} \hat{n}_i \alpha_i T_i.    (7)

4. Proposed Method

Fig. 2 gives an overview of our method. We first extract geometric cues, i.e., line, plane, and SfM point cues, from the input images (Section 4.1). Then, we divide the Gaussians into three types according to the geometric cues and treat them differently in our geometric-cue-guided adaptive density control (Section 4.2). The training process is supervised by color and geometric losses (Section 4.3).

4.1. Geometric Cues Extraction

SfM points are often too sparse to guide the reconstruction of indoor scenes and sometimes contain geometric errors, which can cause many floaters in the subsequent 3DGS reconstruction process. Therefore, we need to provide the 3DGS with sufficient and accurate geometric cues. As depicted in Fig. 2, there are three types of geometric cues that we use to guide the reconstruction.

Geometric Cues from 3D Lines. In contrast to sparse points from SfM, which are structureless, 3D line segments can concisely encode the high-level layout, as they often delineate the main structural elements of the scene. We follow the idea of LIMAP [20] to extract 3D lines, mainly consisting of 2D line extraction and line matching. Although rich 2D line segment features can be extracted from images using learning-based methods such as DeepLSD [29] or SOLD2 [28], when encountering large, continuous, weakly textured or untextured areas, such as the corner joint between two white walls, these methods still have difficulties obtaining distinctive 2D line segment features using only color information. However, we can easily extract distinctive line segment features for textureless regions from their normal maps, as shown in Fig. 3.

Figure 3. The complementary effects of extracting structured lines from RGB images and normal maps. Top row: the line segments can be easily seen in the normal map, but extracting the corresponding line segments from the RGB image is difficult. Bottom row: the opposite case.

However, at the beginning of the reconstruction, we do not know the scene's geometry and cannot render the normal maps. Fortunately, foundation models for monocular normal estimation can give us a good prior. Thus, we extract 2D line segments from both the RGB images and the monocular normal maps using DeepLSD [29], and subsequently use GlueStick [30] to match the line segments.

Finally, we obtain the 3D line segments of the scene. To capture color information along these segments, we extract the corresponding 2D line colors from all visible views of each 3D line, perform multi-view color averaging to achieve a consistent color representation, and sample 3D points on the 3D line segments at fixed steps.

Geometric Cues from SfM. The sparse points of SfM in indoor scenes often contain noise or outliers due to repetitive structures, reflective surfaces, and limited textures. Although sparse 3D feature points are generally accurate in strong-texture regions, they are often misaligned in weak-texture regions, as 2D points are more challenging to match correctly across multiple views. Therefore, filtering SfM

points is crucial, as erroneous inputs can be amplified during adaptive density control (ADC) in 3DGS, causing floater artifacts. To address this, we apply a Statistical Outlier Removal (SOR) filter [31], which calculates the average distance between each point and its K nearest neighbors and removes points with abnormally high average distances. The remaining SfM points serve as reliable geometric cues in texture-rich regions.

Geometric Cues from Plane-like Regions. The plane-like regions in indoor scenes are usually weakly textured walls, ceilings, etc. To detect these weakly textured regions in images, we use the Segment Anything Model (SAM) [17] for image segmentation and then determine texture strength by counting the density of SIFT [22] points in each segmented region.

It is difficult to calculate the depth of these regions due to the lack of texture information. However, the positions of the structural line cues on their edges are accurate enough. Thus, we can locate these plane-like cues using the 3D line cues with the help of a monocular depth estimation model. Specifically, we first obtain monocular depth estimates from Metric3D [44]. While monocular depth estimation can reflect the relative positions of lines and planes, it remains inconsistent with absolute depths. Thus, we estimate a per-image depth scale by aligning the monocular depth with the line cues and SfM cues using linear regression, which is then used to adjust the monocular depths. Based on the adjusted monocular depths, the weakly textured region is back-projected into 3D space to form a point cloud. To ensure the reliability of the back-projected point cloud, we then calculate the distances between sampling points from the 3D line cues and the back-projected point cloud, based on which we filter out points exceeding a preset threshold. The retained points form the plane-like cues, giving reliable priors for weakly textured regions in 3D space. The whole process is described as Algorithm 1 in the supplementary material.

4.2. Geometric-cue-guided ADC Strategies

Besides extracting points from the geometric cues for initialization, we further let these cues guide the reconstruction process. To achieve this, we design geometric-cue-guided adaptive density control (ADC) strategies for the three Gaussian types based on the positional accuracy and significance of their respective 3D spatial locations. In the following, we denote the Gaussians from SfM cues, line cues, and plane-like cues as G_M, G_L, and G_S, respectively.
• Line Gaussians G_L: They are typically located at object edges with high spatial accuracy. Thus, as discussed in detail below, we specifically design a tangent space densification method for them.
• SfM Gaussians G_M: The ADC process for G_L takes precedence over G_M, since G_M are often noisier than G_L. Specifically, we allow G_M to split and clone freely between iterations 500 and 700 to ensure they occupy strong-texture regions correctly. Between iterations 700 and 3500, no ADC process is applied to G_M; only the tangent space densification of G_L occurs during this period. After 3500 iterations, the ADC process for G_M resumes.
• Plane-like Gaussians G_S: Due to potential spatial offsets and redundancy of G_S, we disable the splitting and cloning operations for them. Instead, only the parameter optimization and pruning processes are conducted.

Figure 4. Densification in the tangential space. Left: cloning. When the Gaussian G_i is cloned, an average gradient is added to its spatial location µ_i. The projection of this gradient in the normal direction (the correction factor) is then subtracted, resulting in the newly cloned green Gaussian G_{i+1}. Right: splitting. When the Gaussian G_i is split, a small random vector v is generated and added to its position. The component of v along the normal is set to 0, yielding the two newly split Gaussians G_{i+1}.

Densification in the tangential space. Vanilla 3D Gaussian Splatting (3DGS) is densified by randomly splitting or cloning Gaussian primitives within their Gaussian fields. While this method is flexible and easy to implement, it encounters difficulties in indoor environments, as the unconstrained densification strategy can result in artifacts like floaters in weakly textured regions. Inspired by GeoGS [19], we propose a similar tangent space densification scheme. However, unlike GeoGS, whose initial point cloud is derived from PlanarSLAM [18] and whose normal vectors can be obtained directly from a spatial fit, we do not know the initial normal vectors and thus cannot simply reduce the third scale of each Gaussian primitive to a very small value. Instead, we introduce a soft constraint that flattens the Gaussian ellipsoid, akin to the approach used in NeuSG [6]: we directly penalize the minimum entry of the scaling S_i = diag(s_1, s_2, s_3) for each Gaussian that originates from 3D line points:

    \mathcal{L}_s = \| \min(s_1, s_2, s_3) \|_1.    (8)

In this case, the Gaussian primitive becomes flat, forming a disc-like surfel, and the smallest scaling axis approximates the normal direction. Simultaneously, we align the Gaussians' normals with the monocular surface normals estimated by Metric3D using the following normal constraint:

    \mathcal{L}_n = \frac{1}{|\hat{N}|} \sum \| \hat{N}_{ij} - N_{\text{mono},ij} \|_1,    (9)

where N̂_{ij} represents the estimated normal at pixel position (i, j), and N_{mono,ij} represents the monocular normal at pixel position (i, j). We then define the geometric normal of a flat Gaussian primitive as

    \hat{n}_{\text{flat}} = R \cdot \text{OneHot}(\arg\min(s_1, s_2, s_3)),    (10)

where the rotation matrix R ∈ SO(3) is obtained from the quaternion q, and OneHot(·) ∈ R^3 returns a unit vector with zeros everywhere except at the position where the scaling s_i = (s_1, s_2, s_3) is minimal. However, the orientation of the shortest axis can be ambiguous, as it may point either outward or inward relative to the surface. We leverage the relationship between the observation direction and the Gaussian normal to resolve this ambiguity: the angle between the observation direction and the Gaussian normal should always be greater than 90 degrees. If this condition is not met, we simply invert the Gaussian normal vector by multiplying it by −1, ensuring the correct orientation. We then implement a tangent space densification strategy for these flat Gaussians, as shown in Fig. 4.

When a cloning operation is performed, the position of the newly cloned Gaussian is determined by adding the average gradient to the position of the original Gaussian, along with a correction. The correction is the projection of the gradient onto the normal direction:

    \mu_{i+1} = \mu_i + \nabla\mu_i - (\hat{n}_{\text{flat}} \cdot \nabla\mu_i)\, \hat{n}_{\text{flat}},    (11)

where µ_i is the position of the original Gaussian G_i, ∇µ_i is the accumulated average gradient of G_i, and n̂_flat is the normal of G_i.

When a splitting operation is performed, the position of the newly split Gaussian is determined by adding an offset to the original Gaussian's position. This offset is drawn from a normal distribution with a mean of 0 and the three scales of the Gaussian as the standard deviations. We set the offset along the axis corresponding to the smallest scale to zero, ensuring no movement occurs along that direction.

The advantage of this approach is that the tangent space densification of G_L enables better splitting and cloning along surfaces, effectively suppressing the "floaters" caused by excessive splitting and cloning of erroneous seeds.

4.3. Loss Functions Definition

Despite adding geometric guidance to adaptive density control, Gaussians may still drift away from the correct locations during parameter optimization due to the weak constraints of the color consistency loss in textureless regions, resulting in floaters or holes in the test views. Therefore, we add two geometric regularization losses to ensure the optimization proceeds in the right direction.

Monocular depth regularization loss. Monocular depth estimation faces challenges such as scale ambiguity, uneven scale distribution, and a lack of strict geometric consistency between depth maps. A simple linear transformation of monocular depth may destabilize 3D points and introduce distortions during alignment. To address this, the Pearson correlation coefficient (PCC) [8] can be used on small blocks of the depth map to extract robust geometric priors from monocular depth rather than imposing strict consistency. Specifically, we randomly sample N non-overlapping patches to compute the monocular depth regularization loss as follows:

    \mathcal{L}_d = \frac{1}{N} \sum_{i}^{N} \left(1 - \mathrm{PCC}(\hat{P}_i, P_{\text{mono},i})\right),    (12)

where P̂_i ∈ R^{S²} denotes the i-th patch of D̂ and P_mono,i ∈ R^{S²} denotes the i-th patch of D_mono, with the patch size S being a hyperparameter. PCC, similar to normalized cross-correlation, measures the similarity between the two depth patches. This loss encourages local consistency between the rendered and monocular depth maps, improving alignment over small regions while tolerating local variations, thus enhancing depth estimation accuracy.

Normal consistency loss. Following RaDeGS [48], we measure the consistency between the normal directions computed from the Gaussians and those derived from the depth map:

    \mathcal{L}_{nc} = \sum_{i,j} (1 - \hat{N}_{ij} \cdot \vec{N}_{ij}),    (13)

where N̂_{ij} is the rendered normal at pixel (i, j) from Eq. (7), and N⃗_{ij} is the normal estimated from the gradient of the depth map at pixel (i, j).

Together with L_n and L_s used in the tangent space densification, our final loss function is defined as

    \mathcal{L} = \mathcal{L}_{rgb} + \lambda_d \mathcal{L}_d + \lambda_n \mathcal{L}_n + \lambda_s \mathcal{L}_s + \lambda_{nc} \mathcal{L}_{nc},    (14)

where L_rgb is the photometric loss proposed in 3DGS [15]. We set λ_d = 0.1, λ_n = 0.5, λ_s = 100.0, and λ_nc = 0.1 in our experiments.

5. Experiments

In this section, we first demonstrate IndoorGS's state-of-the-art (SOTA) performance. We compare our method with Gaussian-based approaches known for their high-quality rendering and fast training and rendering speed: 3DGS [15], GOF [47], RaDeGS [48], and PGSR [5], the last of which is the state-of-the-art surface reconstruction approach. We then analyze the contribution of each proposed component.

Dataset. We evaluate the performance of our approach on geometry reconstruction and rendering quality using 14

Figure 5. Qualitative comparison of geometry on the Replica (left two columns), ScanNet++ (middle two columns), and Deep Blending (right two columns) datasets. The experimental results demonstrate IndoorGS's robust performance in less-textured areas as well as its ability to capture fine details.
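For reference, the PSNR values reported in Tab. 1 follow the standard definition; a minimal sketch, assuming images are normalized to [0, 1] (the function name is ours):

```python
import numpy as np

def psnr(img, ref, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a rendered image and a
    reference image: 10 * log10(max_val^2 / MSE)."""
    mse = np.mean((np.asarray(img, float) - np.asarray(ref, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)
```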

         Replica                 ScanNet++               Deep Blending           Average on all scenes
         PSNR↑  SSIM↑  LPIPS↓   PSNR↑  SSIM↑  LPIPS↓   PSNR↑  SSIM↑  LPIPS↓   PSNR↑  SSIM↑  LPIPS↓  Time
3DGS     39.52  0.968  0.121    29.46  0.918  0.152    29.41  0.903  0.243    32.79  0.929  0.172   10m
GOF      40.98  0.980  0.074    29.82  0.919  0.153    29.21  0.904  0.246    33.33  0.934  0.158   2h
RaDeGS   41.04  0.980  0.074    30.08  0.921  0.147    29.46  0.907  0.243    33.53  0.936  0.155   15m
PGSR     40.02  0.974  0.101    29.99  0.920  0.154    29.22  0.893  0.255    33.08  0.929  0.170   25m
Ours     41.11  0.980  0.073    30.45  0.928  0.131    30.06  0.910  0.235    33.88  0.939  0.146   35m

Table 1. Quantitative comparison of rendering quality for novel view synthesis on the Replica, ScanNet++, and Deep Blending datasets. "Red" denotes the best results, "yellow" the second-best. Results on the three datasets show that IndoorGS outperforms the other methods.
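The mesh metrics in Tab. 2 compare sampled point sets; below is a brute-force sketch of Chamfer-L1 and F-score at a distance threshold. This is illustrative only: the paper follows the MonoSDF/Go-Surf evaluation protocol, and 5 cm is the paper's F-score threshold.

```python
import numpy as np

def chamfer_f1(pred, gt, thresh=0.05):
    """Chamfer-L1 distance and F-score between two point sets, computed
    brute force over all point pairs.
    pred, gt: (N, 3) and (M, 3) sampled surface points (meters)."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    d_pg = d.min(axis=1)                  # each pred point -> nearest gt
    d_gp = d.min(axis=0)                  # each gt point -> nearest pred
    chamfer = 0.5 * (d_pg.mean() + d_gp.mean())
    precision = (d_pg < thresh).mean()
    recall = (d_gp < thresh).mean()
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return chamfer, f1
```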

indoor scenes from publicly available datasets. This includes 8 scenes from Replica [36], 4 scenes from ScanNet++ [43], and 2 scenes from Deep Blending [11].

Implementation details. We choose RaDeGS [48] as our baseline. Since our initialization points are denser than those of vanilla 3DGS, densification stops earlier, at 12,000 iterations. To extract the final mesh, we apply Truncated Signed Distance Function (TSDF) [27] surface reconstruction and then use Marching Cubes [40] to extract the mesh. All experiments are conducted on a single Nvidia RTX 4080 GPU.

Metrics. We choose three widely used image evaluation metrics for rendering evaluation: PSNR, SSIM, and LPIPS [49]. For mesh evaluation, we follow the evaluation protocol of MonoSDF [45] and Go-Surf [38] and report Chamfer-L1 distance (C-L1), F-score (F1) with a threshold of 5 cm, and Normal Consistency (NC).

         Replica               ScanNet++             Average on all scenes
         C-L1↓  F1↑    NC↑    C-L1↓  F1↑    NC↑    C-L1↓  F1↑    NC↑
GOF      0.086  0.576  0.797  0.105  0.479  0.702  0.095  0.528  0.749
RaDeGS   0.079  0.600  0.862  0.096  0.497  0.761  0.087  0.549  0.811
PGSR     0.049  0.818  0.895  0.079  0.625  0.778  0.064  0.722  0.837
Ours     0.036  0.825  0.940  0.040  0.776  0.877  0.038  0.801  0.909

Table 2. Quantitative comparison of Chamfer distance, F-score, and Normal Consistency for reconstruction on the Replica and ScanNet++ datasets. IndoorGS achieves the highest reconstruction accuracy across all metrics.

5.1. Mesh Comparison

IndoorGS demonstrates superior performance in indoor scene reconstruction, achieving the highest scores across all evaluated metrics, namely Chamfer Distance, F-score, and Nor-

Geometric Cues Geometric-cue-guided ADC Loss Mesh Novel View Synthesis
Method Line SfM Plane Line SfM Plane Ld Ln Lnc C-L1↓ F1↑ NC↑ PSNR↑ SSIM↑ LPIPS↓
w/o Line & Plane & SfM cues - - - - - - ✓ ✓ ✓ 0.055 0.681 0.859 30.308 0.924 0.137
w/o Line & Plane cues - ✓ - - ✓ - ✓ ✓ ✓ 0.051 0.717 0.789 30.513 0.927 0.135
w/o Plane cues ✓ ✓ - ✓ ✓ - ✓ ✓ ✓ 0.050 0.728 0.862 30.583 0.927 0.134
w/o All customized ADC ✓ ✓ ✓ - - - ✓ ✓ ✓ 0.052 0.711 0.839 30.499 0.926 0.135
w/o Line ADC ✓ ✓ ✓ - ✓ ✓ ✓ ✓ ✓ 0.051 0.720 0.848 30.595 0.926 0.135
w/o SfM ADC ✓ ✓ ✓ ✓ - ✓ ✓ ✓ ✓ 0.040 0.771 0.875 30.573 0.927 0.133
w/o Plane ADC ✓ ✓ ✓ ✓ ✓ - ✓ ✓ ✓ 0.046 0.741 0.871 30.390 0.925 0.137
w/o Ld ✓ ✓ ✓ ✓ ✓ ✓ - ✓ ✓ 0.044 0.762 0.864 30.553 0.926 0.134
w/o Ln ✓ ✓ ✓ ✓ ✓ ✓ ✓ - ✓ 0.051 0.725 0.848 30.510 0.926 0.137
w/o Lnc ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ - 0.044 0.744 0.808 30.505 0.925 0.136
Full model ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ 0.040 0.776 0.877 30.453 0.928 0.131

Table 3. Ablation study results. Geometric cues significantly improve reconstruction quality. We report mesh and novel view synthesis metrics on the ScanNet++ dataset. Adding the SfM cues means using the filtered SfM points instead, and using Ln implies incorporating both Ln and Ls.

mal Consistency, on all of the Replica, ScanNet++, and Deep Blending datasets, as shown in Tab. 2 and Fig. 5. This consistent excellence highlights its effectiveness in accurately reconstructing complex indoor environments. Since the Deep Blending dataset does not provide mesh ground truth (GT), we do not compute quantitative metrics on it. In contrast to other methods, which often produce uneven surfaces in weakly textured regions and overly smooth surfaces along edges, IndoorGS achieves a more accurate and detailed reconstruction.

5.2. Rendering Comparison

To evaluate the rendering quality of our method, we compare the test-view rendering results of our approach with those of vanilla 3DGS, GOF, RaDeGS, and PGSR on the Replica, ScanNet++, and Deep Blending datasets. As shown in Tab. 1, our method outperforms all of the above Gaussian-based reconstruction methods in terms of novel view synthesis, achieving an average PSNR that is 0.35 dB higher than the second-best method across the three datasets. Notably, following vanilla 3DGS, our test set is obtained by sequential sampling from the original dataset. The test views are positioned close to the training views in indoor scenes. In this case, despite the model overfitting the

Specifically, we analyzed the effects of geometric cues, geometric-cue-guided ADC strategies, depth regularization, normal regularization, and normal consistency. The results are presented in Tab. 3. These results show that our full model performs best across all three mesh metrics and the final two NVS metrics (SSIM and LPIPS). Notably, as shown in the first three rows of the table, the step-by-step addition of geometric cues and the corresponding ADC strategy produces incremental performance improvements. Rows 4 to 7 further illustrate that introducing the various ADC strategies after adding the geometric cues results in further performance gains, and the line-cue-guided ADC has the most significant impact among the three strategies. The final three rows show the positive effects of the three loss functions, with the normal regularization providing the greatest enhancement.

6. Conclusion

In this paper, we propose IndoorGS, an indoor scene reconstruction method based on 3DGS. Our IndoorGS utilizes geometric cues such as filtered SfM points, line segments, and weakly textured planar surfaces, commonly found in indoor environments, as input data and takes a geometric-cues-
training views and the geometry inaccurately estimated, the guided ADC strategy for Gaussian scene optimization. We
PSNR of the rendered images from the test views can re- validate our method’s rendering and reconstruction quality
main relatively high, which undermines the effectiveness of on the Replica, ScanNet++, and Deep Blending datasets.
the test set in evaluating the true capability of the model, as Experimental results demonstrate that our IndoorGS out-
it fails to challenge the model with sufficiently diverse per- performs existing Gaussian-based techniques in geometric
spectives. We provide a detailed experimental analysis of reconstruction accuracy and rendering quality.
this issue in the appendix. Additionally, our method takes
only 10 minutes longer than PGSR despite including the 3D Acknowledgment
line map construction and planar cues extraction process.
Project supported by the National Natural Science Founda-
5.3. Ablation Studies tion of China (Grant No. 62302174). The computation is
We conducted an ablation study on the ScanNet++ dataset completed in the HPC Platform of Huazhong University of
to test the impact of each component we introduced. Science and Technology.
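As a concrete illustration of the evaluation protocol above, the point-based mesh metrics (Chamfer-L1 and F-score at a 5 cm threshold) and PSNR can be sketched as follows. This is a simplified, brute-force NumPy version for small point sets, not the evaluation code used in the paper: the function names are our own, a practical implementation samples points from the meshes and uses a KD-tree for nearest-neighbour queries, and the PSNR peak value of 1.0 assumes images normalized to [0, 1].

```python
import numpy as np

def nn_dist(a, b):
    """For each point in a (N,3), distance to its nearest neighbour in b (M,3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise distances
    return d.min(axis=1)

def chamfer_l1(pred, gt):
    """Symmetric Chamfer-L1: mean of the accuracy term (pred -> gt)
    and the completeness term (gt -> pred)."""
    return 0.5 * (nn_dist(pred, gt).mean() + nn_dist(gt, pred).mean())

def f_score(pred, gt, tau=0.05):
    """F-score at threshold tau (5 cm by default): harmonic mean of
    precision (fraction of pred points within tau of gt) and recall."""
    precision = float((nn_dist(pred, gt) < tau).mean())
    recall = float((nn_dist(gt, pred) < tau).mean())
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

def psnr(img, ref, peak=1.0):
    """PSNR in dB for images with values in [0, peak]."""
    mse = np.mean((img - ref) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))
```

For example, a reconstruction rigidly shifted by 3 cm relative to the ground truth still attains an F-score of 1.0 at the 5 cm threshold, while its Chamfer-L1 distance reflects the offset, which is why the two metrics are reported together.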

References

[1] Jiayang Bai, Letian Huang, Jie Guo, Wen Gong, Yuanqi Li, and Yanwen Guo. 360-gs: Layout-guided panoramic gaussian splatting for indoor roaming. arXiv preprint arXiv:2402.00763, 2024. 3
[2] Carlos Campos, Richard Elvira, Juan J Gómez Rodríguez, José MM Montiel, and Juan D Tardós. Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam. IEEE Transactions on Robotics, 37(6):1874–1890, 2021. 3
[3] Danpeng Chen, Nan Wang, Runsen Xu, Weijian Xie, Hujun Bao, and Guofeng Zhang. Rnin-vio: Robust neural inertial navigation aided visual-inertial odometry in challenging scenes. In 2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 275–283. IEEE, 2021.
[4] Danpeng Chen, Shuai Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Hujun Bao, and Guofeng Zhang. Vip-slam: An efficient tightly-coupled rgb-d visual inertial planar slam. In 2022 International Conference on Robotics and Automation (ICRA), pages 5615–5621. IEEE, 2022. 3
[5] Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Haomin Liu, Hujun Bao, and Guofeng Zhang. Pgsr: Planar-based gaussian splatting for efficient and high-fidelity surface reconstruction. arXiv preprint arXiv:2406.06521, 2024. 2, 3, 6
[6] Hanlin Chen, Chen Li, and Gim Hee Lee. Neusg: Neural implicit surface reconstruction with 3d gaussian splatting guidance. arXiv preprint arXiv:2312.00846, 2023. 5
[7] Yutong Chen, Marko Mihajlovic, Xiyi Chen, Yiming Wang, Sergey Prokudin, and Siyu Tang. Splatformer: Point transformer for robust 3d gaussian splatting, 2024. 2
[8] Jacob Benesty, Jingdong Chen, Yiteng Huang, and Israel Cohen. Pearson correlation coefficient. Noise reduction in speech processing, pages 1–4, 2009. 6
[9] Pinxuan Dai, Jiamin Xu, Wenxiang Xie, Xinguo Liu, Huamin Wang, and Weiwei Xu. High-quality surface reconstruction using gaussian surfels. In ACM SIGGRAPH 2024 Conference Papers, pages 1–11, 2024. 2, 3
[10] Antoine Guédon and Vincent Lepetit. Sugar: Surface-aligned gaussian splatting for efficient 3d mesh reconstruction and high-quality mesh rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5354–5363, 2024. 2, 3
[11] Peter Hedman, Julien Philip, True Price, Jan-Michael Frahm, George Drettakis, and Gabriel Brostow. Deep blending for free-viewpoint image-based rendering. ACM Transactions on Graphics (ToG), 37(6):1–15, 2018. 7
[12] Manuel Hofer, Michael Maurer, and Horst Bischof. Line3d: Efficient 3d scene abstraction for the built environment. In Pattern Recognition: 37th German Conference, GCPR 2015, Aachen, Germany, October 7-10, 2015, Proceedings 37, pages 237–248. Springer, 2015. 3
[13] Manuel Hofer, Michael Maurer, and Horst Bischof. Efficient 3d scene abstraction using line segments. Computer Vision and Image Understanding, 157:167–178, 2017. 3
[14] Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accurate radiance fields. In ACM SIGGRAPH 2024 Conference Papers, pages 1–11, 2024. 2, 3
[15] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4):1–14, 2023. 2, 3, 6
[16] Jiyeop Kim and Jongwoo Lim. Integrating meshes and 3d gaussians for indoor scene reconstruction with sam mask guidance. arXiv preprint arXiv:2407.16173, 2024. 3
[17] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026, 2023. 5
[18] Yanyan Li, Raza Yunus, Nikolas Brasch, Nassir Navab, and Federico Tombari. Rgb-d slam with structural regularities. In 2021 IEEE international conference on Robotics and automation (ICRA), pages 11581–11587. IEEE, 2021. 5
[19] Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, and Federico Tombari. Geogaussian: Geometry-aware gaussian splatting for scene rendering. arXiv preprint arXiv:2403.11324, 2024. 5
[20] Shaohui Liu, Yifan Yu, Rémi Pautrat, Marc Pollefeys, and Viktor Larsson. 3d line mapping revisited. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21445–21455, 2023. 3, 4
[21] Shuhong Liu, Heng Zhou, Liuzhuozheng Li, Yun Liu, Tianchen Deng, Yiming Zhou, and Mingrui Li. Structure gaussian slam with manhattan world hypothesis. arXiv preprint arXiv:2405.20031, 2024. 3
[22] David G Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004. 5
[23] Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, and Bo Dai. Scaffold-gs: Structured 3d gaussians for view-adaptive rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20654–20664, 2024. 3
[24] Jonathon Luiten, Georgios Kopanas, Bastian Leibe, and Deva Ramanan. Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. arXiv preprint arXiv:2308.09713, 2023. 2
[25] Run Luo, Zikai Song, Lintao Ma, Jinlin Wei, Wei Yang, and Min Yang. Diffusiontrack: Diffusion model for multi-object tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3991–3999, 2024. 2
[26] Pierre Moulon, Pascal Monasse, Romuald Perrot, and Renaud Marlet. Openmvg: Open multiple view geometry. In Reproducible Research in Pattern Recognition: First International Workshop, RRPR 2016, Cancún, Mexico, December 4, 2016, Revised Selected Papers 1, pages 60–74. Springer, 2017. 2
[27] Richard A Newcombe, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J Davison, Pushmeet Kohi, Jamie Shotton, Steve Hodges, and Andrew Fitzgibbon. Kinectfusion: Real-time dense surface mapping and tracking. In 2011 10th IEEE international symposium on mixed and augmented reality, pages 127–136. IEEE, 2011. 7
[28] Rémi Pautrat, Juan-Ting Lin, Viktor Larsson, Martin R Oswald, and Marc Pollefeys. Sold2: Self-supervised occlusion-aware line description and detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11368–11378, 2021. 4
[29] Rémi Pautrat, Daniel Barath, Viktor Larsson, Martin R Oswald, and Marc Pollefeys. Deeplsd: Line segment detection and refinement with deep image gradients. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17327–17336, 2023. 4
[30] Rémi Pautrat, Iago Suárez, Yifan Yu, Marc Pollefeys, and Viktor Larsson. Gluestick: Robust image matching by sticking points and lines together. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9706–9716, 2023. 4
[31] Radu Bogdan Rusu and Steve Cousins. 3d is here: Point cloud library (pcl). In 2011 IEEE international conference on robotics and automation, pages 1–4. IEEE, 2011. 5
[32] Johannes L Schonberger and Jan-Michael Frahm. Structure-from-motion revisited. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4104–4113, 2016. 2
[33] Yahao Shi, Yanmin Wu, Chenming Wu, Xing Liu, Chen Zhao, Haocheng Feng, Jingtuo Liu, Liangjun Zhang, Jian Zhang, Bin Zhou, et al. Gir: 3d gaussian inverse rendering for relightable scene factorization. arXiv preprint arXiv:2312.05133, 2023. 2
[34] Noah Snavely, Steven M Seitz, and Richard Szeliski. Modeling the world from internet photo collections. International journal of computer vision, 80:189–210, 2008. 2
[35] Xiaowei Song, Jv Zheng, Shiran Yuan, Huan-ang Gao, Jingwei Zhao, Xiang He, Weihao Gu, and Hao Zhao. Sa-gs: Scale-adaptive gaussian splatting for training-free anti-aliasing. arXiv preprint arXiv:2403.19615, 2024. 3
[36] Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, et al. The replica dataset: A digital replica of indoor spaces. arXiv preprint arXiv:1906.05797, 2019. 7
[37] Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, and Gang Zeng. Dreamgaussian: Generative gaussian splatting for efficient 3d content creation. arXiv preprint arXiv:2309.16653, 2023. 2
[38] Jingwen Wang, Tymoteusz Bleja, and Lourdes Agapito. Go-surf: Neural feature grid optimization for fast, high-fidelity rgb-d surface reconstruction. In 2022 International Conference on 3D Vision (3DV), pages 433–442. IEEE, 2022. 7
[39] Yunsong Wang, Tianxin Huang, Hanlin Chen, and Gim Hee Lee. Freesplat: Generalizable 3d gaussian splatting towards free-view synthesis of indoor scenes. arXiv preprint arXiv:2405.17958, 2024. 3
[40] William E Lorensen and Harvey E Cline. Marching cubes: A high resolution 3d surface construction algorithm. Computer Graphics, 21(4):163–169, 1987. 7
[41] Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4d gaussian splatting for real-time dynamic scene rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20310–20320, 2024. 2
[42] Haodong Xiang, Xinghui Li, Xiansong Lai, Wanting Zhang, Zhichao Liao, Kai Cheng, and Xueping Liu. Gaussianroom: Improving 3d gaussian splatting with sdf guidance and monocular cues for indoor scene reconstruction. arXiv preprint arXiv:2405.19671, 2024. 3
[43] Chandan Yeshwanth, Yueh-Cheng Liu, Matthias Nießner, and Angela Dai. Scannet++: A high-fidelity dataset of 3d indoor scenes. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12–22, 2023. 7
[44] Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, and Chunhua Shen. Metric3d: Towards zero-shot metric 3d prediction from a single image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9043–9053, 2023. 5
[45] Zehao Yu, Songyou Peng, Michael Niemeyer, Torsten Sattler, and Andreas Geiger. Monosdf: Exploring monocular geometric cues for neural implicit surface reconstruction. Advances in neural information processing systems, 35:25018–25032, 2022. 7
[46] Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19447–19456, 2024. 3
[47] Zehao Yu, Torsten Sattler, and Andreas Geiger. Gaussian opacity fields: Efficient adaptive surface reconstruction in unbounded scenes. ACM Transactions on Graphics, 2024. 2, 3, 6
[48] Baowen Zhang, Chuan Fang, Rakesh Shrestha, Yixun Liang, Xiaoxiao Long, and Ping Tan. Rade-gs: Rasterizing depth in gaussian splatting. arXiv preprint arXiv:2406.01467, 2024. 2, 3, 4, 6, 7
[49] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018. 7
[50] Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He, and Hengshuang Zhao. Pixel-gs: Density control with pixel-aware gradient for 3d gaussian splatting. arXiv preprint arXiv:2403.15530, 2024. 3
[51] Hang Zhou, Jiale Cai, Yuteng Ye, Yonghui Feng, Chenxing Gao, Junqing Yu, Zikai Song, and Wei Yang. Video anomaly detection with motion and appearance guided patch diffusion model. arXiv preprint arXiv:2412.09026, 2024. 2
[52] Zihan Zhu, Songyou Peng, Viktor Larsson, Weiwei Xu, Hujun Bao, Zhaopeng Cui, Martin R Oswald, and Marc Pollefeys. Nice-slam: Neural implicit scalable encoding for slam. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12786–12796, 2022. 2
[53] Matthias Zwicker, Hanspeter Pfister, Jeroen Van Baar, and Markus Gross. Ewa volume splatting. In Proceedings Visualization, 2001. VIS'01., pages 29–538. IEEE, 2001. 2

