Monocular Depth for 3D Gaussian Splatting
Fig. 3. Our method consists of three main steps: (a) Per-View Anchor Initialization: given monocular depth images, depth-scale-adjustable anchors are initialized from each view. Each anchor is fixed in the 3D scene except for its depth scale toward the corresponding view. (b) Anchor Decoding with Residual-Form Gaussian Decoder: each anchor is decoded into k Gaussian splats by our residual-form Gaussian decoders. When initialized, each anchor contains nominal Gaussian splat attributes (µ̄j, r̄j, c̄j, ōj, s̄j) and an embedded feature fj. The residual decoders generate k sets of residual attributes for the child splats, which are combined with the nominal anchor attributes to produce the child Gaussian splats. (c) Training with Scale-Consistent Depth Loss and Online Depth-Scale Calibration: we use a scale-consistent depth loss Ldepth that incorporates a scale parameter for each monocular depth supervision signal.
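As a rough illustration of step (a), the following sketch back-projects a (subsampled) monocular depth map into per-view anchors, with a single per-view scale ŝ as the only free geometric parameter. This is an assumption-laden reconstruction from the caption, not the paper's code; the function name, arguments, and pinhole back-projection details are our own.

```python
import numpy as np

def init_anchors_from_depth(depth, K, c2w, s_hat=1.0, stride=8):
    """Unproject a monocular depth map into 3D anchor positions.

    Hypothetical sketch: `depth` is an HxW depth map, `K` the 3x3
    intrinsics, `c2w` the 4x4 camera-to-world pose. A single depth
    scale `s_hat` per view (trainable in the real pipeline) multiplies
    the scale-ambiguous monocular depth.
    """
    H, W = depth.shape
    vs, us = np.mgrid[0:H:stride, 0:W:stride]          # subsampled pixel grid
    d = depth[vs, us].reshape(-1)
    uv1 = np.stack([us.reshape(-1), vs.reshape(-1), np.ones_like(d)], axis=0)
    rays_cam = np.linalg.inv(K) @ uv1                  # per-pixel viewing rays
    pts_cam = rays_cam * (s_hat * d)                   # scale-adjusted back-projection
    pts_h = np.vstack([pts_cam, np.ones((1, pts_cam.shape[1]))])
    anchors = (c2w @ pts_h)[:3].T                      # Nx3 anchors in the world frame
    return anchors
```

Because ŝ jointly scales all anchors of a view along their camera rays, each anchor stays fixed in the scene except for its depth scale toward the corresponding view, as the caption describes.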
Perceptron (MLP) structures. During training and rendering, the decoders generate on-the-fly the residuals ∆αi from the nominal attribute ᾱi stored in each anchor pj, as follows:

{∆α0, ∆α1, ..., ∆αk−1} = Fα(fj)    (3)

The use of residual-form decoders along with the nominal attributes (µ̄j, c̄j, ōj, s̄j) enables faster training of the decoders and direct initialization of splat attributes, offering significant advantages over other anchored Gaussian methods [38], [24].

Anchored Gaussian Generation. k child Gaussian splats are spawned from each anchor pj by combining the decoded residual attributes {α}1:k and the nominal attributes ᾱj. This is expressed as:

µ1:k = µ̄j + ∆µ1:k    (4)
c1:k = c̄j + ∆c1:k    (5)
s1:k = s̄j · ∆s1:k    (6)
o1:k = ōj + ∆o1:k    (7)
r1:k = r̄j · ∆r1:k    (8)

As shown in Fig. 5(a), our residual attribute structure enables direct initialization of splat attributes α1:k (e.g., color) by incorporating the reference value ᾱ. This approach accelerates the training of the decoders, as they only need to learn the deviations from the reference value. By addressing one of the main weaknesses of the anchor-decoder scaffold structure [24], [38], our method improves both the training efficiency and robustness of the original framework.

C. Training from Rendering Losses

The generated Gaussian splats provide an explicit 3D scene representation that can be rendered into novel-view color images and depth images, Î, D̂, using the tile-based rasterizer. We use the color image I and the monocular depth image D as supervision during training.

Scale-Consistent Depth Loss. Monocular depth images D inherently contain scale ambiguity and therefore need to be calibrated with adequate scale parameters when they are used for depth supervision. Unlike previous approaches that employ depth losses based on the scale-invariant Pearson correlation [33], we define our depth loss term with a depth-scale parameter λ̂i embedded for each monocular depth Di, as follows:

Ldepth = Σ_{i=1}^{n} log ||1 + (λ̂i Di − D̂i)||2    (9)

Note that this depth-scale parameter λ̂i differs from the scale parameter ŝi introduced in our anchor depth-scale parameterization. Specifically, ŝi allows each initialized per-view anchor group to adjust its scale toward the reference view, while λ̂i corrects the monocular depth-scale ambiguity during the loss calibration.

Full Loss Design. In addition to our proposed scaled-depth loss, the full loss function consists of a photometric loss Lphoto, a volumetric regularization loss Lvol [39], and an anisotropic regularization loss Laniso [40]. For completeness, we list these loss functions below:

Lphoto = w · D-SSIM(Ii, Îi) + (1 − w) · ||Ii − Îi||1    (10)

In this equation, Lphoto represents a combination of the L1 loss and the D-SSIM loss [5], where SSIM stands for the Structural Similarity Index Measure. The weight parameter w controls the balance between the two loss components.

Lvol = Σ_{p∈P} Prod(sp)    (11)

Laniso = (1/|P|) Σ_{p∈P} [ max{max(sp)/min(sp), r} − r ]    (12)
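For reference, the loss terms of Eqs. (9), (11), and (12) can be sketched in NumPy as follows. This is a minimal per-view sketch under our own reading of the equations; the function names, the example threshold r = 3, and the use of a Frobenius norm over the depth map are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def depth_loss(lambda_hat, D_mono, D_rendered):
    # Eq. (9), one view: log || 1 + (lambda_hat * D_i - D_hat_i) ||_2
    return float(np.log(np.linalg.norm(1.0 + (lambda_hat * D_mono - D_rendered))))

def vol_loss(scales):
    # Eq. (11): sum over splats of the product of their per-axis scales
    scales = np.asarray(scales, dtype=float)
    return float(np.prod(scales, axis=1).sum())

def aniso_loss(scales, r=3.0):
    # Eq. (12): mean penalty when the max/min scale ratio exceeds r
    scales = np.asarray(scales, dtype=float)
    ratio = scales.max(axis=1) / scales.min(axis=1)
    return float((np.maximum(ratio, r) - r).mean())
```

In practice λ̂i would be a trainable scalar per monocular depth image, optimized jointly with the splats.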
[Figure: Main Building (hku_main_building) and Campus (hku_campus_seq_00) scenes]
Both Lvol and Laniso are applied to the splats P to regularize their shape. Here, Prod means the multiplication of the covariance-related Gaussian scales [40]. The overall loss function is formulated as:

L = λp Lphoto + λs Lscale + λd Ldepth + λu Laniso    (13)

Here, λp, λs, λd, and λu represent the weighting factors assigned to each corresponding loss term.

V. EXPERIMENT

We evaluate our method on four challenging ground-view scenes: one indoor and one outdoor scene each from the R3LIVE odometry dataset [16] and the Tanks and Temples dataset [14]. To demonstrate the performance of our method, we report rendering metrics including PSNR, SSIM [41], and LPIPS [42], which are widely used in neural rendering benchmarks [7], [5]. Our algorithm uses no input point clouds in any of the scenes, while the baselines use LiDAR point clouds for R3LIVE and SfM point clouds for the Tanks and Temples dataset.

For effective comparison and analysis, we categorize existing 3DGS variants into four types according to the characteristics of the regularization or information that they utilize: (1) Geometric methods include 2D Gaussian Splatting (2DGS) [30] and Gaussian Opacity Fields (GOF) [31]; to align the splats with the actual geometry, 2DGS constrains splats to be flat and GOF applies a depth-distortion loss. (2) Sequential methods, including COLMAP-Free 3D Gaussian Splatting (CF-3DGS) [8] and MonoGS [21], process each frame sequentially to fully exploit local information and refine image poses. (3) Anchored methods, such as Scaffold-GS [24] and our approach, generate splats anchored to points with restricted movement in 3D space. Finally, (4) Baseline methods include the original 3DGS [5] and Mip-Splatting [43].

A. Rendering evaluation on the R3LIVE dataset

The R3LIVE dataset is a publicly available odometry dataset captured by a hand-held device with a 15 Hz camera, a 200 Hz Inertial Measurement Unit (IMU), and a 10 Hz Livox Avia LiDAR sensor. It includes diverse indoor and outdoor scenes from the campuses of HKU and HKUST. Unlike typical neural rendering datasets with object-centric views or structured viewing patterns [14], [11], [13], the R3LIVE dataset captures many complex indoor and outdoor structures with free trajectory patterns.

We process the IMU, LiDAR, and image data using the R3LIVE [16] multi-sensor odometry pipeline to generate pose-tagged image sequences. To synchronize pose estimation with image time stamps, we slightly modified the R3LIVE odometry implementation. However, it still inevitably introduces pixel-level errors due to sensor fusion and inaccurate extrinsic calibration. To avoid redundancy from high-frame-rate images, we subsample the images by selecting every 10th frame from the dataset.
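The subsampling step above is simple to reproduce; as a minimal sketch (the function name and list-based interface are our own, not the paper's code):

```python
def subsample_frames(frames, step=10):
    """Keep every `step`-th frame of a pose-tagged sequence (sketch)."""
    return frames[::step]
```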
TABLE I
RESULTS ON R3LIVE DATASET

                                         Main Building             Campus
Method              Type        Input    PSNR↑  SSIM↑  LPIPS↓      PSNR↑  SSIM↑  LPIPS↓
3DGS [5]            Baseline    LiDAR    15.84  0.684  0.497       21.60  0.760  0.364
Mip-Splatting [43]  Baseline    LiDAR    13.58  0.630  0.499       15.89  0.679  0.449
CF-3DGS [8]         Sequential  Mono     –      –      –           –      –      –
MonoGS [21]         Sequential  Mono     11.60  0.486  0.557       16.06  0.600  0.527
GOF [31]            Geometric   LiDAR    15.53  0.676  0.502       19.78  0.734  0.478
2DGS [30]           Geometric   LiDAR    15.72  0.674  0.507       20.67  0.743  0.418
Scaffold-GS [24]    Anchored    LiDAR    17.01  0.697  0.495       20.94  0.756  0.419
Ours w/ mono        Anchored    Mono     17.27  0.703  0.470       22.98  0.774  0.365
Fig. 5. (a) With the Residual-Form Gaussian Decoder (top), only the residual from the nominal color is estimated and trained by the decoder, allowing direct color initialization and fast training. The Direct-Form Gaussian Decoder (bottom) [24], [38] does not allow color initialization due to its on-the-fly decoding scheme. (b) Rendering-performance (PSNR) ablation between the Direct-Form Color MLP and the Residual-Form Color MLP.

TABLE II
RESULTS ON TANKS AND TEMPLES DATASET

                                         Courthouse                Meeting Room
Method              Type        Input    PSNR↑  SSIM↑  LPIPS↓      PSNR↑  SSIM↑  LPIPS↓
3DGS [5]            Baseline    SfM      15.92  0.689  0.396       16.86  0.639  0.435
Mip-Splatting [43]  Baseline    SfM      15.44  0.683  0.366       16.39  0.646  0.417
CF-3DGS [8]         Sequential  Mono     –      –      –           15.53  0.614  0.510
MonoGS [21]         Sequential  Mono     9.78   0.453  0.637       12.01  0.488  0.591
GOF [31]            Geometric   SfM      15.49  0.680  0.377       16.56  0.636  0.459
2DGS [30]           Geometric   SfM      16.57  0.705  0.382       17.20  0.636  0.455
Scaffold-GS [24]    Anchored    SfM      17.12  0.719  0.345       17.42  0.677  0.413
Ours w/ mono        Anchored    Mono     15.70  0.682  0.442       16.66  0.641  0.462

TABLE III
ABLATION OF EACH PROPOSED MODULE

depth cal.   res. MLP    PSNR↑  SSIM↑  LPIPS↓
                         16.84  0.687  0.475
✓                        16.91  0.693  0.467
             ✓           17.03  0.688  0.465
✓            ✓           17.27  0.703  0.470
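To make the residual-form decoding of Eqs. (4)–(7) and Fig. 5(a) concrete, here is a minimal sketch. The shapes, the stand-in random linear map (in place of the learned residual MLPs Fα), and the exponential used to keep the multiplicative scale residual positive are all our assumptions, not the paper's exact architecture.

```python
import numpy as np

def decode_children(anchor, k=4, seed=0):
    """Spawn k child splats from one anchor (sketch of Eqs. (4)-(7)).

    `anchor` holds nominal attributes (mu, c, s, o) and a feature f_j;
    a small random linear map stands in for the residual decoders.
    """
    f = anchor["f"]                                    # embedded anchor feature
    W = np.random.default_rng(seed).standard_normal((k * 10, f.shape[0])) * 0.01
    res = (W @ f).reshape(k, 10)                       # k sets of residual attributes
    d_mu, d_c, d_s, d_o = res[:, :3], res[:, 3:6], res[:, 6:9], res[:, 9]
    return {
        "mu": anchor["mu"] + d_mu,                     # Eq. (4): additive position offset
        "c":  anchor["c"] + d_c,                       # Eq. (5): additive color residual
        "s":  anchor["s"] * np.exp(d_s),               # Eq. (6): multiplicative scale (exp is our assumption)
        "o":  anchor["o"] + d_o,                       # Eq. (7): additive opacity residual
    }
```

With near-zero residuals, every child splat starts at its anchor's nominal attributes, which is exactly the direct-initialization property Fig. 5(a) highlights.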
As shown in Table I, our method outperforms all state-of-the-art 3DGS variants in terms of rendering performance. Notably, all algorithms except the Anchored variants exhibit significantly lower performance in the Main Building scene, which presents considerable challenges due to the complexity of its narrow corridors and hallways. In both scenes, our algorithm achieves state-of-the-art rendering performance even without using initial LiDAR point clouds. Monocular-depth-based scene initialization delivers much better or comparable performance to LiDAR-based initialization in both scenes. This is largely due to the inherent incompleteness of LiDAR point clouds in such complex environments. Our results demonstrate that effectively integrating monocular depths, which directly yield pixel-aligned and complete point clouds, can be more beneficial in these challenging scenes.

B. Rendering evaluation on the Tanks and Temples dataset

We also validate our method on the Tanks and Temples dataset, a widely recognized benchmark for neural rendering evaluation. We select the Courthouse and Meeting Room scenes, as they are geometrically the most challenging in the benchmark [31]. Similar to the evaluation on the R3LIVE dataset, we subsample every 10th image. Instead of LiDAR point clouds, we directly use the SfM point cloud for the other variants.

Unlike the R3LIVE odometry dataset, the trajectories in this dataset follow object-centric or circular patterns, providing relatively dense observations of each part of the scene. For this reason, our method does not achieve the best performance. Under these viewing patterns, the initial point cloud has less impact on 3DGS algorithms, as splat cloning and splitting are effectively guided by salient multi-view photometric gradients. It has even been shown that random initialization can yield plausible results in such cases [44]. As monocular depth usually contains inevitable inner distortion that cannot be corrected by a scale factor, anchoring on these depths can be detrimental when enough photometric information is available. Nonetheless, the best-performing algorithm in this scenario is the anchor-based Scaffold-GS [24], validating our analysis of the degeneracy patterns associated with the different algorithm types (Fig. 2).

As shown in Table II, our method still shows comparable performance to other 3DGS methods. In this sense, our method achieves a suitable balance between robustness to limited multi-view information and high performance on densely captured datasets, demonstrating the most stable performance across all the scenes from the R3LIVE and Tanks and Temples datasets.

C. Ablation Studies

We evaluate our depth calibration framework and residual-form Gaussian MLP in Table III. As shown in the ablation results, each proposed module contributes to the overall improvement in rendering performance. Additionally, our residual-form Gaussian decoder enables fast initialization of Gaussian attributes, as illustrated in Fig. 5(a). For this ablation, we used a direct-form MLP that generates color attributes from features, similar to [24], [38]. Compared to this direct-form MLP decoder, our proposed decoder significantly accelerates the training process (Fig. 5(b)).

VI. CONCLUSIONS, LIMITATIONS, AND FUTURE WORK

In this paper, we presented Mode-GS, a novel 3DGS algorithm designed for robust neural rendering from ground-robot trajectory datasets. Our algorithm introduces a practical rendering pipeline for ground-view robot datasets, utilizing easily obtainable odometry poses and operating in a point-cloud-free setting. However, our approach is less effective in scenarios where extensive multi-view data is available, such as in densely captured, object-centric datasets. Future work will focus on developing a hybrid approach that integrates our method with non-anchored splats to achieve optimal performance.
REFERENCES

[1] M. Adamkiewicz, T. Chen, A. Caccavale, R. Gardner, P. Culbertson, J. Bohg, and M. Schwager, "Vision-Only Robot Navigation in a Neural Radiance World," IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 4606–4613, 2022.
[2] T. Chen, O. Shorinwa, J. Bruno, J. Yu, W. Zeng, K. Nagami, P. Dames, and M. Schwager, "Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps," arXiv preprint arXiv:2403.02751, 2024.
[3] C. Maxey, J. Choi, H. Lee, D. Manocha, and H. Kwon, "UAV-Sim: NeRF-based Synthetic Data Generation for UAV-based Perception," arXiv preprint arXiv:2310.16255, 2023.
[4] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis," Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021.
[5] B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, "3D Gaussian Splatting for Real-Time Radiance Field Rendering," ACM Transactions on Graphics, vol. 42, no. 4, pp. 1–14, 2023.
[6] M. Zwicker, H. Pfister, J. Van Baar, and M. Gross, "EWA splatting," IEEE Transactions on Visualization and Computer Graphics, vol. 8, no. 3, pp. 223–238, 2002.
[7] P. Wang, Y. Liu, Z. Chen, L. Liu, Z. Liu, T. Komura, C. Theobalt, and W. Wang, "F2-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[8] Y. Fu, S. Liu, A. Kulkarni, J. Kautz, A. A. Efros, and X. Wang, "COLMAP-Free 3D Gaussian Splatting," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[9] C. Liu, S. Chen, Y. Bhalgat, S. Hu, Z. Wang, M. Cheng, V. A. Prisacariu, and T. Braud, "GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting," arXiv preprint arXiv:2408.11085, 2024.
[10] L. Zhao, P. Wang, and P. Liu, "BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting," arXiv preprint arXiv:2403.11831, 2024.
[11] H. Turki, D. Ramanan, and M. Satyanarayanan, "Mega-NERF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[12] Y. Xiangli, L. Xu, X. Pan, N. Zhao, A. Rao, C. Theobalt, B. Dai, and D. Lin, "BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-scale Scene Rendering," in Proceedings of the European Conference on Computer Vision (ECCV), 2022.
[13] J. T. Barron, B. Mildenhall, D. Verbin, P. P. Srinivasan, and P. Hedman, "Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[14] A. Knapitsch, J. Park, Q.-Y. Zhou, and V. Koltun, "Tanks and Temples: Benchmarking Large-Scale Scene Reconstruction," ACM Transactions on Graphics, vol. 36, no. 4, 2017.
[15] Y. Liao, J. Xie, and A. Geiger, "KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
[16] J. Lin and F. Zhang, "R3LIVE: A Robust, Real-time, RGB-colored, LiDAR-Inertial-Visual tightly-coupled state Estimation and mapping package," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2022.
[17] J. L. Schonberger and J.-M. Frahm, "Structure-From-Motion Revisited," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[18] C. Campos, R. Elvira, J. J. Gómez, J. M. M. Montiel, and J. D. Tardós, "ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM," IEEE Transactions on Robotics, vol. 37, no. 6, pp. 1874–1890, 2021. [Online]. Available: [Link]
[19] T. Shan and B. Englot, "LeGO-LOAM: Lightweight and Ground-Optimized Lidar Odometry and Mapping on Variable Terrain," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018.
[20] T. Qin, P. Li, and S. Shen, "VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator," IEEE Transactions on Robotics, vol. 34, no. 4, pp. 1004–1020, 2018.
[21] H. Matsuki, R. Murai, P. H. Kelly, and A. J. Davison, "Gaussian Splatting SLAM," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[22] N. Keetha, J. Karhade, K. M. Jatavallabhula, G. Yang, S. Scherer, D. Ramanan, and J. Luiten, "SplaTAM: Splat Track & Map 3D Gaussians for Dense RGB-D SLAM," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[23] C. Yan, D. Qu, D. Xu, B. Zhao, Z. Wang, D. Wang, and X. Li, "GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[24] T. Lu, M. Yu, L. Xu, Y. Xiangli, L. Wang, D. Lin, and B. Dai, "Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[25] Q. Liu, H. Xin, Z. Liu, and H. Wang, "Integrating Neural Radiance Fields End-to-End for Cognitive Visuomotor Navigation," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
[26] L. Meyer, F. Erich, Y. Yoshiyasu, M. Stamminger, N. Ando, and Y. Domae, "PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation," arXiv preprint arXiv:2401.02281, 2024.
[27] V. Patil and M. Hutter, "Radiance Fields for Robotic Teleoperation," arXiv preprint arXiv:2407.20194, 2024.
[28] A. Guédon and V. Lepetit, "SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 5354–5363.
[29] H. Chen, C. Li, and G. H. Lee, "NeuSG: Neural Implicit Surface Reconstruction with 3D Gaussian Splatting Guidance," arXiv preprint arXiv:2312.00846, 2023.
[30] B. Huang, Z. Yu, A. Chen, A. Geiger, and S. Gao, "2D Gaussian Splatting for Geometrically Accurate Radiance Fields," in ACM SIGGRAPH 2024 Conference Papers, 2024, pp. 1–11.
[31] Z. Yu, T. Sattler, and A. Geiger, "Gaussian Opacity Fields: Efficient Adaptive Surface Reconstruction in Unbounded Scenes," arXiv preprint arXiv:2404.10772, 2024.
[32] B. Zhang, C. Fang, R. Shrestha, Y. Liang, X. Long, and P. Tan, "RaDe-GS: Rasterizing Depth in Gaussian Splatting," arXiv preprint arXiv:2406.01467, 2024.
[33] M. Turkulainen, X. Ren, I. Melekhov, O. Seiskari, E. Rahtu, and J. Kannala, "DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing," arXiv preprint arXiv:2403.17822, 2024.
[34] K. Cheng, X. Long, K. Yang, Y. Yao, W. Yin, Y. Ma, W. Wang, and X. Chen, "GaussianPro: 3D Gaussian Splatting with Progressive Propagation," in Forty-first International Conference on Machine Learning, 2024.
[35] Z. Li, S. Yao, Y. Chu, A. F. Garcia-Fernandez, Y. Yue, E. G. Lim, and X. Zhu, "MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification," arXiv preprint arXiv:2407.11840, 2024.
[36] Z. Fan, W. Cong, K. Wen, K. Wang, J. Zhang, X. Ding, D. Xu, B. Ivanovic, M. Pavone, G. Pavlakos, Z. Wang, and Y. Wang, "InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds," arXiv preprint arXiv:2403.20309, 2024.
[37] W. Yin, C. Zhang, H. Chen, Z. Cai, G. Yu, K. Wang, X. Chen, and C. Shen, "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023.
[38] E. Ververas, R. A. Potamias, J. Song, J. Deng, and S. Zafeiriou, "SAGS: Structure-Aware 3D Gaussian Splatting," arXiv preprint arXiv:2404.19149, 2024.
[39] S. Lombardi, T. Simon, G. Schwartz, M. Zollhoefer, Y. Sheikh, and J. Saragih, "Mixture of Volumetric Primitives for Efficient Neural Rendering," ACM Transactions on Graphics, vol. 40, no. 4, 2021.
[40] T. Xie, Z. Zong, Y. Qiu, X. Li, Y. Feng, Y. Yang, and C. Jiang, "PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics," arXiv preprint arXiv:2311.12198, 2023.
[41] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[42] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, "The Unreasonable Effectiveness of Deep Features as a Perceptual Metric," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[43] Z. Yu, A. Chen, B. Huang, T. Sattler, and A. Geiger, "Mip-Splatting: Alias-free 3D Gaussian Splatting," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[44] J. Jung, J. Han, H. An, J. Kang, S. Park, and S. Kim, "Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting," arXiv preprint arXiv:2403.09413, 2024.