HERO-SLAM: Hybrid Enhanced Robust Optimization of Neural SLAM
Zhe Xin, Yufeng Yue, Liangjun Zhang, Chenming Wu
Accepted by ICRA'24

Abstract

Simultaneous Localization and Mapping (SLAM) plays a crucial role in numerous applications including robotics, autonomous driving, and virtual reality. However, the robustness of SLAM, particularly in challenging or data-limited situations, remains an unresolved issue. In this paper, we introduce HERO-SLAM, a Hybrid Enhanced Robust Optimization methodology for neural SLAM, which combines the benefits of neural implicit fields and feature-metric optimization. This hybrid method offers increased robustness in challenging environments, such as those involving sudden viewpoint changes or sparse data collection intervals. We also propose a novel hybrid optimization pipeline to optimize the multi-resolution implicit fields. Our comprehensive experimental results on benchmark datasets validate the effectiveness of our hybrid approach, demonstrating its superior performance over existing implicit field-based methods in challenging scenarios. Overall, HERO-SLAM opens a new pathway toward improving the stability, performance, and applicability of visual SLAM methods in real-world applications.

Pipeline

Each newly captured frame is aligned with the last frame to estimate the camera pose using feature-metric warping losses. This improves the robustness and accuracy of tracking, which in turn enhances mapping quality by optimizing a neural implicit field with multi-resolution feature encoding. The mapping module optimizes over all keyframes in the keyframe database using photometric reconstruction and depth supervision, following the volumetric rendering paradigm.
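The two stages above can be sketched in code. The snippet below is a minimal illustration, not the paper's implementation: `warp_points` shows the geometric warp underlying a feature-metric tracking residual (back-project pixels with depth, apply a relative pose, re-project into the current frame), and `render_ray` shows the standard volume-rendering quadrature used for photometric and depth supervision. All function names and the simple L2 feature residual are illustrative assumptions.

```python
import numpy as np

def warp_points(uv, depth, K, K_inv, T_rel):
    """Back-project pixels (N, 2) from the last frame using their depths,
    transform them by the relative pose T_rel (4x4), and re-project into
    the current frame. Illustrative geometry only, not the paper's code."""
    ones = np.ones((uv.shape[0], 1))
    pix_h = np.hstack([uv, ones])                    # homogeneous pixels (N, 3)
    pts = (K_inv @ pix_h.T).T * depth[:, None]       # 3D points in last camera
    pts = (T_rel[:3, :3] @ pts.T).T + T_rel[:3, 3]   # into current camera frame
    proj = (K @ pts.T).T                             # perspective projection
    return proj[:, :2] / proj[:, 2:3]

def feature_metric_loss(feat_last, feat_curr_at_warped):
    """L2 residual between features sampled at original and warped
    locations (a hypothetical, simplified feature-metric loss)."""
    return float(np.mean((feat_last - feat_curr_at_warped) ** 2))

def render_ray(sigmas, colors, deltas):
    """Alpha-composite per-sample densities and colors along one ray,
    returning the rendered color, expected depth, and sample weights."""
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = alphas * trans                          # sum to at most 1
    color = (weights[:, None] * colors).sum(axis=0)
    depth = (weights * np.cumsum(deltas)).sum()
    return color, depth, weights
```

In a full system, the rendered color and depth would be compared against the observed frame to supervise the implicit field, while the feature-metric residual drives pose optimization; here the pieces are shown in isolation for clarity.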


Robustness of Neural SLAM

Visualization of mapping and tracking errors on the Replica dataset with challenging sparse inputs and large motion changes. This paper introduces a robust system for real-time dense 3D reconstruction, dubbed HERO-SLAM, which synergistically leverages the capabilities of neural implicit fields and feature-metric optimization, demonstrating exceptional resilience to large viewpoint changes while maintaining efficient runtime performance.


Experiment Results

Qualitative comparison of results among different approaches. Our reconstructions are smoother, more complete, and have fewer artifacts than other advanced methods on the ScanNet dataset.


Quantitative results of all eight scenes on the Replica dataset. Our method achieves better reconstruction quality and has the best average performance in all metrics, even with low-frequency image sequences.


Reconstruction results on the Replica dataset using NICE-SLAM's culling strategy (unit: cm).


Camera tracking results on the TUM RGB-D dataset. Our method achieves the best performance and is robust to large view changes. Non-continuous scenes are marked with an asterisk. Trajectories with errors larger than 30 cm are denoted as FAILED throughout the paper.


Runtime and frame-rate comparison, and pose-estimation performance with different numbers of tracking iterations.


Acknowledgement

The website template was borrowed from Michaël Gharbi and Ben Mildenhall.