SceneEdited: A City-Scale Benchmark for 3D HD Map Updating via Image-Guided Change Detection

John Doe, Jane Smith, Michael Brown
Department of Computer Science, Autonomous Systems Lab

Abstract

High-definition (HD) maps are crucial for autonomous driving, yet keeping them current in dynamic urban environments remains a significant challenge. This paper introduces SceneEdited, a city-scale benchmark designed to evaluate methods for 3D HD map updating through image-guided change detection. We present a comprehensive dataset and evaluation protocol that support the development and comparison of robust updating algorithms. Experiments demonstrate SceneEdited's utility for benchmarking state-of-the-art techniques that identify real-world changes and integrate them into existing 3D HD maps.

Keywords

3D HD Maps, Change Detection, Autonomous Driving, Benchmark, Image-Guided Perception


1. Introduction

The reliability of autonomous driving systems depends heavily on accurate and up-to-date high-definition (HD) maps, which provide precise geometric and semantic information about the driving environment. Urban landscapes, however, are constantly evolving, and the resulting discrepancies between existing maps and the real world can compromise safety and performance. This work addresses the need for a standardized benchmark to rigorously evaluate algorithms that detect such changes and incorporate them into 3D HD maps efficiently. Methods in this domain typically combine deep neural networks for semantic segmentation and object detection, 3D reconstruction algorithms (e.g., Structure-from-Motion, SLAM), and point cloud processing models.
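To make the composition of these components concrete, the sketch below outlines one plausible map-updating loop in Python. It is a minimal illustration only: the MapUpdater class, its diff and apply methods, and the injected segmenter and reconstructor callables are hypothetical names introduced here, not an interface defined by SceneEdited.

    # Hypothetical skeleton of an HD map-updating loop; all names are
    # illustrative and not part of any SceneEdited release.
    class MapUpdater:
        def __init__(self, segmenter, reconstructor, map_store):
            self.segmenter = segmenter          # 2D semantic segmentation network
            self.reconstructor = reconstructor  # 3D reconstruction (e.g., SfM or SLAM)
            self.map_store = map_store          # existing 3D HD map with diff/apply

        def update(self, images, lidar_sweeps):
            masks = [self.segmenter(img) for img in images]      # visual cues
            geometry = self.reconstructor(images, lidar_sweeps)  # local 3D model
            changes = self.map_store.diff(geometry, masks)       # detect changes
            self.map_store.apply(changes)                        # commit updates
            return changes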

2. Related Work

Prior research on map updating spans sensor-based mapping and reconstruction as well as incremental map maintenance strategies. Existing benchmarks often focus on 2D image-based change detection or 3D point cloud registration, but lack comprehensive evaluation tailored to city-scale 3D HD map updates that incorporate both visual and geometric cues. While some datasets provide static 3D maps or temporal sequences, they typically do not explicitly annotate and quantify the structural changes relevant to map updating. SceneEdited bridges this gap with a benchmark dedicated to this complex task.

3. Methodology

The SceneEdited benchmark is constructed from multi-temporal data acquired in diverse urban environments, including vehicle-mounted camera and LiDAR streams. A meticulous annotation process identifies and labels real-world changes in the 3D environment, such as new construction, road modifications, or persistent state changes of otherwise dynamic objects. Our image-guided change detection pipeline leverages these synchronized sensor inputs, fusing 2D visual cues with 3D geometric information to improve the accuracy and robustness of change identification; a sketch of this fusion step is given below. The benchmark also defines clear metrics for evaluating detection performance and the quality of the resulting map updates.
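As a minimal illustration of image-guided fusion (the paper does not prescribe this exact procedure), one can project stored map points into a current camera frame and flag points whose projected semantic label disagrees with the map. The function names, the pinhole camera model, and the segmentation-mask interface below are assumptions.

    import numpy as np

    def project_points(points_world, K, T_cam_from_world):
        # Project Nx3 world points into the image with a pinhole model.
        pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
        pts_cam = (T_cam_from_world @ pts_h.T).T[:, :3]
        in_front = pts_cam[:, 2] > 1e-6                 # keep points in front of the camera
        z = np.where(in_front, pts_cam[:, 2], 1.0)      # guard against divide-by-zero
        uv = (K @ pts_cam.T).T[:, :2] / z[:, None]
        return uv, in_front

    def flag_changed_points(map_points, map_labels, seg_mask, K, T):
        # Mark map points whose current 2D semantic label disagrees with the map.
        uv, visible = project_points(map_points, K, T)
        h, w = seg_mask.shape
        changed = np.zeros(len(map_points), dtype=bool)
        for i in np.flatnonzero(visible):
            u, v = int(round(uv[i, 0])), int(round(uv[i, 1]))
            if 0 <= u < w and 0 <= v < h:
                changed[i] = seg_mask[v, u] != map_labels[i]
        return changed

In practice one would also account for occlusion and aggregate evidence over many frames before committing an edit to the map.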

4. Experimental Results

Experiments conducted on the SceneEdited benchmark demonstrate its effectiveness in differentiating between various change detection methodologies. We evaluated several state-of-the-art algorithms, including both purely 3D point cloud comparison methods and hybrid image-3D approaches. The image-guided techniques consistently showed superior performance in terms of precision and recall for detecting fine-grained changes, particularly in challenging urban scenarios with varying lighting and occlusions. The following table summarizes representative results across different change categories, highlighting the performance improvements offered by multi-modal approaches over single-modality methods.

Method                      Precision (%)  Recall (%)  F1-Score (%)  Change Category
LiDAR-only Baseline                  78.5        72.1          75.2  Structural
Image-Guided (SceneEdited)           89.2        85.6          87.4  Structural
LiDAR-only Baseline                  65.8        60.3          63.0  Road Infrastructure
Image-Guided (SceneEdited)           82.1        78.9          80.5  Road Infrastructure
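For reference, the precision, recall, and F1 values in the table can be computed from per-point change flags as in the sketch below. This is standard metric arithmetic, not code from the benchmark release, and the boolean-mask interface is an assumption.

    import numpy as np

    def change_detection_metrics(pred, gt):
        # pred, gt: boolean arrays marking predicted / ground-truth changed points.
        tp = int(np.sum(pred & gt))    # correctly detected changes
        fp = int(np.sum(pred & ~gt))   # false alarms
        fn = int(np.sum(~pred & gt))   # missed changes
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

For example, the image-guided method's structural F1 of 87.4% follows directly from its precision and recall: 2 x 0.892 x 0.856 / (0.892 + 0.856) ≈ 0.874.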

5. Discussion

The experimental findings underscore the critical role of multi-modal data fusion, and of image guidance in particular, in achieving robust and accurate 3D HD map updating. While LiDAR-only methods provide a solid geometric foundation, integrating visual information significantly improves the detection of subtle or texturally complex changes that geometry alone often misses. Future work will explore more advanced deep learning architectures for change prediction and real-time map updating, using the SceneEdited benchmark to drive progress in this vital area of autonomous vehicle development.