Aerial-ground Cross-modal Localization: Dataset, Ground-truth, and Benchmark

1University of Calgary, 2Nanyang Technological University, 3Wuhan University, 4Chongqing Technology and Business University
*Corresponding author

Main contributions:

(1) We present a new large-scale dataset that enables aerial-ground cross-modal localization by combining ground-level imagery from mobile mapping systems with airborne laser scanning (ALS) point clouds. The data span three representative urban areas (Wuhan, Hong Kong, and San Francisco) and will be made publicly accessible to the research community.

(2) We propose an indirect yet scalable approach for generating accurate 6-DoF ground-truth image poses. This is achieved by registering mobile LiDAR submaps to ALS data using ground segmentation and façade reconstruction, followed by multi-sensor pose graph optimization; a simplified registration sketch is given after this list.

(3) We establish a unified benchmarking suite for both global and fine-grained image-to-point cloud (I2P) localization, and evaluate state-of-the-art methods under challenging cross-view and cross-modality conditions. Based on the evaluation results, we summarize promising directions for future research; a sketch of representative metrics follows below.
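To make the ground-truth pipeline in (2) concrete, here is a minimal Python sketch of the submap-to-ALS registration step using Open3D. This is not our implementation: the file paths and the initial pose are placeholders, the RANSAC plane removal is a crude stand-in for the ground segmentation and façade reconstruction described above, and the multi-sensor pose graph optimization is omitted entirely.

```python
# Minimal sketch: align one MLS submap to an ALS tile with Open3D.
import numpy as np
import open3d as o3d

def remove_ground(pcd, dist_thresh=0.2):
    """Drop the dominant plane (assumed to be the ground) found by RANSAC."""
    _, inliers = pcd.segment_plane(distance_threshold=dist_thresh,
                                   ransac_n=3, num_iterations=1000)
    return pcd.select_by_index(inliers, invert=True)

# Placeholder paths: one MLS submap and the overlapping ALS tile.
submap = o3d.io.read_point_cloud("submap.pcd")
als_tile = o3d.io.read_point_cloud("als_tile.pcd")

# Ground points dominate both clouds but constrain the alignment poorly,
# so strip them and register on the remaining structure (mostly facades).
src = remove_ground(submap)
tgt = remove_ground(als_tile)
tgt.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=1.0, max_nn=30))

init = np.eye(4)  # in practice a GNSS/INS prior; identity only for illustration
result = o3d.pipelines.registration.registration_icp(
    src, tgt, max_correspondence_distance=1.0, init=init,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())
print("Estimated submap-to-ALS transform:\n", result.transformation)
```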
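Likewise, for the benchmark in (3), the sketch below illustrates two metrics commonly used for global and fine-grained I2P localization: top-1 retrieval recall and relative rotation/translation error. These are assumed, representative metrics rather than the exact definitions in our evaluation suite.

```python
# Sketch of two common I2P evaluation metrics (assumed, for illustration).
import numpy as np

def recall_at_1(query_desc, db_desc, query_pos, db_pos, radius=25.0):
    """Fraction of queries whose nearest database descriptor lies within
    `radius` meters of the query's true position."""
    d = np.linalg.norm(query_desc[:, None, :] - db_desc[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    return np.mean(np.linalg.norm(db_pos[nearest] - query_pos, axis=1) < radius)

def pose_errors(T_est, T_gt):
    """Relative rotation error (degrees) and translation error (meters)
    between two 4x4 homogeneous poses."""
    dR = T_est[:3, :3].T @ T_gt[:3, :3]
    rre = np.degrees(np.arccos(np.clip((np.trace(dR) - 1) / 2, -1.0, 1.0)))
    rte = np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3])
    return rre, rte
```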

Global distribution of our dataset on the map.

Dataset coverage and collection. (a), (b) and (c) illustrate the trajectories of the Hong Kong, California and Wuhan datasets, while (d), (e) and (f) show the corresponding data acquisition platforms, with (d) and (e) provided by the authors of UrbanNav and UrbanLoco, respectively.

File structure of the dataset (taking Wuhan Loop 1 as an example).

Projection of ALS point clouds onto ground images. (a), (b) and (c) are from the Wuhan, Hong Kong and California datasets, respectively. Point clouds are colorized by depth, with colors ranging from blue (near) to red (far) through green and yellow.
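For reference, the projection behind this visualization can be sketched as follows: ALS points are transformed into the camera frame with the ground-truth pose, projected through a pinhole intrinsic matrix, and colorized by depth. The intrinsics `K`, the pose `T_world_to_cam`, and the image size are placeholders, and the `jet` colormap is an assumption that happens to match the blue-to-red ramp described above.

```python
# Sketch: project world-frame ALS points into a ground image, colored by depth.
import numpy as np
from matplotlib import cm

def project_points(points_w, T_world_to_cam, K, img_w, img_h):
    """Project Nx3 world points; return pixel coords, depths, and RGB colors."""
    pts_h = np.hstack([points_w, np.ones((len(points_w), 1))])
    pts_c = (T_world_to_cam @ pts_h.T).T[:, :3]          # world -> camera frame
    pts_c = pts_c[pts_c[:, 2] > 0.1]                     # keep points in front of camera
    uv = (K @ pts_c.T).T
    uv = uv[:, :2] / uv[:, 2:3]                          # perspective division
    inside = ((uv[:, 0] >= 0) & (uv[:, 0] < img_w) &
              (uv[:, 1] >= 0) & (uv[:, 1] < img_h))
    uv, depth = uv[inside], pts_c[inside, 2]
    # Normalize depth and map through jet: blue (near) -> green/yellow -> red (far).
    t = (depth - depth.min()) / max(np.ptp(depth), 1e-6)
    return uv, depth, cm.jet(t)[:, :3]
```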