Drones, Vol. 9, Pages 379: An Efficient Pyramid Transformer Network for Cross-View Geo-Localization in Complex Terrains


Drones doi: 10.3390/drones9050379

Authors:
Chengjie Ju
Wangping Xu
Nanxing Chen
Enhui Zheng

Unmanned aerial vehicle (UAV) self-localization in complex environments is critical when global navigation satellite systems (GNSSs) are unreliable. Existing datasets are often limited to low-altitude urban scenes, which hinders generalization. This study introduces Multi-UAV, a novel dataset of 17.4k high-resolution UAV–satellite image pairs collected across China at varying altitudes and over diverse terrains (urban, rural, mountainous, farmland, coastal), intended to support cross-view geo-localization research. We propose a lightweight value reduction pyramid transformer (VRPT) for efficient feature extraction and a residual feature pyramid network (RFPN) for multi-scale feature fusion. Evaluated with meter-level accuracy (MA@K) and relative distance score (RDS), VRPT achieves robust, high-precision localization across varied terrains, offering significant potential for resource-constrained UAV deployment.
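
To make the MA@K metric concrete, the following is a minimal sketch, not the authors' implementation. It assumes MA@K is the fraction of test samples whose predicted position lies within K meters of the ground truth; the abstract does not give the exact definition, so the threshold convention (at most K) is an assumption. The function names `haversine_m` and `meter_level_accuracy` are hypothetical, and the great-circle distance on a spherical Earth stands in for whatever distance computation the paper uses.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a spherical-Earth assumption


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def meter_level_accuracy(errors_m, k):
    """Fraction of localization errors (in meters) that are at most k meters.

    Assumed reading of MA@K; the paper's exact definition may differ.
    """
    if not errors_m:
        raise ValueError("errors_m must not be empty")
    return sum(1 for e in errors_m if e <= k) / len(errors_m)


# Illustrative, made-up error values (meters), not results from the paper.
example_errors = [1.0, 4.0, 12.0, 30.0]
ma_at_5 = meter_level_accuracy(example_errors, 5)
ma_at_20 = meter_level_accuracy(example_errors, 20)
```

In practice each error would come from `haversine_m` applied to a predicted and a ground-truth coordinate pair. Reporting the score at several values of K shows how accuracy degrades as the tolerance tightens.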


