Sensors, Vol. 25, Pages 6163: VLA-MP: A Vision-Language-Action Framework for Multimodal Perception and Physics-Constrained Action Generation in Autonomous Driving
Sensors doi: 10.3390/s25196163
Authors:
Maoning Ge
Kento Ohtani
Yingjie Niu
Yuxiao Zhang
Kazuya Takeda
Autonomous driving in complex real-world environments requires robust perception, reasoning, and physically feasible planning, which remain challenging for current end-to-end approaches. This paper introduces VLA-MP, a unified vision-language-action framework that integrates multimodal Bird’s-Eye View (BEV) perception, vision-language alignment, and a GRU-bicycle dynamics cascade adapter for physics-informed action generation. The system constructs structured environmental representations from RGB images and LiDAR, aligns scene features with natural language instructions through a cross-modal projector and a large language model, and converts high-level semantic hidden-state outputs into executable, physically consistent trajectories. Experiments on the LMDrive dataset and the CARLA simulator demonstrate that VLA-MP achieves high performance across the LangAuto benchmark series, with best driving scores of 44.3, 63.5, and 78.4 on LangAuto, LangAuto-Short, and LangAuto-Tiny, respectively, while maintaining high infraction scores of 0.89–0.95, outperforming recent VLA methods such as LMDrive and AD-H. Visualization and video results further validate the framework’s ability to follow complex language-conditioned instructions, adapt to dynamic environments, and prioritize safety. These findings highlight the potential of combining multimodal perception, language reasoning, and physics-aware adapters for robust and interpretable autonomous driving.
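The abstract does not detail the GRU-bicycle dynamics cascade adapter, but the general idea of decoding language-model hidden states into controls and integrating them through a kinematic bicycle model can be illustrated. The sketch below is an assumption-laden interpretation, not the paper's implementation: all dimensions, the (acceleration, steering) control parameterization, the planning horizon, and the wheelbase value are hypothetical.

```python
import torch
import torch.nn as nn

class GRUBicycleAdapter(nn.Module):
    """Illustrative cascade adapter (not the paper's code): a GRU decodes
    LLM hidden states into per-step controls, and a kinematic bicycle model
    integrates them into a physically consistent trajectory."""

    def __init__(self, hidden_dim=512, gru_dim=256, horizon=10,
                 wheelbase=2.8, dt=0.5):
        super().__init__()
        self.horizon = horizon      # number of predicted waypoints (assumed)
        self.wheelbase = wheelbase  # vehicle wheelbase in meters (assumed)
        self.dt = dt                # integration step in seconds (assumed)
        self.gru = nn.GRU(hidden_dim, gru_dim, batch_first=True)
        # Head predicts (acceleration, steering angle) at each step.
        self.control_head = nn.Linear(gru_dim, 2)

    def forward(self, llm_hidden):           # (B, T, hidden_dim), T >= horizon
        feats, _ = self.gru(llm_hidden)       # (B, T, gru_dim)
        controls = self.control_head(feats[:, -self.horizon:, :])  # (B, H, 2)
        accel = controls[..., 0]
        steer = 0.5 * torch.tanh(controls[..., 1])  # bound steering (rad)

        # Roll out the kinematic bicycle model from the ego frame origin,
        # so every waypoint obeys the vehicle's motion constraints.
        B = llm_hidden.size(0)
        x = torch.zeros(B, device=llm_hidden.device)
        y, yaw, v = torch.zeros_like(x), torch.zeros_like(x), torch.zeros_like(x)
        waypoints = []
        for t in range(self.horizon):
            v = v + accel[:, t] * self.dt
            x = x + v * torch.cos(yaw) * self.dt
            y = y + v * torch.sin(yaw) * self.dt
            yaw = yaw + (v / self.wheelbase) * torch.tan(steer[:, t]) * self.dt
            waypoints.append(torch.stack([x, y], dim=-1))
        return torch.stack(waypoints, dim=1)  # (B, H, 2) ego-frame trajectory

# Usage sketch: hidden states from a hypothetical LLM backbone.
adapter = GRUBicycleAdapter()
trajectory = adapter(torch.randn(2, 16, 512))  # -> shape (2, 10, 2)
```

The design point this is meant to convey is that the network only outputs bounded controls, while the bicycle-model rollout guarantees the resulting waypoints are kinematically feasible, which matches the paper's stated goal of physics-consistent action generation.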