Electronics, Vol. 14, Pages 3874: FFMamba: Feature Fusion State Space Model Based on Sound Event Localization and Detection


Electronics, Vol. 14, Pages 3874: FFMamba: Feature Fusion State Space Model Based on Sound Event Localization and Detection

Electronics doi: 10.3390/electronics14193874

Authors:
Yibo Li
Dongyuan Ge
Jieke Xu
Xifan Yao

Previous studies on Sound Event Localization and Detection (SELD) have primarily focused on CNN- and Transformer-based designs. While CNNs possess local receptive fields, making it difficult to capture global dependencies over long sequences, Transformers excel at modeling long-range dependencies but have limited sensitivity to local time–frequency features. Recently, the VMamba architecture, built upon the Visual State Space (VSS) model, has shown great promise in handling long sequences, yet it remains limited in modeling local spatial details. To address this issue, we propose a novel state space model with an attention-enhanced feature fusion mechanism, termed FFMamba, which balances both local spatial modeling and long-range dependency capture. At a fine-grained level, we design two key modules: the Multi-Scale Fusion Visual State Space (MSFVSS) module and the Wavelet Transform-Enhanced Downsampling (WTED) module. Specifically, the MSFVSS module integrates a Multi-Scale Fusion (MSF) component into the VSS framework, enhancing its ability to capture both long-range temporal dependencies and detailed local spatial information. Meanwhile, the WTED module employs a dual-branch design to fuse spatial and frequency domain features, improving the richness of feature representations. Comparative experiments were conducted on the DCASE2021 Task 3 and DCASE2022 Task 3 datasets. The results demonstrate that the proposed FFMamba model outperforms recent approaches in capturing long-range temporal dependencies and effectively integrating multi-scale audio features. In addition, ablation studies confirmed the effectiveness of the MSFVSS and WTED modules.



Source link

Yibo Li www.mdpi.com