Electronics, Vol. 14, Pages 4720: Multi-Feature Fusion for Automatic Piano Transcription Based on Mel Cyclic and STFT Spectrograms
Electronics doi: 10.3390/electronics14234720
Authors:
Jinliang Dai
Qiuyue Zheng
Yang Wang
Qihuan Shan
Jie Wan
Weiwei Zhang
Automatic piano transcription (APT) is a challenging problem in music information retrieval. In recent years, most APT approaches have been based on neural networks and have demonstrated higher performance than earlier methods. However, most previous works use a short-time Fourier transform (STFT) spectrogram as input, which is noisy because the harmonics of concurrent notes mix. To address this issue, a novel APT network based on two spectrograms is proposed. First, the Mel cyclic and Mel STFT spectrograms of the piano signal are computed to represent the mixed audio. Next, separate modules for onset, offset, and frame-level note detection are constructed, each with its own objective. To capture the temporal dynamics of notes, an axial attention mechanism is incorporated into the frame-level note detection modules. Finally, a multi-feature fusion module aggregates the frame-level note, note onset, and note offset features to deduce the final piano note sequences. In this design, the two spectrograms provide complementary information and the axial attention mechanism enhances the temporal relevance of notes. Experimental results show that the proposed approach achieves higher accuracy and lower error rates in automatic piano transcription than the reference approaches.
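The axial attention step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' network: the function name `axial_attention`, the (frames, keys, channels) feature layout, and the use of identity query/key/value projections in place of learned weights are all assumptions made for illustration. The sketch only shows the general idea of applying self-attention along one axis (here, time) while treating every position on the other axis independently.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(x, axis):
    """Self-attention along a single axis of a (time, freq, channels) array.

    Assumption: queries, keys, and values are the input itself
    (identity projections); a trained model would use learned projections.
    """
    # Bring the attended axis next to the channel axis: (..., length, channels).
    x_moved = np.moveaxis(x, axis, -2)
    d = x_moved.shape[-1]
    # Scaled dot-product scores between all positions along the attended axis.
    scores = x_moved @ np.swapaxes(x_moved, -1, -2) / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    out = weights @ x_moved
    # Restore the original axis order.
    return np.moveaxis(out, -2, axis)

rng = np.random.default_rng(0)
# Hypothetical feature map: 100 frames, 88 piano keys, 16 channels.
features = rng.standard_normal((100, 88, 16))
time_mixed = axial_attention(features, axis=0)   # attend across frames
both = axial_attention(time_mixed, axis=1)       # then across keys
print(both.shape)  # (100, 88, 16)
```

Attending along the time axis lets each frame of a given key draw on the other frames of that key, at a cost quadratic in one axis length at a time rather than in the full frames-by-keys grid.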
