J. Imaging, Vol. 12, Pages 24: LLM-Based Pose Normalization and Multimodal Fusion for Facial Expression Recognition in Extreme Poses

Journal of Imaging doi: 10.3390/jimaging12010024

Authors:
Bohan Chen
Bowen Qu
Yu Zhou
Han Huang
Jianing Guo
Yanning Xian
Longxiang Ma
Jinxuan Yu
Jingyu Chen

Facial expression recognition (FER) technology has matured steadily over time. However, existing FER methods are optimized primarily for frontal face images, and their recognition accuracy degrades significantly on profile or large-angle rotated faces, which hinders the practical deployment of FER systems. To mitigate the interference caused by large pose variations and improve recognition accuracy, we propose an FER method based on profile-to-frontal transformation and multimodal learning. Specifically, we first leverage the visual understanding and generation capabilities of Qwen-Image-Edit to transform profile images to frontal viewpoints, preserving key expression features while standardizing facial pose. Second, we introduce the CLIP model to enhance the semantic representation of expression features through vision–language joint learning. Qualitative and quantitative experiments on the RAF (89.39%), EXPW (67.17%), and AffectNet-7 (62.66%) datasets demonstrate that our method outperforms existing approaches.
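The abstract does not give implementation details, but the CLIP-based vision-language step it describes typically reduces to comparing an image embedding against text-prompt embeddings by cosine similarity. The sketch below illustrates that matching step only, assuming precomputed embeddings (in the paper's pipeline these would come from a CLIP encoder fed the frontalized face and expression prompts); the function name, embedding dimension, and label set are illustrative, not from the paper.

```python
import numpy as np

def classify_expression(image_emb, text_embs, labels):
    """Pick the expression whose text embedding best matches the image.

    image_emb: (d,) image feature vector.
    text_embs: (k, d) one text feature vector per expression prompt.
    labels:    list of k expression names.
    """
    # L2-normalize both sides, as CLIP does before computing
    # image-text cosine similarity.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img  # cosine similarity to each expression prompt
    return labels[int(np.argmax(sims))]

# Hypothetical stand-in embeddings; real ones would come from a CLIP
# encoder applied to prompts like "a photo of a happy face".
labels = ["happy", "sad", "angry"]
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(3, 512))
image_emb = text_embs[0] + 0.1 * rng.normal(size=512)  # near "happy"
print(classify_expression(image_emb, text_embs, labels))  # → happy
```

In the full method the profile image would first be frontalized by Qwen-Image-Edit before its embedding is computed, so the matching operates on a pose-normalized face.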
