Applied Sciences, Vol. 16, Pages 986: Eye Gaze Detection Using a Hybrid Multimodal Deep Learning Model for Assistive Technology

Applied Sciences doi: 10.3390/app16020986

Authors: Verdzekov Emile Tatinyuy, Noumsi Woguia Auguste Vigny, Mvogo Ngono Joseph, Fono Louis Aimé, Wirba Pountianus Berinyuy

This paper presents a novel hybrid multimodal deep learning model for robust and real-time eye gaze estimation. Accurate gaze tracking is essential for advancing human–computer interaction (HCI) and assistive technologies, but existing methods often struggle with environmental variations, require extensive calibration, and are computationally intensive. Our proposed model, GazeNet-HM, addresses these limitations by synergistically fusing features from RGB, depth, and infrared (IR) imaging modalities. This multimodal approach allows the model to leverage complementary information: RGB provides rich texture, depth offers invariance to lighting and aids pose estimation, and IR ensures robust pupil detection. Furthermore, we introduce a personalized adaptation module that dynamically fine-tunes the model to individual users with minimal calibration data. To ensure practical deployment, we employ advanced model compression techniques, enabling real-time inference on resource-constrained embedded systems. Extensive evaluations on public datasets (MPIIGaze, EYEDIAP, Gaze360) and our collected M-Gaze dataset demonstrate that GazeNet-HM achieves state-of-the-art performance, reducing the mean angular error by up to 27.1% compared to leading unimodal methods. After model compression, the system achieves a real-time inference speed of 32 FPS on an embedded Jetson Xavier NX platform. Ablation studies confirm the contribution of each modality and component, highlighting the effectiveness of our holistic design.
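The abstract does not include an implementation, but the three-stream fusion idea it describes can be illustrated with a minimal sketch. The PyTorch module below is a hypothetical rendering of late fusion over RGB, depth, and IR inputs: the branch widths, the concatenation-based fusion, the 64×64 input resolution, and the two-value (yaw, pitch) output head are all illustrative assumptions, not the authors' published GazeNet-HM architecture.

```python
# Hypothetical sketch of a three-branch multimodal gaze network in the
# spirit of GazeNet-HM. All layer sizes and the fusion strategy are
# illustrative assumptions, not the paper's published design.
import torch
import torch.nn as nn


def conv_branch(in_channels: int) -> nn.Sequential:
    """Small convolutional encoder, one instance per modality."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1),  # (B, 64, H, W) -> (B, 64, 1, 1)
        nn.Flatten(),             # -> (B, 64)
    )


class HybridGazeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb = conv_branch(3)    # texture/appearance cues
        self.depth = conv_branch(1)  # lighting-invariant geometry
        self.ir = conv_branch(1)     # robust pupil localization
        # Late fusion by concatenation, then a regression head that
        # predicts a 2D gaze direction (yaw, pitch).
        self.head = nn.Sequential(
            nn.Linear(64 * 3, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 2),
        )

    def forward(self, rgb, depth, ir):
        fused = torch.cat(
            [self.rgb(rgb), self.depth(depth), self.ir(ir)], dim=1
        )
        return self.head(fused)


if __name__ == "__main__":
    model = HybridGazeNet()
    b = 4  # batch size for a quick shape check
    gaze = model(
        torch.randn(b, 3, 64, 64),  # RGB
        torch.randn(b, 1, 64, 64),  # depth
        torch.randn(b, 1, 64, 64),  # IR
    )
    print(gaze.shape)  # torch.Size([4, 2])
```

Late fusion by concatenation is only one plausible reading of the abstract; the personalized adaptation module and the compression step used for the reported 32 FPS Jetson Xavier NX deployment would sit on top of a backbone like this but are not specified in the abstract.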



Source: www.mdpi.com