JCTO, Vol. 3, Pages 25: Evaluating a Multi-Modal Large Language Model for Ophthalmology Triage
Journal of Clinical & Translational Ophthalmology doi: 10.3390/jcto3040025
Authors:
Caius Goh
Jabez Ng
Wei Yung Au
Clarence See
Alva Lim
Jun Wen Zheng
Xiuyi Fan
Kelvin Li
Background/Purpose: Ophthalmic triage is challenging for non-specialists because of limited training and the rising global burden of eye disease. This study evaluates a multimodal framework that integrates clinical text and ophthalmic imaging with large language models (LLMs). Textual consistency filtering and chain-of-thought (CoT) reasoning were incorporated to improve diagnostic accuracy. Methods: A dataset of 56 ophthalmology cases from a Singapore restructured hospital was pre-processed with acronym expansion, sentence reconstruction, and textual consistency filtering. To address the limited dataset size, 100 synthetic cases were generated via one-shot GPT-4 prompting and validated by semantic checks and ophthalmologist review. Three diagnostic approaches were tested: Text-Only, Image-Assisted, and Image with CoT. Diagnostic performance was quantified with a novel SNOMED-CT-based dissimilarity score, defined as the shortest-path distance between the predicted and reference diagnoses in the ontology; lower scores indicate closer semantic alignment. Results: The synthetic dataset included anterior segment (n = 40), posterior segment (n = 35), and extraocular (n = 25) cases. The Text-Only approach yielded a mean dissimilarity of 6.353 (95% CI: 4.668, 8.038). Adding image assistance reduced this to 5.234 (95% CI: 3.930, 6.540), while CoT prompting provided further gains when imaging cues were ambiguous. Conclusions: The multimodal pipeline showed potential to improve diagnostic alignment in ophthalmology triage. Image inputs enhanced accuracy, and CoT reasoning reduced errors from ambiguous features, supporting the pipeline's feasibility as a pilot framework.
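The dissimilarity score described in the Methods, a shortest-path distance between two diagnoses in an ontology, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the toy hierarchy `IS_A` uses placeholder labels rather than real SNOMED-CT concepts or identifiers, the function names `build_undirected` and `dissimilarity` are hypothetical, and treating "is-a" edges as undirected with unit weight is one plausible reading of "shortest path distance", not necessarily the authors' implementation.

```python
from collections import deque

# Hypothetical toy "is-a" hierarchy (child -> list of parents).
# Placeholder labels only; these are NOT real SNOMED-CT concepts or codes.
IS_A = {
    "eye disorder": [],
    "anterior segment disorder": ["eye disorder"],
    "posterior segment disorder": ["eye disorder"],
    "keratitis": ["anterior segment disorder"],
    "cataract": ["anterior segment disorder"],
    "retinal detachment": ["posterior segment disorder"],
}


def build_undirected(is_a):
    """Turn child -> parents links into an undirected adjacency map.

    Assumption: paths may go up and down the hierarchy, each edge costing 1.
    """
    graph = {concept: set() for concept in is_a}
    for child, parents in is_a.items():
        for parent in parents:
            graph.setdefault(parent, set()).add(child)
            graph[child].add(parent)
    return graph


def dissimilarity(graph, predicted, reference):
    """Return the number of edges on the shortest path between two concepts.

    Uses breadth-first search. Returns None if the concepts are not connected.
    """
    if predicted not in graph or reference not in graph:
        raise KeyError("both concepts must exist in the ontology graph")
    if predicted == reference:
        return 0
    seen = {predicted}
    queue = deque([(predicted, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == reference:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


GRAPH = build_undirected(IS_A)
print(dissimilarity(GRAPH, "keratitis", "cataract"))
print(dissimilarity(GRAPH, "keratitis", "retinal detachment"))
```

In this toy graph, two diagnoses that share a parent are two edges apart, while diagnoses in different segments are four edges apart, so a prediction in the wrong anatomical region is penalised more heavily than a near miss. Averaging such distances over all cases would give a mean dissimilarity of the kind reported in the Results.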
