Applied Sciences, Vol. 15, Pages 7786: Diagnostic Accuracy and Agreement Between AI and Clinicians in Orthodontic 3D Model Analysis



Applied Sciences doi: 10.3390/app15147786

Authors:
Sabahattin Bor
Fırat Oğuz
Ayla Khanmohammadi

Background: Artificial intelligence (AI) is increasingly integrated into orthodontic workflows, including the digital model analysis modules embedded in orthodontic software. Although these systems offer efficiency and automation, the accuracy and clinical reliability of AI-generated measurements and diagnostic assessments remain unclear. To use AI systems safely and effectively in clinical orthodontics, their results should therefore be checked against those of experienced orthodontists. Methods: Digital models of 48 patients were analyzed by an orthodontist group and two AI platforms: Titan (full analysis) and SoftSmile (Bolton analysis only). Three orthodontists independently measured all variables using 3Shape OrthoAnalyzer, and group means were used for comparison. A subset of models was reanalyzed after two weeks to assess consistency. Data distribution was evaluated, and appropriate statistical tests were applied. Reliability was assessed using intraclass correlation coefficients (ICC) and Cohen’s kappa. Results: Almost perfect agreement was observed between the orthodontists and Titan AI in molar classification (κ = 0.955 right, κ = 0.900 left; p < 0.001), and perfect agreement was reported across all groups, including among the orthodontists themselves, for Angle classification (κ = 1.00). In the anterior and overall Bolton analyses, no meaningful agreement was found between the orthodontists and the AI platforms. However, in the subset of patients for whom all three methods identified the tooth size discrepancy in the same arch (either maxilla or mandible), no significant differences were found in anterior (p = 0.226) or overall (p = 0.795) Bolton values. Overjet, overbite, and space analysis values differed significantly between the orthodontist and Titan groups (p < 0.001). ICC analysis indicated good to excellent intra- and inter-rater reliability within the orthodontist group (≥0.77), while both AI systems demonstrated excellent internal consistency, with ICC values exceeding 0.95. Conclusions: The AI-based platforms showed high agreement with orthodontists only in Angle classification. Their performance in Bolton analysis was limited, and significant differences were observed in the other linear measurements, indicating the need for further refinement before clinical use.
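For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's kappa can be computed for two raters assigning categorical labels such as molar classes. The function name `cohen_kappa` and the label lists are hypothetical and purely illustrative; they are not the study's data, and the abstract does not state which software the authors used for this calculation.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when both raters use one identical label;
        # returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical molar classes for 10 models (illustrative only, not study data).
clinician = ["I", "I", "II", "II", "III", "I", "II", "I", "III", "II"]
software = ["I", "I", "II", "II", "III", "I", "II", "II", "III", "II"]
kappa = cohen_kappa(clinician, software)
print(round(kappa, 3))
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why it is preferred over simple percent agreement when some classes are much more common than others.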


