Applied Sciences, Vol. 15, Pages 5683: Large Language Model-Powered Automated Assessment: A Systematic Review
Applied Sciences doi: 10.3390/app15105683
Authors:
Emrah Emirtekin
This systematic review investigates 49 peer-reviewed studies on Large Language Model-Powered Automated Assessment (LLMPAA) published between 2018 and 2024. Following PRISMA guidelines, studies were selected from the Web of Science, Scopus, IEEE, ACM Digital Library, and PubMed databases. The analysis shows that LLMPAA has been widely applied in reading comprehension, language education, and computer science, primarily using essay and short-answer formats. While models such as GPT-4 and fine-tuned BERT often exhibit high agreement with human raters (e.g., QWK = 0.99, r = 0.95), other studies report lower agreement (e.g., ICC = 0.45, r = 0.38). LLMPAA offers benefits such as efficiency, scalability, and personalized feedback. However, significant challenges remain, including bias, inconsistency, hallucination, limited explainability, dataset-quality issues, and privacy concerns. These findings indicate that while LLMPAA technologies hold promise, their effectiveness varies by context. Human oversight is essential to ensure fair and reliable assessment outcomes.
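As an illustration of the agreement statistics cited above (not drawn from the reviewed studies), the following Python sketch shows how Quadratic Weighted Kappa (QWK) and Pearson's r between human-assigned and LLM-assigned scores are typically computed; the score values are hypothetical, and scikit-learn and SciPy are assumed to be available.

```python
# Minimal sketch: human-LLM scoring agreement via QWK and Pearson's r.
# The scores below are hypothetical rubric scores (0-5) for the same essays.
from sklearn.metrics import cohen_kappa_score
from scipy.stats import pearsonr

human_scores = [3, 4, 2, 5, 3, 4, 1, 5, 2, 3]
llm_scores   = [3, 4, 3, 5, 3, 4, 2, 4, 2, 3]

# QWK: agreement on ordinal scores, penalizing larger disagreements quadratically.
qwk = cohen_kappa_score(human_scores, llm_scores, weights="quadratic")

# Pearson's r: linear correlation between the two score series.
r, _ = pearsonr(human_scores, llm_scores)

print(f"QWK = {qwk:.2f}, r = {r:.2f}")
```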
Source: www.mdpi.com