Applied Sciences, Vol. 15, Pages 8860: Analyzing LLM Sentencing Variability in Theft Indictments Across Gender, Family Status and the Value of the Stolen Item
Applied Sciences doi: 10.3390/app15168860
Authors:
Karol Struniawski
Ryszard Kozera
Aleksandra Konopka
As large language models (LLMs) increasingly enter high-stakes decision-making contexts, questions arise about their suitability for domains requiring normative judgment, such as judicial sentencing. This study investigates whether LLMs exhibit bias when tasked with sentencing decisions in Polish criminal law, despite clear legal norms that prohibit considering extralegal factors. We simulate sentencing scenarios for theft offenses with two leading open-source LLMs (LLaMA and Mixtral), systematically varying three defendant characteristics: gender, number of children, and the value of the stolen item. While none of these variables should legally affect sentence length under Polish law, our results reveal statistically significant disparities, particularly in how female defendants with children are treated. Non-parametric tests (Kruskal–Wallis and Mann–Whitney U) and correlation analysis are applied to quantify these effects. Our findings raise concerns about the normative reliability of LLMs and their alignment with principles of fairness and legality. From a jurisprudential perspective, we contrast the implicit logic of LLM sentencing with theoretical models of adjudication, including Dworkin’s moral interpretivism and Posner’s pragmatism. This work contributes to ongoing debates on the integration of AI in legal systems, highlighting both the empirical risks and the philosophical limitations of computational legal reasoning.
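The design summarized above (a full crossing of three defendant characteristics, followed by a rank-based comparison of sentence lengths between groups) can be sketched roughly as follows. This is only an illustrative sketch and not the authors' code: the factor levels in `GENDERS`, `CHILDREN`, and `ITEM_VALUES_PLN` are hypothetical placeholders, since the abstract does not state them, and `mann_whitney_u` computes only the U statistic from its pairwise definition, without a p-value.

```python
from itertools import product

# Hypothetical factor levels; the abstract does not list the actual values used.
GENDERS = ["female", "male"]
CHILDREN = [0, 1, 3]
ITEM_VALUES_PLN = [1000, 5000, 20000]


def build_conditions():
    """Return every combination of the three varied defendant characteristics."""
    return [
        {"gender": g, "children": c, "item_value": v}
        for g, c, v in product(GENDERS, CHILDREN, ITEM_VALUES_PLN)
    ]


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x against sample y.

    Counts the pairs (xi, yj) with xi > yj; tied pairs count as one half.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u
```

Given two lists of sentence lengths produced by a model for two groups of conditions, `mann_whitney_u(group_a, group_b)` returns a value between 0 and `len(group_a) * len(group_b)`; values near either extreme indicate that one group's sentences are consistently longer than the other's.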