Software, Vol. 4, Pages 17: Investigating Reproducibility Challenges in LLM Bugfixing on the HumanEvalFix Benchmark
Software doi: 10.3390/software4030017
Authors: Balázs Szalontai, Balázs Márton, Balázs Pintér, Tibor Gregorics
Benchmark results for large language models often show inconsistencies across studies. This paper investigates the challenges of reproducing such results for automatic bugfixing with LLMs on the HumanEvalFix benchmark. To determine the cause of the differing results in the literature, we attempted to reproduce a subset of them by evaluating 12 models from the DeepSeekCoder, CodeGemma, CodeLlama, and WizardCoder families, across different sizes and tunings. A total of 35 unique results were reported for these models across studies, of which we successfully reproduced 12. We identified several factors that influenced the results. Base models can be confused with their instruction-tuned variants, which makes their reported results appear better than expected. An incorrect prompt template, an insufficient generation length, or 4-bit quantization can each decrease benchmark performance. Using sampling instead of greedy decoding can increase the variance of the results, especially at higher temperature values. We found that floating-point precision and 8-bit quantization have less influence on benchmark results.
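The abstract does not specify the authors' evaluation harness, so the following is only a hedged illustration of how the factors it names (prompt template, generation length, quantization, precision, and greedy versus sampled decoding) are typically configured when evaluating such models with the Hugging Face transformers library. The model name, prompt, and all parameter values below are assumptions for illustration, not the paper's setup.

```python
# Illustrative sketch only: evaluation knobs that can shift HumanEvalFix-style scores.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "deepseek-ai/deepseek-coder-6.7b-instruct"  # assumed example model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantization choice: the paper reports that 4-bit quantization can lower scores,
# while 8-bit quantization and precision changes matter less.
quant_config = BitsAndBytesConfig(load_in_4bit=True)  # or load_in_8bit=True
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,        # floating-point precision setting
    quantization_config=quant_config,
    device_map="auto",
)

# Prompt template: instruct-tuned models expect an instruction-style prompt;
# using the wrong template (or the base model by mistake) changes results.
prompt = "Fix the bug in the following function:\n..."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding: deterministic, low run-to-run variance.
greedy = model.generate(**inputs, do_sample=False, max_new_tokens=512)

# Sampling: variance grows with temperature, which can change reported scores.
sampled = model.generate(**inputs, do_sample=True, temperature=0.8, top_p=0.95,
                         max_new_tokens=512)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
```

In this sketch, `max_new_tokens` stands in for the generation-length setting the abstract mentions; too small a value truncates the model's fix and lowers the benchmark score.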