Algorithms, Vol. 19, Pages 96: Instruction-Tuned Decoder-Only Large Language Models for Efficient Extreme Summarization on Consumer-Grade GPUs
Algorithms doi: 10.3390/a19020096
Authors:
Attia Fathalla Elatiky
Ahmed M. Hamad
Heba Khaled
Mahmoud Fayez
Extreme summarization generates very short summaries, typically a single sentence, that answer the question “What is the document about?”. Although large language models perform well in text generation, fine-tuning them for summarization often requires computational resources beyond what many researchers can access. In this study, we present an effective method for instruction-tuning open decoder-only large language models under limited GPU resources. The proposed approach combines parameter-efficient fine-tuning techniques, such as Low-Rank Adaptation (LoRA), with quantization to reduce memory requirements, enabling training on a single consumer-grade GPU. We fine-tuned a pre-trained decoder-only model on the XSum dataset using an instruction-following format and evaluated the outputs with traditional overlap-based metrics (ROUGE) as well as semantic metrics, including BERTScore and G-Eval. Experimental results demonstrate that the proposed decoder-only approach achieves competitive performance under strict GPU memory constraints. On the full XSum test set, the proposed 2G–1R pipeline attains ROUGE-1/2/L F1 scores of 46.0/22.0/37.0 and a BERTScore F1 of 0.917, outperforming the individual generator models in both lexical overlap and semantic similarity. While remaining competitive in ROUGE with strong encoder–decoder baselines, the pipeline consistently produces summaries of higher semantic quality. These findings demonstrate that large decoder-only language models can be efficiently fine-tuned for extreme summarization on limited consumer-grade hardware without sacrificing output quality.
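The abstract does not include implementation details, but the memory-saving recipe it names (LoRA adapters over a quantized base model, trained on instruction-formatted XSum pairs) is commonly realized with the Hugging Face transformers, peft, and bitsandbytes libraries. The sketch below illustrates that kind of setup; the base model name, prompt template, LoRA rank, and other hyperparameters are assumptions for illustration, not values reported in the paper.

```python
# Minimal sketch (not the authors' released code): a decoder-only LLM loaded
# in 4-bit precision and adapted with LoRA so fine-tuning fits on a single
# consumer-grade GPU. Model name, template, and hyperparameters are assumed.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; the paper's base model may differ

# 4-bit NF4 quantization keeps the frozen base weights small in GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters: only a few million trainable parameters on top of the
# quantized, frozen base model.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# XSum document/summary pairs rewritten into an instruction-following format
# (the prompt template here is hypothetical).
def to_instruction(example):
    prompt = (
        "### Instruction:\nSummarize the article in one sentence.\n\n"
        f"### Article:\n{example['document']}\n\n### Summary:\n{example['summary']}"
    )
    return {"text": prompt}

xsum = load_dataset("EdinburghNLP/xsum", split="train")
train_data = xsum.map(to_instruction, remove_columns=xsum.column_names)
# From here, a standard causal-LM fine-tuning loop (e.g. trl's SFTTrainer)
# can train the adapters on train_data["text"].
```

In a configuration like this, only the LoRA adapter weights receive gradients while the 4-bit base model stays frozen, which is what keeps the training footprint within a single consumer-grade GPU.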
Source: www.mdpi.com
