Document Type: Original Research Paper

Authors

Department of Computer Engineering, University of Kashan, Kashan, Iran.

Abstract

Background and Objectives: Large Language Models (LLMs) have demonstrated exceptional performance across various NLP tasks, especially when fine-tuned for specific applications. However, full fine-tuning of LLMs requires extensive computational resources, which are often unavailable in real-world settings. While Low-Rank Adaptation (LoRA) has emerged as a promising solution to this challenge, its potential remains largely untapped in multi-task scenarios. This study addresses this gap by introducing a novel hybrid fine-tuning approach that combines LoRA with an attention-based mechanism for multi-task text classification, enabling inter-task knowledge sharing to improve generalization, efficiency, and overall model performance.
Methods: We propose a hybrid fine-tuning method that uses LoRA to fine-tune LLMs across multiple tasks simultaneously. An attention mechanism integrates the outputs of the task-specific models, facilitating cross-task knowledge sharing: the attention layer dynamically prioritizes relevant information from the different tasks, enabling the model to benefit from complementary insights (a minimal illustrative sketch of this architecture follows the abstract).
Results: The hybrid fine-tuning approach demonstrated significant improvements in accuracy across multiple text classification tasks. Across different NLP tasks, the model showed superior generalization and precision compared to conventional single-task LoRA fine-tuning. Additionally, the model exhibited better scalability and computational efficiency, as it required fewer resources to achieve comparable or better performance. Cross-task knowledge sharing through the attention mechanism was found to be a critical factor in achieving these performance gains.
Conclusion: The proposed hybrid fine-tuning method enhances the accuracy and efficiency of LLMs in multi-task settings by enabling effective knowledge sharing between tasks. This approach offers a scalable and resource-efficient solution for real-world applications requiring multi-task learning, paving the way for more robust and generalized NLP models.
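The abstract does not include code, so the following is only a minimal PyTorch sketch of the architecture described in the Methods paragraph: one LoRA adapter per task attached to a frozen layer, with an attention layer that weights and fuses the task-specific outputs. All names (LoRALinear, MultiTaskLoRAFusion, rank, hidden_dim) are illustrative assumptions, and the frozen base model is reduced to a single linear layer for brevity; this is not the authors' released implementation.

```python
# Illustrative sketch only: class and parameter names are assumptions,
# and a single frozen nn.Linear stands in for a pretrained LLM block.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank (LoRA) update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: starts as identity update
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * F.linear(F.linear(x, self.lora_A), self.lora_B)


class MultiTaskLoRAFusion(nn.Module):
    """One LoRA adapter per task; an attention layer scores and fuses the task-specific outputs."""

    def __init__(self, hidden_dim: int, num_tasks: int, rank: int = 8):
        super().__init__()
        shared = nn.Linear(hidden_dim, hidden_dim)  # placeholder for a frozen LLM layer
        self.task_adapters = nn.ModuleList(
            [LoRALinear(shared, rank=rank) for _ in range(num_tasks)]
        )
        # Scores each task-specific representation so more relevant tasks receive more weight.
        self.attn_score = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, hidden_dim) -> per-task outputs: (batch, num_tasks, hidden_dim)
        task_outputs = torch.stack([adapter(x) for adapter in self.task_adapters], dim=1)
        weights = torch.softmax(self.attn_score(task_outputs), dim=1)  # (batch, num_tasks, 1)
        return (weights * task_outputs).sum(dim=1)  # fused representation for a classification head


if __name__ == "__main__":
    model = MultiTaskLoRAFusion(hidden_dim=768, num_tasks=3)
    fused = model(torch.randn(4, 768))
    print(fused.shape)  # torch.Size([4, 768])
```

In this sketch only the LoRA matrices and the attention scorer are trainable, which mirrors the resource-efficiency argument in the abstract: the shared pretrained weights stay frozen while the attention layer decides how much each task-specific adapter contributes to the fused representation.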

Keywords

Main Subjects

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit: http://creativecommons.org/licenses/by/4.0/

 

Publisher’s Note

JECEI Publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

 

Publisher

Shahid Rajaee Teacher Training University


