arxiv:2310.08659

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

Published on Oct 12, 2023
· Submitted by akhaliq on Oct 16, 2023

Abstract

Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning. In this work we focus on the scenario where quantization and LoRA fine-tuning are applied together on a pre-trained model. In such cases it is common to observe a consistent gap in downstream-task performance between full fine-tuning and the quantization-plus-LoRA fine-tuning approach. In response, we propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework that simultaneously quantizes an LLM and finds a proper low-rank initialization for LoRA fine-tuning. Such an initialization alleviates the discrepancy between the quantized and full-precision models and significantly improves generalization on downstream tasks. We evaluate our method on natural language understanding, question answering, summarization, and natural language generation tasks. Experiments show that our method is highly effective and outperforms existing quantization methods, especially in the challenging 2-bit and 2/4-bit mixed-precision regimes. We will release our code.
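To make the idea of "quantizing and finding a low-rank initialization together" concrete, here is a minimal NumPy sketch. It is not the authors' released code. It assumes an alternating scheme (quantize the residual, then take a truncated SVD of what quantization missed), and it uses a plain uniform round-to-nearest quantizer as a stand-in for whatever quantizer a real implementation would use. The names `uniform_quantize` and `loftq_style_init` are hypothetical.

```python
import numpy as np


def uniform_quantize(w, bits):
    """Simulated uniform round-to-nearest quantization (assumed stand-in).

    Returns dequantized floats that take at most 2**bits distinct values.
    """
    levels = 2 ** bits - 1
    lo, hi = w.min(), w.max()
    if hi == lo:
        return w.copy()
    scale = (hi - lo) / levels
    return np.round((w - lo) / scale) * scale + lo


def loftq_style_init(w, rank, bits, steps=5):
    """Alternate between quantization and truncated SVD (assumed scheme).

    Returns (q, a, b) such that q + a @ b.T approximates w, where q is the
    quantized weight and a, b are the low-rank adapter initializations.
    """
    a = np.zeros((w.shape[0], rank))
    b = np.zeros((w.shape[1], rank))
    for _ in range(steps):
        # Quantize the part of w that the adapter does not already explain.
        q = uniform_quantize(w - a @ b.T, bits)
        # Best rank-`rank` approximation of the quantization residual.
        u, s, vt = np.linalg.svd(w - q, full_matrices=False)
        root = np.sqrt(s[:rank])
        a = u[:, :rank] * root
        b = vt[:rank].T * root
    return q, a, b


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))  # toy "pre-trained" weight matrix
q, a, b = loftq_style_init(w, rank=8, bits=2)

err_quant_only = np.linalg.norm(w - q)
err_with_lora = np.linalg.norm(w - (q + a @ b.T))
print(f"quantized only: {err_quant_only:.3f}, with low-rank init: {err_with_lora:.3f}")
```

Because the final adapter is the best rank-8 approximation of `w - q`, the combined error can never exceed the error of the quantized weight alone. That gap is the discrepancy the abstract says the initialization alleviates.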

Community

If I understand this paper correctly, we can do quantized fine-tuning with LoftQ, but the quantized weights we obtain will always be specific to the dataset used for fine-tuning. So we cannot train an adapter a1 on a frozen model M and later train an adapter a2 on (M+a1), with both frozen in, let's say, 8-bit or 4-bit, right?

LoftQ is not task-specific. You can fine-tune on any dataset with the same quantized model M and initial adapter a0. Your requirement of training an adapter a2 on (M+a1) is definitely feasible.

Moreover, LoftQ supports multi-task learning, which is the original motivation of LoRA. With, again, the same quantized model M and initial adapter a0, you can obtain many adapters a1, a2, ..., an, from different datasets, and plug each of them into the same quantized model M for deployment.
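The deployment pattern described in this reply can be illustrated with a small NumPy sketch. It assumes one frozen quantized weight shared by several adapters, each starting from the same initial adapter a0. The class `SharedQuantizedLinear` and its methods are hypothetical and are not part of LoftQ or any library. The "training" step is faked by nudging one adapter.

```python
import numpy as np


class SharedQuantizedLinear:
    """One frozen base weight shared by several named LoRA adapters (hypothetical)."""

    def __init__(self, q, a0, b0):
        self.q = q.copy()
        self.q.setflags(write=False)  # the quantized base M is never updated
        self.init = (a0, b0)          # shared initial adapter a0
        self.adapters = {}

    def add_adapter(self, name):
        # Every task starts from its own copy of the same initial adapter.
        a0, b0 = self.init
        self.adapters[name] = [a0.copy(), b0.copy()]

    def forward(self, x, name):
        # y = x (Q + A B^T)^T, computed without materializing the sum.
        a, b = self.adapters[name]
        return x @ self.q.T + (x @ b) @ a.T


rng = np.random.default_rng(0)
q = np.round(rng.standard_normal((16, 8)))   # stand-in for a quantized weight
a0 = 0.1 * rng.standard_normal((16, 2))      # stand-in initial adapter factors
b0 = 0.1 * rng.standard_normal((8, 2))

layer = SharedQuantizedLinear(q, a0, b0)
layer.add_adapter("task_a")
layer.add_adapter("task_b")

# Pretend task_a was fine-tuned: only its adapter changes, the base does not.
layer.adapters["task_a"][0] += 0.1

x = rng.standard_normal((4, 8))
y_a = layer.forward(x, "task_a")
y_b = layer.forward(x, "task_b")
```

Switching tasks here means selecting a different small adapter by name. The large quantized matrix is stored once and stays read-only.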

Hi Team,
Can LoftQ be used for vision foundation models such as OWL-ViT v2 and Grounding DINO?

Reference code for this would be helpful.

Thanks

