Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

LongAlpaca-70B - GGUF
- Model creator: https://huggingface.co/Yukang/
- Original model: https://huggingface.co/Yukang/LongAlpaca-70B/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [LongAlpaca-70B.Q2_K.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/blob/main/LongAlpaca-70B.Q2_K.gguf) | Q2_K | 23.71GB |
| [LongAlpaca-70B.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/blob/main/LongAlpaca-70B.IQ3_XS.gguf) | IQ3_XS | 26.37GB |
| [LongAlpaca-70B.IQ3_S.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/blob/main/LongAlpaca-70B.IQ3_S.gguf) | IQ3_S | 27.86GB |
| [LongAlpaca-70B.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/blob/main/LongAlpaca-70B.Q3_K_S.gguf) | Q3_K_S | 27.86GB |
| [LongAlpaca-70B.IQ3_M.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/blob/main/LongAlpaca-70B.IQ3_M.gguf) | IQ3_M | 28.82GB |
| [LongAlpaca-70B.Q3_K.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/blob/main/LongAlpaca-70B.Q3_K.gguf) | Q3_K | 30.99GB |
| [LongAlpaca-70B.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/blob/main/LongAlpaca-70B.Q3_K_M.gguf) | Q3_K_M | 30.99GB |
| [LongAlpaca-70B.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/blob/main/LongAlpaca-70B.Q3_K_L.gguf) | Q3_K_L | 33.67GB |
| [LongAlpaca-70B.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/blob/main/LongAlpaca-70B.IQ4_XS.gguf) | IQ4_XS | 34.64GB |
| [LongAlpaca-70B.Q4_0.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/blob/main/LongAlpaca-70B.Q4_0.gguf) | Q4_0 | 36.2GB |
| [LongAlpaca-70B.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/blob/main/LongAlpaca-70B.IQ4_NL.gguf) | IQ4_NL | 36.55GB |
| [LongAlpaca-70B.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/blob/main/LongAlpaca-70B.Q4_K_S.gguf) | Q4_K_S | 36.55GB |
| [LongAlpaca-70B.Q4_K.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/tree/main/) | Q4_K | 38.58GB |
| [LongAlpaca-70B.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/tree/main/) | Q4_K_M | 38.58GB |
| [LongAlpaca-70B.Q4_1.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/tree/main/) | Q4_1 | 40.2GB |
| [LongAlpaca-70B.Q5_0.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/tree/main/) | Q5_0 | 44.2GB |
| [LongAlpaca-70B.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/tree/main/) | Q5_K_S | 44.2GB |
| [LongAlpaca-70B.Q5_K.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/tree/main/) | Q5_K | 45.41GB |
| [LongAlpaca-70B.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/tree/main/) | Q5_K_M | 45.41GB |
| [LongAlpaca-70B.Q5_1.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/tree/main/) | Q5_1 | 48.2GB |
| [LongAlpaca-70B.Q6_K.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/tree/main/) | Q6_K | 52.7GB |
| [LongAlpaca-70B.Q8_0.gguf](https://huggingface.co/RichardErkhov/Yukang_-_LongAlpaca-70B-gguf/tree/main/) | Q8_0 | 68.26GB |
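
As a rough guide to choosing a file from the table above, the sketch below picks the largest quantization whose file fits in a given memory budget. It is a hedged heuristic, not part of this repository: it assumes the file size approximates the memory needed for the weights, and the `headroom_gb` default reserved for the KV cache and runtime buffers is an arbitrary illustrative value. Long contexts such as 32k tokens can need considerably more room for the KV cache.

```python
from typing import Optional

# File sizes (GB) copied from the table above. Q3_K, Q4_K and Q5_K are omitted
# because they have the same sizes as Q3_K_M, Q4_K_M and Q5_K_M.
QUANT_SIZES_GB = {
    "Q2_K": 23.71, "IQ3_XS": 26.37, "IQ3_S": 27.86, "Q3_K_S": 27.86,
    "IQ3_M": 28.82, "Q3_K_M": 30.99, "Q3_K_L": 33.67, "IQ4_XS": 34.64,
    "Q4_0": 36.2, "IQ4_NL": 36.55, "Q4_K_S": 36.55, "Q4_K_M": 38.58,
    "Q4_1": 40.2, "Q5_0": 44.2, "Q5_K_S": 44.2, "Q5_K_M": 45.41,
    "Q5_1": 48.2, "Q6_K": 52.7, "Q8_0": 68.26,
}


def pick_quant(budget_gb: float, headroom_gb: float = 4.0) -> Optional[str]:
    """Return the largest quant whose file fits in budget_gb - headroom_gb, or None.

    headroom_gb is a hypothetical allowance for the KV cache and runtime buffers.
    """
    usable = budget_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= usable}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def gguf_filename(quant: str) -> str:
    """File name pattern used in the table, e.g. LongAlpaca-70B.Q4_K_M.gguf."""
    return f"LongAlpaca-70B.{quant}.gguf"
```

For example, `pick_quant(48)` returns `"Q4_1"`: its 40.2GB file fits within 48 - 4 = 44 GB, while Q5_0 at 44.2GB does not.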

Original model description:
# LongLoRA and LongAlpaca for Long-context LLMs

[![Huggingface Models](https://img.shields.io/badge/Models-Huggingface%20Models-bron)](https://huggingface.co/Yukang)
[![Github](https://img.shields.io/badge/Github-Repo-cyan)](https://github.com/dvlab-research/LongLoRA)
[![Data](https://img.shields.io/badge/Data-LongAlpaca%2012k-light)](https://huggingface.co/datasets/Yukang/LongAlpaca-12k)
[![Paper](https://img.shields.io/badge/Paper-Arvix-blue)](https://arxiv.org/abs/2309.12307)

[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-yellow.svg)](https://github.com/dvlab-research/LongLoRA/blob/main/LICENSE)
[![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-orange.svg)](https://github.com/dvlab-research/LongLoRA/blob/main/DATA_LICENSE)
[![Weight License](https://img.shields.io/badge/Weight%20License-CC%20By%20NC%204.0-red)](https://github.com/dvlab-research/LongLoRA/blob/main/WEIGHT_LICENSE)

For detailed usage and code, please visit the [GitHub project](https://github.com/dvlab-research/LongLoRA).
## TABLE OF CONTENTS
1. [News](#news)
2. [Examples](#examples)
3. [Highlights](#highlights)
4. [How to contribute](#how-to-contribute)
5. [Requirements](#usage-requirements)
6. [Installation and quick guide](#installation-and-quick-guide)
7. [LongAlpaca Data](#longalpaca-data)
8. [Models](#models)
9. [Training](#training)
10. [Evaluation](#evaluation)
11. [Demo](#demo)
12. [Data Generation via Pdf2Text](#data-generation-via-pdf2text)
13. [Citation](#citation)
14. [Acknowledgement](#acknowledgement)
15. [License](#license)

## News
- [x] [2023.10.8] **We release the long instruction-following dataset**, [LongAlpaca-12k](https://huggingface.co/datasets/Yukang/LongAlpaca-12k), and **the corresponding models**, [LongAlpaca-7B](https://huggingface.co/Yukang/LongAlpaca-7B), [LongAlpaca-13B](https://huggingface.co/Yukang/LongAlpaca-13B), and [LongAlpaca-70B](https://huggingface.co/Yukang/LongAlpaca-70B).
  - (*The previous SFT models*, [Llama-2-13b-chat-longlora-32k-sft](https://huggingface.co/Yukang/Llama-2-13b-chat-longlora-32k-sft) and [Llama-2-70b-chat-longlora-32k-sft](https://huggingface.co/Yukang/Llama-2-70b-chat-longlora-32k-sft), *have been deprecated*.)
- [x] [2023.10.3] We added support for GPTNeoX models. Please refer to this [PR](https://github.com/dvlab-research/LongLoRA/pull/32) for usage. Thanks to @naubull2 for this contribution.
- [x] [2023.9.22] We released all our fine-tuned [models](https://huggingface.co/Yukang), including **70B-32k models** such as [LLaMA2-LongLoRA-70B-32k](https://huggingface.co/Yukang/Llama-2-70b-longlora-32k), as well as [LLaMA2-LongLoRA-7B-100k](https://huggingface.co/Yukang/Llama-2-7b-longlora-100k-ft). Feel free to check them out!
- [x] [2023.9.22] We released the [paper](http://arxiv.org/abs/2309.12307) and this GitHub repo, including training and evaluation code.

**LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models [[Paper](http://arxiv.org/abs/2309.12307)]** <br />
[Yukang Chen](https://scholar.google.com/citations?user=6p0ygKUAAAAJ&hl=en),
[Shengju Qian](https://scholar.google.com/citations?user=QNnWmasAAAAJ),
[Haotian Tang](https://scholar.google.com/citations?user=WxL13BAAAAAJ&hl),
[Xin Lai](https://scholar.google.com/citations?user=tqNDPA4AAAAJ&hl=zh-CN),
[Zhijian Liu](https://scholar.google.com/citations?user=3coYSTUAAAAJ&hl=en),
[Song Han](https://scholar.google.com/citations?user=E0iCaa4AAAAJ&hl=zh-CN),
[Jiaya Jia](https://scholar.google.com/citations?user=XPAkzTEAAAAJ&hl=en)<br />

## Highlights
1. In the LongLoRA approach, the proposed shifted short attention (S²-Attn) is easy to implement, compatible with Flash-Attention, and not required during inference.
2. We released all our models, ranging from 7B to 70B parameters and from 8k to 100k context length, including [LLaMA2-LongLoRA-7B-100k](https://huggingface.co/Yukang/Llama-2-7b-longlora-100k-ft), [LLaMA2-LongLoRA-13B-64k](https://huggingface.co/Yukang/Llama-2-13b-longlora-64k), and [LLaMA2-LongLoRA-70B-32k](https://huggingface.co/Yukang/Llama-2-70b-longlora-32k).
3. We built a long-context instruction-following dataset, [LongAlpaca-12k](#longalpaca-data), and released the corresponding [LongAlpaca-7B](https://huggingface.co/Yukang/LongAlpaca-7B), [LongAlpaca-13B](https://huggingface.co/Yukang/LongAlpaca-13B) and [LongAlpaca-70B](https://huggingface.co/Yukang/LongAlpaca-70B) models. To the best of our knowledge, this is the first open-source long-context 70B model.
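
To make the first highlight more concrete, here is a minimal pure-Python sketch of the token grouping behind shifted short attention as the LongLoRA paper describes it: attention is computed only within groups of `group_size` tokens, and in half of the attention heads the tokens are first shifted by half a group, so that neighbouring groups exchange information. The function names are illustrative and not the repository's API; the sketch only lists which tokens share a group and omits the attention computation itself and causal masking.

```python
def s2_attn_groups(seq_len: int, group_size: int, shifted: bool) -> list[list[int]]:
    """Token indices that attend to each other, one list per group."""
    if group_size % 2 or seq_len % group_size:
        raise ValueError("group_size must be even and divide seq_len")
    positions = list(range(seq_len))
    if shifted:
        # Roll the sequence by -group_size/2: slot i now holds token i + group_size/2
        # (wrapping around), so groups straddle the unshifted group boundaries.
        half = group_size // 2
        positions = positions[half:] + positions[:half]
    # Split the (possibly rolled) sequence into contiguous groups.
    return [positions[i:i + group_size] for i in range(0, seq_len, group_size)]


def groups_per_head(num_heads: int, seq_len: int, group_size: int) -> list[list[list[int]]]:
    """First half of the heads use plain groups; the second half use shifted groups."""
    return [
        s2_attn_groups(seq_len, group_size, shifted=head >= num_heads // 2)
        for head in range(num_heads)
    ]
```

With 8 tokens and groups of 4, tokens 3 and 4 never share a group in the unshifted heads (`[0, 1, 2, 3]` and `[4, 5, 6, 7]`), but they do share `[2, 3, 4, 5]` in the shifted heads. Since inference uses standard attention, this grouping only matters during fine-tuning.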

## How to Contribute
- Make sure you have git installed.
- Create your own [fork](https://github.com/dvlab-research/LongLoRA/fork) of the project.
- Clone the repository to your local machine using `git clone` with the URL of this project.
- Read both the `Requirements` and `Installation and Quick Guide` sections below.
- Commit and push your changes.
- Make a pull request when you have finished modifying the project.

## Usage Requirements
To download and use the [pre-trained weights](#pre-trained-weights), you will need:
1. A Hugging Face (HF) account with a valid email address. Note that the email used for HF must also be used for the license agreement.
2. To accept the Meta [license and acceptable use policy](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).

## Installation and Quick Guide
To install and run the application:
1. [Fork this repo](https://github.com/dvlab-research/LongLoRA/fork) on GitHub.
2. Clone the repository to your local machine using `git clone` with the URL of this project.
3. Run the following commands:
```
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```
4. Use either a [released model](#models) or [fine-tune](#fine-tuning) a model to fit your needs.
5. Test your model by chatting with it (see [Demo](#demo)).
6. Deploy your own demo.

## LongAlpaca Data

LongAlpaca-12k contains 9k long QA entries that we collected and 3k short QA entries sampled from the original [Alpaca data](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json). The short QA entries are included so that the model does not degrade at short instruction following. The collected data covers various types and amounts; the split is summarized in the following table.

| Data | Short QA | Long QA | Total | Download |
|:---------------|----------|----------|----------|----------|
| LongAlpaca-12k | 3k | 9k | 12k | [Link](https://huggingface.co/datasets/Yukang/LongAlpaca-12k) |

Following the original Alpaca format, our Long QA data uses the following fields for fine-tuning:
- `instruction`: `str`, describes the task the model should perform, for example answering a question after reading a book section or paper. We vary the contents and questions to make the instructions diverse.
- `output`: `str`, the answer to the instruction.

For simplicity, we did not use the `input` field of the Alpaca format.
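
A minimal sketch of reading data in this format, assuming the downloaded file is a JSON list of objects carrying the `instruction` and `output` string fields described above. The helper name `load_long_qa` is hypothetical, any extra keys (such as an Alpaca `input` field) are simply dropped, and the two inline records are made-up stand-ins for the real file.

```python
import json


def load_long_qa(text: str) -> list[dict[str, str]]:
    """Parse a JSON list of records and keep only the two documented fields."""
    records = json.loads(text)
    cleaned = []
    for i, rec in enumerate(records):
        for field in ("instruction", "output"):
            if not isinstance(rec.get(field), str):
                raise ValueError(f"record {i} is missing a string '{field}' field")
        cleaned.append({"instruction": rec["instruction"], "output": rec["output"]})
    return cleaned


# Illustrative records only; the real data is in LongAlpaca-12k.json.
sample = json.dumps([
    {"instruction": "Below is a paper. What is its main contribution? ...",
     "output": "The paper proposes ..."},
    {"instruction": "Give three tips for staying healthy.",
     "output": "1. Eat a balanced diet ...", "input": ""},
])
data = load_long_qa(sample)
```

For the real file, something like `load_long_qa(open("LongAlpaca-12k.json").read())` would apply the same checks, provided the file follows the assumed layout.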

## Models

### Models with supervised fine-tuning
| Model | Size | Context | Train | Link |
|:---------------|------|---------|---------|------|
| LongAlpaca-7B | 7B | 32768 | Full FT | [Model](https://huggingface.co/Yukang/LongAlpaca-7B) |
| LongAlpaca-13B | 13B | 32768 | Full FT | [Model](https://huggingface.co/Yukang/LongAlpaca-13B) |
| LongAlpaca-70B | 70B | 32768 | LoRA+ | [Model](https://huggingface.co/Yukang/LongAlpaca-70B) [(LoRA-weight)](https://huggingface.co/Yukang/LongAlpaca-70B-lora) |

### Models with context extension via full fine-tuning
| Model | Size | Context | Train | Link |
|:----------------------------|------|---------|-------|------|
| Llama-2-7b-longlora-8k-ft | 7B | 8192 | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-7b-longlora-8k-ft) |
| Llama-2-7b-longlora-16k-ft | 7B | 16384 | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-7b-longlora-16k-ft) |
| Llama-2-7b-longlora-32k-ft | 7B | 32768 | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-7b-longlora-32k-ft) |
| Llama-2-7b-longlora-100k-ft | 7B | 100000 | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-7b-longlora-100k-ft) |
| Llama-2-13b-longlora-8k-ft | 13B | 8192 | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-13b-longlora-8k-ft) |
| Llama-2-13b-longlora-16k-ft | 13B | 16384 | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-13b-longlora-16k-ft) |
| Llama-2-13b-longlora-32k-ft | 13B | 32768 | Full FT | [Model](https://huggingface.co/Yukang/Llama-2-13b-longlora-32k-ft) |

### Models with context extension via improved LoRA fine-tuning
| Model | Size | Context | Train | Link |
|:----------------------------|------|---------|-------|------|
| Llama-2-7b-longlora-8k | 7B | 8192 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-7b-longlora-8k) |
| Llama-2-7b-longlora-16k | 7B | 16384 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-7b-longlora-16k) |
| Llama-2-7b-longlora-32k | 7B | 32768 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-7b-longlora-32k) |
| Llama-2-13b-longlora-8k | 13B | 8192 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-13b-longlora-8k) |
| Llama-2-13b-longlora-16k | 13B | 16384 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-13b-longlora-16k) |
| Llama-2-13b-longlora-32k | 13B | 32768 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-13b-longlora-32k) |
| Llama-2-13b-longlora-64k | 13B | 65536 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-13b-longlora-64k) |
| Llama-2-70b-longlora-32k | 70B | 32768 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-70b-longlora-32k) |
| Llama-2-70b-chat-longlora-32k | 70B | 32768 | LoRA+ | [LoRA-weight](https://huggingface.co/Yukang/Llama-2-70b-chat-longlora-32k) |

## Training
### Pre-trained weights
We use LLaMA2 models as the pre-trained weights and fine-tune them to long context window sizes. Download the ones you need.

| Pre-trained weights |
|:-------------------------------------------------------------------------------------|
| [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) |
| [Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) |
| [Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) |
| [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) |
| [Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) |
| [Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) |

This project also supports GPTNeoX models as the base model architecture. Candidate pre-trained weights include [GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b), [Polyglot-ko-12.8B](https://huggingface.co/EleutherAI/polyglot-ko-12.8b) and other variants.

### Fine-tuning
```
torchrun --nproc_per_node=8 fine-tune.py \
--model_name_or_path path_to/Llama-2-7b-hf \
--bf16 True \
--output_dir path_to_saving_checkpoints \
--cache_dir path_to_cache \
--model_max_length 8192 \
--use_flash_attn True \
--low_rank_training False \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 8 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1000 \
--save_total_limit 2 \
--learning_rate 2e-5 \
--weight_decay 0.0 \
--warmup_steps 20 \
--lr_scheduler_type "constant_with_warmup" \
--logging_steps 1 \
--deepspeed "ds_configs/stage2.json" \
--tf32 True \
--max_steps 1000
```

- Remember to change `path_to/Llama-2-7b-hf`, `path_to_saving_checkpoints` and `path_to_cache` to your own directories.
- You can change `model_max_length` to other values.
- You can change `ds_configs/stage2.json` to `ds_configs/stage3.json` if you want.
- Set `use_flash_attn` to `False` if you use V100 machines or have not installed Flash-Attention.
- Setting `low_rank_training` to `False`, as above, gives full fine-tuning. It uses more GPU memory and is slower, but performance is slightly better.
- When training is finished, run the following to get the full model weights:
```
cd path_to_saving_checkpoints && python zero_to_fp32.py . pytorch_model.bin
```
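
The flags in the command above determine how much data each optimizer step sees. The sketch below works out that arithmetic; the helper names are hypothetical (not part of `fine-tune.py`), and the token count assumes every sequence is `model_max_length` tokens long, so it is an upper bound if sequences are shorter.

```python
def global_batch(nproc_per_node: int, per_device_batch: int, grad_accum: int,
                 num_nodes: int = 1) -> int:
    """Number of sequences contributing to one optimizer step."""
    return num_nodes * nproc_per_node * per_device_batch * grad_accum


def tokens_per_step(seqs_per_step: int, model_max_length: int) -> int:
    """Upper bound on tokens per step, assuming full-length sequences."""
    return seqs_per_step * model_max_length


# Values taken from the fine-tuning command above.
ft_seqs = global_batch(nproc_per_node=8, per_device_batch=1, grad_accum=8)
ft_tokens = tokens_per_step(ft_seqs, model_max_length=8192)
```

With these settings each step sees 64 sequences, i.e. at most 524,288 tokens, so `--max_steps 1000` covers at most about 0.52B tokens. The supervised fine-tuning command in the next section uses `--gradient_accumulation_steps 1`, giving 8 sequences of up to 32768 tokens per step.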

### Supervised Fine-tuning
```
torchrun --nproc_per_node=8 supervised-fine-tune.py \
--model_name_or_path path_to_Llama2_chat_models \
--bf16 True \
--output_dir path_to_saving_checkpoints \
--model_max_length 32768 \
--use_flash_attn True \
--data_path LongAlpaca-12k.json \
--low_rank_training True \
--num_train_epochs 3 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1000 \
--save_total_limit 2 \
--learning_rate 2e-5 \
--weight_decay 0.0 \
--warmup_steps 20 \
--lr_scheduler_type "constant_with_warmup" \
--logging_steps 1 \
--deepspeed "ds_configs/stage2.json" \
--tf32 True
```
- There is no need to run supervised fine-tuning on top of the context-extended fine-tuned models. You can directly use Llama2-chat models as the base model, as the amount of long instruction-following data is enough for SFT.
- Our long instruction-following data can be found in [LongAlpaca-12k.json](https://huggingface.co/datasets/Yukang/LongAlpaca-12k).


### Get trainable weights in low-rank training
In low-rank training, we set the embedding and normalization layers as trainable. Use the following command to extract the trainable weights `trainable_params.bin` from `pytorch_model.bin`:
```
python3 get_trainable_weights.py --checkpoint_path path_to_saving_checkpoints --trainable_params "embed,norm"
```
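
Conceptually, the extraction keeps only the parameters whose names match the comma-separated patterns passed to `--trainable_params`. The sketch below shows that selection on a plain dictionary. It assumes simple substring matching on parameter names and leaves out loading and saving the checkpoint with PyTorch, so it illustrates the idea rather than reproducing `get_trainable_weights.py`. The parameter names are typical Hugging Face Llama names, used only as examples, and the string values stand in for tensors.

```python
def select_trainable(state_dict: dict, trainable_params: str = "embed,norm") -> dict:
    """Keep entries whose name contains any of the comma-separated patterns."""
    patterns = [p.strip() for p in trainable_params.split(",") if p.strip()]
    return {name: value for name, value in state_dict.items()
            if any(p in name for p in patterns)}


# Placeholder values stand in for tensors.
state = {
    "model.embed_tokens.weight": "E",
    "model.layers.0.self_attn.q_proj.weight": "Q",
    "model.layers.0.input_layernorm.weight": "N0",
    "model.norm.weight": "N",
    "lm_head.weight": "H",
}
trainable = select_trainable(state)
```

Here `trainable` keeps the embedding, the per-layer `input_layernorm` and the final `norm` weights, and drops the attention projection and `lm_head`.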

### Merge LoRA Weight
Merge the LoRA weights in `pytorch_model.bin` with the trainable parameters in `trainable_params.bin`, and save the resulting model to your desired path in the Hugging Face format:
```
python3 merge_lora_weights_and_save_hf_model.py \
--base_model path_to/Llama-2-7b-hf \
--peft_model path_to_saving_checkpoints \
--context_size 8192 \
--save_path path_to_saving_merged_model
```
For example:
```
python3 merge_lora_weights_and_save_hf_model.py \
--base_model /dataset/pretrained-models/Llama-2-7b-hf \
--peft_model /dataset/yukangchen/hf_models/lora-models/Llama-2-7b-longlora-8k \
--context_size 8192 \
--save_path /dataset/yukangchen/models/Llama-2-7b-longlora-8k-merged
```
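
For intuition about what merging does to each adapted layer: in the standard LoRA formulation, an adapter stores two low-rank matrices `A` (r × d_in) and `B` (d_out × r), and merging folds them into the frozen weight as `W' = W + (alpha / r) · B A`. The pure-Python sketch below applies that formula to tiny list-of-lists matrices. It is illustrative only; the merge script operates on full checkpoints, and the names `merge_lora`, `alpha` and `r` here are assumptions for the example.

```python
def merge_lora(W, A, B, alpha: float, r: int):
    """Return W + (alpha / r) * (B @ A) as a new matrix, leaving W unchanged.

    W: d_out x d_in, B: d_out x r, A: r x d_in (lists of lists of floats).
    """
    scale = alpha / r
    merged = [row[:] for row in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            # (B @ A)[i][j] is the dot product of row i of B and column j of A.
            delta = sum(B[i][k] * A[k][j] for k in range(r))
            merged[i][j] += scale * delta
    return merged


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # rank r = 1, shape 1 x 2
B = [[1.0], [0.0]]    # shape 2 x 1
merged = merge_lora(W, A, B, alpha=2.0, r=1)
```

Here `B A` is `[[1, 2], [0, 0]]`, scaled by `alpha / r = 2`, so `merged` is `[[3.0, 4.0], [0.0, 1.0]]`. After merging, the model no longer needs the adapter at inference time.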

## Evaluation
### Perplexity Validation
To evaluate a model trained in the low-rank setting, set both `base_model` and `peft_model`. `base_model` is the path to the pre-trained weights. `peft_model` is the path to the saved checkpoint, which should contain `trainable_params.bin`, `adapter_model.bin` and `adapter_config.json`. For example:
```
python3 eval.py --seq_len 8192 --context_size 8192 --batch_size 1 --base_model path_to/Llama-2-7b-hf --peft_model path_to_saving_checkpoints --data_path pg19/test.bin
```

To evaluate a fully fine-tuned model, you only need to set `base_model` to the path of the saved checkpoint, which should contain `pytorch_model.bin` and `config.json`. `peft_model` should be omitted.
```
python3 eval.py --seq_len 8192 --context_size 8192 --batch_size 1 --base_model path_to_saving_checkpoints --data_path pg19/test.bin
```

- `--seq_len` sets the sequence length for evaluation, and `--context_size` sets the context length the model was fine-tuned with. `--seq_len` should not be larger than `--context_size`.

- We have already tokenized the validation and test splits of the PG19 and proof-pile datasets with the LLaMA tokenizer, producing `pg19/validation.bin`, `pg19/test.bin`, and `proof-pile/test_sampled_data.bin`. `proof-pile/test_sampled_data.bin` contains 128 documents randomly sampled from the full proof-pile test split, each with at least 32768 tokens. We also release the sampled ids in [proof-pile/test_sampled_ids.bin](https://drive.google.com/file/d/1cnzWODLRQYAd7HeugzLCIhaqzaLZv7J5/view?usp=share_link). You can download them from the links below.

| Dataset | Split | Link |
|:-----------|------------|------|
| PG19 | validation | [pg19/validation.bin](https://drive.google.com/file/d/1rbJvb0qRIf2mQoN2ON7S93TbTzMnlrN6/view?usp=share_link) |
| PG19 | test | [pg19/test.bin](https://drive.google.com/file/d/1QANDMdctpacPAYgS04adDXqByGEq-Ret/view?usp=share_link) |
| Proof-pile | test | [proof-pile/test_sampled_data.bin](https://drive.google.com/file/d/1bUI5lPDvrqzY_XXJJ2sSuvZx0Y9AZClE/view?usp=share_link) |
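
Perplexity is the exponential of the mean per-token negative log-likelihood. The sketch below shows that relationship, and splits a token stream into non-overlapping `seq_len` windows while enforcing the rule that `--seq_len` must not exceed `--context_size`. It is a simplified illustration with hypothetical helper names: `eval.py` may sample or stride windows differently, and real NLL values come from the model rather than the made-up numbers used here.

```python
import math


def eval_windows(tokens: list[int], seq_len: int, context_size: int) -> list[list[int]]:
    """Non-overlapping windows of seq_len tokens; a trailing partial window is dropped."""
    if seq_len > context_size:
        raise ValueError("seq_len should not be larger than context_size")
    return [tokens[i:i + seq_len] for i in range(0, len(tokens) - seq_len + 1, seq_len)]


def perplexity(token_nlls: list[float]) -> float:
    """exp(mean negative log-likelihood), with NLLs in nats."""
    return math.exp(sum(token_nlls) / len(token_nlls))


windows = eval_windows(list(range(10)), seq_len=4, context_size=8)
ppl = perplexity([math.log(4.0)] * 3)
```

A model that assigns probability 1/4 to every correct token has an NLL of ln 4 per token and therefore a perplexity of 4.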

### Passkey Retrieval
We provide a script to test passkey retrieval accuracy. For example:
```
python3 passkey_retrivial.py \
--context_size 32768 \
--base_model path_to/Llama-2-7b-longlora-32k \
--max_tokens 32768 \
--interval 1000
```
- `context_size` is the context length used during fine-tuning.
- `max_tokens` is the maximum document length in the passkey retrieval evaluation.
- `interval` is the step by which the document length increases. It is approximate, because the document grows sentence by sentence.
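
The test hides a random number in a long stretch of repeated filler text and asks the model to recall it. The sketch below builds such prompts with a growing amount of filler. The wording of the task, filler, passkey and question sentences is an assumption modelled on common passkey tests, not necessarily what `passkey_retrivial.py` uses, and length is counted in filler blocks rather than tokens.

```python
import random

# Hypothetical prompt pieces; the real script's wording may differ.
TASK = ("There is an important piece of information hidden inside a lot of irrelevant text. "
        "Find it and memorize it.")
FILLER = "The grass is green. The sky is blue. The sun is yellow. Here we go. There and back again."
QUESTION = "What is the pass key? The pass key is"


def build_passkey_prompt(num_filler: int, passkey: int, rng: random.Random) -> str:
    """Insert the passkey sentence at a random position among num_filler filler blocks."""
    position = rng.randint(0, num_filler)
    needle = f"The pass key is {passkey}. Remember it. {passkey} is the pass key."
    parts = [FILLER] * num_filler
    parts.insert(position, needle)
    return " ".join([TASK, *parts, QUESTION])


def prompt_sweep(max_filler: int, interval: int, seed: int = 0) -> list[tuple[int, str]]:
    """(passkey, prompt) pairs whose filler grows by `interval` blocks each time."""
    rng = random.Random(seed)
    out = []
    for n in range(interval, max_filler + 1, interval):
        passkey = rng.randint(10000, 99999)
        out.append((passkey, build_passkey_prompt(n, passkey, rng)))
    return out
```

Accuracy is then the fraction of prompts for which the model's continuation contains the correct passkey.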

## Demo
### Local Inference
To chat with [Llama-2-13b-chat-longlora-32k-sft](https://huggingface.co/Yukang/Llama-2-13b-chat-longlora-32k-sft) or [Llama-2-70b-chat-longlora-32k-sft](https://huggingface.co/Yukang/Llama-2-70b-chat-longlora-32k-sft), first run `merge_lora_weights_and_save_hf_model.py`, then:
```
python3 inference.py \
--base_model path_to_model \
--question "$question" \
--context_size $context_length \
--max_gen_len $max_gen_len \
--flash_attn True \
--material "$material_content" \
--material_type "$material_type" \
--material_title "$material_title"
```
To ask a question related to a book:
```
python3 inference.py \
--base_model /data/models/Llama-2-13b-chat-longlora-32k-sft \
--question "Why doesn't Professor Snape seem to like Harry?" \
--context_size 32768 \
--max_gen_len 512 \
--flash_attn True \
--material "materials/Harry Potter and the Philosophers Stone_section2.txt" \
--material_type "book" \
--material_title "Harry Potter and the Philosophers Stone"
```
Note that `material_type` and `material_title` are optional.

To ask a question related to a paper:
```
python3 inference.py \
--base_model /data/models/Llama-2-13b-chat-longlora-32k-sft \
--question "What are the main contributions and novelties of this work?" \
--context_size 32768 \
--max_gen_len 512 \
--flash_attn True \
--material "materials/paper1.txt" \
--material_type "paper"
```

### Online Demo
To deploy your own demo, run:
```
python3 demo.py \
--base_model path_to_model \
--context_size $context_size \
--max_gen_len $max_gen_len \
--flash_attn True
```
For example:
```
python3 demo.py \
--base_model /data/models/Llama-2-13b-chat-longlora-32k-sft \
--context_size 32768 \
--max_gen_len 512 \
--flash_attn True
```
- Note that `flash_attn=True` makes generation slower but saves a lot of GPU memory.

## Data Generation via Pdf2text
During dataset collection, we converted papers and books from PDF to text. The conversion quality has a large influence on the final model quality, and we found this step to be non-trivial. We release our PDF-to-text conversion tool in the `pdf2txt` folder. It is built upon `pdf2image`, `easyocr`, `ditod` and `detectron2`. Please refer to the [README.md](pdf2txt/README.md) in `pdf2txt` for more details.

## Citation
If you find this project useful in your research, please consider citing:

```
@article{longlora,
  title={LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models},
  author={Yukang Chen and Shengju Qian and Haotian Tang and Xin Lai and Zhijian Liu and Song Han and Jiaya Jia},
  journal={arXiv:2309.12307},
  year={2023}
}
```

```
@misc{long-alpaca,
  author = {Yukang Chen and Shaozuo Yu and Shengju Qian and Haotian Tang and Xin Lai and Zhijian Liu and Song Han and Jiaya Jia},
  title = {Long Alpaca: Long-context Instruction-following models},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/dvlab-research/LongLoRA}},
}
```
## Acknowledgement
- This work uses [LLaMA2](https://ai.meta.com/llama) as the pre-trained models.
- This work can also use [GPTNeoX-HF](https://huggingface.co/docs/transformers/model_doc/gpt_neox), which is based on [EleutherAI/GPTNeoX](https://github.com/EleutherAI/gpt-neox), as the pre-trained model architecture.
- This work is based on [DeepSpeed](https://github.com/microsoft/DeepSpeed), [peft](https://github.com/huggingface/peft), and [Flash-Attention2](https://github.com/Dao-AILab/flash-attention) for acceleration.
- Some evaluation code is modified from [Landmark Attention](https://github.com/epfml/landmark-attention).
- We use [LongChat](https://github.com/DachengLi1/LongChat) for the retrieval evaluation.

## License
- LongLoRA is licensed under the Apache License 2.0. This means that it requires the preservation of copyright and license notices.
- Data and weights are under the CC-BY-NC 4.0 License. They are licensed for research use only and may be used only for non-commercial purposes. Models trained using the dataset should not be used outside of research purposes.