michaelfeil committed
Commit 7b61394 · 1 Parent(s): 3a2eae3
Upload togethercomputer/RedPajama-INCITE-Chat-7B-v0.1 ctranslate fp16 weights
- README.md +263 -0
- config.json +5 -0
- generation_config.json +6 -0
- model.bin +3 -0
- special_tokens_map.json +5 -0
- tokenizer.json +0 -0
- tokenizer_config.json +9 -0
- vocabulary.txt +0 -0
README.md
ADDED
@@ -0,0 +1,263 @@
---
tags:
- ctranslate2
- int8
- float16

license: apache-2.0
language:
- en
datasets:
- togethercomputer/RedPajama-Data-1T
- OpenAssistant/oasst1
- databricks/databricks-dolly-15k
widget:
- text: "<human>: Write an email to my friends inviting them to come to my home on Friday for a dinner party, bring their own food to share.\n<bot>:"
  example_title: "Email Writing"
- text: "<human>: Create a list of things to do in San Francisco\n<bot>:"
  example_title: "Brainstorming"
inference:
  parameters:
    temperature: 0.7
    top_p: 0.7
    top_k: 50
    max_new_tokens: 128
---
# Fast-Inference with CTranslate2
Speed up inference by 2x-8x using int8 inference in C++.

Quantized version of [togethercomputer/RedPajama-INCITE-Chat-7B-v0.1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-7B-v0.1)
```bash
pip install "hf-hub-ctranslate2>=2.0.6" "ctranslate2>=3.13.0"
```
Converted on 2023-05-19 using
```
ct2-transformers-converter --model togethercomputer/RedPajama-INCITE-Chat-7B-v0.1 --output_dir /home/michael/tmp-ct2fast-RedPajama-INCITE-Chat-7B-v0.1 --force --copy_files tokenizer.json README.md tokenizer_config.json generation_config.json special_tokens_map.json .gitattributes --quantization float16
```

Checkpoint compatible with [ctranslate2](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2](https://github.com/michaelfeil/hf-hub-ctranslate2)
- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"`

```python
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "michaelfeil/ct2fast-RedPajama-INCITE-Chat-7B-v0.1"
# use either TranslatorCT2fromHfHub or GeneratorCT2fromHfHub here, depending on model.
model = GeneratorCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
    tokenizer=AutoTokenizer.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-7B-v0.1")
)
# prompts follow the <human>/<bot> chat format described under Quick Start
outputs = model.generate(
    text=[
        "<human>: How do you call a fast Flan-ingo?\n<bot>:",
        "<human>: How are you doing?\n<bot>:",
    ],
)
print(outputs)
```
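
The device and compute-type pairing listed above can be wrapped in a small helper so the same script runs on either backend. This is a minimal sketch: `pick_compute_type` is a hypothetical name, not part of `hf-hub-ctranslate2`, and it only encodes the two combinations this README recommends.

```python
def pick_compute_type(device: str) -> str:
    """Return the compute type this README recommends for a device."""
    # Only the two pairings named in the README are covered here.
    recommended = {"cuda": "int8_float16", "cpu": "int8"}
    if device not in recommended:
        raise ValueError(f"no recommendation for device {device!r}")
    return recommended[device]


if __name__ == "__main__":
    for device in ("cuda", "cpu"):
        print(device, "->", pick_compute_type(device))
```

The returned string can then be passed as `compute_type` next to the matching `device` when constructing the generator.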

# Licence and other remarks:
This is just a quantized version. Licence conditions are intended to be identical to those of the original Hugging Face repo.

# Original description


# RedPajama-INCITE-Chat-7B-v0.1

RedPajama-INCITE-Chat-7B-v0.1 was developed by Together and leaders from the open-source AI community including Ontocord.ai, ETH DS3Lab, AAI CERC, Université de Montréal, MILA - Québec AI Institute, Stanford Center for Research on Foundation Models (CRFM), Stanford Hazy Research research group and LAION.

It is fine-tuned on OASST1 and Dolly2 to enhance chatting ability.

- Base Model: [RedPajama-INCITE-Base-7B-v0.1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1)
- Instruction-tuned Version: [RedPajama-INCITE-Instruct-7B-v0.1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1)
- Chat Version: [RedPajama-INCITE-Chat-7B-v0.1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-7B-v0.1)


## Model Details
- **Developed by**: Together Computer.
- **Model type**: Language Model
- **Language(s)**: English
- **License**: Apache 2.0
- **Model Description**: A 6.9B parameter pretrained language model.

# Quick Start

Please note that the model requires `transformers` version >= 4.25.1.
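
Version strings are best compared numerically rather than as plain text: as strings, `'4.9.0' >= '4.25.1'` evaluates to `True` because `'9'` sorts after `'2'`. The sketch below shows a standard-library way to do a numeric check. `parse_release` is a hypothetical helper that drops pre-release suffixes such as `.dev0`; `packaging.version.parse` is the more complete option where it is installed.

```python
import re


def parse_release(v: str) -> tuple:
    """Turn '4.25.1' or '4.30.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(m.group()))
    return tuple(parts)


MIN_TRANSFORMERS_VERSION = "4.25.1"

# Plain string comparison treats 4.9.0 as new enough, which is wrong ...
print("4.9.0" >= MIN_TRANSFORMERS_VERSION)
# ... while comparing parsed integer tuples does not.
print(parse_release("4.9.0") >= parse_release(MIN_TRANSFORMERS_VERSION))
```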

To prompt the chat model, use the following format:
```
<human>: [Instruction]
<bot>:
```
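
Building the prompt in code keeps the `<human>:` and `<bot>:` markers consistent. The helper below is a sketch: `build_prompt` is a hypothetical name, and the multi-turn layout (earlier turns joined by newlines) is an assumption that extends the single-turn format above, not something the model card specifies.

```python
def build_prompt(instruction, history=()):
    """Format one instruction, optionally preceded by earlier (human, bot) turns."""
    lines = []
    for human, bot in history:
        lines.append(f"<human>: {human}")
        lines.append(f"<bot>: {bot}")
    lines.append(f"<human>: {instruction}")
    # The trailing marker is left open so the model writes the reply.
    lines.append("<bot>:")
    return "\n".join(lines)


print(build_prompt("Who is Alan Turing?"))
```

For a single instruction this produces the same string as the hand-written `"<human>: Who is Alan Turing?\n<bot>:"` prompt used in the inference examples.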

## GPU Inference

This requires a GPU with 16GB memory.
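
The 16GB figure is consistent with a back-of-the-envelope estimate: 6.9B parameters at 2 bytes each in float16 come to roughly 13.8 GB for the weights alone, before activations and the key/value cache. The sketch below only does that arithmetic. The parameter count is taken from the model description, `weight_gb` is a hypothetical helper, and real usage depends on sequence length and framework overhead.

```python
def weight_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate weight storage in GB (10**9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 6.9e9  # "6.9B parameter" from the model description

print(f"float16: {weight_gb(N_PARAMS, 2):.1f} GB")  # 13.8 GB
print(f"int8:    {weight_gb(N_PARAMS, 1):.1f} GB")  # 6.9 GB
```

The float16 `model.bin` uploaded in this commit is 13,714,629,236 bytes, which is close to the float16 estimate.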

```python
import torch
import transformers
from packaging import version
from transformers import AutoTokenizer, AutoModelForCausalLM

MIN_TRANSFORMERS_VERSION = '4.25.1'

# check transformers version (compare parsed versions, not raw strings)
assert version.parse(transformers.__version__) >= version.parse(MIN_TRANSFORMERS_VERSION), f'Please upgrade transformers to version {MIN_TRANSFORMERS_VERSION} or higher.'

# init
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-7B-v0.1", torch_dtype=torch.float16)
model = model.to('cuda:0')
# infer
prompt = "<human>: Who is Alan Turing?\n<bot>:"
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
input_length = inputs.input_ids.shape[1]
outputs = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.7, top_k=50, return_dict_in_generate=True
)
# keep only the newly generated tokens, not the prompt
token = outputs.sequences[0, input_length:]
output_str = tokenizer.decode(token)
print(output_str)
"""
Alan Mathison Turing (23 June 1912 7 June 1954) was an English computer scientist, mathematician, logician, cryptanalyst, philosopher, mathematician, and theoretical biologist.
"""
```
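
The `temperature`, `top_k` and `top_p` arguments passed to `generate` above reshape the next-token distribution before a token is sampled. The pure-Python sketch below illustrates the idea on a toy list of logits. It is not the `transformers` implementation: `filter_logits` is a hypothetical name, and the order used here (temperature, then top-k, then top-p) is an assumption.

```python
import math


def filter_logits(logits, temperature=0.7, top_k=50, top_p=0.7):
    """Return a renormalized probability per surviving token id."""
    # 1. Temperature: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # 2. Top-k: keep the k highest-scoring token ids.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    kept = order[:top_k]
    # Softmax over the kept ids (subtract the max for numerical stability).
    peak = scaled[kept[0]]
    weights = [math.exp(scaled[i] - peak) for i in kept]
    total = sum(weights)
    probs = [w / total for w in weights]
    # 3. Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    nucleus, mass = [], 0.0
    for i, p in zip(kept, probs):
        nucleus.append((i, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    return {i: p / mass for i, p in nucleus}


toy_logits = [2.0, 1.8, 0.5, -1.0]
print(filter_logits(toy_logits))
```

With these toy logits only the two highest-scoring ids survive the top-p cut; a sampler would then draw from the renormalized pair.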

## GPU Inference in Int8

This requires a GPU with 12GB memory.

To run inference with int8, please ensure you have installed `accelerate` and `bitsandbytes`. You can install them with the following command:

```bash
pip install accelerate
pip install bitsandbytes
```

Then you can run inference with int8 as follows:

```python
import torch
import transformers
from packaging import version
from transformers import AutoTokenizer, AutoModelForCausalLM

MIN_TRANSFORMERS_VERSION = '4.25.1'

# check transformers version (compare parsed versions, not raw strings)
assert version.parse(transformers.__version__) >= version.parse(MIN_TRANSFORMERS_VERSION), f'Please upgrade transformers to version {MIN_TRANSFORMERS_VERSION} or higher.'

# init
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-7B-v0.1", device_map='auto', torch_dtype=torch.float16, load_in_8bit=True)

# infer
prompt = "<human>: Who is Alan Turing?\n<bot>:"
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
input_length = inputs.input_ids.shape[1]
outputs = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.7, top_k=50, return_dict_in_generate=True
)
# keep only the newly generated tokens, not the prompt
token = outputs.sequences[0, input_length:]
output_str = tokenizer.decode(token)
print(output_str)
"""
Alan Mathison Turing (23 June 1912 – 7 June 1954) was an English computer scientist, mathematician, logician, cryptanalyst, philosopher, and theoretical biologist.
"""
```

## CPU Inference

```python
import torch
import transformers
from packaging import version
from transformers import AutoTokenizer, AutoModelForCausalLM

MIN_TRANSFORMERS_VERSION = '4.25.1'

# check transformers version (compare parsed versions, not raw strings)
assert version.parse(transformers.__version__) >= version.parse(MIN_TRANSFORMERS_VERSION), f'Please upgrade transformers to version {MIN_TRANSFORMERS_VERSION} or higher.'

# init
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-7B-v0.1", torch_dtype=torch.bfloat16)
# infer
prompt = "<human>: Who is Alan Turing?\n<bot>:"
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
input_length = inputs.input_ids.shape[1]
outputs = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.7, top_k=50, return_dict_in_generate=True
)
# keep only the newly generated tokens, not the prompt
token = outputs.sequences[0, input_length:]
output_str = tokenizer.decode(token)
print(output_str)
"""
Alan Mathison Turing, OBE, FRS, (23 June 1912 – 7 June 1954) was an English computer scientist, mathematician, logician, cryptanalyst, philosopher, and theoretical biologist.
"""
```

Please note that since `LayerNormKernelImpl` is not implemented in fp16 for CPU, we use `bfloat16` for CPU inference.


# Uses

## Direct Use

Excluded uses are described below.

### Misuse, Malicious Use, and Out-of-Scope Use

It is the responsibility of the end user to ensure that the model is used in a responsible and ethical manner.

#### Out-of-Scope Use

`RedPajama-INCITE-Chat-7B-v0.1` is a language model and may not perform well for other use cases outside of its intended scope.
For example, it may not be suitable for use in safety-critical applications or for making decisions that have a significant impact on individuals or society.
It is important to consider the limitations of the model and to only use it for its intended purpose.

#### Misuse and Malicious Use

`RedPajama-INCITE-Chat-7B-v0.1` is designed for language modeling.
Misuse of the model, such as using it to engage in illegal or unethical activities, is strictly prohibited and goes against the principles of the project.

Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:

- Generating fake news, misinformation, or propaganda
- Promoting hate speech, discrimination, or violence against individuals or groups
- Impersonating individuals or organizations without their consent
- Engaging in cyberbullying or harassment
- Defamatory content
- Spamming or scamming
- Sharing confidential or sensitive information without proper authorization
- Violating the terms of use of the model or the data used to train it
- Creating automated bots for malicious purposes such as spreading malware, phishing scams, or spamming

## Limitations

`RedPajama-INCITE-Chat-7B-v0.1`, like other language models, has limitations that should be taken into consideration.
For example, the model may not always provide accurate or relevant answers, particularly for questions that are complex, ambiguous, or outside of its training data.
We therefore welcome contributions from individuals and organizations, and encourage collaboration towards creating a more robust and inclusive chatbot.

## Training

**Training Data**

Please refer to [togethercomputer/RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T)

**Training Procedure**

- **Hardware:** 8 A100
- **Optimizer:** Adam
- **Gradient Accumulations**: 1
- **Num of Tokens:** 131M tokens
- **Learning rate:** 1e-5

## Community

Join us on [Together Discord](https://discord.gg/6ZVDU8tTD4)
config.json
ADDED
@@ -0,0 +1,5 @@
{
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "unk_token": "<|endoftext|>"
}
generation_config.json
ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.28.1"
}
model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cbab5c9ca2a8bb76dc0da12acf26b08c0ea1b31a7254e27b6bd2b40cee0d43b0
size 13714629236
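
The lines above form a Git LFS pointer: the real weights are stored elsewhere and identified by their SHA-256 digest and byte size. After downloading `model.bin`, the file can be checked against the pointer. The sketch below assumes the pointer text and a local file path are available; `parse_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib
import os


def parse_pointer(text):
    """Split the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check a local file's size and digest against a parsed pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first, avoids hashing a truncated download
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Read in chunks so a multi-gigabyte file is never held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:cbab5c9ca2a8bb76dc0da12acf26b08c0ea1b31a7254e27b6bd2b40cee0d43b0
size 13714629236
"""

pointer = parse_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("model.bin", pointer)` on the downloaded file would then report whether it is complete and unmodified.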
special_tokens_map.json
ADDED
@@ -0,0 +1,5 @@
{
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "unk_token": "<|endoftext|>"
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,9 @@
{
  "add_prefix_space": false,
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
  "model_max_length": 2048,
  "tokenizer_class": "GPTNeoXTokenizer",
  "unk_token": "<|endoftext|>"
}
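
With `model_max_length` set to 2048, the prompt and the generated continuation have to share one context window. The following is a rough budgeting sketch that assumes the `max_new_tokens: 128` setting from the README front matter. `prompt_budget` and `truncate_left` are hypothetical helpers, and whether a given runtime enforces the limit is not something this config states.

```python
MODEL_MAX_LENGTH = 2048  # from tokenizer_config.json
MAX_NEW_TOKENS = 128     # from the README inference parameters


def prompt_budget(max_length=MODEL_MAX_LENGTH, max_new_tokens=MAX_NEW_TOKENS):
    """Number of prompt tokens that still leaves room for generation."""
    if max_new_tokens >= max_length:
        raise ValueError("max_new_tokens must be smaller than the context length")
    return max_length - max_new_tokens


def truncate_left(token_ids, budget):
    """Keep the most recent tokens so the end of the conversation survives."""
    return token_ids[-budget:] if len(token_ids) > budget else token_ids


print(prompt_budget())  # 1920
```

Truncating from the left is a design choice for chat prompts, where the latest turn matters most; other applications may prefer to drop the tail instead.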
vocabulary.txt
ADDED
The diff for this file is too large to render.