Commit ebaa121 by michaelfeil · 1 Parent(s): 93d69c8

Upload togethercomputer/RedPajama-INCITE-Chat-3B-v1 ctranslate fp16 weights

Files changed (3)
  1. README.md +251 -0
  2. generation_config.json +6 -0
  3. special_tokens_map.json +5 -0
README.md ADDED
@@ -0,0 +1,251 @@
+ ---
+ license: apache-2.0
+ language:
+   - en
+ datasets:
+   - togethercomputer/RedPajama-Data-1T
+   - OpenAssistant/oasst1
+   - databricks/databricks-dolly-15k
+ widget:
+   - text: "<human>: Write an email to my friends inviting them to come to my home on Friday for a dinner party, bring their own food to share.\n<bot>:"
+     example_title: "Email Writing"
+   - text: "<human>: Create a list of things to do in San Francisco\n<bot>:"
+     example_title: "Brainstorming"
+ inference:
+   parameters:
+     temperature: 0.7
+     top_p: 0.7
+     top_k: 50
+     max_new_tokens: 128
+ ---
+ # Fast-Inference with CTranslate2
+ Speed up inference by 2x-8x using int8 inference in C++.
+
+ Quantized version of [togethercomputer/RedPajama-INCITE-Chat-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1)
+ ```bash
+ pip install "hf-hub-ctranslate2>=2.0.6" "ctranslate2>=3.13.0"
+ ```
+ Converted on 2023-05-19 using:
+ ```
+ ct2-transformers-converter --model togethercomputer/RedPajama-INCITE-Chat-3B-v1 --output_dir /home/michael/tmp-ct2fast-RedPajama-INCITE-Chat-3B-v1 --force --copy_files tokenizer.json README.md tokenizer_config.json generation_config.json special_tokens_map.json .gitattributes --quantization float16
+ ```
+
+ Checkpoint compatible with [ctranslate2](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2](https://github.com/michaelfeil/hf-hub-ctranslate2):
+ - `compute_type=int8_float16` for `device="cuda"`
+ - `compute_type=int8` for `device="cpu"`
+
+ ```python
+ from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
+ from transformers import AutoTokenizer
+
+ model_name = "michaelfeil/ct2fast-RedPajama-INCITE-Chat-3B-v1"
+ # use either TranslatorCT2fromHfHub or GeneratorCT2fromHfHub here, depending on model.
+ model = GeneratorCT2fromHfHub(
+     # load in int8 on CUDA
+     model_name_or_path=model_name,
+     device="cuda",
+     compute_type="int8_float16",
+     tokenizer=AutoTokenizer.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-3B-v1")
+ )
+ outputs = model.generate(
+     text=["How do you call a fast Flan-ingo?", "User: How are you doing?"],
+ )
+ print(outputs)
+ ```
+
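The device-to-`compute_type` pairing listed above can be captured in a small helper so that the same loading code works on both targets. This is a minimal sketch: `pick_compute_type` is a hypothetical name, not part of `hf-hub-ctranslate2` or `ctranslate2`, and it encodes only the two combinations this README names.

```python
def pick_compute_type(device: str) -> str:
    """Return the compute_type this README pairs with the given device."""
    if device == "cuda":
        return "int8_float16"
    if device == "cpu":
        return "int8"
    # Only the two devices named in the README are handled here.
    raise ValueError(f"unsupported device: {device!r}")


for device in ("cuda", "cpu"):
    print(device, "->", pick_compute_type(device))
```

The returned string could then be passed as the `compute_type` argument alongside the matching `device` when constructing the generator.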
+ # Licence and other remarks:
+ This is just a quantized version. Licence conditions are intended to be identical to those of the original Hugging Face repo.
+
+ # Original description
+
+ # RedPajama-INCITE-Chat-3B-v1
+
+ RedPajama-INCITE-Chat-3B-v1 was developed by Together and leaders from the open-source AI community including Ontocord.ai, ETH DS3Lab, AAI CERC, Université de Montréal, MILA - Québec AI Institute, Stanford Center for Research on Foundation Models (CRFM), Stanford Hazy Research research group and LAION.
+
+ It is fine-tuned on OASST1 and Dolly2 to enhance chatting ability.
+
+ - Base Model: [RedPajama-INCITE-Base-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1)
+ - Instruction-tuned Version: [RedPajama-INCITE-Instruct-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-3B-v1)
+ - Chat Version: [RedPajama-INCITE-Chat-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1)
+
+
+ ## Model Details
+ - **Developed by**: Together Computer.
+ - **Model type**: Language Model
+ - **Language(s)**: English
+ - **License**: Apache 2.0
+ - **Model Description**: A 2.8B parameter pretrained language model.
+
+ # Quick Start
+
+ Please note that the model requires `transformers` version >= 4.25.1.
+
+ To prompt the chat model, use the following format:
+ ```
+ <human>: [Instruction]
+ <bot>:
+ ```
+
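The prompt format above can be produced programmatically. The following is a minimal sketch with hypothetical helper names (`build_prompt`, `extract_reply`). Cutting the reply at the next `<human>:` tag is an assumption about how the model may continue generating, not something the model card states.

```python
HUMAN_TAG = "<human>:"
BOT_TAG = "<bot>:"


def build_prompt(instruction: str) -> str:
    """Wrap a single instruction in the <human>/<bot> chat format."""
    return f"{HUMAN_TAG} {instruction.strip()}\n{BOT_TAG}"


def extract_reply(generated: str) -> str:
    """Keep only the text before a further <human>: turn, if one appears.

    Assumption: the model may keep writing additional turns after its answer.
    """
    return generated.split(HUMAN_TAG, 1)[0].strip()


prompt = build_prompt("Who is Alan Turing?")
print(repr(prompt))
print(extract_reply(" He was a mathematician.\n<human>: Thanks!"))
```

`build_prompt("Who is Alan Turing?")` yields the same string used as `prompt` in the inference examples of this README.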
+ ## GPU Inference
+
+ This requires a GPU with 8GB memory.
+
+ ```python
+ import torch
+ import transformers
+ from packaging import version
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ MIN_TRANSFORMERS_VERSION = '4.25.1'
+
+ # check transformers version (parsed, because plain string comparison misorders e.g. '4.9.0' and '4.25.1')
+ assert version.parse(transformers.__version__) >= version.parse(MIN_TRANSFORMERS_VERSION), f'Please upgrade transformers to version {MIN_TRANSFORMERS_VERSION} or higher.'
+
+ # init
+ tokenizer = AutoTokenizer.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-3B-v1")
+ model = AutoModelForCausalLM.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-3B-v1", torch_dtype=torch.float16)
+ model = model.to('cuda:0')
+ # infer
+ prompt = "<human>: Who is Alan Turing?\n<bot>:"
+ inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
+ input_length = inputs.input_ids.shape[1]
+ outputs = model.generate(
+     **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.7, top_k=50, return_dict_in_generate=True
+ )
+ # keep only the newly generated tokens, dropping the prompt
+ token = outputs.sequences[0, input_length:]
+ output_str = tokenizer.decode(token)
+ print(output_str)
+ """
+ Alan Turing was a British mathematician, logician, cryptologist, and computer scientist. He is widely regarded as the father of computer science and artificial intelligence.
+ """
+ ```
+
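The `transformers` >= 4.25.1 requirement is safest to check numerically, because version strings compare character by character. Below is a dependency-free sketch with a hypothetical helper, `version_tuple`. It reads only the leading numeric release components and ignores suffixes such as `.dev0`, which is a simplification.

```python
import re


def version_tuple(v: str) -> tuple:
    """Leading numeric components of a version string, e.g. '4.25.1' -> (4, 25, 1)."""
    parts = []
    for piece in v.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


MIN_TRANSFORMERS_VERSION = "4.25.1"

# String comparison wrongly accepts 4.9.0 because '9' sorts after '2'.
print("4.9.0" >= MIN_TRANSFORMERS_VERSION)
# Tuple comparison orders the releases numerically.
print(version_tuple("4.9.0") >= version_tuple(MIN_TRANSFORMERS_VERSION))
```

Where the `packaging` library is available, `packaging.version.parse` handles pre-release and development suffixes more completely than this sketch.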
+ ## GPU Inference in Int8
+
+ This requires a GPU with 6GB memory.
+
+ To run inference with int8, please ensure you have installed `accelerate` and `bitsandbytes`. You can install them with the following commands:
+
+ ```bash
+ pip install accelerate
+ pip install bitsandbytes
+ ```
+
+ Then you can run inference with int8 as follows:
+
+ ```python
+ import torch
+ import transformers
+ from packaging import version
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ MIN_TRANSFORMERS_VERSION = '4.25.1'
+
+ # check transformers version (parsed, because plain string comparison misorders e.g. '4.9.0' and '4.25.1')
+ assert version.parse(transformers.__version__) >= version.parse(MIN_TRANSFORMERS_VERSION), f'Please upgrade transformers to version {MIN_TRANSFORMERS_VERSION} or higher.'
+
+ # init
+ tokenizer = AutoTokenizer.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-3B-v1")
+ model = AutoModelForCausalLM.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-3B-v1", device_map='auto', torch_dtype=torch.float16, load_in_8bit=True)
+
+ # infer
+ prompt = "<human>: Who is Alan Turing?\n<bot>:"
+ inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
+ input_length = inputs.input_ids.shape[1]
+ outputs = model.generate(
+     **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.7, top_k=50, return_dict_in_generate=True
+ )
+ # keep only the newly generated tokens, dropping the prompt
+ token = outputs.sequences[0, input_length:]
+ output_str = tokenizer.decode(token)
+ print(output_str)
+ """
+ Alan Turing was a British mathematician and computer scientist who made important contributions to computer science and mathematical logic. He is widely regarded as the father of computer science and artificial intelligence for his work on the Turing machine and Turing test.
+ """
+ ```
+
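As a rough cross-check of the memory figures quoted for the fp16 and int8 paths, weight storage alone can be estimated from the parameter count. The sketch below assumes the 2.8B parameter figure from Model Details, 2 bytes per fp16 weight and 1 byte per int8 weight. The helper name is hypothetical, and activations, the KV cache, and framework overhead are not included, so real usage will be higher than these numbers.

```python
NUM_PARAMETERS = 2.8e9  # "A 2.8B parameter pretrained language model"


def weight_memory_gb(num_parameters: float, bytes_per_weight: int) -> float:
    """Weight-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_weight / 1e9


fp16_gb = weight_memory_gb(NUM_PARAMETERS, 2)
int8_gb = weight_memory_gb(NUM_PARAMETERS, 1)
print(f"fp16 weights: ~{fp16_gb:.1f} GB")
print(f"int8 weights: ~{int8_gb:.1f} GB")
```

Both estimates sit below the stated 8GB and 6GB requirements, which leaves headroom for everything the estimate omits.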
+ ## CPU Inference
+
+ ```python
+ import torch
+ import transformers
+ from packaging import version
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ MIN_TRANSFORMERS_VERSION = '4.25.1'
+
+ # check transformers version (parsed, because plain string comparison misorders e.g. '4.9.0' and '4.25.1')
+ assert version.parse(transformers.__version__) >= version.parse(MIN_TRANSFORMERS_VERSION), f'Please upgrade transformers to version {MIN_TRANSFORMERS_VERSION} or higher.'
+
+ # init
+ tokenizer = AutoTokenizer.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-3B-v1")
+ model = AutoModelForCausalLM.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-3B-v1", torch_dtype=torch.bfloat16)
+ # infer
+ prompt = "<human>: Who is Alan Turing?\n<bot>:"
+ inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
+ input_length = inputs.input_ids.shape[1]
+ outputs = model.generate(
+     **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.7, top_k=50, return_dict_in_generate=True
+ )
+ # keep only the newly generated tokens, dropping the prompt
+ token = outputs.sequences[0, input_length:]
+ output_str = tokenizer.decode(token)
+ print(output_str)
+ """
+ Alan Turing was a British mathematician and computer scientist who made important contributions to the fields of mathematics, cryptography, and computer science. He is widely regarded as the father of computer science and artificial intelligence.
+ """
+ ```
+
+ Please note that since `LayerNormKernelImpl` is not implemented in fp16 for CPU, we use `bfloat16` for CPU inference.
+
+
+ # Uses
+
+ Excluded uses are described below.
+
+ ### Misuse, Malicious Use, and Out-of-Scope Use
+
+ It is the responsibility of the end user to ensure that the model is used in a responsible and ethical manner.
+
+ #### Out-of-Scope Use
+
+ `RedPajama-INCITE-Chat-3B-v1` is a language model and may not perform well for other use cases outside of its intended scope.
+ For example, it may not be suitable for use in safety-critical applications or for making decisions that have a significant impact on individuals or society.
+ It is important to consider the limitations of the model and to only use it for its intended purpose.
+
+ #### Misuse and Malicious Use
+
+ `RedPajama-INCITE-Chat-3B-v1` is designed for language modeling.
+ Misuse of the model, such as using it to engage in illegal or unethical activities, is strictly prohibited and goes against the principles of the project.
+
+ Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:
+
+ - Generating fake news, misinformation, or propaganda
+ - Promoting hate speech, discrimination, or violence against individuals or groups
+ - Impersonating individuals or organizations without their consent
+ - Engaging in cyberbullying or harassment
+ - Defamatory content
+ - Spamming or scamming
+ - Sharing confidential or sensitive information without proper authorization
+ - Violating the terms of use of the model or the data used to train it
+ - Creating automated bots for malicious purposes such as spreading malware, phishing scams, or spamming
+
+ ## Limitations
+
+ `RedPajama-INCITE-Chat-3B-v1`, like other language models, has limitations that should be taken into consideration.
+ For example, the model may not always provide accurate or relevant answers, particularly for questions that are complex, ambiguous, or outside of its training data.
+ We therefore welcome contributions from individuals and organizations, and encourage collaboration towards creating a more robust and inclusive chatbot.
+
+ ## Training
+
+ **Training Data**
+
+ Please refer to [togethercomputer/RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T)
+
+ **Training Procedure**
+
+ - **Hardware:** 8 A100
+ - **Optimizer:** Adam
+ - **Gradient Accumulations**: 1
+ - **Num of Tokens:** 131M tokens
+ - **Learning rate:** 1e-5
+
+ ## Community
+
+ Join us on [Together Discord](https://discord.gg/6ZVDU8tTD4)
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 0,
+   "eos_token_id": 0,
+   "transformers_version": "4.28.1"
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "bos_token": "<|endoftext|>",
+   "eos_token": "<|endoftext|>",
+   "unk_token": "<|endoftext|>"
+ }
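The `eos_token_id` recorded in `generation_config.json` can be used to cut a generated token sequence at the end-of-sequence marker. This is a minimal sketch: `trim_at_eos` is a hypothetical helper and the sample ids are made up for illustration. Only the embedded JSON is taken from this commit.

```python
import json

# Contents of generation_config.json as added in this commit.
GENERATION_CONFIG = json.loads("""
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.28.1"
}
""")


def trim_at_eos(token_ids, eos_token_id):
    """Return the ids before the first end-of-sequence id (all ids if none occurs)."""
    trimmed = []
    for token_id in token_ids:
        if token_id == eos_token_id:
            break
        trimmed.append(token_id)
    return trimmed


sample_ids = [510, 3158, 0, 42]  # made-up ids, not real model output
print(trim_at_eos(sample_ids, GENERATION_CONFIG["eos_token_id"]))
```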