Jamie@TitanML committed · Commit 79a467c · 1 Parent(s): 3e6dd14

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -25,7 +25,6 @@
25
  *.safetensors filter=lfs diff=lfs merge=lfs -text
26
  saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
  *.tar.* filter=lfs diff=lfs merge=lfs -text
28
- *.tar filter=lfs diff=lfs merge=lfs -text
29
  *.tflite filter=lfs diff=lfs merge=lfs -text
30
  *.tgz filter=lfs diff=lfs merge=lfs -text
31
  *.wasm filter=lfs diff=lfs merge=lfs -text
 
25
  *.safetensors filter=lfs diff=lfs merge=lfs -text
26
  saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
  *.tar.* filter=lfs diff=lfs merge=lfs -text
 
28
  *.tflite filter=lfs diff=lfs merge=lfs -text
29
  *.tgz filter=lfs diff=lfs merge=lfs -text
30
  *.wasm filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,213 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ datasets:
6
+ - togethercomputer/RedPajama-Data-1T
7
+ - OpenAssistant/oasst1
8
+ - databricks/databricks-dolly-15k
9
+ widget:
10
+ - text: "<human>: Write an email to my friends inviting them to come to my home on Friday for a dinner party, bring their own food to share.\n<bot>:"
11
+ example_title: "Email Writing"
12
+ - text: "<human>: Create a list of things to do in San Francisco\n<bot>:"
13
+ example_title: "Brainstorming"
14
+ inference:
15
+ parameters:
16
+ temperature: 0.7
17
+ top_p: 0.7
18
+ top_k: 50
19
+ max_new_tokens: 128
20
+ ---
21
+
22
+ # RedPajama-INCITE-7B-Chat
23
+
24
+ RedPajama-INCITE-7B-Chat was developed by Together and leaders from the open-source AI community including Ontocord.ai, ETH DS3Lab, AAI CERC, Université de Montréal, MILA - Québec AI Institute, Stanford Center for Research on Foundation Models (CRFM), Stanford Hazy Research research group and LAION.
25
+
26
+ It is fine-tuned on OASST1 and Dolly2 to enhance chatting ability.
27
+
28
+ - Base Model: [RedPajama-INCITE-7B-Base](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Base)
29
+ - Instruction-tuned Version: [RedPajama-INCITE-7B-Instruct](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Instruct)
30
+ - Chat Version: [RedPajama-INCITE-7B-Chat](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat)
31
+
32
+
33
+ ## Model Details
34
+ - **Developed by**: Together Computer.
35
+ - **Model type**: Language Model
36
+ - **Language(s)**: English
37
+ - **License**: Apache 2.0
38
+ - **Model Description**: A 6.9B parameter pretrained language model.
39
+
40
+ # Quick Start
41
+
42
+ Please note that the model requires `transformers` version >= 4.25.1.
43
+
44
+ To prompt the chat model, use the following format:
45
+ ```
46
+ <human>: [Instruction]
47
+ <bot>:
48
+ ```
49
+
50
+ ## GPU Inference
51
+
52
+ This requires a GPU with 16GB memory.
53
+
54
+ ```python
55
+ import torch
56
+ import transformers
57
+ from transformers import AutoTokenizer, AutoModelForCausalLM
58
+
59
+ MIN_TRANSFORMERS_VERSION = '4.25.1'
60
+
61
+ # check transformers version
62
+ assert transformers.__version__ >= MIN_TRANSFORMERS_VERSION, f'Please upgrade transformers to version {MIN_TRANSFORMERS_VERSION} or higher.'
63
+
64
+ # init
65
+ tokenizer = AutoTokenizer.from_pretrained("togethercomputer/RedPajama-INCITE-7B-Chat")
66
+ model = AutoModelForCausalLM.from_pretrained("togethercomputer/RedPajama-INCITE-7B-Chat", torch_dtype=torch.float16)
67
+ model = model.to('cuda:0')
68
+ # infer
69
+ prompt = "<human>: Who is Alan Turing?\n<bot>:"
70
+ inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
71
+ input_length = inputs.input_ids.shape[1]
72
+ outputs = model.generate(
73
+ **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.7, top_k=50, return_dict_in_generate=True
74
+ )
75
+ token = outputs.sequences[0, input_length:]
76
+ output_str = tokenizer.decode(token)
77
+ print(output_str)
78
+ """
79
+ Alan Mathison Turing (23 June 1912 – 7 June 1954) was an English computer scientist, mathematician, logician, cryptanalyst, philosopher, mathematician, and theoretical biologist.
80
+ """
81
+ ```
82
+
83
+ ## GPU Inference in Int8
84
+
85
+ This requires a GPU with 12GB memory.
86
+
87
+ To run inference with int8, please ensure you have installed `accelerate` and `bitsandbytes`. You can install them with the following commands:
88
+
89
+ ```bash
90
+ pip install accelerate
91
+ pip install bitsandbytes
92
+ ```
93
+
94
+ Then you can run inference with int8 as follows:
95
+
96
+ ```python
97
+ import torch
98
+ import transformers
99
+ from transformers import AutoTokenizer, AutoModelForCausalLM
100
+
101
+ MIN_TRANSFORMERS_VERSION = '4.25.1'
102
+
103
+ # check transformers version
104
+ assert transformers.__version__ >= MIN_TRANSFORMERS_VERSION, f'Please upgrade transformers to version {MIN_TRANSFORMERS_VERSION} or higher.'
105
+
106
+ # init
107
+ tokenizer = AutoTokenizer.from_pretrained("togethercomputer/RedPajama-INCITE-7B-Chat")
108
+ model = AutoModelForCausalLM.from_pretrained("togethercomputer/RedPajama-INCITE-7B-Chat", device_map='auto', torch_dtype=torch.float16, load_in_8bit=True)
109
+
110
+ # infer
111
+ prompt = "<human>: Who is Alan Turing?\n<bot>:"
112
+ inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
113
+ input_length = inputs.input_ids.shape[1]
114
+ outputs = model.generate(
115
+ **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.7, top_k=50, return_dict_in_generate=True
116
+ )
117
+ token = outputs.sequences[0, input_length:]
118
+ output_str = tokenizer.decode(token)
119
+ print(output_str)
120
+ """
121
+ Alan Mathison Turing (23 June 1912 – 7 June 1954) was an English computer scientist, mathematician, logician, cryptanalyst, philosopher, and theoretical biologist.
122
+ """
123
+ ```
124
+
125
+ ## CPU Inference
126
+
127
+ ```python
128
+ import torch
129
+ import transformers
130
+ from transformers import AutoTokenizer, AutoModelForCausalLM
131
+
132
+ MIN_TRANSFORMERS_VERSION = '4.25.1'
133
+
134
+ # check transformers version
135
+ assert transformers.__version__ >= MIN_TRANSFORMERS_VERSION, f'Please upgrade transformers to version {MIN_TRANSFORMERS_VERSION} or higher.'
136
+
137
+ # init
138
+ tokenizer = AutoTokenizer.from_pretrained("togethercomputer/RedPajama-INCITE-7B-Chat")
139
+ model = AutoModelForCausalLM.from_pretrained("togethercomputer/RedPajama-INCITE-7B-Chat", torch_dtype=torch.bfloat16)
140
+ # infer
141
+ prompt = "<human>: Who is Alan Turing?\n<bot>:"
142
+ inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
143
+ input_length = inputs.input_ids.shape[1]
144
+ outputs = model.generate(
145
+ **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.7, top_k=50, return_dict_in_generate=True
146
+ )
147
+ token = outputs.sequences[0, input_length:]
148
+ output_str = tokenizer.decode(token)
149
+ print(output_str)
150
+ """
151
+ Alan Mathison Turing, OBE, FRS, (23 June 1912 – 7 June 1954) was an English computer scientist, mathematician, logician, cryptanalyst, philosopher, and theoretical biologist.
152
+ """
153
+ ```
154
+
155
+ Please note that since `LayerNormKernelImpl` is not implemented in fp16 for CPU, we use `bfloat16` for CPU inference.
156
+
157
+
158
+ # Uses
159
+
160
+ ## Direct Use
161
+
162
+ Excluded uses are described below.
163
+
164
+ ### Misuse, Malicious Use, and Out-of-Scope Use
165
+
166
+ It is the responsibility of the end user to ensure that the model is used in a responsible and ethical manner.
167
+
168
+ #### Out-of-Scope Use
169
+
170
+ `RedPajama-INCITE-7B-Chat` is a language model and may not perform well for other use cases outside of its intended scope.
171
+ For example, it may not be suitable for use in safety-critical applications or for making decisions that have a significant impact on individuals or society.
172
+ It is important to consider the limitations of the model and to only use it for its intended purpose.
173
+
174
+ #### Misuse and Malicious Use
175
+
176
+ `RedPajama-INCITE-7B-Chat` is designed for language modeling.
177
+ Misuse of the model, such as using it to engage in illegal or unethical activities, is strictly prohibited and goes against the principles of the project.
178
+
179
+ Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:
180
+
181
+ - Generating fake news, misinformation, or propaganda
182
+ - Promoting hate speech, discrimination, or violence against individuals or groups
183
+ - Impersonating individuals or organizations without their consent
184
+ - Engaging in cyberbullying or harassment
185
+ - Defamatory content
186
+ - Spamming or scamming
187
+ - Sharing confidential or sensitive information without proper authorization
188
+ - Violating the terms of use of the model or the data used to train it
189
+ - Creating automated bots for malicious purposes such as spreading malware, phishing scams, or spamming
190
+
191
+ ## Limitations
192
+
193
+ `RedPajama-INCITE-7B-Chat`, like other language models, has limitations that should be taken into consideration.
194
+ For example, the model may not always provide accurate or relevant answers, particularly for questions that are complex, ambiguous, or outside of its training data.
195
+ We therefore welcome contributions from individuals and organizations, and encourage collaboration towards creating a more robust and inclusive chatbot.
196
+
197
+ ## Training
198
+
199
+ **Training Data**
200
+
201
+ Please refer to [togethercomputer/RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T)
202
+
203
+ **Training Procedure**
204
+
205
+ - **Hardware:** 8 A100
206
+ - **Optimizer:** Adam
207
+ - **Gradient Accumulations**: 1
208
+ - **Num of Tokens:** 79M tokens
209
+ - **Learning rate:** 1e-5
210
+
211
+ ## Community
212
+
213
+ Join us on [Together Discord](https://discord.gg/6ZVDU8tTD4)
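
A note on the version check used in the README snippets above: `transformers.__version__ >= MIN_TRANSFORMERS_VERSION` compares two strings, so the ordering is lexicographic rather than numeric. The sketch below shows one possible numeric comparison using only the standard library. The helper names `version_tuple` and `meets_minimum` are hypothetical and not part of `transformers`; the sketch also assumes that ignoring suffixes such as `.dev0` is acceptable.

```python
import re

MIN_TRANSFORMERS_VERSION = "4.25.1"


def version_tuple(version: str) -> tuple:
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS_VERSION) -> bool:
    """Compare versions numerically, component by component."""
    return version_tuple(installed) >= version_tuple(minimum)


# String comparison accepts 4.9.0 because "9" sorts after "2".
print("4.9.0" >= MIN_TRANSFORMERS_VERSION)
# Numeric comparison rejects it, since 9 < 25.
print(meets_minimum("4.9.0"))
```

In a real script, the value passed to `meets_minimum` would be `transformers.__version__`. A dedicated version-parsing library may be a better choice if pre-release ordering matters.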
config.json ADDED
@@ -0,0 +1,25 @@
1
+ {
2
+ "_name_or_path": "togethercomputer/RedPajama-INCITE-Chat-7B-v1",
3
+ "architectures": [
4
+ "GPTNeoXForCausalLM"
5
+ ],
6
+ "bos_token_id": 0,
7
+ "eos_token_id": 0,
8
+ "hidden_act": "gelu",
9
+ "hidden_size": 4096,
10
+ "initializer_range": 0.02,
11
+ "intermediate_size": 16384,
12
+ "layer_norm_eps": 1e-05,
13
+ "max_position_embeddings": 2048,
14
+ "model_type": "gpt_neox",
15
+ "num_attention_heads": 32,
16
+ "num_hidden_layers": 32,
17
+ "rotary_emb_base": 10000,
18
+ "rotary_pct": 1.0,
19
+ "tie_word_embeddings": false,
20
+ "torch_dtype": "float16",
21
+ "transformers_version": "4.28.1",
22
+ "use_cache": true,
23
+ "use_parallel_residual": false,
24
+ "vocab_size": 50432
25
+ }
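
The values in this `config.json` can be used to sanity-check the "6.9B parameter" figure in the README. The sketch below is a rough estimate, assuming the usual GPT-NeoX layer layout: untied input and output embeddings, two layer norms per block, biased attention and MLP projections, and rotary embeddings with no learned weights. The function name `count_gpt_neox_parameters` is hypothetical, and the breakdown is an assumption rather than something stated in the commit.

```python
# Values copied from the config.json diff above.
config = {
    "hidden_size": 4096,
    "intermediate_size": 16384,
    "num_hidden_layers": 32,
    "vocab_size": 50432,
    "tie_word_embeddings": False,
}


def count_gpt_neox_parameters(cfg: dict) -> int:
    """Estimate the parameter count under the assumed GPT-NeoX layout."""
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    layers = cfg["num_hidden_layers"]
    v = cfg["vocab_size"]

    embed_in = v * h
    # The output projection is a separate matrix when embeddings are untied.
    embed_out = 0 if cfg["tie_word_embeddings"] else v * h

    layer_norms = 2 * (2 * h)                     # two norms, weight + bias each
    attention = (h * 3 * h + 3 * h) + (h * h + h)  # fused QKV + output projection
    mlp = (h * i + i) + (i * h + h)                # up and down projections
    per_layer = layer_norms + attention + mlp

    final_layer_norm = 2 * h
    return embed_in + embed_out + layers * per_layer + final_layer_norm


total = count_gpt_neox_parameters(config)
print(f"parameters: {total:,} (~{total / 1e9:.2f}B)")
print(f"fp16 weights: ~{total * 2 / 1e9:.1f} GB")
```

Under these assumptions the estimate comes to roughly 6.86 billion parameters, which rounds to the 6.9B quoted in the README, and the fp16 weights alone take about 13.7 GB, in line with the 16GB GPU requirement.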
ct_output_models/config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "bos_token": "<|endoftext|>",
3
+ "eos_token": "<|endoftext|>",
4
+ "layer_norm_epsilon": null,
5
+ "unk_token": "<|endoftext|>"
6
+ }
ct_output_models/model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:86299d05d8652884aa425ebb6a189f634b28e4f7c8ab60e6b9a5124b3fabd6a8
3
+ size 6867593490
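
The three lines added for `model.bin` are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 object ID, and the size in bytes. As a minimal sketch, a pointer like this can be parsed with plain Python. The helper name `parse_lfs_pointer` is hypothetical.

```python
# Pointer text copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:86299d05d8652884aa425ebb6a189f634b28e4f7c8ab60e6b9a5124b3fabd6a8
size 6867593490
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The size field is a byte count, so convert it to an integer.
    fields["size"] = int(fields["size"])
    return fields


pointer = parse_lfs_pointer(POINTER_TEXT)
size_gib = pointer["size"] / 2**30
print(pointer["oid"])
print(f"{pointer['size']:,} bytes (~{size_gib:.2f} GiB)")
```

The recorded size is close to one byte per parameter for a model of about 6.9B parameters, which would be consistent with 8-bit weights, though the commit itself does not state how `model.bin` was produced.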
ct_output_models/vocabulary.json ADDED
The diff for this file is too large to render. See raw diff
 
generation_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 0,
4
+ "eos_token_id": 0,
5
+ "transformers_version": "4.29.1"
6
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "bos_token": "<|endoftext|>",
3
+ "eos_token": "<|endoftext|>",
4
+ "unk_token": "<|endoftext|>"
5
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "bos_token": "<|endoftext|>",
4
+ "clean_up_tokenization_spaces": true,
5
+ "eos_token": "<|endoftext|>",
6
+ "model_max_length": 2048,
7
+ "tokenizer_class": "GPTNeoXTokenizer",
8
+ "unk_token": "<|endoftext|>"
9
+ }
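
The README's prompt format (`<human>: [Instruction]` followed by `<bot>:`) and the `model_max_length` of 2048 in this tokenizer config can be combined into small helpers for building prompts and cleaning replies. The sketch below is illustrative only. The multi-turn layout, the trimming at the next `<human>:` marker, and the helper names `build_prompt`, `trim_reply`, and `prompt_token_budget` are assumptions, not behaviour documented in the commit.

```python
HUMAN = "<human>:"
BOT = "<bot>:"
EOS_TOKEN = "<|endoftext|>"   # eos_token from tokenizer_config.json
MODEL_MAX_LENGTH = 2048        # model_max_length from tokenizer_config.json


def build_prompt(instruction: str, history=()) -> str:
    """Format earlier (human, bot) turns plus a new instruction."""
    lines = []
    for human_msg, bot_msg in history:
        lines.append(f"{HUMAN} {human_msg}")
        lines.append(f"{BOT} {bot_msg}")
    lines.append(f"{HUMAN} {instruction}")
    lines.append(BOT)  # left open so the model writes the reply
    return "\n".join(lines)


def trim_reply(generated: str) -> str:
    """Keep only the first bot turn of the decoded continuation."""
    cut = generated.find(HUMAN)
    if cut != -1:
        generated = generated[:cut]
    return generated.replace(EOS_TOKEN, "").strip()


def prompt_token_budget(max_new_tokens: int = 128) -> int:
    """Tokens left for the prompt once room for the reply is reserved."""
    return MODEL_MAX_LENGTH - max_new_tokens


print(build_prompt("Who is Alan Turing?"))
print(trim_reply(" He was a mathematician.\n<human>: Tell me more"))
print(prompt_token_budget())
```

With no history, `build_prompt` reproduces the single-turn prompt used in the README examples. The default of 128 new tokens mirrors the `max_new_tokens` value in those examples.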