Upload tokenizer_config.json

#1
by fedyanin - opened
README.md CHANGED
@@ -1,307 +1,3 @@
- ---
- language:
- - en
- - fr
- - es
- - pt
- tags:
- - falcon3
- base_model: tiiuae/Falcon3-7B-Base
- license: other
- license_name: falcon-llm-license
- license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
- library_name: transformers
- ---
-
- <div align="center">
- <img src="https://huggingface.co/datasets/tiiuae/documentation-images/resolve/main/general/falco3-logo.png" alt="drawing" width="500"/>
- </div>
-
- # Falcon3-7B-Instruct
-
- The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
-
- This repository contains **Falcon3-7B-Instruct**. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code, and mathematics tasks.
- Falcon3-7B-Instruct supports four languages (English, French, Spanish, Portuguese) and a context length of up to 32K.
-
- ## Model Details
- - Architecture
-   - Transformer-based causal decoder-only architecture
-   - 28 decoder blocks
-   - Grouped-query attention (GQA) for faster inference: 12 query heads and 4 key-value heads
-   - Wider head dimension: 256
-   - High RoPE value to support long-context understanding: 1000042
-   - Uses SwiGLU and RMSNorm
-   - 32K context length
-   - 131K vocab size
- - Pretrained on 14 teratokens of data comprising web, code, STEM, high-quality and multilingual sources, using 1024 H100 GPU chips
- - Post-trained on 1.2 million samples of STEM, conversation, code, safety and function-call data
- - Supports EN, FR, ES, PT
- - Developed by [Technology Innovation Institute](https://www.tii.ae)
- - License: TII Falcon-LLM License 2.0
- - Model Release Date: December 2024
-
-
- ## Getting started
-
- <details>
- <summary> Click to expand </summary>
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "tiiuae/Falcon3-7B-Instruct"
-
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype="auto",
-     device_map="auto"
- )
- tokenizer = AutoTokenizer.from_pretrained(model_name)
-
- prompt = "How many hours in one day?"
- messages = [
-     {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
-     {"role": "user", "content": prompt}
- ]
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True
- )
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=1024
- )
- # Strip the prompt tokens so only the newly generated tokens are decoded.
- generated_ids = [
-     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
- ]
-
- response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
- print(response)
- ```
-
- </details>
-
- <br>
-
- ## Benchmarks
- We report the official HuggingFace leaderboard's normalized evaluations ([Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)) in the following table.
- <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
-     <colgroup>
-         <col style="width: 10%;">
-         <col style="width: 7%;">
-         <col style="width: 7%;">
-         <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
-     </colgroup>
-     <thead>
-         <tr>
-             <th>Benchmark</th>
-             <th>Llama-3.1-8B-Instruct</th>
-             <th>Qwen2.5-7B-Instruct</th>
-             <th>Falcon3-7B-Instruct</th>
-         </tr>
-     </thead>
-     <tbody>
-         <tr>
-             <td>IFEval</td>
-             <td><b>78.56</b></td>
-             <td>75.85</td>
-             <td>76.12</td>
-         </tr>
-         <tr>
-             <td>BBH (3-shot)</td>
-             <td>29.89</td>
-             <td>34.89</td>
-             <td><b>37.92</b></td>
-         </tr>
-         <tr>
-             <td>MATH Lvl-5 (4-shot)</td>
-             <td>19.34</td>
-             <td>0.00</td>
-             <td><b>31.87</b></td>
-         </tr>
-         <tr>
-             <td>GPQA (0-shot)</td>
-             <td>2.35</td>
-             <td>5.48</td>
-             <td><b>8.05</b></td>
-         </tr>
-         <tr>
-             <td>MUSR (0-shot)</td>
-             <td>8.41</td>
-             <td>8.45</td>
-             <td><b>21.17</b></td>
-         </tr>
-         <tr>
-             <td>MMLU-PRO (5-shot)</td>
-             <td>30.68</td>
-             <td><b>36.52</b></td>
-             <td>34.30</td>
-         </tr>
-     </tbody>
- </table>
-
- We also report our internal pipeline benchmarks in the following table.
- - We use [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
- - We report **raw scores** obtained by applying the chat template and fewshot_as_multiturn.
- - We use the same batch size across all models.
-
- <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
-     <colgroup>
-         <col style="width: 10%;">
-         <col style="width: 10%;">
-         <col style="width: 7%;">
-         <col style="width: 7%;">
-         <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
-     </colgroup>
-     <thead>
-         <tr>
-             <th>Category</th>
-             <th>Benchmark</th>
-             <th>Llama-3.1-8B-Instruct</th>
-             <th>Qwen2.5-7B-Instruct</th>
-             <th>Falcon3-7B-Instruct</th>
-         </tr>
-     </thead>
-     <tbody>
-         <tr>
-             <td rowspan="3">General</td>
-             <td>MMLU (5-shot)</td>
-             <td>68.2</td>
-             <td><b>73.5</b></td>
-             <td>70.5</td>
-         </tr>
-         <tr>
-             <td>MMLU-PRO (5-shot)</td>
-             <td>36.4</td>
-             <td><b>43.1</b></td>
-             <td>40.7</td>
-         </tr>
-         <tr>
-             <td>IFEval</td>
-             <td><b>78.8</b></td>
-             <td>74.7</td>
-             <td>76.5</td>
-         </tr>
-         <tr>
-             <td rowspan="3">Math</td>
-             <td>GSM8K (5-shot)</td>
-             <td><b>82.6</b></td>
-             <td>72.0</td>
-             <td>81.4</td>
-         </tr>
-         <tr>
-             <td>GSM8K (8-shot, COT)</td>
-             <td><b>85.4</b></td>
-             <td>76.6</td>
-             <td>79.7</td>
-         </tr>
-         <tr>
-             <td>MATH Lvl-5 (4-shot)</td>
-             <td>15.4</td>
-             <td>-</td>
-             <td><b>29.4</b></td>
-         </tr>
-         <tr>
-             <td rowspan="5">Reasoning</td>
-             <td>Arc Challenge (25-shot)</td>
-             <td>58.6</td>
-             <td>57.8</td>
-             <td><b>62.6</b></td>
-         </tr>
-         <tr>
-             <td>GPQA (0-shot)</td>
-             <td><b>33.5</b></td>
-             <td>32</td>
-             <td>31.9</td>
-         </tr>
-         <tr>
-             <td>GPQA (0-shot, COT)</td>
-             <td>9.6</td>
-             <td>13.8</td>
-             <td><b>22.3</b></td>
-         </tr>
-         <tr>
-             <td>MUSR (0-shot)</td>
-             <td>38.6</td>
-             <td>41</td>
-             <td><b>46.4</b></td>
-         </tr>
-         <tr>
-             <td>BBH (3-shot)</td>
-             <td>48.6</td>
-             <td><b>54.1</b></td>
-             <td>52.4</td>
-         </tr>
-         <tr>
-             <td rowspan="4">CommonSense Understanding</td>
-             <td>PIQA (0-shot)</td>
-             <td><b>78.9</b></td>
-             <td>73.7</td>
-             <td>78.8</td>
-         </tr>
-         <tr>
-             <td>SciQ (0-shot)</td>
-             <td>80.2</td>
-             <td>50.9</td>
-             <td><b>94.7</b></td>
-         </tr>
-         <tr>
-             <td>Winogrande (0-shot)</td>
-             <td>-</td>
-             <td>-</td>
-             <td>70.4</td>
-         </tr>
-         <tr>
-             <td>OpenbookQA (0-shot)</td>
-             <td><b>46.2</b></td>
-             <td>42.4</td>
-             <td>45.8</td>
-         </tr>
-         <tr>
-             <td rowspan="2">Instruction following</td>
-             <td>MT-Bench (avg)</td>
-             <td>7.9</td>
-             <td><b>8.5</b></td>
-             <td>8.4</td>
-         </tr>
-         <tr>
-             <td>Alpaca (WC)</td>
-             <td>26.6</td>
-             <td><b>31.5</b></td>
-             <td>26.1</td>
-         </tr>
-         <tr>
-             <td>Tool use</td>
-             <td>BFCL AST (avg)</td>
-             <td>90.6</td>
-             <td><b>91.4</b></td>
-             <td>89.5</td>
-         </tr>
-     </tbody>
- </table>
-
- ## Useful links
- - View our [release blogpost](https://huggingface.co/blog/falcon3).
- - Feel free to join [our Discord server](https://discord.gg/fwXpMyGc) if you have any questions or want to interact with our researchers and developers.
-
- ## Technical Report
- Coming soon.
-
- ## Citation
- If the Falcon3 family of models was helpful to your work, feel free to cite us.
-
- ```
- @misc{Falcon3,
-   title = {The Falcon 3 family of Open Models},
-   author = {TII Team},
-   month = {December},
-   year = {2024}
- }
- ```
 
+ ---
+ license: apache-2.0
+ ---
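The removed card's quick-start drives generation through `tokenizer.apply_chat_template`. As a minimal sketch of the prompt layout that template produces, inferred from the template string in the tokenizer_config.json diff later in this commit rather than from running the model (the repo id is the one the original card uses):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Instruct")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How many hours in one day?"},
]
# tokenize=False returns the rendered prompt string rather than token ids.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(text)
# Expected layout per the template:
# <|system|>
# You are a helpful assistant.
# <|user|>
# How many hours in one day?
# <|assistant|>
```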
config.json CHANGED
@@ -1,4 +1,5 @@
  {
+   "_name_or_path": "Iheb-Chaabane/falcon3-7b-explore-dpo-bs-64",
    "architectures": [
      "LlamaForCausalLM"
    ],
@@ -9,6 +10,7 @@
    "head_dim": 256,
    "hidden_act": "silu",
    "hidden_size": 3072,
+   "initializer_range": 0.02,
    "intermediate_size": 23040,
    "max_position_embeddings": 32768,
    "mlp_bias": false,
@@ -24,5 +26,5 @@
    "torch_dtype": "bfloat16",
    "transformers_version": "4.46.1",
    "use_cache": true,
-   "vocab_size": 131072
+   "vocab_size": 131080
  }
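The `vocab_size` bump from 131072 to 131080 has to agree with the resized embedding and `lm_head` tensors in the safetensors shards below. A minimal consistency-check sketch, assuming a local checkout of this repo (the path is illustrative, not a real repo id):

```python
from transformers import AutoConfig, AutoTokenizer

path = "./falcon3-7b-instruct"  # illustrative local checkout
config = AutoConfig.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path)

print(config.vocab_size)  # 131080 after this change
print(len(tokenizer))     # live tokenizer vocab, including the new <pad> token
# Embedding rows may be padded beyond the live vocab, but must never be smaller.
assert len(tokenizer) <= config.vocab_size
```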
model-00001-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:07e279c8c6075600e5dc795364efff8897de0f0c22a1d2d8db79a70adf8edb3f
- size 4938900432
+ oid sha256:ada95243d59a5b8a5d60ea7bec7907ca20b92bb124d5054b51792fe059b72195
+ size 4938949584
model-00002-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c5d6600f34e9972eed3201425ba75c2d58f574655f373ea8b86ddfa37d391f2a
+ oid sha256:c0f167f4dc5fb028251a03f67ce36bef07a163084fbd8f7d63ca043d770ab9ca
  size 4942085160
model-00003-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a96480584a0b5bd09c556e53d952146008bb423e5e12ea9bbd0b60d62f9a2f72
+ oid sha256:11a6edf04d6b4ab1044d88107eb8a4c71d6378c7d232c3c668870ceae1d2a80c
  size 4224838512
model-00004-of-00004.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0b84ea911989e21ebf4ac05018171f73016d8ae72b7904e89289be0b4672a403
- size 805306496
+ oid sha256:4be476e54f6ce54be4690cb9b7241959fd2096ab9a4b97648679e1fce43c575b
+ size 805355648
model.safetensors.index.json CHANGED
@@ -1,6 +1,6 @@
  {
    "metadata": {
-     "total_size": 14911113216
+     "total_size": 14911199232
    },
    "weight_map": {
      "lm_head.weight": "model-00004-of-00004.safetensors",
special_tokens_map.json CHANGED
@@ -32,7 +32,7 @@
      "single_word": false
    },
    "pad_token": {
-     "content": "<|pad|>",
+     "content": "<pad>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
tokenizer.json CHANGED
@@ -18212,7 +18212,16 @@
      },
      {
        "id": 2023,
-       "content": "<|pad|>",
+       "content": ">>UNUSED_1897<<",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 131072,
+       "content": "<pad>",
        "single_word": false,
        "lstrip": false,
        "rstrip": false,
@@ -20280,7 +20289,7 @@
      ">>UNUSED_1894<<": 2020,
      ">>UNUSED_1895<<": 2021,
      ">>UNUSED_1896<<": 2022,
-     "<|pad|>": 2023,
+     ">>UNUSED_1897<<": 2023,
      "!": 2024,
      "\"": 2025,
      "#": 2026,
tokenizer_config.json CHANGED
@@ -16186,7 +16186,15 @@
      "special": true
    },
    "2023": {
-     "content": "<|pad|>",
+     "content": ">>UNUSED_1897<<",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false,
+     "special": true
+   },
+   "131072": {
+     "content": "<pad>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
@@ -16219,15 +16227,11 @@
      ">>PASSWORD<<",
      ">>KEY<<"
    ],
-   "chat_template": "{%- if tools %}\n{{- '<|system|>\\n' }}\n{%- if messages[0]['role'] == 'system' %}\n{{- messages[0]['content'] }}\n{%- set remaining_messages = messages[1:] %}\n{%- else %}\n{%- set remaining_messages = messages %}\n{%- endif %}\n{{- 'You are a Falcon assistant skilled in function calling. You are helpful, respectful, and concise.\\n\\n# Tools\\n\\nYou have access to the following functions. You MUST use them to answer questions when needed. For each function call, you MUST return a JSON object inside <tool_call></tool_call> tags.\\n\\n<tools>' + tools|tojson(indent=2) + '</tools>\\n\\n# Output Format\\n\\nYour response MUST follow this format when making function calls:\\n<tool_call>\\n[\\n {\"name\": \"function_name\", \"arguments\": {\"arg1\": \"value1\", \"arg2\": \"value2\"}},\\n {\"name\": \"another_function\", \"arguments\": {\"arg\": \"value\"}}\\n]\\n</tool_call>\\nIf no function calls are needed, respond normally without the tool_call tags.\\n' }}\n{%- for message in remaining_messages %}\n{%- if message['role'] == 'user' %}\n{{- '<|user|>\\n' + message['content'] + '\\n' }}\n{%- elif message['role'] == 'assistant' %}\n{%- if message.content %}\n{{- '<|assistant|>\\n' + message['content'] }}\n{%- endif %}\n{%- if message.tool_calls %}\n{{- '\\n<tool_call>\\n' }}\n{{- message.tool_calls|tojson(indent=2) }}\n{{- '\\n</tool_call>' }}\n{%- endif %}\n{{- eos_token + '\\n' }}\n{%- elif message['role'] == 'tool' %}\n{{- '<|assistant|>\\n<tool_response>\\n' + message['content'] + '\\n</tool_response>\\n' }}\n{%- endif %}\n{%- endfor %}\n{{- '<|assistant|>\\n' if add_generation_prompt }}\n{%- else %}\n{%- for message in messages %}\n{%- if message['role'] == 'system' %}\n{{- '<|system|>\\n' + message['content'] + '\\n' }}\n{%- elif message['role'] == 'user' %}\n{{- '<|user|>\\n' + message['content'] + '\\n' }}\n{%- elif message['role'] == 'assistant' %}\n{%- if not loop.last %}\n{{- '<|assistant|>\\n' + message['content'] + eos_token + '\\n' }}\n{%- else %}\n{{- '<|assistant|>\\n' + message['content'] + eos_token }}\n{%- endif %}\n{%- endif %}\n{%- if loop.last and add_generation_prompt %}\n{{- '<|assistant|>\\n' }}\n{%- endif %}\n{%- endfor %}\n{%- endif %}",
+   "chat_template": "{% for message in messages %}{% if message['role'] == 'system' %}{{ '<|system|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'user' %}{{ '<|user|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'assistant' %}{% if not loop.last %}{{ '<|assistant|>\n' + message['content'] + eos_token + '\n' }}{% else %}{{ '<|assistant|>\n' + message['content'] + eos_token }}{% endif %}{% endif %}{% if loop.last and add_generation_prompt %}{{ '<|assistant|>\n' }}{% endif %}{% endfor %}",
    "clean_up_tokenization_spaces": true,
    "eos_token": "<|endoftext|>",
-   "extra_special_tokens": {},
-   "model_input_names": [
-     "input_ids",
-     "attention_mask"
-   ],
    "model_max_length": 32768,
-   "pad_token": "<|pad|>",
+   "pad_token": "<pad>",
    "tokenizer_class": "PreTrainedTokenizerFast"
  }
+
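The replaced chat_template also drops the function-calling branch (the `{%- if tools %}` path with its `<tools>` and `<tool_call>` scaffolding); the simplified template renders plain `<|system|>` / `<|user|>` / `<|assistant|>` turns only. A short usage sketch of the new template, under the same illustrative path:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./falcon3-7b-instruct")  # illustrative path
messages = [{"role": "user", "content": "Hi"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Expected output:
# <|user|>
# Hi
# <|assistant|>
```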