AchyuthGamer committed on
Commit
e7ca051
·
1 Parent(s): b4f45ab

Upload 9 files

README.md CHANGED
@@ -1,3 +1,387 @@
1
  ---
2
- license: creativeml-openrail-m
3
  ---
1
  ---
2
+ base_model: mistralai/Mistral-7B-Instruct-v0.1
3
+ inference: false
4
+ license: apache-2.0
5
+ model_creator: Mistral AI
6
+ model_name: Mistral 7B Instruct v0.1
7
+ model_type: mistral
8
+ pipeline_tag: text-generation
9
+ prompt_template: '<s>[INST] {prompt} [/INST]'
10
+ quantized_by: TheBloke
11
+ tags:
12
+ - finetuned
13
  ---
14
+
15
+ <!-- header start -->
16
+ <!-- 200823 -->
17
+ <div style="width: auto; margin-left: auto; margin-right: auto">
18
+ <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
19
+ </div>
20
+ <div style="display: flex; justify-content: space-between; width: 100%;">
21
+ <div style="display: flex; flex-direction: column; align-items: flex-start;">
22
+ <p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://discord.gg/theblokeai">Chat & support: TheBloke's Discord server</a></p>
23
+ </div>
24
+ <div style="display: flex; flex-direction: column; align-items: flex-end;">
25
+ <p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
26
+ </div>
27
+ </div>
28
+ <div style="text-align:center; margin-top: 0em; margin-bottom: 0em"><p style="margin-top: 0.25em; margin-bottom: 0em;">TheBloke's LLM work is generously supported by a grant from <a href="https://a16z.com">andreessen horowitz (a16z)</a></p></div>
29
+ <hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
30
+ <!-- header end -->
31
+
32
+ # Mistral 7B Instruct v0.1 - GPTQ
33
+ - Model creator: [Mistral AI](https://huggingface.co/mistralai)
34
+ - Original model: [Mistral 7B Instruct v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)
35
+
36
+ <!-- description start -->
37
+ ## Description
38
+
39
+ This repo contains GPTQ model files for [Mistral AI's Mistral 7B Instruct v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1).
40
+
41
+ Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
42
+
43
+ ### GPTQs will work in ExLlama, or via Transformers (requiring Transformers from GitHub)
44
+
45
+ These models are confirmed to work with ExLlama v1.
46
+
47
+ At the time of writing (September 28th), AutoGPTQ has not yet added support for the new Mistral models.
48
+
49
+ These GPTQs were made directly from Transformers, and so can be loaded via the Transformers interface. They can't be loaded directly from AutoGPTQ.
50
+
51
+ To load them via Transformers, you will need to install Transformers from GitHub, with:
52
+ ```
53
+ pip3 install git+https://github.com/huggingface/transformers.git@72958fcd3c98a7afdc61f953aa58c544ebda2f79
54
+ ```
55
+
56
+ <!-- description end -->
57
+ <!-- repositories-available start -->
58
+ ## Repositories available
59
+
60
+ * [AWQ model(s) for GPU inference.](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-AWQ)
61
+ * [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GPTQ)
62
+ * [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF)
63
+ * [Mistral AI's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)
64
+ <!-- repositories-available end -->
65
+
66
+ <!-- prompt-template start -->
67
+ ## Prompt template: Mistral
68
+
69
+ ```
70
+ <s>[INST] {prompt} [/INST]
71
+
72
+ ```
73
+
74
+ <!-- prompt-template end -->
75
+
76
+
77
+ <!-- README_GPTQ.md-provided-files start -->
78
+ ## Provided files, and GPTQ parameters
79
+
80
+ Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements.
81
+
82
+ Each separate quant is in a different branch. See below for instructions on fetching from different branches.
83
+
84
+ These files were made with Transformers 4.34.0.dev0, from commit 72958fcd3c98a7afdc61f953aa58c544ebda2f79.
85
+
86
+ <details>
87
+ <summary>Explanation of GPTQ parameters</summary>
88
+
89
+ - Bits: The bit size of the quantised model.
90
+ - GS: GPTQ group size. Higher numbers use less VRAM, but have lower quantisation accuracy. "None" is the lowest possible value.
91
+ - Act Order: True or False. Also known as `desc_act`. True results in better quantisation accuracy. Some GPTQ clients have had issues with models that use Act Order plus Group Size, but this is generally resolved now.
92
+ - Damp %: A GPTQ parameter that affects how samples are processed for quantisation. 0.01 is default, but 0.1 results in slightly better accuracy.
93
+ - GPTQ dataset: The calibration dataset used during quantisation. Using a dataset more appropriate to the model's training can improve quantisation accuracy. Note that the GPTQ calibration dataset is not the same as the dataset used to train the model - please refer to the original model repo for details of the training dataset(s).
94
+ - Sequence Length: The length of the dataset sequences used for quantisation. Ideally this is the same as the model sequence length. For some very long sequence models (16+K), a lower sequence length may have to be used. Note that a lower sequence length does not limit the sequence length of the quantised model. It only impacts the quantisation accuracy on longer inference sequences.
95
+ - ExLlama Compatibility: Whether this file can be loaded with ExLlama, which currently only supports Llama models in 4-bit.
96
+
97
+ </details>
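
As a rough illustration of how Bits and GS interact, the sketch below estimates the storage cost per quantised weight. It rests on an assumption that is not stated in this README: each group of weights shares one 16-bit scale and one zero point packed at the same width as the weights. The function name is hypothetical, and the estimate ignores embeddings, the Act Order index tensor, and any other unquantised tensors, so it is not expected to reproduce the file sizes listed in the table.

```python
from typing import Optional


def effective_bits_per_weight(bits: int, group_size: Optional[int], scale_bits: int = 16) -> float:
    """Approximate bits stored per quantised weight.

    Assumed layout (hypothetical): every group of `group_size` weights shares
    one `scale_bits`-wide scale and one `bits`-wide zero point. With
    `group_size=None` the per-group overhead is treated as negligible.
    """
    if group_size is None:
        return float(bits)
    overhead = (scale_bits + bits) / group_size  # metadata bits amortised over the group
    return bits + overhead


# The four Bits / GS combinations offered as branches in this repo.
for bits, gs in [(4, 128), (4, 32), (8, 128), (8, 32)]:
    print(f"{bits}-bit, group size {gs}: ~{effective_bits_per_weight(bits, gs):.3f} bits/weight")
```

Under these assumptions a smaller group size costs more storage per weight, which is consistent with the 32g branches being larger than the 128g branches.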
98
+
99
+ | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
100
+ | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
101
+ | [main](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GPTQ/tree/main) | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 4.16 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
102
+ | [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 4.57 GB | Yes | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. |
103
+ | [gptq-8bit-128g-actorder_True](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GPTQ/tree/gptq-8bit-128g-actorder_True) | 8 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 7.68 GB | Yes | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. |
104
+ | [gptq-8bit-32g-actorder_True](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GPTQ/tree/gptq-8bit-32g-actorder_True) | 8 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 8.17 GB | Yes | 8-bit, with group size 32g and Act Order for maximum inference quality. |
105
+
106
+ <!-- README_GPTQ.md-provided-files end -->
107
+
108
+ <!-- README_GPTQ.md-download-from-branches start -->
109
+ ## How to download, including from branches
110
+
111
+ ### In text-generation-webui
112
+
113
+ To download from the `main` branch, enter `TheBloke/Mistral-7B-Instruct-v0.1-GPTQ` in the "Download model" box.
114
+
115
+ To download from another branch, add `:branchname` to the end of the download name, eg `TheBloke/Mistral-7B-Instruct-v0.1-GPTQ:gptq-4bit-32g-actorder_True`
116
+
117
+ ### From the command line
118
+
119
+ I recommend using the `huggingface-hub` Python library:
120
+
121
+ ```shell
122
+ pip3 install huggingface-hub
123
+ ```
124
+
125
+ To download the `main` branch to a folder called `Mistral-7B-Instruct-v0.1-GPTQ`:
126
+
127
+ ```shell
128
+ mkdir Mistral-7B-Instruct-v0.1-GPTQ
129
+ huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.1-GPTQ --local-dir Mistral-7B-Instruct-v0.1-GPTQ --local-dir-use-symlinks False
130
+ ```
131
+
132
+ To download from a different branch, add the `--revision` parameter:
133
+
134
+ ```shell
135
+ mkdir Mistral-7B-Instruct-v0.1-GPTQ
136
+ huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.1-GPTQ --revision gptq-4bit-32g-actorder_True --local-dir Mistral-7B-Instruct-v0.1-GPTQ --local-dir-use-symlinks False
137
+ ```
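
If you script downloads, the two branch notations used above (`repo:branchname` in text-generation-webui, `--revision` on the command line) can be bridged with a small helper. This is only a sketch: `build_download_command` is a hypothetical name, it uses nothing beyond the Python standard library, and it builds the same `huggingface-cli download` arguments shown above without running them.

```python
import shlex
from typing import List


def build_download_command(spec: str) -> List[str]:
    """Turn `repo` or `repo:branch` into a huggingface-cli argument list."""
    repo_id, _, branch = spec.partition(":")
    local_dir = repo_id.split("/")[-1]  # folder named after the repo, as in the examples above
    cmd = ["huggingface-cli", "download", repo_id]
    if branch:
        cmd += ["--revision", branch]
    cmd += ["--local-dir", local_dir, "--local-dir-use-symlinks", "False"]
    return cmd


cmd = build_download_command(
    "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ:gptq-4bit-32g-actorder_True"
)
print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) to actually download
```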
138
+
139
+ <details>
140
+ <summary>More advanced huggingface-cli download usage</summary>
141
+
142
+ If you remove the `--local-dir-use-symlinks False` parameter, the files will instead be stored in the central Hugging Face cache directory (default location on Linux is: `~/.cache/huggingface`), and symlinks will be added to the specified `--local-dir`, pointing to their real location in the cache. This allows interrupted downloads to be resumed, and lets you quickly clone the repo to multiple places on disk without triggering another download. The downside, and the reason why I don't list that as the default option, is that the files are then hidden away in a cache folder, which makes it harder to know where your disk space is being used and to clear it up if/when you want to remove a downloaded model.
143
+
144
+ The cache location can be changed with the `HF_HOME` environment variable, and/or the `--cache-dir` parameter to `huggingface-cli`.
145
+
146
+ For more documentation on downloading with `huggingface-cli`, please see: [HF -> Hub Python Library -> Download files -> Download from the CLI](https://huggingface.co/docs/huggingface_hub/guides/download#download-from-the-cli).
147
+
148
+ To accelerate downloads on fast connections (1Gbit/s or higher), install `hf_transfer`:
149
+
150
+ ```shell
151
+ pip3 install hf_transfer
152
+ ```
153
+
154
+ And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:
155
+
156
+ ```shell
157
+ mkdir Mistral-7B-Instruct-v0.1-GPTQ
158
+ HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.1-GPTQ --local-dir Mistral-7B-Instruct-v0.1-GPTQ --local-dir-use-symlinks False
159
+ ```
160
+
161
+ Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
162
+ </details>
163
+
164
+ ### With `git` (**not** recommended)
165
+
166
+ To clone a specific branch with `git`, use a command like this:
167
+
168
+ ```shell
169
+ git clone --single-branch --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GPTQ
170
+ ```
171
+
172
+ Note that using Git with HF repos is strongly discouraged. It will be much slower than using `huggingface-hub`, and will use twice as much disk space because it has to store the model files twice (every byte is stored both in the intended target folder and again in the `.git` folder as a blob).
173
+
174
+ <!-- README_GPTQ.md-download-from-branches end -->
175
+ <!-- README_GPTQ.md-text-generation-webui start -->
176
+ ## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
177
+
178
+ These models are confirmed to work via the ExLlama Loader in text-generation-webui.
179
+
180
+ Use **Loader: ExLlama**. The Transformers loader may work too. AutoGPTQ will not work.
181
+
182
+ Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
183
+
184
+ It is strongly recommended to use the text-generation-webui one-click-installers unless you're sure you know how to make a manual install.
185
+
186
+ 1. Click the **Model tab**.
187
+ 2. Under **Download custom model or LoRA**, enter `TheBloke/Mistral-7B-Instruct-v0.1-GPTQ`.
188
+ - To download from a specific branch, enter for example `TheBloke/Mistral-7B-Instruct-v0.1-GPTQ:gptq-4bit-32g-actorder_True`
189
+ - see Provided Files above for the list of branches for each option.
190
+ 3. Click **Download**.
191
+ 4. The model will start downloading. Once it's finished it will say "Done".
192
+ 5. In the top left, click the refresh icon next to **Model**.
193
+ 6. In the **Model** dropdown, choose the model you just downloaded: `Mistral-7B-Instruct-v0.1-GPTQ`
194
+ 7. The model will automatically load, and is now ready for use!
195
+ 8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
196
+ * Note that you do not need to and should not set manual GPTQ parameters any more. These are set automatically from the file `quantize_config.json`.
197
+ 9. Once you're ready, click the **Text Generation tab** and enter a prompt to get started!
198
+ <!-- README_GPTQ.md-text-generation-webui end -->
199
+
200
+ <!-- README_GPTQ.md-use-from-python start -->
201
+ ## How to use this GPTQ model from Python code
202
+
203
+ ### Install the necessary packages
204
+
205
+ Requires: Transformers 4.34.0.dev0 from GitHub source, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.
206
+
207
+ ```shell
208
+ pip3 install optimum
209
+ pip3 install git+https://github.com/huggingface/transformers.git@72958fcd3c98a7afdc61f953aa58c544ebda2f79
210
+ pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ # Use cu117 if on CUDA 11.7
211
+ ```
212
+
213
+ If you have problems installing AutoGPTQ using the pre-built wheels, install it from source instead:
214
+
215
+ ```shell
216
+ pip3 uninstall -y auto-gptq
217
+ git clone https://github.com/PanQiWei/AutoGPTQ
218
+ cd AutoGPTQ
219
+ git checkout v0.4.2
220
+ pip3 install .
221
+ ```
222
+
223
+ ### You can then use the following code
224
+
225
+ ```python
226
+ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
227
+
228
+ model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ"
229
+ # To use a different branch, change revision
230
+ # For example: revision="gptq-4bit-32g-actorder_True"
231
+ model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
232
+                                              device_map="auto",
233
+                                              trust_remote_code=False,
234
+                                              revision="main")
235
+
236
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
237
+
238
+ prompt = "Tell me about AI"
239
+ prompt_template=f'''<s>[INST] {prompt} [/INST]
240
+ '''
241
+
242
+ print("\n\n*** Generate:")
243
+
244
+ input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
245
+ output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
246
+ print(tokenizer.decode(output[0]))
247
+
248
+ # Inference can also be done using transformers' pipeline
249
+
250
+ print("*** Pipeline:")
251
+ pipe = pipeline(
252
+     "text-generation",
253
+     model=model,
254
+     tokenizer=tokenizer,
255
+     max_new_tokens=512,
256
+     do_sample=True,
257
+     temperature=0.7,
258
+     top_p=0.95,
259
+     top_k=40,
260
+     repetition_penalty=1.1
261
+ )
262
+
263
+ print(pipe(prompt_template)[0]['generated_text'])
264
+ ```
265
+ <!-- README_GPTQ.md-use-from-python end -->
266
+
267
+ <!-- README_GPTQ.md-compatibility start -->
268
+ ## Compatibility
269
+
270
+ The files provided are only tested to work with ExLlama v1, and Transformers 4.34.0.dev0 as of commit 72958fcd3c98a7afdc61f953aa58c544ebda2f79.
271
+
272
+ <!-- README_GPTQ.md-compatibility end -->
273
+
274
+ <!-- footer start -->
275
+ <!-- 200823 -->
276
+ ## Discord
277
+
278
+ For further support, and discussions on these models and AI in general, join us at:
279
+
280
+ [TheBloke AI's Discord server](https://discord.gg/theblokeai)
281
+
282
+ ## Thanks, and how to contribute
283
+
284
+ Thanks to the [chirper.ai](https://chirper.ai) team!
285
+
286
+ Thanks to Clay from [gpus.llm-utils.org](https://gpus.llm-utils.org)!
287
+
288
+ I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
289
+
290
+ If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
291
+
292
+ Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
293
+
294
+ * Patreon: https://patreon.com/TheBlokeAI
295
+ * Ko-Fi: https://ko-fi.com/TheBlokeAI
296
+
297
+ **Special thanks to**: Aemon Algiz.
298
+
299
+ **Patreon special mentions**: Pierre Kircher, Stanislav Ovsiannikov, Michael Levine, Eugene Pentland, Andrey, 준교 김, Randy H, Fred von Graf, Artur Olbinski, Caitlyn Gatomon, terasurfer, Jeff Scroggin, James Bentley, Vadim, Gabriel Puliatti, Harry Royden McLaughlin, Sean Connelly, Dan Guido, Edmond Seymore, Alicia Loh, subjectnull, AzureBlack, Manuel Alberto Morcote, Thomas Belote, Lone Striker, Chris Smitley, Vitor Caleffi, Johann-Peter Hartmann, Clay Pascal, biorpg, Brandon Frisco, sidney chen, transmissions 11, Pedro Madruga, jinyuan sun, Ajan Kanaga, Emad Mostaque, Trenton Dambrowitz, Jonathan Leane, Iucharbius, usrbinkat, vamX, George Stoitzev, Luke Pendergrass, theTransient, Olakabola, Swaroop Kallakuri, Cap'n Zoog, Brandon Phillips, Michael Dempsey, Nikolai Manek, danny, Matthew Berman, Gabriel Tamborski, alfie_i, Raymond Fosdick, Tom X Nguyen, Raven Klaugh, LangChain4j, Magnesian, Illia Dulskyi, David Ziegler, Mano Prime, Luis Javier Navarrete Lozano, Erik Bjäreholt, 阿明, Nathan Dryer, Alex, Rainer Wilmers, zynix, TL, Joseph William Delisle, John Villwock, Nathan LeClaire, Willem Michiel, Joguhyik, GodLy, OG, Alps Aficionado, Jeffrey Morgan, ReadyPlayerEmma, Tiffany J. Kim, Sebastain Graf, Spencer Kim, Michael Davis, webtim, Talal Aujan, knownsqashed, John Detwiler, Imad Khwaja, Deo Leter, Jerry Meng, Elijah Stavena, Rooh Singh, Pieter, SuperWojo, Alexandros Triantafyllidis, Stephen Murray, Ai Maven, ya boyyy, Enrico Ros, Ken Nordquist, Deep Realms, Nicholas, Spiking Neurons AB, Elle, Will Dee, Jack West, RoA, Luke @flexchar, Viktor Bowallius, Derek Yates, Subspace Studios, jjj, Toran Billups, Asp the Wyvern, Fen Risland, Ilya, NimbleBox.ai, Chadd, Nitin Borwankar, Emre, Mandus, Leonard Tan, Kalila, K, Trailburnt, S_X, Cory Kujawski
300
+
301
+
302
+ Thank you to all my generous patrons and donors!
303
+
304
+ And thank you again to a16z for their generous grant.
305
+
306
+ <!-- footer end -->
307
+
308
+ # Original model card: Mistral AI's Mistral 7B Instruct v0.1
309
+
310
+
311
+ # Model Card for Mistral-7B-Instruct-v0.1
312
+
313
+ The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruction fine-tuned version of the [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) generative text model, trained on a variety of publicly available conversation datasets.
314
+
315
+ For full details of this model please read our [release blog post](https://mistral.ai/news/announcing-mistral-7b/)
316
+
317
+ ## Instruction format
318
+
319
+ In order to leverage instruction fine-tuning, your prompt should be surrounded by `[INST]` and `[/INST]` tokens. The very first instruction should begin with a beginning-of-sentence id. Subsequent instructions should not. The assistant's generation is terminated by the end-of-sentence token id.
320
+
321
+ E.g.
322
+ ```
323
+ text = "<s>[INST] What is your favourite condiment? [/INST]"
324
+ "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
325
+ "[INST] Do you have mayonnaise recipes? [/INST]"
326
+ ```
327
+
328
+ This format is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method:
329
+
330
+ ```python
331
+ from transformers import AutoModelForCausalLM, AutoTokenizer
332
+
333
+ device = "cuda" # the device to load the model onto
334
+
335
+ model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
336
+ tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
337
+
338
+ messages = [
339
+     {"role": "user", "content": "What is your favourite condiment?"},
340
+     {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
341
+     {"role": "user", "content": "Do you have mayonnaise recipes?"}
342
+ ]
343
+
344
+ encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
345
+
346
+ model_inputs = encodeds.to(device)
347
+ model.to(device)
348
+
349
+ generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
350
+ decoded = tokenizer.batch_decode(generated_ids)
351
+ print(decoded[0])
352
+ ```
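
To see what the chat template produces without loading a tokenizer, the following sketch restates in plain Python the logic of the `chat_template` string in this commit's `tokenizer_config.json`. It is an illustrative re-implementation, not the Transformers code path; `format_mistral_chat` is a hypothetical name, and it returns a string where `apply_chat_template()` in the example above returns token ids.

```python
from typing import Dict, List

BOS, EOS = "<s>", "</s>"  # values from special_tokens_map.json in this commit


def format_mistral_chat(messages: List[Dict[str, str]]) -> str:
    """Render alternating user/assistant messages in the Mistral instruct format."""
    prompt = BOS  # only the very first instruction is preceded by the BOS token
    for index, message in enumerate(messages):
        role, content = message["role"], message["content"]
        # Even positions must be user turns, odd positions assistant turns.
        if (role == "user") != (index % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/user/assistant/...")
        if role == "user":
            prompt += "[INST] " + content + " [/INST]"
        elif role == "assistant":
            prompt += content + EOS + " "
        else:
            raise ValueError("Only user and assistant roles are supported!")
    return prompt


messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]
print(format_mistral_chat(messages))
```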
353
+
354
+ ## Model Architecture
355
+ This instruction model is based on Mistral-7B-v0.1, a transformer model with the following architecture choices:
356
+ - Grouped-Query Attention
357
+ - Sliding-Window Attention
358
+ - Byte-fallback BPE tokenizer
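
As a worked example of what Grouped-Query Attention and Sliding-Window Attention mean for memory, the sketch below derives a few quantities from the values in this commit's `config.json`. The 2-byte cache entry is an assumption (16-bit keys and values), and whether a given runtime actually caps the cache at the sliding window depends on its implementation, so treat the last figure as an upper bound under that assumption only.

```python
# Values copied from config.json in this commit.
config = {
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "num_hidden_layers": 32,
    "sliding_window": 4096,
}

BYTES_PER_VALUE = 2  # assumption: keys and values are cached in a 16-bit dtype

head_dim = config["hidden_size"] // config["num_attention_heads"]
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]


def kv_cache_bytes_per_token(num_kv_heads: int) -> int:
    """Bytes needed to cache one token's keys and values across all layers."""
    return 2 * config["num_hidden_layers"] * num_kv_heads * head_dim * BYTES_PER_VALUE


gqa_bytes = kv_cache_bytes_per_token(config["num_key_value_heads"])
mha_bytes = kv_cache_bytes_per_token(config["num_attention_heads"])  # if every head had its own K/V
window_bytes = gqa_bytes * config["sliding_window"]

print(f"head_dim = {head_dim}, query heads per KV head = {queries_per_kv_head}")
print(f"KV cache per token: {gqa_bytes // 1024} KiB with GQA vs {mha_bytes // 1024} KiB without")
print(f"KV cache for one full sliding window: {window_bytes // 2**20} MiB")
```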
359
+
360
+ ## Troubleshooting
361
+ - If you see the following error:
362
+ ```
363
+ Traceback (most recent call last):
364
+ File "", line 1, in
365
+ File "/transformers/models/auto/auto_factory.py", line 482, in from_pretrained
366
+ config, kwargs = AutoConfig.from_pretrained(
367
+ File "/transformers/models/auto/configuration_auto.py", line 1022, in from_pretrained
368
+ config_class = CONFIG_MAPPING[config_dict["model_type"]]
369
+ File "/transformers/models/auto/configuration_auto.py", line 723, in getitem
370
+ raise KeyError(key)
371
+ KeyError: 'mistral'
372
+ ```
373
+
374
+ Installing transformers from source should solve the issue:
375
+ `pip install git+https://github.com/huggingface/transformers`
376
+
377
+ This should not be required after transformers-v4.33.4.
378
+
379
+ ## Limitations
380
+
381
+ The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance.
382
+ It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to
383
+ make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
384
+
385
+ ## The Mistral AI Team
386
+
387
+ Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
config.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "architectures": [
3
+ "MistralForCausalLM"
4
+ ],
5
+ "bos_token_id": 1,
6
+ "eos_token_id": 2,
7
+ "hidden_act": "silu",
8
+ "hidden_size": 4096,
9
+ "initializer_range": 0.02,
10
+ "intermediate_size": 14336,
11
+ "max_position_embeddings": 32768,
12
+ "model_type": "mistral",
13
+ "num_attention_heads": 32,
14
+ "num_hidden_layers": 32,
15
+ "num_key_value_heads": 8,
16
+ "rms_norm_eps": 1e-05,
17
+ "rope_theta": 10000.0,
18
+ "sliding_window": 4096,
19
+ "tie_word_embeddings": false,
20
+ "torch_dtype": "bfloat16",
21
+ "transformers_version": "4.34.0.dev0",
22
+ "use_cache": true,
23
+ "vocab_size": 32000,
24
+ "pretraining_tp": 1,
25
+ "pad_token_id": 0,
26
+ "quantization_config": {
27
+ "bits": 4,
28
+ "group_size": 128,
29
+ "damp_percent": 0.1,
30
+ "desc_act": true,
31
+ "sym": true,
32
+ "true_sequential": true,
33
+ "model_name_or_path": null,
34
+ "model_file_base_name": "model",
35
+ "quant_method": "gptq"
36
+ }
37
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "transformers_version": "4.34.0.dev0"
6
+ }
quantize_config.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "bits": 4,
3
+ "group_size": 128,
4
+ "damp_percent": 0.1,
5
+ "desc_act": true,
6
+ "sym": true,
7
+ "true_sequential": true,
8
+ "model_name_or_path": null,
9
+ "model_file_base_name": "model"
10
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "bos_token": "<s>",
3
+ "eos_token": "</s>",
4
+ "unk_token": "<unk>"
5
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
3
+ size 493443
tokenizer_config.json ADDED
@@ -0,0 +1,43 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "added_tokens_decoder": {
5
+ "0": {
6
+ "content": "<unk>",
7
+ "lstrip": false,
8
+ "normalized": true,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "1": {
14
+ "content": "<s>",
15
+ "lstrip": false,
16
+ "normalized": true,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "2": {
22
+ "content": "</s>",
23
+ "lstrip": false,
24
+ "normalized": true,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ }
29
+ },
30
+ "additional_special_tokens": [],
31
+ "bos_token": "<s>",
32
+ "chat_template": "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}",
33
+ "clean_up_tokenization_spaces": false,
34
+ "eos_token": "</s>",
35
+ "legacy": true,
36
+ "model_max_length": 1000000000000000019884624838656,
37
+ "pad_token": null,
38
+ "sp_model_kwargs": {},
39
+ "spaces_between_special_tokens": false,
40
+ "tokenizer_class": "LlamaTokenizer",
41
+ "unk_token": "<unk>",
42
+ "use_default_system_prompt": true
43
+ }