TheBloke committed on
Commit
a69efb6
·
1 Parent(s): 9cba0d2

Update README.md

Files changed (1)
  1. README.md +6 -2
README.md CHANGED
@@ -84,6 +84,8 @@ These quantised GGUFv2 files are compatible with llama.cpp from August 27th onwa
84
 
85
  They are also compatible with many third party UIs and libraries - please see the list at the top of this README.
86
 
 
 
87
  ## Explanation of quantisation methods
88
  <details>
89
  <summary>Click to see details</summary>
@@ -186,12 +188,12 @@ Windows Command Line users: You can set the environment variable by running `set
186
  Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
187
 
188
  ```shell
189
- ./main -ngl 32 -m mistral-7b-v0.1.Q4_K_M.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "{prompt}"
190
  ```
191
 
192
  Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
193
 
194
- Change `-c 2048` to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.
195
 
196
  If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`
197
 
@@ -207,6 +209,8 @@ You can use GGUF models from Python using the [llama-cpp-python](https://github.
207
 
208
  ### How to load this model in Python code, using ctransformers
209
 
 
 
210
  #### First install the package
211
 
212
  Run one of the following commands, according to your system:
 
84
 
85
  They are also compatible with many third party UIs and libraries - please see the list at the top of this README.
86
 
87
+ Sequence length note: The model works at sequence lengths of 4096 or lower. GGUF does not yet support the new sliding window sequence length mode, so longer sequence lengths are not supported.
88
+
89
  ## Explanation of quantisation methods
90
  <details>
91
  <summary>Click to see details</summary>
 
188
  Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
189
 
190
  ```shell
191
+ ./main -ngl 32 -m mistral-7b-v0.1.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "{prompt}"
192
  ```
193
 
194
  Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
195
 
196
+ Change `-c 4096` to the desired sequence length, up to a maximum of 4096. Mistral's sliding window sequence length is not yet supported in llama.cpp, so sequence lengths longer than 4096 are not supported.
197
 
198
  If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`
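
The usage notes above can be sketched as a small Python helper that assembles the same `./main` argument list. This is only an illustration, not part of the README or of llama.cpp: the function name `build_main_command` and the constant `MAX_CONTEXT` are hypothetical, and the only flags used are the ones shown in the command above. It clamps the sequence length to 4096 and leaves out `-ngl` when no layers are offloaded.

```python
import shlex

# Upper limit stated in this README for the Mistral GGUF files.
MAX_CONTEXT = 4096


def build_main_command(model_path, prompt, n_gpu_layers=0, context=4096):
    """Assemble a llama.cpp `./main` argument list from the flags shown above."""
    if context < 1:
        raise ValueError("context must be a positive integer")
    # Clamp instead of failing: longer sequence lengths are not supported.
    context = min(context, MAX_CONTEXT)

    args = ["./main"]
    if n_gpu_layers > 0:
        # Leave -ngl out entirely when there is no GPU acceleration.
        args += ["-ngl", str(n_gpu_layers)]
    args += [
        "-m", model_path,
        "--color",
        "-c", str(context),
        "--temp", "0.7",
        "--repeat_penalty", "1.1",
        "-n", "-1",
        "-p", prompt,
    ]
    return args


if __name__ == "__main__":
    cmd = build_main_command(
        "mistral-7b-v0.1.Q4_K_M.gguf", "{prompt}", n_gpu_layers=32, context=8192
    )
    # Print a shell-quoted command line that could be pasted into a terminal.
    print(shlex.join(cmd))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting problems.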
199
 
 
209
 
210
  ### How to load this model in Python code, using ctransformers
211
 
212
+ Note: I have not tested ctransformers with Mistral models, but it may work if you set the `model_type` to `llama`.
213
+
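
A minimal sketch of what the note above suggests, assuming the ctransformers `AutoModelForCausalLM.from_pretrained` interface and its `model_file`, `model_type`, `gpu_layers` and `context_length` options. As the note says, this has not been tested with Mistral models, so treat it as a starting point. The helper names `ctransformers_kwargs` and `load_model`, and the directory path in the usage comment, are hypothetical.

```python
def ctransformers_kwargs(model_file, gpu_layers=0, context_length=4096):
    """Collect loader options, forcing model_type to "llama" as the note suggests."""
    if not 1 <= context_length <= 4096:
        # Sequence lengths above 4096 are not supported for these files.
        raise ValueError("context_length must be between 1 and 4096")
    return {
        "model_file": model_file,
        "model_type": "llama",  # assumption from the note; untested with Mistral
        "gpu_layers": gpu_layers,  # 0 means CPU only
        "context_length": context_length,
    }


def load_model(model_path_or_repo_id, **kwargs):
    """Load a GGUF model via ctransformers (must be installed separately)."""
    # Imported lazily so ctransformers_kwargs works without the package present.
    from ctransformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(model_path_or_repo_id, **kwargs)


# Example usage (hypothetical local directory containing the GGUF file):
#   kwargs = ctransformers_kwargs("mistral-7b-v0.1.Q4_K_M.gguf", gpu_layers=32)
#   llm = load_model("/path/to/model/dir", **kwargs)
#   print(llm("AI is going to"))
```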
214
  #### First install the package
215
 
216
  Run one of the following commands, according to your system: