Initial GGML model commit
README.md CHANGED
@@ -56,15 +56,9 @@ ASSISTANT:
<!-- compatibility_ggml start -->
## Compatibility

-
-
-
-
-### New k-quant methods: `q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K`
-
-These new quantisation methods are compatible with llama.cpp as of June 6th, commit `2d43387`.
-
-They are now also compatible with recent releases of text-generation-webui, KoboldCpp, llama-cpp-python, ctransformers, rustformers and most others. For compatibility with other tools and libraries, please check their documentation.
+These quantised GGML files are compatible with llama.cpp as of June 6th, commit `2d43387`.
+
+They should also be compatible with all UIs, libraries and utilities which use GGML.

## Explanation of the new k-quant methods
<details>
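The k-quant names mentioned in this hunk (`q2_K` through `q6_K`) are quantisation types understood by llama.cpp's `quantize` tool. Below is a minimal sketch of where such a file comes from, assuming a llama.cpp build at or after commit `2d43387`, a hypothetical fp16 GGML input file, and the tool's usual `./quantize <input> <output> <type>` invocation:

```bash
# Sketch only: filenames are placeholders. The positional arguments are
# <input fp16 GGML model> <output quantised model> <quantisation type>.
./quantize ./vicuna-13b-v1.5-16k.ggmlv3.fp16.bin \
           ./vicuna-13b-v1.5-16k.ggmlv3.q4_K_M.bin \
           q4_K_M
```

The files published in a GGML repo like this one are already quantised, so this step only matters if you want a type that is not provided.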
@@ -114,8 +108,12 @@ Change `-t 10` to the number of physical CPU cores you have. For example if your

Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.

+Change `-c 2048` to the desired sequence length for this model. For example, `-c 4096` for a Llama 2 model. For models that use RoPE, add `--rope-freq-base 10000 --rope-freq-scale 0.5` for doubled context, or `--rope-freq-base 10000 --rope-freq-scale 0.25` for 4x context.
+
If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`

+For other parameters and how to use them, please refer to [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md)
+
## How to run in `text-generation-webui`

Further instructions here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).
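This model card is for Vicuna 13B v1.5 16K, a Llama 2 derivative with 4x-extended context, so the 4x RoPE settings added in the hunk above are the relevant ones. The following is a minimal sketch that combines the flags; the model filename, thread count and prompt text are placeholders to adjust for your own setup:

```bash
# Sketch: ties together the flags explained above for a 16K (4x) RoPE-scaled model.
# -t    number of physical CPU cores to use
# -ngl  number of layers to offload to GPU (remove without GPU acceleration)
# -c    sequence length; 16384 = 4 x 4096, paired with --rope-freq-scale 0.25
# -p    one-shot prompt; replace with -i -ins for a chat-style conversation
./main -t 10 -ngl 32 -m vicuna-13b-v1.5-16k.ggmlv3.q4_K_M.bin \
  -c 16384 --rope-freq-base 10000 --rope-freq-scale 0.25 \
  --color -p "USER: Write a story about llamas. ASSISTANT:"
```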
@@ -152,8 +150,6 @@ Thank you to all my generous patrons and donaters!
# Original model card: lmsys's Vicuna 13B v1.5 16K


-**Note:** This is a preview version. A slightly better checkpoint will be uploaded soon.
-
# Vicuna Model Card

## Model Details