Initial GGML model commit
README.md CHANGED
@@ -56,15 +56,9 @@ ASSISTANT:
<!-- compatibility_ggml start -->
## Compatibility

-
-
-
-
-### New k-quant methods: `q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K`
-
-These new quantisation methods are compatible with llama.cpp as of June 6th, commit `2d43387`.
-
-They are now also compatible with recent releases of text-generation-webui, KoboldCpp, llama-cpp-python, ctransformers, rustformers and most others. For compatibility with other tools and libraries, please check their documentation.
+These quantised GGML files are compatible with llama.cpp as of June 6th, commit `2d43387`.
+
+They should also be compatible with all UIs, libraries and utilities which use GGML.

## Explanation of the new k-quant methods
<details>
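The k-quant names mentioned in this hunk (`q2_K` through `q6_K`) are quantisation types understood by llama.cpp's `quantize` tool. Below is a minimal sketch of where such a file comes from, assuming a llama.cpp build at or after commit `2d43387`, a hypothetical fp16 GGML input file, and the tool's usual `./quantize <input> <output> <type>` invocation:

```bash
# Sketch only: filenames are placeholders. The positional arguments are
# <input fp16 GGML model> <output quantised model> <quantisation type>.
./quantize ./vicuna-13b-v1.5-16k.ggmlv3.fp16.bin \
           ./vicuna-13b-v1.5-16k.ggmlv3.q4_K_M.bin \
           q4_K_M
```

The files published in a GGML repo like this one are already quantised, so this step only matters if you want a type that is not provided.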
@@ -114,8 +108,12 @@ Change `-t 10` to the number of physical CPU cores you have. For example if your

Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.

+Change `-c 2048` to the desired sequence length for this model. For example, `-c 4096` for a Llama 2 model. For models that use RoPE, add `--rope-freq-base 10000 --rope-freq-scale 0.5` for doubled context, or `--rope-freq-base 10000 --rope-freq-scale 0.25` for 4x context.
+
If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`

+For other parameters and how to use them, please refer to [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md)
+
## How to run in `text-generation-webui`

Further instructions here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).
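This model card is for Vicuna 13B v1.5 16K, a Llama 2 derivative with 4x-extended context, so the 4x RoPE settings added in the hunk above are the relevant ones. The following is a minimal sketch that combines the flags; the model filename, thread count and prompt text are placeholders to adjust for your own setup:

```bash
# Sketch: ties together the flags explained above for a 16K (4x) RoPE-scaled model.
# -t    number of physical CPU cores to use
# -ngl  number of layers to offload to GPU (remove without GPU acceleration)
# -c    sequence length; 16384 = 4 x 4096, paired with --rope-freq-scale 0.25
# -p    one-shot prompt; replace with -i -ins for a chat-style conversation
./main -t 10 -ngl 32 -m vicuna-13b-v1.5-16k.ggmlv3.q4_K_M.bin \
  -c 16384 --rope-freq-base 10000 --rope-freq-scale 0.25 \
  --color -p "USER: Write a story about llamas. ASSISTANT:"
```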
@@ -152,8 +150,6 @@ Thank you to all my generous patrons and donaters!
# Original model card: lmsys's Vicuna 13B v1.5 16K


-**Note:** This is a preview version. A slightly better checkpoint will be uploaded soon.
-
# Vicuna Model Card

## Model Details