Update README.md
README.md CHANGED
@@ -181,7 +181,7 @@ print(pipe(prompt_template)[0]['generated_text'])
 
 This will work with AutoGPTQ. It is untested with GPTQ-for-LLaMa. It will *not* work with ExLlama.
 
-It was created with group_size none (-1) to reduce VRAM usage, and with --act-order (desc_act) to
+It was created with group_size none (-1) to reduce VRAM usage, and with --act-order (desc_act) to improve accuracy of responses.
 
 * `gptq_model-4bit-128g.safetensors`
   * Works with AutoGPTQ in CUDA or Triton modes.
@@ -198,7 +198,7 @@ This will work with AutoGPTQ. It is untested with GPTQ-for-LLaMa. It will *not*
 
 It was created with both group_size 128g and --act-order (desc_act) for increased inference quality.
 
-
+It was created with both group_size 128g and --act-order (desc_act) for even higher inference accuracy, at the cost of increased VRAM usage. Because we already need 2 x 80GB or 3 x 48GB GPUs, I don't expect the increased VRAM usage to change the GPU requirements.
 
 * `gptq_model-4bit-128g.safetensors`
   * Works with AutoGPTQ in CUDA or Triton modes.
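For reference, a minimal sketch of loading a checkpoint quantized with the settings described above (4-bit, group_size -1 or 128, desc_act / --act-order) via AutoGPTQ. The repository id and `model_basename` below are placeholders, not names from this commit, and passing `quantize_config` explicitly is only needed if the repo does not ship a `quantize_config.json`.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_repo = "TheBloke/some-model-GPTQ"  # placeholder repo id

# Mirrors the parameters described in the README:
# 4-bit, group_size none (-1) or 128, and --act-order (desc_act=True).
quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=-1,   # use 128 for the 128g file
    desc_act=True,   # --act-order
)

tokenizer = AutoTokenizer.from_pretrained(model_repo, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    model_repo,
    model_basename="gptq_model-4bit-128g",  # file name without .safetensors (placeholder)
    use_safetensors=True,
    quantize_config=quantize_config,
    device="cuda:0",
    use_triton=False,  # CUDA kernels; set True to use Triton mode instead
)

prompt = "Tell me about AI"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0]))
```

As noted in the diff, this path is the AutoGPTQ route (CUDA or Triton); ExLlama is not expected to work with the group_size -1 + desc_act combination.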