---
license: unknown
---

[ehartford/WizardLM-7B-Uncensored](https://huggingface.co/ehartford/WizardLM-7B-Uncensored) quantized to **8bit GPTQ** with act order + true sequential, no group size.

*For most uses this probably isn't what you want.* \
*For 4bit with no act order, or for compatibility with `old-cuda` (the text-generation-webui default), see [TheBloke/WizardLM-7B-uncensored-GPTQ](https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GPTQ).*

Quantized using AutoGPTQ with the following config:

```python
config: dict = dict(
    quantize_config=dict(
        bits=8,
        desc_act=True,
        true_sequential=True,
        model_file_base_name='WizardLM-7B-Uncensored',
    ),
    use_safetensors=True,
)
```

See `quantize.py` for the full script.

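Since `quantize.py` isn't reproduced in this card, here is a minimal sketch of what an AutoGPTQ run with the config above looks like; the calibration text and output directory are illustrative placeholders, not taken from the actual script:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "ehartford/WizardLM-7B-Uncensored"

# Mirrors the config above; group_size is left at its default of -1,
# i.e. no group size.
quantize_config = BaseQuantizeConfig(
    bits=8,
    desc_act=True,           # act order
    true_sequential=True,
    model_file_base_name="WizardLM-7B-Uncensored",
)

tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)
model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)

# Placeholder calibration data; a real run would use a larger,
# representative sample set.
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]

model.quantize(examples)
model.save_quantized("WizardLM-7B-Uncensored-GPTQ", use_safetensors=True)
```
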
Tested for compatibility with:

- WSL with GPTQ-for-Llama, `triton` branch
- Windows with AutoGPTQ on `cuda` (triton deselected)

The AutoGPTQ loader should read its configuration from `quantize_config.json`.\
For GPTQ-for-Llama, use the following settings when loading:\
wbits: 8\
groupsize: None\
model_type: llama
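
For example, loading with AutoGPTQ in Python might look like the following minimal sketch; the local model path and prompt are placeholders, and the tokenizer is assumed to come from the original base repo:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Placeholder path: wherever this quantized repo was downloaded to.
model = AutoGPTQForCausalLM.from_quantized(
    "WizardLM-7B-Uncensored-GPTQ",
    device="cuda:0",
    use_safetensors=True,  # bits/act-order settings come from quantize_config.json
)
tokenizer = AutoTokenizer.from_pretrained("ehartford/WizardLM-7B-Uncensored", use_fast=True)

prompt = "What is GPTQ quantization?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```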