---
license: mit
datasets:
- wikitext
---

[gpt2-medium](https://huggingface.co/openai-community/gpt2-medium) quantized to 4-bit using [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ).

To use, first install AutoGPTQ:

```shell
pip install auto-gptq
```

Then load the model from the hub:

```python
from auto_gptq import AutoGPTQForCausalLM

# Hub repository that holds the 4-bit quantized weights.
model_name = "smpanaro/gpt2-medium-AutoGPTQ-4bit-128g"
model = AutoGPTQForCausalLM.from_quantized(model_name)
```

|Model|4-Bit Perplexity|16-Bit Perplexity|Delta|
|--|--|--|--|
|[smpanaro/gpt2-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/gpt2-AutoGPTQ-4bit-128g)|26.5000|25.1875|1.3125|
|smpanaro/gpt2-medium-AutoGPTQ-4bit-128g|19.1719|18.4739|0.6980|
|[smpanaro/gpt2-large-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/gpt2-large-AutoGPTQ-4bit-128g)|16.6875|16.4541|0.2334|
|[smpanaro/gpt2-xl-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/gpt2-xl-AutoGPTQ-4bit-128g)|14.9297|14.7951|0.1346|

Wikitext perplexity is measured as in the [Hugging Face docs](https://huggingface.co/docs/transformers/en/perplexity); lower is better.
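
To help read the table, here is a minimal sketch of what the numbers mean. It assumes the usual definition of perplexity as the exponential of the mean per-token negative log-likelihood (natural log), and it expresses each Delta as a relative increase over the 16-bit baseline. The helper names `perplexity` and `relative_increase` are illustrative, not part of AutoGPTQ or transformers, and the script does not reproduce the sliding-window evaluation itself.

```python
import math


def perplexity(token_nlls):
    """Perplexity as exp(mean negative log-likelihood per token), natural log."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_increase(ppl_4bit, ppl_16bit):
    """Fractional perplexity increase of the 4-bit model over the 16-bit one."""
    return (ppl_4bit - ppl_16bit) / ppl_16bit


# (4-bit, 16-bit) perplexities copied from the table above.
results = {
    "gpt2": (26.5000, 25.1875),
    "gpt2-medium": (19.1719, 18.4739),
    "gpt2-large": (16.6875, 16.4541),
    "gpt2-xl": (14.9297, 14.7951),
}

for name, (ppl_4bit, ppl_16bit) in results.items():
    delta = ppl_4bit - ppl_16bit
    rel = relative_increase(ppl_4bit, ppl_16bit)
    print(f"{name}: delta={delta:.4f}, relative increase={rel:.2%}")
```

As a sanity check on the definition, a model that assigns probability 1/4 to every token has a perplexity of 4. Seen this way, the table suggests that the quantization penalty shrinks, in both absolute and relative terms, as the model gets larger.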