This model is a 4-bit quantization of GPT-2, created with AutoGPTQ.

You can load this model with the AutoGPTQ library, which you can install with the following command:

```shell
pip install auto-gptq
```

You can then download the quantized model from the Hugging Face Hub and load it onto a GPU with the following code:

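```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name = "mlabonne/gpt2-GPTQ-4bit"

# Quantization only changes the weights, so the tokenizer loads with the standard Transformers API.
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Read the quantization settings (such as bit width and group size) stored with the weights.
quantize_config = BaseQuantizeConfig.from_pretrained(model_name)

# Load the 4-bit weights onto the first GPU.
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    model_basename="gptq_model-4bit-128g",  # checkpoint file name without its extension
    device="cuda:0",
    use_triton=True,        # use the Triton kernels (requires the triton package)
    use_safetensors=True,   # the weights are stored in .safetensors format
    quantize_config=quantize_config,
)
```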
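The `4bit-128g` suffix in the checkpoint name refers to 4-bit weights quantized in groups of 128, where each group has its own scale and zero point. The sketch below illustrates that group-wise layout with plain round-to-nearest quantization in NumPy, under stated assumptions: it is not the GPTQ algorithm itself (GPTQ also updates the not-yet-quantized weights to compensate for rounding error), it stores each 4-bit code in its own `uint8` rather than packing codes densely as real kernels do, and the function names `quantize_groupwise` and `dequantize_groupwise` are hypothetical.

```python
import numpy as np


def quantize_groupwise(w, bits=4, group_size=128):
    """Hypothetical illustration: asymmetric round-to-nearest quantization
    with one scale and one zero point per group of `group_size` weights."""
    assert w.shape[-1] % group_size == 0, "last axis must be divisible by group_size"
    qmax = 2**bits - 1  # 15 distinct steps above 0 for 4-bit codes
    groups = w.reshape(-1, group_size)  # consecutive weights along the last axis
    # Include 0 in the range so the zero point always lands in [0, qmax].
    wmin = np.minimum(groups.min(axis=1, keepdims=True), 0)
    wmax = np.maximum(groups.max(axis=1, keepdims=True), 0)
    scale = (wmax - wmin) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero for all-zero groups
    zero = np.round(-wmin / scale)
    q = np.clip(np.round(groups / scale) + zero, 0, qmax).astype(np.uint8)
    return q, scale, zero


def dequantize_groupwise(q, scale, zero, shape):
    """Map the 4-bit codes back to approximate floating-point weights."""
    return ((q.astype(np.float32) - zero) * scale).reshape(shape)


# Usage: quantize a random weight matrix and measure the reconstruction error.
w = np.random.default_rng(0).normal(size=(64, 256)).astype(np.float32)
q, scale, zero = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, scale, zero, w.shape)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

Because every group gets its own scale, an outlier weight only coarsens the rounding within its group of 128 instead of across the whole matrix; the per-weight error stays within half of that group's scale.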

This model works with the standard Transformers text-generation pipeline.

Example of generation with the input text "I have a dream":

I have a dream. I want someone with my face, and what I have. I want to go home. I want to be alive. I want to see my children. I dream if I have the spirit, my body, my voice,