---
license: mit
language:
- en
- ru
tags:
- gpt3
- transformers
---

# ruGPT-13B-4bit

These files are GPTQ model files for Sberbank's [ruGPT-3.5-13B](https://huggingface.co/ai-forever/ruGPT-3.5-13B) model.

## Technical details

The model was quantized to 4-bit with the [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) library.

## Examples of usage

First, make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:

    GITHUB_ACTIONS=true pip install auto-gptq

Then try the following example code:

```python
from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM

repo_name = "gurgutan/ruGPT-13B-4bit"

# load the tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(repo_name, use_fast=True)

# download the quantized model from the Hugging Face Hub and load it onto the first GPU
model = AutoGPTQForCausalLM.from_quantized(
    repo_name,
    device="cuda:0",
    use_safetensors=True,
    use_triton=False,
)

# inference with model.generate
request = "Буря мглою небо кроет"
inputs = tokenizer(request, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs)[0]))

# alternatively, use a text-generation pipeline
pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(pipeline(request)[0]["generated_text"])
```

# Original model: [ruGPT-3.5 13B](https://huggingface.co/ai-forever/ruGPT-3.5-13B)

A language model for Russian. The model has 13B parameters, as you can guess from its name. This is our biggest model so far, and it was used for training GigaChat (read more about it in the [article](https://habr.com/ru/companies/sberbank/articles/730108/)).
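
To give a feel for what 4-bit quantization means for a model of this size, here is a rough back-of-the-envelope sketch of the weight storage. It relies on assumptions not stated in this card: the nominal figure of 13 billion parameters is taken from the model name, and 16-bit weights are used as the unquantized baseline. The helper `weight_memory_gib` is hypothetical and written only for this illustration. The estimate ignores the per-group scales and zero points that GPTQ checkpoints also store, as well as activations and the KV cache, so real memory use will be somewhat higher.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GiB (hypothetical helper)."""
    return n_params * bits_per_weight / 8 / 2**30


# Assumption: nominal parameter count taken from the model name.
N_PARAMS = 13e9

# Assumption: the unquantized baseline stores weights in 16 bits.
fp16_gib = weight_memory_gib(N_PARAMS, 16)
int4_gib = weight_memory_gib(N_PARAMS, 4)

print(f"16-bit weights: ~{fp16_gib:.1f} GiB")
print(f"4-bit weights:  ~{int4_gib:.1f} GiB")
print(f"reduction:      ~{fp16_gib / int4_gib:.0f}x")
```

Under these assumptions the 4-bit weights take roughly a quarter of the 16-bit footprint, which is the main reason a quantized checkpoint can be easier to fit on a single GPU.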