---
license: mit
language:
- en
- ru
tags:
- gpt3
- transformers
---

# ruGPT-13B-4bit

These are GPTQ model files for Sberbank's [ruGPT-3.5-13B](https://huggingface.co/ai-forever/ruGPT-3.5-13B) model.

## Technical details

The model was quantized to 4-bit with the [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) library.
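This card does not list the exact quantization settings, so the snippet below is only a sketch of a typical AutoGPTQ 4-bit quantization run; the calibration text and the `group_size` / `desc_act` values are illustrative assumptions, not the settings actually used for this checkpoint.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_name = "ai-forever/ruGPT-3.5-13B"
quantized_dir = "ruGPT-13B-4bit"

tokenizer = AutoTokenizer.from_pretrained(pretrained_name, use_fast=True)

# a handful of tokenized calibration examples (the real calibration data is not stated in this card)
examples = [
    tokenizer("Буря мглою небо кроет, вихри снежные крутя.")
]

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit quantization, as stated above
    group_size=128,  # illustrative value, not taken from this card
    desc_act=False,  # illustrative value
)

# load the full-precision model, quantize it, and save the result
model = AutoGPTQForCausalLM.from_pretrained(pretrained_name, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_dir, use_safetensors=True)
```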
## Examples of usage

First make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:
```bash
GITHUB_ACTIONS=true pip install auto-gptq
```
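Since the quantized model below is loaded onto `cuda:0`, it can be worth confirming that your environment sees the GPU before downloading the weights. A minimal sanity check, assuming PyTorch is installed with CUDA support:

```python
# minimal sanity check before loading the 13B quantized model onto cuda:0
import torch
import auto_gptq  # raises ImportError if the install failed

print(torch.cuda.is_available())  # should print True
```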
Then try the following example code:

```python
from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

repo_name = "gurgutan/ruGPT-13B-4bit"

# load the tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(repo_name, use_fast=True)

# download the quantized model from the Hugging Face Hub and load it onto the first GPU
model = AutoGPTQForCausalLM.from_quantized(repo_name, device="cuda:0", use_safetensors=True, use_triton=False)

# inference with model.generate
request = "Буря мглою небо кроет"
print(tokenizer.decode(model.generate(**tokenizer(request, return_tensors="pt").to(model.device))[0]))

# or you can use the pipeline instead
pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(pipeline(request)[0]["generated_text"])
```
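The calls above use the model's default generation settings; the usual `transformers` generation arguments can be passed through as well. A brief sketch, with parameter values that are illustrative rather than tuned for this model:

```python
# pass standard transformers generation arguments; the values below are illustrative
inputs = tokenizer(request, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=64,       # length of the generated continuation
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.8,
    top_p=0.95,
    repetition_penalty=1.1,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```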
# Original model: [ruGPT-3.5 13B](https://huggingface.co/ai-forever/ruGPT-3.5-13B)

A language model for Russian. The model has 13B parameters, as you can guess from its name. This is our biggest model so far, and it was used to train GigaChat (read more about it in this [article](https://habr.com/ru/companies/sberbank/articles/730108/)).