---
license: mit
language:
- en
- ru
tags:
- gpt3
- transformers
---

# ruGPT-13B-4bit

These are GPTQ model files for Sberbank's [ruGPT-3.5-13B](https://huggingface.co/ai-forever/ruGPT-3.5-13B) model.

## Technical details

The model was quantized to 4-bit with the [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) library.
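This card does not list the exact quantization settings, so the snippet below is only a sketch of a typical AutoGPTQ 4-bit quantization run; the calibration text and the `group_size` / `desc_act` values are illustrative assumptions, not the settings actually used for this checkpoint.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_name = "ai-forever/ruGPT-3.5-13B"
quantized_dir = "ruGPT-13B-4bit"

tokenizer = AutoTokenizer.from_pretrained(pretrained_name, use_fast=True)

# a handful of tokenized calibration examples (the real calibration data is not stated in this card)
examples = [
    tokenizer("Буря мглою небо кроет, вихри снежные крутя.")
]

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit quantization, as stated above
    group_size=128,  # illustrative value, not taken from this card
    desc_act=False,  # illustrative value
)

# load the full-precision model, quantize it, and save the result
model = AutoGPTQForCausalLM.from_pretrained(pretrained_name, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_dir, use_safetensors=True)
```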
## Examples of usage

First make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:
```bash
GITHUB_ACTIONS=true pip install auto-gptq
```
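Since the quantized model below is loaded onto `cuda:0`, it can be worth confirming that your environment sees the GPU before downloading the weights. A minimal sanity check, assuming PyTorch is installed with CUDA support:

```python
# minimal sanity check before loading the 13B quantized model onto cuda:0
import torch
import auto_gptq  # raises ImportError if the install failed

print(torch.cuda.is_available())  # should print True
```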
Then try the following example code:

```python
from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

repo_name = "gurgutan/ruGPT-13B-4bit"

# load the tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(repo_name, use_fast=True)

# download the quantized model from the Hugging Face Hub and load it onto the first GPU
model = AutoGPTQForCausalLM.from_quantized(repo_name, device="cuda:0", use_safetensors=True, use_triton=False)

# inference with model.generate
request = "Буря мглою небо кроет"
print(tokenizer.decode(model.generate(**tokenizer(request, return_tensors="pt").to(model.device))[0]))

# or you can use the pipeline instead
pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(pipeline(request)[0]["generated_text"])
```
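The calls above use the model's default generation settings; the usual `transformers` generation arguments can be passed through as well. A brief sketch, with parameter values that are illustrative rather than tuned for this model:

```python
# pass standard transformers generation arguments; the values below are illustrative
inputs = tokenizer(request, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=64,       # length of the generated continuation
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.8,
    top_p=0.95,
    repetition_penalty=1.1,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```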
# Original model: [ruGPT-3.5 13B](https://huggingface.co/ai-forever/ruGPT-3.5-13B)

A language model for Russian. The model has 13B parameters, as you can guess from its name. This is our biggest model so far, and it was used to train GigaChat (read more about it in this [article](https://habr.com/ru/companies/sberbank/articles/730108/)).