---
library_name: transformers
license: gemma
---
|
|
|
# Model Card for Gemma 2 27B (4-bit Quantized)
|
|
|
### Model Description
|
|
|
|
|
|
This is [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) quantized to 4-bit with bitsandbytes (NF4 quantization, double quantization, bfloat16 compute), produced using the following code.
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_name = "google/gemma-2-27b"

# 4-bit NF4 quantization with nested (double) quantization;
# matrix computations run in bfloat16.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto",  # place layers across available devices automatically
)
```
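Once loaded, the quantized model can be used for inference like any other `transformers` causal LM. Below is a minimal sketch; the prompt and generation settings are illustrative assumptions, not part of the original card.

```python
# Minimal inference sketch with the 4-bit model loaded above.
# The prompt and max_new_tokens value are illustrative choices.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

`model.get_memory_footprint()` can be used to check the actual memory usage; with NF4, the weights occupy roughly a quarter of their bfloat16 footprint.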