Requirements
pip install -U transformers optimum auto-gptq
Transformers inference
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Prefer bfloat16 where the GPU supports it; otherwise fall back to float16.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
device = "auto"  # let accelerate place the model on the available device(s)
model_name = "jakiAJK/Microsoft-Phi-4-NyxKrage_GPTQ-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map=device, trust_remote_code=True, torch_dtype=dtype)
model.eval()

chat = [
    {"role": "user", "content": "List any 5 country capitals."},
]
# Render the conversation with the model's chat template, ending with the assistant prompt.
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# Move the inputs to the device the model was placed on instead of hard-coding 'cuda'.
input_tokens = tokenizer(chat, return_tensors="pt").to(model.device)
output = model.generate(**input_tokens, max_new_tokens=100)
output = tokenizer.batch_decode(output)
print(output)
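The decoded output above contains the prompt as well as the reply, because `generate` on a decoder-only model returns the input ids followed by the newly generated ids. The following is a minimal sketch of trimming the prompt before decoding. `strip_prompt` is a hypothetical helper introduced here for illustration, not part of transformers, and the toy ids are made up so the example runs without downloading the model.

```python
def strip_prompt(sequences, prompt_length):
    """Return only the tokens that follow the prompt in each sequence.

    Works on anything that can be sliced per row, e.g. nested lists
    or a 2-D tensor of token ids.
    """
    return [seq[prompt_length:] for seq in sequences]


# Toy example: plain lists stand in for token ids (values are illustrative).
prompt_ids = [[101, 7, 8]]
generated = [[101, 7, 8, 42, 43, 2]]  # prompt ids followed by new ids
reply_ids = strip_prompt(generated, len(prompt_ids[0]))
print(reply_ids)
```

With the real model, the same idea would likely be applied to the raw result of `model.generate` (before it is overwritten by `batch_decode`), using `input_tokens["input_ids"].shape[1]` as the prompt length and passing `skip_special_tokens=True` to `batch_decode` to drop end-of-turn markers.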
Model tree for jakiAJK/Microsoft-Phi-4-NyxKrage_GPTQ-4bit
Base model
NyxKrage/Microsoft_Phi-4