1kbooks/llm-jp-3-13b-finetuned-ver2

Text Generation

text-generation-inference

Inference Endpoints

4-bit precision


1kbooks committed on Dec 16, 2024

Commit b4022cd · verified · 1 Parent(s): 1ba11a0

Update README.md

Files changed (1)

README.md +36 -1

@@ -45,7 +45,42 @@ This is the model card of a 🤗 transformers model that has been pushed on the
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+model_id = "1kbooks/llm-jp-3-13b-finetuned-ver2"
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16,
+)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    quantization_config=bnb_config,
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
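+# Placeholder: replace with your instruction ("ここに指示を入力" = "Enter your instruction here")
+instruction = "ここに指示を入力"
+with torch.no_grad():
+  # Prompt template: "### 指示" = "Instruction", "### 回答" = "Answer"
+  prompt = f"""### 指示\n{instruction}\n### 回答\n"""
+  tokenized_input = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
+  # Single unpadded sequence, so every token is attended to
+  attention_mask = torch.ones_like(tokenized_input)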
+  outputs = model.generate(
+      tokenized_input,
+      attention_mask=attention_mask,
+      max_new_tokens = 512,
+      use_cache = True,
+      do_sample=False,
+      repetition_penalty=1.2,
+      pad_token_id=tokenizer.eos_token_id
+  )
+  prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).split('\n### 回答')[-1]
+print(prediction)
+```
 ### Downstream Use [optional]
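
Note: the prompt template and the answer extraction in the added snippet can be checked without loading the model by isolating them into two small helpers. The following is a minimal sketch under stated assumptions: `build_prompt`, `extract_answer`, and `ANSWER_MARKER` are hypothetical names that do not exist in the repository, and the extra `.strip()` is an addition that the README snippet does not perform.

```python
# Hypothetical helpers (not part of the model repository) that mirror the
# prompt template and post-processing used in the README snippet.

# "### 回答" means "Answer"; the README splits the decoded text on this marker.
ANSWER_MARKER = "\n### 回答"


def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the '### 指示' (Instruction) / '### 回答' (Answer) template."""
    return f"### 指示\n{instruction}\n### 回答\n"


def extract_answer(decoded: str) -> str:
    """Return the text after the last answer marker.

    The .strip() call is an assumption added here to drop the leading newline;
    the README snippet keeps the raw split result.
    """
    return decoded.split(ANSWER_MARKER)[-1].strip()


if __name__ == "__main__":
    # Simulate a decoded generation: the prompt followed by a model answer.
    prompt = build_prompt("日本の首都は?")  # "What is the capital of Japan?"
    decoded = prompt + "東京です。"  # "It is Tokyo."
    print(extract_answer(decoded))
```

In the snippet, `build_prompt(...)` would stand in for the inline f-string and `extract_answer(...)` for the `.split(...)[-1]` expression applied to the decoded output.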