File size: 3,268 Bytes
fc43ce2 a88107e c04d8ce e17a614 fc43ce2 b4f00ea 2da83ca b4f00ea fc43ce2 a88107e |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 |
---
library_name: peft
datasets:
- Xilabs/instructmix
pipeline_tag: text-generation
base_model: openlm-research/open_llama_3b_v2
---
## Model Card for "InstructMix Llama 3B"
**Model Name:** InstructMix Llama 3B
**Description:**
InstructMix Llama 3B is a language model fine-tuned on the InstructMix dataset using parameter-efficient fine-tuning (PEFT), using the base model "openlm-research/open_llama_3b_v2," which can be found at [https://huggingface.co/openlm-research/open_llama_3b_v2](https://huggingface.co/openlm-research/open_llama_3b_v2).
An easy way to use InstructMix Llama 3B is via the API: https://replicate.com/ritabratamaiti/instructmix-llama-3b
**Usage:**
```py
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig
from peft import PeftModel, PeftConfig
# Hugging Face model_path
model_path = 'openlm-research/open_llama_3b_v2'
peft_model_id = 'Xilabs/instructmix-llama-3b'
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
model_path, device_map="auto"
)
model = PeftModel.from_pretrained(model, peft_model_id)
def generate_prompt(instruction, input=None):
if input:
return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:"""
else:
return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:"""
def evaluate(
instruction,
input=None,
temperature=0.5,
top_p=0.75,
top_k=40,
num_beams=5,
max_new_tokens=128,
**kwargs,
):
prompt = generate_prompt(instruction, input)
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].to("cuda")
generation_config = GenerationConfig(
temperature=temperature,
top_p=top_p,
top_k=top_k,
num_beams=num_beams,
early_stopping=True,
repetition_penalty=1.1,
**kwargs,
)
with torch.no_grad():
generation_output = model.generate(
input_ids=input_ids,
generation_config=generation_config,
return_dict_in_generate=True,
output_scores=True,
max_new_tokens=max_new_tokens,
)
s = generation_output.sequences[0]
output = tokenizer.decode(s, skip_special_tokens = True)
#print(output)
return output.split("### Response:")[1]
instruction = "What is the meaning of life?"
print(evaluate(instruction, num_beams=3, temperature=0.1, max_new_tokens=256))
```
## Training procedure
The following `bitsandbytes` quantization config was used during training:
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
### Framework versions
- PEFT 0.4.0 |