---
base_model:
- TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
datasets:
- HuggingFaceH4/ultrachat_200k
library_name: peft
tags:
- ultrachat
pipeline_tag: text-generation
---
# Model Card for TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat
This repository contains **PEFT adapters** trained on the UltraChat 200k dataset on top of a 4-bit quantized TinyLlama-1.1B-intermediate-step-1431k-3T base model.
```python
adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'
```
## Model Details
The base model was quantized with bitsandbytes (4-bit NF4) before the adapters were trained:
```python
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                 # Load the base model in 4-bit precision
    bnb_4bit_quant_type="nf4",         # NF4 quantization type
    bnb_4bit_compute_dtype="float16",  # Compute dtype for dequantized weights
    bnb_4bit_use_double_quant=True,    # Apply nested (double) quantization
)
```
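For reference, below is a minimal sketch of how adapters can be attached to a base model quantized with the config above. The LoRA hyperparameters shown are illustrative assumptions, not the exact settings used to train these adapters.
```python
# Sketch only: the LoRA hyperparameters below are hypothetical
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",
    quantization_config=bnb_config,  # bnb_config from the block above
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)  # Prepare the k-bit model for training

lora_config = LoraConfig(   # Hypothetical hyperparameters
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()
```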
### Model Description
These adapters were trained on the UltraChat 200k dataset for the TinyLlama-1.1B-intermediate-step-1431k-3T model, with the base model loaded in 4-bit precision.
- Finetuned from model: [TinyLlama](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T)
### How to use
#### Method 1: Direct loading via AutoPeftModel
```python
from peft import AutoPeftModelForCausalLM
from transformers import pipeline, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'

# Loads the base model and the adapters in one call
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_name,
    device_map="auto"
)
model = model.merge_and_unload()  # Fold the adapter weights into the base model

prompt = """<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
"""

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
print(pipe(prompt)[0]["generated_text"])
```
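Since `merge_and_unload()` folds the adapter weights into the base model, the merged model can optionally be saved as a standalone checkpoint and reloaded later without `peft`. The path below is a placeholder:
```python
# Optional: persist the merged model for reuse without peft
model.save_pretrained("./tinyllama-ultrachat-merged")      # placeholder path
tokenizer.save_pretrained("./tinyllama-ultrachat-merged")
```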
#### Method 2: Direct loading via AutoModelForCausalLM
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# With peft installed, transformers loads the base model referenced in the
# adapter config and attaches the adapters automatically
model = AutoModelForCausalLM.from_pretrained(
    adapter_name,
    device_map="auto"
)

prompt = """<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
"""

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
print(pipe(prompt)[0]["generated_text"])
```
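As with Method 1, generation parameters can be passed straight through the pipeline call. The values below are illustrative defaults, not tuned settings:
```python
# Illustrative sampling settings; tune as needed
output = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(output[0]["generated_text"])
```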
#### Method 3: Using PeftModel
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                 # Load the base model in 4-bit precision
    bnb_4bit_quant_type="nf4",         # NF4 quantization type
    bnb_4bit_compute_dtype="float16",  # Compute dtype
    bnb_4bit_use_double_quant=True,    # Apply nested (double) quantization
)

model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
)
model = PeftModel.from_pretrained(model, adapter_name)

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

prompt = """<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=256,      # Length of the generated continuation
        do_sample=True,          # Required for temperature/top_p/top_k to take effect
        temperature=0.7,         # Controls randomness: lower = more deterministic
        top_p=0.9,               # Nucleus sampling
        top_k=50,                # Top-K sampling
        num_return_sequences=1,
    )

for i, output in enumerate(outputs):
    generated_text = tokenizer.decode(output, skip_special_tokens=True)
    print(f"--- Generated Sequence {i + 1} ---")
    print(generated_text)
```
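Instead of writing the `<|user|>`/`<|assistant|>` markers by hand, the chat template shipped with the TinyLlama-1.1B-Chat-v1.0 tokenizer can build the same prompt. This is a minimal sketch, assuming that tokenizer's template matches the format the adapters were trained on:
```python
# Build the prompt from a chat-style message list via the tokenizer's template
messages = [
    {"role": "user", "content": "Tell me something about Large Language Models."},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # Appends the assistant turn marker
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```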