---
base_model:
- TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
datasets:
- HuggingFaceH4/ultrachat_200k
library_name: peft
tags:
- ultrachat
pipeline_tag: text-generation
---

# TinyLlama-1.1B UltraChat 200k Adapters

This is a **quantized adapter** trained on the UltraChat 200k dataset for the TinyLlama-1.1B-intermediate-step-1431k-3T model.

```python
adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'
```
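
To check what the adapter targets before downloading the full base model, its configuration can be inspected on its own. A minimal sketch (the printed values are what the card implies, not guaranteed output):

```python
from peft import PeftConfig

# Read only the adapter configuration (a small file) to see what it was trained against
config = PeftConfig.from_pretrained(adapter_name)
print(config.base_model_name_or_path)  # expected: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
print(config.peft_type)                # adapter type, e.g. LORA
```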

## Model Details

The base model was quantized to 4-bit with bitsandbytes using the following configuration:

```python
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                 # Use 4-bit precision for model loading
    bnb_4bit_quant_type="nf4",         # NormalFloat4 quantization
    bnb_4bit_compute_dtype="float16",  # Compute dtype for dequantized activations
    bnb_4bit_use_double_quant=True     # Apply nested (double) quantization
)
```
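
For reference, this kind of config is applied by passing it when the base model is loaded; a minimal sketch of that step (the full adapter-loading flow is shown in Method 3 below):

```python
from transformers import AutoModelForCausalLM

# Quantize the base model on load by passing the config above
base_model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",
    quantization_config=bnb_config,
    device_map="auto",
)
```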

### Model Description

These are quantized adapters trained on the UltraChat 200k dataset for the TinyLlama-1.1B-intermediate-step-1431k-3T model.

- Finetuned from model: [TinyLlama](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T)

### How to use

#### Method 1: Direct loading via AutoPeftModel

```python
from peft import AutoPeftModelForCausalLM
from transformers import pipeline, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'

# Load the base model together with the adapter weights
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_name,
    device_map="auto"
)
model = model.merge_and_unload()  # merge the adapter into the base weights

prompt = """<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
"""

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
print(pipe(prompt)[0]["generated_text"])
```
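
Instead of hard-coding the `<|user|>`/`<|assistant|>` tags, the prompt can also be built from a message list. This is a minimal sketch that assumes the `TinyLlama-1.1B-Chat-v1.0` tokenizer ships a chat template matching the format above:

```python
# Sketch: build the prompt via the tokenizer's chat template instead of manual tags
messages = [{"role": "user", "content": "Tell me something about Large Language Models."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(pipe(prompt)[0]["generated_text"])
```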

#### Method 2: Direct loading via AutoModelForCausalLM

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'

# Transformers resolves the adapter repo and loads the base model plus adapter
model = AutoModelForCausalLM.from_pretrained(
    adapter_name,
    device_map="auto"
)

prompt = """<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
"""

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
print(pipe(prompt)[0]["generated_text"])
```
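
Note: unlike Method 1, this path relies on the transformers PEFT integration (it requires `peft` to be installed) and keeps the adapter attached on top of the base weights rather than merging it, so generation behaves the same but the adapter layers remain separate.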

#### Method 3: Using PeftModel

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                 # Use 4-bit precision for model loading
    bnb_4bit_quant_type="nf4",         # Quantization type
    bnb_4bit_compute_dtype="float16",  # Compute dtype
    bnb_4bit_use_double_quant=True,    # Apply nested quantization
)

model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'

# Load the quantized base model, then attach the adapter
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
)
model = PeftModel.from_pretrained(model, adapter_name)

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

prompt = """<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
        do_sample=True,          # Sampling must be enabled for the settings below to apply
        temperature=0.7,         # Controls randomness: lower = more deterministic
        top_p=0.9,               # Nucleus sampling
        top_k=50,                # Top-K sampling
        num_return_sequences=1,
    )

for i, output in enumerate(outputs):
    generated_text = tokenizer.decode(output, skip_special_tokens=True)
    print(f"--- Generated Sequence {i + 1} ---")
    print(generated_text)
```
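
If a standalone checkpoint is needed (for example, to serve the model without `peft` installed), the adapter can be merged into half-precision base weights and saved. This is a minimal sketch; the output directory name is only an example and `adapter_name` is taken from the snippet above:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Sketch: merge the adapter into fp16 base weights and save a standalone model
merged = AutoPeftModelForCausalLM.from_pretrained(
    adapter_name,
    torch_dtype=torch.float16,
)
merged = merged.merge_and_unload()
merged.save_pretrained("tinyllama-ultrachat-merged")  # example path, not part of this repo

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer.save_pretrained("tinyllama-ultrachat-merged")
```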