---
tags:
- generated_from_trainer
model-index:
- name: Qra-7b-dolly-instruction-0.1
  results: []
datasets:
- s3nh/alpaca-dolly-instruction-only-polish
language:
- pl
inference: true
license: llama2
pipeline_tag: text-generation
---

# Qra-7b-dolly-instruction-0.1

This model is a fine-tuned version of [OPI-PG/Qra-7b](https://huggingface.co/OPI-PG/Qra-7b) on the [s3nh/alpaca-dolly-instruction-only-polish](https://huggingface.co/datasets/s3nh/alpaca-dolly-instruction-only-polish) dataset.

## Model Description

Fine-tuned from [OPI-PG/Qra-7b](https://huggingface.co/OPI-PG/Qra-7b) with LoRA on Polish instruction data.

## Intended uses & limitations

This model has been fine-tuned for single-turn question answering. It can be used as a chat model, but multi-turn quality is poor because the training dataset did not contain conversations.

```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "nie3e/Qra-7b-dolly-instruction-0.1"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline(
    "text-generation", model=model, tokenizer=tokenizer, device=device
)

def get_answer(system_prompt: str, user_prompt: str) -> str:
    input_msg = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]
    prompt = pipe.tokenizer.apply_chat_template(
        input_msg, tokenize=False,
        add_generation_prompt=True
    )
    # Greedy decoding: with do_sample=False the sampling parameters
    # (temperature, top_k, top_p) are ignored, so they are omitted here.
    outputs = pipe(
        prompt, max_new_tokens=512, do_sample=False,
        eos_token_id=pipe.tokenizer.eos_token_id,
        pad_token_id=pipe.tokenizer.pad_token_id
    )
    return outputs[0]['generated_text'][len(prompt):].strip()

print(
    get_answer(
        system_prompt="Jesteś przyjaznym chatbotem",
        user_prompt="Napisz czym jest dokument architectural decision record."
    )
)
```

## Training and evaluation data

Dataset: [s3nh/alpaca-dolly-instruction-only-polish](https://huggingface.co/datasets/s3nh/alpaca-dolly-instruction-only-polish)

Each row was converted into a conversation with this function:
```py
system_message = """Jesteś przyjaznym chatbotem"""

def create_conversation(sample) -> dict:
    strip_characters = "\"'"
    return {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user",
             "content": f"{sample['instruction'].strip(strip_characters)} "
                        f"{sample['input'].strip(strip_characters)}"},
            {"role": "assistant",
             "content": f"{sample['output'].strip(strip_characters)}"}
        ]
    }
```
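
To make the conversion concrete, the sketch below applies the same function to a single made-up row. The row contents are invented for illustration and are not taken from the dataset; the function is repeated so the snippet runs on its own.

```python
system_message = """Jesteś przyjaznym chatbotem"""

def create_conversation(sample) -> dict:
    # Same logic as above: strip surrounding quotes and build a 3-message chat.
    strip_characters = "\"'"
    return {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user",
             "content": f"{sample['instruction'].strip(strip_characters)} "
                        f"{sample['input'].strip(strip_characters)}"},
            {"role": "assistant",
             "content": f"{sample['output'].strip(strip_characters)}"}
        ]
    }

# Hypothetical row, invented for this example (not from the dataset).
row = {
    "instruction": "\"Podaj stolicę kraju.\"",
    "input": "Polska",
    "output": "'Warszawa'",
}

conversation = create_conversation(row)
for message in conversation["messages"]:
    print(f"{message['role']}: {message['content']}")
```

Because the instruction and input fields are always joined with a space, a row with an empty `input` yields a user message that ends in a trailing space.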

Train/test split: 90%/10%

## Training procedure

- GPU: 2x RTX 4060 Ti 16GB
- Training time: ~13 hours
- Model loading: `device_map="auto"`

### Training hyperparameters

LoRA config:
```py
from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=128,
    lora_dropout=0.05,
    r=256,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM"
)
```
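
As a rough reading of these values: in standard LoRA (assumed here, since rank-stabilized scaling is not enabled in the config) the adapter update is multiplied by `lora_alpha / r`, and each adapted linear layer gains `r * (in_features + out_features)` trainable parameters. The sketch below works through that arithmetic. The helper names and the 4096x4096 layer shape are illustrative assumptions, not values read from the model's config.

```python
def lora_scaling(lora_alpha: int, r: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA."""
    return lora_alpha / r

def lora_params_per_layer(in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters of one adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)

scaling = lora_scaling(lora_alpha=128, r=256)

# Assumed layer shape, for illustration only.
extra_params = lora_params_per_layer(in_features=4096, out_features=4096, r=256)

print(f"scaling factor: {scaling}")
print(f"adapter parameters for one 4096x4096 layer: {extra_params:,}")
```

With `lora_alpha` smaller than `r`, the update is scaled down by half, which partly offsets the large rank.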

Training arguments:
```py
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="Qra-7b-dolly-instruction-0.1",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=6,
    gradient_checkpointing=True,
    optim="adamw_torch_fused",
    logging_steps=10,
    save_strategy="epoch",
    learning_rate=2e-4,
    bf16=True,
    tf32=True,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
    push_to_hub=False,
    report_to=["tensorboard"],
)
```
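
These arguments imply an effective batch size of `per_device_train_batch_size * gradient_accumulation_steps` sequences per optimizer step. The sketch below computes that, together with an approximate step count for a hypothetical training-set size. It assumes that `device_map="auto"` splits a single model copy across the two GPUs rather than running data-parallel replicas, so the number of data-parallel workers is taken to be 1. The figure of 10,000 examples is invented and is not the size of the actual dataset.

```python
def effective_batch_size(per_device: int, grad_accum: int, workers: int = 1) -> int:
    """Sequences contributing to each optimizer step."""
    return per_device * grad_accum * workers

def approx_optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Approximate number of optimizer steps, counting only full batches."""
    steps_per_epoch = max(num_examples // batch_size, 1)
    return steps_per_epoch * epochs

# Values from the training arguments above; workers=1 is an assumption.
batch = effective_batch_size(per_device=1, grad_accum=6, workers=1)

# Hypothetical training-set size, for illustration only.
steps = approx_optimizer_steps(num_examples=10_000, batch_size=batch, epochs=3)

print(f"effective batch size: {batch}")
print(f"approximate optimizer steps: {steps}")
```

One detail worth checking when reusing this configuration: `lr_scheduler_type="constant"` appears to keep the learning rate fixed from the first step, so `warmup_ratio` probably has no effect unless the scheduler is changed to `constant_with_warmup`.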


### Framework versions

- Transformers 4.39.2
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2