---
language:
- en
library_name: peft
pipeline_tag: text-generation
tags:
- medical
license: cc-by-nc-3.0
---

# MedFalcon v2.1a 40b LoRA - Step 4500

![img.png](img.png)

## Model Description

This is a model checkpoint release at 4500 steps, intended for evaluation use only. Limitations:
* LoRA output will be more concise than the base model
* Due to the size of the LoRA, base knowledge from falcon-40b may be overwritten
* Due to the size of the LoRA, more hardware may be required to load falcon-40b when using it

### Architecture
`nmitchko/medfalconv2-1a-40b-lora` is a LoRA for a large language model, fine-tuned specifically for medical domain tasks.
It is based on [`Falcon-40b`](https://huggingface.co/tiiuae/falcon-40b), which has 40 billion parameters.

The primary goal of this model is to improve question-answering and medical dialogue tasks.
It was trained using [LoRA](https://arxiv.org/abs/2106.09685), specifically [QLoRA](https://github.com/artidoro/qlora), to reduce the memory footprint.

See Training Parameters for more information. This LoRA supports 4-bit and 8-bit modes.

### Requirements

```
bitsandbytes>=0.39.0
peft
transformers
```

Steps to load this model:
1. Load base model using transformers
2. Apply LoRA using peft

```python
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "tiiuae/falcon-40b"
lora_id = "nmitchko/medfalconv2-1a-40b-lora"

# Set to True to load the base model in 8-bit (requires bitsandbytes);
# for 4-bit, pass load_in_4bit=True instead
load_8bit = True

tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# 1. Load the base model; device_map="auto" places the weights on the available devices
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    load_in_8bit=load_8bit,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)

# 2. Apply the LoRA on top of the base model
model = PeftModel.from_pretrained(model, lora_id)

# The model is already loaded and placed, so the pipeline only needs the model and tokenizer
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

sequences = pipeline(
    "What does the drug ceftriaxone do?\nDoctor:",
    max_length=200,
    do_sample=True,
    top_k=40,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
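
For 4-bit mode, one option is to pass a `BitsAndBytesConfig` instead of the `load_in_8bit` flag. The following is a minimal sketch, not the author's exact setup: the helper names `quantization_kwargs` and `load_base_model` are hypothetical, and the `nf4` and double-quantization values are an assumption that simply mirrors the `--quant_type nf4`, `--double_quant`, and `--bf16` flags from the training script.

```python
def quantization_kwargs(bits: int) -> dict:
    """Return keyword arguments for transformers.BitsAndBytesConfig (hypothetical helper)."""
    if bits == 8:
        return {"load_in_8bit": True}
    if bits == 4:
        # Assumption: reuse the quantization settings from the training script
        return {
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_use_double_quant": True,
        }
    raise ValueError(f"Unsupported bit width: {bits} (expected 4 or 8)")


def load_base_model(bits: int = 4, base_model_id: str = "tiiuae/falcon-40b"):
    """Load the quantized base model; needs a GPU, bitsandbytes, and accelerate."""
    # Heavy imports are local so quantization_kwargs() can be used without them
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    kwargs = quantization_kwargs(bits)
    if bits == 4:
        # Compute in bfloat16 while the weights stay stored in 4-bit
        kwargs["bnb_4bit_compute_dtype"] = torch.bfloat16

    return AutoModelForCausalLM.from_pretrained(
        base_model_id,
        quantization_config=BitsAndBytesConfig(**kwargs),
        device_map="auto",
        trust_remote_code=True,
    )
```

The returned model can then be passed to `PeftModel.from_pretrained` in the same way as the 8-bit model above.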

## Training Parameters 

The model was trained for 4500 steps (1 epoch) on a custom, unreleased dataset named `medconcat`.
`medconcat` contains only human-generated content and weighs in at over 100 MiB of raw text.

Training ran in `4bit` mode for a rather large LoRA. The table lists the key hyperparameters, and the bash script after it is the one that initiated training:

| Item          | Amount | Units |
|---------------|--------|-------|
| LoRA Rank     | 128    | ~     |
| LoRA Alpha    | 256    | ~     |
| Learning Rate | 1e-4   | ~     |
| Dropout       | 5      | %     |

```bash
CURRENTDATEONLY=`date +"%b %d %Y"`

sudo nvidia-smi -i 1 -pl 250

export CUDA_VISIBLE_DEVICES=0

nohup python qlora.py \
    --model_name_or_path models/tiiuae_falcon-40b \
    --output_dir ./loras/medfalcon2.1a-40b \
    --logging_steps 100 \
    --save_strategy steps \
    --data_seed 42 \
    --save_steps 200 \
    --save_total_limit 40 \
    --evaluation_strategy steps \
    --eval_dataset_size 1024 \
    --max_eval_samples 1000 \
    --per_device_eval_batch_size 1 \
    --max_new_tokens 32 \
    --dataloader_num_workers 3 \
    --group_by_length \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --lora_r 128 \
    --lora_alpha 256 \
    --lora_modules all \
    --double_quant \
    --quant_type nf4 \
    --bf16 \
    --bits 4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --gradient_checkpointing \
    --dataset="training/datasets/medconcat/" \
    --dataset_format alpaca \
    --trust_remote_code=True \
    --source_max_len 16 \
    --target_max_len 512 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --max_steps 4500 \
    --eval_steps 1000 \
    --learning_rate 0.0001 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --lora_dropout 0.05 \
    --weight_decay 0.0 \
    --seed 0 > "${CURRENTDATEONLY}-finetune-medfalcon2.1a.log" &
```
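
The flags in the script above imply a few derived quantities that are useful when reproducing or comparing runs. This is a back-of-the-envelope sketch: it assumes a single GPU (based on `CUDA_VISIBLE_DEVICES=0`), and the token count is only an upper bound because it assumes every example fills both length limits.

```python
# Values copied from the training script
per_device_train_batch_size = 1
gradient_accumulation_steps = 16
max_steps = 4500
save_steps = 200
eval_steps = 1000
source_max_len = 16
target_max_len = 512

num_devices = 1  # assumption: one visible GPU

# Examples consumed per optimizer step
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# Total examples seen over the whole run
examples_seen = effective_batch * max_steps
# Periodic checkpoints and evaluations triggered during training
periodic_checkpoints = max_steps // save_steps
periodic_evals = max_steps // eval_steps
# Upper bound on tokens processed, if every example hit both length limits
max_tokens = examples_seen * (source_max_len + target_max_len)

print(f"effective batch size: {effective_batch}")     # effective batch size: 16
print(f"examples seen: {examples_seen:,}")             # examples seen: 72,000
print(f"periodic checkpoints: {periodic_checkpoints}") # periodic checkpoints: 22
print(f"periodic evaluations: {periodic_evals}")       # periodic evaluations: 4
print(f"token upper bound: {max_tokens:,}")            # token upper bound: 38,016,000
```

If 4500 steps do correspond to one epoch, as stated above, these numbers suggest a training set of roughly 72,000 examples.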