---
license: apache-2.0
language:
- en
- fi
base_model:
- LumiOpen/Poro-34B
datasets:
- sablo/oasst2_curated
- LumiOpen/instruction-collection-fin
---
This is an SFT-tuned version of [Poro-34B](https://huggingface.co/LumiOpen/Poro-34B) trained on English and Finnish data. We trained this model as part of our experiments on the impact of multilingual instruction tuning on Poro-34B. For a better chat experience, we recommend using [Poro-34B-chat](https://huggingface.co/LumiOpen/Poro-34B-chat) instead.
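A minimal generation sketch with Transformers; the repository id is a placeholder and the use of a chat template is an assumption, so check the tokenizer config for the exact prompt format used in training:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LumiOpen/..."  # placeholder: replace with this model's Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Assumes a chat template is defined in the tokenizer config; otherwise format the
# prompt manually to match the SFT training format.
messages = [{"role": "user", "content": "Kerro lyhyesti Suomen historiasta."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```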
## Datasets
### SFT
We use a curated subset of Open Assistant 2 (oasst2) in English, together with a Finnish version of the same data translated with Poro-34B.
- **English**: [oasst2_curated](https://huggingface.co/datasets/sablo/oasst2_curated)
- **Finnish**: [instruction-collection-fin](https://huggingface.co/datasets/LumiOpen/instruction-collection-fin) (oasst2 subset)
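Both datasets are available on the Hugging Face Hub. A minimal loading sketch; the split name and the column used to select the oasst2 subset of the Finnish collection are assumptions, so check the dataset cards for the actual schema:

```python
from datasets import load_dataset

# English SFT data: curated OASST2 subset
oasst2_en = load_dataset("sablo/oasst2_curated", split="train")

# Finnish SFT data: the oasst2 portion of the Finnish instruction collection.
# The "source" column name is an assumption; verify it against the dataset card.
fin_collection = load_dataset("LumiOpen/instruction-collection-fin", split="train")
oasst2_fi = fin_collection.filter(lambda ex: ex.get("source") == "oasst2")
```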
### DPO
We use the HelpSteer2 preference data, binarized into chosen-rejected pairs based on the helpfulness score as recommended in the [HelpSteer2](https://arxiv.org/abs/2406.08673) paper. We translated the dataset into Finnish using Poro-34B.
- **English**: [HelpSteer2](https://huggingface.co/datasets/nvidia/HelpSteer2)
- **Finnish**: TBA
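For reference, a minimal sketch of this binarization, assuming HelpSteer2's released schema (prompt, response, and helpfulness columns, with each prompt scored on two responses); the exact preprocessing used for training may differ:

```python
from collections import defaultdict
from datasets import load_dataset

helpsteer2 = load_dataset("nvidia/HelpSteer2", split="train")

# Group the scored responses by prompt.
by_prompt = defaultdict(list)
for row in helpsteer2:
    by_prompt[row["prompt"]].append((row["helpfulness"], row["response"]))

# Keep the higher-helpfulness response as "chosen" and the lower as "rejected";
# drop prompts whose responses tie on helpfulness.
pairs = []
for prompt, responses in by_prompt.items():
    if len(responses) < 2:
        continue
    responses.sort(reverse=True)
    (hi_score, chosen), (lo_score, rejected) = responses[0], responses[-1]
    if hi_score == lo_score:
        continue
    pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
```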
## Recipes
For finetuning, we used 4 nodes (8 x AMD MI250X) to obtain a global batch size of 128 for SFT and 64 for DPO. We used the [Alignment Handbook](https://github.com/huggingface/alignment-handbook/) codebase.
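As a quick check, the global batch sizes follow from the per-device settings in the two configs below, assuming one training process per device and 8 devices per node:

```python
nodes, devices_per_node = 4, 8  # assumption: one training process per device

# SFT: per_device_train_batch_size=2, gradient_accumulation_steps=2
print(nodes * devices_per_node * 2 * 2)  # 128

# DPO: per_device_train_batch_size=2, gradient_accumulation_steps=1
print(nodes * devices_per_node * 2 * 1)  # 64
```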
### SFT
```yaml
bf16: true
do_eval: true
evaluation_strategy: epoch
gradient_accumulation_steps: 2
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
learning_rate: 2.0e-05
log_level: info
logging_steps: 50
logging_strategy: steps
lr_scheduler_type: cosine
max_seq_length: 2048
max_steps: -1
num_train_epochs: 3
output_dir: data/poro-sft-oasst2
overwrite_output_dir: true
per_device_eval_batch_size: 4
per_device_train_batch_size: 2
remove_unused_columns: true
save_strategy: "epoch"
save_total_limit: 1
seed: 42
warmup_ratio: 0.1
```
### DPO
```yaml
bf16: true
beta: 0.05
do_eval: true
evaluation_strategy: epoch
gradient_accumulation_steps: 1
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
learning_rate: 5.0e-7
log_level: info
logging_steps: 20
lr_scheduler_type: cosine
max_length: 1024
max_prompt_length: 512
num_train_epochs: 5
optim: adamw_torch
output_dir: data/poro-dpo-helpsteer2
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
save_strategy: "epoch"
save_total_limit: 1
seed: 42
warmup_ratio: 0.1
```
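For context, `beta: 0.05` above scales the standard sigmoid DPO objective. The sketch below only illustrates what `beta` controls (the actual training ran through the Alignment Handbook's DPO trainer), using toy sequence log-probabilities:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.05):
    """Sigmoid DPO objective: beta scales how strongly the policy is pushed to
    prefer the chosen response relative to the frozen reference model."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy example with per-sequence log-probabilities.
print(dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
               torch.tensor([-13.0]), torch.tensor([-14.0])))
```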
## Evaluation
We use [IFEval](https://huggingface.co/datasets/google/IFEval) to evaluate the model in English. For Finnish, we use [a DeepL translation of the IFEval prompts](https://huggingface.co/datasets/LumiOpen/ifeval_mt). We report instruction-level strict accuracy:
- **English**: 0.3997
- **Finnish**: 0.3448
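Instruction-level strict accuracy treats every verifiable instruction separately (a single prompt can contain several) and counts it only if it is followed under strict checking. A minimal sketch of the aggregation, assuming per-instruction booleans produced by an IFEval checker:

```python
def instruction_level_strict_accuracy(per_prompt_results):
    """per_prompt_results: one inner list per prompt, one boolean per instruction,
    True if that instruction was strictly followed."""
    flat = [followed for prompt in per_prompt_results for followed in prompt]
    return sum(flat) / len(flat)

# e.g. two prompts with 2 and 1 verifiable instructions respectively
print(instruction_level_strict_accuracy([[True, False], [True]]))  # 0.666...
```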
## Citation
We discuss our experimental setup and results in our NoDaLiDa 2025 paper:
```bibtex
@inproceedings{
zosa2024got,
title={Got Compute, but No Data: Lessons From Post-training a Finnish {LLM}},
author={Elaine Zosa and Ville Komulainen and Sampo Pyysalo},
booktitle={The Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies},
year={2024},
url={https://openreview.net/forum?id=8wWlu1stNK}
}
``` |