---
license: bigscience-openrail-m
language:
- en
- zh
- ja
tags:
- sft
pipeline_tag: text-generation
widget:
- text: >-
    <|prompter|>What is a meme, and what's the history behind this
    word?</s><|assistant|>
- text: <|prompter|>What's the Earth total population</s><|assistant|>
- text: >-
    <|prompter|>Write a story about future of AI
    development</s><|assistant|>
datasets:
- OpenAssistant/oasst1
- databricks/databricks-dolly-15k
- anon8231489123/ShareGPT_Vicuna_unfiltered
- LIUM/tedlium
- theblackcat102/joke_explaination
---
# Bloom-3B SFT model
![conversation example](https://huggingface.co/ikala/bloom-zh-3b-chat/resolve/main/bloom-chat-example.png)
This model is based on Bloom-zh 3B ([ckip-joint/bloom-3b-zh](https://huggingface.co/ckip-joint/bloom-3b-zh)) and was fine-tuned on human demonstrations of assistant conversations collected through the [https://open-assistant.io/](https://open-assistant.io/) human feedback web app before April 12, 2023.

Supervised fine-tuning was done with a sequence length of 5,120 tokens.
## Model Details
- **Developed by:** [Open-Assistant Contributors](https://open-assistant.io/team) and [iKala](https://ikala.ai/)
- **Model type:** Transformer-based Language Model
- **Language:** English, Chinese, Japanese
- **Finetuned from:** [ckip-joint/bloom-3b-zh](https://huggingface.co/ckip-joint/bloom-3b-zh)
- **Code:** [Open-Assistant/model/model_training](https://github.com/LAION-AI/Open-Assistant/tree/main/model/model_training)
- **License:** MEDIATEK RESEARCH License ([link](https://huggingface.co/ckip-joint/bloom-3b-zh/blob/main/LICENSE_MR.md)) and RAIL License v1.0 ([link](https://huggingface.co/spaces/bigscience/license)), non-commercial use only
## Prompting
Two special tokens are used to mark the beginning of user and assistant turns:
`<|prompter|>` and `<|assistant|>`. Each turn ends with a `</s>` token.
Input prompt example:
```
<|prompter|>What is a meme, and what's the history behind this word?</s><|assistant|>
```
The input ends with the `<|assistant|>` token to signal that the model should
start generating the assistant reply.
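A minimal inference sketch with the standard `transformers` API is shown below; the generation parameters (sampling, `max_new_tokens`, etc.) are illustrative assumptions, not values used by the authors.
```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ikala/bloom-zh-3b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Build the prompt using the turn tokens described above.
prompt = "<|prompter|>What is a meme, and what's the history behind this word?</s><|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt")

# Sampling settings here are assumptions chosen for illustration.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)

# Decode only the newly generated assistant reply.
reply = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(reply)
```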
## Benchmark
| model | MMLU | BBH | HumanEval (pass@10) |
|---|---|---|---|
| [ikala/redpajama-3b-chat](https://huggingface.co/ikala/redpajama-3b-chat) | 24.6 | 29.3 | 4.8 |
| [ikala/bloom-zh-3b-chat](https://huggingface.co/ikala/bloom-zh-3b-chat) | 31.4 | 30.2 | 0.0 |
| llama-7b (reference) | 30.9 | 27.6 | 10.3 |
## Dev Details
- base model: [ckip-joint/bloom-3b-zh](https://huggingface.co/ckip-joint/bloom-3b-zh)
- checkpoint: 1 epoch (6000 steps)
- hardware: NVIDIA RTX A6000 x 4
- command: `deepspeed trainer_sft.py --configs defaults bloom-zh-3b datasets --num_train_epochs 2 --deepspeed`
- data config:
```
datasets:
- wmt2019_zh-en:
max_val_set: 1000
max_train_set: 20000
- ted_trans_en-ja:
max_val_set: 1000
max_train_set: 20000
- ted_trans_zh-ja:
max_val_set: 1000
max_train_set: 20000
- ikala:
input_file_path: export_conversation_v4.4.jsonl
val_split: 0.05
- dolly15k:
val_split: 0.05
- oasst_export:
lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk,zh,ja,th,ko"
input_file_path: 2023-04-12_oasst_release_ready_synth.jsonl.gz
val_split: 0.05
- joke
- gsm8k
- webgpt
```
The `ikala` dataset is internal, so if you want to reproduce this training, remove it from the data config.
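For intuition, the sketch below shows how a translation pair from a dataset such as `wmt2019_zh-en` could be rendered into the prompt format documented above. This is purely hypothetical illustration; the actual dataset formatting lives in [Open-Assistant/model/model_training](https://github.com/LAION-AI/Open-Assistant/tree/main/model/model_training) and may differ.
```
# Hypothetical example only: turn a (source, target) translation pair into
# the <|prompter|>/<|assistant|> chat format used by this model.
def format_translation_example(source_text: str, target_text: str) -> str:
    instruction = f"Translate the following sentence into English: {source_text}"
    return f"<|prompter|>{instruction}</s><|assistant|>{target_text}</s>"

print(format_translation_example("今天天氣很好。", "The weather is nice today."))
```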
Model config (`bloom-zh-3b`):
```
bloom-zh-3b:
dtype: fp16
log_dir: "bloom-zh_3b"
learning_rate: 8e-6
model_name: ckip-joint/bloom-3b-zh
output_dir: bloom_model_v4_3b
weight_decay: 0.0
max_length: 5120
warmup_steps: 2000
gradient_checkpointing: true
gradient_accumulation_steps: 32
per_device_train_batch_size: 1
per_device_eval_batch_size: 1
eval_steps: 500
save_steps: 1000
num_train_epochs: 8
save_total_limit: 2
deepspeed_config: configs/zero3_config_sft.json
```
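As a back-of-the-envelope check, the effective batch size implied by this config and the 4x RTX A6000 setup listed above works out as follows (values copied from this card):
```
# Effective batch size per optimizer step.
per_device_train_batch_size = 1
gradient_accumulation_steps = 32
num_gpus = 4

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(effective_batch_size)  # 128 sequences of up to 5120 tokens per step
```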
DeepSpeed ZeRO-3 config (`configs/zero3_config_sft.json`):
```
{
"fp16": {
"enabled": "auto",
"loss_scale": 0,
"loss_scale_window": 1000,
"initial_scale_power": 16,
"hysteresis": 2,
"min_loss_scale": 1
},
"bf16": {
"enabled": "auto"
},
"optimizer": {
"type": "AdamW",
"params": {
"lr": "auto",
"betas": "auto",
"eps": "auto",
"weight_decay": "auto"
}
},
"scheduler": {
"type": "WarmupDecayLR",
"params": {
"warmup_min_lr": "auto",
"warmup_max_lr": "auto",
"warmup_num_steps": "auto",
"warmup_type": "linear",
"total_num_steps": "auto"
}
},
"zero_optimization": {
"stage": 3,
"overlap_comm": true,
"contiguous_gradients": true,
"sub_group_size": 1e9,
"reduce_bucket_size": "auto",
"stage3_prefetch_bucket_size": "auto",
"stage3_param_persistence_threshold": "auto",
"stage3_max_live_parameters": 1e9,
"stage3_max_reuse_distance": 1e9,
"stage3_gather_16bit_weights_on_model_save": true
},
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"steps_per_print": 2000,
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto",
"wall_clock_breakdown": false
}
```
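The `"auto"` placeholders in this JSON are resolved by the Hugging Face Trainer from the training arguments at launch time. Below is a minimal sketch of how such a config is typically wired in; the actual run used the Open-Assistant `trainer_sft.py` entry point, so everything other than the `deepspeed` path argument is an assumption for illustration.
```
from transformers import TrainingArguments

# Values mirror the bloom-zh-3b config above; "auto" fields in the JSON
# (lr, batch sizes, gradient accumulation, ...) are filled from these.
training_args = TrainingArguments(
    output_dir="bloom_model_v4_3b",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,
    learning_rate=8e-6,
    fp16=True,
    deepspeed="configs/zero3_config_sft.json",
)
```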