---
license: apache-2.0
---
**Note: internal model, not ready for use**

This is an intermediate model used as the base model for further pythia 12b SFT-8 experiments.
It was trained on a wider set of instruction-tuning datasets for more than 12.5k steps with a batch size of 128 and a context size of 2048.
The gpt4all dataset had "as a language model" *contamination* (>1.8k entries). We added filtering later, but this model (pre-v8) was trained on the raw, unfiltered gpt4all dataset.
- wandb: https://wandb.ai/open-assistant/supervised-finetuning/runs/sytsyhrp
- [sampling report](https://open-assistant.github.io/oasst-model-eval/?f=https%3A%2F%2Fraw.githubusercontent.com%2FOpen-Assistant%2Foasst-model-eval%2Fmain%2Fsampling_reports%2Foasst-pretrained%2F2023-05-05_OpenAssistant_pythia-12b-pre-v8-12_5k-steps_sampling_noprefix2.json)
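
The filtering mentioned above is not shown in this card. As a rough illustration only, a phrase-based filter could look like the sketch below. The phrase is taken from the note above; the function names, the `response` field, and the sample entries are hypothetical and do not come from the actual training code.

```python
from typing import Dict, List

# Phrase taken from the note above; everything else here is an assumption.
CONTAMINATION_PHRASE = "as a language model"


def is_contaminated(text: str) -> bool:
    """Return True if the text contains the phrase (case-insensitive)."""
    return CONTAMINATION_PHRASE in text.lower()


def filter_entries(entries: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only entries whose (hypothetical) 'response' field is clean."""
    return [e for e in entries if not is_contaminated(e.get("response", ""))]


# Made-up sample entries, for illustration only.
sample = [
    {"prompt": "Tell me a joke.", "response": "Why did the chicken cross the road?"},
    {"prompt": "How do you feel?", "response": "As a language model, I have no feelings."},
]
clean = filter_entries(sample)
```

A plain substring match like this would also drop legitimate answers that merely discuss language models, so a real filter may need to be more selective.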
Datasets:
```
pretrain:
  num_train_epochs: 1
  weight_decay: 0.0
  use_custom_sampler: true
  sort_by_length: false
  datasets:
    - gpteacher_roleplay:
        val_split: 0.05
    - red_pajama:
        fraction: 0.25
        max_val_set: 1000
    - wizardlm_70k:
        val_split: 0.05
        max_val_set: 500
    - joke:
        val_split: 0.05
    - poem_instructions:
        val_split: 0.025
    - oa_stackexchange:
        val_split: 0.05
        fraction: 0.1
        max_val_set: 1000
    - tell_a_joke:
        val_split: 0.05
        max_val_set: 250
    - webgpt:
        val_split: 0.05
        max_val_set: 250
    - gpt4all:
        val_split: 0.01
        max_val_set: 1000
    - alpaca_gpt4:
        val_split: 0.025
        max_val_set: 250
    - code_alpaca:
        val_split: 0.05
        max_val_set: 250
    - vicuna:
        max_val_set: 250
    - oig_file:
        source_url: https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl
        max_count: 10000
        min_length: 250
        val_split: 0.05
        max_val_set: 250
    - minimath:
        val_split: 0.05
    - humaneval_mbpp_codegen_qa:
        val_split: 0.05
    - humaneval_mbpp_testgen_qa:
        val_split: 0.05
    - grade_school_math_instructions:
        val_split: 0.05
    - recipes:
        val_split: 0.05
    - cmu_wiki_qa:
        val_split: 0.05
    - oa_wiki_qa_bart_10000row:
        val_split: 0.05
        max_val_set: 250
    - prosocial_dialogue:
        fraction: 0.1
        max_val_set: 250
    - explain_prosocial:
        fraction: 0.075
        max_val_set: 250
    - soda:
        fraction: 0.25
        max_val_set: 1000
    - oa_leet10k:
        val_split: 0.05
        max_val_set: 250
    - dolly15k:
        val_split: 0.05
        max_val_set: 300
```
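
The per-dataset keys above are not documented in this card. The sketch below shows one plausible reading, stated as an assumption rather than the trainer's confirmed behavior: `val_split` is the share of examples held out for validation, `max_val_set` caps the validation set size, and `fraction` subsamples the remaining training portion. The helper `split_sizes` and the dataset size of 10,000 are hypothetical. Entries that omit `val_split` are not covered here.

```python
from typing import Optional, Tuple


def split_sizes(
    n_examples: int,
    val_split: float,
    max_val_set: Optional[int] = None,
    fraction: float = 1.0,
) -> Tuple[int, int]:
    """Return (n_train, n_val) under the assumed meaning of the config keys."""
    # Hold out a share of the data for validation.
    n_val_raw = int(n_examples * val_split)
    n_train_raw = n_examples - n_val_raw
    # Assumed: max_val_set caps the validation set after the split.
    n_val = n_val_raw if max_val_set is None else min(n_val_raw, max_val_set)
    # Assumed: fraction subsamples only the training portion.
    n_train = int(n_train_raw * fraction)
    return n_train, n_val


# Hypothetical dataset of 10,000 examples, using the oa_stackexchange-style keys.
n_train, n_val = split_sizes(10_000, val_split=0.05, max_val_set=250, fraction=0.1)
```

Under this reading, large datasets contribute at most `max_val_set` validation examples no matter how large `val_split` makes the held-out share.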
Pythia:
```
pythia-12b-pretrain:
  dtype: fp16
  log_dir: "pythia_log_12b"
  learning_rate: 6e-6
  model_name: EleutherAI/pythia-12b-deduped
  output_dir: pythia_model_12b
  weight_decay: 0.0
  max_length: 2048
  warmup_steps: 100
  gradient_checkpointing: true
  gradient_accumulation_steps: 4
  per_device_train_batch_size: 4
  per_device_eval_batch_size: 4
  eval_steps: 251
  save_steps: 500
  num_train_epochs: 1
  save_total_limit: 2
  deepspeed_config: configs/zero_config_pretrain.json
```
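
The batch size of 128 stated at the top of this card can be related to the config values above with a little arithmetic. The sketch below assumes plain data parallelism, where the global batch is the per-device batch times the accumulation steps times the device count. The device count is not stated anywhere in this card; the value computed here is only what those numbers would imply.

```python
# Values taken from the config and the description in this card.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
global_batch_size = 128
max_length = 2048
steps = 12_500  # the card says "more than 12.5k", so treat this as a floor

# Sequences each device contributes to one optimizer step.
per_device_per_step = per_device_train_batch_size * gradient_accumulation_steps

# Assumption: pure data parallelism, so the rest of the batch comes from devices.
implied_devices = global_batch_size // per_device_per_step

# Sequences processed after `steps` optimizer steps.
sequences_seen = steps * global_batch_size

# Token count only if every sequence filled the full context, which is unlikely.
tokens_if_full_context = sequences_seen * max_length

print(implied_devices, sequences_seen, tokens_if_full_context)
```

Since instruction samples are usually shorter than 2048 tokens, the real token count is likely well below the full-context figure.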
Command used: `deepspeed trainer_sft.py --show_dataset_stats --configs defaults pythia-12b-pretrain pretrain --cache_dir .cache/ --output_dir .saved/pythia-12b-super-pretrain2 --deepspeed`