---
license: apache-2.0
---

**Note: internal model, not ready for use**

This is an intermediate model used as the base model for further Pythia 12B SFT-8 experiments.
It was trained on a wider set of instruction-tuning datasets for >12.5k steps with a batch size of 128 and a context size of 2048.
The gpt4all dataset had "as a language model" *contamination* (>1.8k entries). We added filtering later, but this model (pre-v8) was trained on the raw, unfiltered gpt4all dataset.

- wandb: https://wandb.ai/open-assistant/supervised-finetuning/runs/sytsyhrp
- [sampling report](https://open-assistant.github.io/oasst-model-eval/?f=https%3A%2F%2Fraw.githubusercontent.com%2FOpen-Assistant%2Foasst-model-eval%2Fmain%2Fsampling_reports%2Foasst-pretrained%2F2023-05-05_OpenAssistant_pythia-12b-pre-v8-12_5k-steps_sampling_noprefix2.json)

Datasets:
```
pretrain:
  num_train_epochs: 1
  weight_decay: 0.0
  use_custom_sampler: true
  sort_by_length: false
  datasets:
    - gpteacher_roleplay:
        val_split: 0.05
    - red_pajama:
        fraction: 0.25
        max_val_set: 1000
    - wizardlm_70k:
        val_split: 0.05
        max_val_set: 500
    - joke:
        val_split: 0.05
    - poem_instructions:
        val_split: 0.025
    - oa_stackexchange:
        val_split: 0.05
        fraction: 0.1
        max_val_set: 1000
    - tell_a_joke:
        val_split: 0.05
        max_val_set: 250
    - webgpt:
        val_split: 0.05
        max_val_set: 250
    - gpt4all:
        val_split: 0.01
        max_val_set: 1000
    - alpaca_gpt4:
        val_split: 0.025
        max_val_set: 250
    - code_alpaca:
        val_split: 0.05
        max_val_set: 250
    - vicuna:
        max_val_set: 250
    - oig_file:
        source_url: https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl
        max_count: 10000
        min_length: 250
        val_split: 0.05
        max_val_set: 250
    - minimath:
        val_split: 0.05
    - humaneval_mbpp_codegen_qa:
        val_split: 0.05
    - humaneval_mbpp_testgen_qa:
        val_split: 0.05
    - grade_school_math_instructions:
        val_split: 0.05
    - recipes:
        val_split: 0.05
    - cmu_wiki_qa:
        val_split: 0.05
    - oa_wiki_qa_bart_10000row:
        val_split: 0.05
        max_val_set: 250
    - prosocial_dialogue:
        fraction: 0.1
        max_val_set: 250
    - explain_prosocial:
        fraction: 0.075
        max_val_set: 250
    - soda:
        fraction: 0.25
        max_val_set: 1000
    - oa_leet10k:
        val_split: 0.05
        max_val_set: 250
    - dolly15k:
        val_split: 0.05
        max_val_set: 300
```

Pythia:
```
pythia-12b-pretrain:
  dtype: fp16
  log_dir: "pythia_log_12b"
  learning_rate: 6e-6
  model_name: EleutherAI/pythia-12b-deduped
  output_dir: pythia_model_12b
  weight_decay: 0.0
  max_length: 2048
  warmup_steps: 100
  gradient_checkpointing: true
  gradient_accumulation_steps: 4
  per_device_train_batch_size: 4
  per_device_eval_batch_size: 4
  eval_steps: 251
  save_steps: 500
  num_train_epochs: 1
  save_total_limit: 2
  deepspeed_config: configs/zero_config_pretrain.json
```

Command used: `deepspeed trainer_sft.py --show_dataset_stats --configs defaults pythia-12b-pretrain pretrain --cache_dir .cache/ --output_dir .saved/pythia-12b-super-pretrain2 --deepspeed`

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_OpenAssistant__pythia-12b-pre-v8-12.5k-steps)

| Metric                | Value |
|-----------------------|-------|
| Avg.                  | 35.93 |
| ARC (25-shot)         | 41.47 |
| HellaSwag (10-shot)   | 68.8  |
| MMLU (5-shot)         | 26.58 |
| TruthfulQA (0-shot)   | 36.82 |
| Winogrande (5-shot)   | 65.27 |
| GSM8K (5-shot)        | 7.66  |
| DROP (3-shot)         | 4.89  |
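
Two numbers in this card can be cross-checked with simple arithmetic; the sketch below does so using only values copied from the card. The helper names are hypothetical and not part of the training code. The device count is an inference, not something the card states: it assumes the batch size of 128 is the global batch, i.e. `per_device_train_batch_size` × `gradient_accumulation_steps` × number of data-parallel workers.

```python
# Values copied from the "Pythia" config block and the description above.
PER_DEVICE_TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 4
STATED_BATCH_SIZE = 128

# Per-benchmark scores copied from the leaderboard table (Avg. row excluded).
SCORES = {
    "ARC (25-shot)": 41.47,
    "HellaSwag (10-shot)": 68.8,
    "MMLU (5-shot)": 26.58,
    "TruthfulQA (0-shot)": 36.82,
    "Winogrande (5-shot)": 65.27,
    "GSM8K (5-shot)": 7.66,
    "DROP (3-shot)": 4.89,
}


def implied_device_count(global_batch: int, per_device: int, grad_accum: int) -> int:
    """Number of data-parallel workers implied by the batch settings.

    Assumes global_batch = per_device * grad_accum * workers (an assumption;
    the card does not describe the hardware).
    """
    samples_per_worker_step = per_device * grad_accum
    if global_batch % samples_per_worker_step != 0:
        raise ValueError("global batch is not a multiple of per-worker samples")
    return global_batch // samples_per_worker_step


def mean_score(scores: dict) -> float:
    """Unweighted mean of the per-benchmark scores."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    workers = implied_device_count(
        STATED_BATCH_SIZE, PER_DEVICE_TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS
    )
    print(f"implied data-parallel workers: {workers}")
    print(f"mean of the seven benchmarks: {mean_score(SCORES):.2f}")
```

Under that assumption the settings imply 8 data-parallel workers, and the unweighted mean of the seven benchmark scores reproduces the `Avg.` row of the table.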