---
license: apache-2.0
base_model: amazingvince/zephyr-smol_llama-100m-sft-full
tags:
- generated_from_trainer
model-index:
- name: zephyr-smol_llama-100m-dpo-1-epoch
  results: []
---

# zephyr-smol_llama-100m-dpo-1-epoch

This model is a fine-tuned version of [amazingvince/zephyr-smol_llama-100m-sft-full](https://huggingface.co/amazingvince/zephyr-smol_llama-100m-sft-full), trained with direct preference optimization (DPO) on an unspecified preference dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5661
- Rewards/chosen: 0.0614
- Rewards/rejected: -0.4791
- Rewards/accuracies: 0.6810
- Rewards/margins: 0.5405
- Logps/rejected: -447.3311
- Logps/chosen: -587.6553
- Logits/rejected: -4.9351
- Logits/chosen: -5.2302

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

A sketch of a comparable DPO training setup using these hyperparameters is given at the end of this card.

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6597        | 0.26  | 1000 | 0.5887          | -0.0788        | -0.5504          | 0.6700             | 0.4715          | -448.0441      | -589.0577    | -4.7945         | -5.0906       |
| 0.5306        | 0.52  | 2000 | 0.5740          | 0.0053         | -0.5021          | 0.6840             | 0.5074          | -447.5612      | -588.2166    | -4.8585         | -5.1486       |
| 0.6036        | 0.77  | 3000 | 0.5676          | 0.0550         | -0.4785          | 0.6890             | 0.5335          | -447.3253      | -587.7193    | -4.9388         | -5.2343       |

### Framework versions

- Transformers 4.35.0
- PyTorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1
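### Training setup sketch

The training script itself is not included in this card. As an illustration only, here is a minimal sketch of a comparable run with `trl`'s `DPOTrainer` (TRL 0.7.x-era API, contemporary with Transformers 4.35). The preference dataset, `beta`, and precision are assumptions; only the hyperparameters listed above come from this card.

```python
# Hedged sketch of a comparable DPO run; not the original training script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "amazingvince/zephyr-smol_llama-100m-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)      # policy to optimize
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token               # llama tokenizers ship no pad token

# Placeholder dataset: any split with "prompt", "chosen", "rejected" columns works.
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
eval_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="test_prefs")

args = TrainingArguments(
    output_dir="zephyr-smol_llama-100m-dpo-1-epoch",
    per_device_train_batch_size=8,   # x 2 GPUs = total train batch size 16
    per_device_eval_batch_size=8,
    learning_rate=5e-7,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    num_train_epochs=1,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=1000,                 # matches the cadence of the results table above
    bf16=True,                       # assumption; precision is not listed in this card
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,                        # assumption; the beta used is not listed in this card
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```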
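## Example usage (sketch)

The card does not document usage; as a minimal, hedged inference sketch, the checkpoint loads with the standard `transformers` auto classes. The hub id below assumes the same namespace as the base model, and the Zephyr-style chat markup is an assumption carried over from the SFT base.

```python
# Hedged inference sketch; the hub id and prompt format are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amazingvince/zephyr-smol_llama-100m-dpo-1-epoch"  # assumed namespace
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Zephyr-style chat markup, assumed from the SFT base model.
prompt = "<|user|>\nWhat does DPO optimize?</s>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```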