---
base_model: lvwerra/gpt2-imdb
tags:
- generated_from_trainer
model-index:
- name: gpt-imdb-ipo_annealing
  results: []
---

# gpt-imdb-ipo_annealing

This model is a fine-tuned version of [lvwerra/gpt2-imdb](https://huggingface.co/lvwerra/gpt2-imdb) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 125.6974
- Rewards/chosen: -0.0343
- Rewards/rejected: -0.1277
- Rewards/accuracies: 0.875
- Rewards/margins: 0.0934
- Logps/rejected: -267.1282
- Logps/chosen: -236.1897
- Logits/rejected: -31.3501
- Logits/chosen: -31.5916

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 150
- training_steps: 7197

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 16.3187 | 0.21 | 500 | 34.0876 | 0.1161 | -0.1126 | 0.5292 | 0.2287 | -263.8062 | -235.1407 | -33.1877 | -33.4371 |
| 5.5155 | 0.42 | 1000 | 13.0423 | -0.1485 | -0.3812 | 0.5042 | 0.2327 | -264.1273 | -235.4375 | -35.2608 | -35.4541 |
| 10.2532 | 0.63 | 1500 | 18.5157 | -0.4407 | -0.5471 | 0.5458 | 0.1064 | -264.3746 | -235.8205 | -34.2230 | -34.4246 |
| 6.755 | 0.83 | 2000 | 28.1593 | -0.7791 | -0.8052 | 0.5917 | 0.0261 | -264.7961 | -236.3400 | -33.6119 | -33.8069 |
| 9.4126 | 1.04 | 2500 | 9.2406 | -0.8733 | -1.2564 | 0.6229 | 0.3831 | -265.6003 | -236.5962 | -31.9471 | -32.0700 |
| 8.5908 | 1.25 | 3000 | 12.4967 | -0.6700 | -1.0163 | 0.6167 | 0.3462 | -265.4156 | -236.4061 | -31.6914 | -31.8443 |
| 19.5217 | 1.46 | 3500 | 6.8889 | -0.0720 | -0.4689 | 0.6854 | 0.3969 | -264.5895 | -235.4041 | -32.1300 | -32.2692 |
| 6.9195 | 1.67 | 4000 | 4.2435 | -0.5324 | -0.9335 | 0.7021 | 0.4012 | -265.7609 | -236.4489 | -31.8342 | -31.9606 |
| 4.6993 | 1.88 | 4500 | 5.0987 | -0.2002 | -0.6179 | 0.7521 | 0.4177 | -265.3070 | -235.7907 | -31.6301 | -31.7617 |
| 2.7896 | 2.08 | 5000 | 2.7344 | -0.2390 | -0.5589 | 0.7500 | 0.3199 | -265.4754 | -236.0307 | -31.9650 | -32.1009 |
| 3.2262 | 2.29 | 5500 | 3.0584 | -0.1936 | -0.5168 | 0.8083 | 0.3231 | -265.8080 | -236.0606 | -31.6585 | -31.8243 |
| 4.1965 | 2.5 | 6000 | 4.2350 | -0.1555 | -0.4440 | 0.8417 | 0.2884 | -266.2272 | -236.1557 | -31.6484 | -31.8344 |
| 15.1482 | 2.71 | 6500 | 10.8174 | -0.0932 | -0.3244 | 0.8667 | 0.2312 | -266.7491 | -236.1454 | -31.4600 | -31.6800 |
| 145.9251 | 2.92 | 7000 | 125.6974 | -0.0343 | -0.1277 | 0.875 | 0.0934 | -267.1282 | -236.1897 | -31.3501 | -31.5916 |

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.1
- Datasets 2.15.0
- Tokenizers 0.15.0
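
The training hyperparameters listed in this card specify a cosine learning-rate schedule with 150 warmup steps over 7197 training steps. As a rough illustration of what that implies at the logged evaluation steps, the sketch below assumes the common form of this schedule: linear warmup from zero to the base rate, then a half-cosine decay to zero. The function name `cosine_lr` is hypothetical and not part of the training code, and the schedule actually applied by the trainer may differ in detail.

```python
import math

# Values copied from the hyperparameter list in this card.
BASE_LR = 1e-05
WARMUP_STEPS = 150
TRAINING_STEPS = 7197


def cosine_lr(step: int,
              base_lr: float = BASE_LR,
              warmup_steps: int = WARMUP_STEPS,
              total_steps: int = TRAINING_STEPS) -> float:
    """Learning rate at `step`, assuming linear warmup then half-cosine decay.

    This is an assumed form of the "cosine" scheduler, not code taken
    from the original training run.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half cosine: factor 1.0 at progress 0, factor 0.0 at progress 1.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Print the assumed learning rate at a few of the logged steps.
    for step in (0, 150, 500, 3500, 7000, 7197):
        print(f"step {step:>4}: lr = {cosine_lr(step):.3e}")
```

Under this assumption the learning rate is already very close to zero by step 7000, the last evaluation step reported in the training results table.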