# gpt-imdb-fkl-beta_0.1

This model is a fine-tuned version of [lvwerra/gpt2-imdb](https://huggingface.co/lvwerra/gpt2-imdb) on an unknown dataset.
It achieves the following results on the evaluation set:
- Step: 7000
- Loss: 43157476.0
- Rewards/chosen: 0.2870
- Rewards/rejected: -0.3647
- Rewards/accuracies: 0.7750
- Rewards/margins: 0.6517
- Logps/rejected: -267.3319
- Logps/chosen: -232.3951
- Logits/rejected: -35.4871
- Logits/chosen: -35.8462
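The metric names follow the convention of DPO-style preference trainers (an assumption based on the names alone, since the training setup is not documented here): the margin is simply the chosen reward minus the rejected reward, which checks out against the numbers above:

$$
\text{margins} = r_{\text{chosen}} - r_{\text{rejected}} = 0.2870 - (-0.3647) = 0.6517
$$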
## Model description
More information needed
## Intended uses & limitations
More information needed
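In the absence of documented guidance, the checkpoint can still be loaded like any `transformers` causal language model. A minimal inference sketch, assuming the Hub model id `Myashka/gpt-imdb-fkl-beta_0.1` and that the model continues IMDB-style movie reviews like its base model:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a standard text-generation pipeline.
generator = pipeline("text-generation", model="Myashka/gpt-imdb-fkl-beta_0.1")

# Prompt with the start of a movie review; the model continues it.
out = generator("This movie was", max_new_tokens=40, do_sample=True)
print(out[0]["generated_text"])
```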
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 1e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 150
- num_epochs: 3
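These settings map directly onto `transformers.TrainingArguments` fields. A minimal sketch of one plausible way to express them; the output directory name is hypothetical, and this configures only the hyperparameters above, not the preference-optimization trainer itself:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gpt-imdb-fkl-beta_0.1",  # hypothetical output directory
    learning_rate=1e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.99,          # betas=(0.9, 0.99) as reported above
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=150,
    num_train_epochs=3,
)
```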
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6613 | 0.21 | 500 | 1870813158870725165056.0000 | 0.4663 | 0.2817 | 0.5937 | 0.1846 | -260.8683 | -230.6020 | -34.4235 | -34.9404 |
| 0.5684 | 0.42 | 1000 | 147.6206 | 0.5625 | 0.2263 | 0.6708 | 0.3362 | -261.4219 | -229.6398 | -32.2534 | -32.7960 |
| 0.5548 | 0.63 | 1500 | 387.0789 | 0.7744 | 0.4004 | 0.6417 | 0.3740 | -259.6809 | -227.5212 | -35.6332 | -36.0763 |
| 0.7404 | 0.83 | 2000 | 306766.0 | 0.6319 | 0.1306 | 0.6792 | 0.5013 | -262.3793 | -228.9465 | -35.7930 | -36.3250 |
| 0.3854 | 1.04 | 2500 | 104512616.0 | 0.3906 | -0.2340 | 0.7354 | 0.6245 | -266.0248 | -231.3594 | -37.8272 | -38.3586 |
| 0.5825 | 1.25 | 3000 | 6146.4980 | 0.6931 | 0.1933 | 0.7063 | 0.4999 | -261.7526 | -228.3339 | -36.5475 | -37.0416 |
| 2792.03 | 1.46 | 3500 | 5439941120.0 | 0.4414 | -0.1668 | 0.7271 | 0.6082 | -265.3533 | -230.8516 | -37.3611 | -37.9083 |
| 11.3378 | 1.67 | 4000 | 1371221.625 | 0.5757 | -0.0709 | 0.7438 | 0.6465 | -264.3939 | -229.5085 | -36.9220 | -37.3953 |
| 1.9493 | 1.88 | 4500 | 181183.6875 | 0.5196 | -0.0750 | 0.7437 | 0.5947 | -264.4354 | -230.0688 | -36.8487 | -37.2339 |
| 1.4785 | 2.08 | 5000 | 18712162.0 | 0.3104 | -0.3569 | 0.7750 | 0.6673 | -267.2543 | -232.1608 | -35.5673 | -35.9495 |
| 0.4516 | 2.29 | 5500 | 3858633.25 | 0.3507 | -0.2764 | 0.7604 | 0.6272 | -266.4495 | -231.7578 | -35.4563 | -35.8284 |
| 0.3984 | 2.5 | 6000 | 61627688.0 | 0.2498 | -0.4039 | 0.7792 | 0.6537 | -267.7244 | -232.7677 | -35.1970 | -35.5582 |
| 93.8127 | 2.71 | 6500 | 67355640.0 | 0.2917 | -0.3600 | 0.7708 | 0.6517 | -267.2854 | -232.3483 | -35.4841 | -35.8434 |
| 4472.7729 | 2.92 | 7000 | 43157476.0 | 0.2870 | -0.3647 | 0.7750 | 0.6517 | -267.3319 | -232.3951 | -35.4871 | -35.8462 |
### Framework versions
- Transformers 4.35.2
- Pytorch 2.1.1
- Datasets 2.15.0
- Tokenizers 0.15.0