# gpt-imdb-jsd-beta_0.1
This model is a fine-tuned version of [lvwerra/gpt2-imdb](https://huggingface.co/lvwerra/gpt2-imdb) on an unknown dataset. It achieves the following results on the evaluation set:
- Step: 7000
- Loss: 0.1422
- Rewards/chosen: -6.6308
- Rewards/rejected: -12.9931
- Rewards/accuracies: 0.9396
- Rewards/margins: 6.3623
- Logps/rejected: -393.6160
- Logps/chosen: -301.5730
- Logits/rejected: -40.9101
- Logits/chosen: -42.7380
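The card does not include a usage snippet, so here is a minimal sketch for loading the model as a standard GPT-2 causal LM. The repo id comes from this card (`Myashka/gpt-imdb-jsd-beta_0.1`); the prompt string and generation settings are arbitrary choices for illustration:

```python
# Minimal usage sketch: the fine-tuned policy loads like any GPT-2 causal LM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Myashka/gpt-imdb-jsd-beta_0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# IMDb-style prompt (arbitrary example, not from the card).
inputs = tokenizer("This movie was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```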
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 150
- num_epochs: 3
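The logged metric names (`rewards/chosen`, `rewards/rejected`, `logps/*`) match the output of trl's `DPOTrainer`, and the model name suggests beta = 0.1 with a JSD-based loss variant. The sketch below is a hedged reconstruction only, not the author's actual script: it wires the listed hyperparameters into a `DPOTrainer` using the pre-`DPOConfig` API of that era, with a dummy preference dataset standing in for the unknown real one; the exact loss variant and trl version are assumptions.

```python
# Hedged reconstruction of the training setup, NOT the author's actual script.
# Assumptions: trl's DPOTrainer (~0.7.x API), beta=0.1 taken from the model
# name, and a dummy preference dataset in place of the unknown real one.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "lvwerra/gpt2-imdb"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder data: the card lists the training dataset as unknown.
train_dataset = Dataset.from_dict({
    "prompt": ["The movie was"],
    "chosen": [" wonderful and moving."],
    "rejected": [" dull and forgettable."],
})

args = TrainingArguments(
    output_dir="gpt-imdb-jsd-beta_0.1",
    learning_rate=1e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.99,          # as listed above (not the Adam default of 0.999)
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=150,
    num_train_epochs=3,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    beta=0.1,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```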
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2783 | 0.21 | 500 | 0.3575 | -1.6510 | -3.6200 | 0.8458 | 1.9690 | -299.8852 | -251.7749 | -34.0335 | -35.2131 |
| 0.3254 | 0.42 | 1000 | 0.2845 | -2.6765 | -5.5357 | 0.8771 | 2.8593 | -319.0428 | -262.0301 | -41.3238 | -42.6399 |
| 0.187 | 0.63 | 1500 | 0.2520 | -4.2045 | -7.9801 | 0.8875 | 3.7756 | -343.4868 | -277.3105 | -36.4710 | -37.8971 |
| 0.2236 | 0.83 | 2000 | 0.1916 | -3.9591 | -8.0388 | 0.9313 | 4.0797 | -344.0737 | -274.8567 | -35.8180 | -37.3586 |
| 0.1544 | 1.04 | 2500 | 0.1671 | -4.7747 | -9.4384 | 0.9333 | 4.6637 | -358.0689 | -283.0118 | -38.2421 | -39.6906 |
| 0.285 | 1.25 | 3000 | 0.1728 | -5.7913 | -11.0242 | 0.9271 | 5.2329 | -373.9274 | -293.1786 | -39.8869 | -41.8088 |
| 0.3249 | 1.46 | 3500 | 0.1585 | -5.3924 | -11.0092 | 0.9313 | 5.6168 | -373.7777 | -289.1895 | -41.4103 | -43.3052 |
| 0.2288 | 1.67 | 4000 | 0.1544 | -5.7770 | -11.2642 | 0.9333 | 5.4872 | -376.3274 | -293.0356 | -39.3995 | -41.1619 |
| 0.1367 | 1.88 | 4500 | 0.1463 | -5.6038 | -11.2632 | 0.9312 | 5.6594 | -376.3172 | -291.3033 | -38.0074 | -39.7695 |
| 0.1596 | 2.08 | 5000 | 0.1489 | -6.3796 | -12.4737 | 0.9312 | 6.0941 | -388.4222 | -299.0610 | -39.8571 | -41.5072 |
| 0.035 | 2.29 | 5500 | 0.1413 | -6.2472 | -12.4489 | 0.9375 | 6.2017 | -388.1746 | -297.7371 | -40.1165 | -41.9028 |
| 0.1528 | 2.5 | 6000 | 0.1452 | -6.7167 | -13.0974 | 0.9354 | 6.3807 | -394.6590 | -302.4318 | -39.9707 | -41.8089 |
| 0.1269 | 2.71 | 6500 | 0.1427 | -6.6508 | -13.0564 | 0.9458 | 6.4056 | -394.2490 | -301.7733 | -40.7866 | -42.6209 |
| 0.2239 | 2.92 | 7000 | 0.1422 | -6.6308 | -12.9931 | 0.9396 | 6.3623 | -393.6160 | -301.5730 | -40.9101 | -42.7380 |
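A note on reading these columns, assuming they follow trl's standard DPO logging conventions (the JSD loss variant implied by the model name may define its objective differently, but the identities below hold in the table): the reward columns are the implicit DPO reward, and `rewards/margins` is `rewards/chosen` minus `rewards/rejected`.

$$r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad \text{margins} = r_\theta(x, y_w) - r_\theta(x, y_l)$$

For example, in the final row: $-6.6308 - (-12.9931) = 6.3623$.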
### Framework versions
- Transformers 4.35.2
- Pytorch 2.1.1
- Datasets 2.15.0
- Tokenizers 0.15.0