dhmeltzer/llama-7b-SFT-qlora-eli5-wiki_DPO_ds_RM_contrast_1024_r_64_alpha_16

Safetensors

Generated from Trainer


dhmeltzer committed on Sep 5, 2023

Commit 89f423a

1 Parent(s): 20ea7d9

End of training


Files changed (2)

README.md +18 -18
adapter_model.bin +1 -1

README.md CHANGED

@@ -14,15 +14,15 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged](https://huggingface.co/dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6210
-- Rewards/chosen: 0.2283
-- Rewards/rejected: -0.0798
+- Loss: 0.6234
+- Rewards/chosen: 0.0858
+- Rewards/rejected: -0.1898
 - Rewards/accuracies: 0.6574
-- Rewards/margins: 0.3081
-- Logps/rejected: -196.8044
-- Logps/chosen: -202.0885
-- Logits/rejected: 1.0023
-- Logits/chosen: 1.0353
+- Rewards/margins: 0.2756
+- Logps/rejected: -198.1188
+- Logps/chosen: -205.4868
+- Logits/rejected: 0.7931
+- Logits/chosen: 0.8315
 ## Model description
@@ -50,21 +50,21 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.03
-- num_epochs: 3
+- num_epochs: 1
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6639        | 0.3   | 55   | 0.6265          | 0.0195         | -0.2129          | 0.6462             | 0.2324          | -198.1357      | -204.1772    | 0.9958          | 1.0271        |
-| 0.6478        | 0.6   | 110  | 0.6250          | -0.1037        | -0.3755          | 0.6540             | 0.2717          | -199.7610      | -205.4090    | 1.0383          | 1.0685        |
-| 0.6447        | 0.9   | 165  | 0.6210          | 0.2283         | -0.0798          | 0.6574             | 0.3081          | -196.8044      | -202.0885    | 1.0023          | 1.0353        |
-| 0.3498        | 1.21  | 220  | 0.6755          | -0.7949        | -1.2644          | 0.6105             | 0.4695          | -208.6501      | -212.3206    | 0.7300          | 0.7380        |
-| 0.3232        | 1.51  | 275  | 0.6903          | -1.3727        | -1.7980          | 0.6261             | 0.4253          | -213.9861      | -218.0985    | 0.5489          | 0.5429        |
-| 0.2843        | 1.81  | 330  | 0.6579          | -1.4717        | -1.8726          | 0.6529             | 0.4009          | -214.7323      | -219.0889    | 0.6364          | 0.6414        |
-| 0.0723        | 2.11  | 385  | 0.7137          | -2.4041        | -2.9396          | 0.6429             | 0.5355          | -225.4021      | -228.4123    | 0.4816          | 0.4691        |
-| 0.0554        | 2.41  | 440  | 0.7740          | -3.6950        | -4.3358          | 0.6406             | 0.6407          | -239.3640      | -241.3219    | 0.4430          | 0.4275        |
-| 0.0482        | 2.71  | 495  | 0.8359          | -4.0649        | -4.7899          | 0.6350             | 0.7250          | -243.9053      | -245.0203    | 0.4267          | 0.4083        |
+| 0.6867        | 0.1   | 19   | 0.6390          | 0.0633         | -0.1318          | 0.6451             | 0.1951          | -197.8286      | -205.5991    | 0.7774          | 0.8133        |
+| 0.6727        | 0.21  | 38   | 0.6384          | 0.0354         | -0.2285          | 0.6529             | 0.2639          | -198.3123      | -205.7386    | 0.8054          | 0.8432        |
+| 0.6577        | 0.31  | 57   | 0.6391          | -0.0114        | -0.2258          | 0.6406             | 0.2145          | -198.2988      | -205.9725    | 0.7954          | 0.8346        |
+| 0.6609        | 0.42  | 76   | 0.6344          | -0.3737        | -0.6175          | 0.6417             | 0.2438          | -200.2571      | -207.7841    | 0.7818          | 0.8194        |
+| 0.6536        | 0.52  | 95   | 0.6285          | -0.1130        | -0.3816          | 0.6652             | 0.2687          | -199.0778      | -206.4805    | 0.7958          | 0.8350        |
+| 0.654         | 0.62  | 114  | 0.6342          | 0.0007         | -0.2311          | 0.6484             | 0.2318          | -198.3250      | -205.9122    | 0.7917          | 0.8303        |
+| 0.6435        | 0.73  | 133  | 0.6258          | 0.0462         | -0.2234          | 0.6562             | 0.2696          | -198.2865      | -205.6845    | 0.7949          | 0.8332        |
+| 0.6508        | 0.83  | 152  | 0.6234          | 0.0858         | -0.1898          | 0.6574             | 0.2756          | -198.1188      | -205.4868    | 0.7931          | 0.8315        |
+| 0.6361        | 0.94  | 171  | 0.6269          | 0.1007         | -0.1655          | 0.6618             | 0.2662          | -197.9971      | -205.4121    | 0.7975          | 0.8353        |
 ### Framework versions
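The logged column names match those produced by TRL's `DPOTrainer`, so this sketch assumes its default sigmoid-loss conventions. The card itself only says it was generated from Trainer. Under that assumption, each pair's reward is β times the policy-minus-reference log-probability, Rewards/margins is Rewards/chosen minus Rewards/rejected, and the per-pair loss is −log σ(margin). Because −log σ is convex, the averaged validation loss should be at least −log σ of the averaged margin. The evaluation summary at the top of each card reproduces the table row with the lowest validation loss, not the last logged step: step 165 in the old 3-epoch run and step 152 in the new 1-epoch run. That pattern is consistent with keeping the best checkpoint (e.g. `load_best_model_at_end`), although the card does not state this. The helper names below are hypothetical, and the values are copied from the new table.

```python
import math

# (step, validation_loss, rewards_chosen, rewards_rejected, rewards_margins),
# copied from the new 1-epoch "Training results" table.
ROWS = [
    (19, 0.6390, 0.0633, -0.1318, 0.1951),
    (38, 0.6384, 0.0354, -0.2285, 0.2639),
    (57, 0.6391, -0.0114, -0.2258, 0.2145),
    (76, 0.6344, -0.3737, -0.6175, 0.2438),
    (95, 0.6285, -0.1130, -0.3816, 0.2687),
    (114, 0.6342, 0.0007, -0.2311, 0.2318),
    (133, 0.6258, 0.0462, -0.2234, 0.2696),
    (152, 0.6234, 0.0858, -0.1898, 0.2756),
    (171, 0.6269, 0.1007, -0.1655, 0.2662),
]


def dpo_sigmoid_loss(margin):
    """Per-pair DPO loss -log(sigmoid(margin)), written as log(1 + e^-margin)."""
    return math.log1p(math.exp(-margin))


def margins_consistent(rows, tol=5e-4):
    """Check margin == chosen - rejected, allowing for 4-decimal rounding in the table."""
    return all(abs((chosen - rejected) - margin) <= tol
               for _, _, chosen, rejected, margin in rows)


def jensen_bound_holds(rows):
    """Mean loss >= loss at the mean margin, because -log(sigmoid(x)) is convex."""
    return all(val_loss >= dpo_sigmoid_loss(margin)
               for _, val_loss, _, _, margin in rows)


def best_step(rows):
    """Step whose validation loss is lowest."""
    return min(rows, key=lambda row: row[1])[0]


print(margins_consistent(ROWS), jensen_bound_holds(ROWS), best_step(ROWS))  # True True 152
```

At step 152, for example, 0.0858 − (−0.1898) = 0.2756 and −log σ(0.2756) ≈ 0.565. Both agree with the reported margin and lie below the reported loss of 0.6234.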
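The hyperparameters combine a cosine scheduler with `lr_scheduler_warmup_ratio: 0.03`. This sketch assumes the schedule behaves like `transformers`' `get_cosine_schedule_with_warmup`: a linear warmup, then a half-cosine decay to zero, with warmup steps computed as ceil(total_steps × ratio). It computes the fraction of the peak learning rate used at each step. The peak learning rate itself lies outside the diff hunk. The total of 182 steps is a rough, hypothetical estimate inferred from step 171 being logged at epoch 0.94.

```python
import math


def warmup_steps(total_steps, warmup_ratio=0.03):
    # Assumed rounding, matching transformers' TrainingArguments: ceil(total * ratio).
    return math.ceil(total_steps * warmup_ratio)


def cosine_lr_multiplier(step, total_steps, warmup_ratio=0.03):
    """Fraction of the peak learning rate used at a given optimizer step."""
    warmup = warmup_steps(total_steps, warmup_ratio)
    if step < warmup:
        # Linear warmup from 0 up to the peak learning rate.
        return step / max(1, warmup)
    # Half-cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup) / max(1, total_steps - warmup)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 182  # hypothetical estimate for the 1-epoch run
for step in (0, 3, 6, 94, 182):
    print(step, round(cosine_lr_multiplier(step, TOTAL_STEPS), 3))
```

The schedule is defined relative to the total step count. Cutting `num_epochs` from 3 to 1 therefore compresses both the 3% warmup and the cosine decay into roughly a third as many steps, assuming the batch size was unchanged.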

adapter_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1e8a64fc0248ce95e2634fc5b0baf417940f224965d258fd8acc3be31b3e6369
+oid sha256:ec0cd5f842a60c8ebae8020a6a38e5fc3e3fb15b671b8e73be446ee1a9bfe7c2
 size 639792909
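`adapter_model.bin` is stored through Git LFS, so this diff only changes the pointer file. The `oid` (a SHA-256 of the file contents) changes, while `size` stays at 639792909 bytes, so size alone cannot distinguish the two revisions. This sketch checks whether a locally downloaded copy corresponds to a given pointer. It assumes the Git LFS v1 pointer layout shown above (`version`, `oid sha256:…`, `size` lines). The function names and any local path are hypothetical.

```python
import hashlib


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def chunks_match_pointer(chunks, pointer_text):
    """True if the concatenated byte chunks have the pointer's size and SHA-256."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo!r}")
    digest = hashlib.sha256()
    size = 0
    for chunk in chunks:
        digest.update(chunk)
        size += len(chunk)
    return size == int(fields["size"]) and digest.hexdigest() == expected_hex


def file_matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """Stream a local file in 1 MiB chunks so the ~640 MB adapter is never fully in memory."""
    with open(path, "rb") as fh:
        return chunks_match_pointer(iter(lambda: fh.read(chunk_size), b""), pointer_text)


# Pointer for the new revision, copied from the "+" side of this diff.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:ec0cd5f842a60c8ebae8020a6a38e5fc3e3fb15b671b8e73be446ee1a9bfe7c2
size 639792909
"""
```

For a local download, `file_matches_pointer("adapter_model.bin", NEW_POINTER)` should return `True` only for the post-commit weights. The path here is hypothetical.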