dhmeltzer/llama-7b-SFT-qlora-eli5-wiki_DPO_ds_RM_contrast_1024_r_64_alpha_16

Safetensors

Generated from Trainer

dhmeltzer committed on Sep 5, 2023

Commit 74ea265 (1 parent: 2ab2de6)

Model save

Files changed (1)

README.md +68 -14

@@ -1,21 +1,75 @@
 ---
-library_name: peft
 ---
 ## Training procedure
-The following `bitsandbytes` quantization config was used during training:
-- quant_method: bitsandbytes
-- load_in_8bit: False
-- load_in_4bit: True
-- llm_int8_threshold: 6.0
-- llm_int8_skip_modules: None
-- llm_int8_enable_fp32_cpu_offload: False
-- llm_int8_has_fp16_weight: False
-- bnb_4bit_quant_type: nf4
-- bnb_4bit_use_double_quant: True
-- bnb_4bit_compute_dtype: bfloat16
-### Framework versions
-- PEFT 0.5.0

 ---
+base_model: dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged
+tags:
+- generated_from_trainer
+model-index:
+- name: llama-7b-SFT-qlora-eli5-wiki_DPO_ds_RM_contrast_1024_r_64_alpha_16
+  results: []
 ---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# llama-7b-SFT-qlora-eli5-wiki_DPO_ds_RM_contrast_1024_r_64_alpha_16
+This model is a fine-tuned version of [dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged](https://huggingface.co/dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.6234
+- Rewards/chosen: 0.0858
+- Rewards/rejected: -0.1898
+- Rewards/accuracies: 0.6574
+- Rewards/margins: 0.2756
+- Logps/rejected: -198.1188
+- Logps/chosen: -205.4868
+- Logits/rejected: 0.7931
+- Logits/chosen: 0.8315
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
 ## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0002
+- train_batch_size: 32
+- eval_batch_size: 32
+- seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 128
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.03
+- num_epochs: 1
+### Training results
+| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.6867        | 0.1   | 19   | 0.6390          | 0.0633         | -0.1318          | 0.6451             | 0.1951          | -197.8286      | -205.5991    | 0.7774          | 0.8133        |
+| 0.6727        | 0.21  | 38   | 0.6384          | 0.0354         | -0.2285          | 0.6529             | 0.2639          | -198.3123      | -205.7386    | 0.8054          | 0.8432        |
+| 0.6577        | 0.31  | 57   | 0.6391          | -0.0114        | -0.2258          | 0.6406             | 0.2145          | -198.2988      | -205.9725    | 0.7954          | 0.8346        |
+| 0.6609        | 0.42  | 76   | 0.6344          | -0.3737        | -0.6175          | 0.6417             | 0.2438          | -200.2571      | -207.7841    | 0.7818          | 0.8194        |
+| 0.6536        | 0.52  | 95   | 0.6285          | -0.1130        | -0.3816          | 0.6652             | 0.2687          | -199.0778      | -206.4805    | 0.7958          | 0.8350        |
+| 0.654         | 0.62  | 114  | 0.6342          | 0.0007         | -0.2311          | 0.6484             | 0.2318          | -198.3250      | -205.9122    | 0.7917          | 0.8303        |
+| 0.6435        | 0.73  | 133  | 0.6258          | 0.0462         | -0.2234          | 0.6562             | 0.2696          | -198.2865      | -205.6845    | 0.7949          | 0.8332        |
+| 0.6508        | 0.83  | 152  | 0.6234          | 0.0858         | -0.1898          | 0.6574             | 0.2756          | -198.1188      | -205.4868    | 0.7931          | 0.8315        |
+| 0.6361        | 0.94  | 171  | 0.6269          | 0.1007         | -0.1655          | 0.6618             | 0.2662          | -197.9971      | -205.4121    | 0.7975          | 0.8353        |
+### Framework versions
+- Transformers 4.32.1
+- Pytorch 2.0.1+cu118
+- Datasets 2.14.4
+- Tokenizers 0.13.3
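
The numbers reported in the new model card can be cross-checked against each other. The following is a minimal sketch, not part of the commit: it assumes that `Rewards/margins` is the mean of chosen reward minus rejected reward (as the metric names suggest for a DPO-style run) and that training ran on a single device, so that `total_train_batch_size` is `train_batch_size` times `gradient_accumulation_steps`. The helper names `reward_margin` and `effective_batch_size` are hypothetical and introduced only for illustration.

```python
# Sanity checks on values copied from the model card in this diff.
# Assumption: single-device training (num_devices=1); the card does not state this.

def reward_margin(chosen: float, rejected: float) -> float:
    """Margin between the chosen and rejected rewards."""
    return chosen - rejected


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer step."""
    return per_device * grad_accum * num_devices


# Final evaluation summary (step 152 in the results table).
margin = reward_margin(0.0858, -0.1898)

# Hyperparameters: train_batch_size=32, gradient_accumulation_steps=4.
total_batch = effective_batch_size(32, 4)

# (chosen, rejected, reported margin) for the first two table rows.
rows = [
    (0.0633, -0.1318, 0.1951),
    (0.0354, -0.2285, 0.2639),
]
# Table values are rounded to four decimals, so compare with a tolerance.
rows_consistent = all(abs(reward_margin(c, r) - m) < 1e-3 for c, r, m in rows)

print(f"margin={margin:.4f} total_batch={total_batch} rows_consistent={rows_consistent}")
```

Under these assumptions the computed margin agrees with the reported `Rewards/margins` of 0.2756, and the computed batch size agrees with the reported `total_train_batch_size` of 128.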