---
license: mit
library_name: "trl"
tags:
- DPO
base_model: Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT
model-index:
- name: Weni/WeniGPT-2.5.3-Zephyr-7B-zephyr-prompt-LLM_Base_2.0.3_DPO_reduction_variation
  results: []
language: ['pt']
---

# Weni/WeniGPT-2.5.3-Zephyr-7B-zephyr-prompt-LLM_Base_2.0.3_DPO_reduction_variation

This model is a fine-tuned version of [Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT](https://huggingface.co/Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT) on the Weni/LLM_Base_2.0.3_DPO dataset, trained with the DPO trainer. It is part of the DPO project for [Weni](https://weni.ai/).

It achieves the following results on the evaluation set:
- eval_loss: 0.6931472420692444
- eval_runtime: 173.8803 s
- eval_samples_per_second: 2.824
- eval_steps_per_second: 1.415
- eval_rewards/chosen: 0.0
- eval_rewards/rejected: 0.0
- eval_rewards/accuracies: 0.0
- eval_rewards/margins: 0.0
- eval_logps/rejected: -204.64605712890625
- eval_logps/chosen: -64.2483901977539
- eval_logits/rejected: -2.031214475631714
- eval_logits/chosen: -1.649370789527893
- epoch: 0.0
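For context, the `eval_rewards/*` values are the implicit DPO rewards computed by trl. Rewards and margins of 0.0 together with an eval_loss of ln 2 ≈ 0.6931 are consistent with the policy still matching the reference model at epoch 0.0. As a general reminder (not specific to this card), the DPO objective is:

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where $y_w$ and $y_l$ are the chosen and rejected answers and $\beta$ is the trainer's `beta` parameter (not listed on this card).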

## Intended uses & limitations

This model has not been trained to avoid specific instructions.
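The card does not include a usage snippet; a minimal inference sketch (model id from this card, prompt format from the training template below, everything else an assumption) might look like:

```python
# A minimal usage sketch, assuming the repository hosts a full merged
# checkpoint loadable with transformers. If it only contains a PEFT adapter,
# load the base SFT model first and attach the adapter with peft instead.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Weni/WeniGPT-2.5.3-Zephyr-7B-zephyr-prompt-LLM_Base_2.0.3_DPO_reduction_variation"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Prompts should follow the same <|user|>/<|assistant|> format used in training.
prompt = "<|user|>Qual é a capital do Brasil?</s>\n<|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```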

## Training procedure

Fine-tuning was done on the model Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT with the following prompt format:

```
Question:
<|user|>{question}</s>


Chosen:
<|assistant|>{correct_ans}</s>


Rejected:
<|assistant|>{rejected_ans}</s>
```
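A minimal sketch of how a dataset row could be mapped onto the `prompt`/`chosen`/`rejected` columns that trl's DPOTrainer expects, assuming the field names `question`, `correct_ans`, and `rejected_ans` from the template above (the dataset is not public, so this is illustrative only):

```python
# Illustrative only: maps one preference row into DPOTrainer's expected columns.
# Field names follow the prompt template above; the real dataset schema may differ.
def format_dpo_row(row: dict) -> dict:
    return {
        "prompt": f"<|user|>{row['question']}</s>\n",
        "chosen": f"<|assistant|>{row['correct_ans']}</s>",
        "rejected": f"<|assistant|>{row['rejected_ans']}</s>",
    }

# e.g. dataset = dataset.map(format_dpo_row, remove_columns=dataset.column_names)
```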

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 2
- gradient_accumulation_steps: 2
- num_gpus: 1
- total_train_batch_size: 16
- optimizer: AdamW
- lr_scheduler_type: cosine
- num_steps: 1
- quantization_type: bitsandbytes
- LoRA:
  - bits: 4
  - use_exllama: True
  - device_map: auto
  - use_cache: False
  - lora_r: 8
  - lora_alpha: 16
  - lora_dropout: 0.1
  - bias: none
  - target_modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj']
  - task_type: CAUSAL_LM
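A configuration sketch matching the values above, using trl's DPOTrainer with a 4-bit bitsandbytes base model and a LoRA adapter. Only the numeric values come from this card; the script structure, `output_dir`, and the dataset/tokenizer handling are assumptions:

```python
# Sketch of a training setup matching the hyperparameters listed above.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import DPOTrainer

base_id = "Weni/WeniGPT-2.2.3-Zephyr-7B-merged-LLM_Base_2.0.3_SFT"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
    use_cache=False,
)

peft_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.1, bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # 8 * 2 * 1 GPU = total batch size 16
    max_steps=1,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    output_dir="dpo-output",  # assumed, not from the card
)

trainer = DPOTrainer(
    model,
    args=training_args,
    train_dataset=train_dataset,  # prompt/chosen/rejected columns, prepared earlier
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

With `peft_config` supplied and no explicit `ref_model`, trl uses the base weights with the adapter disabled as the DPO reference model.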

### Training results

### Framework versions

- git+https://github.com/huggingface/transformers@main
- datasets==2.17.1
- peft==0.8.2
- safetensors==0.4.2
- evaluate==0.4.1
- bitsandbytes==0.42
- huggingface_hub==0.20.3
- seqeval==1.2.2
- optimum==1.17.1
- auto-gptq==0.7.0
- gpustat==1.1.1
- deepspeed==0.13.2
- wandb==0.16.3
- git+https://github.com/huggingface/trl.git@main
- git+https://github.com/huggingface/accelerate.git@main
- coloredlogs==15.0.1
- traitlets==5.14.1
- autoawq@https://github.com/casper-hansen/AutoAWQ/releases/download/v0.2.0/autoawq-0.2.0+cu118-cp310-cp310-linux_x86_64.whl

### Hardware
- Cloud provider: runpod.io