---
license: apache-2.0
datasets:
- argilla/distilabel-intel-orca-dpo-pairs
language:
- en
tags:
- distilabel
- dpo
- rlaif
- rlhf
---
⚗️ distilabeled OpenHermes 2.5 Mistral 7B
🫡 A Half Neural DPO of OpenHermes 2.5
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average | DPO pairs | % of original pairs |
|---|---|---|---|---|---|---|---|
| argilla/distilabeled-Hermes-2.5-Mistral-7B | 44.64 | 73.35 | 55.96 | 42.21 | 54.04 | 5,922 | 46% |
| dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel (first experiment) | 44.27 | 73.3 | 56.26 | 42.25 | 54.02 | 7,732 | 60% |
| mlabonne/NeuralHermes-2.5-Mistral-7B (original recipe) | 43.67 | 73.24 | 55.37 | 41.76 | 53.51 | 12,859 | 100% |
| teknium/OpenHermes-2.5-Mistral-7B | 42.75 | 72.99 | 52.99 | 40.94 | 52.42 | 0 (no DPO) | N/A |
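
The two derived columns in the table can be recomputed from the raw numbers. The sketch below assumes the Average column is the unweighted mean of the four benchmark scores, and that the percentage column is each run's DPO pair count relative to the 12,859 pairs of the original recipe; both assumptions match the reported figures. The names `ROWS`, `mean_score`, and `pct_of_original` are illustrative and not part of any library.

```python
# Values copied from the table: (AGIEval, GPT4All, TruthfulQA, Bigbench),
# the reported average, and the number of DPO pairs used for training.
ROWS = {
    "argilla/distilabeled-Hermes-2.5-Mistral-7B": ((44.64, 73.35, 55.96, 42.21), 54.04, 5922),
    "dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel": ((44.27, 73.3, 56.26, 42.25), 54.02, 7732),
    "mlabonne/NeuralHermes-2.5-Mistral-7B": ((43.67, 73.24, 55.37, 41.76), 53.51, 12859),
    "teknium/OpenHermes-2.5-Mistral-7B": ((42.75, 72.99, 52.99, 40.94), 52.42, 0),
}

# Pair count of the original recipe, used as the 100% baseline (assumption).
ORIGINAL_PAIRS = 12859


def mean_score(scores):
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


def pct_of_original(pairs):
    """DPO pairs as a whole-number percentage of the original recipe's pairs."""
    return round(100 * pairs / ORIGINAL_PAIRS)


if __name__ == "__main__":
    for model, (scores, reported_avg, pairs) in ROWS.items():
        print(model, mean_score(scores), reported_avg, f"{pct_of_original(pairs)}%")
```

Running it prints, for each model, the recomputed average next to the reported one, followed by the share of original pairs (0% for the base model, which the table lists as N/A because no DPO was applied).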