---
license: apache-2.0
datasets:
  - argilla/distilabel-intel-orca-dpo-pairs
language:
  - en
tags:
  - distilabel
  - dpo
  - rlaif
  - rlhf
---

# ⚗️ distilabeled OpenHermes 2.5 Mistral 7B

> 🫡 A Half Neural DPO of OpenHermes 2.5

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average | dpo-pairs | % original pairs |
|---|---|---|---|---|---|---|---|
| argilla/distilabeled-Hermes-2.5-Mistral-7B | 44.64 | 73.35 | 55.96 | 42.21 | 54.04 | 5,922 | 46% |
| dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel (first experiment) | 44.27 | 73.3 | 56.26 | 42.25 | 54.02 | 7,732 | 60% |
| mlabonne/NeuralHermes-2.5-Mistral-7B (original recipe) | 43.67 | 73.24 | 55.37 | 41.76 | 53.51 | 12,859 | 100% |
| teknium/OpenHermes-2.5-Mistral-7B | 42.75 | 72.99 | 52.99 | 40.94 | 52.42 | 0 (no DPO) | N/A |
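
The "dpo-pairs" column shows that the top model was trained on only 46% of the original preference pairs, kept after quality filtering of the distilabel-generated ratings. A minimal toy sketch of that kind of rating-based filtering (field names, ratings, and the margin threshold here are illustrative assumptions, not the card's actual recipe):

```python
# Toy sketch: keep only DPO preference pairs where the "chosen" response
# clearly out-rates the "rejected" one, dropping ties and noisy pairs.
# Field names and the 1.0 margin are assumptions for illustration.

pairs = [
    {"chosen": "A", "rejected": "B", "chosen_rating": 9.0, "rejected_rating": 4.0},
    {"chosen": "C", "rejected": "D", "chosen_rating": 6.0, "rejected_rating": 6.0},  # tie: drop
    {"chosen": "E", "rejected": "F", "chosen_rating": 8.5, "rejected_rating": 7.0},
]

def keep(pair, min_margin=1.0):
    """Keep a pair only when the chosen rating beats the rejected one by a margin."""
    return pair["chosen_rating"] - pair["rejected_rating"] >= min_margin

filtered = [p for p in pairs if keep(p)]
print(f"kept {len(filtered)} of {len(pairs)} pairs")  # kept 2 of 3 pairs
```

The same idea, applied with the real ratings in `argilla/distilabel-intel-orca-dpo-pairs`, is what shrinks 12,859 pairs down to the 5,922 used here.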