dvilasuero (HF staff) committed on
Commit adb769e · verified · 1 Parent(s): a93afef

Update README.md

Files changed (1)
  1. README.md +14 -14
README.md CHANGED
@@ -12,7 +12,7 @@ tags:
  ---
  # ⚗️ distilabeled OpenHermes 2.5 Mistral 7B
 
- > A Half Neural DPO of OpenHermes 2.5, less is more for DPO!
+ > A Neural DPO of OpenHermes 2.5, high quality matters for DPO!
 
  <div>
  <img src="https://cdn-uploads.huggingface.co/production/uploads/60420dccc15e823a685f2b03/yWdvBtKKfJdpdnPiSlNb9.png">
@@ -110,7 +110,7 @@ dataset = dataset.filter(
  not r["in_gsm8k_train"]
  )
  ```
- This resulted in `5,922` instead of `12,859` samples (54% reduction) and led to the following benchmark results.
+ This resulted in `5,922` instead of `12,859` samples (54% reduction) and we run it for 200 steps (using around ~3.2K samples).
 
  ## Benchmark results
  For benchmarking we used the famous "Nous" or "Teknium" benchmark. You can find below an overview, including our first experiment with a less ambitious dataset filtering (removing ties and `score>5`).
@@ -118,21 +118,21 @@ For benchmarking we used the famous "Nous" or "Teknium" benchmark. You can find
  For running the benchmark we used another awesome contribution from Maxime: [LLM AutoEval](https://github.com/mlabonne/llm-autoeval), check it out!
 
 
- | Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average | dpo-pairs | % original pairs |
- |-------------------------------------------------------------------------------------------------------------------|--------:|--------:|-----------:|---------:|--------:|----------:|-----------------:|
- | [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | **44.64** | **73.35** | 55.96 | 42.21 | **54.04** | **5,922** | **46%** |
- | [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) (first experiment) | 44.27 | 73.3 | **56.26** | **42.25** | 54.02 | 7,732 | 60% |
- | mlabonne/NeuralHermes-2.5-Mistral-7B (original recipe) | 43.67 | 73.24 | 55.37 | 41.76 | 53.51 | 12,859 | 100% |
- | teknium/OpenHermes-2.5-Mistral-7B | 42.75 | 72.99 | 52.99 | 40.94 | 52.42| 0 (no DPO) | N/A |
+ | Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
+ |-------------------------------------------------------------------------------------------------------------------|--------:|--------:|-----------:|---------:|--------:|
+ | [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | **44.64** | **73.35** | 55.96 | 42.21 | **54.04** |
+ | [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) (first experiment) | 44.27 | 73.3 | **56.26** | **42.25** | 54.02 |
+ | mlabonne/NeuralHermes-2.5-Mistral-7B (original recipe) | 43.67 | 73.24 | 55.37 | 41.76 | 53.51 |
+ | teknium/OpenHermes-2.5-Mistral-7B | 42.75 | 72.99 | 52.99 | 40.94 | 52.42|
 
  > Update: we now include llm-harness results too!
 
- | Model | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | dpo-pairs | % original pairs |
- |------------------------------------------------------|-------|-----------|------|-----------:|------------|-------|----------:|-----------------:|
- | [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | 66.04 | **85.07** | Pending | 55.96 | **79.56** | **66.34** | **5,922** | **46%** |
- | [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) | 65.36 | 84.74 | Pending | **56.26** | 79.24 | 65.13 | 7,732 | 60% |
- | [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B) | **66.55** | 84.90 | **63.32** | 54.93 | 78.30 | 61.30 | 12,859 | 100% |
- | [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) | 64.93 | 84.18 | 63.64 | 52.24 | 78.06 | 26.08 | 0 (no DPO) | N/A |
+ | Model | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
+ |------------------------------------------------------|-------|-----------|------|-----------:|------------|-------|
+ | [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | 66.04 | **85.07** | Pending | 55.96 | **79.56** | **66.34** |
+ | [dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel](https://huggingface.co/dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel) | 65.36 | 84.74 | Pending | **56.26** | 79.24 | 65.13 |
+ | [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B) | **66.55** | 84.90 | **63.32** | 54.93 | 78.30 | 61.30 |
+ | [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) | 64.93 | 84.18 | 63.64 | 52.24 | 78.06 | 26.08 |
 
  ### Training Hardware
 
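Reviewer note: the changed line at README line 113 and the Nous benchmark table both carry figures that can be rechecked by hand. The sketch below does so using only numbers that appear in the diff. The variable names are illustrative, `3_200` stands in for the README's approximate "~3.2K", and the effective batch size it derives is an inference, since the README does not state one.

```python
# Figures copied from the diff; nothing here is measured or re-run.
original_pairs = 12_859   # pairs in the original recipe
filtered_pairs = 5_922    # pairs left after filtering
steps = 200               # training steps stated in the new line 113
samples_seen = 3_200      # "~3.2K" in the README, so this is approximate

kept_fraction = filtered_pairs / original_pairs
reduction = 1 - kept_fraction
# Assumption: samples_seen = steps * effective batch size.
implied_batch_size = samples_seen / steps
seen_fraction = samples_seen / filtered_pairs

print(f"kept {kept_fraction:.0%} of the original pairs ({reduction:.0%} reduction)")
print(f"implied effective batch size: {implied_batch_size:g}")
print(f"share of the filtered set seen in {steps} steps: {seen_fraction:.0%}")

# Nous benchmark rows: (AGIEval, GPT4All, TruthfulQA, Bigbench), reported Average.
nous_rows = {
    "argilla/distilabeled-Hermes-2.5-Mistral-7B": ((44.64, 73.35, 55.96, 42.21), 54.04),
    "dvilasuero/NeuralHermes-2.5-Mistral-7B-distilabel": ((44.27, 73.3, 56.26, 42.25), 54.02),
    "mlabonne/NeuralHermes-2.5-Mistral-7B": ((43.67, 73.24, 55.37, 41.76), 53.51),
    "teknium/OpenHermes-2.5-Mistral-7B": ((42.75, 72.99, 52.99, 40.94), 52.42),
}

# Largest gap between a recomputed mean and the reported Average column.
max_gap = max(
    abs(sum(scores) / len(scores) - reported)
    for scores, reported in nous_rows.values()
)
print(f"largest gap between recomputed and reported average: {max_gap:.4f}")
```

Under these assumptions, the 54% reduction and the 46% that this commit drops from the tables are consistent with the pair counts, and each reported Average agrees with the mean of its four benchmark columns to within rounding.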