Update README.md
README.md (CHANGED)
````diff
@@ -109,7 +109,7 @@ model-index:
 
 ### Chocolatine-14B-Instruct-DPO-v1.2
 
-DPO fine-
+DPO fine-tuning of [microsoft/Phi-3-medium-4k-instruct](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct) (14B params)
 using the [jpacifico/french-orca-dpo-pairs-revised](https://huggingface.co/datasets/jpacifico/french-orca-dpo-pairs-revised) rlhf dataset.
 Training in French also improves the model in English, surpassing the performances of its base model.
 Window context = 4k tokens
@@ -140,17 +140,20 @@ Chocolatine is the best-performing model in size 13B on the [OpenLLM Leaderboard
 
 ### MT-Bench-French
 
 Chocolatine-14B-Instruct-DPO-v1.2 outperforms its previous versions and its base model Phi-3-medium-4k-instruct on [MT-Bench-French](https://huggingface.co/datasets/bofenghuang/mt-bench-french), used with [multilingual-mt-bench](https://github.com/Peter-Devine/multilingual_mt_bench) and GPT-4-Turbo as LLM-judge.
+[Update 2025/01/19] New version 1.3 added
 
 ```
 ########## First turn ##########
                                       score
 model                            turn
 gpt-4o-mini                         1  9.2875
+Chocolatine-14B-Instruct-DPO-v1.3   1  9.0125
 Chocolatine-14B-Instruct-4k-DPO     1  8.6375
 Chocolatine-14B-Instruct-DPO-v1.2   1  8.6125
 Phi-3.5-mini-instruct               1  8.5250
 Chocolatine-3B-Instruct-DPO-v1.2    1  8.3750
+phi-4                               1  8.3000
 Phi-3-medium-4k-instruct            1  8.2250
 gpt-3.5-turbo                       1  8.1375
 Chocolatine-3B-Instruct-DPO-Revised 1  7.9875
@@ -166,7 +169,9 @@ vigogne-2-7b-chat 1 5.6625
                                       score
 model                            turn
 gpt-4o-mini                         2  8.912500
+Chocolatine-14B-Instruct-DPO-v1.3   2  8.762500
 Chocolatine-14B-Instruct-DPO-v1.2   2  8.337500
+phi-4                               2  8.131250
 Chocolatine-3B-Instruct-DPO-Revised 2  7.937500
 Chocolatine-3B-Instruct-DPO-v1.2    2  7.862500
 Phi-3-medium-4k-instruct            2  7.750000
@@ -185,7 +190,9 @@ vigogne-2-7b-chat 2 2.775000
                                      score
 model
 gpt-4o-mini                          9.100000
+Chocolatine-14B-Instruct-DPO-v1.3    8.825000
 Chocolatine-14B-Instruct-DPO-v1.2    8.475000
+phi-4                                8.215625
 Chocolatine-14B-Instruct-4k-DPO      8.187500
 Chocolatine-3B-Instruct-DPO-v1.2     8.118750
 Phi-3.5-mini-instruct                8.050000
@@ -240,12 +247,12 @@ print(sequences[0]['generated_text'])
 
 ### Limitations
 
-The Chocolatine model is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.
+The Chocolatine model series is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.
 It does not have any moderation mechanism.
 
 - **Developed by:** Jonathan Pacifico, 2024
 - **Model type:** LLM
 - **Language(s) (NLP):** French, English
 - **License:** MIT
 # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_jpacifico__Chocolatine-14B-Instruct-DPO-v1.2)
````