Commit bdf969b (verified) by jpacifico · Parent(s): 2531186

Update README.md

Files changed (1): README.md (+12 −5)
README.md CHANGED
@@ -109,7 +109,7 @@ model-index:
 
 ### Chocolatine-14B-Instruct-DPO-v1.2
 
-DPO fine-tuned of [microsoft/Phi-3-medium-4k-instruct](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct) (14B params)
+DPO fine-tuning of [microsoft/Phi-3-medium-4k-instruct](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct) (14B params)
 using the [jpacifico/french-orca-dpo-pairs-revised](https://huggingface.co/datasets/jpacifico/french-orca-dpo-pairs-revised) rlhf dataset.
 Training in French also improves the model in English, surpassing the performances of its base model.
 Window context = 4k tokens
@@ -140,17 +140,20 @@ Chocolatine is the best-performing model in size 13B on the [OpenLLM Leaderboard
 
 ### MT-Bench-French
 
-Chocolatine-14B-Instruct-DPO-v1.2 outperforms its previous versions and its base model Phi-3-medium-4k-instruct on [MT-Bench-French](https://huggingface.co/datasets/bofenghuang/mt-bench-french), used with [multilingual-mt-bench](https://github.com/Peter-Devine/multilingual_mt_bench) and GPT-4-Turbo as LLM-judge.
+Chocolatine-14B-Instruct-DPO-v1.2 outperforms its previous versions and its base model Phi-3-medium-4k-instruct on [MT-Bench-French](https://huggingface.co/datasets/bofenghuang/mt-bench-french), used with [multilingual-mt-bench](https://github.com/Peter-Devine/multilingual_mt_bench) and GPT-4-Turbo as LLM-judge.
+[Update 2025/01/19] New version 1.3 added
 
 ```
 ########## First turn ##########
 score
 model turn
 gpt-4o-mini 1 9.2875
+Chocolatine-14B-Instruct-DPO-v1.3 1 9.0125
 Chocolatine-14B-Instruct-4k-DPO 1 8.6375
 Chocolatine-14B-Instruct-DPO-v1.2 1 8.6125
 Phi-3.5-mini-instruct 1 8.5250
 Chocolatine-3B-Instruct-DPO-v1.2 1 8.3750
+phi-4 1 8.3000
 Phi-3-medium-4k-instruct 1 8.2250
 gpt-3.5-turbo 1 8.1375
 Chocolatine-3B-Instruct-DPO-Revised 1 7.9875
@@ -166,7 +169,9 @@ vigogne-2-7b-chat 1 5.6625
 score
 model turn
 gpt-4o-mini 2 8.912500
+Chocolatine-14B-Instruct-DPO-v1.3 2 8.762500
 Chocolatine-14B-Instruct-DPO-v1.2 2 8.337500
+phi-4 2 8.131250
 Chocolatine-3B-Instruct-DPO-Revised 2 7.937500
 Chocolatine-3B-Instruct-DPO-v1.2 2 7.862500
 Phi-3-medium-4k-instruct 2 7.750000
@@ -185,7 +190,9 @@ vigogne-2-7b-chat 2 2.775000
 score
 model
 gpt-4o-mini 9.100000
+Chocolatine-14B-Instruct-DPO-v1.3 8.825000
 Chocolatine-14B-Instruct-DPO-v1.2 8.475000
+phi-4 8.215625
 Chocolatine-14B-Instruct-4k-DPO 8.187500
 Chocolatine-3B-Instruct-DPO-v1.2 8.118750
 Phi-3.5-mini-instruct 8.050000
@@ -240,12 +247,12 @@ print(sequences[0]['generated_text'])
 
 ### Limitations
 
-The Chocolatine model is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.
+The Chocolatine model series is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.
 It does not have any moderation mechanism.
 
-- **Developed by:** Jonathan Pacifico, 2024
+- **Developed by:** Jonathan Pacifico, 2024
 - **Model type:** LLM
-- **Language(s) (NLP):** French, English
+- **Language(s) (NLP):** French, English
 - **License:** MIT
 # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_jpacifico__Chocolatine-14B-Instruct-DPO-v1.2)
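As a reading aid for the MT-Bench-French tables in this diff: for most of the listed models, the overall ("average") score is simply the mean of the turn-1 and turn-2 scores, which assumes the judge rates the same number of questions in each turn. A minimal sketch checking this against three rows taken verbatim from the tables above (model names and scores are from the source; the averaging rule itself is an assumption, and a few rows, such as v1.3, deviate slightly from it):

```python
# Turn-1 and turn-2 MT-Bench-French scores copied from the tables above.
turn_scores = {
    "gpt-4o-mini": (9.2875, 8.9125),
    "Chocolatine-14B-Instruct-DPO-v1.2": (8.6125, 8.3375),
    "phi-4": (8.3000, 8.131250),
}

# Assumed rule: overall score = mean of the two per-turn scores.
overall = {name: (t1 + t2) / 2 for name, (t1, t2) in turn_scores.items()}

for name, score in overall.items():
    print(f"{name}: {score:.6f}")
# These reproduce the "average" table rows:
# gpt-4o-mini → 9.100000, ...DPO-v1.2 → 8.475000, phi-4 → 8.215625
```

This is only a consistency check on the published numbers, not part of the benchmark tooling itself.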