Update README.md
README.md (CHANGED)
````diff
@@ -109,7 +109,7 @@ model-index:
 
 ### Chocolatine-14B-Instruct-DPO-v1.2
 
-DPO fine-
+DPO fine-tuning of [microsoft/Phi-3-medium-4k-instruct](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct) (14B params)
 using the [jpacifico/french-orca-dpo-pairs-revised](https://huggingface.co/datasets/jpacifico/french-orca-dpo-pairs-revised) rlhf dataset.
 Training in French also improves the model in English, surpassing the performances of its base model.
 Window context = 4k tokens
@@ -140,17 +140,20 @@ Chocolatine is the best-performing model in size 13B on the [OpenLLM Leaderboard
 
 ### MT-Bench-French
 
 Chocolatine-14B-Instruct-DPO-v1.2 outperforms its previous versions and its base model Phi-3-medium-4k-instruct on [MT-Bench-French](https://huggingface.co/datasets/bofenghuang/mt-bench-french), used with [multilingual-mt-bench](https://github.com/Peter-Devine/multilingual_mt_bench) and GPT-4-Turbo as LLM-judge.
+[Update 2025/01/19] New version 1.3 added
 
 ```
 ########## First turn ##########
                                       score
 model                            turn
 gpt-4o-mini                         1  9.2875
+Chocolatine-14B-Instruct-DPO-v1.3   1  9.0125
 Chocolatine-14B-Instruct-4k-DPO     1  8.6375
 Chocolatine-14B-Instruct-DPO-v1.2   1  8.6125
 Phi-3.5-mini-instruct               1  8.5250
 Chocolatine-3B-Instruct-DPO-v1.2    1  8.3750
+phi-4                               1  8.3000
 Phi-3-medium-4k-instruct            1  8.2250
 gpt-3.5-turbo                       1  8.1375
 Chocolatine-3B-Instruct-DPO-Revised 1  7.9875
@@ -166,7 +169,9 @@ vigogne-2-7b-chat 1 5.6625
                                       score
 model                            turn
 gpt-4o-mini                         2  8.912500
+Chocolatine-14B-Instruct-DPO-v1.3   2  8.762500
 Chocolatine-14B-Instruct-DPO-v1.2   2  8.337500
+phi-4                               2  8.131250
 Chocolatine-3B-Instruct-DPO-Revised 2  7.937500
 Chocolatine-3B-Instruct-DPO-v1.2    2  7.862500
 Phi-3-medium-4k-instruct            2  7.750000
@@ -185,7 +190,9 @@ vigogne-2-7b-chat 2 2.775000
                                      score
 model
 gpt-4o-mini                          9.100000
+Chocolatine-14B-Instruct-DPO-v1.3    8.825000
 Chocolatine-14B-Instruct-DPO-v1.2    8.475000
+phi-4                                8.215625
 Chocolatine-14B-Instruct-4k-DPO      8.187500
 Chocolatine-3B-Instruct-DPO-v1.2     8.118750
 Phi-3.5-mini-instruct                8.050000
@@ -240,12 +247,12 @@ print(sequences[0]['generated_text'])
 
 ### Limitations
 
-The Chocolatine model is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.
+The Chocolatine model series is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.
 It does not have any moderation mechanism.
 
 - **Developed by:** Jonathan Pacifico, 2024
 - **Model type:** LLM
 - **Language(s) (NLP):** French, English
 - **License:** MIT
 # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_jpacifico__Chocolatine-14B-Instruct-DPO-v1.2)
````