Cyrile committed on
Commit eefdd5c · 1 Parent(s): 5837fb8

Update README.md

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -33,8 +33,7 @@ Sans honneur que précaire, sans liberté que provisoire, [...], et de façon qu
  |:--------------:|:-------:|:-----:|:-------:|:-----:|:-----:|:---:|:------:|:------:|
  | tokens per word | 2.3 | 2.3 | 2 | 1.9 | 1.9 | 1.9 | 1.8 | 1.4 |
 
-
- For comparison, with a specialized French tokenizer like [CamemBERT](https://huggingface.co/camembert/camembert-base) or [DistilCamemBERT](cmarkea/distilcamembert-base), we have 1.5 tokens per word. In addition to its positive impact on inference time and resource consumption, there has already been a demonstrated direct relationship between the number of tokens per word required for modeling and the predictive performance of the model [1].
 
  Dataset
  -------
@@ -71,7 +70,7 @@ Since the model is trained only on English and French corpora, the performance o
  |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
  | **human:** C'est quoi la première guerre mondiale ?<br>**bot:** La première guerre mondiale a commencé en 1914 et s'est terminée en 1918. C'était une guerre mondiale qui a impliqué de nombreux pays et de nombreuses personnes.<br>**human:** Quels pays ?<br>**bot:** La première guerre mondiale a été déclenchée par la Russie, l'Autriche-Hongrie, la France, la Grande-Bretagne, les États-Unis et l'Allemagne.<br>**human:** Comment cela s'est-il fini et quels ont été ses impacts ?<br>**bot:** La première guerre mondiale s'est terminée en 1918. Il y a eu beaucoup de morts et de destruction.<br>**human:** Combien de mort ?<br>**bot:** Il y a eu environ 100 millions de morts.<br>**human:** Tu es certain de ce chiffre, car ça me parait beaucoup, non ?<br>**bot:** Je suis sûr de ce chiffre. | **human:** What is the first world war?<br>**bot:** The first world war was fought between 1914 and 1918.<br>**human:** Who were the belligerents?<br>**bot:** The belligerents were the United Kingdom, France, Russia, Austria-Hungary, Serbia, and Montenegro.<br>**human:** How did it end and what were its impacts?<br>**bot:** The war ended with the Treaty of Versailles in 1919.<br>**human:** How many deaths?<br>**bot:** The war killed about 100 million people.<br>**human:** Are you sure about this number, because it seems like a lot to me, right?<br>**bot:** I’m not sure, but I think it’s a good estimate. | **human:** ¿Qué es la Primera Guerra Mundial?<br>**bot:** La Primera Guerra Mundial fue un conflicto armado que duró de 1914 a 1918<br>**human:** ¿En qué países participamos?<br>**bot:** La Primera Guerra Mundial fue un conflicto armado que duró de 1914 a 1918. |
 
- Evaluating the performance of generative models can be challenging, and quantitative measures may not always provide the most relevant insights for this type of approach. In this regard, we conducted a qualitative performance study. It involves assessing the relevance of model responses to a pool of French questions using blind evaluation. The metric used for evaluation is the average of two criteria: response accuracy and completeness, similar to what the SSA metric aims to capture. Please note that this measure lacks rigorous scientific validity due to the limited number of questions in the test dataset and the fact that it involves only a few individuals with similar socio-demographic characteristics. The prompts take the same format for all models:
  ```
  [Instruction]
  Question : [Question]
 
  |:--------------:|:-------:|:-----:|:-------:|:-----:|:-----:|:---:|:------:|:------:|
  | tokens per word | 2.3 | 2.3 | 2 | 1.9 | 1.9 | 1.9 | 1.8 | 1.4 |
 
+ For comparison, a specialized French tokenizer such as [CamemBERT](https://huggingface.co/camembert/camembert-base) or [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) yields 1.5 tokens per word. Beyond its positive impact on inference time and resource consumption, a lower token count matters because it has already been [shown that there is a direct relationship](https://arxiv.org/abs/2012.15613) between the number of tokens per word required for modeling and the predictive performance of the model.
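The tokens-per-word figures above are ratios of the total number of tokens a tokenizer emits to the number of words in a corpus. The sketch below shows that computation under two assumptions not stated in the README: words are counted by splitting on whitespace, and any tokenizer can be passed in as a plain function. `toy_subword_tokenize` is a hypothetical stand-in; with Hugging Face Transformers one could instead pass something like `AutoTokenizer.from_pretrained("camembert/camembert-base").tokenize`.

```python
from typing import Callable, Iterable, List


def tokens_per_word(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Total tokens emitted by `tokenize` divided by total whitespace-delimited words."""
    n_tokens = 0
    n_words = 0
    for text in texts:
        n_words += len(text.split())      # assumption: a "word" is a whitespace-separated chunk
        n_tokens += len(tokenize(text))   # tokens produced by the tokenizer under test
    if n_words == 0:
        raise ValueError("corpus contains no words")
    return n_tokens / n_words


def toy_subword_tokenize(text: str, max_len: int = 4) -> List[str]:
    """Hypothetical stand-in tokenizer: cuts each word into chunks of at most max_len characters."""
    pieces: List[str] = []
    for word in text.split():
        pieces.extend(word[i:i + max_len] for i in range(0, len(word), max_len))
    return pieces


if __name__ == "__main__":
    corpus = ["la première guerre mondiale", "tokens per word"]
    print(tokens_per_word(corpus, toy_subword_tokenize))
```

Because the word count depends on how words are delimited (punctuation, apostrophes, hyphens), ratios computed this way are only comparable across tokenizers when the same corpus and the same word definition are used for all of them.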
 
 
  Dataset
  -------
 
  |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
  | **human:** C'est quoi la première guerre mondiale ?<br>**bot:** La première guerre mondiale a commencé en 1914 et s'est terminée en 1918. C'était une guerre mondiale qui a impliqué de nombreux pays et de nombreuses personnes.<br>**human:** Quels pays ?<br>**bot:** La première guerre mondiale a été déclenchée par la Russie, l'Autriche-Hongrie, la France, la Grande-Bretagne, les États-Unis et l'Allemagne.<br>**human:** Comment cela s'est-il fini et quels ont été ses impacts ?<br>**bot:** La première guerre mondiale s'est terminée en 1918. Il y a eu beaucoup de morts et de destruction.<br>**human:** Combien de mort ?<br>**bot:** Il y a eu environ 100 millions de morts.<br>**human:** Tu es certain de ce chiffre, car ça me parait beaucoup, non ?<br>**bot:** Je suis sûr de ce chiffre. | **human:** What is the first world war?<br>**bot:** The first world war was fought between 1914 and 1918.<br>**human:** Who were the belligerents?<br>**bot:** The belligerents were the United Kingdom, France, Russia, Austria-Hungary, Serbia, and Montenegro.<br>**human:** How did it end and what were its impacts?<br>**bot:** The war ended with the Treaty of Versailles in 1919.<br>**human:** How many deaths?<br>**bot:** The war killed about 100 million people.<br>**human:** Are you sure about this number, because it seems like a lot to me, right?<br>**bot:** I’m not sure, but I think it’s a good estimate. | **human:** ¿Qué es la Primera Guerra Mundial?<br>**bot:** La Primera Guerra Mundial fue un conflicto armado que duró de 1914 a 1918<br>**human:** ¿En qué países participamos?<br>**bot:** La Primera Guerra Mundial fue un conflicto armado que duró de 1914 a 1918. |
 
+ Evaluating the performance of generative models can be challenging, and quantitative measures may not always provide the most relevant insights for this type of approach. In this regard, we conducted a qualitative performance study. It involves assessing the relevance of model responses to a pool of French questions using blind evaluation. The metric used for evaluation is the average of two criteria: response accuracy and completeness, similar to what the [SSA metric](https://arxiv.org/abs/2001.09977) aims to capture. Please note that this measure lacks rigorous scientific validity due to the limited number of questions in the test dataset and the fact that it involves only a few individuals with similar socio-demographic characteristics. The prompts take the same format for all models:
  ```
  [Instruction]
  Question : [Question]