eldor-fozilov
committed on
include results
README.md
CHANGED
@@ -35,6 +35,23 @@ For details regarding the performance metrics compared to the base model, see [t
- [Azimjon Urinov](https://azimjonn.github.io/)
- [Khurshid Juraev](https://kjuraev.com/)

📊 **Performance Comparison:**

| Model Name | BLEU Uz-En (One-shot) | BLEU En-Uz (One-shot) | COMET (Uz-En) | COMET (En-Uz) | Uzbek Sentiment Analysis | Uzbek News Classification | MMLU (English) (5-shot) |
|------------------------|------|------|-------|-------|-------|-------|-------|
| **Llama-3.1 8B Instruct** | 23.74 | 6.72 | 84.30 | 82.70 | 68.96 | 55.41 | 65.77 |
| **Llama-3.1 8B Instruct Uz** | 27.42 | 11.58 | 85.63 | 86.53 | 82.42 | 60.84 | 62.78 |
| **Mistral 7B Instruct** | 7.47 | 0.67 | 68.14 | 45.58 | 62.02 | 47.52 | 61.07 |
| **Mistral 7B Instruct Uz** | 29.39 | 16.77 | 86.91 | 88.75 | 79.13 | 59.38 | 55.72 |
| **Mistral Nemo Instruct** | 25.68 | 9.79 | 85.56 | 85.04 | 72.47 | 49.24 | 67.62 |
| **Mistral Nemo Instruct Uz** | 30.49 | 15.52 | 87.04 | 88.01 | 82.05 | 58.2 | 67.36 |
| **Google Translate** | 41.18 | 22.98 | 89.16 | 90.67 | — | — | — |

The results show that the Uzbek-optimized models consistently outperform their base counterparts on translation benchmarks (BLEU and COMET on the FLORES+ Uz-En / En-Uz evaluation datasets), as well as on Uzbek sentiment analysis and news classification.
Also, on the MMLU benchmark, which measures general language understanding across multiple tasks in English, the fine-tuned models did not show a significant decline. (The base Llama model's MMLU score differs from the officially reported one due to our evaluation method; refer to the links below for evaluation details.)

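For reference, corpus-level BLEU scores such as those in the table can be computed with the `sacrebleu` library. The snippet below is only a minimal sketch with placeholder file names; it does not reproduce the full evaluation pipeline (one-shot prompting, decoding, COMET scoring), which is documented in the links below.

```python
# Minimal BLEU sketch using sacrebleu; the file names are placeholders, not part of this repo.
import sacrebleu

# One model translation and one reference per line, aligned by index.
with open("hypotheses.uz-en.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("references.uz-en.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# sacrebleu expects a list of reference streams, hence the extra list around `references`.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")
```
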
Looking ahead, these models are just **early versions**. We are actively working on further improving our data curation and fine-tuning methods to deliver even better results in the near future. In addition, we will scale up the dataset for both continual pretraining and instruction tuning, and we will also customize other strong open-source LLMs for the Uzbek language.
We’re eager to see how these models will be used by our Uzbek 🇺🇿 community and look forward to continuing this work. 🚀

## Usage

The model can be used with the following frameworks:
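For example, here is a minimal sketch with the Hugging Face `transformers` library; the repository ID below is a placeholder for this model's actual Hub ID, and the prompt is only illustrative.

```python
# Minimal inference sketch with Hugging Face transformers.
# MODEL_ID is a placeholder; replace it with this model's repository ID on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "<this-model-repo-id>"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Instruct-style chat prompt (assumes the checkpoint ships a chat template).
messages = [{"role": "user", "content": "Translate into Uzbek: The weather is nice today."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```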