---
license: apache-2.0
datasets:
- cerebras/SlimPajama-627B
- bigcode/starcoderdata
- sam-mosaic/orca-gpt4-chatml
- alvations/globalvoices-en-es
language:
- en
- es
---

<div align="center">

# TinyLlama-1.1B-translate-en-es

</div>

This is a fine-tuned model trained on part of the alvations/globalvoices-en-es dataset to test performance on a translation task. It was trained to translate English to Spanish and vice versa using only 20k rows from the dataset.

The translation is not very accurate, but it shows a lot of potential.

To use it, format the prompt following the ChatML standard, like so:

---

English to Spanish:

```
<|im_start|>user Translate this to spanish: ```A father and son, who have been living off grid for 20 years, encounter an outsider who threatens to destroy the utopia they've built.```
<|im_start|>assistant
```
This will provide the following result:

```
Un padre y hijo, que han vivido sin comida desde hace 20 años, encuentran un invitado quien amenaza con destruir la utopía que ellos han creado.
```

---

Spanish to English:

```
<|im_start|>user Traduce esto al ingles: ```España se queda sin Copilot para Windows 11: la regulación de la UE frena su despliegue en Europa.```
<|im_start|>assistant
```
Which will be completed as:

```
Spain is left without Copilot for Windows 11: the control of the UE has halted its deployment in Europe.
```
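
The two prompts above share one layout, so they can be produced by a small helper. The following is a minimal sketch, not the author's code: `build_prompt` and `translate` are hypothetical names, the trailing newline after `<|im_start|>assistant` is an assumption, and `model_id` must be supplied by the caller because this card does not state the Hub repository id. The `translate` function assumes the `transformers` and `torch` packages are installed and uses default generation settings, which may differ from whatever produced the outputs shown above.

```python
FENCE = "`" * 3  # the examples wrap the source text in triple backticks

# Instruction strings copied from the two example prompts in this card.
INSTRUCTIONS = {
    "en-es": "Translate this to spanish:",
    "es-en": "Traduce esto al ingles:",
}


def build_prompt(text: str, direction: str = "en-es") -> str:
    """Build a prompt in the layout shown in the examples above."""
    instruction = INSTRUCTIONS[direction]
    return (
        f"<|im_start|>user {instruction} {FENCE}{text}{FENCE}\n"
        # Assumption: a newline follows the assistant tag.
        "<|im_start|>assistant\n"
    )


def translate(text: str, direction: str, model_id: str,
              max_new_tokens: int = 128) -> str:
    """Generate a translation; model_id is the caller-supplied Hub id."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(text, direction), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```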

---

The results are far from perfect, but there is A LOT of room for improvement, since the model was fine-tuned for 2 epochs on only 20k rows of the dataset (which has 355k rows). This training took only about 5 hours on an "M1 Pro" processor.

The base model is [acalatrava/TinyLlama-1.1B-orca-gpt4](https://huggingface.co/acalatrava/TinyLlama-1.1B-orca-gpt4), a model fine-tuned on the Orca dataset.

### Training

- **Method**: QLoRA
- **Time**: 10h on an M1 Pro 32GB
- **Based on**: [https://colab.research.google.com/drive/1Zmaceu65d7w4Tcd-cfnZRb6k_Tcv2b8g](https://colab.research.google.com/drive/1Zmaceu65d7w4Tcd-cfnZRb6k_Tcv2b8g), with quantization removed since it's not supported on MPS
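
As a rough illustration of how a bidirectional training subset could be prepared for a fine-tune like this one, the sketch below samples rows from a list of (English, Spanish) pairs and formats each row in both directions. This card does not document the actual training template, so everything here is an assumption: the hypothetical helpers `format_example` and `build_training_texts`, the choice to mirror the inference prompt with the target text appended, and the choice to emit two examples per sampled row.

```python
import random

FENCE = "`" * 3  # triple backticks, as used in the card's example prompts


def format_example(source: str, target: str, instruction: str) -> str:
    """Assumed training text: the inference prompt followed by the target."""
    return (
        f"<|im_start|>user {instruction} {FENCE}{source}{FENCE}\n"
        f"<|im_start|>assistant\n{target}"
    )


def build_training_texts(pairs, n_rows, seed=0):
    """Sample up to n_rows (english, spanish) pairs, format both directions."""
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    subset = rng.sample(pairs, min(n_rows, len(pairs)))
    texts = []
    for english, spanish in subset:
        texts.append(format_example(english, spanish, "Translate this to spanish:"))
        texts.append(format_example(spanish, english, "Traduce esto al ingles:"))
    return texts
```

For example, `build_training_texts(pairs, n_rows=20_000)` would draw a 20k-row subset when `pairs` holds the full dataset, which is about 5.6% of the 355k rows mentioned above.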