---
license: apache-2.0
language:
- en
- az
- sw
- af
- ar
- ba
- be
- bxr
- bg
- bn
- cv
- hy
- da
- de
- el
- es
- eu
- fa
- fi
- fr
- he
- hi
- hu
- kk
- id
- it
- ja
- ka
- ky
- ko
- lt
- lv
- mn
- ml
- os
- mr
- ms
- my
- nl
- ro
- pl
- pt
- sah
- ru
- tg
- sv
- ta
- te
- tk
- th
- tr
- tl
- tt
- tyv
- uk
- ur
- vi
- uz
- yo
- zh
- xal
pipeline_tag: text-generation
tags:
- PyTorch
- Transformers
- gpt3
- gpt2
- Deepspeed
- Megatron
datasets:
- mc4
- wikipedia
thumbnail: "https://github.com/sberbank-ai/mgpt"
---
# Multilingual GPT model
We introduce a family of autoregressive GPT-like models with 1.3 billion parameters, trained on 60 languages from 25 language families using Wikipedia and the Colossal Clean Crawled Corpus.
We reproduce the GPT-3 architecture from the GPT-2 sources with a sparse attention mechanism; the [Deepspeed](https://github.com/microsoft/DeepSpeed) and [Megatron](https://github.com/NVIDIA/Megatron-LM) frameworks allow us to parallelize training and inference effectively. The resulting models perform on par with the recently released [XGLM](https://arxiv.org/pdf/2112.10668.pdf) models while covering more languages and enhancing NLP capabilities for low-resource languages.
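The checkpoint can be used for plain text generation through the Transformers library. The snippet below is a minimal sketch, assuming the model is published on the Hugging Face Hub under the `sberbank-ai/mGPT` identifier; the generation settings are illustrative, not recommended defaults.
```python
# Minimal generation sketch (not an official example).
# Assumption: the checkpoint is available on the Hub as "sberbank-ai/mGPT".
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/mGPT")
model = AutoModelForCausalLM.from_pretrained("sberbank-ai/mGPT")

prompt = "Artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation of the prompt.
output_ids = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```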
## Code
The source code for the mGPT XL model is available on [GitHub](https://github.com/sberbank-ai/mgpt).
## Paper
[arXiv preprint](https://arxiv.org/user)
Cite us:
```bibtex
```
## Languages
The model covers 60 languages (ISO codes):
```az, sw, af, ar, ba, be, bxr, bg, bn, cv, hy, da, de, el, es, eu, fa, fi, fr, he, hi, hu, kk, id, it, ja, ka, ky, ko, lt, lv, mn, ml, os, mr, ms, my, nl, ro, pl, pt, sah, ru, tg, sv, ta, te, tk, th, tr, tl, tt, tyv, uk, en, ur, vi, uz, yo, zh, xal```
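Because a single checkpoint covers all of these languages, prompting works the same way regardless of the input language. The snippet below is an illustrative sketch: the `sberbank-ai/mGPT` identifier is assumed as above, and the prompts are arbitrary examples.
```python
# Sketch: prompting the same multilingual checkpoint in several languages.
# The model identifier and the prompts are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/mGPT")
model = AutoModelForCausalLM.from_pretrained("sberbank-ai/mGPT")

prompts = {
    "en": "The capital of France is",
    "ru": "Столица Франции —",
    "de": "Die Hauptstadt von Frankreich ist",
    "sw": "Mji mkuu wa Ufaransa ni",
}

for lang, prompt in prompts.items():
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    print(f"[{lang}] {tokenizer.decode(output_ids[0], skip_special_tokens=True)}")
```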
## Training Data Statistics
- Tokens: 559B
<img style="text-align:center; display:block;" src="https://huggingface.co/sberbank-ai/mGPT/resolve/main/stats.png">
*General training corpus statistics*
## Details
The model was trained with a sequence length of 1024 using the Transformers library by the [SberDevices](https://sberdevices.ru/) team on 80B tokens for 3 epochs. After that, the model was fine-tuned for 1 epoch with a sequence length of 2048.
Total training time was around n days on n GPUs for the 1024 context and a few days on n GPUs for the 2048 context.
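A practical consequence of the two-stage training above is that prompts should fit within the final 2048-token context. The sketch below shows one way to truncate long inputs before generation; the `MAX_CONTEXT` and `GEN_BUDGET` values are assumptions for illustration, not official settings.
```python
# Sketch: keeping the prompt within the 2048-token context used at fine-tuning.
# MAX_CONTEXT and GEN_BUDGET are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MAX_CONTEXT = 2048  # sequence length of the final fine-tuning epoch
GEN_BUDGET = 100    # tokens reserved for the generated continuation

tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/mGPT")
model = AutoModelForCausalLM.from_pretrained("sberbank-ai/mGPT")

long_document = "..."  # any long input text
inputs = tokenizer(
    long_document,
    return_tensors="pt",
    truncation=True,
    max_length=MAX_CONTEXT - GEN_BUDGET,  # leave room for new tokens
)
output_ids = model.generate(**inputs, max_new_tokens=GEN_BUDGET)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```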