---
license: apache-2.0
language:
- en
- az
- sw
- af
- ar
- ba
- be
- bxr
- bg
- bn
- cv
- hy
- da
- de
- el
- es
- eu
- fa
- fi
- fr
- he
- hi
- hu
- kk
- id
- it
- ja
- ka
- ky
- ko
- lt
- lv
- mn
- ml
- os
- mr
- ms
- my
- nl
- ro
- pl
- pt
- sah
- ru
- tg
- sv
- ta
- te
- tk
- th
- tr
- tl
- tt
- tyv
- uk
- ur
- vi
- uz
- yo
- zh
- xal
pipeline_tag: text-generation
tags:
- PyTorch
- Transformers
- gpt3
- gpt2
- Deepspeed
- Megatron
datasets:
- mc4
- wikipedia
thumbnail: "https://github.com/sberbank-ai/mgpt"
---
# Multilingual GPT model
We introduce a family of autoregressive GPT-like models with 1.3 billion parameters, trained on 60 languages from 25 language families using Wikipedia and the Colossal Clean Crawled Corpus.
We reproduce the GPT-3 architecture from the GPT-2 sources with a sparse attention mechanism; the [Deepspeed](https://github.com/microsoft/DeepSpeed) and [Megatron](https://github.com/NVIDIA/Megatron-LM) frameworks allow us to parallelize training and inference effectively. The resulting models perform on par with the recently released [XGLM](https://arxiv.org/pdf/2112.10668.pdf) models while covering more languages and enhancing NLP capabilities for low-resource languages.
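The checkpoint can be used for plain text generation through the Transformers library. The snippet below is a minimal sketch, assuming the model is published on the Hugging Face Hub under the `sberbank-ai/mGPT` identifier; the generation settings are illustrative, not recommended defaults.
```python
# Minimal generation sketch (not an official example).
# Assumption: the checkpoint is available on the Hub as "sberbank-ai/mGPT".
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/mGPT")
model = AutoModelForCausalLM.from_pretrained("sberbank-ai/mGPT")

prompt = "Artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation of the prompt.
output_ids = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```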
## Code
The source code for the mGPT XL model is available on [GitHub](https://github.com/sberbank-ai/mgpt).
## Paper
[arXiv preprint](https://arxiv.org/user)
Cite us:
```bibtex
```
## Languages
The model covers 60 languages (ISO codes):
```az, sw, af, ar, ba, be, bxr, bg, bn, cv, hy, da, de, el, es, eu, fa, fi, fr, he, hi, hu, kk, id, it, ja, ka, ky, ko, lt, lv, mn, ml, os, mr, ms, my, nl, ro, pl, pt, sah, ru, tg, sv, ta, te, tk, th, tr, tl, tt, tyv, uk, en, ur, vi, uz, yo, zh, xal```
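Because a single checkpoint covers all of these languages, prompting works the same way regardless of the input language. The snippet below is an illustrative sketch: the `sberbank-ai/mGPT` identifier is assumed as above, and the prompts are arbitrary examples.
```python
# Sketch: prompting the same multilingual checkpoint in several languages.
# The model identifier and the prompts are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/mGPT")
model = AutoModelForCausalLM.from_pretrained("sberbank-ai/mGPT")

prompts = {
    "en": "The capital of France is",
    "ru": "Столица Франции —",
    "de": "Die Hauptstadt von Frankreich ist",
    "sw": "Mji mkuu wa Ufaransa ni",
}

for lang, prompt in prompts.items():
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    print(f"[{lang}] {tokenizer.decode(output_ids[0], skip_special_tokens=True)}")
```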
## Training Data Statistics
- Tokens: 559B
<img style="text-align:center; display:block;" src="https://huggingface.co/sberbank-ai/mGPT/resolve/main/stats.png">
*General training corpus statistics*
## Details
The model was trained with a sequence length of 1024 using the Transformers library by the [SberDevices](https://sberdevices.ru/) team on 80B tokens for 3 epochs. After that, the model was fine-tuned for 1 epoch with a sequence length of 2048.
Total training time was around n days on n GPUs for the 1024 context and a few days on n GPUs for the 2048 context.
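A practical consequence of the two-stage training above is that prompts should fit within the final 2048-token context. The sketch below shows one way to truncate long inputs before generation; the `MAX_CONTEXT` and `GEN_BUDGET` values are assumptions for illustration, not official settings.
```python
# Sketch: keeping the prompt within the 2048-token context used at fine-tuning.
# MAX_CONTEXT and GEN_BUDGET are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MAX_CONTEXT = 2048  # sequence length of the final fine-tuning epoch
GEN_BUDGET = 100    # tokens reserved for the generated continuation

tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/mGPT")
model = AutoModelForCausalLM.from_pretrained("sberbank-ai/mGPT")

long_document = "..."  # any long input text
inputs = tokenizer(
    long_document,
    return_tensors="pt",
    truncation=True,
    max_length=MAX_CONTEXT - GEN_BUDGET,  # leave room for new tokens
)
output_ids = model.generate(**inputs, max_new_tokens=GEN_BUDGET)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```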