|
--- |
|
language: |
|
- multilingual |
|
- ace |
|
- afr |
|
- als |
|
- amh |
|
- ang |
|
- ara |
|
- arg |
|
- arz |
|
- asm |
|
- ast |
|
- ava |
|
- aym |
|
- azb |
|
- aze |
|
- bak |
|
- bar |
|
- bcl |
|
- bel |
|
- ben |
|
- bho |
|
- bjn |
|
- bod |
|
- bos |
|
- bpy |
|
- bre |
|
- bul |
|
- bxr |
|
- cat |
|
- cbk |
|
- cdo |
|
- ceb |
|
- ces |
|
- che |
|
- chr |
|
- chv |
|
- ckb |
|
- cor |
|
- cos |
|
- crh |
|
- csb |
|
- cym |
|
- dan |
|
- deu |
|
- diq |
|
- div |
|
- dsb |
|
- dty |
|
- egl |
|
- ell |
|
- eng |
|
- epo |
|
- est |
|
- eus |
|
- ext |
|
- fao |
|
- fas |
|
- fin |
|
- fra |
|
- frp |
|
- fry |
|
- fur |
|
- gag |
|
- gla |
|
- gle |
|
- glg |
|
- glk |
|
- glv |
|
- grn |
|
- guj |
|
- hak |
|
- hat |
|
- hau |
|
- hbs |
|
- heb |
|
- hif |
|
- hin |
|
- hrv |
|
- hsb |
|
- hun |
|
- hye |
|
- ibo |
|
- ido |
|
- ile |
|
- ilo |
|
- ina |
|
- ind |
|
- isl |
|
- ita |
|
- jam |
|
- jav |
|
- jbo |
|
- jpn |
|
- kaa |
|
- kab |
|
- kan |
|
- kat |
|
- kaz |
|
- kbd |
|
- khm |
|
- kin |
|
- kir |
|
- koi |
|
- kok |
|
- kom |
|
- kor |
|
- krc |
|
- ksh |
|
- kur |
|
- lad |
|
- lao |
|
- lat |
|
- lav |
|
- lez |
|
- lij |
|
- lim |
|
- lin |
|
- lit |
|
- lmo |
|
- lrc |
|
- ltg |
|
- ltz |
|
- lug |
|
- lzh |
|
- mai |
|
- mal |
|
- mar |
|
- mdf |
|
- mhr |
|
- min |
|
- mkd |
|
- mlg |
|
- mlt |
|
- nan |
|
- mon |
|
- mri |
|
- mrj |
|
- msa |
|
- mwl |
|
- mya |
|
- myv |
|
- mzn |
|
- nap |
|
- nav |
|
- nci |
|
- nds |
|
- nep |
|
- new |
|
- nld |
|
- nno |
|
- nob |
|
- nrm |
|
- nso |
|
- oci |
|
- olo |
|
- ori |
|
- orm |
|
- oss |
|
- pag |
|
- pam |
|
- pan |
|
- pap |
|
- pcd |
|
- pdc |
|
- pfl |
|
- pnb |
|
- pol |
|
- por |
|
- pus |
|
- que |
|
- roh |
|
- ron |
|
- rue |
|
- rup |
|
- rus |
|
- sah |
|
- san |
|
- scn |
|
- sco |
|
- sgs |
|
- sin |
|
- slk |
|
- slv |
|
- sme |
|
- sna |
|
- snd |
|
- som |
|
- spa |
|
- sqi |
|
- srd |
|
- srn |
|
- srp |
|
- stq |
|
- sun |
|
- swa |
|
- swe |
|
- szl |
|
- tam |
|
- tat |
|
- tcy |
|
- tel |
|
- tet |
|
- tgk |
|
- tgl |
|
- tha |
|
- ton |
|
- tsn |
|
- tuk |
|
- tur |
|
- tyv |
|
- udm |
|
- uig |
|
- ukr |
|
- urd |
|
- uzb |
|
- vec |
|
- vep |
|
- vie |
|
- vls |
|
- vol |
|
- vro |
|
- war |
|
- wln |
|
- wol |
|
- wuu |
|
- xho |
|
- xmf |
|
- yid |
|
- yor |
|
- zea |
|
- zho |
|
language_bcp47: |
|
- be-tarask |
|
- map-bms |
|
- nds-nl |
|
- roa-tara |
|
- zh-yue |
|
license: apache-2.0 |
|
datasets: |
|
- wili_2018 |
|
--- |
|
|
|
# Zabanshenas - Language Detector |
|
|
|
Zabanshenas is a Transformer-based solution for identifying the most likely language of a written document/text. Zabanshenas is a Persian word that has two meanings: |
|
|
|
- A person who studies linguistics. |
|
- A way to identify the type of written language. |
|
|
|
|
|
## How to use |
|
|
|
Follow [Zabanshenas repo](https://github.com/m3hrdadfi/zabanshenas) for more information! |
|
|
|
|
|
## Evaluation |
|
The following tables summarize the scores obtained by model overall and per each class. |
|
|
|
|
|
### By Paragraph |
|
|
|
| language | precision | recall | f1-score | |
|
|:--------------------------------------:|:---------:|:--------:|:--------:| |
|
| Achinese (ace) | 1.000000 | 0.982143 | 0.990991 | |
|
| Afrikaans (afr) | 1.000000 | 1.000000 | 1.000000 | |
|
| Alemannic German (als) | 1.000000 | 0.946429 | 0.972477 | |
|
| Amharic (amh) | 1.000000 | 0.982143 | 0.990991 | |
|
| Old English (ang) | 0.981818 | 0.964286 | 0.972973 | |
|
| Arabic (ara) | 0.846154 | 0.982143 | 0.909091 | |
|
| Aragonese (arg) | 1.000000 | 1.000000 | 1.000000 | |
|
| Egyptian Arabic (arz) | 0.979592 | 0.857143 | 0.914286 | |
|
| Assamese (asm) | 0.981818 | 0.964286 | 0.972973 | |
|
| Asturian (ast) | 0.964912 | 0.982143 | 0.973451 | |
|
| Avar (ava) | 0.941176 | 0.905660 | 0.923077 | |
|
| Aymara (aym) | 0.964912 | 0.982143 | 0.973451 | |
|
| South Azerbaijani (azb) | 0.965517 | 1.000000 | 0.982456 | |
|
| Azerbaijani (aze) | 1.000000 | 1.000000 | 1.000000 | |
|
| Bashkir (bak) | 1.000000 | 0.978261 | 0.989011 | |
|
| Bavarian (bar) | 0.843750 | 0.964286 | 0.900000 | |
|
| Central Bikol (bcl) | 1.000000 | 0.982143 | 0.990991 | |
|
| Belarusian (Taraschkewiza) (be-tarask) | 1.000000 | 0.875000 | 0.933333 | |
|
| Belarusian (bel) | 0.870968 | 0.964286 | 0.915254 | |
|
| Bengali (ben) | 0.982143 | 0.982143 | 0.982143 | |
|
| Bhojpuri (bho) | 1.000000 | 0.928571 | 0.962963 | |
|
| Banjar (bjn) | 0.981132 | 0.945455 | 0.962963 | |
|
| Tibetan (bod) | 1.000000 | 0.982143 | 0.990991 | |
|
| Bosnian (bos) | 0.552632 | 0.375000 | 0.446809 | |
|
| Bishnupriya (bpy) | 1.000000 | 0.982143 | 0.990991 | |
|
| Breton (bre) | 1.000000 | 0.964286 | 0.981818 | |
|
| Bulgarian (bul) | 1.000000 | 0.964286 | 0.981818 | |
|
| Buryat (bxr) | 0.946429 | 0.946429 | 0.946429 | |
|
| Catalan (cat) | 0.982143 | 0.982143 | 0.982143 | |
|
| Chavacano (cbk) | 0.914894 | 0.767857 | 0.834951 | |
|
| Min Dong (cdo) | 1.000000 | 0.982143 | 0.990991 | |
|
| Cebuano (ceb) | 1.000000 | 1.000000 | 1.000000 | |
|
| Czech (ces) | 1.000000 | 1.000000 | 1.000000 | |
|
| Chechen (che) | 1.000000 | 1.000000 | 1.000000 | |
|
| Cherokee (chr) | 1.000000 | 0.963636 | 0.981481 | |
|
| Chuvash (chv) | 0.938776 | 0.958333 | 0.948454 | |
|
| Central Kurdish (ckb) | 1.000000 | 1.000000 | 1.000000 | |
|
| Cornish (cor) | 1.000000 | 1.000000 | 1.000000 | |
|
| Corsican (cos) | 1.000000 | 0.982143 | 0.990991 | |
|
| Crimean Tatar (crh) | 1.000000 | 0.946429 | 0.972477 | |
|
| Kashubian (csb) | 1.000000 | 0.963636 | 0.981481 | |
|
| Welsh (cym) | 1.000000 | 1.000000 | 1.000000 | |
|
| Danish (dan) | 1.000000 | 1.000000 | 1.000000 | |
|
| German (deu) | 0.828125 | 0.946429 | 0.883333 | |
|
| Dimli (diq) | 0.964912 | 0.982143 | 0.973451 | |
|
| Dhivehi (div) | 1.000000 | 1.000000 | 1.000000 | |
|
| Lower Sorbian (dsb) | 1.000000 | 0.982143 | 0.990991 | |
|
| Doteli (dty) | 0.940000 | 0.854545 | 0.895238 | |
|
| Emilian (egl) | 1.000000 | 0.928571 | 0.962963 | |
|
| Modern Greek (ell) | 1.000000 | 1.000000 | 1.000000 | |
|
| English (eng) | 0.588889 | 0.946429 | 0.726027 | |
|
| Esperanto (epo) | 1.000000 | 0.982143 | 0.990991 | |
|
| Estonian (est) | 0.963636 | 0.946429 | 0.954955 | |
|
| Basque (eus) | 1.000000 | 0.982143 | 0.990991 | |
|
| Extremaduran (ext) | 0.982143 | 0.982143 | 0.982143 | |
|
| Faroese (fao) | 1.000000 | 1.000000 | 1.000000 | |
|
| Persian (fas) | 0.948276 | 0.982143 | 0.964912 | |
|
| Finnish (fin) | 1.000000 | 1.000000 | 1.000000 | |
|
| French (fra) | 0.710145 | 0.875000 | 0.784000 | |
|
| Arpitan (frp) | 1.000000 | 0.946429 | 0.972477 | |
|
| Western Frisian (fry) | 0.982143 | 0.982143 | 0.982143 | |
|
| Friulian (fur) | 1.000000 | 0.982143 | 0.990991 | |
|
| Gagauz (gag) | 0.981132 | 0.945455 | 0.962963 | |
|
| Scottish Gaelic (gla) | 0.982143 | 0.982143 | 0.982143 | |
|
| Irish (gle) | 0.949153 | 1.000000 | 0.973913 | |
|
| Galician (glg) | 1.000000 | 1.000000 | 1.000000 | |
|
| Gilaki (glk) | 0.981132 | 0.945455 | 0.962963 | |
|
| Manx (glv) | 1.000000 | 1.000000 | 1.000000 | |
|
| Guarani (grn) | 1.000000 | 0.964286 | 0.981818 | |
|
| Gujarati (guj) | 1.000000 | 0.982143 | 0.990991 | |
|
| Hakka Chinese (hak) | 0.981818 | 0.964286 | 0.972973 | |
|
| Haitian Creole (hat) | 1.000000 | 1.000000 | 1.000000 | |
|
| Hausa (hau) | 1.000000 | 0.945455 | 0.971963 | |
|
| Serbo-Croatian (hbs) | 0.448276 | 0.464286 | 0.456140 | |
|
| Hebrew (heb) | 1.000000 | 0.982143 | 0.990991 | |
|
| Fiji Hindi (hif) | 0.890909 | 0.890909 | 0.890909 | |
|
| Hindi (hin) | 0.981481 | 0.946429 | 0.963636 | |
|
| Croatian (hrv) | 0.500000 | 0.636364 | 0.560000 | |
|
| Upper Sorbian (hsb) | 0.955556 | 1.000000 | 0.977273 | |
|
| Hungarian (hun) | 1.000000 | 1.000000 | 1.000000 | |
|
| Armenian (hye) | 1.000000 | 0.981818 | 0.990826 | |
|
| Igbo (ibo) | 0.918033 | 1.000000 | 0.957265 | |
|
| Ido (ido) | 1.000000 | 1.000000 | 1.000000 | |
|
| Interlingue (ile) | 1.000000 | 0.962264 | 0.980769 | |
|
| Iloko (ilo) | 0.947368 | 0.964286 | 0.955752 | |
|
| Interlingua (ina) | 1.000000 | 1.000000 | 1.000000 | |
|
| Indonesian (ind) | 0.761905 | 0.872727 | 0.813559 | |
|
| Icelandic (isl) | 1.000000 | 1.000000 | 1.000000 | |
|
| Italian (ita) | 0.861538 | 1.000000 | 0.925620 | |
|
| Jamaican Patois (jam) | 1.000000 | 0.946429 | 0.972477 | |
|
| Javanese (jav) | 0.964912 | 0.982143 | 0.973451 | |
|
| Lojban (jbo) | 1.000000 | 1.000000 | 1.000000 | |
|
| Japanese (jpn) | 1.000000 | 1.000000 | 1.000000 | |
|
| Karakalpak (kaa) | 0.965517 | 1.000000 | 0.982456 | |
|
| Kabyle (kab) | 1.000000 | 0.964286 | 0.981818 | |
|
| Kannada (kan) | 0.982143 | 0.982143 | 0.982143 | |
|
| Georgian (kat) | 1.000000 | 0.964286 | 0.981818 | |
|
| Kazakh (kaz) | 0.980769 | 0.980769 | 0.980769 | |
|
| Kabardian (kbd) | 1.000000 | 0.982143 | 0.990991 | |
|
| Central Khmer (khm) | 0.960784 | 0.875000 | 0.915888 | |
|
| Kinyarwanda (kin) | 0.981132 | 0.928571 | 0.954128 | |
|
| Kirghiz (kir) | 1.000000 | 1.000000 | 1.000000 | |
|
| Komi-Permyak (koi) | 0.962264 | 0.910714 | 0.935780 | |
|
| Konkani (kok) | 0.964286 | 0.981818 | 0.972973 | |
|
| Komi (kom) | 1.000000 | 0.962264 | 0.980769 | |
|
| Korean (kor) | 1.000000 | 1.000000 | 1.000000 | |
|
| Karachay-Balkar (krc) | 1.000000 | 0.982143 | 0.990991 | |
|
| Ripuarisch (ksh) | 1.000000 | 0.964286 | 0.981818 | |
|
| Kurdish (kur) | 1.000000 | 0.964286 | 0.981818 | |
|
| Ladino (lad) | 1.000000 | 1.000000 | 1.000000 | |
|
| Lao (lao) | 0.961538 | 0.909091 | 0.934579 | |
|
| Latin (lat) | 0.877193 | 0.943396 | 0.909091 | |
|
| Latvian (lav) | 0.963636 | 0.946429 | 0.954955 | |
|
| Lezghian (lez) | 1.000000 | 0.964286 | 0.981818 | |
|
| Ligurian (lij) | 1.000000 | 0.964286 | 0.981818 | |
|
| Limburgan (lim) | 0.938776 | 1.000000 | 0.968421 | |
|
| Lingala (lin) | 0.980769 | 0.927273 | 0.953271 | |
|
| Lithuanian (lit) | 0.982456 | 1.000000 | 0.991150 | |
|
| Lombard (lmo) | 1.000000 | 1.000000 | 1.000000 | |
|
| Northern Luri (lrc) | 1.000000 | 0.928571 | 0.962963 | |
|
| Latgalian (ltg) | 1.000000 | 0.982143 | 0.990991 | |
|
| Luxembourgish (ltz) | 0.949153 | 1.000000 | 0.973913 | |
|
| Luganda (lug) | 1.000000 | 1.000000 | 1.000000 | |
|
| Literary Chinese (lzh) | 1.000000 | 1.000000 | 1.000000 | |
|
| Maithili (mai) | 0.931034 | 0.964286 | 0.947368 | |
|
| Malayalam (mal) | 1.000000 | 0.982143 | 0.990991 | |
|
| Banyumasan (map-bms) | 0.977778 | 0.785714 | 0.871287 | |
|
| Marathi (mar) | 0.949153 | 1.000000 | 0.973913 | |
|
| Moksha (mdf) | 0.980000 | 0.890909 | 0.933333 | |
|
| Eastern Mari (mhr) | 0.981818 | 0.964286 | 0.972973 | |
|
| Minangkabau (min) | 1.000000 | 1.000000 | 1.000000 | |
|
| Macedonian (mkd) | 1.000000 | 0.981818 | 0.990826 | |
|
| Malagasy (mlg) | 0.981132 | 1.000000 | 0.990476 | |
|
| Maltese (mlt) | 0.982456 | 1.000000 | 0.991150 | |
|
| Min Nan Chinese (nan) | 1.000000 | 1.000000 | 1.000000 | |
|
| Mongolian (mon) | 1.000000 | 0.981818 | 0.990826 | |
|
| Maori (mri) | 1.000000 | 1.000000 | 1.000000 | |
|
| Western Mari (mrj) | 0.982456 | 1.000000 | 0.991150 | |
|
| Malay (msa) | 0.862069 | 0.892857 | 0.877193 | |
|
| Mirandese (mwl) | 1.000000 | 0.982143 | 0.990991 | |
|
| Burmese (mya) | 1.000000 | 1.000000 | 1.000000 | |
|
| Erzya (myv) | 0.818182 | 0.964286 | 0.885246 | |
|
| Mazanderani (mzn) | 0.981481 | 1.000000 | 0.990654 | |
|
| Neapolitan (nap) | 1.000000 | 0.981818 | 0.990826 | |
|
| Navajo (nav) | 1.000000 | 1.000000 | 1.000000 | |
|
| Classical Nahuatl (nci) | 0.981481 | 0.946429 | 0.963636 | |
|
| Low German (nds) | 0.982143 | 0.982143 | 0.982143 | |
|
| West Low German (nds-nl) | 1.000000 | 1.000000 | 1.000000 | |
|
| Nepali (macrolanguage) (nep) | 0.881356 | 0.928571 | 0.904348 | |
|
| Newari (new) | 1.000000 | 0.909091 | 0.952381 | |
|
| Dutch (nld) | 0.982143 | 0.982143 | 0.982143 | |
|
| Norwegian Nynorsk (nno) | 1.000000 | 1.000000 | 1.000000 | |
|
| Bokmål (nob) | 1.000000 | 1.000000 | 1.000000 | |
|
| Narom (nrm) | 0.981818 | 0.964286 | 0.972973 | |
|
| Northern Sotho (nso) | 1.000000 | 1.000000 | 1.000000 | |
|
| Occitan (oci) | 0.903846 | 0.839286 | 0.870370 | |
|
| Livvi-Karelian (olo) | 0.982456 | 1.000000 | 0.991150 | |
|
| Oriya (ori) | 0.964912 | 0.982143 | 0.973451 | |
|
| Oromo (orm) | 0.982143 | 0.982143 | 0.982143 | |
|
| Ossetian (oss) | 0.982143 | 1.000000 | 0.990991 | |
|
| Pangasinan (pag) | 0.980000 | 0.875000 | 0.924528 | |
|
| Pampanga (pam) | 0.928571 | 0.896552 | 0.912281 | |
|
| Panjabi (pan) | 1.000000 | 1.000000 | 1.000000 | |
|
| Papiamento (pap) | 1.000000 | 0.964286 | 0.981818 | |
|
| Picard (pcd) | 0.849057 | 0.849057 | 0.849057 | |
|
| Pennsylvania German (pdc) | 0.854839 | 0.946429 | 0.898305 | |
|
| Palatine German (pfl) | 0.946429 | 0.946429 | 0.946429 | |
|
| Western Panjabi (pnb) | 0.981132 | 0.962963 | 0.971963 | |
|
| Polish (pol) | 0.933333 | 1.000000 | 0.965517 | |
|
| Portuguese (por) | 0.774648 | 0.982143 | 0.866142 | |
|
| Pushto (pus) | 1.000000 | 0.910714 | 0.953271 | |
|
| Quechua (que) | 0.962963 | 0.928571 | 0.945455 | |
|
| Tarantino dialect (roa-tara) | 1.000000 | 0.964286 | 0.981818 | |
|
| Romansh (roh) | 1.000000 | 0.928571 | 0.962963 | |
|
| Romanian (ron) | 0.965517 | 1.000000 | 0.982456 | |
|
| Rusyn (rue) | 0.946429 | 0.946429 | 0.946429 | |
|
| Aromanian (rup) | 0.962963 | 0.928571 | 0.945455 | |
|
| Russian (rus) | 0.859375 | 0.982143 | 0.916667 | |
|
| Yakut (sah) | 1.000000 | 0.982143 | 0.990991 | |
|
| Sanskrit (san) | 0.982143 | 0.982143 | 0.982143 | |
|
| Sicilian (scn) | 1.000000 | 1.000000 | 1.000000 | |
|
| Scots (sco) | 0.982143 | 0.982143 | 0.982143 | |
|
| Samogitian (sgs) | 1.000000 | 0.982143 | 0.990991 | |
|
| Sinhala (sin) | 0.964912 | 0.982143 | 0.973451 | |
|
| Slovak (slk) | 1.000000 | 0.982143 | 0.990991 | |
|
| Slovene (slv) | 1.000000 | 0.981818 | 0.990826 | |
|
| Northern Sami (sme) | 0.962264 | 0.962264 | 0.962264 | |
|
| Shona (sna) | 0.933333 | 1.000000 | 0.965517 | |
|
| Sindhi (snd) | 1.000000 | 1.000000 | 1.000000 | |
|
| Somali (som) | 0.948276 | 1.000000 | 0.973451 | |
|
| Spanish (spa) | 0.739130 | 0.910714 | 0.816000 | |
|
| Albanian (sqi) | 0.982143 | 0.982143 | 0.982143 | |
|
| Sardinian (srd) | 1.000000 | 0.982143 | 0.990991 | |
|
| Sranan (srn) | 1.000000 | 1.000000 | 1.000000 | |
|
| Serbian (srp) | 1.000000 | 0.946429 | 0.972477 | |
|
| Saterfriesisch (stq) | 1.000000 | 0.964286 | 0.981818 | |
|
| Sundanese (sun) | 1.000000 | 0.977273 | 0.988506 | |
|
| Swahili (macrolanguage) (swa) | 1.000000 | 1.000000 | 1.000000 | |
|
| Swedish (swe) | 1.000000 | 1.000000 | 1.000000 | |
|
| Silesian (szl) | 1.000000 | 0.981481 | 0.990654 | |
|
| Tamil (tam) | 0.982143 | 1.000000 | 0.990991 | |
|
| Tatar (tat) | 1.000000 | 1.000000 | 1.000000 | |
|
| Tulu (tcy) | 0.982456 | 1.000000 | 0.991150 | |
|
| Telugu (tel) | 1.000000 | 0.920000 | 0.958333 | |
|
| Tetum (tet) | 1.000000 | 0.964286 | 0.981818 | |
|
| Tajik (tgk) | 1.000000 | 1.000000 | 1.000000 | |
|
| Tagalog (tgl) | 1.000000 | 1.000000 | 1.000000 | |
|
| Thai (tha) | 0.932203 | 0.982143 | 0.956522 | |
|
| Tongan (ton) | 1.000000 | 0.964286 | 0.981818 | |
|
| Tswana (tsn) | 1.000000 | 1.000000 | 1.000000 | |
|
| Turkmen (tuk) | 1.000000 | 0.982143 | 0.990991 | |
|
| Turkish (tur) | 0.901639 | 0.982143 | 0.940171 | |
|
| Tuvan (tyv) | 1.000000 | 0.964286 | 0.981818 | |
|
| Udmurt (udm) | 1.000000 | 0.982143 | 0.990991 | |
|
| Uighur (uig) | 1.000000 | 0.982143 | 0.990991 | |
|
| Ukrainian (ukr) | 0.963636 | 0.946429 | 0.954955 | |
|
| Urdu (urd) | 1.000000 | 0.982143 | 0.990991 | |
|
| Uzbek (uzb) | 1.000000 | 1.000000 | 1.000000 | |
|
| Venetian (vec) | 1.000000 | 0.982143 | 0.990991 | |
|
| Veps (vep) | 0.982456 | 1.000000 | 0.991150 | |
|
| Vietnamese (vie) | 0.964912 | 0.982143 | 0.973451 | |
|
| Vlaams (vls) | 1.000000 | 0.982143 | 0.990991 | |
|
| Volapük (vol) | 1.000000 | 1.000000 | 1.000000 | |
|
| Võro (vro) | 0.964286 | 0.964286 | 0.964286 | |
|
| Waray (war) | 1.000000 | 0.982143 | 0.990991 | |
|
| Walloon (wln) | 1.000000 | 1.000000 | 1.000000 | |
|
| Wolof (wol) | 0.981481 | 0.963636 | 0.972477 | |
|
| Wu Chinese (wuu) | 0.981481 | 0.946429 | 0.963636 | |
|
| Xhosa (xho) | 1.000000 | 0.964286 | 0.981818 | |
|
| Mingrelian (xmf) | 1.000000 | 0.964286 | 0.981818 | |
|
| Yiddish (yid) | 1.000000 | 1.000000 | 1.000000 | |
|
| Yoruba (yor) | 0.964912 | 0.982143 | 0.973451 | |
|
| Zeeuws (zea) | 1.000000 | 0.982143 | 0.990991 | |
|
| Cantonese (zh-yue) | 0.981481 | 0.946429 | 0.963636 | |
|
| Standard Chinese (zho) | 0.932203 | 0.982143 | 0.956522 | |
|
| accuracy | 0.963055 | 0.963055 | 0.963055 | |
|
| macro avg | 0.966424 | 0.963216 | 0.963891 | |
|
| weighted avg | 0.966040 | 0.963055 | 0.963606 | |
|
|
|
### By Sentence |
|
|
|
| language | precision | recall | f1-score | |
|
|:--------------------------------------:|:---------:|:--------:|:--------:| |
|
| Achinese (ace) | 0.754545 | 0.873684 | 0.809756 | |
|
| Afrikaans (afr) | 0.708955 | 0.940594 | 0.808511 | |
|
| Alemannic German (als) | 0.870130 | 0.752809 | 0.807229 | |
|
| Amharic (amh) | 1.000000 | 0.820000 | 0.901099 | |
|
| Old English (ang) | 0.966667 | 0.906250 | 0.935484 | |
|
| Arabic (ara) | 0.907692 | 0.967213 | 0.936508 | |
|
| Aragonese (arg) | 0.921569 | 0.959184 | 0.940000 | |
|
| Egyptian Arabic (arz) | 0.964286 | 0.843750 | 0.900000 | |
|
| Assamese (asm) | 0.964286 | 0.870968 | 0.915254 | |
|
| Asturian (ast) | 0.880000 | 0.795181 | 0.835443 | |
|
| Avar (ava) | 0.864198 | 0.843373 | 0.853659 | |
|
| Aymara (aym) | 1.000000 | 0.901961 | 0.948454 | |
|
| South Azerbaijani (azb) | 0.979381 | 0.989583 | 0.984456 | |
|
| Azerbaijani (aze) | 0.989899 | 0.960784 | 0.975124 | |
|
| Bashkir (bak) | 0.837209 | 0.857143 | 0.847059 | |
|
| Bavarian (bar) | 0.741935 | 0.766667 | 0.754098 | |
|
| Central Bikol (bcl) | 0.962963 | 0.928571 | 0.945455 | |
|
| Belarusian (Taraschkewiza) (be-tarask) | 0.857143 | 0.733333 | 0.790419 | |
|
| Belarusian (bel) | 0.775510 | 0.752475 | 0.763819 | |
|
| Bengali (ben) | 0.861111 | 0.911765 | 0.885714 | |
|
| Bhojpuri (bho) | 0.965517 | 0.933333 | 0.949153 | |
|
| Banjar (bjn) | 0.891566 | 0.880952 | 0.886228 | |
|
| Tibetan (bod) | 1.000000 | 1.000000 | 1.000000 | |
|
| Bosnian (bos) | 0.375000 | 0.323077 | 0.347107 | |
|
| Bishnupriya (bpy) | 0.986301 | 1.000000 | 0.993103 | |
|
| Breton (bre) | 0.951613 | 0.893939 | 0.921875 | |
|
| Bulgarian (bul) | 0.945055 | 0.877551 | 0.910053 | |
|
| Buryat (bxr) | 0.955556 | 0.843137 | 0.895833 | |
|
| Catalan (cat) | 0.692308 | 0.750000 | 0.720000 | |
|
| Chavacano (cbk) | 0.842857 | 0.641304 | 0.728395 | |
|
| Min Dong (cdo) | 0.972973 | 1.000000 | 0.986301 | |
|
| Cebuano (ceb) | 0.981308 | 0.954545 | 0.967742 | |
|
| Czech (ces) | 0.944444 | 0.915385 | 0.929687 | |
|
| Chechen (che) | 0.875000 | 0.700000 | 0.777778 | |
|
| Cherokee (chr) | 1.000000 | 0.970588 | 0.985075 | |
|
| Chuvash (chv) | 0.875000 | 0.836957 | 0.855556 | |
|
| Central Kurdish (ckb) | 1.000000 | 0.983051 | 0.991453 | |
|
| Cornish (cor) | 0.979592 | 0.969697 | 0.974619 | |
|
| Corsican (cos) | 0.986842 | 0.925926 | 0.955414 | |
|
| Crimean Tatar (crh) | 0.958333 | 0.907895 | 0.932432 | |
|
| Kashubian (csb) | 0.920354 | 0.904348 | 0.912281 | |
|
| Welsh (cym) | 0.971014 | 0.943662 | 0.957143 | |
|
| Danish (dan) | 0.865169 | 0.777778 | 0.819149 | |
|
| German (deu) | 0.721311 | 0.822430 | 0.768559 | |
|
| Dimli (diq) | 0.915966 | 0.923729 | 0.919831 | |
|
| Dhivehi (div) | 1.000000 | 0.991228 | 0.995595 | |
|
| Lower Sorbian (dsb) | 0.898876 | 0.879121 | 0.888889 | |
|
| Doteli (dty) | 0.821429 | 0.638889 | 0.718750 | |
|
| Emilian (egl) | 0.988095 | 0.922222 | 0.954023 | |
|
| Modern Greek (ell) | 0.988636 | 0.966667 | 0.977528 | |
|
| English (eng) | 0.522727 | 0.784091 | 0.627273 | |
|
| Esperanto (epo) | 0.963855 | 0.930233 | 0.946746 | |
|
| Estonian (est) | 0.922222 | 0.873684 | 0.897297 | |
|
| Basque (eus) | 1.000000 | 0.941176 | 0.969697 | |
|
| Extremaduran (ext) | 0.925373 | 0.885714 | 0.905109 | |
|
| Faroese (fao) | 0.855072 | 0.887218 | 0.870849 | |
|
| Persian (fas) | 0.879630 | 0.979381 | 0.926829 | |
|
| Finnish (fin) | 0.952830 | 0.943925 | 0.948357 | |
|
| French (fra) | 0.676768 | 0.943662 | 0.788235 | |
|
| Arpitan (frp) | 0.867925 | 0.807018 | 0.836364 | |
|
| Western Frisian (fry) | 0.956989 | 0.890000 | 0.922280 | |
|
| Friulian (fur) | 1.000000 | 0.857143 | 0.923077 | |
|
| Gagauz (gag) | 0.939024 | 0.802083 | 0.865169 | |
|
| Scottish Gaelic (gla) | 1.000000 | 0.879121 | 0.935673 | |
|
| Irish (gle) | 0.989247 | 0.958333 | 0.973545 | |
|
| Galician (glg) | 0.910256 | 0.922078 | 0.916129 | |
|
| Gilaki (glk) | 0.964706 | 0.872340 | 0.916201 | |
|
| Manx (glv) | 1.000000 | 0.965517 | 0.982456 | |
|
| Guarani (grn) | 0.983333 | 1.000000 | 0.991597 | |
|
| Gujarati (guj) | 1.000000 | 0.991525 | 0.995745 | |
|
| Hakka Chinese (hak) | 0.955224 | 0.955224 | 0.955224 | |
|
| Haitian Creole (hat) | 0.833333 | 0.666667 | 0.740741 | |
|
| Hausa (hau) | 0.936709 | 0.913580 | 0.925000 | |
|
| Serbo-Croatian (hbs) | 0.452830 | 0.410256 | 0.430493 | |
|
| Hebrew (heb) | 0.988235 | 0.976744 | 0.982456 | |
|
| Fiji Hindi (hif) | 0.936709 | 0.840909 | 0.886228 | |
|
| Hindi (hin) | 0.965517 | 0.756757 | 0.848485 | |
|
| Croatian (hrv) | 0.443820 | 0.537415 | 0.486154 | |
|
| Upper Sorbian (hsb) | 0.951613 | 0.830986 | 0.887218 | |
|
| Hungarian (hun) | 0.854701 | 0.909091 | 0.881057 | |
|
| Armenian (hye) | 1.000000 | 0.816327 | 0.898876 | |
|
| Igbo (ibo) | 0.974359 | 0.926829 | 0.950000 | |
|
| Ido (ido) | 0.975000 | 0.987342 | 0.981132 | |
|
| Interlingue (ile) | 0.880597 | 0.921875 | 0.900763 | |
|
| Iloko (ilo) | 0.882353 | 0.821918 | 0.851064 | |
|
| Interlingua (ina) | 0.952381 | 0.895522 | 0.923077 | |
|
| Indonesian (ind) | 0.606383 | 0.695122 | 0.647727 | |
|
| Icelandic (isl) | 0.978261 | 0.882353 | 0.927835 | |
|
| Italian (ita) | 0.910448 | 0.910448 | 0.910448 | |
|
| Jamaican Patois (jam) | 0.988764 | 0.967033 | 0.977778 | |
|
| Javanese (jav) | 0.903614 | 0.862069 | 0.882353 | |
|
| Lojban (jbo) | 0.943878 | 0.929648 | 0.936709 | |
|
| Japanese (jpn) | 1.000000 | 0.764706 | 0.866667 | |
|
| Karakalpak (kaa) | 0.940171 | 0.901639 | 0.920502 | |
|
| Kabyle (kab) | 0.985294 | 0.837500 | 0.905405 | |
|
| Kannada (kan) | 0.975806 | 0.975806 | 0.975806 | |
|
| Georgian (kat) | 0.953704 | 0.903509 | 0.927928 | |
|
| Kazakh (kaz) | 0.934579 | 0.877193 | 0.904977 | |
|
| Kabardian (kbd) | 0.987952 | 0.953488 | 0.970414 | |
|
| Central Khmer (khm) | 0.928571 | 0.829787 | 0.876404 | |
|
| Kinyarwanda (kin) | 0.953125 | 0.938462 | 0.945736 | |
|
| Kirghiz (kir) | 0.927632 | 0.881250 | 0.903846 | |
|
| Komi-Permyak (koi) | 0.750000 | 0.776786 | 0.763158 | |
|
| Konkani (kok) | 0.893491 | 0.872832 | 0.883041 | |
|
| Komi (kom) | 0.734177 | 0.690476 | 0.711656 | |
|
| Korean (kor) | 0.989899 | 0.989899 | 0.989899 | |
|
| Karachay-Balkar (krc) | 0.928571 | 0.917647 | 0.923077 | |
|
| Ripuarisch (ksh) | 0.915789 | 0.896907 | 0.906250 | |
|
| Kurdish (kur) | 0.977528 | 0.935484 | 0.956044 | |
|
| Ladino (lad) | 0.985075 | 0.904110 | 0.942857 | |
|
| Lao (lao) | 0.896552 | 0.812500 | 0.852459 | |
|
| Latin (lat) | 0.741935 | 0.831325 | 0.784091 | |
|
| Latvian (lav) | 0.710526 | 0.878049 | 0.785455 | |
|
| Lezghian (lez) | 0.975309 | 0.877778 | 0.923977 | |
|
| Ligurian (lij) | 0.951807 | 0.897727 | 0.923977 | |
|
| Limburgan (lim) | 0.909091 | 0.921053 | 0.915033 | |
|
| Lingala (lin) | 0.942857 | 0.814815 | 0.874172 | |
|
| Lithuanian (lit) | 0.892857 | 0.925926 | 0.909091 | |
|
| Lombard (lmo) | 0.766234 | 0.951613 | 0.848921 | |
|
| Northern Luri (lrc) | 0.972222 | 0.875000 | 0.921053 | |
|
| Latgalian (ltg) | 0.895349 | 0.865169 | 0.880000 | |
|
| Luxembourgish (ltz) | 0.882353 | 0.750000 | 0.810811 | |
|
| Luganda (lug) | 0.946429 | 0.883333 | 0.913793 | |
|
| Literary Chinese (lzh) | 1.000000 | 1.000000 | 1.000000 | |
|
| Maithili (mai) | 0.893617 | 0.823529 | 0.857143 | |
|
| Malayalam (mal) | 1.000000 | 0.975000 | 0.987342 | |
|
| Banyumasan (map-bms) | 0.924242 | 0.772152 | 0.841379 | |
|
| Marathi (mar) | 0.874126 | 0.919118 | 0.896057 | |
|
| Moksha (mdf) | 0.771242 | 0.830986 | 0.800000 | |
|
| Eastern Mari (mhr) | 0.820000 | 0.860140 | 0.839590 | |
|
| Minangkabau (min) | 0.973684 | 0.973684 | 0.973684 | |
|
| Macedonian (mkd) | 0.895652 | 0.953704 | 0.923767 | |
|
| Malagasy (mlg) | 1.000000 | 0.966102 | 0.982759 | |
|
| Maltese (mlt) | 0.987952 | 0.964706 | 0.976190 | |
|
| Min Nan Chinese (nan) | 0.975000 | 1.000000 | 0.987342 | |
|
| Mongolian (mon) | 0.954545 | 0.933333 | 0.943820 | |
|
| Maori (mri) | 0.985294 | 1.000000 | 0.992593 | |
|
| Western Mari (mrj) | 0.966292 | 0.914894 | 0.939891 | |
|
| Malay (msa) | 0.770270 | 0.695122 | 0.730769 | |
|
| Mirandese (mwl) | 0.970588 | 0.891892 | 0.929577 | |
|
| Burmese (mya) | 1.000000 | 0.964286 | 0.981818 | |
|
| Erzya (myv) | 0.535714 | 0.681818 | 0.600000 | |
|
| Mazanderani (mzn) | 0.968750 | 0.898551 | 0.932331 | |
|
| Neapolitan (nap) | 0.892308 | 0.865672 | 0.878788 | |
|
| Navajo (nav) | 0.984375 | 0.984375 | 0.984375 | |
|
| Classical Nahuatl (nci) | 0.901408 | 0.761905 | 0.825806 | |
|
| Low German (nds) | 0.896226 | 0.913462 | 0.904762 | |
|
| West Low German (nds-nl) | 0.873563 | 0.835165 | 0.853933 | |
|
| Nepali (macrolanguage) (nep) | 0.704545 | 0.861111 | 0.775000 | |
|
| Newari (new) | 0.920000 | 0.741935 | 0.821429 | |
|
| Dutch (nld) | 0.925926 | 0.872093 | 0.898204 | |
|
| Norwegian Nynorsk (nno) | 0.847059 | 0.808989 | 0.827586 | |
|
| Bokmål (nob) | 0.861386 | 0.852941 | 0.857143 | |
|
| Narom (nrm) | 0.966667 | 0.983051 | 0.974790 | |
|
| Northern Sotho (nso) | 0.897436 | 0.921053 | 0.909091 | |
|
| Occitan (oci) | 0.958333 | 0.696970 | 0.807018 | |
|
| Livvi-Karelian (olo) | 0.967742 | 0.937500 | 0.952381 | |
|
| Oriya (ori) | 0.933333 | 1.000000 | 0.965517 | |
|
| Oromo (orm) | 0.977528 | 0.915789 | 0.945652 | |
|
| Ossetian (oss) | 0.958333 | 0.841463 | 0.896104 | |
|
| Pangasinan (pag) | 0.847328 | 0.909836 | 0.877470 | |
|
| Pampanga (pam) | 0.969697 | 0.780488 | 0.864865 | |
|
| Panjabi (pan) | 1.000000 | 1.000000 | 1.000000 | |
|
| Papiamento (pap) | 0.876190 | 0.920000 | 0.897561 | |
|
| Picard (pcd) | 0.707317 | 0.568627 | 0.630435 | |
|
| Pennsylvania German (pdc) | 0.827273 | 0.827273 | 0.827273 | |
|
| Palatine German (pfl) | 0.882353 | 0.914634 | 0.898204 | |
|
| Western Panjabi (pnb) | 0.964286 | 0.931034 | 0.947368 | |
|
| Polish (pol) | 0.859813 | 0.910891 | 0.884615 | |
|
| Portuguese (por) | 0.535714 | 0.833333 | 0.652174 | |
|
| Pushto (pus) | 0.989362 | 0.902913 | 0.944162 | |
|
| Quechua (que) | 0.979167 | 0.903846 | 0.940000 | |
|
| Tarantino dialect (roa-tara) | 0.964912 | 0.901639 | 0.932203 | |
|
| Romansh (roh) | 0.914894 | 0.895833 | 0.905263 | |
|
| Romanian (ron) | 0.880597 | 0.880597 | 0.880597 | |
|
| Rusyn (rue) | 0.932584 | 0.805825 | 0.864583 | |
|
| Aromanian (rup) | 0.783333 | 0.758065 | 0.770492 | |
|
| Russian (rus) | 0.517986 | 0.765957 | 0.618026 | |
|
| Yakut (sah) | 0.954023 | 0.922222 | 0.937853 | |
|
| Sanskrit (san) | 0.866667 | 0.951220 | 0.906977 | |
|
| Sicilian (scn) | 0.984375 | 0.940299 | 0.961832 | |
|
| Scots (sco) | 0.851351 | 0.900000 | 0.875000 | |
|
| Samogitian (sgs) | 0.977011 | 0.876289 | 0.923913 | |
|
| Sinhala (sin) | 0.406154 | 0.985075 | 0.575163 | |
|
| Slovak (slk) | 0.956989 | 0.872549 | 0.912821 | |
|
| Slovene (slv) | 0.907216 | 0.854369 | 0.880000 | |
|
| Northern Sami (sme) | 0.949367 | 0.892857 | 0.920245 | |
|
| Shona (sna) | 0.936508 | 0.855072 | 0.893939 | |
|
| Sindhi (snd) | 0.984962 | 0.992424 | 0.988679 | |
|
| Somali (som) | 0.949153 | 0.848485 | 0.896000 | |
|
| Spanish (spa) | 0.584158 | 0.746835 | 0.655556 | |
|
| Albanian (sqi) | 0.988095 | 0.912088 | 0.948571 | |
|
| Sardinian (srd) | 0.957746 | 0.931507 | 0.944444 | |
|
| Sranan (srn) | 0.985714 | 0.945205 | 0.965035 | |
|
| Serbian (srp) | 0.950980 | 0.889908 | 0.919431 | |
|
| Saterfriesisch (stq) | 0.962500 | 0.875000 | 0.916667 | |
|
| Sundanese (sun) | 0.778846 | 0.910112 | 0.839378 | |
|
| Swahili (macrolanguage) (swa) | 0.915493 | 0.878378 | 0.896552 | |
|
| Swedish (swe) | 0.989247 | 0.958333 | 0.973545 | |
|
| Silesian (szl) | 0.944444 | 0.904255 | 0.923913 | |
|
| Tamil (tam) | 0.990000 | 0.970588 | 0.980198 | |
|
| Tatar (tat) | 0.942029 | 0.902778 | 0.921986 | |
|
| Tulu (tcy) | 0.980519 | 0.967949 | 0.974194 | |
|
| Telugu (tel) | 0.965986 | 0.965986 | 0.965986 | |
|
| Tetum (tet) | 0.898734 | 0.855422 | 0.876543 | |
|
| Tajik (tgk) | 0.974684 | 0.939024 | 0.956522 | |
|
| Tagalog (tgl) | 0.965909 | 0.934066 | 0.949721 | |
|
| Thai (tha) | 0.923077 | 0.882353 | 0.902256 | |
|
| Tongan (ton) | 0.970149 | 0.890411 | 0.928571 | |
|
| Tswana (tsn) | 0.888889 | 0.926316 | 0.907216 | |
|
| Turkmen (tuk) | 0.968000 | 0.889706 | 0.927203 | |
|
| Turkish (tur) | 0.871287 | 0.926316 | 0.897959 | |
|
| Tuvan (tyv) | 0.948454 | 0.859813 | 0.901961 | |
|
| Udmurt (udm) | 0.989362 | 0.894231 | 0.939394 | |
|
| Uighur (uig) | 1.000000 | 0.953333 | 0.976109 | |
|
| Ukrainian (ukr) | 0.893617 | 0.875000 | 0.884211 | |
|
| Urdu (urd) | 1.000000 | 1.000000 | 1.000000 | |
|
| Uzbek (uzb) | 0.636042 | 0.886700 | 0.740741 | |
|
| Venetian (vec) | 1.000000 | 0.941176 | 0.969697 | |
|
| Veps (vep) | 0.858586 | 0.965909 | 0.909091 | |
|
| Vietnamese (vie) | 1.000000 | 0.940476 | 0.969325 | |
|
| Vlaams (vls) | 0.885714 | 0.898551 | 0.892086 | |
|
| Volapük (vol) | 0.975309 | 0.975309 | 0.975309 | |
|
| Võro (vro) | 0.855670 | 0.864583 | 0.860104 | |
|
| Waray (war) | 0.972222 | 0.909091 | 0.939597 | |
|
| Walloon (wln) | 0.742138 | 0.893939 | 0.810997 | |
|
| Wolof (wol) | 0.882979 | 0.954023 | 0.917127 | |
|
| Wu Chinese (wuu) | 0.961538 | 0.833333 | 0.892857 | |
|
| Xhosa (xho) | 0.934066 | 0.867347 | 0.899471 | |
|
| Mingrelian (xmf) | 0.958333 | 0.929293 | 0.943590 | |
|
| Yiddish (yid) | 0.984375 | 0.875000 | 0.926471 | |
|
| Yoruba (yor) | 0.868421 | 0.857143 | 0.862745 | |
|
| Zeeuws (zea) | 0.879518 | 0.793478 | 0.834286 | |
|
| Cantonese (zh-yue) | 0.896552 | 0.812500 | 0.852459 | |
|
| Standard Chinese (zho) | 0.906250 | 0.935484 | 0.920635 | |
|
| accuracy | 0.881051 | 0.881051 | 0.881051 | |
|
| macro avg | 0.903245 | 0.880618 | 0.888996 | |
|
| weighted avg | 0.894174 | 0.881051 | 0.884520 | |
|
|
|
### By Token (3 to 5) |
|
|
|
| language | precision | recall | f1-score | |
|
|:--------------------------------------:|:---------:|:--------:|:--------:| |
|
| Achinese (ace) | 0.873846 | 0.827988 | 0.850299 | |
|
| Afrikaans (afr) | 0.638060 | 0.732334 | 0.681954 | |
|
| Alemannic German (als) | 0.673780 | 0.547030 | 0.603825 | |
|
| Amharic (amh) | 0.997743 | 0.954644 | 0.975717 | |
|
| Old English (ang) | 0.840816 | 0.693603 | 0.760148 | |
|
| Arabic (ara) | 0.768737 | 0.840749 | 0.803132 | |
|
| Aragonese (arg) | 0.493671 | 0.505181 | 0.499360 | |
|
| Egyptian Arabic (arz) | 0.823529 | 0.741935 | 0.780606 | |
|
| Assamese (asm) | 0.948454 | 0.893204 | 0.920000 | |
|
| Asturian (ast) | 0.490000 | 0.508299 | 0.498982 | |
|
| Avar (ava) | 0.813636 | 0.655678 | 0.726166 | |
|
| Aymara (aym) | 0.795833 | 0.779592 | 0.787629 | |
|
| South Azerbaijani (azb) | 0.832836 | 0.863777 | 0.848024 | |
|
| Azerbaijani (aze) | 0.867470 | 0.800000 | 0.832370 | |
|
| Bashkir (bak) | 0.851852 | 0.750000 | 0.797688 | |
|
| Bavarian (bar) | 0.560897 | 0.522388 | 0.540958 | |
|
| Central Bikol (bcl) | 0.708229 | 0.668235 | 0.687651 | |
|
| Belarusian (Taraschkewiza) (be-tarask) | 0.615635 | 0.526462 | 0.567568 | |
|
| Belarusian (bel) | 0.539952 | 0.597855 | 0.567430 | |
|
| Bengali (ben) | 0.830275 | 0.885086 | 0.856805 | |
|
| Bhojpuri (bho) | 0.723118 | 0.691517 | 0.706965 | |
|
| Banjar (bjn) | 0.619586 | 0.726269 | 0.668699 | |
|
| Tibetan (bod) | 0.999537 | 0.991728 | 0.995617 | |
|
| Bosnian (bos) | 0.330849 | 0.403636 | 0.363636 | |
|
| Bishnupriya (bpy) | 0.941634 | 0.949020 | 0.945312 | |
|
| Breton (bre) | 0.772222 | 0.745308 | 0.758527 | |
|
| Bulgarian (bul) | 0.771505 | 0.706897 | 0.737789 | |
|
| Buryat (bxr) | 0.741935 | 0.753149 | 0.747500 | |
|
| Catalan (cat) | 0.528716 | 0.610136 | 0.566516 | |
|
| Chavacano (cbk) | 0.409449 | 0.312625 | 0.354545 | |
|
| Min Dong (cdo) | 0.951264 | 0.936057 | 0.943599 | |
|
| Cebuano (ceb) | 0.888298 | 0.876640 | 0.882431 | |
|
| Czech (ces) | 0.806045 | 0.758294 | 0.781441 | |
|
| Chechen (che) | 0.857143 | 0.600000 | 0.705882 | |
|
| Cherokee (chr) | 0.997840 | 0.952577 | 0.974684 | |
|
| Chuvash (chv) | 0.874346 | 0.776744 | 0.822660 | |
|
| Central Kurdish (ckb) | 0.984848 | 0.953545 | 0.968944 | |
|
| Cornish (cor) | 0.747596 | 0.807792 | 0.776529 | |
|
| Corsican (cos) | 0.673913 | 0.708571 | 0.690808 | |
|
| Crimean Tatar (crh) | 0.498801 | 0.700337 | 0.582633 | |
|
| Kashubian (csb) | 0.797059 | 0.794721 | 0.795888 | |
|
| Welsh (cym) | 0.829609 | 0.841360 | 0.835443 | |
|
| Danish (dan) | 0.649789 | 0.622222 | 0.635707 | |
|
| German (deu) | 0.559406 | 0.763514 | 0.645714 | |
|
| Dimli (diq) | 0.835580 | 0.763547 | 0.797941 | |
|
| Dhivehi (div) | 1.000000 | 0.980645 | 0.990228 | |
|
| Lower Sorbian (dsb) | 0.740484 | 0.694805 | 0.716918 | |
|
| Doteli (dty) | 0.616314 | 0.527132 | 0.568245 | |
|
| Emilian (egl) | 0.822993 | 0.769625 | 0.795414 | |
|
| Modern Greek (ell) | 0.972043 | 0.963753 | 0.967880 | |
|
| English (eng) | 0.260492 | 0.724346 | 0.383183 | |
|
| Esperanto (epo) | 0.766764 | 0.716621 | 0.740845 | |
|
| Estonian (est) | 0.698885 | 0.673835 | 0.686131 | |
|
| Basque (eus) | 0.882716 | 0.841176 | 0.861446 | |
|
| Extremaduran (ext) | 0.570605 | 0.511628 | 0.539510 | |
|
| Faroese (fao) | 0.773987 | 0.784017 | 0.778970 | |
|
| Persian (fas) | 0.709836 | 0.809346 | 0.756332 | |
|
| Finnish (fin) | 0.866261 | 0.796089 | 0.829694 | |
|
| French (fra) | 0.496263 | 0.700422 | 0.580927 | |
|
| Arpitan (frp) | 0.663366 | 0.584302 | 0.621329 | |
|
| Western Frisian (fry) | 0.750000 | 0.756148 | 0.753061 | |
|
| Friulian (fur) | 0.713555 | 0.675545 | 0.694030 | |
|
| Gagauz (gag) | 0.728125 | 0.677326 | 0.701807 | |
|
| Scottish Gaelic (gla) | 0.831601 | 0.817996 | 0.824742 | |
|
| Irish (gle) | 0.868852 | 0.801296 | 0.833708 | |
|
| Galician (glg) | 0.469816 | 0.454315 | 0.461935 | |
|
| Gilaki (glk) | 0.703883 | 0.687204 | 0.695444 | |
|
| Manx (glv) | 0.873047 | 0.886905 | 0.879921 | |
|
| Guarani (grn) | 0.848580 | 0.793510 | 0.820122 | |
|
| Gujarati (guj) | 0.995643 | 0.926978 | 0.960084 | |
|
| Hakka Chinese (hak) | 0.898403 | 0.904971 | 0.901675 | |
|
| Haitian Creole (hat) | 0.719298 | 0.518987 | 0.602941 | |
|
| Hausa (hau) | 0.815353 | 0.829114 | 0.822176 | |
|
| Serbo-Croatian (hbs) | 0.343465 | 0.244589 | 0.285714 | |
|
| Hebrew (heb) | 0.891304 | 0.933941 | 0.912125 | |
|
| Fiji Hindi (hif) | 0.662577 | 0.664615 | 0.663594 | |
|
| Hindi (hin) | 0.782301 | 0.778169 | 0.780229 | |
|
| Croatian (hrv) | 0.360308 | 0.374000 | 0.367026 | |
|
| Upper Sorbian (hsb) | 0.745763 | 0.611111 | 0.671756 | |
|
| Hungarian (hun) | 0.876812 | 0.846154 | 0.861210 | |
|
| Armenian (hye) | 0.988201 | 0.917808 | 0.951705 | |
|
| Igbo (ibo) | 0.825397 | 0.696429 | 0.755448 | |
|
| Ido (ido) | 0.760479 | 0.814103 | 0.786378 | |
|
| Interlingue (ile) | 0.701299 | 0.580645 | 0.635294 | |
|
| Iloko (ilo) | 0.688356 | 0.844538 | 0.758491 | |
|
| Interlingua (ina) | 0.577889 | 0.588235 | 0.583016 | |
|
| Indonesian (ind) | 0.415879 | 0.514019 | 0.459770 | |
|
| Icelandic (isl) | 0.855263 | 0.790754 | 0.821745 | |
|
| Italian (ita) | 0.474576 | 0.561247 | 0.514286 | |
|
| Jamaican Patois (jam) | 0.826087 | 0.791667 | 0.808511 | |
|
| Javanese (jav) | 0.670130 | 0.658163 | 0.664093 | |
|
| Lojban (jbo) | 0.896861 | 0.917431 | 0.907029 | |
|
| Japanese (jpn) | 0.931373 | 0.848214 | 0.887850 | |
|
| Karakalpak (kaa) | 0.790393 | 0.827744 | 0.808637 | |
|
| Kabyle (kab) | 0.828571 | 0.759162 | 0.792350 | |
|
| Kannada (kan) | 0.879357 | 0.847545 | 0.863158 | |
|
| Georgian (kat) | 0.916399 | 0.907643 | 0.912000 | |
|
| Kazakh (kaz) | 0.900901 | 0.819672 | 0.858369 | |
|
| Kabardian (kbd) | 0.923345 | 0.892256 | 0.907534 | |
|
| Central Khmer (khm) | 0.976667 | 0.816156 | 0.889226 | |
|
| Kinyarwanda (kin) | 0.824324 | 0.726190 | 0.772152 | |
|
| Kirghiz (kir) | 0.674766 | 0.779698 | 0.723447 | |
|
| Komi-Permyak (koi) | 0.652830 | 0.633700 | 0.643123 | |
|
| Konkani (kok) | 0.778865 | 0.728938 | 0.753075 | |
|
| Komi (kom) | 0.737374 | 0.572549 | 0.644592 | |
|
| Korean (kor) | 0.984615 | 0.967603 | 0.976035 | |
|
| Karachay-Balkar (krc) | 0.869416 | 0.857627 | 0.863481 | |
|
| Ripuarisch (ksh) | 0.709859 | 0.649485 | 0.678331 | |
|
| Kurdish (kur) | 0.883777 | 0.862884 | 0.873206 | |
|
| Ladino (lad) | 0.660920 | 0.576441 | 0.615797 | |
|
| Lao (lao) | 0.986175 | 0.918455 | 0.951111 | |
|
| Latin (lat) | 0.581250 | 0.636986 | 0.607843 | |
|
| Latvian (lav) | 0.824513 | 0.797844 | 0.810959 | |
|
| Lezghian (lez) | 0.898955 | 0.793846 | 0.843137 | |
|
| Ligurian (lij) | 0.662903 | 0.677100 | 0.669927 | |
|
| Limburgan (lim) | 0.615385 | 0.581818 | 0.598131 | |
|
| Lingala (lin) | 0.836207 | 0.763780 | 0.798354 | |
|
| Lithuanian (lit) | 0.756329 | 0.804714 | 0.779772 | |
|
| Lombard (lmo) | 0.556818 | 0.536986 | 0.546722 | |
|
| Northern Luri (lrc) | 0.838574 | 0.753296 | 0.793651 | |
|
| Latgalian (ltg) | 0.759531 | 0.755102 | 0.757310 | |
|
| Luxembourgish (ltz) | 0.645062 | 0.614706 | 0.629518 | |
|
| Luganda (lug) | 0.787535 | 0.805797 | 0.796562 | |
|
| Literary Chinese (lzh) | 0.921951 | 0.949749 | 0.935644 | |
|
| Maithili (mai) | 0.777778 | 0.761658 | 0.769634 | |
|
| Malayalam (mal) | 0.993377 | 0.949367 | 0.970874 | |
|
| Banyumasan (map-bms) | 0.531429 | 0.453659 | 0.489474 | |
|
| Marathi (mar) | 0.748744 | 0.818681 | 0.782152 | |
|
| Moksha (mdf) | 0.728745 | 0.800000 | 0.762712 | |
|
| Eastern Mari (mhr) | 0.790323 | 0.760870 | 0.775316 | |
|
| Minangkabau (min) | 0.953271 | 0.886957 | 0.918919 | |
|
| Macedonian (mkd) | 0.816399 | 0.849722 | 0.832727 | |
|
| Malagasy (mlg) | 0.925187 | 0.918317 | 0.921739 | |
|
| Maltese (mlt) | 0.869421 | 0.890017 | 0.879599 | |
|
| Min Nan Chinese (nan) | 0.743707 | 0.820707 | 0.780312 | |
|
| Mongolian (mon) | 0.852194 | 0.838636 | 0.845361 | |
|
| Maori (mri) | 0.934726 | 0.937173 | 0.935948 | |
|
| Western Mari (mrj) | 0.818792 | 0.827119 | 0.822934 | |
|
| Malay (msa) | 0.508065 | 0.376119 | 0.432247 | |
|
| Mirandese (mwl) | 0.650407 | 0.685225 | 0.667362 | |
|
| Burmese (mya) | 0.995968 | 0.972441 | 0.984064 | |
|
| Erzya (myv) | 0.475783 | 0.503012 | 0.489019 | |
|
| Mazanderani (mzn) | 0.775362 | 0.701639 | 0.736661 | |
|
| Neapolitan (nap) | 0.628993 | 0.595349 | 0.611708 | |
|
| Navajo (nav) | 0.955882 | 0.937500 | 0.946602 | |
|
| Classical Nahuatl (nci) | 0.679758 | 0.589005 | 0.631136 | |
|
| Low German (nds) | 0.669789 | 0.690821 | 0.680143 | |
|
| West Low German (nds-nl) | 0.513889 | 0.504545 | 0.509174 | |
|
| Nepali (macrolanguage) (nep) | 0.640476 | 0.649758 | 0.645084 | |
|
| Newari (new) | 0.928571 | 0.745902 | 0.827273 | |
|
| Dutch (nld) | 0.553763 | 0.553763 | 0.553763 | |
|
| Norwegian Nynorsk (nno) | 0.569277 | 0.519231 | 0.543103 | |
|
| Bokmål (nob) | 0.519856 | 0.562500 | 0.540338 | |
|
| Narom (nrm) | 0.691275 | 0.605882 | 0.645768 | |
|
| Northern Sotho (nso) | 0.950276 | 0.815166 | 0.877551 | |
|
| Occitan (oci) | 0.483444 | 0.366834 | 0.417143 | |
|
| Livvi-Karelian (olo) | 0.816850 | 0.790780 | 0.803604 | |
|
| Oriya (ori) | 0.981481 | 0.963636 | 0.972477 | |
|
| Oromo (orm) | 0.885714 | 0.829218 | 0.856536 | |
|
| Ossetian (oss) | 0.822006 | 0.855219 | 0.838284 | |
|
| Pangasinan (pag) | 0.842105 | 0.715655 | 0.773748 | |
|
| Pampanga (pam) | 0.770000 | 0.435028 | 0.555957 | |
|
| Panjabi (pan) | 0.996154 | 0.984791 | 0.990440 | |
|
| Papiamento (pap) | 0.674672 | 0.661670 | 0.668108 | |
|
| Picard (pcd) | 0.407895 | 0.356322 | 0.380368 | |
|
| Pennsylvania German (pdc) | 0.487047 | 0.509485 | 0.498013 | |
|
| Palatine German (pfl) | 0.614173 | 0.570732 | 0.591656 | |
|
| Western Panjabi (pnb) | 0.926267 | 0.887417 | 0.906426 | |
|
| Polish (pol) | 0.797059 | 0.734417 | 0.764457 | |
|
| Portuguese (por) | 0.500914 | 0.586724 | 0.540434 | |
|
| Pushto (pus) | 0.941489 | 0.898477 | 0.919481 | |
|
| Quechua (que) | 0.854167 | 0.797665 | 0.824950 | |
|
| Tarantino dialect (roa-tara) | 0.669794 | 0.724138 | 0.695906 | |
|
| Romansh (roh) | 0.745527 | 0.760649 | 0.753012 | |
|
| Romanian (ron) | 0.805486 | 0.769048 | 0.786845 | |
|
| Rusyn (rue) | 0.718543 | 0.645833 | 0.680251 | |
|
| Aromanian (rup) | 0.288482 | 0.730245 | 0.413580 | |
|
| Russian (rus) | 0.530120 | 0.690583 | 0.599805 | |
|
| Yakut (sah) | 0.853521 | 0.865714 | 0.859574 | |
|
| Sanskrit (san) | 0.931343 | 0.896552 | 0.913616 | |
|
| Sicilian (scn) | 0.734139 | 0.618321 | 0.671271 | |
|
| Scots (sco) | 0.571429 | 0.540816 | 0.555701 | |
|
| Samogitian (sgs) | 0.829167 | 0.748120 | 0.786561 | |
|
| Sinhala (sin) | 0.909474 | 0.935065 | 0.922092 | |
|
| Slovak (slk) | 0.738235 | 0.665782 | 0.700139 | |
|
| Slovene (slv) | 0.671123 | 0.662269 | 0.666667 | |
|
| Northern Sami (sme) | 0.800676 | 0.825784 | 0.813036 | |
|
| Shona (sna) | 0.761702 | 0.724696 | 0.742739 | |
|
| Sindhi (snd) | 0.950172 | 0.946918 | 0.948542 | |
|
| Somali (som) | 0.849462 | 0.802030 | 0.825065 | |
|
| Spanish (spa) | 0.325234 | 0.413302 | 0.364017 | |
|
| Albanian (sqi) | 0.875899 | 0.832479 | 0.853637 | |
|
| Sardinian (srd) | 0.750000 | 0.711061 | 0.730012 | |
|
| Sranan (srn) | 0.888889 | 0.771084 | 0.825806 | |
|
| Serbian (srp) | 0.824561 | 0.814356 | 0.819427 | |
|
| Saterfriesisch (stq) | 0.790087 | 0.734417 | 0.761236 | |
|
| Sundanese (sun) | 0.764192 | 0.631769 | 0.691700 | |
|
| Swahili (macrolanguage) (swa) | 0.763496 | 0.796247 | 0.779528 | |
|
| Swedish (swe) | 0.838284 | 0.723647 | 0.776758 | |
|
| Silesian (szl) | 0.819788 | 0.750809 | 0.783784 | |
|
| Tamil (tam) | 0.985765 | 0.955172 | 0.970228 | |
|
| Tatar (tat) | 0.469780 | 0.795349 | 0.590674 | |
|
| Tulu (tcy) | 0.893300 | 0.873786 | 0.883436 | |
|
| Telugu (tel) | 1.000000 | 0.913690 | 0.954899 | |
|
| Tetum (tet) | 0.765116 | 0.744344 | 0.754587 | |
|
| Tajik (tgk) | 0.828418 | 0.813158 | 0.820717 | |
|
| Tagalog (tgl) | 0.751468 | 0.757396 | 0.754420 | |
|
| Thai (tha) | 0.933884 | 0.807143 | 0.865900 | |
|
| Tongan (ton) | 0.920245 | 0.923077 | 0.921659 | |
|
| Tswana (tsn) | 0.873397 | 0.889070 | 0.881164 | |
|
| Turkmen (tuk) | 0.898438 | 0.837887 | 0.867107 | |
|
| Turkish (tur) | 0.666667 | 0.716981 | 0.690909 | |
|
| Tuvan (tyv) | 0.857143 | 0.805063 | 0.830287 | |
|
| Udmurt (udm) | 0.865517 | 0.756024 | 0.807074 | |
|
| Uighur (uig) | 0.991597 | 0.967213 | 0.979253 | |
|
| Ukrainian (ukr) | 0.771341 | 0.702778 | 0.735465 | |
|
| Urdu (urd) | 0.877647 | 0.855505 | 0.866434 | |
|
| Uzbek (uzb) | 0.655652 | 0.797040 | 0.719466 | |
|
| Venetian (vec) | 0.611111 | 0.527233 | 0.566082 | |
|
| Veps (vep) | 0.672862 | 0.688213 | 0.680451 | |
|
| Vietnamese (vie) | 0.932406 | 0.914230 | 0.923228 | |
|
| Vlaams (vls) | 0.594427 | 0.501305 | 0.543909 | |
|
| Volapük (vol) | 0.765625 | 0.942308 | 0.844828 | |
|
| Võro (vro) | 0.797203 | 0.740260 | 0.767677 | |
|
| Waray (war) | 0.930876 | 0.930876 | 0.930876 | |
|
| Walloon (wln) | 0.636804 | 0.693931 | 0.664141 | |
|
| Wolof (wol) | 0.864220 | 0.845601 | 0.854809 | |
|
| Wu Chinese (wuu) | 0.848921 | 0.830986 | 0.839858 | |
|
| Xhosa (xho) | 0.837398 | 0.759214 | 0.796392 | |
|
| Mingrelian (xmf) | 0.943396 | 0.874126 | 0.907441 | |
|
| Yiddish (yid) | 0.955729 | 0.897311 | 0.925599 | |
|
| Yoruba (yor) | 0.812010 | 0.719907 | 0.763190 | |
|
| Zeeuws (zea) | 0.617737 | 0.550409 | 0.582133 | |
|
| Cantonese (zh-yue) | 0.859649 | 0.649007 | 0.739623 | |
|
| Standard Chinese (zho) | 0.845528 | 0.781955 | 0.812500 | |
|
| accuracy | 0.749527 | 0.749527 | 0.749527 | |
|
| macro avg | 0.762866 | 0.742101 | 0.749261 | |
|
| weighted avg | 0.762006 | 0.749527 | 0.752910 | |
|
|
|
|
|
## Questions? |
|
Post a Github issue from [HERE](https://github.com/m3hrdadfi/zabanshenas/issues). |