papluca committed on
Commit
f793746
·
1 Parent(s): a613a6d

Add evaluation results

Files changed (1)
  1. README.md +64 -4
README.md CHANGED
@@ -14,6 +14,11 @@ model-index:
14
 
15
  This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on the [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset.
16
 
17
  ## Intended uses & limitations
18
 
19
  You can directly use this model as a language detector, i.e. for sequence classification tasks. Currently, it supports the following 20 languages:
@@ -22,13 +27,62 @@ You can directly use this model as a language detector, i.e. for sequence classi
22
 
23
  ## Training and evaluation data
24
 
25
- It achieves the following results on the evaluation set:
26
- - Loss: 0.0103
27
- - Accuracy: 0.9977
28
- - F1: 0.9977
29
 
30
  ## Training procedure
31
 
32
  ### Training hyperparameters
33
 
34
  The following hyperparameters were used during training:
@@ -43,11 +97,17 @@ The following hyperparameters were used during training:
43
 
44
  ### Training results
45
 
46
  | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 |
47
  |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
48
  | 0.2492 | 1.0 | 1094 | 0.0149 | 0.9969 | 0.9969 |
49
  | 0.0101 | 2.0 | 2188 | 0.0103 | 0.9977 | 0.9977 |
50
 
51
 
52
  ### Framework versions
53
 
 
14
 
15
  This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on the [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset.
16
 
17
+ ## Model description
18
+
19
+ This model is an XLM-RoBERTa transformer model with a classification head on top (i.e. a linear layer on top of the pooled output).
20
+ For additional information please refer to the [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) model card or to the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Conneau et al.
21
+
22
  ## Intended uses & limitations
23
 
24
  You can directly use this model as a language detector, i.e. for sequence classification tasks. Currently, it supports the following 20 languages:
 
27
 
28
  ## Training and evaluation data
29
 
30
+ The model was fine-tuned on the [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset, which consists of text sequences in 20 languages. The training set contains 70k samples, while the validation and test sets contain 10k samples each. The average accuracy on the test set is **99.6%** (this equals both the macro and the weighted average F1-score, which coincide because the test set is perfectly balanced). A more detailed evaluation is provided in the following table.
31
+
32
+ | Language | Precision | Recall | F1-score | support |
33
+ |:--------:|:---------:|:------:|:--------:|:-------:|
34
+ |ar |0.998 |0.996 |0.997 |500 |
35
+ |bg |0.998 |0.964 |0.981 |500 |
36
+ |de |0.998 |0.996 |0.997 |500 |
37
+ |el |0.996 |1.000 |0.998 |500 |
38
+ |en |1.000 |1.000 |1.000 |500 |
39
+ |es |0.967 |1.000 |0.983 |500 |
40
+ |fr |1.000 |1.000 |1.000 |500 |
41
+ |hi |0.994 |0.992 |0.993 |500 |
42
+ |it |1.000 |0.992 |0.996 |500 |
43
+ |ja |0.996 |0.996 |0.996 |500 |
44
+ |nl |1.000 |1.000 |1.000 |500 |
45
+ |pl |1.000 |1.000 |1.000 |500 |
46
+ |pt |0.988 |1.000 |0.994 |500 |
47
+ |ru |1.000 |0.994 |0.997 |500 |
48
+ |sw |1.000 |1.000 |1.000 |500 |
49
+ |th |1.000 |0.998 |0.999 |500 |
50
+ |tr |0.994 |0.992 |0.993 |500 |
51
+ |ur |1.000 |1.000 |1.000 |500 |
52
+ |vi |0.992 |1.000 |0.996 |500 |
53
+ |zh |1.000 |1.000 |1.000 |500 |
54
+
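The relationship between the per-language rows and the headline 99.6% can be checked with a few lines of Python. This is a minimal sketch, not the evaluation script used for the model card: the `REPORT` dictionary and the helper names `f1` and `mean` are illustrative, and the precision/recall pairs are copied by hand from the table above. Because every language has the same support (500), overall accuracy equals the unweighted mean of the per-language recalls.

```python
# Per-language (precision, recall), copied from the evaluation table above.
REPORT = {
    "ar": (0.998, 0.996), "bg": (0.998, 0.964), "de": (0.998, 0.996),
    "el": (0.996, 1.000), "en": (1.000, 1.000), "es": (0.967, 1.000),
    "fr": (1.000, 1.000), "hi": (0.994, 0.992), "it": (1.000, 0.992),
    "ja": (0.996, 0.996), "nl": (1.000, 1.000), "pl": (1.000, 1.000),
    "pt": (0.988, 1.000), "ru": (1.000, 0.994), "sw": (1.000, 1.000),
    "th": (1.000, 0.998), "tr": (0.994, 0.992), "ur": (1.000, 1.000),
    "vi": (0.992, 1.000), "zh": (1.000, 1.000),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def mean(values) -> float:
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Macro F1: unweighted mean of the per-language F1-scores.
macro_f1 = mean(f1(p, r) for p, r in REPORT.values())

# With equal support per class, accuracy is the mean of per-class recalls.
balanced_accuracy = mean(r for _, r in REPORT.values())

print(f"macro F1: {macro_f1:.3f}")
print(f"accuracy (mean recall): {balanced_accuracy:.3f}")
```

Both values round to 0.996, in line with the reported test accuracy. Small differences from the table's F1 column can arise because the inputs here are already rounded to three decimals.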
55
+ ### Benchmarks
56
+
57
+ As a baseline for `xlm-roberta-base-language-detection`, we used the Python [langid](https://github.com/saffsd/langid.py) library. Since it comes pre-trained on 97 languages, we used its `.set_languages()` method to restrict it to our 20 languages. The average accuracy of langid on the test set is **98.5%**. More details are provided in the table below.
58
+
59
+ | Language | Precision | Recall | F1-score | support |
60
+ |:--------:|:---------:|:------:|:--------:|:-------:|
61
+ |ar |0.990 |0.970 |0.980 |500 |
62
+ |bg |0.998 |0.964 |0.981 |500 |
63
+ |de |0.992 |0.944 |0.967 |500 |
64
+ |el |1.000 |0.998 |0.999 |500 |
65
+ |en |1.000 |1.000 |1.000 |500 |
66
+ |es |1.000 |0.968 |0.984 |500 |
67
+ |fr |0.996 |1.000 |0.998 |500 |
68
+ |hi |0.949 |0.976 |0.963 |500 |
69
+ |it |0.990 |0.980 |0.985 |500 |
70
+ |ja |0.927 |0.988 |0.956 |500 |
71
+ |nl |0.980 |1.000 |0.990 |500 |
72
+ |pl |0.986 |0.996 |0.991 |500 |
73
+ |pt |0.950 |0.996 |0.973 |500 |
74
+ |ru |0.996 |0.974 |0.985 |500 |
75
+ |sw |1.000 |1.000 |1.000 |500 |
76
+ |th |1.000 |0.996 |0.998 |500 |
77
+ |tr |0.990 |0.968 |0.979 |500 |
78
+ |ur |0.998 |0.996 |0.997 |500 |
79
+ |vi |0.971 |0.990 |0.980 |500 |
80
+ |zh |1.000 |1.000 |1.000 |500 |
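
The langid baseline described above could be set up roughly as follows. This is a hedged sketch rather than the script behind the reported numbers: `accuracy` and `make_langid_predictor` are hypothetical helper names, and the way the test split is loaded is left out. It relies only on `langid.set_languages()` and `langid.classify()`, the latter returning a `(language, score)` tuple.

```python
from typing import Callable, Iterable, List, Tuple

# The 20 language codes listed in the evaluation tables.
LANGUAGES: List[str] = [
    "ar", "bg", "de", "el", "en", "es", "fr", "hi", "it", "ja",
    "nl", "pl", "pt", "ru", "sw", "th", "tr", "ur", "vi", "zh",
]


def accuracy(predict: Callable[[str], str],
             samples: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (text, label) pairs for which predict(text) == label."""
    samples = list(samples)
    correct = sum(predict(text) == label for text, label in samples)
    return correct / len(samples)


def make_langid_predictor() -> Callable[[str], str]:
    """Return a predictor backed by langid, restricted to LANGUAGES.

    Requires the third-party package: pip install langid
    """
    import langid  # imported lazily so the rest of the sketch has no dependency

    langid.set_languages(LANGUAGES)  # constrain the candidate language set
    return lambda text: langid.classify(text)[0]  # keep only the language code


# Example (assumes langid is installed; the sentences are made up):
# predict = make_langid_predictor()
# print(accuracy(predict, [("Good morning", "en"), ("Buongiorno a tutti", "it")]))
```

Passing the predictor in as a callable keeps the metric code independent of langid, so the same `accuracy` function can score any other detector on the same samples.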
81
 
82
  ## Training procedure
83
 
84
+ Fine-tuning was done via the `Trainer` API.
85
+
86
  ### Training hyperparameters
87
 
88
  The following hyperparameters were used during training:
 
97
 
98
  ### Training results
99
 
100
+ The validation results on the `valid` split of the Language Identification dataset are summarised below.
101
+
102
  | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 |
103
  |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
104
  | 0.2492 | 1.0 | 1094 | 0.0149 | 0.9969 | 0.9969 |
105
  | 0.0101 | 2.0 | 2188 | 0.0103 | 0.9977 | 0.9977 |
106
 
107
+ In short, it achieves the following results on the validation set:
108
+ - Loss: 0.0103
109
+ - Accuracy: 0.9977
110
+ - F1: 0.9977
111
 
112
  ### Framework versions
113