ilos-vigil committed · Commit c57096b · Parent(s): 3f0d9f8
Update README.md

README.md CHANGED
@@ -2,12 +2,13 @@
 language: id
 license: mit
 datasets:
-
-
-
+- oscar
+- wikipedia
+- id_newspapers_2018
 widget:
-
-
+- text: Saya [MASK] makan nasi goreng.
+- text: Kucing itu sedang bermain dengan [MASK].
+pipeline_tag: fill-mask
 ---
 
 # Indonesian small BigBird model
@@ -16,6 +17,10 @@ widget:
 
 Source code to create this model is available at [https://github.com/ilos-vigil/bigbird-small-indonesian](https://github.com/ilos-vigil/bigbird-small-indonesian).
 
+## Downstream Task
+
+* NLI/ZSC: [ilos-vigil/bigbird-small-indonesian-nli](https://huggingface.co/ilos-vigil/bigbird-small-indonesian-nli)
+
 ## Model Description
 
 This **cased** model has been pretrained with a masked language modeling (MLM) objective. It has ~30M parameters and was pretrained for 8 epochs/51474 steps, reaching 2.078 eval loss (7.988 perplexity). The architecture of this model is shown in the configuration snippet below. The tokenizer was trained on the whole dataset with a 30K vocabulary size.
@@ -159,4 +164,4 @@ The model achieves the following results during training evaluation.
 | 5 | 32187 | 2.097 | 8.141 |
 | 6 | 38616 | 2.087 | 8.061 |
 | 7 | 45045 | 2.081 | 8.012 |
-| 8 | 51474 | 2.078 | 7.988 |
+| 8 | 51474 | 2.078 | 7.988 |
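The eval loss and perplexity columns in the evaluation table appear consistent with perplexity being computed as exp(eval loss); a quick check in plain Python, using only the loss values copied from the table rows shown in the diff (the formula is an assumption, not stated in the card):

```python
import math

# Eval losses for epochs 5-8 from the table above; exp(loss) matches the
# reported perplexities (8.141, 8.061, 8.012, 7.988) up to rounding.
for epoch, loss in [(5, 2.097), (6, 2.087), (7, 2.081), (8, 2.078)]:
    print(epoch, loss, round(math.exp(loss), 3))
```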
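The newly added `pipeline_tag: fill-mask` and widget entries suggest the model is meant to be exercised through the standard `transformers` fill-mask pipeline. A minimal sketch, assuming only the model id from this card and reusing the two widget sentences added in this commit as inputs:

```python
from transformers import pipeline

# Fill-mask pipeline; the model id comes from this model card.
fill_mask = pipeline("fill-mask", model="ilos-vigil/bigbird-small-indonesian")

# The two widget sentences added in this commit.
sentences = [
    "Saya [MASK] makan nasi goreng.",
    "Kucing itu sedang bermain dengan [MASK].",
]
for sentence in sentences:
    for pred in fill_mask(sentence, top_k=3):
        print(f'{sentence} -> {pred["token_str"]} ({pred["score"]:.3f})')
```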