Update README.md
README.md CHANGED
@@ -21,6 +21,7 @@ Despite being a long-context model evaluated on a short-context benchmark, MEGA
 | bert-base-uncased | 110M | 512 | 0.7905 |
 | roberta-base | 125M | 514 | 0.86 |
 | [bert-plus-L8-4096-v1.0](https://huggingface.co/BEE-spoke-data/bert-plus-L8-4096-v1.0) | 88.1M | 4096 | 0.8278 |
+| [mega-wikitext103](https://huggingface.co/mnaylor/mega-base-wikitext) | 7.0M | 10000 | 0.48 |
 
 <details>
 <summary><strong>GLUE Details</strong></summary>
@@ -31,6 +32,7 @@ Despite being a long-context model evaluated on a short-context benchmark, MEGA
 | bert-base-uncased | 110M | 512 | 0.7905 | 0.521 | 0.935 | 0.889 | 0.858 | 0.712 | 0.84 | 0.905 | 0.664 |
 | roberta-base | 125M | 514 | 0.86 | 0.64 | 0.95 | 0.9 | 0.91 | 0.92 | 0.88 | 0.93 | 0.79 |
 | bert-plus-L8-4096-v1.0 | 88.1M | 4096 | 0.8278 | 0.6272 | 0.906 | 0.8659 | 0.9207 | 0.906 | 0.832 | 0.9 | 0.6643 |
+| mega-wikitext103 | 7.0M | 10000 | 0.480 | 0.00 | 0.732 | 0.748 | -0.087 | 0.701 | 0.54 | 0.598 | 0.513 |
 
 The evals for MEGA/bert-plus can be found in [this open wandb project](https://wandb.ai/pszemraj/glue-benchmarking) and are taken as the max observed values on the validation sets. The values for other models are taken as reported in their papers.
 </details>
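The GLUE column appears to be the unweighted mean of the eight per-task scores in the details table. A minimal check of that assumption in Python, using the bert-plus-L8-4096-v1.0 row:

```python
# Assumption (not stated in the README): the aggregate GLUE score is the
# unweighted mean of the eight per-task scores in the details table.
scores = [0.6272, 0.906, 0.8659, 0.9207, 0.906, 0.832, 0.9, 0.6643]
print(round(sum(scores) / len(scores), 4))  # 0.8278 -- matches the GLUE column
```

Under the same rule, the mega-wikitext103 row averages to roughly 0.468, slightly below its reported 0.480, so that aggregate may be rounded or weighted differently.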
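For reference, a minimal sketch (not part of this commit) of loading one of the linked checkpoints with `transformers` and confirming the context length reported in the table; it assumes the 4096 figure lives in the standard `max_position_embeddings` field of this BERT-style config:

```python
# Minimal sketch: pull the long-context checkpoint linked above and
# check its context window against the table (assumes `transformers`
# and `torch` are installed).
from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer

repo = "BEE-spoke-data/bert-plus-L8-4096-v1.0"

config = AutoConfig.from_pretrained(repo)
print(config.max_position_embeddings)  # expect 4096, per the table

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForMaskedLM.from_pretrained(repo)
```

Loading mega-wikitext103 the same way would additionally need a `transformers` release that still ships the MEGA architecture, which was deprecated in later versions.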