Fill-Mask · Transformers · Safetensors · English · mega · 16384 · 16k · Inference Endpoints
pszemraj committed · Commit 4410970 · verified · 1 Parent(s): 86ff964

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -21,6 +21,7 @@ Despite being a long-context model evaluated on a short-context benchmark, MEGA
 | bert-base-uncased | 110M | 512 | 0.7905 |
 | roberta-base | 125M | 514 | 0.86 |
 | [bert-plus-L8-4096-v1.0](https://huggingface.co/BEE-spoke-data/bert-plus-L8-4096-v1.0) | 88.1M | 4096 | 0.8278 |
+| [mega-wikitext103](https://huggingface.co/mnaylor/mega-base-wikitext) | 7.0M | 10000 | 0.48 |
 
 <details>
 <summary><strong>GLUE Details</strong></summary>
@@ -31,6 +32,7 @@ Despite being a long-context model evaluated on a short-context benchmark, MEGA
 | bert-base-uncased | 110M | 512 | 0.7905 | 0.521 | 0.935 | 0.889 | 0.858 | 0.712 | 0.84 | 0.905 | 0.664 |
 | roberta-base | 125M | 514 | 0.86 | 0.64 | 0.95 | 0.9 | 0.91 | 0.92 | 0.88 | 0.93 | 0.79 |
 | bert-plus-L8-4096-v1.0 | 88.1M | 4096 | 0.8278 | 0.6272 | 0.906 | 0.8659 | 0.9207 | 0.906 | 0.832 | 0.9 | 0.6643 |
+| mega-wikitext103 | 7M | 10000 | 0.480 | 0.00 | 0.732 | 0.748 | -0.087 | 0.701 | 0.54 | 0.598 | 0.513 |
 
 The evals for MEGA/bert-plus can be found in [this open wandb project](https://wandb.ai/pszemraj/glue-benchmarking) and are taken as the max observed values on the validation sets. The values for other models are taken as reported in their papers.
 </details>
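
For reference, the new table rows point at the `mnaylor/mega-base-wikitext` checkpoint, a fill-mask model. A minimal sketch of loading it, assuming a transformers version that still ships the (since deprecated) MEGA architecture:

```python
from transformers import pipeline

# Sketch only: assumes a transformers release that still includes MEGA
# (the architecture was deprecated in later versions).
fill_mask = pipeline("fill-mask", model="mnaylor/mega-base-wikitext")

# Use the checkpoint's own mask token rather than hard-coding one.
mask = fill_mask.tokenizer.mask_token
for pred in fill_mask(f"Paris is the capital of {mask}."):
    print(pred["token_str"], round(pred["score"], 4))
```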
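
The diff notes that the MEGA/bert-plus numbers are the max observed values on the validation sets, taken from the linked wandb project. A hypothetical sketch of extracting such maxima with the wandb public API; the project path comes from the link in the diff, while the metric key `eval/accuracy` is an assumption, not confirmed by the project itself:

```python
import wandb

api = wandb.Api()
# Project path taken from the link above; the metric key is assumed.
for run in api.runs("pszemraj/glue-benchmarking"):
    history = run.history(keys=["eval/accuracy"])  # assumed metric name
    if not history.empty:
        print(run.name, history["eval/accuracy"].max())
```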