Fill-Mask · Transformers · Safetensors · English · mega · 16384 · 16k · Inference Endpoints
pszemraj committed · Commit 4410970 · verified · 1 Parent(s): 86ff964

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -21,6 +21,7 @@ Despite being a long-context model evaluated on a short-context benchmark, MEGA
 | bert-base-uncased | 110M | 512 | 0.7905 |
 | roberta-base | 125M | 514 | 0.86 |
 | [bert-plus-L8-4096-v1.0](https://huggingface.co/BEE-spoke-data/bert-plus-L8-4096-v1.0) | 88.1M | 4096 | 0.8278 |
+| [mega-wikitext103](https://huggingface.co/mnaylor/mega-base-wikitext) | 7.0M | 10000 | 0.48 |
 
 <details>
 <summary><strong>GLUE Details</strong></summary>
@@ -31,6 +32,7 @@ Despite being a long-context model evaluated on a short-context benchmark, MEGA
 | bert-base-uncased | 110M | 512 | 0.7905 | 0.521 | 0.935 | 0.889 | 0.858 | 0.712 | 0.84 | 0.905 | 0.664 |
 | roberta-base | 125M | 514 | 0.86 | 0.64 | 0.95 | 0.9 | 0.91 | 0.92 | 0.88 | 0.93 | 0.79 |
 | bert-plus-L8-4096-v1.0 | 88.1M | 4096 | 0.8278 | 0.6272 | 0.906 | 0.8659 | 0.9207 | 0.906 | 0.832 | 0.9 | 0.6643 |
+| mega-wikitext103 | 7M | 10000 | 0.480 | 0.00 | 0.732 | 0.748 | -0.087 | 0.701 | 0.54 | 0.598 | 0.513 |
 
 The evals for MEGA/bert-plus can be found in [this open wandb project](https://wandb.ai/pszemraj/glue-benchmarking) and are taken as the max observed values on the validation sets. The values for other models are taken as reported in their papers.
 </details>
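
For reference, the new table rows point at the `mnaylor/mega-base-wikitext` checkpoint, a fill-mask model. A minimal sketch of loading it, assuming a transformers version that still ships the (since deprecated) MEGA architecture:

```python
from transformers import pipeline

# Sketch only: assumes a transformers release that still includes MEGA
# (the architecture was deprecated in later versions).
fill_mask = pipeline("fill-mask", model="mnaylor/mega-base-wikitext")

# Use the checkpoint's own mask token rather than hard-coding one.
mask = fill_mask.tokenizer.mask_token
for pred in fill_mask(f"Paris is the capital of {mask}."):
    print(pred["token_str"], round(pred["score"], 4))
```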
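
The diff notes that the MEGA/bert-plus numbers are the max observed values on the validation sets, taken from the linked wandb project. A hypothetical sketch of extracting such maxima with the wandb public API; the project path comes from the link in the diff, while the metric key `eval/accuracy` is an assumption, not confirmed by the project itself:

```python
import wandb

api = wandb.Api()
# Project path taken from the link above; the metric key is assumed.
for run in api.runs("pszemraj/glue-benchmarking"):
    history = run.history(keys=["eval/accuracy"])  # assumed metric name
    if not history.empty:
        print(run.name, history["eval/accuracy"].max())
```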