Commit 922df51 by nicholasKluge (1 parent: 25c66ea)

Update README.md
README.md CHANGED
@@ -113,6 +113,15 @@ The model will output something like:
 
 🤬 In certain types of tasks, generative models can produce harmful and discriminatory content inspired by historical stereotypes.
 
+## Evaluation
+
+| Model (gpt2-portuguese) | Average | [ARC](https://arxiv.org/abs/1803.05457) | [TruthfulQA](https://arxiv.org/abs/2109.07958) | [ToxiGen](https://arxiv.org/abs/2203.09509) |
+|---------------------------------------------------------------------------------------|-----------|-----------------------------------------|------------------------------------------------|---------------------------------------------|
+| [Aira-2-portuguese-124M](https://huggingface.co/nicholasKluge/Aira-2-portuguese-124M) | **34.73** | **24.87** | 40.60 | None |
+| gpt2-small-portuguese | 31.96 | 22.48 | **41.44** | None |
+
+* Evaluations were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). The notebook used to make these evaluations is available in [this repo](lm_evaluation_harness-pt.ipynb). The ToxiGen evaluation was not performed because the task is not available in Portuguese. Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness.
+
 ## Cite as 🤗
 
 ```latex