---
license: apache-2.0
datasets:
- HiTZ/CONAN-EUS
language:
- en
metrics:
- bleu
library_name: transformers
pipeline_tag: text2text-generation
tags:
- counternarrative
- hate speech
- text generation
---
**Content Warning**: This card may contain examples of offensive language that do not reflect the authors’ views.

# Model Card for mT5-counternarrative-en

This is an [mT5-base](https://huggingface.co/google/mt5-base) text-to-text model fine-tuned to generate counternarratives (CN) against hate speech (HS). It was fine-tuned on the [CONAN-EUS](https://huggingface.co/datasets/HiTZ/CONAN-EUS) splits of the original CONAN dataset.

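Because the card declares `library_name: transformers` and a text-to-text pipeline, inference could look roughly like the sketch below. It is a sketch under stated assumptions, not an official usage snippet. The Hub repository id is not given in this card, so `model_id` has to be supplied by the user. Passing the raw hate speech text without any task prefix is an assumption, and the generation settings (`max_new_tokens`, `num_beams`) are illustrative. The helper names `load_model` and `generate_counternarrative` are hypothetical.

```python
def load_model(model_id):
    """Load tokenizer and model from the Hub (downloads weights).

    `model_id` is NOT stated in this card; pass the actual repository id.
    Requires `transformers` and `torch`; mT5 tokenizers typically also
    need `sentencepiece`.
    """
    # Imported lazily so the helper below stays usable without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def generate_counternarrative(tokenizer, model, hate_speech,
                              max_new_tokens=64, num_beams=4):
    """Generate one counternarrative for one hate speech input.

    Assumption: the model takes the raw HS text with no task prefix.
    """
    inputs = tokenizer(hate_speech, return_tensors="pt", truncation=True)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, num_beams=num_beams
    )
    # Decode the first (and only) sequence in the batch.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example usage (placeholder id, replace before running):
# tokenizer, model = load_model("<namespace>/<model-name>")
# print(generate_counternarrative(tokenizer, model, hate_speech_text))
```

Keeping loading and generation separate means the model is loaded once and reused across many inputs.
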
The CONAN (COunter NArratives through Nichesourcing) dataset was published by [Chung et al., 2019](https://aclanthology.org/P19-1271.pdf) and is publicly available at [https://github.com/marcoguerini/CONAN](https://github.com/marcoguerini/CONAN).

CONAN-EUS was created by professionally translating all 6654 English HS-CN pairs of the original CONAN dataset into **Basque and Spanish**. For experimentation, we generated train, validation and test splits such that no HS-CN pair occurs in more than one split.

<table style="width:33%">
  <tr>
    <th>CONAN-EUS Splits</th>
    <th>Total HS-CN Count</th>
  </tr>
  <tr>
    <td>train</td>
    <td>4833</td>
  </tr>
  <tr>
    <td>validation</td>
    <td>537</td>
  </tr>
  <tr>
    <td>test</td>
    <td>1278</td>
  </tr>
</table>

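The split constraint described above (no HS-CN pair shared between train, validation and test) can be checked with a few lines of Python. This is a minimal sketch, not the authors' code: the `(hs, cn)` tuple representation, the function name `find_leaks`, and the toy data are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical representation of one example: (hate speech, counternarrative).
Pair = Tuple[str, str]


def find_leaks(splits: Dict[str, List[Pair]]) -> Dict[Tuple[str, str], Set[Pair]]:
    """Return the HS-CN pairs shared by any two splits (empty dict if none)."""
    leaks = {}
    for (name_a, pairs_a), (name_b, pairs_b) in combinations(splits.items(), 2):
        shared = set(pairs_a) & set(pairs_b)
        if shared:
            leaks[(name_a, name_b)] = shared
    return leaks


# Toy data standing in for the real splits (not taken from CONAN-EUS).
toy_splits = {
    "train": [("hs 1", "cn 1"), ("hs 2", "cn 2")],
    "validation": [("hs 3", "cn 3")],
    "test": [("hs 4", "cn 4")],
}

# An empty result means no pair occurs in more than one split.
print(find_leaks(toy_splits))
```

Comparing whole `(hs, cn)` tuples flags only exact duplicate pairs. Whether the same HS paired with a different CN may appear in two splits is not specified in this card.
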
- 📖 Paper: [Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation](https://arxiv.org/abs/2403.09159) In LREC-COLING 2024.
- 💻 GitHub Repo (Data and Code): [https://github.com/ixa-ehu/conan-e/](https://github.com/ixa-ehu/conan-e/)

## HS-CN example

The HS-CN pairs in the CONAN dataset concern Muslims and Islam. An example of an HS-CN pair in Basque, Spanish and English is shown below:

| HS | CN |
|----|----|
| Musulmanok ez dugu gure kultura aberastu dezakeen ezer erabilgarririk. | Zer egin dute guretzat musulmanek? Beno, kafea, erlojuak, kamerak, higienea, unibertsitateak, tresna kirurgikoak, mapak, musika, aljebra. |
| Los musulmanes no tienen nada útil que pueda enriquecer nuestra cultura. | ¿Qué han hecho los musulmanes por nosotros? Bueno, café, relojes, cámaras, higiene, universidades, instrumentos quirúrgicos, mapas, música, álgebra. |
| Muslims do not have anything useful that can enrich our culture. | What have Muslims ever done for us? Well, Coffee, Clocks, Cameras, Hygiene, Universities, Surgical Instruments, Maps, Music, Algebra. |

## Citation

If you use the model, please **cite these two papers**:

```bibtex
@inproceedings{bengoetxea-et-al-2024,
    title={{B}asque and {S}panish {C}ounter {N}arrative {G}eneration: {D}ata {C}reation and {E}valuation},
    author={Jaione Bengoetxea and Yi-Ling Chung and Marco Guerini and Rodrigo Agerri},
    year={2024},
    booktitle={Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)}
}
```

```bibtex
@inproceedings{chung-etal-2019-conan,
    title = "{CONAN} - {CO}unter {NA}rratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech",
    author = "Chung, Yi-Ling and
      Kuzmenko, Elizaveta and
      Tekiroglu, Serra Sinem and
      Guerini, Marco",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    year = "2019",
    pages = "2819--2829"
}
```

**Contact**: [Rodrigo Agerri](https://ragerri.github.io/)
HiTZ Center - Ixa, University of the Basque Country UPV/EHU