---
license: apache-2.0
datasets:
- HiTZ/CONAN-EUS
language:
- eu
metrics:
- bleu
library_name: transformers
pipeline_tag: text2text-generation
tags:
- counternarrative
- hate speech
- text generation
---
**Content Warning**: This card may contain examples of offensive language that do not reflect the authors’ views.

# Model Card for mT5-counternarrative-eu

This is an [mT5-base](https://huggingface.co/google/mt5-base) text-to-text model fine-tuned to generate counternarratives (CN) against hate speech (HS) in **Basque**.
It was fine-tuned on the Basque splits of the [CONAN-EUS](https://huggingface.co/datasets/HiTZ/CONAN-EUS) dataset.
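
As a rough illustration of how a fine-tuned mT5 checkpoint like this one is typically used with `transformers`, here is a minimal sketch. The `MODEL_ID` value is a placeholder, not a confirmed repository name, and the function names, generation settings (`num_beams`, `max_new_tokens`) and the choice to pass the raw hate speech text as input without any prefix are assumptions rather than the authors' documented setup.

```python
MODEL_ID = "path-or-hub-id-of-this-model"  # placeholder (assumption): replace with the real id


def prepare_input(hate_speech: str) -> str:
    """Collapse runs of whitespace so the tokenizer sees one clean line."""
    return " ".join(hate_speech.split())


def generate_counter_narrative(
    hate_speech: str,
    model_id: str = MODEL_ID,
    max_new_tokens: int = 64,  # assumed length budget, not from the paper
) -> str:
    """Generate one counternarrative for a Basque hate speech message."""
    # Imported here so prepare_input can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    inputs = tokenizer(prepare_input(hate_speech), return_tensors="pt", truncation=True)
    # Beam search is an illustrative choice; other decoding strategies also work.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_counter_narrative("...")` with a Basque HS message downloads the checkpoint on first use and returns the decoded CN string.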

[CONAN-EUS](https://huggingface.co/datasets/HiTZ/CONAN-EUS) was created by professionally translating all 6654 English HS-CN pairs of the original CONAN dataset into
**Basque and Spanish**. For our experiments, we generated train, validation and test splits such that no HS-CN pair appears in more than one split.

<table style="width:33%">
    <tr>
         <th>CONAN-EUS Splits</th>
         <th>Total HS-CN Count</th>
     </tr>
     <tr>
         <td>train</td>
         <td>4833</td>
     </tr>
     <tr>
         <td>validation</td>
         <td>537</td>
     </tr>
     <tr>
         <td>test</td>
         <td>1278</td>
     </tr>
 </table>
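
The requirement that no HS-CN pair is shared between splits can be checked with a few lines of Python. This is a minimal sketch on toy data: the function name `find_leaked_pairs` and the example strings are hypothetical, and it does not reproduce the authors' actual splitting procedure.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (hate speech, counternarrative)


def find_leaked_pairs(splits: Dict[str, List[Pair]]) -> Set[Pair]:
    """Return every HS-CN pair that appears in more than one split."""
    first_seen: Dict[Pair, str] = {}
    leaked: Set[Pair] = set()
    for split_name, pairs in splits.items():
        # set() ignores duplicates inside a single split
        for pair in set(pairs):
            if pair in first_seen and first_seen[pair] != split_name:
                leaked.add(pair)
            else:
                first_seen.setdefault(pair, split_name)
    return leaked


# Toy example with made-up placeholder strings
toy_splits = {
    "train": [("hs-1", "cn-1"), ("hs-2", "cn-2")],
    "validation": [("hs-3", "cn-3")],
    "test": [("hs-4", "cn-4")],
}
assert find_leaked_pairs(toy_splits) == set()
```

An empty result means the splits are disjoint at the pair level; any returned pair would need to be removed from all but one split.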

- 📖 Paper: [Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation](https://arxiv.org/abs/2403.09159) In LREC-COLING 2024.
- 💻 GitHub Repo (Data and Code): [https://github.com/ixa-ehu/conan-e/](https://github.com/ixa-ehu/conan-e/)


## HS-CN example

The CONAN dataset consists of HS-CN pairs about Muslims and Islam.
The table below shows one HS-CN pair in Basque, Spanish and English:

| HS                                                                                                                                                    | CN                                                                                                                      |
|-------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|
| Musulmanok ez dugu gure kultura aberastu dezakeen ezer erabilgarririk. | Zer egin dute guretzat musulmanek? Beno, kafea, erlojuak, kamerak, higienea, unibertsitateak, tresna kirurgikoak, mapak, musika, aljebra. |
| Los musulmanes no tienen nada útil que pueda enriquecer nuestra cultura.| ¿Qué han hecho los musulmanes por nosotros? Bueno, café, relojes, cámaras, higiene, universidades, instrumentos quirúrgicos, mapas, música, álgebra.|
| Muslims do not have anything useful that can enrich our culture.| What have Muslims ever done for us? Well, Coffee, Clocks, Cameras, Hygiene, Universities, Surgical Instruments, Maps, Music, Algebra.|

## Citation

If you use the model, please **cite the following paper**:

```bibtex
@inproceedings{bengoetxea-et-al-2024,
      title={{B}asque and {S}panish {C}ounter {N}arrative {G}eneration: {D}ata {C}reation and {E}valuation},
      author={Jaione Bengoetxea and Yi-Ling Chung and Marco Guerini and Rodrigo Agerri},
      year={2024},
      booktitle={Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)},
}
```


**Contact**: [Rodrigo Agerri](https://ragerri.github.io/)
HiTZ Center - Ixa, University of the Basque Country UPV/EHU