---
language:
- es
metrics:
- f1
pipeline_tag: text-classification
datasets:
- hackathon-somos-nlp-2023/suicide-comments-es
license: apache-2.0
---


# Model Description

This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-bne](https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne) that detects suicidal ideation/behavior in public comments (Reddit, forums, Twitter, etc.) written in Spanish.

# How to use

```python
>>> from transformers import pipeline


>>> model_name = 'hackathon-somos-nlp-2023/roberta-base-bne-finetuned-suicide-es'
>>> pipe = pipeline("text-classification", model=model_name)

>>> pipe("Quiero acabar con todo. No merece la pena vivir.")
[{'label': 'Suicide', 'score': 0.9999703168869019}]

>>> pipe("El partido de fútbol fue igualado, disfrutamos mucho jugando juntos.")
[{'label': 'Non-Suicide', 'score': 0.999990701675415}]
```
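
The pipeline returns a list of dictionaries with a `label` and a `score`. A minimal sketch of how such an output could be post-processed follows. The helper `needs_review` and the cut-off `REVIEW_THRESHOLD` are hypothetical names introduced only for illustration; the model card does not recommend any threshold, and the sample inputs are the two outputs shown above.

```python
from typing import Dict, List

# Hypothetical cut-off chosen for illustration; not a value recommended by the authors.
REVIEW_THRESHOLD = 0.9


def needs_review(predictions: List[Dict], threshold: float = REVIEW_THRESHOLD) -> bool:
    """Return True when the top prediction is 'Suicide' with a score >= threshold."""
    top = max(predictions, key=lambda p: p["score"])
    return top["label"] == "Suicide" and top["score"] >= threshold


# The two pipeline outputs shown in the example above.
print(needs_review([{"label": "Suicide", "score": 0.9999703168869019}]))
print(needs_review([{"label": "Non-Suicide", "score": 0.999990701675415}]))
```

Working on the returned dictionaries rather than on the pipeline object keeps the check independent of how the model is loaded.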


# Training

## Training data

The dataset consists of comments from Reddit and Twitter and inputs/outputs of the Alpaca dataset, translated into Spanish and labeled as either suicidal ideation/behavior or non-suicidal.

The dataset has 10,050 rows: 777 labeled as Suicidal Ideation/Behavior and 9,273 labeled as Non-Suicidal.
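
The two classes are strongly imbalanced: the suicidal class makes up roughly 7.7% of the rows. The sketch below computes the class shares from the counts above, together with the common "balanced" weighting heuristic `total / (n_classes * class_count)`. The weights are shown only to illustrate the size of the imbalance; this card does not say whether class weights were used during training.

```python
# Class counts as reported in this section.
counts = {"Suicide": 777, "Non-Suicide": 9273}
total = sum(counts.values())

# Fraction of the dataset belonging to each class.
shares = {label: n / total for label, n in counts.items()}

# "Balanced" heuristic: total / (n_classes * class_count).
# Assumption: illustrative only, not a documented part of the training procedure.
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

for label in counts:
    print(f"{label}: share={shares[label]:.4f}, weight={weights[label]:.3f}")
```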

More info: https://huggingface.co/datasets/hackathon-somos-nlp-2023/suicide-comments-es

## Training procedure

The training data has been tokenized using the `PlanTL-GOB-ES/roberta-base-bne` tokenizer with a vocabulary size of 50262 tokens and a model maximum length of 512 tokens.

Training took 10 minutes in total on an NVIDIA GeForce RTX 3090 GPU provided by Q Blocks.

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    Off  | 00000000:68:00.0 Off |                  N/A |
| 31%   50C    P8    25W / 250W |      1MiB / 24265MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```


# Considerations for Using the Model

The model is intended for Spanish-language text, specifically to detect suicidal ideation/behavior.

## Limitations

This is a research toy project; do not expect a professional, bug-free model. We have found some false positives and false negatives. If you find a problem, please send us your feedback.

## Bias

No measures have been taken to estimate the bias and toxicity embedded in the model or dataset. However, the model was fine-tuned on a dataset collected mainly from Reddit, Twitter, and ChatGPT, so it probably carries an age bias, because [the Internet is used more by younger people](https://www.statista.com/statistics/272365/age-distribution-of-internet-users-worldwide).

In addition, this model inherits the biases of its base model, which are described in the [base model card](https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne#limitations-and-bias).


# Evaluation


## Metric

F1 = 2 * (precision * recall) / (precision + recall)
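
As a minimal sketch, the formula can be evaluated from raw counts of true positives, false positives, and false negatives. The function name `f1_from_counts` and the counts in the example call are made up for illustration and are not results from this model.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Compute F1 = 2 * (precision * recall) / (precision + recall) from counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0  # avoid division by zero when there are no true positives
    return 2 * (precision * recall) / (precision + recall)


# Made-up counts for illustration only.
print(f1_from_counts(tp=90, fp=10, fn=20))
```

Because F1 ignores true negatives, it is less flattered by the large non-suicidal class than plain accuracy would be.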

## 5-fold cross-validation

We use [KFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html) with `n_splits=5` to evaluate the model.

Results:

```
>>> import numpy as np
>>> best_f1_model_by_fold = [0.9163879598662207, 0.9380530973451328, 0.9333333333333333, 0.8943661971830986, 0.9226190476190477]
>>> float(np.mean(best_f1_model_by_fold))
0.9209519270693666
```
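
The 5-fold split described above can be sketched as follows. This only illustrates how `KFold` with `n_splits=5` partitions a dataset of the reported size; the index array stands in for the real rows, `shuffle` and `random_state` are left at the scikit-learn defaults (an assumption, since the card does not state them), and the fine-tuning step inside the loop is omitted.

```python
import numpy as np
from sklearn.model_selection import KFold

n_rows = 10050               # dataset size reported in this card
indices = np.arange(n_rows)  # placeholder for the real rows

# Only n_splits=5 is stated in the card; other arguments are defaults (assumption).
kfold = KFold(n_splits=5)

fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(indices)):
    # A model would be fine-tuned on train_idx and scored on val_idx here.
    fold_sizes.append((len(train_idx), len(val_idx)))
    print(f"fold {fold}: train={len(train_idx)}, validation={len(val_idx)}")
```

Plain `KFold` does not stratify by label, so the share of positive rows can differ between folds. scikit-learn also provides `StratifiedKFold`, although this card does not indicate that it was used.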


# Additional Information

## Team

* [dariolopez](https://huggingface.co/dariolopez)
* [diegogd](https://huggingface.co/diegogd)

## Licensing

This work is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).

## Demo (Space)

https://huggingface.co/spaces/hackathon-somos-nlp-2023/suicide-comments-es