File size: 3,961 Bytes
29309ab 81b0e90 3f1f1fa 4a5746f 29309ab 565c1ab 29309ab 3f1f1fa 29309ab 3f1f1fa 29309ab 3f1f1fa 29309ab 3f1f1fa 29309ab 565c1ab |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 |
---
language:
- es
metrics:
- f1
pipeline_tag: text-classification
datasets:
- dariolopez/suicide-comments-es
license: apache-2.0
---
# Model Description
This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-bne](https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne) to detect suicidal ideation/behavior in public comments (reddit, forums, twitter, etc.) using the Spanish language.
# How to use
```python
>>> from transformers import pipeline
>>> model_name= 'dariolopez/roberta-base-bne-finetuned-suicide-es'
>>> pipe = pipeline("text-classification", model=model_name)
>>> pipe("Quiero acabar con todo. No merece la pena vivir.")
[{'label': 'Suicide', 'score': 0.9999703168869019}]
>>> pipe("El partido de fútbol fue igualado, disfrutamos mucho jugando juntos.")
[{'label': 'Non-Suicide', 'score': 0.999990701675415}]
```
# Training
## Training data
The dataset consists of comments on Reddit, Twitter, and inputs/outputs of the Alpaca dataset translated to Spanish language and classified as suicidal ideation/behavior and non-suicidal.
The dataset has 10050 rows (777 considered as Suicidal Ideation/Behavior and 9273 considered Non-Suicidal).
More info: https://huggingface.co/datasets/dariolopez/suicide-comments-es
## Training procedure
The training data has been tokenized using the `PlanTL-GOB-ES/roberta-base-bne` tokenizer with a vocabulary size of 50262 tokens and a model maximum length of 512 tokens.
The training lasted a total of 10 minutes using a NVIDIA GPU GeForce RTX 3090.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3090 Off | 00000000:68:00.0 Off | N/A |
| 31% 50C P8 25W / 250W | 1MiB / 24265MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
# Considerations for Using the Model
The model is designed for use in Spanish language, specifically to detect suicidal ideation/behavior.
## Intended uses & limitations
In progress.
## Limitations and bias
In progress.
# Evaluation
## Metric
F1 = 2 * (precision * recall) / (precision + recall)
## 5 K fold
We use [KFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html) with `n_splits=5` to evaluate the model.
Results:
```
>>> best_f1_model_by_fold = [0.9163879598662207, 0.9380530973451328, 0.9333333333333333, 0.8943661971830986, 0.9226190476190477]
>>> best_f1_model_by_fold.mean()
0.9209519270693666
```
# Additional Information
## Team
* [dariolopez](https://huggingface.co/dariolopez)
* [diegogd](https://huggingface.co/diegogd)
## Licesing
This work is licensed under a [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0) |