dariolopez committed on
Commit 565c1ab
Parent(s): 3f1f1fa

Update README.md

README.md CHANGED
@@ -11,7 +11,7 @@ datasets:

# Model Description

-This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-bne](https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne) to detect suicidal ideation/behavior

# How to use

@@ -27,4 +27,86 @@ This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-bne](https://h

>>> pipe("El partido de fútbol fue igualado, disfrutamos mucho jugando juntos.")
[{'label': 'Non-Suicide', 'score': 0.999990701675415}]
-```

# Model Description

This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-bne](https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne) for detecting suicidal ideation/behavior in Spanish-language public comments (Reddit, forums, Twitter, etc.).

# How to use

```
>>> # English: "The football match was evenly matched; we really enjoyed playing together."
>>> pipe("El partido de fútbol fue igualado, disfrutamos mucho jugando juntos.")
[{'label': 'Non-Suicide', 'score': 0.999990701675415}]
```

# Training

## Training data

The dataset consists of comments from Reddit and Twitter, together with inputs/outputs from the Alpaca dataset, translated into Spanish and labeled as either suicidal ideation/behavior or non-suicidal.

The dataset has 10,050 rows: 777 labeled Suicidal Ideation/Behavior and 9,273 labeled Non-Suicidal.
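
The counts above describe a strongly imbalanced dataset. The minimal sketch below uses only those counts to compute each class's share, the accuracy of a trivial classifier that always predicts Non-Suicidal, and inverse-frequency class weights computed with the same formula as scikit-learn's `class_weight="balanced"`. The card does not say whether class weighting was used in training; the weights and the helper name `class_stats` are illustrative assumptions.

```python
# Class counts as stated in the "Training data" section.
COUNTS = {"Suicidal Ideation/Behavior": 777, "Non-Suicidal": 9273}


def class_stats(counts: dict[str, int]) -> dict[str, dict[str, float]]:
    """Hypothetical helper: share of each class and an inverse-frequency weight.

    The weight is n_samples / (n_classes * class_count), the formula
    scikit-learn uses for class_weight="balanced". It is shown for
    illustration only; the card does not say weighting was applied.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {
        label: {
            "share": count / total,
            "weight": total / (n_classes * count),
        }
        for label, count in counts.items()
    }


if __name__ == "__main__":
    total = sum(COUNTS.values())
    print(f"total rows: {total}")
    for label, s in class_stats(COUNTS).items():
        print(f"{label}: share={s['share']:.2%}, weight={s['weight']:.3f}")
    # Always predicting the majority class is "accurate" on most rows
    # but never detects a suicidal comment.
    majority = max(COUNTS.values())
    print(f"majority-class accuracy: {majority / total:.2%}")
```

Suicidal comments make up about 7.7% of the rows, so always predicting Non-Suicidal would already be about 92.3% accurate while never detecting a single suicidal comment (an F1 of 0 for that class). A per-class metric such as F1 is therefore more informative than accuracy on this data.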

More information: https://huggingface.co/datasets/dariolopez/suicide-comments-es

## Training procedure

The training data was tokenized with the `PlanTL-GOB-ES/roberta-base-bne` tokenizer, which has a vocabulary of 50,262 tokens and a maximum model input length of 512 tokens.
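
A maximum length of 512 tokens means longer comments must be truncated, and shorter ones padded when batching to a fixed length, before they reach the model. In practice the Hugging Face tokenizer handles this (for example with `truncation=True`, `max_length=512` and `padding="max_length"`); the dependency-free sketch below only illustrates the idea on lists of token ids. The helper `truncate_and_pad` is hypothetical, and the special-token ids (`<s>` = 0, `<pad>` = 1, `</s>` = 2) follow the original RoBERTa convention and are an assumption here; check them against the actual `PlanTL-GOB-ES/roberta-base-bne` tokenizer.

```python
# Assumed special-token ids (original RoBERTa convention); verify them
# against the real PlanTL-GOB-ES/roberta-base-bne tokenizer before use.
BOS_ID, PAD_ID, EOS_ID = 0, 1, 2
MAX_LENGTH = 512  # maximum model input length stated in the card


def truncate_and_pad(
    token_ids: list[int], max_length: int = MAX_LENGTH
) -> tuple[list[int], list[int]]:
    """Hypothetical helper: wrap ids in <s> ... </s>, truncate, then pad.

    Returns (input_ids, attention_mask), both exactly max_length long.
    """
    # Reserve two positions for the <s> and </s> special tokens.
    body = token_ids[: max_length - 2]
    input_ids = [BOS_ID] + body + [EOS_ID]
    attention_mask = [1] * len(input_ids)
    # Pad on the right; padded positions get attention_mask 0.
    padding = max_length - len(input_ids)
    input_ids += [PAD_ID] * padding
    attention_mask += [0] * padding
    return input_ids, attention_mask


if __name__ == "__main__":
    ids, mask = truncate_and_pad(list(range(100, 1100)))  # 1000 fake ids
    print(len(ids), sum(mask))  # prints: 512 512 (the input was truncated)
    ids, mask = truncate_and_pad([100, 101, 102])
    print(len(ids), sum(mask))  # prints: 512 5 (<s>, 3 ids, </s>, then padding)
```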

Training took a total of 10 minutes on an NVIDIA GeForce RTX 3090 GPU.

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    Off  | 00000000:68:00.0 Off |                  N/A |
| 31%   50C    P8    25W / 250W |      1MiB / 24265MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

# Considerations for Using the Model

The model is designed for Spanish-language text and is intended specifically for detecting suicidal ideation/behavior.

## Intended uses & limitations

In progress.

## Limitations and bias

In progress.

# Evaluation

## Metric

The evaluation metric is the F1 score, the harmonic mean of precision and recall:

`F1 = 2 * (precision * recall) / (precision + recall)`
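
As a worked instance of the formula, the sketch below counts true positives, false positives and false negatives for one label and combines the resulting precision and recall. The label name `"Suicide"` for the positive class is an assumption (the card only shows the `Non-Suicide` label), and so is computing F1 for that single class: the card does not state how F1 was averaged. `f1_for_class` is a hypothetical helper; scikit-learn's `f1_score` with `pos_label` computes the same binary score.

```python
def f1_for_class(y_true: list[str], y_pred: list[str], positive: str) -> float:
    """Hypothetical helper: binary F1 for the `positive` label.

    F1 = 2 * (precision * recall) / (precision + recall). Returns 0.0 when
    there are no true positives, the value scikit-learn's f1_score also
    returns by default in that case.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return 2 * (precision * recall) / (precision + recall)


if __name__ == "__main__":
    # Toy labels for illustration; "Suicide" as the positive label is assumed.
    gold = ["Suicide", "Suicide", "Non-Suicide", "Non-Suicide", "Suicide"]
    pred = ["Suicide", "Non-Suicide", "Suicide", "Non-Suicide", "Suicide"]
    # tp = 2, fp = 1, fn = 1, so precision = recall = 2/3 and F1 = 2/3.
    print(f1_for_class(gold, pred, positive="Suicide"))
```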

## 5-fold cross-validation

We evaluate the model with 5-fold cross-validation, using scikit-learn's [KFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html) with `n_splits=5`.
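
The sketch below shows how `KFold(n_splits=5)` partitions the rows when used without shuffling (scikit-learn's default): the row indices are split into five consecutive, non-overlapping test folds, and each fold is used once for testing while the other four are used for training. `kfold_indices` is a hypothetical, dependency-free re-implementation for illustration; whether the original evaluation enabled `shuffle` is not stated. With 10,050 rows, every fold holds 2,010 test rows.

```python
from collections.abc import Iterator


def kfold_indices(
    n_samples: int, n_splits: int = 5
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) like sklearn's KFold without shuffling.

    The first n_samples % n_splits folds get one extra sample, matching
    scikit-learn's fold-size rule.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


if __name__ == "__main__":
    # 10,050 rows, as stated in the "Training data" section.
    for fold, (train_idx, test_idx) in enumerate(kfold_indices(10050, 5)):
        # In the real evaluation, a model would be fine-tuned on train_idx
        # and scored with F1 on test_idx for this fold.
        print(fold, len(train_idx), len(test_idx))  # 8040 train / 2010 test
```

Because only about 7.7% of the rows are suicidal, scikit-learn's `StratifiedKFold`, which keeps the class ratio the same in every fold, is a common alternative; the card itself reports plain `KFold`.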

Results (F1 of the best model in each fold, and their mean):

```
>>> import numpy as np
>>> # A NumPy array, so that .mean() is available (a plain list has no .mean()).
>>> best_f1_model_by_fold = np.array([0.9163879598662207, 0.9380530973451328, 0.9333333333333333, 0.8943661971830986, 0.9226190476190477])
>>> print(best_f1_model_by_fold.mean())
0.9209519270693666
```
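
Reporting the spread across folds next to the mean can make the cross-validation result easier to interpret. The minimal sketch below applies Python's standard `statistics` module to the five per-fold scores listed above; the standard deviation, minimum and maximum are an addition for illustration and are not reported in the card.

```python
import statistics

# Per-fold best F1 scores, copied from the results above.
best_f1_model_by_fold = [
    0.9163879598662207,
    0.9380530973451328,
    0.9333333333333333,
    0.8943661971830986,
    0.9226190476190477,
]

mean_f1 = statistics.fmean(best_f1_model_by_fold)
# Sample standard deviation (n - 1 denominator) across the five folds.
std_f1 = statistics.stdev(best_f1_model_by_fold)

if __name__ == "__main__":
    print(
        f"F1 = {mean_f1:.4f} ± {std_f1:.4f} "
        f"(min {min(best_f1_model_by_fold):.4f}, "
        f"max {max(best_f1_model_by_fold):.4f})"
    )
```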

# Additional Information

## Team

* [dariolopez](https://huggingface.co/dariolopez)
* [diegogd](https://huggingface.co/diegogd)

## Licensing

This work is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).