Text Classification
Transformers
PyTorch
Spanish
roberta
Inference Endpoints
dariolopez committed on
Commit 565c1ab · 1 Parent(s): 3f1f1fa

Update README.md

Files changed (1)
  1. README.md +84 -2
README.md CHANGED
@@ -11,7 +11,7 @@ datasets:
11
 
12
  # Model Description
13
 
14
- This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-bne](https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne) to detect suicidal ideation/behavior on public comments (forums, twitter, etc.) in Spanish language.
15
 
16
  # How to use
17
 
@@ -27,4 +27,86 @@ This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-bne](https://h
27
 
28
  >>> pipe("El partido de fútbol fue igualado, disfrutamos mucho jugando juntos.")
29
  [{'label': 'Non-Suicide', 'score': 0.999990701675415}]
30
- ```

11
 
12
  # Model Description
13
 
14
+ This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-bne](https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne) that detects suicidal ideation/behavior in public comments (Reddit, forums, Twitter, etc.) written in Spanish.
15
 
16
  # How to use
17
 
 
27
 
28
  >>> pipe("El partido de fútbol fue igualado, disfrutamos mucho jugando juntos.")
29
  [{'label': 'Non-Suicide', 'score': 0.999990701675415}]
30
+ ```
31
+
32
+
33
+ # Training
34
+
35
+ ## Training data
36
+
37
+ The dataset consists of Reddit comments, Twitter comments, and inputs/outputs of the Alpaca dataset, translated into Spanish and classified as either suicidal ideation/behavior or non-suicidal.
38
+
39
+ The dataset has 10050 rows (777 considered as Suicidal Ideation/Behavior and 9273 considered Non-Suicidal).
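The counts above imply a strong class imbalance. The following sketch works through the arithmetic. The inverse-frequency class weights are an illustrative assumption, since the model card does not say whether any re-weighting was applied during training, and the dictionary keys are hypothetical names.

```python
# Row counts as reported in the model card.
counts = {"suicidal": 777, "non_suicidal": 9273}  # key names are hypothetical

n_total = sum(counts.values())
positive_share = counts["suicidal"] / n_total

# Assumption: "balanced" inverse-frequency weights, n_total / (n_classes * count).
# The card does not state that such weights were used for fine-tuning.
class_weights = {label: n_total / (len(counts) * n) for label, n in counts.items()}

print(f"total rows: {n_total}")
print(f"suicidal share: {positive_share:.2%}")
for label, weight in class_weights.items():
    print(f"weight[{label}] = {weight:.3f}")
```

Fewer than 8% of the rows are positive, so a classifier that always predicts the non-suicidal class would still score high accuracy; this is a common reason to report F1 instead.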
40
+
41
+ More info: https://huggingface.co/datasets/dariolopez/suicide-comments-es
42
+
43
+ ## Training procedure
44
+
45
+ The training data has been tokenized using the `PlanTL-GOB-ES/roberta-base-bne` tokenizer with a vocabulary size of 50262 tokens and a model maximum length of 512 tokens.
46
+
47
+ Training took a total of 10 minutes on an NVIDIA GeForce RTX 3090 GPU.
48
+
49
+ +-----------------------------------------------------------------------------+
50
+ | NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
51
+ |-------------------------------+----------------------+----------------------+
52
+ | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
53
+ | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
54
+ |                               |                      |               MIG M. |
55
+ |===============================+======================+======================|
56
+ |   0  GeForce RTX 3090    Off  | 00000000:68:00.0 Off |                  N/A |
57
+ | 31%   50C    P8    25W / 250W |      1MiB / 24265MiB |      0%      Default |
58
+ |                               |                      |                  N/A |
59
+ +-------------------------------+----------------------+----------------------+
60
+
61
+ +-----------------------------------------------------------------------------+
62
+ | Processes:                                                                  |
63
+ |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
64
+ |        ID   ID                                                   Usage      |
65
+ |=============================================================================|
66
+ |  No running processes found                                                 |
67
+ +-----------------------------------------------------------------------------+
68
+
69
+
70
+ # Considerations for Using the Model
71
+
72
+ The model is designed for Spanish-language text, specifically to detect suicidal ideation/behavior.
73
+
74
+ ## Intended uses & limitations
75
+
76
+ In progress.
77
+
78
+ ## Limitations and bias
79
+
80
+ In progress.
81
+
82
+
83
+ # Evaluation
84
+
85
+
86
+ ## Metric
87
+
88
+ F1 = 2 * (precision * recall) / (precision + recall)
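As a minimal sketch of the formula above, the snippet below computes F1 from raw confusion counts. The counts `tp`, `fp`, and `fn` are hypothetical values chosen for illustration, not results from this model, and the function name is likewise made up for this example.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


# Hypothetical confusion counts for the positive (suicidal) class.
tp, fp, fn = 80, 10, 20

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that are found
f1 = f1_from_precision_recall(precision, recall)

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

The zero guard avoids a division error when the model predicts no positives and there are none to find.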
89
+
90
+ ## 5-fold cross-validation
91
+
92
+ We use [KFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html) with `n_splits=5` to evaluate the model.
93
+
94
+ Results:
95
+
96
+ ```
97
+ >>> import numpy as np
+ >>> best_f1_model_by_fold = np.array([0.9163879598662207, 0.9380530973451328, 0.9333333333333333, 0.8943661971830986, 0.9226190476190477])
98
+ >>> best_f1_model_by_fold.mean()
99
+ 0.9209519270693666
100
+ ```
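The split itself might look like the following sketch. Only `n_splits=5` and the row count of 10050 come from the model card. The `shuffle=True` and `random_state=42` settings are assumptions, and the loop body stands in for the actual fine-tuning and scoring, which the card does not show.

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder row indices; only the row count (10050) comes from the model card.
n_rows = 10050
indices = np.arange(n_rows)

# Assumption: shuffling with a fixed seed. The card only states n_splits=5.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(indices)):
    # A real run would fine-tune on train_idx and compute F1 on val_idx here.
    fold_sizes.append((len(train_idx), len(val_idx)))
    print(f"fold {fold}: train={len(train_idx)} val={len(val_idx)}")
```

With a small positive class, `StratifiedKFold` from the same module is a common alternative that keeps the class ratio similar in every fold; the card does not indicate whether shuffling or stratification was used.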
101
+
102
+
103
+ # Additional Information
104
+
105
+ ## Team
106
+
107
+ * [dariolopez](https://huggingface.co/dariolopez)
108
+ * [diegogd](https://huggingface.co/diegogd)
109
+
110
+ ## Licensing
111
+
112
+ This work is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).