somosnlp-hackathon-2023/roberta-base-bne-finetuned-suicide-es

Text Classification

Transformers

PyTorch

Spanish

roberta

Inference Endpoints


dariolopez committed on Apr 8, 2023

Commit

565c1ab

1 Parent(s): 3f1f1fa

Update README.md



README.md +84 -2


@@ -11,7 +11,7 @@ datasets:
 # Model Description
-This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-bne](https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne) to detect suicidal ideation/behavior on public comments (forums, twitter, etc.) in Spanish language.
 # How to use
@@ -27,4 +27,86 @@ This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-bne](https://h
 >>> pipe("El partido de fútbol fue igualado, disfrutamos mucho jugando juntos.")
 [{'label': 'Non-Suicide', 'score': 0.999990701675415}]
-```

 # Model Description
+This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-bne](https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne) that detects suicidal ideation/behavior in public comments (Reddit, forums, Twitter, etc.) written in Spanish.
 # How to use
 >>> pipe("El partido de fútbol fue igualado, disfrutamos mucho jugando juntos.")
 [{'label': 'Non-Suicide', 'score': 0.999990701675415}]
+```
+# Training
+## Training data
+The dataset consists of comments from Reddit and Twitter, and inputs/outputs of the Alpaca dataset, translated into Spanish and classified as either suicidal ideation/behavior or non-suicidal.
+The dataset has 10,050 rows (777 labeled as Suicidal Ideation/Behavior and 9,273 labeled as Non-Suicidal).
+More info: https://huggingface.co/datasets/dariolopez/suicide-comments-es
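The row counts in the training data description imply a strongly imbalanced dataset. A minimal sketch of the arithmetic follows; the variable names are illustrative and not taken from the training code.

```python
# Row counts as stated in the model card.
suicidal = 777
non_suicidal = 9273
total = suicidal + non_suicidal

# Share of each class in the dataset.
suicidal_share = suicidal / total
non_suicidal_share = non_suicidal / total

print(total)
print(f"{suicidal_share:.1%} suicidal, {non_suicidal_share:.1%} non-suicidal")
```

With under 8% of rows in the positive class, plain accuracy would be easy to inflate by always predicting Non-Suicide, so a metric such as F1 is likely more informative here.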
+## Training procedure
+The training data was tokenized using the `PlanTL-GOB-ES/roberta-base-bne` tokenizer, which has a vocabulary size of 50,262 tokens and a model maximum length of 512 tokens.
+Training took a total of 10 minutes on an NVIDIA GeForce RTX 3090 GPU.
++-----------------------------------------------------------------------------+
+| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
+|-------------------------------+----------------------+----------------------+
+| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
+|                               |                      |               MIG M. |
+|===============================+======================+======================|
+|   0  GeForce RTX 3090    Off  | 00000000:68:00.0 Off |                  N/A |
+| 31%   50C    P8    25W / 250W |      1MiB / 24265MiB |      0%      Default |
+|                               |                      |                  N/A |
++-------------------------------+----------------------+----------------------+
++-----------------------------------------------------------------------------+
+| Processes:                                                                  |
+|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
+|        ID   ID                                                   Usage      |
+|=============================================================================|
+|  No running processes found                                                 |
++-----------------------------------------------------------------------------+
+# Considerations for Using the Model
+The model is designed for Spanish-language text, specifically to detect suicidal ideation/behavior.
+## Intended uses & limitations
+In progress.
+## Limitations and bias
+In progress.
+# Evaluation
+## Metric
+F1 = 2 * (precision * recall) / (precision + recall)
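As a sketch of the formula above, the helper below computes F1 from a precision and a recall value. The function name and the sample inputs are hypothetical, and the zero-denominator guard is an added assumption that the formula itself does not specify.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    # Assumption: return 0.0 when both inputs are zero to avoid dividing by zero.
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


# Hypothetical values, chosen only to illustrate the formula.
print(f1_from_precision_recall(0.9, 0.8))
```

Because F1 is a harmonic mean, it sits closer to the lower of the two inputs, so a model cannot score well by maximizing only precision or only recall.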
+## 5-fold cross-validation
+We use [KFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html) with `n_splits=5` to evaluate the model.
+Results:
+```
+>>> import numpy as np
+>>> best_f1_model_by_fold = np.array([0.9163879598662207, 0.9380530973451328, 0.9333333333333333, 0.8943661971830986, 0.9226190476190477])
+>>> print(best_f1_model_by_fold.mean())
+0.9209519270693666
+```
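To make the 5-fold setup concrete, the sketch below shows in plain Python how a dataset of the stated size would be divided. It mirrors the contiguous, unshuffled splitting that `KFold` performs by default; whether the actual evaluation shuffled the data is not stated, and `kfold_indices` is a hypothetical helper, not part of scikit-learn.

```python
def kfold_indices(n_samples: int, n_splits: int = 5):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    # The first n_samples % n_splits folds receive one extra sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


# 10050 is the dataset size stated in the model card.
fold_sizes = [(len(train), len(test)) for train, test in kfold_indices(10050, 5)]
print(fold_sizes)
```

Each fold trains on 8040 rows and evaluates on the remaining 2010, so every row is used for evaluation exactly once across the five folds.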
+# Additional Information
+## Team
+* [dariolopez](https://huggingface.co/dariolopez)
+* [diegogd](https://huggingface.co/diegogd)
+## Licensing
+This work is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).