---
language:
- es
metrics:
- f1
pipeline_tag: text-classification
datasets:
- dariolopez/suicide-comments-es
license: apache-2.0
---

# Model Description

This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-bne](https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne) that detects suicidal ideation/behavior in public comments (Reddit, forums, Twitter, etc.) written in Spanish.

# How to use

```python
>>> from transformers import pipeline

>>> model_name = 'dariolopez/roberta-base-bne-finetuned-suicide-es'
>>> pipe = pipeline("text-classification", model=model_name)

>>> pipe("Quiero acabar con todo. No merece la pena vivir.")
[{'label': 'Suicide', 'score': 0.9999703168869019}]

>>> pipe("El partido de fútbol fue igualado, disfrutamos mucho jugando juntos.")
[{'label': 'Non-Suicide', 'score': 0.999990701675415}]
```

# Training

## Training data

The dataset consists of Reddit comments, Twitter comments, and inputs/outputs of the Alpaca dataset, translated into Spanish and classified as suicidal ideation/behavior or non-suicidal.

The dataset has 10050 rows (777 labeled as Suicidal Ideation/Behavior and 9273 labeled as Non-Suicidal).

More info: https://huggingface.co/datasets/dariolopez/suicide-comments-es

## Training procedure

The training data was tokenized using the `PlanTL-GOB-ES/roberta-base-bne` tokenizer, which has a vocabulary size of 50262 tokens and a model maximum length of 512 tokens.

Training took 10 minutes in total on an NVIDIA GeForce RTX 3090 GPU.

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  GeForce RTX 3090    Off  | 00000000:68:00.0 Off |                  N/A |
    | 31%   50C    P8    25W / 250W |      1MiB / 24265MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

# Considerations for Using the Model

The model is designed for Spanish-language text, specifically to detect suicidal ideation/behavior.

## Intended uses & limitations

In progress.

## Limitations and bias

In progress.

# Evaluation

## Metric

F1 = 2 * (precision * recall) / (precision + recall)

## 5-fold cross-validation

We use [KFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html) with `n_splits=5` to evaluate the model.

Results:

```
>>> import numpy as np
>>> best_f1_model_by_fold = np.array([0.9163879598662207, 0.9380530973451328, 0.9333333333333333, 0.8943661971830986, 0.9226190476190477])
>>> print(best_f1_model_by_fold.mean())
0.9209519270693666
```

# Additional Information

## Team

* [dariolopez](https://huggingface.co/dariolopez)
* [diegogd](https://huggingface.co/diegogd)

## Licensing

This work is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
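
## Appendix: F1 worked example

As a rough illustration of the metric and fold averaging described in the Evaluation section, the sketch below computes F1 from confusion counts and averages the reported per-fold scores. The helper `f1_from_counts` and the example counts are hypothetical and are not taken from the authors' training code. Only the class sizes (777 and 9273) and the five fold scores come from this card.

```python
from statistics import fmean


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * (precision * recall) / (precision + recall)."""
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined; report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts for a single validation fold (illustrative only).
example_f1 = f1_from_counts(tp=140, fp=12, fn=15)

# Class sizes reported in the Training data section.
n_suicide, n_non_suicide = 777, 9273

# A trivial classifier that always predicts "Non-Suicide" looks accurate
# on this imbalanced dataset, yet its F1 on the "Suicide" class is 0.
majority_accuracy = n_non_suicide / (n_suicide + n_non_suicide)
majority_f1 = f1_from_counts(tp=0, fp=0, fn=n_suicide)

# Per-fold best F1 scores reported in the Evaluation section.
best_f1_model_by_fold = [
    0.9163879598662207,
    0.9380530973451328,
    0.9333333333333333,
    0.8943661971830986,
    0.9226190476190477,
]
mean_f1 = fmean(best_f1_model_by_fold)

print(f"majority-class accuracy: {majority_accuracy:.4f}, F1: {majority_f1:.1f}")
print(f"mean F1 over 5 folds: {mean_f1:.4f}")
```

With roughly 8% of rows in the positive class, accuracy alone can hide a model that never flags a suicidal comment, which is one plausible reason to report F1 here.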