metadata
library_name: transformers
tags: []
pipeline_tag: fill-mask
widget:
- text: shop làm ăn như cái <mask>
- text: hag từ Quảng <mask> kực nét
- text: Set xinh quá, <mask> bèo nhèo
- text: đúng nhận sai <mask>
5CD-AI/viso-twhin-bert-large
Overview
We reduce TwHIN-BERT's vocabulary size to 20k on the UIT dataset and continue pretraining for 10 epochs.
Results on four downstream tasks over Vietnamese social media text: Emotion Recognition (UIT-VSMEC), Hate Speech Detection (UIT-HSD), Spam Reviews Detection (ViSpamReviews), and Hate Speech Spans Detection (ViHOS). Acc, WF1, and MF1 denote accuracy, weighted F1, and macro F1.
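Model | Avg MF1 | Emotion Recognition Acc | Emotion Recognition WF1 | Emotion Recognition MF1 | Hate Speech Detection Acc | Hate Speech Detection WF1 | Hate Speech Detection MF1 | Spam Reviews Detection Acc | Spam Reviews Detection WF1 | Spam Reviews Detection MF1 | Hate Speech Spans Detection Acc | Hate Speech Spans Detection WF1 | Hate Speech Spans Detection MF1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|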
viBERT | 78.16 | 61.91 | 61.98 | 59.7 | 85.34 | 85.01 | 62.07 | 89.93 | 89.79 | 76.8 | 90.42 | 90.45 | 84.55 |
vELECTRA | 79.23 | 64.79 | 64.71 | 61.95 | 86.96 | 86.37 | 63.95 | 89.83 | 89.68 | 76.23 | 90.59 | 90.58 | 85.12 |
PhoBERT-Base | 79.3 | 63.49 | 63.36 | 61.41 | 87.12 | 86.81 | 65.01 | 89.83 | 89.75 | 76.18 | 91.32 | 91.38 | 85.92 |
PhoBERT-Large | 79.82 | 64.71 | 64.66 | 62.55 | 87.32 | 86.98 | 65.14 | 90.12 | 90.03 | 76.88 | 91.44 | 91.46 | 86.56 |
ViSoBERT | 81.58 | 68.1 | 68.37 | 65.88 | 88.51 | 88.31 | 68.77 | 90.99 | 90.92 | 79.06 | 91.62 | 91.57 | 86.8 |
visobert-14gb-corpus | 82.2 | 68.69 | 68.75 | 66.03 | 88.79 | 88.6 | 69.57 | 91.02 | 90.88 | 77.13 | 93.69 | 93.63 | 89.66 |
viso-twhin-bert-large | 83.87 | 73.45 | 73.14 | 70.99 | 88.86 | 88.8 | 70.81 | 91.6 | 91.47 | 79.07 | 94.08 | 93.96 | 90.22 |
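As a quick arithmetic check on the table above, the sketch below recomputes the summary column for two rows. The numbers are copied from the table; the helper names `mean_all` and `mean_mf1` are hypothetical. In both rows, the reported value (78.16 and 83.87) appears to match the mean of all twelve per-task scores rather than the mean of the four MF1 scores alone.

```python
# Scores copied from the table above, in column order:
# Acc, WF1, MF1 for each of the four tasks.
scores = {
    "viBERT": [61.91, 61.98, 59.7, 85.34, 85.01, 62.07,
               89.93, 89.79, 76.8, 90.42, 90.45, 84.55],
    "viso-twhin-bert-large": [73.45, 73.14, 70.99, 88.86, 88.8, 70.81,
                              91.6, 91.47, 79.07, 94.08, 93.96, 90.22],
}


def mean_all(row):
    """Mean of all twelve scores in a row, rounded to two decimals."""
    return round(sum(row) / len(row), 2)


def mean_mf1(row):
    """Mean of the four MF1 scores (every third value, starting at index 2)."""
    mf1 = row[2::3]
    return round(sum(mf1) / len(mf1), 2)


for name, row in scores.items():
    print(name, "all scores:", mean_all(row), "MF1 only:", mean_mf1(row))
```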
Usage (HuggingFace Transformers)
Install the transformers package:
pip install transformers
Then use the model for the fill-mask task:
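from transformers import pipeline
# Load a fill-mask pipeline backed by the pretrained checkpoint
model_path = "5CD-AI/viso-twhin-bert-large"
mask_filler = pipeline("fill-mask", model=model_path)
# Return the 10 most likely replacements for the <mask> token
mask_filler("đúng nhận sai <mask>", top_k=10)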
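The fill-mask pipeline returns a list of dictionaries with `score`, `token`, `token_str`, and `sequence` keys. A minimal sketch of turning that output into readable lines follows; `format_predictions` and `demo` are hypothetical helper names, not part of the library.

```python
def format_predictions(predictions):
    """Turn fill-mask output into 'token<TAB>score' lines, best first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [f'{p["token_str"].strip()}\t{p["score"]:.4f}' for p in ranked]


def demo():
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import pipeline

    mask_filler = pipeline("fill-mask", model="5CD-AI/viso-twhin-bert-large")
    predictions = mask_filler("đúng nhận sai <mask>", top_k=10)
    for line in format_predictions(predictions):
        print(line)
```

Calling `demo()` downloads the checkpoint on first use and prints one candidate token per line with its probability.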
Fine-tune Configuration
We fine-tune 5CD-AI/viso-twhin-bert-large on the 4 downstream tasks with the transformers library, using the following shared configuration:
- train_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 4
- weight_decay: 0.01
- optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- training_epochs: 30
- model_max_length: 128
- metric_for_best_model: wf1
- strategy: epoch
Each task additionally uses its own learning rate:
Emotion Recognition | Hate Speech Detection | Spam Reviews Detection | Hate Speech Spans Detection |
---|---|---|---|
- learning_rate: 1e-5 | - learning_rate: 5e-6 | - learning_rate: 1e-5 | - learning_rate: 5e-6 |
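The shared settings and per-task learning rates above can be collected into keyword arguments, as in the rough sketch below. The mapping onto `transformers.TrainingArguments` parameter names is an assumption (in particular, `train_batch_size` is treated as the per-device batch size), and the task keys, `training_kwargs`, `effective_batch_size`, and `num_devices` are hypothetical names introduced for illustration. The `strategy: epoch` entry is left unmapped because the configuration does not say which strategy it refers to.

```python
# Shared settings from the list above, keyed by the TrainingArguments
# parameter each one is assumed to correspond to.
SHARED = {
    "per_device_train_batch_size": 16,  # assumed meaning of train_batch_size
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "weight_decay": 0.01,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 30,
    "metric_for_best_model": "wf1",
}

# Per-task learning rates from the table above (task keys are hypothetical).
LEARNING_RATES = {
    "emotion_recognition": 1e-5,
    "hate_speech_detection": 5e-6,
    "spam_reviews_detection": 1e-5,
    "hate_speech_spans_detection": 5e-6,
}

# model_max_length is a tokenizer setting rather than a training argument.
MODEL_MAX_LENGTH = 128


def training_kwargs(task):
    """Merge the shared settings with the learning rate for one task."""
    return {**SHARED, "learning_rate": LEARNING_RATES[task]}


def effective_batch_size(kwargs, num_devices=1):
    """Examples per optimizer step; num_devices is not stated in the source."""
    return (kwargs["per_device_train_batch_size"]
            * kwargs["gradient_accumulation_steps"]
            * num_devices)


print(effective_batch_size(training_kwargs("emotion_recognition")))
```

Under these assumptions, the resulting dictionary could be unpacked into `TrainingArguments(output_dir=..., **training_kwargs(task))`, and a batch size of 16 with 4 accumulation steps gives 64 examples per optimizer step on a single device.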