ViHateT5: Enhancing Hate Speech Detection in Vietnamese With a Unified Text-to-Text Transformer Model | ACL'2024 (Findings)

Disclaimer: This paper contains examples from actual content on social media platforms that could be considered toxic and offensive.

ViHateT5-HSD is the ViHateT5 model fine-tuned on multiple Vietnamese hate speech detection benchmark datasets.

The architecture and experimental results of ViHateT5 can be found in the paper:

@inproceedings{thanh-nguyen-2024-vihatet5,
    title = "{V}i{H}ate{T}5: Enhancing Hate Speech Detection in {V}ietnamese With a Unified Text-to-Text Transformer Model",
    author = "Thanh Nguyen, Luan",
    editor = "Ku, Lun-Wei  and Martins, Andre  and Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.355",
    pages = "5948--5961"
    }

The pre-training dataset, VOZ-HSD, is available HERE.

Kindly CITE our paper if you use ViHateT5-HSD to generate published results or integrate it into other software.

Example usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("tarudesu/ViHateT5-base-HSD")
model = AutoModelForSeq2SeqLM.from_pretrained("tarudesu/ViHateT5-base-HSD")

def generate_output(input_text, prefix):
    # Add prefix
    prefixed_input_text = prefix + ': ' + input_text

    # Tokenize input text
    input_ids = tokenizer.encode(prefixed_input_text, return_tensors="pt")

    # Generate output
    output_ids = model.generate(input_ids, max_length=256)

    # Decode the generated output
    output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return output_text

sample = 'Tôi ghét bạn vl luôn!'
# Choose one of the three task prefixes:
# 'hate-speech-detection', 'toxic-speech-detection', or 'hate-spans-detection'
prefix = 'hate-spans-detection'

result = generate_output(sample, prefix)
print('Result: ', result)
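
Because the task is selected purely by the text prefix, a misspelled prefix would still be sent to the model without raising an error. The following is a minimal sketch of a guard for that, assuming the same "prefix: text" format used in generate_output above. The names TASK_PREFIXES, build_prompt, and build_all_prompts are hypothetical helpers for illustration and are not part of the model or the transformers library; only the three prefix strings come from this model card.

```python
# Hypothetical helpers: only the three prefix strings come from the model card.
TASK_PREFIXES = (
    "hate-speech-detection",
    "toxic-speech-detection",
    "hate-spans-detection",
)

def build_prompt(input_text, prefix):
    """Return '<prefix>: <text>', the input format used by generate_output."""
    if prefix not in TASK_PREFIXES:
        raise ValueError(
            f"Unknown prefix {prefix!r}; expected one of {TASK_PREFIXES}"
        )
    return prefix + ": " + input_text

def build_all_prompts(input_text):
    """Map every task prefix to its prompt for the same input text."""
    return {prefix: build_prompt(input_text, prefix) for prefix in TASK_PREFIXES}

# Build one prompt per task for the sample sentence.
prompts = build_all_prompts("Tôi ghét bạn vl luôn!")
for task, prompt in prompts.items():
    print(task, "->", prompt)
```

Each resulting prompt can then be tokenized and passed to model.generate in the same way as in generate_output, so a single input is checked against all three tasks.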

Please feel free to contact us by email at [email protected] if you have any questions or need further information!
