---
license: cc-by-nc-sa-4.0
language:
- he
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- code
---
Hebrew Corpus
This corpus contains Hebrew tweets manually annotated for offensive language. It includes 15,881 tweets, each labeled with one or more of five classes (abusive, hate, violence, pornographic, or non-offensive). The annotation was carried out by Arabic-Hebrew bilingual speakers.
Paper: https://arxiv.org/abs/2309.02724
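Because each tweet can carry one or more of the five classes listed above, the labels are naturally handled as multi-hot vectors. This card does not describe the file layout of the released data, so the sketch below simply assumes each tweet's annotation is available as a list of label names. `LABELS`, `to_multi_hot`, and `label_counts` are hypothetical names, and the class order is an assumption rather than the repository's own encoding.

```python
# The five classes named in the corpus description. The column order used
# here is an assumption for illustration, not the repository's own encoding.
LABELS = ["abusive", "hate", "violence", "pornographic", "non-offensive"]
LABEL_INDEX = {name: i for i, name in enumerate(LABELS)}


def to_multi_hot(tweet_labels):
    """Map a tweet's label names (one or more) to a 0/1 vector over LABELS."""
    if not tweet_labels:
        raise ValueError("each tweet should carry at least one label")
    vector = [0] * len(LABELS)
    for name in tweet_labels:
        key = name.strip().lower()  # tolerate case and surrounding whitespace
        if key not in LABEL_INDEX:
            raise ValueError(f"unknown label: {name!r}")
        vector[LABEL_INDEX[key]] = 1  # repeated names still set a single 1
    return vector


def label_counts(dataset):
    """Count how many tweets carry each label (one tweet can count toward several)."""
    counts = {name: 0 for name in LABELS}
    for tweet_labels in dataset:
        for name, flag in zip(LABELS, to_multi_hot(tweet_labels)):
            counts[name] += flag
    return counts


if __name__ == "__main__":
    # Hypothetical toy annotations, not drawn from the corpus.
    toy = [["hate", "violence"], ["non-offensive"], ["abusive"]]
    print([to_multi_hot(labels) for labels in toy])
    # [[0, 1, 1, 0, 0], [0, 0, 0, 0, 1], [1, 0, 0, 0, 0]]
    print(label_counts(toy))
```

Vectors in this shape can be fed to any multi-label classifier that expects one binary target per class; per-class counts are a quick way to inspect class imbalance before training.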
Models
AlephBERT (https://huggingface.co/imvladikon/sentence-transformers-alephbert)
GitHub Repository
You can download the data from the following GitHub repository:
git clone https://github.com/SinaLab/OffensiveHebrew