---
license: apache-2.0
language:
- ar
- dza
pipeline_tag: text-classification
tags:
- hate-detection
- classification
library_name: PyTorch
---

# Dzarashield

Dzarashield is a fine-tuned model based on [DzaraBert](https://huggingface.co/Sifal/dzarabert). It specializes in hate speech detection for Algerian Arabic text (Darija).

It was trained on a dataset of 13.5k documents, assembled from manually labeled documents and various other sources, and achieves an F1 score of 0.87 on a held-out test set of 2.5k samples.

## Limitations

This model has been fine-tuned solely on Arabic characters, and tokens from other languages have been pruned. As a result, input written in other scripts is unlikely to be classified reliably.

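Given this restriction, it can help to check how much of an input is actually written in Arabic script before trusting a prediction. The following is a minimal sketch of such a pre-check. The helper names `arabic_ratio` and `is_mostly_arabic`, the Unicode ranges, and the 0.5 threshold are illustrative assumptions and are not part of the Dzarashield repository.

```python
import re

# Assumption: these Unicode blocks approximate "Arabic characters".
# The exact character set kept by the model's tokenizer is not documented here.
ARABIC_CHARS = re.compile(
    r'[\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF]'
)

def arabic_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters that are in Arabic script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if ARABIC_CHARS.match(ch))
    return arabic / len(letters)

def is_mostly_arabic(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical gate: True when at least `threshold` of the letters are Arabic."""
    return arabic_ratio(text) >= threshold

print(is_mostly_arabic('واش خويا راك غايا؟'))     # Arabic-script input -> True
print(is_mostly_arabic('wach khoya rak ghaya?'))  # Latin-script input -> False
```

Inputs that fail the check could be skipped or flagged for manual review rather than passed to the classifier.
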
# How to use

## Setup:

```python
# Run in a Jupyter/Colab notebook: lines starting with ! or % are notebook commands
!git lfs install
!git clone https://huggingface.co/Sifal/dzarashield
%cd dzarashield

from model import BertClassifier
from transformers import PreTrainedTokenizerFast
import torch

# Check if a GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Specify paths
MODEL_PATH = "./model.pth"
TOKENIZER_PATH = "./tokenizer.json"

# Load the model with the appropriate map_location
dzarashield = BertClassifier()
dzarashield.load_state_dict(torch.load(MODEL_PATH, map_location=device))
dzarashield.eval()  # switch to inference mode (disables dropout)

# Load the tokenizer
tokenizer = PreTrainedTokenizerFast(tokenizer_file=TOKENIZER_PATH)
```

## Example:

```python
idx_to_label = {0: 'non-hate', 1: 'hate'}
sentences = ['يا وحد الشموتي، تكول دجاج آآآه', 'واش خويا راك غايا؟']

def predict_label(sentence):
    tokenized = tokenizer(sentence, return_tensors='pt')
    with torch.no_grad():
        outputs = dzarashield(**tokenized)
    return idx_to_label[outputs.logits.argmax().item()]

for sentence in sentences:
    label = predict_label(sentence)
    print(f'sentence: {sentence} label: {label}')
```

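The example above returns only the arg-max label. When a confidence score is useful, the logits can be turned into probabilities with a softmax. Below is a minimal sketch in plain Python, so it runs without the model. The function name `logits_to_prediction` and the sample logit values are made up for illustration, and the sketch assumes the model emits one logit per class in the order given by `idx_to_label`.

```python
import math

idx_to_label = {0: 'non-hate', 1: 'hate'}

def logits_to_prediction(logits):
    """Convert a list of per-class logits into (label, probability)."""
    # Subtract the max logit for numerical stability before exponentiating.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class (first index wins on ties).
    best = max(range(len(probs)), key=probs.__getitem__)
    return idx_to_label[best], probs[best]

# Made-up logits for illustration, not real model output.
label, prob = logits_to_prediction([-1.2, 2.3])
print(f'label: {label} probability: {prob:.3f}')
```

With the real model, and assuming `outputs.logits` has shape `(1, 2)`, the input list could be obtained with `outputs.logits.squeeze(0).tolist()`.
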
## Acknowledgments

Dzarashield is built upon the foundations of [Dziribert](https://huggingface.co/alger-ia/dziribert), and I am grateful to its authors for the work that made this project possible.

## References

- [Dziribert](https://arxiv.org/pdf/2109.12346.pdf)