Commit d76305d by BenhamdaneNawfal (verified; 1 parent: 88b64e2)

Upload README.md with huggingface_hub

Files changed (1): `README.md` (+122, -8)

The previous 8-line `README.md` contained only YAML front matter (`license: apache-2.0`, `language: ary`, `metrics: accuracy`, `pipeline_tag: text-classification`). This commit replaces it with the 122-line model card below.

# Sentiment Analysis for Darija (Arabic Dialect)

This repository hosts a **sentiment analysis model for Darija** (Moroccan Arabic), fine-tuned from **DarijaBERT**, a BERT model pretrained on Darija text. The model classifies text as **positive** or **negative** and is intended for applications that process Darija text, such as social media analysis, customer feedback, and market research.

---

## Model Details

- **Base Model**: [SI2M-Lab/DarijaBERT](https://huggingface.co/SI2M-Lab/DarijaBERT)
- **Task**: Sentiment Classification (Binary)
- **Architecture**: BERT with a custom classification head and dropout regularization (0.3 dropout rate).
- **Fine-Tuning Data**: Dataset of labeled Darija text samples (positive and negative).
- **Max Sequence Length**: 128 tokens
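
The card does not spell out the layout of the custom head. A minimal PyTorch sketch, assuming the common BERT pattern of dropout followed by a single linear layer applied to the pooled `[CLS]` vector (the shape of `pooler_output` from a Transformers BERT encoder) and assuming a hidden size of 768 (BERT-base), could look like this. `DarijaSentimentHead` is a hypothetical name, and the actual head may differ:

```python
import torch
from torch import nn


class DarijaSentimentHead(nn.Module):
    """Hypothetical head: dropout followed by one linear layer.

    Expects a pooled [CLS] vector of shape (batch, hidden_size).
    hidden_size=768 is an assumption (BERT-base); dropout=0.3 is the rate stated above.
    """

    def __init__(self, hidden_size: int = 768, num_labels: int = 2, dropout: float = 0.3):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        # Dropout only acts in training mode; call .eval() for inference
        return self.classifier(self.dropout(pooled))


head = DarijaSentimentHead()
head.eval()
fake_pooled = torch.randn(4, 768)  # stand-in for an encoder's pooler_output
print(head(fake_pooled).shape)  # torch.Size([4, 2])
```

In a full model, an encoder such as `AutoModel.from_pretrained("SI2M-Lab/DarijaBERT")` would produce the pooled vector, and a head like this would map it to the two sentiment logits.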

---

## How to Use

### Load the Model and Tokenizer

To use this model for sentiment analysis, you can load it using the Transformers library:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("BenhamdaneNawfal/sentiment-analysis-darija")
model = AutoModelForSequenceClassification.from_pretrained("BenhamdaneNawfal/sentiment-analysis-darija")
model.eval()  # disable dropout for inference

# Example text: "This product is really great"
test_text = "هذا المنتج رائع جدا"

# Tokenize the text, truncating to the 128-token limit used during fine-tuning
inputs = tokenizer(test_text, return_tensors="pt", truncation=True, padding=True, max_length=128)

# Get model predictions without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits
predicted_class = logits.argmax(dim=-1).item()  # 0 = Negative, 1 = Positive

print(f"Predicted class: {predicted_class}")
```

### Output Classes

- **0**: Negative
- **1**: Positive
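
The model outputs raw logits rather than probabilities. The plain-Python sketch below turns a pair of logits into a label and a confidence score with a softmax, using the mapping listed above. `ID2LABEL` and `logits_to_prediction` are illustrative names (the model's own `config.id2label` may use different label strings), and the input would be something like `outputs.logits[0].tolist()` from the snippet above:

```python
import math
from typing import Dict, List, Tuple

# Mapping taken from the "Output Classes" list above
ID2LABEL: Dict[int, str] = {0: "Negative", 1: "Positive"}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_prediction(logits: List[float]) -> Tuple[str, float]:
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for illustration, not real model output
label, prob = logits_to_prediction([-1.2, 2.3])
print(label, round(prob, 3))  # Positive 0.971
```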

---

## Fine-Tuning Process

The model was fine-tuned with the following settings:

- **Dataset**: Darija text labeled for sentiment (positive or negative).
- **Loss Function**: Cross-entropy loss for binary classification.
- **Optimizer**: AdamW with a weight decay of 0.01.
- **Learning Rate**: 5e-5 with linear warmup.
- **Batch Size**: 16 for training, 64 for evaluation.
- **Early Stopping**: Training stops if the validation loss does not improve for 1 epoch (patience of 1).
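
The learning-rate schedule and stopping rule above can be sketched in plain Python. This is an illustration, not the author's training script: the card does not give the warmup length or what happens after warmup, so the sketch assumes a linear decay to zero after warmup (the shape produced by `get_linear_schedule_with_warmup` in Transformers) and uses hypothetical step counts. `linear_warmup_lr` and `EarlyStopping` are illustrative names:

```python
from typing import Optional


def linear_warmup_lr(step: int, base_lr: float = 5e-5,
                     warmup_steps: int = 100, total_steps: int = 1000) -> float:
    """Linear ramp from 0 to base_lr, then (assumed) linear decay to 0.

    warmup_steps and total_steps are hypothetical; the card does not state them.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


class EarlyStopping:
    """Stop when validation loss fails to improve for `patience` consecutive epochs."""

    def __init__(self, patience: int = 1, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best: Optional[float] = None
        self.bad_epochs = 0

    def should_stop(self, val_loss: float) -> bool:
        if self.best is None or val_loss < self.best - self.min_delta:
            self.best = val_loss  # improvement: remember it and reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1  # no improvement this epoch
        return self.bad_epochs >= self.patience


# Made-up validation losses for illustration
stopper = EarlyStopping(patience=1)
for epoch, loss in enumerate([0.52, 0.41, 0.43, 0.40], start=1):
    if stopper.should_stop(loss):
        print(f"Stopping after epoch {epoch}")  # prints "Stopping after epoch 3"
        break
```

With a patience of 1, a single epoch without improvement is enough to stop, which is why the fourth loss in the example is never evaluated.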

---

## Evaluation Metrics

The model's performance was evaluated using the following metrics:

- **Accuracy**: 85%
- **Precision**: 87%
- **Recall**: 83%
- **F1-Score**: 85%
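
The reported F1 is consistent with the reported precision and recall, since F1 = 2PR / (P + R) = 2 × 0.87 × 0.83 / (0.87 + 0.83) ≈ 0.85. The plain-Python sketch below computes all four metrics from gold and predicted labels. It assumes class 1 (Positive) is treated as the positive class, which the card does not state (the reported figures could also be averaged over both classes), and `binary_metrics` is an illustrative name:

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, not the model's evaluation data; every metric is 2/3 here
print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```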

---

## Publishing on Hugging Face

The model and tokenizer were saved with `save_pretrained` and uploaded to the Hugging Face Hub with the `huggingface_hub` library. To publish your own fine-tuned version the same way, follow these steps:

1. Save the model and tokenizer:
```python
# Save the fine-tuned model and tokenizer to a local folder
model.save_pretrained("darija-bert-model")
tokenizer.save_pretrained("darija-bert-model")
```

2. Upload the folder to the Hugging Face Hub:
```python
from huggingface_hub import create_repo, upload_folder

# Requires a Hub token with write access (e.g. from `huggingface-cli login` or the HF_TOKEN environment variable)
create_repo("BenhamdaneNawfal/sentiment-analysis-darija", repo_type="model", exist_ok=True)

upload_folder(
    folder_path="darija-bert-model",
    repo_id="BenhamdaneNawfal/sentiment-analysis-darija",
    repo_type="model",
)
```

---

## Future Work

- Expand the dataset to include more labeled examples from diverse sources.
- Fine-tune the model for multi-class sentiment analysis (e.g., neutral, positive, negative).
- Explore the use of data augmentation techniques for better generalization.

---

## Citation

If you use this model, please cite it as:

```bibtex
@misc{benhamdanenawfal2025darijabert,
  author    = {Benhamdane Nawfal},
  title     = {Sentiment Analysis for Darija (Arabic Dialect)},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/BenhamdaneNawfal/sentiment-analysis-darija}
}
```

---

## Contact

For any questions or issues, feel free to contact me at: [[email protected]].