ymoslem committed · Commit 3600895 · verified · 1 Parent(s): 39817ae

Update README.md

Files changed (1): README.md (+163 −24)
README.md CHANGED
@@ -37,34 +37,51 @@ datasets:
   - ymoslem/wmt-da-human-evaluation
 model-index:
 - name: Quality Estimation for Machine Translation
-  results: []
+  results:
+  - task:
+      type: regression
+    dataset:
+      name: ymoslem/wmt-da-human-evaluation
+      type: QE
+    metrics:
+    - name: Pearson Correlation
+      type: Pearson
+      value: 0.4458
+    - name: Mean Absolute Error
+      type: MAE
+      value: 0.1876
+    - name: Root Mean Squared Error
+      type: RMSE
+      value: 0.2393
+    - name: R-Squared
+      type: R2
+      value: 0.1987
+metrics:
+- pearsonr
+- mae
+- r_squared
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 
 # Quality Estimation for Machine Translation
 
-This model is a fine-tuned version of [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large) on the ymoslem/wmt-da-human-evaluation dataset.
+This model is a fine-tuned version of [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large)
+on the [ymoslem/wmt-da-human-evaluation](https://huggingface.co/datasets/ymoslem/wmt-da-human-evaluation) dataset.
+
 It achieves the following results on the evaluation set:
-- Loss: 0.0572
+- Loss: 0.0564
 
 ## Model description
 
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
+This model is for reference-free quality estimation (QE) of machine translation (MT) systems: given a source segment and its machine translation, it predicts a quality score without requiring a reference translation.
 
 ## Training procedure
 
 ### Training hyperparameters
 
+This version of the model uses the tokenizer's full maximum sequence length of 8192 tokens.
+The variant trained with a maximum length of 512 is available at [ymoslem/ModernBERT-large-qe-maxlen512-v1](https://huggingface.co/ymoslem/ModernBERT-large-qe-maxlen512-v1).
+
 The following hyperparameters were used during training:
 - learning_rate: 8e-05
 - train_batch_size: 128
@@ -78,16 +95,16 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:-----:|:---------------:|
-| 0.0651 | 0.1004 | 1000 | 0.0703 |
-| 0.0623 | 0.2007 | 2000 | 0.0614 |
-| 0.0584 | 0.3011 | 3000 | 0.0597 |
-| 0.0593 | 0.4015 | 4000 | 0.0586 |
-| 0.0577 | 0.5019 | 5000 | 0.0580 |
-| 0.058 | 0.6022 | 6000 | 0.0577 |
-| 0.0587 | 0.7026 | 7000 | 0.0574 |
-| 0.0578 | 0.8030 | 8000 | 0.0573 |
-| 0.0576 | 0.9033 | 9000 | 0.0572 |
-| 0.0577 | 1.0037 | 10000 | 0.0572 |
+| 0.0631 | 0.1004 | 1000 | 0.0674 |
+| 0.0614 | 0.2007 | 2000 | 0.0599 |
+| 0.0578 | 0.3011 | 3000 | 0.0585 |
+| 0.0585 | 0.4015 | 4000 | 0.0579 |
+| 0.0568 | 0.5019 | 5000 | 0.0570 |
+| 0.057 | 0.6022 | 6000 | 0.0568 |
+| 0.0579 | 0.7026 | 7000 | 0.0567 |
+| 0.0573 | 0.8030 | 8000 | 0.0565 |
+| 0.0568 | 0.9033 | 9000 | 0.0564 |
+| 0.0571 | 1.0037 | 10000 | 0.0564 |
 
 
 ### Framework versions
@@ -96,3 +113,125 @@ The following hyperparameters were used during training:
 - Pytorch 2.4.1+cu124
 - Datasets 3.2.0
 - Tokenizers 0.21.0
+
+## Inference
+
+1. Install the required libraries.
+
+```bash
+pip3 install --upgrade datasets accelerate transformers
+pip3 install --upgrade flash_attn triton
+```
+
+2. Load the test dataset.
+
+```python
+from datasets import load_dataset
+
+test_dataset = load_dataset("ymoslem/wmt-da-human-evaluation",
+                            split="test",
+                            trust_remote_code=True
+                            )
+print(test_dataset)
+```
+
+3. Load the model and tokenizer:
+
+```python
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+import torch
+
+# Load the fine-tuned model and tokenizer
+model_name = "ymoslem/ModernBERT-large-qe-v1"
+model = AutoModelForSequenceClassification.from_pretrained(
+    model_name,
+    device_map="auto",
+    torch_dtype=torch.bfloat16,
+    attn_implementation="flash_attention_2",
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+# Move the model to GPU if available
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model.to(device)
+model.eval()
+```
+
+4. Prepare the dataset. Each source segment `src` and its machine translation `mt` are joined into a single input, separated by the tokenizer's `sep_token`, which is `'</s>'` for ModernBERT.
+
+```python
+sep_token = tokenizer.sep_token
+input_test_texts = [f"{src} {sep_token} {tgt}" for src, tgt in zip(test_dataset["src"], test_dataset["mt"])]
+```
+
+5. Generate predictions.
+
+If you print `model.config.problem_type`, the output is `regression`.
+Still, you can use the "text-classification" pipeline as follows (cf. [pipeline documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TextClassificationPipeline)):
+
+```python
+from transformers import pipeline
+
+classifier = pipeline("text-classification",
+                      model=model_name,
+                      tokenizer=tokenizer,
+                      device=0,
+                      )
+
+predictions = classifier(input_test_texts,
+                         batch_size=128,
+                         truncation=True,
+                         padding="max_length",
+                         max_length=tokenizer.model_max_length,
+                         )
+predictions = [prediction["score"] for prediction in predictions]
+
+```
+
+Alternatively, you can use a more elaborate version of the code, which is slightly faster and provides more control.
+
+```python
+from torch.utils.data import DataLoader
+import torch
+from tqdm.auto import tqdm
+
+# Tokenization function
+def process_batch(batch, tokenizer, device):
+    sep_token = tokenizer.sep_token
+    input_texts = [f"{src} {sep_token} {tgt}" for src, tgt in zip(batch["src"], batch["mt"])]
+    tokens = tokenizer(input_texts,
+                       truncation=True,
+                       padding="max_length",
+                       max_length=tokenizer.model_max_length,
+                       return_tensors="pt",
+                       ).to(device)
+    return tokens
+
+
+
+# Create a DataLoader for batching
+test_dataloader = DataLoader(test_dataset,
+                             batch_size=128,  # Adjust batch size as needed
+                             shuffle=False)
+
+
+# List to store all predictions
+predictions = []
+
+with torch.no_grad():
+    for batch in tqdm(test_dataloader, desc="Inference Progress", unit="batch"):
+
+        tokens = process_batch(batch, tokenizer, device)
+
+        # Forward pass: generate the model's logits
+        outputs = model(**tokens)
+
+        # Get logits (predictions)
+        logits = outputs.logits
+
+        # Extract the regression predicted values
+        batch_predictions = logits.squeeze()
+
+        # Extend the list with the predictions
+        predictions.extend(batch_predictions.tolist())
+```
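
The updated card reports Pearson correlation, MAE, RMSE, and R² in its model-index block, but the inference code stops at collecting `predictions`. A minimal evaluation sketch, assuming the gold direct-assessment labels of the test split are stored in a `score` column (an assumption; check the dataset schema) and that `predictions` is the list produced by either snippet above:

```python
# Evaluation sketch (assumptions: gold labels live in test_dataset["score"];
# `predictions` comes from one of the inference snippets in the diff above).
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

gold = np.array(test_dataset["score"], dtype=float)   # assumed column name
preds = np.array(predictions, dtype=float)

pearson = pearsonr(gold, preds)[0]
mae = mean_absolute_error(gold, preds)
rmse = mean_squared_error(gold, preds) ** 0.5
r2 = r2_score(gold, preds)

print(f"Pearson: {pearson:.4f}  MAE: {mae:.4f}  RMSE: {rmse:.4f}  R2: {r2:.4f}")
```

If the same test split and label column are used, the printed values should be comparable to the metrics listed in the model-index metadata.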