ymoslem/xlm-roberta-large-qe-v1

@@ -37,7 +37,29 @@ datasets:
 - ymoslem/wmt-da-human-evaluation
 model-index:
 - name: Quality Estimation for Machine Translation
-  results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -51,15 +73,7 @@ It achieves the following results on the evaluation set:
 ## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
 ## Training procedure
@@ -106,3 +120,126 @@ The following hyperparameters were used during training:
 - Pytorch 2.4.1+cu124
 - Datasets 3.2.0
 - Tokenizers 0.21.0

 - ymoslem/wmt-da-human-evaluation
 model-index:
 - name: Quality Estimation for Machine Translation
+  results:
+  - task:
+      type: regression
+    dataset:
+      name: ymoslem/wmt-da-human-evaluation
+      type: QE
+    metrics:
+    - name: Pearson Correlation
+      type: Pearson
+      value: 0.422
+    - name: Mean Absolute Error
+      type: MAE
+      value: 0.196
+    - name: Root Mean Squared Error
+      type: RMSE
+      value: 0.245
+    - name: R-Squared
+      type: R2
+      value: 0.245
+metrics:
+- pearsonr
+- mae
+- r_squared
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 ## Model description
+This model is for reference-free quality estimation (QE) of machine translation (MT) systems.
 ## Training procedure
 - Pytorch 2.4.1+cu124
 - Datasets 3.2.0
 - Tokenizers 0.21.0
+## Inference
+1. Install the required libraries.
+```bash
+pip3 install --upgrade datasets accelerate transformers
+pip3 install --upgrade flash_attn triton
+```
+2. Load the test dataset.
+```python
+from datasets import load_dataset
+test_dataset = load_dataset("ymoslem/wmt-da-human-evaluation",
+                             split="test",
+                             trust_remote_code=True
+                            )
+print(test_dataset)
+```
+3. Load the model and tokenizer:
+```python
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+import torch
+# Load the fine-tuned model and tokenizer
+model_name = "ymoslem/xlm-roberta-large-qe-v1"
+model = AutoModelForSequenceClassification.from_pretrained(
+    model_name,
+    device_map="auto",
+    torch_dtype=torch.bfloat16,
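+    # Assumption: your transformers version supports FlashAttention-2 for this
+    # architecture; if loading fails, remove this argument
+    attn_implementation="flash_attention_2",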
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+# device_map="auto" has already placed the model; reuse its device for the inputs
+device = model.device
+model.eval()
+```
+4. Prepare the dataset. Each source segment `src` and target segment `tgt` (the `mt` column of the dataset) is joined with the tokenizer's `sep_token`, which is `'</s>'` for XLM-RoBERTa.
+```python
+sep_token = tokenizer.sep_token
+input_test_texts = [f"{src} {sep_token} {tgt}" for src, tgt in zip(test_dataset["src"], test_dataset["mt"])]
+```
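To make the input format from step 4 concrete, the sketch below builds the same `src </s> mt` strings from toy data. The sentences and the `build_qe_inputs` helper are hypothetical, and the separator is hard-coded to `</s>` (the value that `tokenizer.sep_token` is stated to return above) so that the snippet runs without downloading the tokenizer.

```python
def build_qe_inputs(sources, translations, sep_token="</s>"):
    """Join each source segment with its machine translation.

    Hypothetical helper that mirrors the list comprehension in step 4.
    """
    if len(sources) != len(translations):
        raise ValueError("sources and translations must have the same length")
    return [f"{src} {sep_token} {mt}" for src, mt in zip(sources, translations)]


# Toy examples (hypothetical, not taken from the test set)
toy_sources = ["Hello, world!", "How are you?"]
toy_translations = ["Bonjour le monde !", "Comment allez-vous ?"]

toy_inputs = build_qe_inputs(toy_sources, toy_translations)
for text in toy_inputs:
    print(text)
```

With the real data, `test_dataset["src"]` and `test_dataset["mt"]` would take the place of the toy lists, and `tokenizer.sep_token` would replace the hard-coded separator.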
+5. Generate predictions.
+The model has a regression head: printing `model.config.problem_type` outputs `regression`.
+You can still use the "text-classification" pipeline, in which case the `score` field of each result holds the predicted quality score (cf. [pipeline documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TextClassificationPipeline)):
+```python
+from transformers import pipeline
+classifier = pipeline("text-classification",
+                      model=model_name,
+                      tokenizer=tokenizer,
+                      device=0,
+                     )
+predictions = classifier(input_test_texts,
+                         batch_size=128,
+                         truncation=True,
+                         padding="max_length",
+                         max_length=tokenizer.model_max_length,
+                       )
+predictions = [prediction["score"] for prediction in predictions]
+```
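The text-classification pipeline returns one dictionary per input, with a `label` and a `score` key. The sketch below uses made-up pipeline output to show how the scores can be extracted and paired with their inputs, for example to find the segment with the lowest estimated quality. The label name `LABEL_0`, the input strings, and all score values are assumptions for illustration only.

```python
# Hypothetical pipeline output for three inputs; real labels and scores will differ.
mock_inputs = [
    "source A </s> translation A",
    "source B </s> translation B",
    "source C </s> translation C",
]
mock_predictions = [
    {"label": "LABEL_0", "score": 0.71},
    {"label": "LABEL_0", "score": 0.35},
    {"label": "LABEL_0", "score": 0.52},
]

# Keep only the numeric quality estimate of each prediction
scores = [prediction["score"] for prediction in mock_predictions]

# Pair each input with its score and sort from lowest to highest estimate
ranked = sorted(zip(mock_inputs, scores), key=lambda pair: pair[1])
worst_text, worst_score = ranked[0]
print(f"Lowest estimated quality: {worst_score:.2f} for {worst_text!r}")
```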
+Alternatively, you can run the forward pass yourself with a more explicit version of the code, which is slightly faster and gives you more control.
+```python
+from torch.utils.data import DataLoader
+import torch
+from tqdm.auto import tqdm
+# Tokenization function
+def process_batch(batch, tokenizer, device):
+    sep_token = tokenizer.sep_token
+    input_texts = [f"{src} {sep_token} {tgt}" for src, tgt in zip(batch["src"], batch["mt"])]
+    tokens = tokenizer(input_texts,
+                       truncation=True,
+                       padding="max_length",
+                       max_length=tokenizer.model_max_length,
+                       return_tensors="pt",
+                      ).to(device)
+    return tokens
+# Create a DataLoader for batching
+test_dataloader = DataLoader(test_dataset,
+                             batch_size=128,   # Adjust batch size as needed
+                             shuffle=False)
+# List to store all predictions
+predictions = []
+with torch.no_grad():
+    for batch in tqdm(test_dataloader, desc="Inference Progress", unit="batch"):
+        tokens = process_batch(batch, tokenizer, device)
+        # Forward pass: Generate model's logits
+        outputs = model(**tokens)
+        # Get logits (predictions)
+        logits = outputs.logits
+        # Extract the regression predicted values; squeeze only the last
+        # dimension so that a final batch of size 1 still yields a list
+        batch_predictions = logits.squeeze(-1).float()
+        # Extend the list with the predictions
+        predictions.extend(batch_predictions.tolist())
+```
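Once `predictions` is available, it can be compared with the human scores using the metrics reported in the model card metadata (Pearson correlation, MAE, RMSE, and R-squared). The following is a minimal pure-Python sketch of those formulas, not the evaluation script used for the reported numbers. The `regression_metrics` helper and the toy values are hypothetical, and the name of the gold-score column (for example `score`) is an assumption to check against the dataset.

```python
import math


def regression_metrics(predictions, references):
    """Return Pearson's r, MAE, RMSE, and R-squared for two equal-length lists.

    Hypothetical helper; assumes neither list is constant (otherwise the
    variance terms are zero and the divisions below fail).
    """
    if len(predictions) != len(references) or len(references) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(references)
    mean_pred = sum(predictions) / n
    mean_ref = sum(references) / n

    errors = [p - r for p, r in zip(predictions, references)]
    ss_res = sum(e * e for e in errors)                      # residual sum of squares
    ss_tot = sum((r - mean_ref) ** 2 for r in references)    # total sum of squares
    ss_pred = sum((p - mean_pred) ** 2 for p in predictions)
    cov = sum((p - mean_pred) * (r - mean_ref)
              for p, r in zip(predictions, references))

    return {
        "pearson": cov / math.sqrt(ss_pred * ss_tot),
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Toy values (hypothetical). With the real data, pass `predictions` and the
# gold scores, e.g. test_dataset["score"] if that is the label column.
toy_predictions = [0.2, 0.5, 0.9, 0.4]
toy_references = [0.1, 0.6, 0.8, 0.5]
metrics = regression_metrics(toy_predictions, toy_references)
print(metrics)
```

Note that R-squared here is the coefficient of determination of the raw predictions against the references, so it can be negative for a poorly calibrated model and is not simply the square of the Pearson correlation.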