	---
	library_name: transformers
	language:
	- multilingual
	- bn
	- cs
	- de
	- en
	- et
	- fi
	- fr
	- gu
	- ha
	- hi
	- is
	- ja
	- kk
	- km
	- lt
	- lv
	- pl
	- ps
	- ru
	- ta
	- tr
	- uk
	- xh
	- zh
	- zu
	license: mit
	base_model: FacebookAI/xlm-roberta-large
	tags:
	- quality-estimation
	- regression
	- generated_from_trainer
	datasets:
	- ymoslem/wmt-da-human-evaluation
	model-index:
	- name: Quality Estimation for Machine Translation
	  results:
	  - task:
	      type: regression
	    dataset:
	      name: ymoslem/wmt-da-human-evaluation
	      type: QE
	    metrics:
	    - name: Pearson Correlation
	      type: Pearson
	      value: 0.422
	    - name: Mean Absolute Error
	      type: MAE
	      value: 0.196
	    - name: Root Mean Squared Error
	      type: RMSE
	      value: 0.245
	    - name: R-Squared
	      type: R2
	      value: 0.245
	metrics:
	- perplexity
	- mae
	- r_squared
	---


	# Quality Estimation for Machine Translation

	This model is a fine-tuned version of [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) on the ymoslem/wmt-da-human-evaluation dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.0752

	## Model description

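	This model performs reference-free quality estimation (QE) for machine translation (MT): given a source segment and its machine translation, it predicts a quality score without requiring a reference translation.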

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 8e-05
	- train_batch_size: 64
	- eval_batch_size: 64
	- seed: 42
	- optimizer: AdamW, fused PyTorch implementation (`OptimizerNames.ADAMW_TORCH_FUSED`), with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- training_steps: 20000
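
As a rough illustration of the `linear` scheduler listed above, the sketch below computes the learning rate at a given step. It assumes zero warmup steps (the card reports none) and a linear decay from the peak rate to zero at the final step. `linear_lr` is a hypothetical helper written for this example, not part of the training code.

```python
PEAK_LR = 8e-05        # learning_rate from the list above
TOTAL_STEPS = 20_000   # training_steps from the list above
WARMUP_STEPS = 0       # assumption: the card does not report any warmup


def linear_lr(step: int) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


# Print the rate at the start, midpoint, and end of training
for step in (0, 10_000, 20_000):
    print(step, linear_lr(step))
```

Under these assumptions the rate halves by the midpoint of training and reaches zero at the last step.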

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:------:|:-----:|:---------------:|
	| 0.0743 | 0.0502 | 1000 | 0.0598 |
	| 0.0853 | 0.1004 | 2000 | 0.0745 |
	| 0.0829 | 0.1506 | 3000 | 0.0726 |
	| 0.0814 | 0.2008 | 4000 | 0.0872 |
	| 0.0805 | 0.2509 | 5000 | 0.0715 |
	| 0.0782 | 0.3011 | 6000 | 0.0819 |
	| 0.0789 | 0.3513 | 7000 | 0.0733 |
	| 0.0791 | 0.4015 | 8000 | 0.0748 |
	| 0.0787 | 0.4517 | 9000 | 0.0759 |
	| 0.0761 | 0.5019 | 10000 | 0.0725 |
	| 0.0746 | 0.5521 | 11000 | 0.0745 |
	| 0.0762 | 0.6023 | 12000 | 0.0750 |
	| 0.077 | 0.6524 | 13000 | 0.0725 |
	| 0.0777 | 0.7026 | 14000 | 0.0737 |
	| 0.0764 | 0.7528 | 15000 | 0.0745 |
	| 0.0781 | 0.8030 | 16000 | 0.0750 |
	| 0.0748 | 0.8532 | 17000 | 0.0765 |
	| 0.0768 | 0.9034 | 18000 | 0.0750 |
	| 0.0737 | 0.9536 | 19000 | 0.0759 |
	| 0.0769 | 1.0038 | 20000 | 0.0752 |
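
The Epoch and Step columns above allow a back-of-the-envelope estimate of the training set size. The sketch assumes a single device and no gradient accumulation, so that each step consumes exactly `train_batch_size` = 64 examples. Neither assumption is stated in the card, so the result is only approximate.

```python
TRAIN_BATCH_SIZE = 64   # train_batch_size reported in the card
FINAL_STEP = 20_000     # step in the last row of the table
FINAL_EPOCH = 1.0038    # epoch in the last row of the table

# Assumption: one device and no gradient accumulation,
# so examples seen = steps * batch size.
examples_seen = FINAL_STEP * TRAIN_BATCH_SIZE
estimated_train_size = examples_seen / FINAL_EPOCH

print(f"examples seen: {examples_seen:,}")
print(f"estimated training set size: {estimated_train_size:,.0f}")
```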


	### Framework versions

	- Transformers 4.48.0
	- Pytorch 2.4.1+cu124
	- Datasets 3.2.0
	- Tokenizers 0.21.0

	## Inference

	1. Install the required libraries.

	```bash
	pip3 install --upgrade datasets accelerate transformers
	pip3 install --upgrade flash_attn triton
	```

	2. Load the test dataset.

	```python
	from datasets import load_dataset

	test_dataset = load_dataset(
	    "ymoslem/wmt-da-human-evaluation",
	    split="test",
	    trust_remote_code=True,
	)
	print(test_dataset)
	```

	3. Load the model and tokenizer:

	```python
	from transformers import AutoModelForSequenceClassification, AutoTokenizer
	import torch

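	# Load the fine-tuned model and tokenizer.
	# NOTE: this ID names a ModernBERT checkpoint, while this card describes an
	# XLM-RoBERTa fine-tune; substitute the repository ID of the model you want to run.
	model_name = "ymoslem/ModernBERT-large-qe-v1"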
	model = AutoModelForSequenceClassification.from_pretrained(
	    model_name,
	    device_map="auto",
	    torch_dtype=torch.bfloat16,
	    attn_implementation="flash_attention_2",
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	# Move model to GPU if available
	device = "cuda" if torch.cuda.is_available() else "cpu"
	model.to(device)
	model.eval()
	```

	4. Prepare the dataset. Each source segment `src` and target segment `tgt` are joined with the tokenizer's `sep_token`, which is `'</s>'` for XLM-RoBERTa.

	```python
	sep_token = tokenizer.sep_token
	input_test_texts = [f"{src} {sep_token} {tgt}" for src, tgt in zip(test_dataset["src"], test_dataset["mt"])]
	```
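
For illustration, the snippet below shows what a single formatted input looks like. The sentence pair is hypothetical, and the separator is hard-coded to the `'</s>'` value named above. In practice, read it from `tokenizer.sep_token`.

```python
# Hypothetical sentence pair, for illustration only.
src = "Guten Morgen"
tgt = "Good morning"
sep_token = "</s>"  # assumption: hard-coded here; use tokenizer.sep_token in practice

# Same format as the list comprehension above: source, separator, translation
text = f"{src} {sep_token} {tgt}"
print(text)  # Guten Morgen </s> Good morning
```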

	5. Generate predictions.

	The model is a regressor: printing `model.config.problem_type` gives `regression`.
	You can still run it through the "text-classification" pipeline, reading the predicted quality score from each result's `score` field (cf. [pipeline documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TextClassificationPipeline)):

	```python
	from transformers import pipeline

	classifier = pipeline(
	    "text-classification",
	    model=model_name,
	    tokenizer=tokenizer,
	    device=0,
	)

	predictions = classifier(
	    input_test_texts,
	    batch_size=128,
	    truncation=True,
	    padding="max_length",
	    max_length=tokenizer.model_max_length,
	)
	predictions = [prediction["score"] for prediction in predictions]
	```

	Alternatively, you can run a more explicit batching loop, which is slightly faster and gives more control.

	```python
	from torch.utils.data import DataLoader
	import torch
	from tqdm.auto import tqdm

	# Tokenization function: joins each source/MT pair and returns padded tensors on `device`
	def process_batch(batch, tokenizer, device):
	    sep_token = tokenizer.sep_token
	    input_texts = [f"{src} {sep_token} {tgt}" for src, tgt in zip(batch["src"], batch["mt"])]
	    tokens = tokenizer(
	        input_texts,
	        truncation=True,
	        padding="max_length",
	        max_length=tokenizer.model_max_length,
	        return_tensors="pt",
	    ).to(device)
	    return tokens


	# Create a DataLoader for batching
	test_dataloader = DataLoader(
	    test_dataset,
	    batch_size=128,  # Adjust batch size as needed
	    shuffle=False,
	)

	# List to store all predictions
	predictions = []

	with torch.no_grad():
	    for batch in tqdm(test_dataloader, desc="Inference Progress", unit="batch"):
	        tokens = process_batch(batch, tokenizer, device)

	        # Forward pass: generate the model's logits
	        outputs = model(**tokens)

	        # Get logits (predictions)
	        logits = outputs.logits

	        # Extract the regression values; squeeze only the last dimension
	        # so that a final batch of size 1 still yields a 1-D tensor
	        batch_predictions = logits.squeeze(-1)

	        # Extend the list with the predictions
	        predictions.extend(batch_predictions.tolist())
	```
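
To compare the predictions with human scores, the metrics listed in the card's metadata (Pearson correlation, MAE, RMSE, R-squared) can be computed along the lines sketched below. This is a plain-Python illustration, not the evaluation script behind the reported numbers. `regression_metrics` is a hypothetical helper, and the `score` column name for the gold labels is an assumption about the dataset schema.

```python
import math


def regression_metrics(references, predictions):
    """Return Pearson correlation, MAE, RMSE, and R-squared for two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant (the metrics are undefined).
    """
    n = len(references)
    if n != len(predictions) or n < 2:
        raise ValueError("need two equal-length sequences with at least two items")

    mean_ref = sum(references) / n
    mean_pred = sum(predictions) / n

    # Sums of squared deviations and cross-deviations around the means
    ss_ref = sum((r - mean_ref) ** 2 for r in references)
    ss_pred = sum((p - mean_pred) ** 2 for p in predictions)
    cross = sum((r - mean_ref) * (p - mean_pred) for r, p in zip(references, predictions))

    # Sum of squared errors between predictions and references
    sse = sum((p - r) ** 2 for r, p in zip(references, predictions))

    return {
        "pearson": cross / math.sqrt(ss_ref * ss_pred),
        "mae": sum(abs(p - r) for r, p in zip(references, predictions)) / n,
        "rmse": math.sqrt(sse / n),
        "r2": 1.0 - sse / ss_ref,
    }


# Toy values for illustration; in practice, pass the gold scores and `predictions`, e.g.
# regression_metrics(test_dataset["score"], predictions)  # "score" is an assumed column name
print(regression_metrics([0.2, 0.5, 0.9], [0.3, 0.4, 0.8]))
```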