	---
	library_name: transformers
	language:
	- multilingual
	- bn
	- cs
	- de
	- en
	- et
	- fi
	- fr
	- gu
	- ha
	- hi
	- is
	- ja
	- kk
	- km
	- lt
	- lv
	- pl
	- ps
	- ru
	- ta
	- tr
	- uk
	- xh
	- zh
	- zu
	license: apache-2.0
	base_model: answerdotai/ModernBERT-large
	tags:
	- quality-estimation
	- regression
	- generated_from_trainer
	datasets:
	- ymoslem/wmt-da-human-evaluation
	model-index:
	- name: Quality Estimation for Machine Translation
	  results:
	  - task:
	      type: regression
	    dataset:
	      name: ymoslem/wmt-da-human-evaluation
	      type: QE
	    metrics:
	    - name: Pearson Correlation
	      type: Pearson
	      value: 0.4458
	    - name: Mean Absolute Error
	      type: MAE
	      value: 0.1876
	    - name: Root Mean Squared Error
	      type: RMSE
	      value: 0.2393
	    - name: R-Squared
	      type: R2
	      value: 0.1987
	metrics:
	- pearsonr
	- mae
	- r_squared
	---


	# Quality Estimation for Machine Translation

	This model is a fine-tuned version of [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large)
	on the [ymoslem/wmt-da-human-evaluation](https://huggingface.co/datasets/ymoslem/wmt-da-human-evaluation) dataset.

	It achieves the following results on the evaluation set:
	- Loss: 0.0564

	## Model description

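	This model performs reference-free quality estimation (QE) of machine translation (MT). Given a source segment and its machine translation, it predicts a quality score as a single regression value, without requiring a reference translation.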

	## Training procedure

	### Training hyperparameters

	This model uses the tokenizer's full maximum sequence length of 8192 tokens.
	A version with a maximum length of 512 is available at [ymoslem/ModernBERT-large-qe-maxlen512-v1](https://huggingface.co/ymoslem/ModernBERT-large-qe-maxlen512-v1).

	The following hyperparameters were used during training:
	- learning_rate: 8e-05
	- train_batch_size: 128
	- eval_batch_size: 128
	- seed: 42
	- optimizer: fused PyTorch AdamW (`OptimizerNames.ADAMW_TORCH_FUSED`) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- training_steps: 10000
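
To make the `linear` scheduler setting concrete, here is a minimal sketch of how the learning rate would decay over the 10,000 training steps. It assumes no warmup, since the list above does not report any warmup steps; `linear_lr` is a hypothetical helper written for illustration, not part of the training code.

```python
def linear_lr(step: int, base_lr: float = 8e-05, total_steps: int = 10_000,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr (unused when warmup_steps == 0, our assumption)
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the learning rate at a few points of the run
for step in (0, 2_500, 5_000, 7_500, 10_000):
    print(f"step {step:>6}: lr = {linear_lr(step):.2e}")
```

Under this assumption the rate starts at 8e-05, is halved by step 5,000, and reaches zero at step 10,000.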

	### Training results

	| Training Loss | Epoch  | Step  | Validation Loss |
	|:-------------:|:------:|:-----:|:---------------:|
	| 0.0631        | 0.1004 | 1000  | 0.0674          |
	| 0.0614        | 0.2007 | 2000  | 0.0599          |
	| 0.0578        | 0.3011 | 3000  | 0.0585          |
	| 0.0585        | 0.4015 | 4000  | 0.0579          |
	| 0.0568        | 0.5019 | 5000  | 0.0570          |
	| 0.057         | 0.6022 | 6000  | 0.0568          |
	| 0.0579        | 0.7026 | 7000  | 0.0567          |
	| 0.0573        | 0.8030 | 8000  | 0.0565          |
	| 0.0568        | 0.9033 | 9000  | 0.0564          |
	| 0.0571        | 1.0037 | 10000 | 0.0564          |
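
The Epoch and Step columns can be used for a rough sanity check of the training set size. The sketch below is a back-of-the-envelope estimate only: it assumes that the reported `train_batch_size` of 128 is the effective global batch size (no gradient accumulation or multi-device scaling), which the card does not state. `steps_per_epoch` is a hypothetical helper.

```python
def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimate optimizer steps per epoch from one logged (step, epoch) pair."""
    return step / epoch


TRAIN_BATCH_SIZE = 128  # from the hyperparameter list; assumed to be the global batch size

# First and last rows of the training results table
for step, epoch in [(1000, 0.1004), (10000, 1.0037)]:
    spe = steps_per_epoch(step, epoch)
    print(f"step {step}: ~{spe:.0f} steps/epoch, ~{spe * TRAIN_BATCH_SIZE:,.0f} examples/epoch")
```

Both rows give roughly 9,960 steps per epoch, which would correspond to about 1.27 million training examples under that assumption. It also shows that the 10,000 training steps cover just over one epoch.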


	### Framework versions

	- Transformers 4.48.0
	- PyTorch 2.4.1+cu124
	- Datasets 3.2.0
	- Tokenizers 0.21.0

	## Inference

	1. Install the required libraries.

	```bash
	pip3 install --upgrade datasets accelerate transformers
	pip3 install --upgrade flash_attn triton
	```

	2. Load the test dataset.

	```python
	from datasets import load_dataset

	test_dataset = load_dataset("ymoslem/wmt-da-human-evaluation",
	                            split="test",
	                            trust_remote_code=True
	                            )
	print(test_dataset)
	```

	3. Load the model and tokenizer:

	```python
	from transformers import AutoModelForSequenceClassification, AutoTokenizer
	import torch

	# Load the fine-tuned model and tokenizer.
	# Note: flash_attention_2 needs a supported GPU and the flash_attn package from step 1.
	model_name = "ymoslem/ModernBERT-large-qe-v1"
	model = AutoModelForSequenceClassification.from_pretrained(
	    model_name,
	    device_map="auto",
	    torch_dtype=torch.bfloat16,
	    attn_implementation="flash_attention_2",
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	# Move model to GPU if available
	device = "cuda" if torch.cuda.is_available() else "cpu"
	model.to(device)
	model.eval()
	```

	4. Prepare the dataset. Each source segment (`src`) is joined to its machine translation (`mt`) with the tokenizer's `sep_token`, which is `'[SEP]'` for ModernBERT.

	```python
	sep_token = tokenizer.sep_token
	input_test_texts = [f"{src} {sep_token} {tgt}" for src, tgt in zip(test_dataset["src"], test_dataset["mt"])]
	```
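
For scoring your own sentence pairs rather than the test set, the same input format can be produced one pair at a time. The sketch below is illustrative: `build_qe_input` is a hypothetical helper, the sentence pair is made up, and the literal separator is only a placeholder for `tokenizer.sep_token`.

```python
def build_qe_input(src: str, mt: str, sep_token: str) -> str:
    """Join a source segment and its translation in the format used above."""
    return f"{src} {sep_token} {mt}"


# Hypothetical example pair; in practice pass sep_token=tokenizer.sep_token
example = build_qe_input("Guten Morgen!", "Good morning!", sep_token="[SEP]")
print(example)  # Guten Morgen! [SEP] Good morning!
```

The resulting string can be passed to the prediction code in the next step in place of an item of `input_test_texts`.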

	5. Generate predictions.

	Printing `model.config.problem_type` gives `regression`.
	You can nevertheless use the "text-classification" pipeline (cf. [pipeline documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TextClassificationPipeline)); the predicted value is read from the `score` field of each result:

	```python
	from transformers import pipeline

	classifier = pipeline("text-classification",
	                      model=model_name,
	                      tokenizer=tokenizer,
	                      device=0,
	                      )

	predictions = classifier(input_test_texts,
	                         batch_size=128,
	                         truncation=True,
	                         padding="max_length",
	                         max_length=tokenizer.model_max_length,
	                         )
	predictions = [prediction["score"] for prediction in predictions]
	```

	Alternatively, you can use a more explicit version of the code, which is slightly faster and gives you more control over batching and tokenization.

	```python
	from torch.utils.data import DataLoader
	import torch
	from tqdm.auto import tqdm

	# Tokenization function: joins each source/translation pair and tokenizes the batch
	def process_batch(batch, tokenizer, device):
	    sep_token = tokenizer.sep_token
	    input_texts = [f"{src} {sep_token} {tgt}" for src, tgt in zip(batch["src"], batch["mt"])]
	    tokens = tokenizer(input_texts,
	                       truncation=True,
	                       padding="max_length",
	                       max_length=tokenizer.model_max_length,
	                       return_tensors="pt",
	                       ).to(device)
	    return tokens


	# Create a DataLoader for batching
	test_dataloader = DataLoader(test_dataset,
	                             batch_size=128,  # Adjust batch size as needed
	                             shuffle=False)

	# List to store all predictions
	predictions = []

	with torch.no_grad():
	    for batch in tqdm(test_dataloader, desc="Inference Progress", unit="batch"):

	        tokens = process_batch(batch, tokenizer, device)

	        # Forward pass: Generate model's logits
	        outputs = model(**tokens)

	        # Get logits (predictions)
	        logits = outputs.logits

	        # Extract the regression predicted values.
	        # squeeze(-1) keeps the batch dimension even if the last batch holds a single item.
	        batch_predictions = logits.squeeze(-1)

	        # Extend the list with the predictions
	        predictions.extend(batch_predictions.tolist())
	```
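
Once `predictions` is filled, it can be compared against the human scores using the metrics listed in the model card metadata (Pearson correlation, MAE, RMSE, R-squared). The following is a minimal, dependency-free sketch; `regression_metrics` is a hypothetical helper and the toy numbers are for illustration only, not model outputs.

```python
import math


def regression_metrics(predictions, references):
    """Return Pearson r, MAE, RMSE and R-squared for two equal-length sequences."""
    n = len(predictions)
    if n != len(references) or n < 2:
        raise ValueError("Need two sequences of equal length with at least 2 items")

    mean_p = sum(predictions) / n
    mean_r = sum(references) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predictions, references))
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    var_r = sum((r - mean_r) ** 2 for r in references)

    errors = [p - r for p, r in zip(predictions, references)]
    squared_error = sum(e * e for e in errors)

    return {
        "pearson": cov / math.sqrt(var_p * var_r),
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(squared_error / n),
        "r2": 1.0 - squared_error / var_r,
    }


# Toy values for illustration only
toy_predictions = [0.2, 0.5, 0.9]
toy_references = [0.1, 0.5, 1.0]
print(regression_metrics(toy_predictions, toy_references))
```

With the real data, the call would look like `regression_metrics(predictions, test_dataset["score"])`, assuming the human scores are stored in a column named `score`; check the output of `print(test_dataset)` for the actual column name.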