metadata
license: apache-2.0
datasets:
- argilla/ultrafeedback-binarized-preferences-cleaned
language:
- en
base_model:
- mistralai/Mistral-7B-v0.1
library_name: transformers
tags:
- transformers
ElEmperador
Introduction:
ElEmperador is an ORPO-based fine-tune of the Mistral-7B-v0.1 base model.
It was trained on the argilla/ultrafeedback-binarized-preferences-cleaned dataset, although only a small portion of the data was used because of GPU constraints.
Evals:
BLEU: 0.209
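For readers unfamiliar with the metric, the following is a minimal sketch of sentence-level BLEU. It is not the evaluation script behind the score above: the card does not state the reference set, tokenization, or smoothing used, so whitespace tokenization, a single reference, and no smoothing are assumptions made here, and the names `ngrams` and `bleu` are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU against one reference (assumed setup)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped overlap: each candidate n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one zero precision gives BLEU = 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

Scores from this sketch fall between 0 and 1. Established tools differ in tokenization and smoothing, so their numbers are not directly comparable with it.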
Inference Script:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate_response(model_name, input_text, max_new_tokens=50):
    # Load the tokenizer and model from Hugging Face Hub
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    # Tokenize the input text
    input_ids = tokenizer(input_text, return_tensors='pt').input_ids
    # Generate a response using the model (no gradients needed for inference)
    with torch.no_grad():
        generated_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode the generated tokens into text
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    return generated_text


if __name__ == "__main__":
    # Set the model name from Hugging Face Hub
    model_name = "AINovice2005/ElEmperador"
    input_text = "Hello, how are you?"
    # Generate and print the model's response
    output = generate_response(model_name, input_text)
    print(f"Input: {input_text}")
    print(f"Output: {output}")
Results:
ORPO is a viable preference-optimization method for improving model performance alongside SFT fine-tuning: it adds a preference penalty to the SFT loss, so both happen in a single training stage without a separate reward model. It also aligns the model’s outputs more closely with human preferences, leading to more user-friendly and acceptable results.
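To make the relationship between ORPO and SFT concrete, here is a minimal sketch of the ORPO objective for a single preference pair, as defined in the ORPO paper: the SFT negative log-likelihood of the chosen response plus a weighted odds-ratio penalty. It is not the training code used for this model. The function names and the weight `lam=0.1` are illustrative assumptions, and the inputs are assumed to be length-normalized (average per-token) log-likelihoods with probabilities strictly between 0 and 1.

```python
import math


def log_odds(logp: float) -> float:
    """Return log(p / (1 - p)) given log p, for 0 < p < 1."""
    # log1p(-exp(logp)) evaluates log(1 - p)
    return logp - math.log1p(-math.exp(logp))


def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """ORPO loss for one (chosen, rejected) pair; lam is an assumed weight."""
    # SFT term: negative log-likelihood of the chosen response.
    nll = -logp_chosen
    # Log odds ratio between the chosen and rejected responses.
    log_odds_ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # Odds-ratio term: -log(sigmoid(x)) written as log(1 + exp(-x)).
    or_term = math.log1p(math.exp(-log_odds_ratio))
    return nll + lam * or_term


if __name__ == "__main__":
    # The loss is lower when the chosen response is the more likely one.
    print(orpo_loss(math.log(0.6), math.log(0.2)))
    print(orpo_loss(math.log(0.2), math.log(0.6)))
```

Because the penalty is added directly to the SFT loss, no reference model or separate reward model is needed in this formulation.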