---
license: apache-2.0
datasets:
- argilla/ultrafeedback-binarized-preferences-cleaned
language:
- en
base_model:
- mistralai/Mistral-7B-v0.1
library_name: transformers
tags:
- transformers
- ORPO
- RLHF
- notus
- argilla
---

# Model Overview

# Model Name: ElEmperador

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e8ea3892d9db9a93580fe3/gkDcpIxRCjBlmknN_jzWN.png)


## Model Description:

ElEmperador is an ORPO-based finetune derived from the Mistral-7B-v0.1 base model.


## Evals:
BLEU: 0.209
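
The card does not say how the BLEU score was computed (dataset, tokenization, or tooling). For readers unfamiliar with the metric, here is a minimal sketch of unsmoothed sentence-level BLEU against a single reference, using whitespace tokenization. The function names `ngrams` and `sentence_bleu` are illustrative assumptions, not part of this model's evaluation code, and reported scores are usually corpus-level values from an established library.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference (illustrative only)."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(sentence_bleu(reference, "the cat sat on the mat"))
    print(sentence_bleu(reference, "the cat sat on a mat"))
```

Because the sketch applies no smoothing, short outputs that share no 4-gram with the reference score 0, which is one reason corpus-level BLEU is normally preferred for reporting.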

## Inference Script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate_response(model_name, input_text, max_new_tokens=50):
    # Load the tokenizer and model from the Hugging Face Hub
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Tokenize the input text (returns input_ids and attention_mask)
    inputs = tokenizer(input_text, return_tensors="pt")

    # Generate a response without tracking gradients
    with torch.no_grad():
        generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode the generated tokens (prompt included) into text
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

    return generated_text


if __name__ == "__main__":
    # Set the model name from the Hugging Face Hub
    model_name = "AINovice2005/ElEmperador"
    input_text = "Hello, how are you?"

    # Generate and print the model's response
    output = generate_response(model_name, input_text)

    print(f"Input: {input_text}")
    print(f"Output: {output}")
```

## Results

First, ORPO is a viable RLHF-style preference-optimization algorithm for improving model performance alongside SFT fine-tuning. Second, it helps align the model's outputs more closely with human preferences, leading to more user-friendly and acceptable results.
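
To make the link between ORPO and SFT concrete, here is a minimal sketch of the ORPO objective for a single preference pair, following the published formulation: the negative log-likelihood of the chosen response plus a weighted odds-ratio term. It is not this model's training code. The function names, the example log-probabilities, and the weight `lam=0.1` are illustrative assumptions, since the card does not state the training hyperparameters.

```python
import math


def log_odds(avg_logp):
    """log(p / (1 - p)) where p = exp(avg_logp) is the length-normalized
    likelihood of a response; avg_logp must be negative so that p < 1."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp, rejected_avg_logp, lam=0.1):
    """ORPO loss for one (chosen, rejected) pair.

    lam is a hypothetical weight on the odds-ratio term, chosen for illustration.
    """
    # SFT part: negative log-likelihood of the preferred response
    sft = -chosen_avg_logp
    # Log of the odds ratio between the chosen and rejected responses
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # -log(sigmoid(z)) written as log(1 + exp(-z))
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return sft + lam * odds_ratio_term


if __name__ == "__main__":
    # Made-up average token log-probabilities for a preferred and a rejected answer
    print(orpo_loss(chosen_avg_logp=-0.2, rejected_avg_logp=-1.5))
    print(orpo_loss(chosen_avg_logp=-1.5, rejected_avg_logp=-0.2))
```

The loss is lower when the model assigns higher likelihood to the chosen response than to the rejected one. This is how a single objective can cover both supervised fine-tuning and preference alignment.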