This is a Mistral-7B reward model trained on the reciprocate/tinygsm_dpo dataset. The example below scores a Python solution to a grade-school math problem:

from transformers import pipeline

reward_fn = pipeline(
    "text-classification",
    model="reciprocate/mistral-7b-gsm8k-code-rm",
    truncation=True,
    max_length=4096,
    function_to_apply="none",  # return the raw reward logit, without sigmoid/softmax
)

prompt = """\
Consider the following grade-school math problem: Megan has read 32 books this year. Kelcie has read 1/4 the amount of books that Megan has read. Greg has read 9 more than twice the number of books that Kelcie has read. How many books total have Megan, Kelcie, and Greg read?
Solve this problem using code.
- Give the complete solution to solve the problem written in Python.
- The program should contain multiple lines of code and end with 'result = XXX'.
- Use markdown to format your response starting with '```python' and ending with '```'.
"""

output = """\
Let's solve this problem using Python code.
```python
books_megan = 32
books_kelcie = books_megan / 4
books_kelcie = int(books_kelcie)
books_greg = 2 * books_kelcie + 9
total_books = books_megan + books_kelcie + books_greg
result = total_books```
"""

chats = [[
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": output}
]]

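# Render each conversation to a single string with the model's chat template.
inputs = [reward_fn.tokenizer.apply_chat_template(chat, tokenize=False) for chat in chats]
# Use a separate name so the `output` string above is not overwritten.
results = reward_fn(inputs)
scores = [x["score"] for x in results]
print(scores)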
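
A common use for scores like these is to rank several candidate solutions to the same prompt and keep the highest-scoring one. The following is a minimal sketch of that step, not something this model card prescribes. The helper `rank_candidates`, the candidate labels, and the score values are all hypothetical and stand in for real `reward_fn` results, so the snippet runs without downloading the model. It assumes that a higher score means a better response.

```python
def rank_candidates(candidates, scores):
    """Pair each candidate with its score and sort best-first.

    Hypothetical helper: assumes higher reward scores are better.
    """
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)


# Placeholder values for illustration only; in practice `scores` would be
# the list produced by the reward pipeline, one entry per candidate.
candidates = ["solution A", "solution B", "solution C"]
scores = [1.7, -0.4, 0.9]

ranking = rank_candidates(candidates, scores)
best_candidate, best_score = ranking[0]
print(best_candidate, best_score)
```

Because the pipeline is created with `function_to_apply="none"`, the scores are unnormalized logits, so only their relative order across candidates for the same prompt is used here.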
Model size: 7.11B params (Safetensors, tensor type F32)