---
language:
- en
license: apache-2.0
tags:
- nvidia
- code
- math
base_model:
- mistralai/Mistral-7B-v0.1
datasets:
- nvidia/OpenMathInstruct-1
model-index:
- name: OpenMath-Mistral-7B-v0.1-hf
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 59.39
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=nvidia/OpenMath-Mistral-7B-v0.1-hf
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 81.78
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=nvidia/OpenMath-Mistral-7B-v0.1-hf
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 59.34
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=nvidia/OpenMath-Mistral-7B-v0.1-hf
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 46.13
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=nvidia/OpenMath-Mistral-7B-v0.1-hf
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 77.27
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=nvidia/OpenMath-Mistral-7B-v0.1-hf
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 0.08
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=nvidia/OpenMath-Mistral-7B-v0.1-hf
      name: Open LLM Leaderboard
---


# OpenMath-Mistral-7B-v0.1-hf

OpenMath models were designed to solve mathematical problems by integrating text-based reasoning with code blocks
executed by a Python interpreter. The models were trained on [OpenMathInstruct-1](https://huggingface.co/datasets/nvidia/OpenMathInstruct-1),
a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed
[Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) model.
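
To make the "text plus executed code" idea concrete, here is a minimal sketch of the core step: pull the last code block out of a model generation and run it in a separate Python process. The `<llm-code>`/`</llm-code>` delimiters are an assumption modeled on the OpenMathInstruct-1 solution format, and `extract_last_code_block` and `run_python` are hypothetical helpers, not NeMo-Skills APIs.

```python
import re
import subprocess
import sys
from typing import Optional

# ASSUMPTION: delimiters modeled on the OpenMathInstruct-1 solution format;
# check the NeMo-Skills docs for the exact tags your checkpoint emits.
CODE_START = "<llm-code>"
CODE_END = "</llm-code>"


def extract_last_code_block(generation: str) -> Optional[str]:
    """Return the body of the last complete code block, or None if there is none."""
    pattern = re.escape(CODE_START) + r"(.*?)" + re.escape(CODE_END)
    blocks = re.findall(pattern, generation, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else None


def run_python(code: str, timeout: float = 10.0) -> str:
    """Run code in a fresh interpreter; return stdout, or stderr if it failed.

    Raises subprocess.TimeoutExpired if the code runs longer than `timeout`.
    This is NOT a sandbox: run model-written code only in an isolated environment.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout if result.returncode == 0 else result.stderr


generation = "We add the numbers.\n<llm-code>\nprint(17 + 25)\n</llm-code>\n"
code = extract_last_code_block(generation)
print(run_python(code))  # prints 42
```

In a full solve loop, the interpreter output would be appended to the text and generation resumed so the model can continue reasoning from it; that orchestration (and the exact output format) is handled by the NeMo-Skills pipeline and is not reproduced here.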

<table border="1">
  <tr>
    <td></td>
    <td colspan="2" style="text-align: center;">greedy</td>
    <td colspan="2" style="text-align: center;">majority@50</td>
  </tr>
  <tr>
    <td style="text-align: center;">model</td>
    <td style="text-align: center;">GSM8K</td>
    <td style="text-align: center;">MATH</td>
    <td style="text-align: center;">GSM8K</td>
    <td style="text-align: center;">MATH</td>
  </tr>
  <tr>
    <td style="text-align: right;">OpenMath-CodeLlama-7B (<a href="https://huggingface.co/nvidia/OpenMath-CodeLlama-7b-Python">nemo</a> | <a href="https://huggingface.co/nvidia/OpenMath-CodeLlama-7b-Python-hf">HF</a>)</td>
    <td style="text-align: center;">75.9</td>
    <td style="text-align: center;">43.6</td>
    <td style="text-align: center;">84.8</td>
    <td style="text-align: center;">55.6</td>
  </tr>
  <tr>
    <td style="text-align: right;">OpenMath-Mistral-7B (<a href="https://huggingface.co/nvidia/OpenMath-Mistral-7B-v0.1">nemo</a> | <a href="https://huggingface.co/nvidia/OpenMath-Mistral-7B-v0.1-hf">HF</a>)</td>
    <td style="text-align: center;">80.2</td>
    <td style="text-align: center;">44.5</td>
    <td style="text-align: center;">86.9</td>
    <td style="text-align: center;">57.2</td>
  </tr>
  <tr>
    <td style="text-align: right;">OpenMath-CodeLlama-13B (<a href="https://huggingface.co/nvidia/OpenMath-CodeLlama-13b-Python">nemo</a> | <a href="https://huggingface.co/nvidia/OpenMath-CodeLlama-13b-Python-hf">HF</a>)</td>
    <td style="text-align: center;">78.8</td>
    <td style="text-align: center;">45.5</td>
    <td style="text-align: center;">86.8</td>
    <td style="text-align: center;">57.6</td>
  </tr>
  <tr>
    <td style="text-align: right;">OpenMath-CodeLlama-34B (<a href="https://huggingface.co/nvidia/OpenMath-CodeLlama-34b-Python">nemo</a> | <a href="https://huggingface.co/nvidia/OpenMath-CodeLlama-34b-Python-hf">HF</a>)</td>
    <td style="text-align: center;">80.7</td>
    <td style="text-align: center;">48.3</td>
    <td style="text-align: center;">88.0</td>
    <td style="text-align: center;">60.2</td>
  </tr>
  <tr>
    <td style="text-align: right;">OpenMath-Llama2-70B (<a href="https://huggingface.co/nvidia/OpenMath-Llama-2-70b">nemo</a> | <a href="https://huggingface.co/nvidia/OpenMath-Llama-2-70b-hf">HF</a>)</td>
    <td style="text-align: center;"><b>84.7</b></td>
    <td style="text-align: center;">46.3</td>
    <td style="text-align: center;">90.1</td>
    <td style="text-align: center;">58.3</td>
  </tr>
  <tr>
    <td style="text-align: right;">OpenMath-CodeLlama-70B (<a href="https://huggingface.co/nvidia/OpenMath-CodeLlama-70b-Python">nemo</a> | <a href="https://huggingface.co/nvidia/OpenMath-CodeLlama-70b-Python-hf">HF</a>)</td>
    <td style="text-align: center;">84.6</td>
    <td style="text-align: center;"><b>50.7</b></td>
    <td style="text-align: center;"><b>90.8</b></td>
    <td style="text-align: center;"><b>60.4</b></td>
  </tr>
</table>
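
In the table, *greedy* presumably means a single deterministic decode per problem, while *majority@50* aggregates 50 sampled solutions per problem by picking the most frequent final answer (see the paper for the exact sampling setup). A minimal sketch of the voting step, assuming final answers have already been extracted and normalized to strings; `majority_vote` is a hypothetical helper:

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[Optional[str]]) -> Optional[str]:
    """Return the most frequent final answer among sampled solutions.

    Samples with no extractable answer (None or "") are ignored. Ties go to the
    answer seen first, because Counter.most_common orders equal counts by
    first occurrence.
    """
    valid = [a for a in answers if a]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]


# e.g. final answers parsed from 5 sampled solutions to one problem
print(majority_vote(["18", "20", "18", None, "18"]))  # prints 18
```

Voting on raw strings treats `0.5` and `1/2` as different answers, so in practice answers would be normalized before counting.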

The pipeline we used to produce these models is fully open-sourced!

- [Code](https://github.com/Kipok/NeMo-Skills)
- [Models](https://huggingface.co/collections/nvidia/openmath-65c5619de2ba059be0775014)
- [Dataset](https://huggingface.co/datasets/nvidia/OpenMathInstruct-1)

See our [paper](https://arxiv.org/abs/2402.10176) for more details!

# How to use the models?

You can [run inference with our models](https://github.com/Kipok/NeMo-Skills/blob/main/docs/inference.md) with just a few commands!
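
If you only want a quick single-pass generation from the `-hf` checkpoint with Hugging Face `transformers`, a sketch might look like the following. The prompt template is HYPOTHETICAL and for illustration only: the model was fine-tuned with a specific format, so copy the real one from the inference docs linked above. This sketch also does not execute the code blocks the model writes, so it will not reproduce the reported accuracy.

```python
# HYPOTHETICAL prompt template for illustration only. Replace it with the exact
# format from the NeMo-Skills inference docs before relying on the outputs.
PROMPT_TEMPLATE = (
    "Solve the following math problem. You may write Python code inside "
    "<llm-code></llm-code> tags. Put the final answer inside \\boxed{{}}.\n\n"
    "Question: {question}\n\nSolution:"
)


def build_prompt(question: str) -> str:
    return PROMPT_TEMPLATE.format(question=question)


def generate(question: str,
             model_id: str = "nvidia/OpenMath-Mistral-7B-v0.1-hf",
             max_new_tokens: int = 512) -> str:
    """Greedy single-pass generation with Hugging Face transformers.

    Requires `transformers`, `torch`, and `accelerate` (for device_map="auto").
    Code blocks in the output are NOT executed here.
    """
    # Imported lazily so build_prompt works without these heavy dependencies
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the tokens generated after the prompt
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Usage, on a machine with enough GPU memory for a 7B model: `print(generate("What is the sum of the first 10 positive integers?"))`.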

# Reproducing our results

We provide [all instructions](https://github.com/Kipok/NeMo-Skills/blob/main/docs/reproducing-results.md) to fully reproduce our results.
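
Scoring a solution comes down to extracting its final answer and comparing it with the reference. A minimal sketch, assuming (as in OpenMathInstruct-1 solutions) that the final answer is wrapped in `\boxed{...}`; `extract_boxed_answer` and `accuracy` are hypothetical helpers, and exact string matching is a simplification of a real grader, which would check mathematical equivalence:

```python
from typing import List, Optional


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, handling nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # unbalanced braces: treat as "no answer"


def accuracy(generations: List[str], references: List[str]) -> float:
    """Fraction of generations whose boxed answer exactly matches the reference string."""
    correct = sum(
        extract_boxed_answer(gen) == ref for gen, ref in zip(generations, references)
    )
    return correct / len(references)


print(extract_boxed_answer(r"so the answer is \boxed{\frac{1}{2}}."))  # prints \frac{1}{2}
```

Counting braces rather than using a simple regex keeps nested LaTeX such as `\frac{1}{2}` intact.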

# Improving other models

To improve other models or to learn more about our code, read through the docs below.

- [NeMo-Skills Pipeline](https://github.com/Kipok/NeMo-Skills)
    - [Generating synthetic data](https://github.com/Kipok/NeMo-Skills/blob/main/docs/synthetic-data-generation.md)
    - [Finetuning models](https://github.com/Kipok/NeMo-Skills/blob/main/docs/finetuning.md)
    - [Evaluating models](https://github.com/Kipok/NeMo-Skills/blob/main/docs/evaluation.md)
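
A common way to turn sampled solutions into training pairs is to keep only those that reach the reference answer; the paper describes the exact filtering used for OpenMathInstruct-1. The sketch below is illustrative and not the NeMo-Skills implementation: the record fields (`question`, `generation`, `expected_answer`) and the `keep_correct` helper are hypothetical, and the answer extractor is passed in so any parser can be plugged in.

```python
from typing import Callable, Dict, List, Optional

# HYPOTHETICAL record layout: {"question": ..., "generation": ..., "expected_answer": ...}
Sample = Dict[str, str]


def keep_correct(samples: List[Sample],
                 extract_answer: Callable[[str], Optional[str]]) -> List[Sample]:
    """Keep samples whose extracted final answer matches the reference answer.

    Exact string comparison is a simplification; a real pipeline would compare
    answers for mathematical equivalence.
    """
    kept = []
    for sample in samples:
        predicted = extract_answer(sample["generation"])
        if predicted is not None and predicted == sample["expected_answer"]:
            kept.append(sample)
    return kept


# Toy example: the "answer" is simply the last whitespace-separated token
samples = [
    {"question": "What is 2 + 3?", "generation": "2 + 3 = 5", "expected_answer": "5"},
    {"question": "What is 2 + 3?", "generation": "2 + 3 = 6", "expected_answer": "5"},
]
kept = keep_correct(samples, lambda text: text.split()[-1] if text.split() else None)
print(len(kept))  # prints 1
```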

In our pipeline we use [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/generative-ai/nemo-framework/),
an end-to-end, cloud-native framework to build, customize, and deploy generative AI models anywhere.
It includes training and inference frameworks, guardrailing toolkits, data curation tools, and pretrained models,
offering enterprises an easy, cost-effective, and fast way to adopt generative AI.

# Citation

If you find our work useful, please consider citing us!

```bibtex
@article{toshniwal2024openmath,
  title   = {OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset},
  author  = {Shubham Toshniwal and Ivan Moshkov and Sean Narenthiran and Daria Gitman and Fei Jia and Igor Gitman},
  year    = {2024},
  journal = {arXiv preprint arXiv:2402.10176}
}
```

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_nvidia__OpenMath-Mistral-7B-v0.1-hf).

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |54.00|
|AI2 Reasoning Challenge (25-Shot)|59.39|
|HellaSwag (10-Shot)              |81.78|
|MMLU (5-Shot)                    |59.34|
|TruthfulQA (0-shot)              |46.13|
|Winogrande (5-shot)              |77.27|
|GSM8k (5-shot)                   | 0.08|
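
The `Avg.` row is the unweighted mean of the six benchmark scores, which can be checked directly from the table:

```python
# Scores copied from the leaderboard table above
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 59.39,
    "HellaSwag (10-Shot)": 81.78,
    "MMLU (5-Shot)": 59.34,
    "TruthfulQA (0-shot)": 46.13,
    "Winogrande (5-shot)": 77.27,
    "GSM8k (5-shot)": 0.08,
}
avg = sum(scores.values()) / len(scores)
print(f"{avg:.2f}")  # prints 54.00
```

Because the mean is unweighted, the near-zero GSM8k (5-shot) entry lowers it by roughly 10.8 points: the other five scores alone average about 64.78.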