Step-level Value Preference Optimization for Mathematical Reasoning

This is the official repository for the paper Step-level Value Preference Optimization for Mathematical Reasoning (SVPO). The code was extracted from our internal corporate codebase, so numbers reproduced with it may differ slightly from those reported in the paper, though they should be very close.

SVPO builds on the AlphaMath implementation and reuses its components, such as Monte Carlo Tree Search (MCTS) and step-level beam search (SBS). This repository therefore provides the code for constructing step-level preference pairs, to facilitate reproduction.
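As an illustration only (this is not the repository's code), here is a minimal sketch of one way step-level preference pairs could be read off a search tree. It assumes that each node holds one reasoning step and a Q-value backed up by MCTS, in the range [-1, 1] (for example +1 for a correct final answer and -1 for an incorrect one). It also assumes that sibling steps sharing the same prefix are compared, with the higher-valued step treated as "chosen". `StepNode`, `build_step_pairs` and `margin` are hypothetical names.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Tuple


@dataclass
class StepNode:
    """One reasoning step in a search tree (hypothetical structure)."""
    text: str
    q_value: float  # assumed: value estimate backed up by MCTS, in [-1, 1]
    children: List["StepNode"] = field(default_factory=list)


def build_step_pairs(
    node: StepNode, prefix: str, margin: float = 0.5
) -> List[Tuple[str, str, str]]:
    """Collect (prefix, chosen_step, rejected_step) triples from a tree.

    Siblings under `node` share `prefix`; a pair is kept only when their
    Q-values differ by at least `margin` (an assumed filtering threshold).
    """
    pairs: List[Tuple[str, str, str]] = []
    for a, b in combinations(node.children, 2):
        if abs(a.q_value - b.q_value) < margin:
            continue  # values too close: preference is ambiguous, skip
        chosen, rejected = (a, b) if a.q_value > b.q_value else (b, a)
        pairs.append((prefix, chosen.text, rejected.text))
    # Recurse so that deeper steps are compared under their own prefixes.
    for child in node.children:
        pairs.extend(build_step_pairs(child, prefix + child.text, margin))
    return pairs


# Toy tree: the root holds the question, children are candidate steps.
root = StepNode("Q: 2+3*4? ", 0.0, [
    StepNode("Step: 3*4=12. ", 0.9, [
        StepNode("Answer: 14", 1.0),
        StepNode("Answer: 15", -1.0),
    ]),
    StepNode("Step: 2+3=5. ", -0.8),
])
pairs = build_step_pairs(root, root.text)
for p in pairs:
    print(p)
```

Comparing only siblings keeps the context identical within each pair, so any preference signal is attributable to the single differing step. The `margin` filter is one simple way to drop pairs whose value estimates are too close to be trustworthy.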

Citation

SVPO

@misc{chen2024steplevel,
      title={Step-level Value Preference Optimization for Mathematical Reasoning}, 
      author={Guoxin Chen and Minpeng Liao and Chengxi Li and Kai Fan},
      year={2024},
      eprint={2406.10858},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

AlphaMath

@misc{chen2024alphamath,
      title={AlphaMath Almost Zero: Process Supervision without Process}, 
      author={Guoxin Chen and Minpeng Liao and Chengxi Li and Kai Fan},
      year={2024},
      eprint={2405.03553},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Model size: 6.91B params
Tensor type: BF16 (Safetensors)