ICLR 2024

Official checkpoint for Tool-Augmented Reward Modeling (ICLR 2024 spotlight).

Model Description

Themis is a tool-augmented preference model that addresses limitations of conventional reward models (RMs) by giving them access to external environments, including calculators and search engines. It was introduced in the ICLR 2024 paper and first released in this repository. Themis-7b is trained with TARA and achieves an overall improvement of 17.7% in preference ranking across eight tasks.
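To make the idea of a tool-augmented reward concrete, here is a toy sketch. It is not the Themis implementation or its API: the `calculator`, `toy_reward`, and `rank` functions are hypothetical names invented for illustration, and the "reward" is a simple rule rather than a learned model. It only shows how consulting an external tool can help rank candidate answers that a scorer might otherwise get wrong.

```python
import ast
import operator

# Hypothetical calculator "tool": safely evaluates +, -, *, / expressions.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> float:
    """Evaluate a basic arithmetic expression without using eval()."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


def toy_reward(expression: str, claimed_answer: float) -> float:
    """Toy rule (an assumption, not a learned RM): 1.0 if the claimed
    answer agrees with the tool's result, otherwise 0.0."""
    observed = calculator(expression)
    return 1.0 if abs(observed - claimed_answer) < 1e-9 else 0.0


def rank(expression: str, answers: list) -> list:
    """Order candidate answers from highest to lowest toy reward."""
    return sorted(answers, key=lambda a: toy_reward(expression, a), reverse=True)


# Two candidate answers to the same arithmetic question.
ranked = rank("12 * 7 + 5", [79.0, 89.0])
print(ranked)
```

In this sketch the tool call happens before scoring, so the ranking depends on the tool's observation rather than on the scorer's own arithmetic.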

🔥 News

  • 9 February, 2024: 🎉 We release the official codebase and model weights of baidu/Themis-7b. Stay tuned! 🔥
  • 16 January, 2024: 🎉 Our work has been accepted to ICLR 2024 spotlight! ✨

Citation

@inproceedings{tarm-2024-ernie,
  author = {Lei Li and
            Yekun Chai and
            Shuohuan Wang and
            Yu Sun and
            Hao Tian and
            Ningyu Zhang and
            Hua Wu},
  title = {Tool-Augmented Reward Modeling},
  booktitle = {The Twelfth International Conference on Learning Representations (ICLR)},
  year = {2024},
  url = {https://openreview.net/forum?id=d94x0gWTUX},
}