Will Brooks
TornButter
AI & ML interests
None yet
Recent Activity
Liked a model 5 days ago: openbmb/MiniCPM-o-2_6
Reacted to MoritzLaurer's post with 🔥 12 days ago:
The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, for training models similar to o1, and tool call support:
🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of intermediate steps, promoting structured reasoning. Perfect for tasks like stepwise reasoning.
Model merging: A new callback leverages mergekit to merge models during training, improving performance by blending reference and policy models, and optionally pushing merged models to the Hugging Face Hub.
🛠️ Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning with examples like dynamic temperature fetching in prompts.
⚖️ Mixture of judges: The new AllTrueJudge combines decisions from multiple binary judges for more nuanced evaluation.
Read the release notes and other resources here:
Release: https://github.com/huggingface/trl/releases/tag/v0.13.0
Mergekit: https://github.com/arcee-ai/mergekit
Mixture of judges paper: https://huggingface.co/papers/2409.20370
Liked a model 13 days ago: hexgrad/Kokoro-82M
Organizations
None yet
models
None public yet
datasets
None public yet