Will Brooks

TornButter

AI & ML interests

None yet

Recent Activity

liked a model 5 days ago

openbmb/MiniCPM-o-2_6

reacted to MoritzLaurer's post with 🔥 12 days ago

The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, for training models similar to o1, and tool call support:

🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of intermediate steps and so promote structured reasoning. Perfect for tasks like stepwise reasoning.

🔀 Model merging: A new callback uses mergekit to merge models during training, improving performance by blending the reference and policy models, and can optionally push merged models to the Hugging Face Hub.

🛠️ Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning, with examples such as dynamically fetching the temperature in prompts.

⚖️ Mixture of judges: The new AllTrueJudge combines decisions from multiple binary judges for more nuanced evaluation.

Read the release notes and other resources here 👇
Release: https://github.com/huggingface/trl/releases/tag/v0.13.0
Mergekit: https://github.com/arcee-ai/mergekit
Mixture of judges paper: https://huggingface.co/papers/2409.20370

liked a model 13 days ago

hexgrad/Kokoro-82M

Organizations

None yet

Models

None public yet

Datasets

None public yet