Mayank Singh

mankness

mayanksingh09

AI & ML interests

None yet

Recent Activity

reacted to lewtun's post with 🔥 24 days ago

We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think".

We're open sourcing the full recipe and sharing a detailed blog post. In our blog post we cover:

📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.

🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.

🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM.

Here are the links:
- Blog post: https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute
- Code: https://github.com/huggingface/search-and-learn

Enjoy!

Organizations

None yet

mankness's activity

upvoted a paper 3 months ago

Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19, 2024 • 136

upvoted 3 papers 11 months ago

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 606

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 146

Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6, 2024 • 114