- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 33
- Token-Budget-Aware LLM Reasoning
  Paper • 2412.18547 • Published • 41
- Efficiently Serving LLM Reasoning Programs with Certaindex
  Paper • 2412.20993 • Published • 28
- B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
  Paper • 2412.17256 • Published • 42
Kai Zuberbühler
kaizuberbuehler
AI & ML interests
language models, agents, image generation, music generation
Recent Activity
- updated a collection about 13 hours ago: Benchmarks
- updated a collection about 13 hours ago: Agents
- upvoted a paper about 13 hours ago: The BrowserGym Ecosystem for Web Agent Research
Organizations
None yet
Collections
21
- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 186
- MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
  Paper • 2311.16502 • Published • 35
- BLINK: Multimodal Large Language Models Can See but Not Perceive
  Paper • 2404.12390 • Published • 24
- RULER: What's the Real Context Size of Your Long-Context Language Models?
  Paper • 2404.06654 • Published • 34
Spaces
1
Datasets
None public yet