Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2501.07572

Running

2

🥇

WebWalkerQALeaderboard
callanwu/WebWalkerQA

Viewer • Updated 6 days ago • 14.3k • 66 • 3
Running

3

🌐

WebWalker
WebWalker: Benchmarking LLMs in Web Traversal

Paper • 2501.07572 • Published 6 days ago • 18

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published 18 days ago • 95
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

Paper • 2501.01257 • Published 17 days ago • 47
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Paper • 2501.01423 • Published 17 days ago • 36
REDUCIO! Generating 1024times1024 Video within 16 Seconds using Extremely Compressed Motion Latents

Paper • 2411.13552 • Published Nov 20, 2024

WebWalker: Benchmarking LLMs in Web Traversal

Paper • 2501.07572 • Published 6 days ago • 18

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published 11 days ago • 232
Learning an evolved mixture model for task-free continual learning

Paper • 2207.05080 • Published Jul 11, 2022 • 1
EVOLvE: Evaluating and Optimizing LLMs For Exploration

Paper • 2410.06238 • Published Oct 8, 2024 • 1
Smaller Language Models Are Better Instruction Evolvers

Paper • 2412.11231 • Published Dec 15, 2024 • 27

GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Paper • 2411.18499 • Published Nov 27, 2024 • 18
VLSBench: Unveiling Visual Leakage in Multimodal Safety

Paper • 2411.19939 • Published Nov 29, 2024 • 9
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?

Paper • 2412.02611 • Published Dec 3, 2024 • 23
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs

Paper • 2412.03205 • Published Dec 4, 2024 • 16

StdGEN: Semantic-Decomposed 3D Character Generation from Single Images

Paper • 2411.05738 • Published Nov 8, 2024 • 14
A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents

Paper • 2410.22476 • Published Oct 29, 2024 • 25
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 46
Training-free Regional Prompting for Diffusion Transformers

Paper • 2411.02395 • Published Nov 4, 2024 • 25

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 58
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 52
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 42
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20, 2024 • 54

about 8 hours ago

More Agents Is All You Need

Paper • 2402.05120 • Published Feb 3, 2024 • 52
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

Paper • 2402.07456 • Published Feb 12, 2024 • 43
Generative Agents: Interactive Simulacra of Human Behavior

Paper • 2304.03442 • Published Apr 7, 2023 • 12
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Paper • 2310.04406 • Published Oct 6, 2023 • 8

about 8 hours ago

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 187
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Paper • 2311.16502 • Published Nov 27, 2023 • 35
BLINK: Multimodal Large Language Models Can See but Not Perceive

Paper • 2404.12390 • Published Apr 18, 2024 • 25
RULER: What's the Real Context Size of Your Long-Context Language Models?

Paper • 2404.06654 • Published Apr 9, 2024 • 35

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs