Collections
Discover the best community collections!
Collections including paper arxiv:2501.07572
-
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 95 -
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Paper • 2501.01257 • Published • 47 -
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
Paper • 2501.01423 • Published • 36 -
REDUCIO! Generating 1024times1024 Video within 16 Seconds using Extremely Compressed Motion Latents
Paper • 2411.13552 • Published
-
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 232 -
Learning an evolved mixture model for task-free continual learning
Paper • 2207.05080 • Published • 1 -
EVOLvE: Evaluating and Optimizing LLMs For Exploration
Paper • 2410.06238 • Published • 1 -
Smaller Language Models Are Better Instruction Evolvers
Paper • 2412.11231 • Published • 27
-
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Paper • 2411.18499 • Published • 18 -
VLSBench: Unveiling Visual Leakage in Multimodal Safety
Paper • 2411.19939 • Published • 9 -
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Paper • 2412.02611 • Published • 23 -
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs
Paper • 2412.03205 • Published • 16
-
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
Paper • 2411.05738 • Published • 14 -
A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents
Paper • 2410.22476 • Published • 25 -
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Paper • 2410.23218 • Published • 46 -
Training-free Regional Prompting for Diffusion Transformers
Paper • 2411.02395 • Published • 25
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 58 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 52 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 42 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 54
-
More Agents Is All You Need
Paper • 2402.05120 • Published • 52 -
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper • 2402.07456 • Published • 43 -
Generative Agents: Interactive Simulacra of Human Behavior
Paper • 2304.03442 • Published • 12 -
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406 • Published • 8
-
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 187 -
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Paper • 2311.16502 • Published • 35 -
BLINK: Multimodal Large Language Models Can See but Not Perceive
Paper • 2404.12390 • Published • 25 -
RULER: What's the Real Context Size of Your Long-Context Language Models?
Paper • 2404.06654 • Published • 35