Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models Paper • 2410.02740 • Published Oct 3, 2024 • 52
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging Paper • 2410.01215 • Published Oct 2, 2024 • 30
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 104
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs Paper • 2409.14988 • Published Sep 23, 2024 • 22
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published Sep 19, 2024 • 36
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published Sep 6, 2024 • 43
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise Paper • 2410.03017 • Published Oct 3, 2024 • 26
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1, 2024 • 144
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations Paper • 2410.02707 • Published Oct 3, 2024 • 47
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References Paper • 2410.05193 • Published Oct 7, 2024 • 13
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published Nov 7, 2024 • 111
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning Paper • 2411.05003 • Published Nov 7, 2024 • 70
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion Paper • 2411.04928 • Published Nov 7, 2024 • 48
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation Paper • 2411.04999 • Published Nov 7, 2024 • 16
From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond Paper • 2411.03590 • Published Nov 6, 2024 • 9
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level Paper • 2411.03562 • Published Nov 5, 2024 • 63
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems Paper • 2411.02959 • Published Nov 5, 2024 • 64
Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge Paper • 2411.02657 • Published Nov 4, 2024 • 5
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents Paper • 2410.24024 • Published Oct 31, 2024 • 48
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published Nov 4, 2024 • 33
Survey of Cultural Awareness in Language Models: Text and Beyond Paper • 2411.00860 • Published Oct 30, 2024 • 23
Training-free Regional Prompting for Diffusion Transformers Paper • 2411.02395 • Published Nov 4, 2024 • 25
DynaSaur: Large Language Agents Beyond Predefined Actions Paper • 2411.01747 • Published Nov 4, 2024 • 20
Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models Paper • 2411.00492 • Published Nov 1, 2024 • 6
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published Oct 30, 2024 • 46
DELTA: Dense Efficient Long-range 3D Tracking for any video Paper • 2410.24211 • Published Oct 31, 2024 • 8
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning Paper • 2410.21845 • Published Oct 29, 2024 • 13
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset Paper • 2410.22325 • Published Oct 29, 2024 • 10
Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published Oct 14, 2024 • 54
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant Paper • 2410.18603 • Published Oct 24, 2024 • 32
LongReward: Improving Long-context Large Language Models with AI Feedback Paper • 2410.21252 • Published Oct 28, 2024 • 17
Teach Multimodal LLMs to Comprehend Electrocardiographic Images Paper • 2410.19008 • Published Oct 21, 2024 • 23
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch Paper • 2410.18693 • Published Oct 24, 2024 • 40
WorldSimBench: Towards Video Generation Models as World Simulators Paper • 2410.18072 • Published Oct 23, 2024 • 18
DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes Paper • 2410.18084 • Published Oct 23, 2024 • 13
SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes Paper • 2410.17249 • Published Oct 22, 2024 • 41
AutoTrain: No-code training for state-of-the-art models Paper • 2410.15735 • Published Oct 21, 2024 • 59
FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors Paper • 2410.16271 • Published Oct 21, 2024 • 81
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation Paper • 2410.13232 • Published Oct 17, 2024 • 41
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17, 2024 • 75
MobA: A Two-Level Agent System for Efficient Mobile Task Automation Paper • 2410.13757 • Published Oct 17, 2024 • 32
Exploring Model Kinship for Merging Large Language Models Paper • 2410.12613 • Published Oct 16, 2024 • 20
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Paper • 2410.10814 • Published Oct 14, 2024 • 49
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts Paper • 2410.10626 • Published Oct 14, 2024 • 38
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 47
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use Paper • 2411.10323 • Published Nov 15, 2024 • 31
Sharingan: Extract User Action Sequence from Desktop Recordings Paper • 2411.08768 • Published Nov 13, 2024 • 10
Hermes: A Large Language Model Framework on the Journey to Autonomous Networks Paper • 2411.06490 • Published Nov 10, 2024 • 6
Large Language Models Can Self-Improve in Long-context Reasoning Paper • 2411.08147 • Published Nov 12, 2024 • 62
CamemBERT 2.0: A Smarter French Language Model Aged to Perfection Paper • 2411.08868 • Published Nov 13, 2024 • 12
GRAPE: Generalizing Robot Policy via Preference Alignment Paper • 2411.19309 • Published Nov 28, 2024 • 42
On Domain-Specific Post-Training for Multimodal Large Language Models Paper • 2411.19930 • Published Nov 29, 2024 • 25
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving Paper • 2411.15139 • Published Nov 22, 2024 • 15
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published Nov 26, 2024 • 76
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published Nov 26, 2024 • 47
Patience Is The Key to Large Language Model Reasoning Paper • 2411.13082 • Published Nov 20, 2024 • 7
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published 24 days ago • 50
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training Paper • 2412.09619 • Published 22 days ago • 20