Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 17 days ago • 116
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 15 days ago • 112
The Open Source Advantage in Large Language Models (LLMs) Paper • 2412.12004 • Published 18 days ago • 9
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Paper • 2412.09604 • Published 22 days ago • 35
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 21 days ago • 136
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions Paper • 2412.08737 • Published 23 days ago • 52
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published 22 days ago • 92
POINTS1.5: Building a Vision-Language Model towards Real World Applications Paper • 2412.08443 • Published 23 days ago • 38
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations Paper • 2412.08580 • Published 23 days ago • 45
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published 24 days ago • 50
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published 24 days ago • 46
Evaluating and Aligning CodeLLMs on Human Preference Paper • 2412.05210 • Published 28 days ago • 47
STIV: Scalable Text and Image Conditioned Video Generation Paper • 2412.07730 • Published 24 days ago • 70
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 25 days ago • 64
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published 25 days ago • 72
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation Paper • 2412.06531 • Published 26 days ago • 71
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published 28 days ago • 46
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases Paper • 2412.04862 • Published 29 days ago • 48