Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs • arXiv:2412.21187 • Published 4 days ago
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey • arXiv:2412.18619 • Published 19 days ago
Looking Inward: Language Models Can Learn About Themselves by Introspection • arXiv:2410.13787 • Published Oct 17, 2024
Deliberation in Latent Space via Differentiable Cache Augmentation • arXiv:2412.17747 • Published 11 days ago
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response • arXiv:2412.14922 • Published 15 days ago
LlamaFusion: Adapting Pretrained Language Models for Multimodal Generation • arXiv:2412.15188 • Published 15 days ago
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval • arXiv:2412.14475 • Published 16 days ago
Progressive Multimodal Reasoning via Active Retrieval • arXiv:2412.14835 • Published 16 days ago
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN • arXiv:2412.13795 • Published 17 days ago
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations • arXiv:2412.13171 • Published 17 days ago
Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers • arXiv:2412.12276 • Published 18 days ago
Byte Latent Transformer: Patches Scale Better Than Tokens • arXiv:2412.09871 • Published 22 days ago
Apollo: An Exploration of Video Understanding in Large Multimodal Models • arXiv:2412.10360 • Published 21 days ago