- FLAME: Factuality-Aware Alignment for Large Language Models
  Paper • 2405.01525 • Published • 26
- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
  Paper • 2405.14333 • Published • 37
- Transformers Can Do Arithmetic with the Right Embeddings
  Paper • 2405.17399 • Published • 52
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
  Paper • 2405.18991 • Published • 12

Collections including paper arxiv:2406.18219

- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
  Paper • 1701.06538 • Published • 5
- Sparse Networks from Scratch: Faster Training without Losing Performance
  Paper • 1907.04840 • Published • 3
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
  Paper • 1910.02054 • Published • 4
- A Mixture of h-1 Heads is Better than h Heads
  Paper • 2005.06537 • Published • 2

- Unlocking Continual Learning Abilities in Language Models
  Paper • 2406.17245 • Published • 29
- A Closer Look into Mixture-of-Experts in Large Language Models
  Paper • 2406.18219 • Published • 16
- Symbolic Learning Enables Self-Evolving Agents
  Paper • 2406.18532 • Published • 12
- Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
  Paper • 2406.18629 • Published • 42

- A Closer Look into Mixture-of-Experts in Large Language Models
  Paper • 2406.18219 • Published • 16
- VisionZip: Longer is Better but Not Necessary in Vision Language Models
  Paper • 2412.04467 • Published • 105
- p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
  Paper • 2412.04449 • Published • 6
- ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
  Paper • 2412.14711 • Published • 15

- TroL: Traversal of Layers for Large Language and Vision Models
  Paper • 2406.12246 • Published • 35
- A Closer Look into Mixture-of-Experts in Large Language Models
  Paper • 2406.18219 • Published • 16
- ThinK: Thinner Key Cache by Query-Driven Pruning
  Paper • 2407.21018 • Published • 31
- Meltemi: The first open Large Language Model for Greek
  Paper • 2407.20743 • Published • 68

- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 67
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 129
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 53
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 87

- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 17
- Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation
  Paper • 2401.15688 • Published • 11
- SliceGPT: Compress Large Language Models by Deleting Rows and Columns
  Paper • 2401.15024 • Published • 70
- From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
  Paper • 2401.15071 • Published • 35

- AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
  Paper • 2402.00769 • Published • 22
- LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
  Paper • 2311.05556 • Published • 82
- LongAlign: A Recipe for Long Context Alignment of Large Language Models
  Paper • 2401.18058 • Published • 20
- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 17

- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 16
- Mixtral of Experts
  Paper • 2401.04088 • Published • 158
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 70
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 45