- You Only Cache Once: Decoder-Decoder Architectures for Language Models
  Paper • 2405.05254 • Published • 10
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 607
- BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
  Paper • 2406.04333 • Published • 37
- 1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs
  Paper • 2410.16144 • Published • 3

Collections
Collections including paper arxiv:2405.05254
- Addition is All You Need for Energy-efficient Language Models
  Paper • 2410.00907 • Published • 145
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 607
- LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
  Paper • 2404.16710 • Published • 77
- Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
  Paper • 2405.08707 • Published • 30

- xLSTM: Extended Long Short-Term Memory
  Paper • 2405.04517 • Published • 12
- You Only Cache Once: Decoder-Decoder Architectures for Language Models
  Paper • 2405.05254 • Published • 10
- Understanding the performance gap between online and offline alignment algorithms
  Paper • 2405.08448 • Published • 17
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 129

- Condition-Aware Neural Network for Controlled Image Generation
  Paper • 2404.01143 • Published • 12
- FlexiDreamer: Single Image-to-3D Generation with FlexiCubes
  Paper • 2404.00987 • Published • 22
- Advancing LLM Reasoning Generalists with Preference Trees
  Paper • 2404.02078 • Published • 44
- ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
  Paper • 2404.02893 • Published • 21

- Can large language models explore in-context?
  Paper • 2403.15371 • Published • 32
- GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
  Paper • 2403.19655 • Published • 19
- WavLLM: Towards Robust and Adaptive Speech Large Language Model
  Paper • 2404.00656 • Published • 11
- Enabling Memory Safety of C Programs using LLMs
  Paper • 2404.01096 • Published • 1

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 126
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 51
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 14
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 65

- Nemotron-4 15B Technical Report
  Paper • 2402.16819 • Published • 43
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 53
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 15
- Reformer: The Efficient Transformer
  Paper • 2001.04451 • Published

- AutoMix: Automatically Mixing Language Models
  Paper • 2310.12963 • Published • 14
- Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
  Paper • 2310.03094 • Published • 12
- MatFormer: Nested Transformer for Elastic Inference
  Paper • 2310.07707 • Published • 1
- DistillSpec: Improving Speculative Decoding via Knowledge Distillation
  Paper • 2310.08461 • Published • 1