VoladorLuYu's Collections
Efficient LLM
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Paper • 2401.10774 • Published • 54
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
Paper • 2401.06761 • Published • 1
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Paper • 2401.02669 • Published • 14
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 52
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Paper • 2401.15077 • Published • 19
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent
Paper • 2401.07324 • Published • 3
Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling
Paper • 2402.10211 • Published • 11
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper • 2402.13753 • Published • 114
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
Paper • 2402.13720 • Published • 6
Paper • 2402.13144 • Published • 95
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Paper • 2401.18058 • Published • 20
LongHeads: Multi-Head Attention is Secretly a Long Context Processor
Paper • 2402.10685 • Published • 1
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper • 2401.01325 • Published • 27
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
Paper • 2401.06951 • Published • 25
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory
Paper • 2402.04617 • Published • 4
Speculative Streaming: Fast LLM Inference without Auxiliary Models
Paper • 2402.11131 • Published • 42
Towards Optimal Learning of Language Models
Paper • 2402.17759 • Published • 16
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Paper • 2402.17193 • Published • 23
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 183
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
Paper • 2403.00818 • Published • 15
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Paper • 2307.02486 • Published • 80
Recurrent Drafter for Fast Speculative Decoding in Large Language Models
Paper • 2403.09919 • Published • 20
DiJiang: Efficient Large Language Models through Compact Kernelization
Paper • 2403.19928 • Published • 10
ReFT: Representation Finetuning for Language Models
Paper • 2404.03592 • Published • 91
Rethinking Optimization and Architecture for Tiny Language Models
Paper • 2402.02791 • Published • 12
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper • 2402.14905 • Published • 126
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Paper • 2404.02258 • Published • 104
Pre-training Small Base LMs with Fewer Tokens
Paper • 2404.08634 • Published • 34
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Paper • 2404.08197 • Published • 27
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 88
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • 2404.08801 • Published • 64
Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models
Paper • 2307.14430 • Published • 3
Compression Represents Intelligence Linearly
Paper • 2404.09937 • Published • 27
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper • 2401.01335 • Published • 64
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 46
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Paper • 2405.11582 • Published • 13
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition
Paper • 2310.05492 • Published • 2
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Paper • 2404.13208 • Published • 38
Unlocking Continual Learning Abilities in Language Models
Paper • 2406.17245 • Published • 28