Gmc2's Collections: Long context LLM
- Sequence Parallelism: Long Sequence Training from System Perspective (arXiv:2105.13120)
- Ring Attention with Blockwise Transformers for Near-Infinite Context (arXiv:2310.01889)
- Striped Attention: Faster Ring Attention for Causal Transformers (arXiv:2311.09431)
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models (arXiv:2309.14509)
- LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers (arXiv:2310.03294)
- BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences (arXiv:2403.09347)
- Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models (arXiv:2402.02244)
- Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache (arXiv:2401.02669)
- Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey (arXiv:2311.12351)
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning (arXiv:2401.01325)
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (arXiv:2402.13753)
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (arXiv:2404.07143)
- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (arXiv:2404.08801)
- Longformer: The Long-Document Transformer (arXiv:2004.05150)
- Generating Long Sequences with Sparse Transformers (arXiv:1904.10509)
- A Unified Sequence Parallelism Approach for Long Context Generative AI (arXiv:2405.07719)
- YaRN: Efficient Context Window Extension of Large Language Models (arXiv:2309.00071)
- LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism (arXiv:2406.18485)