Collections including paper arxiv:2402.10790

- Adapting Language Models to Compress Contexts
  Paper • 2305.14788 • Published • 1
- Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
  Paper • 2401.03462 • Published • 27
- Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization
  Paper • 2401.07793 • Published • 3
- Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression
  Paper • 2402.16058 • Published
---

- TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
  Paper • 2402.13249 • Published • 11
- The FinBen: An Holistic Financial Benchmark for Large Language Models
  Paper • 2402.12659 • Published • 17
- Instruction-tuned Language Models are Better Knowledge Learners
  Paper • 2402.12847 • Published • 25
- Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
  Paper • 2402.13064 • Published • 47
---

- Transformers Can Achieve Length Generalization But Not Robustly
  Paper • 2402.09371 • Published • 13
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
  Paper • 2402.10379 • Published • 30
- In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
  Paper • 2402.10790 • Published • 41
---

- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 114
- Data Engineering for Scaling Language Models to 128K Context
  Paper • 2402.10171 • Published • 23
- LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
  Paper • 2402.11550 • Published • 16
- The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
  Paper • 2401.07872 • Published • 2
---

- InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory
  Paper • 2402.04617 • Published • 4
- BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
  Paper • 2403.09347 • Published • 20
- Resonance RoPE: Improving Context Length Generalization of Large Language Models
  Paper • 2403.00071 • Published • 22
- Training-Free Long-Context Scaling of Large Language Models
  Paper • 2402.17463 • Published • 19
---

- In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
  Paper • 2402.10790 • Published • 41
- LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
  Paper • 2402.10524 • Published • 22
- CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
  Paper • 2410.16256 • Published • 59
- Code Llama: Open Foundation Models for Code
  Paper • 2308.12950 • Published • 24
---

- In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
  Paper • 2402.10790 • Published • 41
- LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
  Paper • 2402.11550 • Published • 16
- A Neural Conversational Model
  Paper • 1506.05869 • Published • 2
- Data Engineering for Scaling Language Models to 128K Context
  Paper • 2402.10171 • Published • 23
---

- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 40
- LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
  Paper • 2403.15042 • Published • 25
- MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets
  Paper • 2403.03194 • Published • 12
- Orca-Math: Unlocking the potential of SLMs in Grade School Math
  Paper • 2402.14830 • Published • 24