Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 52
Note: similar to https://huggingface.co/papers/2402.18668

Simple linear attention language models balance the recall-throughput tradeoff
Paper • 2402.18668 • Published • 18

ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
Paper • 2402.15220 • Published • 19

Linear Transformers are Versatile In-Context Learners
Paper • 2402.14180 • Published • 6

Scaling Laws for Fine-Grained Mixture of Experts
Paper • 2402.07871 • Published • 11

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Paper • 2402.07033 • Published • 16

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Paper • 2401.18079 • Published • 7
Note: somewhat similar to https://arxiv.org/pdf/2402.02750.pdf

StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Paper • 2402.01391 • Published • 41

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Paper • 2402.01739 • Published • 26
Note: see also QMoE (https://arxiv.org/pdf/2310.16795.pdf)

SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper • 2401.15024 • Published • 69

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 73

Repeat After Me: Transformers are Better than State Space Models at Copying
Paper • 2402.01032 • Published • 22

LongAlign: A Recipe for Long Context Alignment of Large Language Models
Paper • 2401.18058 • Published • 20

Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 22

WARM: On the Benefits of Weight Averaged Reward Models
Paper • 2401.12187 • Published • 18

Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
Paper • 2401.12070 • Published • 43

Zero Bubble Pipeline Parallelism
Paper • 2401.10241 • Published • 23

Self-Rewarding Language Models
Paper • 2401.10020 • Published • 145

Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 45

ReFT: Reasoning with Reinforced Fine-Tuning
Paper • 2401.08967 • Published • 29

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
Paper • 2401.06951 • Published • 25

Tuning Language Models by Proxy
Paper • 2401.08565 • Published • 21

Extending LLMs' Context Window with 100 Samples
Paper • 2401.07004 • Published • 15

Secrets of RLHF in Large Language Models Part II: Reward Modeling
Paper • 2401.06080 • Published • 26

Efficient LLM inference solution on Intel GPU
Paper • 2401.05391 • Published • 9

The Impact of Reasoning Step Length on Large Language Models
Paper • 2401.04925 • Published • 16

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Paper • 2401.04658 • Published • 25

Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper • 2401.02994 • Published • 49

Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
Paper • 2401.03462 • Published • 27

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper • 2401.01335 • Published • 64

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper • 2401.01325 • Published • 27

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper • 2312.15166 • Published • 56

PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Paper • 2312.12456 • Published • 40

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Paper • 2401.15077 • Published • 19

OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 82

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Paper • 2402.00159 • Published • 61

Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Paper • 2402.04248 • Published • 30

LiPO: Listwise Preference Optimization through Learning-to-Rank
Paper • 2402.01878 • Published • 19

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Paper • 2402.04291 • Published • 48

Direct Language Model Alignment from Online AI Feedback
Paper • 2402.04792 • Published • 29

Hydragen: High-Throughput LLM Inference with Shared Prefixes
Paper • 2402.05099 • Published • 19

Model Editing with Canonical Examples
Paper • 2402.06155 • Published • 11

SubGen: Token Generation in Sublinear Time and Memory
Paper • 2402.06082 • Published • 10

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
Paper • 2402.06332 • Published • 18

ODIN: Disentangled Reward Mitigates Hacking in RLHF
Paper • 2402.07319 • Published • 13

AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts
Paper • 2402.07625 • Published • 12

Suppressing Pink Elephants with Direct Principle Feedback
Paper • 2402.07896 • Published • 9

Buffer Overflow in Mixture of Experts
Paper • 2402.05526 • Published • 8

Speculative Streaming: Fast LLM Inference without Auxiliary Models
Paper • 2402.11131 • Published • 42

Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Paper • 2402.10644 • Published • 79

LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
Paper • 2402.11550 • Published • 16

BitDelta: Your Fine-Tune May Only Be Worth One Bit
Paper • 2402.10193 • Published • 19

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper • 2402.13753 • Published • 114

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 605

FuseChat: Knowledge Fusion of Chat Models
Paper • 2402.16107 • Published • 36

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper • 2402.15627 • Published • 34

Do Large Language Models Latently Perform Multi-Hop Reasoning?
Paper • 2402.16837 • Published • 24

Orca-Math: Unlocking the potential of SLMs in Grade School Math
Paper • 2402.14830 • Published • 24

GPTVQ: The Blessing of Dimensionality for LLM Quantization
Paper • 2402.15319 • Published • 19

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Paper • 2402.14083 • Published • 47

TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Paper • 2402.14289 • Published • 19

OneBit: Towards Extremely Low-bit Large Language Models
Paper • 2402.11295 • Published • 23

AtP*: An efficient and scalable method for localizing LLM behaviour to components
Paper • 2403.00745 • Published • 12

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Paper • 2403.07816 • Published • 39

MoAI: Mixture of All Intelligence for Large Language and Vision Models
Paper • 2403.07508 • Published • 74

Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
Paper • 2403.06504 • Published • 53

ReALM: Reference Resolution As Language Modeling
Paper • 2403.20329 • Published • 21

sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 40

Long-form factuality in large language models
Paper • 2403.18802 • Published • 24

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Paper • 2403.13372 • Published • 62

Evolutionary Optimization of Model Merging Recipes
Paper • 2403.13187 • Published • 50

PERL: Parameter Efficient Reinforcement Learning from Human Feedback
Paper • 2403.10704 • Published • 57

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • 2404.08801 • Published • 64

Pre-training Small Base LMs with Fewer Tokens
Paper • 2404.08634 • Published • 34

Dataset Reset Policy Optimization for RLHF
Paper • 2404.08495 • Published • 8

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper • 2405.11143 • Published • 34

Bootstrapping Language Models with DPO Implicit Rewards
Paper • 2406.09760 • Published • 38

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Paper • 2408.08152 • Published • 52

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 58