- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 142
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 87
- TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
  Paper • 2305.07759 • Published • 33
- Scaling Synthetic Data Creation with 1,000,000,000 Personas
  Paper • 2406.20094 • Published • 98

Collections including paper arxiv:2304.12244

- AgentInstruct: Toward Generative Teaching with Agentic Flows
  Paper • 2407.03502 • Published • 51
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
  Paper • 2406.08464 • Published • 67
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 256
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
  Paper • 2402.10379 • Published • 31

- Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
  Paper • 2401.16380 • Published • 49
- Best Practices and Lessons Learned on Synthetic Data for Language Models
  Paper • 2404.07503 • Published • 30
- WizardLM: Empowering Large Language Models to Follow Complex Instructions
  Paper • 2304.12244 • Published • 14
- Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
  Paper • 2402.13064 • Published • 48

- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 53
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 147
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 22
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 47

- Attention Is All You Need
  Paper • 1706.03762 • Published • 50
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 16
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14

- Attention Is All You Need
  Paper • 1706.03762 • Published • 50
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- Mixtral of Experts
  Paper • 2401.04088 • Published • 158
- Mistral 7B
  Paper • 2310.06825 • Published • 47

- Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
  Paper • 2310.13961 • Published • 5
- Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
  Paper • 2309.09582 • Published • 4
- Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
  Paper • 2310.13127 • Published • 12
- Evaluating the Robustness to Instructions of Large Language Models
  Paper • 2308.14306 • Published • 1

- Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
  Paper • 2310.13961 • Published • 5
- ZeroGen: Efficient Zero-shot Learning via Dataset Generation
  Paper • 2202.07922 • Published • 1
- Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
  Paper • 2310.13671 • Published • 19
- Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
  Paper • 2309.09582 • Published • 4

- WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
  Paper • 2308.09583 • Published • 7
- WizardCoder: Empowering Code Large Language Models with Evol-Instruct
  Paper • 2306.08568 • Published • 28
- WizardLM: Empowering Large Language Models to Follow Complex Instructions
  Paper • 2304.12244 • Published • 14