- Scaling Laws for Neural Language Models (Paper • 2001.08361 • Published • 7)
- Scaling Laws for Autoregressive Generative Modeling (Paper • 2010.14701 • Published)
- Training Compute-Optimal Large Language Models (Paper • 2203.15556 • Published • 10)
- A Survey on Data Selection for Language Models (Paper • 2402.16827 • Published • 4)

Collections including paper arxiv:2402.16827
- A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity (Paper • 2305.13169 • Published • 3)
- A Survey on Data Selection for Language Models (Paper • 2402.16827 • Published • 4)
- HuggingFaceFW/fineweb-edu (Dataset • Updated • 226k • 582)
- allenai/MADLAD-400 (Dataset • Updated • 31.3k • 132)

- Yi: Open Foundation Models by 01.AI (Paper • 2403.04652 • Published • 62)
- A Survey on Data Selection for Language Models (Paper • 2402.16827 • Published • 4)
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research (Paper • 2402.00159 • Published • 61)
- The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only (Paper • 2306.01116 • Published • 32)

- A Survey on Data Selection for Language Models (Paper • 2402.16827 • Published • 4)
- Instruction Tuning with Human Curriculum (Paper • 2310.09518 • Published • 3)
- Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs (Paper • 2312.05934 • Published • 1)
- Language Models as Agent Models (Paper • 2212.01681 • Published)

- Effective pruning of web-scale datasets based on complexity of concept clusters (Paper • 2401.04578 • Published)
- How to Train Data-Efficient LLMs (Paper • 2402.09668 • Published • 40)
- A Survey on Data Selection for LLM Instruction Tuning (Paper • 2402.05123 • Published • 3)
- LESS: Selecting Influential Data for Targeted Instruction Tuning (Paper • 2402.04333 • Published • 3)