Collections

Collections including paper arxiv:2001.08361

Scaling Laws 📏

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

Paper • 2206.10789 • Published Jun 22, 2022 • 4
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws

Paper • 2401.00448 • Published Dec 31, 2023 • 28
Training Compute-Optimal Large Language Models

Paper • 2203.15556 • Published Mar 29, 2022 • 10
Scaling Laws for Neural Language Models

Paper • 2001.08361 • Published Jan 23, 2020 • 7

Scaling Laws for Neural Language Models

Paper • 2001.08361 • Published Jan 23, 2020 • 7
Scaling Laws for Autoregressive Generative Modeling

Paper • 2010.14701 • Published Oct 28, 2020
Training Compute-Optimal Large Language Models

Paper • 2203.15556 • Published Mar 29, 2022 • 10
A Survey on Data Selection for Language Models

Paper • 2402.16827 • Published Feb 26, 2024 • 4

LLM-Alignment Papers

Concrete Problems in AI Safety

Paper • 1606.06565 • Published Jun 21, 2016 • 1
The Off-Switch Game

Paper • 1611.08219 • Published Nov 24, 2016 • 1
Learning to summarize from human feedback

Paper • 2009.01325 • Published Sep 2, 2020 • 4
Truthful AI: Developing and governing AI that does not lie

Paper • 2110.06674 • Published Oct 13, 2021 • 1

STaR: Bootstrapping Reasoning With Reasoning

Paper • 2203.14465 • Published Mar 28, 2022 • 8
Scaling Laws for Neural Language Models

Paper • 2001.08361 • Published Jan 23, 2020 • 7
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 82

Ilya's papers for Carmack

Ilya Sutskever: "If you really learn all of these, you’ll know 90% of what matters today." Full list: https://punkx.org/jackdoe/30.html

Recurrent Neural Network Regularization

Paper • 1409.2329 • Published Sep 8, 2014
Pointer Networks

Paper • 1506.03134 • Published Jun 9, 2015
Order Matters: Sequence to sequence for sets

Paper • 1511.06391 • Published Nov 19, 2015
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

Paper • 1811.06965 • Published Nov 16, 2018

Papers - Model Scaling

Scaling Laws for Neural Language Models

Paper • 2001.08361 • Published Jan 23, 2020 • 7
An Empirical Model of Large-Batch Training

Paper • 1812.06162 • Published Dec 14, 2018 • 3
Measuring the Effects of Data Parallelism on Neural Network Training

Paper • 1811.03600 • Published Nov 8, 2018 • 2
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

Paper • 1804.04235 • Published Apr 11, 2018 • 2

🚀 Spinning Up in LLMs

Lost in the Middle: How Language Models Use Long Contexts

Paper • 2307.03172 • Published Jul 6, 2023 • 37
Efficient Estimation of Word Representations in Vector Space

Paper • 1301.3781 • Published Jan 16, 2013 • 6
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 16
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 50

Interesting AI papers

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 50
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 16
Universal Language Model Fine-tuning for Text Classification

Paper • 1801.06146 • Published Jan 18, 2018 • 6
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 12