Transformer Explainer: Interactive Learning of Text-Generative Models • Paper • 2408.04619 • Published Aug 8, 2024 • 156
Addition is All You Need for Energy-efficient Language Models • Paper • 2410.00907 • Published Oct 1, 2024 • 145
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining • Paper • 2305.10429 • Published May 17, 2023 • 3