Data Mixing Made Efficient: A Bivariate Scaling Law for Language Model Pretraining • Paper 2405.14908 • Published May 23, 2024