Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models Paper • 2405.20541 • Published May 30, 2024 • 22
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 48