Import data from LM Contamination Index (#7) e1c863c verified Iker borgr OSainz committed on Apr 19, 2024
Add data from "Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus" (#6) 935e79b verified Iker vishaal27 committed on Apr 18, 2024
Add data from "An Open-Source Data Contamination Report for Large Language Models" 6169ce2 verified vishaal27 committed on Apr 18, 2024