Checkpoint pre-training data
#7, opened by cobquintero
I understand that for the checkpoints at the end of stage 1 and stage 2, all of the data seen during training is known. Is there a way to find out, for each intermediate checkpoint, which specific data files the model has seen? For example, for stage1-step286000-tokens1200B, can I see the data files the model had seen up to that point?