Ref-Pretrain-Qwen-104M

paper | code

Ref-Pretrain-Qwen-104M is a 104M-parameter model with the Qwen architecture, conventionally pre-trained from scratch on the Pile for 5B tokens.
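A minimal loading sketch, assuming the checkpoint works with the standard `transformers` Auto classes (this page does not state that explicitly). The repo id is the one shown on this page. The helper names `load_reference_model` and `generate` are hypothetical and used only for illustration.

```python
# Repo id as it appears on this page.
MODEL_ID = "MiniLLM/Ref-Pretrain-Qwen-104M"


def load_reference_model(model_id: str = MODEL_ID):
    """Return (tokenizer, model); downloads the weights on first call.

    Assumption: the checkpoint loads through AutoTokenizer and
    AutoModelForCausalLM without custom code.
    """
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Greedy-decode a short continuation of `prompt`."""
    import torch

    tokenizer, model = load_reference_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The Pile is")` requires `transformers` and `torch` to be installed and network access on the first run. As a small reference model, it is likely more useful for scoring text than for generation quality.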

We also open-source the tokenized pre-training corpus for reproducibility.

It serves as the reference model in the MiniPLM knowledge distillation framework, where it is used to construct the refined pre-training corpus. That corpus is then used to train the MiniPLM models.
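To illustrate how a reference model can help refine a corpus, here is a rough sketch of difference-based data selection: each document is scored by how much more likely a larger teacher model finds it than the reference model, and the highest-scoring fraction is kept. This is an illustration under stated assumptions, not the MiniPLM implementation. The function names, the per-token length normalization, the log-probability values, and the keep fraction are all hypothetical. See the paper for the actual procedure.

```python
from typing import List, Sequence


def difference_scores(
    teacher_logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    lengths: Sequence[int],
) -> List[float]:
    """Per-token log-likelihood gap between teacher and reference model.

    Inputs are per-document sums of token log-probabilities.
    Dividing by length is an assumption made for this sketch.
    """
    return [
        (t - r) / n
        for t, r, n in zip(teacher_logprobs, ref_logprobs, lengths)
    ]


def select_top_fraction(scores: Sequence[float], keep_fraction: float) -> List[int]:
    """Return indices (in original order) of the highest-scoring documents."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, int(len(scores) * keep_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Made-up log-probabilities for four 100-token documents.
teacher = [-120.0, -300.0, -80.0, -200.0]
reference = [-150.0, -310.0, -82.0, -260.0]
lengths = [100, 100, 100, 100]

scores = difference_scores(teacher, reference, lengths)
kept = select_top_fraction(scores, keep_fraction=0.5)
print(kept)  # indices of the documents with the largest teacher-reference gap
```

In practice, the reference log-probabilities would come from a model such as this one, and the teacher log-probabilities from a larger model evaluated on the same documents.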

Evaluation

MiniPLM models achieve better performance under the same compute budget and scale well across model sizes.

Citation

@article{miniplm,
    title={MiniPLM: Knowledge Distillation for Pre-Training Language Models}, 
    author={Yuxian Gu and Hao Zhou and Fandong Meng and Jie Zhou and Minlie Huang},
    journal={arXiv preprint arXiv:2410.17215},
    year={2024}
}