Multi-token prediction models and baselines

Models accompanying the research paper "Better & Faster Large Language Models via Multi-token Prediction" (https://arxiv.org/abs/2404.19737).

Included are the following four 7B-parameter models trained on code:

  • baseline model (n=1) trained on 200B tokens of code: 7B_200B_1/
  • multi-token prediction model (n=4) trained on 200B tokens of code: 7B_200B_4/
  • baseline model (n=1) trained on 1T tokens of code: 7B_1T_1/
  • multi-token prediction model (n=4) trained on 1T tokens of code: 7B_1T_4/

Tokenizer: standard Llama 2 SentencePiece tokenizer in tokenizer.model.

Quickstart

Install torch, fairscale, fire, and sentencepiece, then run:

torchrun --nproc_per_node 1 example_completion.py --ckpt_dir 7B_200B_4/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 2

replacing 7B_200B_4/ with the checkpoint directory of the model you want to run.

Format

The PyTorch state_dicts are compatible with the Llama format: the layers of the shared trunk and the next-token prediction head layer are numbered contiguously. Additional prediction heads for tokens further in the future are named extra_heads and can be ignored for standard autoregressive inference.
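As a minimal sketch of ignoring the additional heads, the snippet below drops every state_dict entry whose key contains extra_heads. The helper strip_extra_heads and the key names in the toy dictionary are hypothetical and only illustrate the idea; the exact key layout of the released checkpoints is an assumption and should be checked against the actual files.

```python
def strip_extra_heads(state_dict, marker="extra_heads"):
    """Return a copy of state_dict without entries whose key contains marker."""
    return {key: value for key, value in state_dict.items() if marker not in key}


# Toy stand-in for a loaded checkpoint. The key names are hypothetical;
# in practice the values would be tensors from the checkpoint file.
toy_state_dict = {
    "layers.0.attention.wq.weight": "trunk tensor",
    "extra_heads.0.attention.wq.weight": "future-token head tensor",
    "output.weight": "unembedding tensor",
}

filtered = strip_extra_heads(toy_state_dict)
print(sorted(filtered))
```

The filtered dictionary keeps only the trunk and next-token head entries, which is what a standard Llama-style loader would expect under the assumption above.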

The implementation of forward() in llama/model.py takes an additional argument, return_all_heads. If it is set, the additional prediction heads are called and the logits are returned with shape (batch_size, seq_len, n_future_tokens, vocab_size). Otherwise, the logits have shape (batch_size, seq_len, 1, vocab_size).
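To make the two output shapes concrete, here is a small sketch that computes the logits shape described above without loading a model. The helper expected_logits_shape is hypothetical and not part of llama/model.py; the example values assume the n=4 models, the Llama 2 tokenizer vocabulary of 32000 entries, and the batch size and sequence length from the quickstart command.

```python
def expected_logits_shape(batch_size, seq_len, n_future_tokens, vocab_size,
                          return_all_heads=False):
    """Shape of the logits as described for forward() in this README."""
    # With return_all_heads set, one slice per prediction head is returned;
    # otherwise only the next-token head is present (head dimension of size 1).
    n_heads = n_future_tokens if return_all_heads else 1
    return (batch_size, seq_len, n_heads, vocab_size)


all_heads = expected_logits_shape(2, 128, 4, 32000, return_all_heads=True)
next_token_only = expected_logits_shape(2, 128, 4, 32000)
print(all_heads)        # (2, 128, 4, 32000)
print(next_token_only)  # (2, 128, 1, 32000)
```

Code written for standard next-token inference can therefore squeeze or index the third dimension, while code that uses all heads should expect n_future_tokens entries there.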

Citation

Gloeckle, F., Idrissi, B. Y., Rozière, B., Lopez-Paz, D., & Synnaeve, G. (2024). Better & faster large language models via multi-token prediction. arXiv preprint arXiv:2404.19737.

Bibtex entry:

@article{gloeckle2024better,
  title={Better \& faster large language models via multi-token prediction},
  author={Gloeckle, Fabian and Idrissi, Badr Youbi and Rozi{\`e}re, Baptiste and Lopez-Paz, David and Synnaeve, Gabriel},
  journal={arXiv preprint arXiv:2404.19737},
  year={2024}
}

Feedback and comments

Please report risks as indicated in the Acceptable Use Policy and address bugs and any other comments to the corresponding authors as indicated in the research paper.
