What is the context size this model was trained on?
The (very new) GGUF version of this model, as implemented in llama.cpp, stops working after about 4K tokens of context. I'm wondering whether the original model handles long-context prompts well or whether this is a bug in llama.cpp.
There is a PR designed to deal with that issue: https://github.com/ggerganov/llama.cpp/pull/11008
Yes, thank you. You're right, the 4K context issue was caused by the GGUF settings. I'm still curious about the training context size of this model, though.
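In the meantime, one way to see what the converted file itself claims is to read the GGUF metadata directly, since the conversion script writes the model's context length into the file. Here's a minimal sketch, assuming the gguf Python package that ships with llama.cpp (gguf-py); the key name (llama.context_length for Llama-family models) and the field-access details are my assumptions and may differ across versions:

```python
# Sketch: print the context-length metadata stored in a GGUF file.
# Assumes the `gguf` package from llama.cpp's gguf-py (pip install gguf);
# the key name and field layout reflect my reading of that library and
# may differ between versions.
import sys
from gguf import GGUFReader

reader = GGUFReader(sys.argv[1])  # path to the .gguf file

for name, field in reader.fields.items():
    # Llama-family models typically store this as "llama.context_length"
    if name.endswith(".context_length"):
        value = field.parts[field.data[0]][0]  # scalar fields keep their value in parts[data[0]]
        print(f"{name} = {value}")
```

If the value printed there is the full 128K-ish figure but generation still breaks around 4K, that would point back at the runtime or its settings rather than the conversion.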
For https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct it says the 70B model was trained with:
Input:
Input Type(s): Text
Input Format: String
Input Parameters: One Dimensional (1D)
Other Properties Related to Input: Max of 128k tokens
Output:
Output Type(s): Text
Output Format: String
Output Parameters: One Dimensional (1D)
Other Properties Related to Output: Max of 4k tokens
I wonder if this model was trained on the same dataset.
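For what it's worth, even if this model does carry the same 128K window, llama.cpp only gives you the context you ask for at load time. Here's a rough sketch using the llama-cpp-python bindings; the model path is a placeholder, and the exact n_ctx value you can afford depends on your hardware:

```python
# Sketch: load a GGUF with an explicitly enlarged context window via llama-cpp-python.
# The model path below is a placeholder; adjust n_ctx and n_gpu_layers to your setup.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3.1-nemotron-70b-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=32768,      # request a window well beyond the default instead of relying on it
    n_gpu_layers=-1,  # offload all layers to GPU if available
)

out = llm("Summarize the following document: ...", max_tokens=256)
print(out["choices"][0]["text"])
```

Note that a larger n_ctx costs proportionally more KV-cache memory, so a 128K window on a 70B model is only practical with substantial VRAM or aggressive cache quantization.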