context length
Greetings,
I got this model set up and running without issues, following the blog post, except for the context length: the blog post states a context length of 4k tokens, but every time I go beyond 2k tokens (with 4k correctly set) using this GGUF version, I get the context window error from llama.cpp. (text-generation-webui & SillyTavern)
I wasn't able to test out other versions yet.
Does anyone happen to know whether this is a limitation of this GGUF version, of llama.cpp in general, or whether the error might be somewhere else?
Thanks in advance.
Is it an error, or just a warning? Does it say "warning: model might not support context sizes greater than 2048 tokens .. expect poor results"? If so, you can safely ignore it. I'm not sure why it's still in the code, but it doesn't apply to Llama 2 models or any other model with extended context.
Unfortunately, it's the error, and therefore it's not generating any output. It happens both with SillyTavern+tgw and with tgw alone.
In both cases I verified that the context length was set to 4k and that llama.cpp was used to load the model.
I get the feeling that it's an error with tgw on my end.
After setting context length in tgw, remember to reload the model. I was getting "llama_tokenize_with_model: too many tokens" in the terminal until I did that.
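If you want to rule out the webui entirely, a minimal sketch that loads the GGUF directly with llama-cpp-python and an explicit 4k context (the model path is a placeholder you'd need to adjust) might help isolate where the error comes from:

```python
from llama_cpp import Llama

# Load the GGUF directly with an explicit 4k context window,
# bypassing text-generation-webui / SillyTavern entirely.
llm = Llama(model_path="path/to/model.gguf", n_ctx=4096)

# Should print 4096 if the requested context size was applied.
print(llm.n_ctx())

# Feed a prompt that is clearly longer than 2k tokens; if this completes
# without a "too many tokens" error, the GGUF itself handles 4k fine and
# the problem is likely in the webui settings (e.g. context not reloaded).
long_prompt = "test " * 2500
out = llm(long_prompt, max_tokens=16)
print(out["choices"][0]["text"])
```

If that works but tgw still errors, it points back to the loader settings in the webui rather than the GGUF itself.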