Seems to be broken.

#1
by Herman555 - opened

It just writes endlessly; the stop token doesn't seem to function reliably when using the ChatML format.

This is a llama-3 model; make sure your tools have been updated for it. What you see is a typical symptom of using an old inference engine or the wrong configuration.

mradermacher changed discussion status to closed

I was just using another ChatML Llama 3 finetune with no problems at all, so I don't think there is anything wrong with my configuration.

I'm using the latest KoboldCpp and SillyTavern, with no problems on other models. Not sure if the problem is with the quants or the finetune itself. For reference, I was using Q5_K_M.

current koboldcpp does not support the llama-3 end tokens unless you configure them manually - did you do so? if not, that is the problem. other finetunes don't matter, because they might not use the same end tokens.
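for illustration, here is a minimal sketch of setting the stop tokens manually - using llama-cpp-python rather than koboldcpp's UI, with a placeholder model path. which strings to stop on depends on the finetune: stock llama-3 instruct ends a turn with <|eot_id|>, while a ChatML-tuned model typically uses <|im_end|>:

```python
# sketch: pass the end tokens as explicit stop strings so generation
# terminates even if the engine does not recognize them natively.
# model path and prompt format are placeholders - adjust for the finetune.
from llama_cpp import Llama

llm = Llama(model_path="model.Q5_K_M.gguf", n_ctx=8192)

out = llm.create_completion(
    "<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n",
    max_tokens=256,
    # llama-3 instruct uses <|eot_id|>; ChatML finetunes typically use
    # <|im_end|>. listing both is harmless if one never appears.
    stop=["<|eot_id|>", "<|im_end|>", "<|end_of_text|>"],
)
print(out["choices"][0]["text"])
```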

in any case, i only provide the quants - any vocabulary problem is up to the original model. but the symptoms you describe are a clear indication of a configuration problem, especially since koboldcpp has not yet been updated for llama-3 as of the latest release.

llama-3 support landed in llama.cpp a few hours ago and is expected to be in the next koboldcpp version. the equivalent of --override-kv tokenizer.ggml.pre=str:llama3 likely also needs to be specified (it's not clear whether koboldcpp will have such a switch).
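as a hedged sketch of what that override looks like outside the CLI - this assumes a llama-cpp-python build new enough to accept string-valued kv_overrides, which may not be the case yet:

```python
# sketch: the equivalent of llama.cpp's
#   --override-kv tokenizer.ggml.pre=str:llama3
# forcing the llama-3 BPE pre-tokenizer on quants produced before the
# tokenizer fix. assumes string-valued kv_overrides are supported;
# model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="model.Q5_K_M.gguf",
    kv_overrides={"tokenizer.ggml.pre": "llama3"},
)
```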

Yes, I saw... it was my mistake; I thought the tokenizer issues were already fixed. I'll give this a try when the new version of koboldcpp releases.
