Possibly the provided prompt format is wrong.

#1
by vevi33 - opened

Hi!
Thanks for the very quick quants. This model is really great; however, there is apparently a big misunderstanding around the new Mistral prompt format. (It also differs from the official Mistral description.)

Here is my reddit post about it:

https://www.reddit.com/r/LocalLLaMA/comments/1fjb4i5/mistralsmallinstruct2409_is_actually_really/

Marinara also confirmed my theory a few weeks ago. (You can find it in the model description)
https://huggingface.co/MarinaraSpaghetti/NemoMix-Unleashed-12B-GGUF

The correct one should be:

<s>[INST] user message[/INST] assistant message</s>[INST] new user message[/INST]

Another source:
https://community.aws/content/2dFNOnLVQRhyrOrMsloofnW0ckZ/how-to-prompt-mistral-ai-models-and-why

I tested it with your version and ours as well. Nemo and this model are way more coherent and "clever" with the suggested format.
With yours it was broken in many of my tests. (More details in the reddit post).
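For anyone who wants to build the suggested format by hand, here is a minimal sketch of how the string above could be assembled. The function name `build_mistral_prompt` and its arguments are made up for illustration; it assumes strictly alternating user/assistant turns and no system message.

```python
def build_mistral_prompt(turns, new_user_message):
    """Assemble a prompt in the suggested format.

    turns: list of (user_message, assistant_message) pairs already completed.
    new_user_message: the message the model should answer next.
    Both names are hypothetical, chosen only for this sketch.
    """
    parts = ["<s>"]
    for user_message, assistant_message in turns:
        # Space after "[INST]", none before "[/INST]", space before the reply,
        # and "</s>" closing every finished assistant turn.
        parts.append(f"[INST] {user_message}[/INST] {assistant_message}</s>")
    parts.append(f"[INST] {new_user_message}[/INST]")
    return "".join(parts)


prompt = build_mistral_prompt(
    [("user message", "assistant message")],
    "new user message",
)
print(prompt)
```

Note that this is plain string building. Whether a given backend adds its own `<s>` on top of this is something to check for the tool you use.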

I can confirm this with the older Mistral Nemo based models (still d/l'ing this one, presumably it will be the same).

God, I wish Mistral used a better prompt format

I just post what the actual tokenizer chat template compiles to, hence the <s> at the start, and I assume the Jinja will handle the rest properly, which it looks like it will?

I can't speak to whether the system prompt should get its own response. That feels like plain multi-turn prompting and suggests that a system message just isn't supported.

Otherwise I see no difference in the chat template provided vs the one in the AWS link

You don't need a better prompt format, if you just use the model's original tokenizer.
Not sure how GGUF people handle this issue, but I was able to write a quick Python script using the transformers library to instantiate the tokenizer from here:

And if you want to use the v3 tokenizer, you can use the same JSON, but with this model instead:

That will allow you to never care about the prompt format.

Also, using good inference engines, you can usually have both a completions endpoint (no tokenizer, needs you to define prompt format) and the chat/completions endpoints (which is using the tokenizer, and does not need you to specify the prompt format.)
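To make the difference between the two endpoint styles concrete, here is a rough sketch of the request bodies each one expects. It assumes an OpenAI-compatible server exposing `/v1/completions` and `/v1/chat/completions`; the helper names and the `max_tokens` default are made up for illustration, and nothing is sent over the network.

```python
import json


def completions_payload(prompt, max_tokens=128):
    """Body for a raw completions call: you supply the fully formatted prompt."""
    return {"prompt": prompt, "max_tokens": max_tokens}


def chat_completions_payload(messages, max_tokens=128):
    """Body for a chat call: the server applies the chat template for you."""
    return {"messages": messages, "max_tokens": max_tokens}


# Raw completions: the prompt format is your responsibility.
raw = completions_payload("<s>[INST] Hello[/INST]")

# Chat completions: only roles and contents, no [INST] markers.
chat = chat_completions_payload([{"role": "user", "content": "Hello"}])

print(json.dumps(raw, indent=2))
print(json.dumps(chat, indent=2))
```

With the chat style, a wrong prompt format can only come from the template shipped with the model, which is why the template in the repo matters.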

I made a Jinja2 prompt template here that supports sequences that do not strictly alternate user/assistant/user/assistant..., by gluing consecutive messages from the same role together.

{{- '<s>' }}
{%- for message in messages %}
    {%- set prev_message = messages[loop.index0 - 1] if not loop.first else None %}
    {%- set next_message = messages[loop.index] if not loop.last else None %}

    {%- if message['role'] != 'assistant' %}
        {%- if not prev_message or prev_message['role'] == 'assistant' %}
            {{- '[INST] ' }}
        {%- endif %}
        {{- message['content'] }}
        {%- if not next_message or next_message['role'] == 'assistant' %}
            {{- '[/INST]' }}
        {%- elif message['role'] == 'system' %}
            {{- '\n\n' }}
        {%- else %}
            {{- '\n' }}
        {%- endif %}
        
    {%- elif message['role'] == 'assistant' %}
        {%- if loop.first %}
            {{- '[INST] [/INST]' }}
        {%- endif %}
        {{- ' ' + message['content'] }}
        {%- if next_message and next_message['role'] != 'assistant' %}
            {{- '</s>' }}
        {%- else %}
            {{- '</s>[INST] [/INST]' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
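To see what the template above produces without a Jinja2 install, here is a plain-Python port of the same branching, as far as I can read it. The function name `render_glued` is made up for this sketch, and it assumes every whitespace gap between tags is stripped by the `{%-` / `{{-` markers, so the output is just the concatenation of the emitted strings.

```python
def render_glued(messages):
    """Plain-Python port of the Jinja2 template above (hypothetical helper)."""
    out = ["<s>"]
    last = len(messages) - 1
    for i, message in enumerate(messages):
        prev_message = messages[i - 1] if i > 0 else None
        next_message = messages[i + 1] if i < last else None

        if message["role"] != "assistant":
            # Open an [INST] block only at the start of a user/system run.
            if prev_message is None or prev_message["role"] == "assistant":
                out.append("[INST] ")
            out.append(message["content"])
            # Close the block when the run ends, otherwise glue with newlines.
            if next_message is None or next_message["role"] == "assistant":
                out.append("[/INST]")
            elif message["role"] == "system":
                out.append("\n\n")
            else:
                out.append("\n")
        else:
            # A conversation that starts with the assistant gets an empty block.
            if i == 0:
                out.append("[INST] [/INST]")
            out.append(" " + message["content"])
            if next_message is not None and next_message["role"] != "assistant":
                out.append("</s>")
            else:
                out.append("</s>[INST] [/INST]")
    return "".join(out)


example = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
    {"role": "user", "content": "Are you there?"},
    {"role": "assistant", "content": "Yes."},
    {"role": "user", "content": "Great"},
]
print(repr(render_glued(example)))
```

In this example the system message and the two user messages end up inside a single `[INST] ...[/INST]` block, separated by newlines.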

@vevi33
Hi there! Actually, the v3 should look more like:
<s>[INST] user message[/INST] assistant message</s>[INST] new user message[/INST]
For a deeper explanation: https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md

@pandora-s
Thank you for the clarification!

I proposed basically this, if I am not wrong, but I corrected my post according to your link so that it is exactly the same and does not confuse anyone!
Thanks to everyone for being helpful and making this topic finally clear in the community!

<s>[INST] user message[/INST] assistant message</s>[INST] new user message[/INST]

For llama.cpp, the prompt template would look like this:

--in-prefix "</s>[INST] " --in-suffix "[/INST] " -p "<s>[INST] You are a helpful assistant.[/INST]"
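As a rough check of what those flags add up to, here is a small sketch that simulates the text an interactive session would accumulate. It assumes the CLI simply concatenates the initial prompt, then prefix + user input + suffix + model output for each turn. That concatenation model and the function name `simulate_session` are my assumptions, not something taken from the llama.cpp source.

```python
def simulate_session(initial_prompt, in_prefix, in_suffix, exchanges):
    """Concatenate the pieces the way an interactive session is assumed to.

    exchanges: list of (user_input, model_output) pairs (hypothetical data).
    """
    text = initial_prompt
    for user_input, model_output in exchanges:
        text += in_prefix + user_input + in_suffix + model_output
    return text


transcript = simulate_session(
    initial_prompt="<s>[INST] You are a helpful assistant.[/INST]",
    in_prefix="</s>[INST] ",
    in_suffix="[/INST] ",
    exchanges=[("Hello", "Hi!"), ("How are you?", "Fine.")],
)
print(transcript)
```

Under that assumption, every user turn comes out as `</s>[INST] ...[/INST] ` followed by the reply, and the initial instruction sits in its own `[INST]` block that is closed by `</s>` right away.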

@pandora-s Just to clarify: what you've written here is the format one should use for Mistral-Small-Instruct-2409, right?

I used https://huggingface.co/MarinaraSpaghetti/SillyTavern-Settings

Awesome! Thanks! It really does contribute a lot... in everything, logic, prose, immersion... incredible.

I'm using Marinara's presets too, and they make a world of difference as far as RP is concerned with Mistral models.

@ddh0 yes, the original Small repo was fixed a few hours ago with the correct template, sorry for the trouble!
