Fix: Update chat_template in tokenizer_config for proper system prompts and prefill rendering
Problem
There are two problems with the current version of chat_template.
1. System prompt gets dropped when you add assistant's prefill.
To understand the problem fully, let's first see what happens normally when system prompt is specified.
when input messages are
messages=[{"role": "system", "content": "Be short. Answer question within 10 words"},
{"role": "user", "content": "talk excessively, do you understand? Answer question using above 10 words"},
{"role": "assistant", "content": "OKay I understand"},
{"role": "user", "content": "Where is San Jose? Answer using more than 10 words"}]
serialized instruction is
<s>[INST] talk excessively, do you understand? Answer question using above 10 words[/INST] OKay I understand</s>[INST] Be short. Answer question within 10 words\n\nWhere is San Jose? Answer using more than 10 words[/INST]
system message is prepended to last turn's user question
However,
when input messages are
messages=[{"role": "system", "content": "Be short. Answer question within 10 words"},
{"role": "user", "content": "talk excessively, do you understand? Answer question using above 10 words"},
{"role": "assistant", "content": "OKay I understand"},
{"role": "user", "content": "Where is San Jose?"},
{"role": "assistant", "content": "San Jose"}] # prefill asisstant's response
serialized instruction is (use apply_chat_template with tokenize=False)
<s>[INST] talk excessively, do you understand? Answer question using above 10 words[/INST] OKay I understand</s>[INST] Where is San Jose?[/INST] San Jose</s>
Current output drops system prompt.
Expected instruction should be
<s>[INST] talk excessively, do you understand? Answer question using above 10 words[/INST] OKay I understand</s>[INST] Be short. Answer question within 10 words\n\nWhere is San Jose?[/INST] San Jose</s>
2. Prefill is not triggered correctly
Currently, when you try to prefill the assistant, for example, to complete the rest of the output "I'm sorry. I can't help because", the serialized instruction incorrectly contains extra EOS token.
For example,
when input messages are
messages=[{"role": "user", "content": "Where is San Jose?"},
{"role": "assistant", "content": "I'm sorry. I can't help because"}]
serialized instruction is
<s>[INST] Where is San Jose?[/INST] I'm sorry. I can't help because</s>
Current output has attached to end of assistant's response.
Expected instruction should be to have only attached to end of assistant response when it's not the last turn, i,e.
<s>[INST] Where is San Jose?[/INST] I'm sorry. I can't help because
This allows for proper prefill.
Solution
Update chat_template in tokenizer_config.
Testing
Besides the above 2 test cases, also this one:
messages=[{"role": "system", "content": "Be short. Answer question within 10 words"},
{"role": "user", "content": "talk excessively, do you understand? Answer question using above 10 words"},
{"role": "assistant", "content": "OKay I understand"},
{"role": "user", "content": "Where is San Jose?"},
{"role": "assistant", "content": "San Jose is in California"},
{"role": "user", "content": "Where is San Jose?"},
{"role": "assistant", "content": "San Jose "}]
expected output of apply_chat_template
<s>[INST] talk excessively, do you understand? Answer question using above 10 words[/INST] OKay I understand</s>[INST] Where is San Jose?[/INST] San Jose is in California</s>[INST] Be short. Answer question within 10 words\n\nWhere is San Jose?[/INST] San Jose
It might not be obvious what the proposed solutions are. So let me give more details here:
Proposed fix is as follows:
tokenizer.chat_template="""{%- if messages[0]["role"] == "system" %}
{%- set system_message = messages[0]["content"] %}
{%- set loop_messages = messages[1:] %}
{%- else %}
{%- set loop_messages = messages %}
{%- endif %}
{%- if not tools is defined %}
{%- set tools = none %}
{%- endif %}
{%- set user_messages = loop_messages | selectattr("role", "equalto", "user") | list %}
{#- This block checks for alternating user/assistant messages, skipping tool calling messages #}
{%- set ns = namespace() %}
{%- set ns.index = 0 %}
{%- for message in loop_messages %}
{%- if not (message.role == "tool" or message.role == "tool_results" or (message.tool_calls is defined and message.tool_calls is not none)) %}
{%- if (message["role"] == "user") != (ns.index % 2 == 0) %}
{{- raise_exception("After the optional system message, conversation roles must alternate user/assistant/user/assistant/...") }}
{%- endif %}
{%- set ns.index = ns.index + 1 %}
{%- endif %}
{%- endfor %}
{{- bos_token }}
{%- for message in loop_messages %}
{%- if message["role"] == "user" %}
{%- if tools is not none and (message == user_messages[-1]) %}
{{- "[AVAILABLE_TOOLS] [" }}
{%- for tool in tools %}
{%- set tool = tool.function %}
{{- '{"type": "function", "function": {' }}
{%- for key, val in tool.items() if key != "return" %}
{%- if val is string %}
{{- '"' + key + '": "' + val + '"' }}
{%- else %}
{{- '"' + key + '": ' + val|tojson }}
{%- endif %}
{%- if not loop.last %}
{{- ", " }}
{%- endif %}
{%- endfor %}
{{- "}}" }}
{%- if not loop.last %}
{{- ", " }}
{%- else %}
{{- "]" }}
{%- endif %}
{%- endfor %}
{{- "[/AVAILABLE_TOOLS]" }}
{%- endif %}
{%- if (loop.last or loop.index == loop.length - 1) and system_message is defined %}
{{- "[INST] " + system_message + "\n\n" + message["content"] + "[/INST]" }}
{%- else %}
{{- "[INST] " + message["content"] + "[/INST]" }}
{%- endif %}
{%- elif message.tool_calls is defined and message.tool_calls is not none %}
{{- "[TOOL_CALLS] [" }}
{%- for tool_call in message.tool_calls %}
{%- set out = tool_call.function|tojson %}
{{- out[:-1] }}
{%- if not tool_call.id is defined or tool_call.id|length != 9 %}
{{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
{%- endif %}
{{- ', "id": "' + tool_call.id + '"}' }}
{%- if not loop.last %}
{{- ", " }}
{%- else %}
{{- "]" + eos_token }}
{%- endif %}
{%- endfor %}
{%- elif message["role"] == "assistant" %}
{{- " " + message["content"]|trim + ("" if loop.last else eos_token)}}
{%- elif message["role"] == "tool_results" or message["role"] == "tool" %}
{%- if message.content is defined and message.content.content is defined %}
{%- set content = message.content.content %}
{%- else %}
{%- set content = message.content %}
{%- endif %}
{{- '[TOOL_RESULTS] {"content": ' + content|string + ", " }}
{%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}
{{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
{%- endif %}
{{- '"call_id": "' + message.tool_call_id + '"}[/TOOL_RESULTS]' }}
{%- else %}
{{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
{%- endif %}
{%- endfor %}"""
Summary of Changes
(loop.last or loop.index == loop.length - 1)
to allow for system prompt to be prepended to user message even if it's the second last on the message list. Solves problem 1("" if loop.last else eos_token)
to not add eos_token when assistant's message is the last message. Allows proper prefill and solves for problem 2