Conversation

@kallewoof commented Aug 8, 2025

  1. The system message needs an # Instructions\n\n prefix.
  2. The adapter_end <|return|> token is used for training, and we want to use <|end|> for the generic case. See details below.

The .jinja template has this:

{#- Render developer message #}
{%- if developer_message or tools %}
    {{- "<|start|>developer<|message|>" }}
    {%- if developer_message %}
        {{- "# Instructions\n\n" }}
        {{- developer_message }}
    {%- endif %}
    {%- if tools -%}
        {{- "\n\n" }}
        {{- "# Tools\n\n" }}
        {{- render_tool_namespace("functions", tools) }}
    {%- endif -%}
    {{- "<|end|>" }}
{%- endif %}

i.e. it prefixes the developer (system) message with # Instructions\n\n.

(Caught in #1659, but pushing as a separate PR as I don't know if that PR will survive.)
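
For reference, the renders further down can be reproduced with a stock transformers tokenizer; a minimal sketch, assuming openai/gpt-oss-20b ships the .jinja template quoted above:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
messages = [
    {"role": "developer", "content": "hi"},
    {"role": "user", "content": "hello"},
]
# tokenize=False returns the raw prompt string, so the special tokens stay visible
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))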

Edit: after re-running the test, I'm also seeing this:

OpenAI Harmony                 = OpenAI Harmony                 : missing expected fragment
	adapter:  <|start|>assistant<|channel|>final<|message|>asst_1<|return|>
	tokenizer: <|start|>assistant<|channel|>final<|message|>asst_1<|end|><|start|>user<|message|>user_2<|end|> openai/gpt-oss-120b                         

which indicates the assistant end token should be <|end|> and not <|return|>. I'm not clear on the difference atm.

For [{"role":"developer","content":"hi"},{"role":"user","content":"hello"},{"role":"assistant","content":"e"},{"role":"user","content":"o"}], the chat template outputs (with add_generation_prompt=True):

('<|start|>system<|message|>You are ChatGPT, a large language model trained by '
 'OpenAI.\n'
 'Knowledge cutoff: 2024-06\n'
 'Current date: 2025-08-08\n'
 '\n'
 'Reasoning: medium\n'
 '\n'
 '# Valid channels: analysis, commentary, final. Channel must be included for '
 'every message.<|end|><|start|>developer<|message|># Instructions\n'
 '\n'
 '1\n'
 '<|end|><|start|>user<|message|>2\n'
 '<|end|><|start|>assistant<|channel|>final<|message|>3\n'
 '<|end|><|start|>user<|message|>4\n'
 '<|end|><|start|>assistant<|channel|>final<|message|>5\n'
 '<|end|><|start|>user<|message|>6\n'
 '<|end|><|start|>assistant')

(add_generation_prompt=False):

('<|start|>system<|message|>You are ChatGPT, a large language model trained by '
 'OpenAI.\n'
 'Knowledge cutoff: 2024-06\n'
 'Current date: 2025-08-08\n'
 '\n'
 'Reasoning: medium\n'
 '\n'
 '# Valid channels: analysis, commentary, final. Channel must be included for '
 'every message.<|end|><|start|>developer<|message|># Instructions\n'
 '\n'
 '1\n'
 '<|end|><|start|>user<|message|>2\n'
 '<|end|><|start|>assistant<|channel|>final<|message|>3\n'
 '<|end|><|start|>user<|message|>4\n'
 '<|end|><|start|>assistant<|channel|>final<|message|>5\n'
 '<|end|><|start|>user<|message|>6\n'
 '<|end|>')

A-HA! This thing is specifically for training: when the assistant turn is last and add_generation_prompt is false, the template assumes you are generating training examples, and it uses <|return|> instead of <|end|>, as described here:

        {%- elif loop.last and not add_generation_prompt %}
            {#- Only render the CoT if the final turn is an assistant turn and add_generation_prompt is false #}
            {#- This is a situation that should only occur in training, never in inference. #}
            {%- if "thinking" in message %}
                {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
            {%- endif %}
            {#- <|return|> indicates the end of generation, but <|end|> does not #}
            {#- <|return|> should never be an input to the model, but we include it as the final token #}
            {#- when training, so the model learns to emit it. #}
            {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|return|>" }}

The above with the last user turn removed and add_generation_prompt=False:

('<|start|>system<|message|>You are ChatGPT, a large language model trained by '
 'OpenAI.\n'
 'Knowledge cutoff: 2024-06\n'
 'Current date: 2025-08-08\n'
 '\n'
 'Reasoning: medium\n'
 '\n'
 '# Valid channels: analysis, commentary, final. Channel must be included for '
 'every message.<|end|><|start|>developer<|message|># Instructions\n'
 '\n'
 '1\n'
 '<|end|><|start|>user<|message|>2\n'
 '<|end|><|start|>assistant<|channel|>final<|message|>3\n'
 '<|end|><|start|>user<|message|>4\n'
 '<|end|><|start|>assistant<|channel|>final<|message|>5\n'
 '<|return|>')

Nifty. But it does mean the adapter should use <|end|>, I'd say.
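
Concretely, the autoguessed Harmony adapter would end up as something like this sketch (Python pseudo-config; the key names are illustrative, not the actual KoboldCpp adapter keys):

# Sketch of the fixed Harmony adapter; key names are my own, not KoboldCpp's.
harmony_adapter = {
    "system_start": "<|start|>developer<|message|># Instructions\n\n",  # fix 1: prefix
    "system_end": "<|end|>",
    "user_start": "<|start|>user<|message|>",
    "user_end": "<|end|>",
    "assistant_start": "<|start|>assistant<|channel|>final<|message|>",
    "assistant_end": "<|end|>",  # fix 2: was <|return|>, which is training/stop-only
}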

@kallewoof marked this pull request as draft August 8, 2025 06:36
@kallewoof marked this pull request as ready for review August 8, 2025 06:42
@kallewoof changed the title from "tweak OpenAI Harmony autoguess developer prefix" to "tweak OpenAI Harmony autoguess developer prefix and assistant end token" Aug 8, 2025
@LostRuins (Owner)

OpenAI's docs are inconsistent. In some examples they use <|return|> as well. I have no preference, so if you think <|end|> is better we can do that.

(two screenshots of examples from the OpenAI docs)

@LostRuins (Owner) left a comment

do let me know the decision.

Edit: maybe <|return|> is more like an EOS?

@kallewoof (Author)

> OpenAI's docs are inconsistent. In some examples they use <|return|> as well. I have no preference, so if you think <|end|> is better we can do that.

Good catch. Damn. Will have to dig a bit deeper. So far, though, the .jinja template as well as the Ollama template at https://ollama.com/library/gpt-oss:20b/blobs/51468a0fd901 both seem to suggest <|end|> is preferred (in fact there is no <|return|> in the Ollama template anywhere). Not to put too much faith in the Ollama devs or anything.

Another hint as to why this <|return|> thing exists at all may be

> When training: If you're formatting examples for training, you generally want to include the chain of thought in the final message. The right place to do this is in the thinking key.

from https://huggingface.co/blog/welcome-openai-gpt-oss

where they talk about the fact that you should only train on/unmask the very last message due to how the thinking works. Maybe <|return|> is somehow helpful for training.
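
A rough sketch of what that training setup might look like, assuming a HF-style tokenizer and the conventional -100 ignore label (my guess at the mechanics, not something from the gpt-oss repo):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
messages = [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "thinking": "reasoning goes here", "content": "hi there"},
]
# Context: everything up to and including the generation prompt.
prompt = tok.apply_chat_template(messages[:-1], tokenize=False, add_generation_prompt=True)
# Full example: the template ends the final assistant turn with <|return|>.
full = tok.apply_chat_template(messages, tokenize=False)
prompt_ids = tok(prompt, add_special_tokens=False).input_ids
full_ids = tok(full, add_special_tokens=False).input_ids
# Unmask only the final turn, so the model learns to emit <|return|> itself.
labels = [-100] * len(prompt_ids) + full_ids[len(prompt_ids):]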

Either way, I think using <|end|> is a safe bet for now, but you're right that it seems a bit inconsistent... :/

@LostRuins (Owner) commented Aug 8, 2025

(screenshot: multiturn example from the OpenAI docs)

It's mostly this multiturn example that has me conflicted (from the OpenAI docs website above).

@kallewoof (Author)

Yeah, that example looks broken to me. <|return|> is, as they themselves define it, purely a stop token:

> Indicates the model is done with sampling the response message. A valid "stop token" indicating that you should stop inference.

It has no place in a chat log other than to tell the inference engine that the model is done. <|end|> is, again per their definition:

> Indicates the end of a message

There may be some subtle distinction in their minds between "end of the turn" and "finished completion" that has to do with tool calling or the like, but they're not being very explicit about it:

> The harmony response format consists of "messages" with the model potentially generating multiple messages in one go. The general structure of a message is as follows:
>
> <|start|>{header}<|message|>{content}<|end|>
>
> The {header} contains a series of meta information including the role. <|end|> represents the end of a fully completed message but the model might also use other stop tokens such as <|call|> for tool calling and <|return|> to indicate the model is done with the completion.

The conclusion, I think, is: if the front end does not add <|return|> to its stop strings, it will potentially get weirdness. But the same probably goes for <|end|>. We do know for sure that <|end|> is used to mark the end of messages, both by user and assistant. Their message definition clearly includes an <|end|>. And their jinja template says, as noted earlier, that <|return|> should never be an input to the model:

{#- <|return|> indicates the end of generation, but <|end|> does not #}
{#- <|return|> should never be an input to the model, but we include it as the final token #}
{#- when training, so the model learns to emit it. #}
{{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|return|>" }}

@LostRuins (Owner)

alright, let's merge this

@LostRuins merged commit 866cc34 into LostRuins:concedo_experimental Aug 8, 2025
@kallewoof deleted the 202508-openai-adapter-tweak branch August 8, 2025 13:15