Conversation

@kallewoof commented Aug 8, 2025

  1. The system message needs an # Instructions\n\n prefix.
  2. The adapter_end <|return|> token is used for training, and we want to use <|end|> for the generic case. See details below.

The .jinja template has this:

{#- Render developer message #}
{%- if developer_message or tools %}
    {{- "<|start|>developer<|message|>" }}
    {%- if developer_message %}
        {{- "# Instructions\n\n" }}
        {{- developer_message }}
    {%- endif %}
    {%- if tools -%}
        {{- "\n\n" }}
        {{- "# Tools\n\n" }}
        {{- render_tool_namespace("functions", tools) }}
    {%- endif -%}
    {{- "<|end|>" }}
{%- endif %}

i.e. it prefixes the developer (system) message with # Instructions\n\n.

(Caught in #1659, but pushing as a separate PR as I don't know if that PR will survive.)
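
For reference, the renders further down can be reproduced with a stock transformers tokenizer; a minimal sketch, assuming openai/gpt-oss-20b ships the .jinja template quoted above:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
messages = [
    {"role": "developer", "content": "hi"},
    {"role": "user", "content": "hello"},
]
# tokenize=False returns the raw prompt string, so the special tokens stay visible
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))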

Edit: after re-running the test, I'm also seeing this:

OpenAI Harmony                 = OpenAI Harmony                 : missing expected fragment
	adapter:  <|start|>assistant<|channel|>final<|message|>asst_1<|return|>
	tokenizer: <|start|>assistant<|channel|>final<|message|>asst_1<|end|><|start|>user<|message|>user_2<|end|> openai/gpt-oss-120b                         

which indicates the assistant end token should be <|end|> and not <|return|>. I'm not clear on the difference atm.

For [{"role":"developer","content":"hi"},{"role":"user","content":"hello"},{"role":"assistant","content":"e"},{"role":"user","content":"o"}], the chat template outputs (with add_generation_prompt=True):

('<|start|>system<|message|>You are ChatGPT, a large language model trained by '
 'OpenAI.\n'
 'Knowledge cutoff: 2024-06\n'
 'Current date: 2025-08-08\n'
 '\n'
 'Reasoning: medium\n'
 '\n'
 '# Valid channels: analysis, commentary, final. Channel must be included for '
 'every message.<|end|><|start|>developer<|message|># Instructions\n'
 '\n'
 '1\n'
 '<|end|><|start|>user<|message|>2\n'
 '<|end|><|start|>assistant<|channel|>final<|message|>3\n'
 '<|end|><|start|>user<|message|>4\n'
 '<|end|><|start|>assistant<|channel|>final<|message|>5\n'
 '<|end|><|start|>user<|message|>6\n'
 '<|end|><|start|>assistant')

(add_generation_prompt=False):

('<|start|>system<|message|>You are ChatGPT, a large language model trained by '
 'OpenAI.\n'
 'Knowledge cutoff: 2024-06\n'
 'Current date: 2025-08-08\n'
 '\n'
 'Reasoning: medium\n'
 '\n'
 '# Valid channels: analysis, commentary, final. Channel must be included for '
 'every message.<|end|><|start|>developer<|message|># Instructions\n'
 '\n'
 '1\n'
 '<|end|><|start|>user<|message|>2\n'
 '<|end|><|start|>assistant<|channel|>final<|message|>3\n'
 '<|end|><|start|>user<|message|>4\n'
 '<|end|><|start|>assistant<|channel|>final<|message|>5\n'
 '<|end|><|start|>user<|message|>6\n'
 '<|end|>')

A-HA! This thing is specifically for training: when the assistant turn is last and add_generation_prompt is false, the template assumes you are generating training examples, and it uses <|return|> instead of <|end|>, as described here:

        {%- elif loop.last and not add_generation_prompt %}
            {#- Only render the CoT if the final turn is an assistant turn and add_generation_prompt is false #}
            {#- This is a situation that should only occur in training, never in inference. #}
            {%- if "thinking" in message %}
                {{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
            {%- endif %}
            {#- <|return|> indicates the end of generation, but <|end|> does not #}
            {#- <|return|> should never be an input to the model, but we include it as the final token #}
            {#- when training, so the model learns to emit it. #}
            {{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|return|>" }}

The above with the last user turn removed and add_generation_prompt=False:

('<|start|>system<|message|>You are ChatGPT, a large language model trained by '
 'OpenAI.\n'
 'Knowledge cutoff: 2024-06\n'
 'Current date: 2025-08-08\n'
 '\n'
 'Reasoning: medium\n'
 '\n'
 '# Valid channels: analysis, commentary, final. Channel must be included for '
 'every message.<|end|><|start|>developer<|message|># Instructions\n'
 '\n'
 '1\n'
 '<|end|><|start|>user<|message|>2\n'
 '<|end|><|start|>assistant<|channel|>final<|message|>3\n'
 '<|end|><|start|>user<|message|>4\n'
 '<|end|><|start|>assistant<|channel|>final<|message|>5\n'
 '<|return|>')

Nifty. But it does mean the adapter should use <|end|>, I'd say.
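
Concretely, the autoguessed Harmony adapter would end up as something like this sketch (Python pseudo-config; the key names are illustrative, not the actual KoboldCpp adapter keys):

# Sketch of the fixed Harmony adapter; key names are my own, not KoboldCpp's.
harmony_adapter = {
    "system_start": "<|start|>developer<|message|># Instructions\n\n",  # fix 1: prefix
    "system_end": "<|end|>",
    "user_start": "<|start|>user<|message|>",
    "user_end": "<|end|>",
    "assistant_start": "<|start|>assistant<|channel|>final<|message|>",
    "assistant_end": "<|end|>",  # fix 2: was <|return|>, which is training/stop-only
}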

@kallewoof marked this pull request as draft August 8, 2025 06:36
@kallewoof marked this pull request as ready for review August 8, 2025 06:42
@kallewoof changed the title from "tweak OpenAI Harmony autoguess developer prefix" to "tweak OpenAI Harmony autoguess developer prefix and assistant end token" Aug 8, 2025
@LostRuins (Owner)

OpenAI's docs are inconsistent. In some examples they use <|return|> as well. I have no preference, so if you think <|end|> is better we can do that.

(two screenshots of examples from the OpenAI docs)

@LostRuins (Owner) left a comment

do let me know the decision.

Edit: maybe <|return|> is more like an EOS?

@kallewoof (Author)

> OpenAI's docs are inconsistent. In some examples they use <|return|> as well. I have no preference, so if you think <|end|> is better we can do that.

Good catch. Damn. Will have to dig a bit deeper. So far, though, the .jinja template as well as the Ollama template at https://ollama.com/library/gpt-oss:20b/blobs/51468a0fd901 both seem to suggest <|end|> is preferred (in fact there is no <|return|> in the Ollama template anywhere). Not to put too much faith in the Ollama devs or anything.

Another hint as to why this <|return|> thing exists at all may be

> When training: If you're formatting examples for training, you generally want to include the chain of thought in the final message. The right place to do this is in the thinking key.

from https://huggingface.co/blog/welcome-openai-gpt-oss

where they talk about the fact that you should only train on/unmask the very last message due to how the thinking works. Maybe <|return|> is somehow helpful for training.
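
A rough sketch of what that training setup might look like, assuming a HF-style tokenizer and the conventional -100 ignore label (my guess at the mechanics, not something from the gpt-oss repo):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
messages = [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "thinking": "reasoning goes here", "content": "hi there"},
]
# Context: everything up to and including the generation prompt.
prompt = tok.apply_chat_template(messages[:-1], tokenize=False, add_generation_prompt=True)
# Full example: the template ends the final assistant turn with <|return|>.
full = tok.apply_chat_template(messages, tokenize=False)
prompt_ids = tok(prompt, add_special_tokens=False).input_ids
full_ids = tok(full, add_special_tokens=False).input_ids
# Unmask only the final turn, so the model learns to emit <|return|> itself.
labels = [-100] * len(prompt_ids) + full_ids[len(prompt_ids):]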

Either way, I think using <|end|> is a safe bet for now, but you're right that it seems a bit inconsistent... :/

@LostRuins (Owner) commented Aug 8, 2025

(screenshot: multiturn example from the OpenAI docs)

It's mostly this multiturn example that has me conflicted (from the OpenAI docs website above).

@kallewoof (Author)

Yeah, that example looks broken to me. <|return|> is, as they themselves define it, purely a stop token:

> Indicates the model is done with sampling the response message. A valid "stop token" indicating that you should stop inference.

It has no place in a chat log other than to tell the inference engine that the model is done. <|end|> is, again per their definition:

> Indicates the end of a message

There may be some subtle distinction in their minds between "end of the turn" and "finished completion" that has to do with tool calling or the like, but they're not being very explicit about it:

> The harmony response format consists of "messages" with the model potentially generating multiple messages in one go. The general structure of a message is as follows:
>
> <|start|>{header}<|message|>{content}<|end|>
>
> The {header} contains a series of meta information including the role. <|end|> represents the end of a fully completed message but the model might also use other stop tokens such as <|call|> for tool calling and <|return|> to indicate the model is done with the completion.

The conclusion, I think, is: if the front end does not add <|return|> to its stop strings, it will potentially get weirdness. But the same probably goes for <|end|>. We do know for sure that <|end|> is used to mark the end of messages, both by user and assistant. Their message definition clearly includes an <|end|>. And their jinja template says, as noted earlier, that <|return|> should never be an input to the model:

{#- <|return|> indicates the end of generation, but <|end|> does not #}
{#- <|return|> should never be an input to the model, but we include it as the final token #}
{#- when training, so the model learns to emit it. #}
{{- "<|start|>assistant<|channel|>final<|message|>" + message.content + "<|return|>" }}

@LostRuins (Owner)

alright, let's merge this

@LostRuins merged commit 866cc34 into LostRuins:concedo_experimental Aug 8, 2025
@kallewoof deleted the 202508-openai-adapter-tweak branch August 8, 2025 13:15