Skip to content

Add parser for GPT-OSS Harmony tool call format#146

Open
aleroot wants to merge 2 commits intoml-explore:mainfrom
aleroot:harmony_tools
Open

Add parser for GPT-OSS Harmony tool call format#146
aleroot wants to merge 2 commits intoml-explore:mainfrom
aleroot:harmony_tools

Conversation

@aleroot
Copy link
Copy Markdown
Contributor

@aleroot aleroot commented Mar 13, 2026

Proposed changes

Add parser for GPT-OSS Harmony tool call format

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

@ronaldmannak
Copy link
Copy Markdown
Contributor

FYI: This approach looks similar to #116. The issue is you want to parse Harmony on the token level, not string level. See PicoHarmony for example.

@aleroot
Copy link
Copy Markdown
Contributor Author

aleroot commented Mar 21, 2026

FYI: This approach looks similar to #116. The issue is you want to parse Harmony on the token level, not string level. See PicoHarmony for example.

In order to parse at token level, it is not necessary to link against the official rust library that is overkill for this project from my point of view. I have quickly implemented a possible small Swift state machine approach(second commit), which avoids FFI complexity. If this kind of approach is something the project maintainer(@davidkoski) would consider I can improve it eventually.

@ronaldmannak
Copy link
Copy Markdown
Contributor

I missed the token-level outer state machine on my first pass and only noticed the decoded-string header parsing, so apologies for that.

I agree that token-level parsing does not require pulling in the official Rust Harmony library, and I am not arguing that this project should take on that dependency. A small Swift state machine seems like a reasonable direction if the goal is to keep the implementation lightweight and avoid FFI complexity.

My concern is more about where the current boundary is drawn. The outer framing is token-based, but the header interpretation is still done on decoded strings, and the remaining visible output still appears to be flattened through the normal detokenizer. That means tool extraction improves, but clients still receive Harmony-tagged text for non-tool frames (e.g. <|channel|>analysis<|message|>Let me think.<|end|><|channel|>final<|message|>Done.), which is hard to handle downstream if they care about Harmony structure.

So I think the main question for the maintainer is less “should this use the official parser?” and more “what level of Harmony support does mlx-swift-lm want to provide?” If the goal is only lightweight tool-call extraction in the .chunk path, this seems like a useful step. But as mentioned, it still pushes an important part of the parsing problem downstream, since non-tool Harmony structure is flattened into strings. If the goal is broader Harmony-aware streaming support, I think more structure probably needs to be preserved than .chunk currently provides.

Related to that, GenerateTokens() was added in MLXLMCommon/Evaluate.swift specifically so clients can handle Harmony parsing the way they want.

@louis-jan louis-jan mentioned this pull request Mar 22, 2026
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants