Add parser for GPT-OSS Harmony tool call format by aleroot · Pull Request #146 · ml-explore/mlx-swift-lm

aleroot · 2026-03-13T17:51:29Z

Proposed changes

Add parser for GPT-OSS Harmony tool call format

Checklist

Put an x in the boxes that apply.

I have read the CONTRIBUTING document
I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
I have added tests that prove my fix is effective or that my feature works
I have updated the necessary documentation (if needed)

ronaldmannak · 2026-03-21T00:06:24Z

FYI: This approach looks similar to #116. The issue is you want to parse Harmony on the token level, not string level. See PicoHarmony for example.

aleroot · 2026-03-21T11:16:46Z

FYI: This approach looks similar to #116. The issue is you want to parse Harmony on the token level, not string level. See PicoHarmony for example.

In order to parse at token level, it is not necessary to link against the official rust library that is overkill for this project from my point of view. I have quickly implemented a possible small Swift state machine approach(second commit), which avoids FFI complexity. If this kind of approach is something the project maintainer(@davidkoski) would consider I can improve it eventually.

ronaldmannak · 2026-03-21T16:34:33Z

I missed the token-level outer state machine on my first pass and only noticed the decoded-string header parsing, so apologies for that.

I agree that token-level parsing does not require pulling in the official Rust Harmony library, and I am not arguing that this project should take on that dependency. A small Swift state machine seems like a reasonable direction if the goal is to keep the implementation lightweight and avoid FFI complexity.

My concern is more about where the current boundary is drawn. The outer framing is token-based, but the header interpretation is still done on decoded strings, and the remaining visible output still appears to be flattened through the normal detokenizer. That means tool extraction improves, but clients still receive Harmony-tagged text for non-tool frames (e.g. <|channel|>analysis<|message|>Let me think.<|end|><|channel|>final<|message|>Done.), which is hard to handle downstream if they care about Harmony structure.

So I think the main question for the maintainer is less “should this use the official parser?” and more “what level of Harmony support does mlx-swift-lm want to provide?” If the goal is only lightweight tool-call extraction in the .chunk path, this seems like a useful step. But as mentioned, it still pushes an important part of the parsing problem downstream, since non-tool Harmony structure is flattened into strings. If the goal is broader Harmony-aware streaming support, I think more structure probably needs to be preserved than .chunk currently provides.

Related to that, GenerateTokens() was added in MLXLMCommon/Evaluate.swift specifically so clients can handle Harmony parsing the way they want.

Add parser for GPT-OSS Harmony tool call format

c1e8e36

aleroot force-pushed the harmony_tools branch from 1a04460 to c1e8e36 Compare March 14, 2026 06:49

Implement Token-level parsing simple approach

0309d6e

louis-jan mentioned this pull request Mar 22, 2026

Add harmony tool call parser #116

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add parser for GPT-OSS Harmony tool call format#146

Add parser for GPT-OSS Harmony tool call format#146
aleroot wants to merge 2 commits intoml-explore:mainfrom
aleroot:harmony_tools

aleroot commented Mar 13, 2026

Uh oh!

ronaldmannak commented Mar 21, 2026

Uh oh!

aleroot commented Mar 21, 2026

Uh oh!

ronaldmannak commented Mar 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

aleroot commented Mar 13, 2026

Proposed changes

Checklist

Uh oh!

ronaldmannak commented Mar 21, 2026

Uh oh!

aleroot commented Mar 21, 2026

Uh oh!

ronaldmannak commented Mar 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants