
Fix tool calling for Llama 3#145

Open
aleroot wants to merge 1 commit into ml-explore:main from
aleroot:main

Conversation

@aleroot
Contributor

@aleroot aleroot commented Mar 12, 2026

Proposed changes

Support multiple parallel tool calls and buffering for Llama 3

Llama 3 natively supports tool calling through its ipython environment, which
can emit arrays of multiple parallel tool invocations. Depending on the
model size and prompt, it generates either a JSON list of function objects
or a python-style array of function calls.

  • Sets startTag to <|python_tag|> to ensure ToolCallProcessor
    correctly buffers tool output without leaking it to the streaming UI.
  • Upgrades Llama3ToolCallParser to parse multiple parallel tool calls
    from JSON array payloads [{"name": ...}] during parseEOS.
  • Upgrades PythonicToolCallParser to extract multiple sequential
    pythonic function calls [func1(), func2()] via parseEOS.
  • Refactors PythonicToolCallParser to use modern high-performance
    Swift 5.7+ Regex literals instead of legacy NSRegularExpression.
  • Adds integration unit tests for both parsers to verify multi-call arrays.

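For illustration, the pythonic multi-call extraction described above can be sketched with a Swift 5.7+ Regex literal. This is a hypothetical sketch, not the patch itself; the function name `parsePythonicCallNames` and the exact regex are assumptions:

```swift
import Foundation

// Hypothetical sketch (not the PR's code): extract the function names of
// multiple parallel pythonic tool calls like [func1(), func2(arg=1)]
// from a completed response, using a Swift 5.7+ Regex literal rather
// than NSRegularExpression.
func parsePythonicCallNames(from text: String) -> [String] {
    // Matches an identifier immediately followed by an opening parenthesis,
    // capturing the identifier as the call name.
    let call = /([A-Za-z_][A-Za-z0-9_]*)\s*\(/
    return text.matches(of: call).map { String($0.1) }
}

// Example: a Llama 3 style array of two parallel calls.
let payload = #"[get_weather(city="Paris"), get_time()]"#
print(parsePythonicCallNames(from: payload))  // ["get_weather", "get_time"]
```

Regex literals are type-checked at compile time and work directly on `String` ranges, avoiding the `NSRange` bridging that `NSRegularExpression` requires, which is the motivation for the refactor.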
Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

@aleroot aleroot force-pushed the main branch 4 times, most recently from d5f8c0c to abb2ea1 on March 13, 2026 07:26
@davidkoski
Collaborator

Sorry for the long turnaround here -- the PythonicToolCallParser has a conflict with #152. Can you resolve that or let me know and I will resolve it. Thanks!

@aleroot
Contributor Author

aleroot commented Mar 27, 2026

> Sorry for the long turnaround here -- the PythonicToolCallParser has a conflict with #152. Can you resolve that or let me know and I will resolve it. Thanks!

It should be fixed now.
@davidkoski I know it is off topic, but I am looking for the equivalent of this project in C++. I cannot find an mlx-cpp-lm, and it looks like mlx-lm is in Python. I would like to embed MLX into the coding agent I am developing, alongside llama.cpp, but I cannot find an LM project for MLX in C++. Do you know if it exists and where I can find it? Thank you very much.
