[BUG] Qwen3-Next Family Uses Incorrect Tool Call Format #161

@neon52

Describe the bug
Qwen3-Next and Qwen3-Coder-Next use an XML-based tool calling format rather than the default JSON format. Currently, ToolCallFormat.infer(from:) does not recognize the qwen3_next prefix, causing it to fall back to the JSON parser, which fails to extract tool calls from the model output.

Thankfully, there is a very easy short-term fix, and perhaps a longer-term change to consider.

Short term:
Update ToolCallFormat.infer(from:) to include detection for the Qwen3-Next family as shown below:

    public static func infer(from modelType: String) -> ToolCallFormat? {
        let type = modelType.lowercased()

        // ...

        // ADD THIS SECTION
        // Qwen3 Coder / Next family (qwen3_next, etc.)
        if type.hasPrefix("qwen3_next") {
            return .xmlFunction
        }

        // Qwen3.5 family (qwen3_5, qwen3_5_moe, etc.)
        if type.hasPrefix("qwen3_5") {
            return .xmlFunction
        }

        // ...

        return nil
    }

I tested this fix and it allowed tool calling to work successfully.

Long term:
You may want to consider updating the parser detection to be closer to what the mlx-lm Python library uses. I asked Opus to compare the tool calling logic between the two libraries, and this was the response:

------ BEGIN LLM OUTPUT ------

  1. Python mlx-lm at mlx_lm/tokenizer_utils.py lines 484-488 infers the tool parser by detecting <tool_call>\n<function= in the chat template, and selects the qwen3_coder parser — which is the XML function format parser (mlx_lm/tool_parsers/qwen3_coder.py).

  2. Swift mlx-swift-lm's XMLFunctionParser is explicitly a port of that same Python file (as stated in its own header comment: Reference: https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/tool_parsers/qwen3_coder.py).

  3. The Swift side just doesn't trigger it for qwen3_next because ToolCallFormat.infer only checks for qwen3_5 prefix, not qwen3_next. The Python side doesn't have this gap because it inspects the chat template content rather than the model type string.
------ END LLM OUTPUT ------

In other words, instead of only hard-coding each model family and the tool call parser it should use, the code could be more robust by also looking for known tags in the chat template, as shown below in the Python tokenizer_utils.py file (https://github.com/ml-explore/mlx-lm/blob/4d3af3cebcb26d1b4bfb37aa0876e966ad7ec954/mlx_lm/tokenizer_utils.py#L470):

def _infer_tool_parser(chat_template):
    """Attempt to auto-infer a tool parser from the chat template."""
    if not isinstance(chat_template, str):
        return None
    elif "<minimax:tool_call>" in chat_template:
        return "minimax_m2"
    elif "<start_function_call>" in chat_template:
        return "function_gemma"
    elif "<longcat_tool_call>" in chat_template:
        return "longcat"
    elif "<arg_key>" in chat_template:
        return "glm47"
    elif "<|tool_list_start|>" in chat_template:
        return "pythonic"
    elif (
        "<tool_call>\\n<function=" in chat_template
        or "<tool_call>\n<function=" in chat_template
    ):
        return "qwen3_coder"
    elif "<|tool_calls_section_begin|>" in chat_template:
        return "kimi_k2"
    elif "[TOOL_CALLS]" in chat_template:
        return "mistral"
    elif "<tool_call>" in chat_template and "tool_call.name" in chat_template:
        return "json_tools"
    return None
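
To make the suggestion concrete, here is a rough sketch of how the two strategies could be combined: check the model type prefix first, then fall back to inspecting the chat template. It is written in Python only to match the snippet above. The `infer_tool_format` helper, both lookup tables, and the format labels are hypothetical names for illustration, not the actual mlx-lm or mlx-swift-lm API, and the marker list is only a small subset of the checks in `_infer_tool_parser`.

```python
# Hypothetical prefix table mirroring the short-term fix proposed in this issue.
MODEL_TYPE_PREFIXES = {
    "qwen3_next": "xml_function",
    "qwen3_5": "xml_function",
}

# Hypothetical subset of chat template markers, checked in order.
# Both the literal "\n" escape and a real newline are covered,
# as in the upstream Python check.
TEMPLATE_MARKERS = [
    ("<tool_call>\\n<function=", "xml_function"),
    ("<tool_call>\n<function=", "xml_function"),
    ("[TOOL_CALLS]", "mistral"),
]


def infer_tool_format(model_type, chat_template=None):
    """Return a format label, or None when nothing matches."""
    lowered = model_type.lower()

    # 1. Explicit model family prefixes take priority.
    for prefix, fmt in MODEL_TYPE_PREFIXES.items():
        if lowered.startswith(prefix):
            return fmt

    # 2. Fall back to known tags in the chat template, if one is available.
    if isinstance(chat_template, str):
        for marker, fmt in TEMPLATE_MARKERS:
            if marker in chat_template:
                return fmt

    return None
```

With a fallback like this, a new model family whose chat template contains `<tool_call>\n<function=` would be picked up even before its model type prefix is added to the table.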

Now that this Swift library has been gaining better and better tool calling support, this is a great time to improve the detection!

To Reproduce
Testing was performed with Qwen3-Coder-Next-4bit (https://huggingface.co/mlx-community/Qwen3-Coder-Next-4bit).

  1. Load the model using mlx-swift-lm
  2. Give the model access to a tool and send a request that would trigger that tool (e.g. "what time is it?")
  3. See that the correct tool call output tokens are generated, but they are not detected by the library

Example output from the model provided below:

<tool_call>
  <function=search_web>
    <parameter=query>latest news</parameter>
  </function>
</tool_call>
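
For reference, here is a minimal sketch of what extracting a call from this output involves. It is a simplified, regex-based illustration in Python and not the actual qwen3_coder parser or the Swift XMLFunctionParser; the `parse_xml_tool_calls` name is made up for this example, and parameter values are kept as plain strings with no type conversion.

```python
import re

# Patterns for the three nested tags seen in the example output.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
FUNCTION_RE = re.compile(r"<function=([^>]+)>(.*?)</function>", re.DOTALL)
PARAMETER_RE = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>", re.DOTALL)


def parse_xml_tool_calls(text):
    """Collect every function call found inside <tool_call> blocks."""
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        for name, body in FUNCTION_RE.findall(block):
            # Parameter values are left as stripped strings in this sketch.
            arguments = {
                key.strip(): value.strip()
                for key, value in PARAMETER_RE.findall(body)
            }
            calls.append({"name": name.strip(), "arguments": arguments})
    return calls


sample = """<tool_call>
  <function=search_web>
    <parameter=query>latest news</parameter>
  </function>
</tool_call>"""

print(parse_xml_tool_calls(sample))
# [{'name': 'search_web', 'arguments': {'query': 'latest news'}}]
```

A JSON parser has nothing to work with here, since the content between the `<tool_call>` tags is not a JSON object, which is consistent with the tool call going undetected.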

Expected behavior
The library should detect the XML tool call and process it successfully.
