[BUG] Qwen3-Next Family Uses Incorrect Tool Call Format #161

**Describe the bug**
Qwen3-Next and Qwen3-Coder-Next use an XML-based tool-calling format rather than the default JSON format. Currently, `ToolCallFormat.infer(from:)` does not recognize the `qwen3_next` model type prefix, so it falls back to the JSON parser, which fails to extract tool calls from the model output.

Thankfully, there is an easy short-term fix, and perhaps a longer-term change to consider.

**Short term:**
Update ToolCallFormat.infer(from:) to include detection for the Qwen3-Next family as shown below:
```swift
    public static func infer(from modelType: String) -> ToolCallFormat? {
        let type = modelType.lowercased()

        // ...

        // ADD THIS SECTION
        // Qwen3 Coder / Next family (qwen3_next, etc.)
        if type.hasPrefix("qwen3_next") {
            return .xmlFunction
        }

        // Qwen3.5 family (qwen3_5, qwen3_5_moe, etc.)
        if type.hasPrefix("qwen3_5") {
            return .xmlFunction
        }

        // ...

        return nil
    }
```

I tested this fix and it allowed tool calling to work successfully.
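
For anyone who wants a quick regression check, here is a minimal, self-contained sketch of the prefix logic. It does not use the library's real `ToolCallFormat` type: `SketchToolCallFormat` and `sketchInfer(from:)` are hypothetical stand-ins that only mirror the two prefix checks shown above, so the real function's other branches are not covered.

```swift
// Hypothetical stand-in for the library's ToolCallFormat (only the case needed here).
enum SketchToolCallFormat: Equatable {
    case xmlFunction
}

// Hypothetical mirror of the proposed prefix checks; not the library's implementation.
func sketchInfer(from modelType: String) -> SketchToolCallFormat? {
    let type = modelType.lowercased()

    // Model type prefixes that should select the XML function parser.
    let xmlPrefixes = ["qwen3_next", "qwen3_5"]
    if xmlPrefixes.contains(where: { type.hasPrefix($0) }) {
        return .xmlFunction
    }

    // Unknown model types fall through, as in the original snippet.
    return nil
}

// Usage: both families resolve to the XML parser, anything else to nil.
print(String(describing: sketchInfer(from: "qwen3_next")))
print(String(describing: sketchInfer(from: "llama")))
```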

**Long term:**
You may want to consider bringing the parser detection closer to what the mlx-lm Python library does. I asked Opus to compare the tool-calling logic between the two libraries, and this was the response:

------ BEGIN LLM OUTPUT ------
1. **Python `mlx-lm`** at `mlx_lm/tokenizer_utils.py` lines 484-488 infers the tool parser by detecting `<tool_call>\n<function=` in the chat template, and selects the `qwen3_coder` parser — which is the XML function format parser (`mlx_lm/tool_parsers/qwen3_coder.py`).

2. **Swift `mlx-swift-lm`**'s `XMLFunctionParser` is explicitly a port of that same Python file (as stated in its own header comment: `Reference: https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/tool_parsers/qwen3_coder.py`).

3. The Swift side just doesn't trigger it for `qwen3_next` because `ToolCallFormat.infer` only checks for `qwen3_5` prefix, not `qwen3_next`. The Python side doesn't have this gap because it inspects the chat template content rather than the model type string.
------ END LLM OUTPUT ------

In other words, instead of only hard-coding which tool call parser each model family should use, the code could be more robust by also looking for known tags in the chat template, as shown below in the Python tokenizer_utils.py file (https://github.com/ml-explore/mlx-lm/blob/4d3af3cebcb26d1b4bfb37aa0876e966ad7ec954/mlx_lm/tokenizer_utils.py#L470):
```python
def _infer_tool_parser(chat_template):
    """Attempt to auto-infer a tool parser from the chat template."""
    if not isinstance(chat_template, str):
        return None
    elif "<minimax:tool_call>" in chat_template:
        return "minimax_m2"
    elif "<start_function_call>" in chat_template:
        return "function_gemma"
    elif "<longcat_tool_call>" in chat_template:
        return "longcat"
    elif "<arg_key>" in chat_template:
        return "glm47"
    elif "<|tool_list_start|>" in chat_template:
        return "pythonic"
    elif (
        "<tool_call>\\n<function=" in chat_template
        or "<tool_call>\n<function=" in chat_template
    ):
        return "qwen3_coder"
    elif "<|tool_calls_section_begin|>" in chat_template:
        return "kimi_k2"
    elif "[TOOL_CALLS]" in chat_template:
        return "mistral"
    elif "<tool_call>" in chat_template and "tool_call.name" in chat_template:
        return "json_tools"
    return None
```
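
To make the suggestion concrete, here is a rough Swift sketch of the same template-based detection. It is only an illustration under some assumptions: `InferredToolParser` and `inferToolParser(fromChatTemplate:)` are hypothetical names that do not exist in mlx-swift-lm, the enum raw values simply reuse the parser names from the Python function, and how the chat template string is obtained from the tokenizer config is left out.

```swift
import Foundation

// Hypothetical parser identifiers; raw values reuse the mlx-lm parser names above.
enum InferredToolParser: String {
    case minimaxM2 = "minimax_m2"
    case functionGemma = "function_gemma"
    case longcat
    case glm47
    case pythonic
    case qwen3Coder = "qwen3_coder"
    case kimiK2 = "kimi_k2"
    case mistral
    case jsonTools = "json_tools"
}

// Sketch: infer a parser from marker substrings in the chat template.
// Returns nil when the template is missing or no marker matches.
func inferToolParser(fromChatTemplate template: String?) -> InferredToolParser? {
    guard let template = template else { return nil }

    // Same order as the Python function: specific markers come before
    // the generic <tool_call> check at the end.
    let rules: [(markers: [String], parser: InferredToolParser)] = [
        (["<minimax:tool_call>"], .minimaxM2),
        (["<start_function_call>"], .functionGemma),
        (["<longcat_tool_call>"], .longcat),
        (["<arg_key>"], .glm47),
        (["<|tool_list_start|>"], .pythonic),
        // Either an escaped "\n" written in the template or a real newline.
        (["<tool_call>\\n<function=", "<tool_call>\n<function="], .qwen3Coder),
        (["<|tool_calls_section_begin|>"], .kimiK2),
        (["[TOOL_CALLS]"], .mistral),
    ]
    for rule in rules where rule.markers.contains(where: { template.contains($0) }) {
        return rule.parser
    }

    // Generic JSON tool calls: needs both the tag and the name lookup.
    if template.contains("<tool_call>") && template.contains("tool_call.name") {
        return .jsonTools
    }
    return nil
}
```

One possible design is to try this template check first and fall back to the existing model type prefix check, so that new model families with a known template format work without a code change.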

Now that this Swift library has been gaining better and better tool calling support, this is a great time to improve the detection!

**To Reproduce**
Testing was performed with Qwen3-Coder-Next-4bit (https://huggingface.co/mlx-community/Qwen3-Coder-Next-4bit).
1. Load the model using mlx-swift-lm
2. Give the model access to a tool and send a request that would trigger that tool (e.g. "what time is it?")
3. See that the correct tool call output tokens are generated, but they are not detected by the library

Example output from the model provided below:
```
<tool_call>
  <function=search_web>
    <parameter=query>latest news</parameter>
  </function>
</tool_call>
```
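
To illustrate what extracting a call from this output involves, here is a minimal regex-based sketch. It is not the library's `XMLFunctionParser`: `SketchToolCall` and `parseXMLFunctionCall(_:)` are hypothetical names, only the first `<function=...>` block is read, and every parameter value is kept as a plain string with no type conversion.

```swift
import Foundation

// Hypothetical result type for this sketch.
struct SketchToolCall: Equatable {
    let name: String
    let arguments: [String: String]
}

// Sketch: extract the first <function=NAME>...</function> block and its
// <parameter=KEY>VALUE</parameter> entries. Returns nil if no block is found.
func parseXMLFunctionCall(_ text: String) -> SketchToolCall? {
    let fullRange = NSRange(text.startIndex..., in: text)
    guard
        let functionRegex = try? NSRegularExpression(
            pattern: "<function=([^>]+)>(.*?)</function>",
            options: [.dotMatchesLineSeparators]),
        let match = functionRegex.firstMatch(in: text, range: fullRange),
        let nameRange = Range(match.range(at: 1), in: text),
        let bodyRange = Range(match.range(at: 2), in: text)
    else { return nil }

    let name = text[nameRange].trimmingCharacters(in: .whitespacesAndNewlines)
    let body = String(text[bodyRange])

    // Collect parameters from the function body; values are trimmed strings.
    var arguments: [String: String] = [:]
    if let parameterRegex = try? NSRegularExpression(
        pattern: "<parameter=([^>]+)>(.*?)</parameter>",
        options: [.dotMatchesLineSeparators])
    {
        let bodyNSRange = NSRange(body.startIndex..., in: body)
        for parameter in parameterRegex.matches(in: body, range: bodyNSRange) {
            guard
                let keyRange = Range(parameter.range(at: 1), in: body),
                let valueRange = Range(parameter.range(at: 2), in: body)
            else { continue }
            let key = body[keyRange].trimmingCharacters(in: .whitespacesAndNewlines)
            arguments[key] = body[valueRange].trimmingCharacters(in: .whitespacesAndNewlines)
        }
    }
    return SketchToolCall(name: name, arguments: arguments)
}
```

A JSON-only parser finds nothing to decode in this output, which matches the behavior described in step 3.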

**Expected behavior**
The library should detect the XML tool call and process it successfully.

**Desktop (please complete the following information):**
 - OS Version: macOS 15
 - Device: M4 MacBook Pro
 - mlx-swift-lm Version: latest main from 6 days ago (https://github.com/ml-explore/mlx-swift-lm/tree/edd42fcd947eea0b19665248acf2975a28ddf58b)

