ServeurpersoCom
Collaborator

Summary

This patch teaches the Granite chat initializer and parser to dynamically detect and honor whichever sentinel format the Jinja template uses - either <|tool_call|> or the plain <tool_call> / </tool_call> pair - without breaking existing Hermes 2 Pro or Qwen templates.

Details

  • During initialization, the code inspects the Jinja source to determine which sentinel style is present.
  • Grammar construction now escapes the literal sentinel, supports an optional closing </tool_call>, and updates the preserved-token list so generations remain faithful to the template.
  • The Granite parser has been extended to accept both sentinel variants at runtime and optionally consume a closing tag, while leaving the fallback path untouched.
  • Template detection logic now matches both the legacy 'elif thinking' branch and the newer 'tools_system_message_prefix' signature, ensuring Granite templates are correctly routed instead of falling back to Hermes/Qwen handling.
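
As a rough illustration of the steps listed above, here is a minimal Python sketch. It is not the patch itself (the real change lives in the C++ chat code); the function names `looks_like_granite`, `detect_sentinel`, and `parse_tool_calls` are hypothetical, and defaulting to `<|tool_call|>` when neither sentinel appears in the template is an assumption made only for this example.

```python
import json
import re

# Sentinel spellings named in the summary above.
SPECIAL_SENTINEL = "<|tool_call|>"
PLAIN_OPEN = "<tool_call>"
PLAIN_CLOSE = "</tool_call>"

# Template signatures named in the Details list above.
GRANITE_SIGNATURES = ("elif thinking", "tools_system_message_prefix")


def looks_like_granite(template_src: str) -> bool:
    """Hypothetical routing check: does the Jinja source carry a Granite signature?"""
    return any(sig in template_src for sig in GRANITE_SIGNATURES)


def detect_sentinel(template_src: str) -> str:
    """Return whichever sentinel the template spells out.

    Assumption for this sketch: fall back to the special-token form
    when neither spelling is found.
    """
    if SPECIAL_SENTINEL in template_src:
        return SPECIAL_SENTINEL
    if PLAIN_OPEN in template_src:
        return PLAIN_OPEN
    return SPECIAL_SENTINEL


def parse_tool_calls(output: str):
    """Accept either sentinel, then a JSON value, then an optional closing tag.

    Returns (content_before_sentinel, tool_calls). When no sentinel is
    present the output is returned untouched with an empty call list.
    """
    # Escape the literal sentinels so '|' is not read as regex alternation.
    opener = re.compile("|".join(re.escape(s) for s in (SPECIAL_SENTINEL, PLAIN_OPEN)))
    match = opener.search(output)
    if match is None:
        return output, []
    rest = output[match.end():].lstrip()
    # raw_decode stops at the end of the first JSON value and reports where.
    calls, end = json.JSONDecoder().raw_decode(rest)
    tail = rest[end:].strip()
    if tail not in ("", PLAIN_CLOSE):
        raise ValueError(f"unexpected text after tool calls: {tail!r}")
    return output[:match.start()], calls
```

Escaping the sentinel matters because `<|tool_call|>` contains `|`, which a regex or grammar would otherwise treat as an alternation operator.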

Rationale

Granite's Jinja templates resemble Hermes 2's layout, which made them "accidentally compatible" with Hermes logic - but tool calls, document sections, and grammar boundaries could silently misalign.

This patch corrects that safely using a defensive auto-detection approach:
no global flags, no behavioral change for other models, and zero regression risk.

Result

Full Granite Jinja compatibility, correct tool-call parsing and grammar alignment, and stable cross-model behavior - all achieved through minimal, reversible, and backward-safe changes.

fixes #16415

@ServeurpersoCom
Collaborator Author

ServeurpersoCom commented Oct 12, 2025

srv  params_from_: Chat format: Granite
curl https://www.serveurperso.com/ia/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "MoE-Granite-4.0-h-small-32B",
    "stream": true,
    "parallel_tool_calls": true,
    "messages": [
      {
        "role": "system",
        "content": "Follow the Granite tool-call format exactly, including the <tool_call> JSON list when you invoke more than one tool."
      },
      {
        "role": "user",
        "content": "Call add with x=1,y=2 and multiply with a=3,b=4; respond only with the tool list."
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "add",
          "description": "Add two integers",
          "parameters": {
            "type": "object",
            "properties": {
              "x": { "type": "integer" },
              "y": { "type": "integer" }
            },
            "required": ["x", "y"]
          }
        }
      },
      {
        "type": "function",
        "function": {
          "name": "multiply",
          "description": "Multiply two integers",
          "parameters": {
            "type": "object",
            "properties": {
              "a": { "type": "integer" },
              "b": { "type": "integer" }
            },
            "required": ["a", "b"]
          }
        }
      }
    ]
  }'
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"role":"assistant","content":null}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"id":"Lr8TrmCzBBkGXtsF8lHyOtA4ank8rKSg","type":"function","function":{"name":"add","arguments":""}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\""}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"x"}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":"}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"1,"}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\""}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"y"}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":"}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"2}"}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"id":"n1nz7Z2FUmJntPIyyYh2HeUOQ1yfJYnM","type":"function","function":{"name":"multiply","arguments":""}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"{\""}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"a"}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"\":"}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"3,"}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"\""}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"b"}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"\":"}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":1,"function":{"arguments":"4}"}}]}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk"}

data: {"choices":[{"finish_reason":"tool_calls","index":0,"delta":{}}],"created":1760292279,"id":"chatcmpl-x0qIHKvZr3kpaYy1Kl9H2UEKutgVcr0x","model":"MoE-Granite-4.0-h-small-32B","system_fingerprint":"b6761-464e1dce2","object":"chat.completion.chunk","timings":{"cache_n":0,"prompt_n":302,"prompt_ms":149.809,"prompt_per_token_ms":0.4960562913907285,"prompt_per_second":2015.9002463136392,"predicted_n":48,"predicted_ms":558.761,"predicted_per_token_ms":11.640854166666665,"predicted_per_second":85.90434908663991}}

data: [DONE]
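
For readers who want to check the stream above by hand, the following Python sketch merges the `tool_calls` deltas back into complete calls. It is an illustrative client-side helper, not part of the patch; `collect_tool_calls` is a hypothetical name, and it assumes each event arrives as a single `data: ...` line, as in the output shown here.

```python
import json


def collect_tool_calls(sse_lines):
    """Merge streamed tool_call deltas into complete calls, keyed by their index."""
    calls = {}
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank separators and anything that is not a data line
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            for delta in choice.get("delta", {}).get("tool_calls", []):
                call = calls.setdefault(
                    delta["index"], {"id": None, "name": None, "arguments": ""}
                )
                if "id" in delta:
                    call["id"] = delta["id"]
                fn = delta.get("function", {})
                if "name" in fn:
                    call["name"] = fn["name"]
                # Argument fragments are plain string pieces; concatenate in order.
                call["arguments"] += fn.get("arguments", "")
    return [calls[i] for i in sorted(calls)]
```

Applied to the chunks above, the fragments for index 0 concatenate to `{"x":1,"y":2}` for `add`, and those for index 1 to `{"a":3,"b":4}` for `multiply`.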

@ServeurpersoCom ServeurpersoCom marked this pull request as draft October 12, 2025 18:06
@ServeurpersoCom ServeurpersoCom marked this pull request as ready for review October 12, 2025 18:08
@ServeurpersoCom
Collaborator Author

The same test problem also occurs on master:

=== llama_chat_format_single (system message) ===

fmt_sys(chatml) : <|im_start|>system
You are a helpful assistant<|im_end|>

-------------------------
Failed to infer a tool call example (possible template bug)
fmt_sys(mistral-v1) :  [INST] You are a helpful assistant


-------------------------
Failed to infer a tool call example (possible template bug)
fmt_sys(mistral-v3) : [INST] You are a helpful assistant


-------------------------
Failed to infer a tool call example (possible template bug)
fmt_sys(mistral-v3-tekken) : [INST]You are a helpful assistant


-------------------------
Failed to infer a tool call example (possible template bug)
fmt_sys(mistral-v7) : [SYSTEM_PROMPT] You are a helpful assistant[/SYSTEM_PROMPT]
-------------------------
Failed to infer a tool call example (possible template bug)
fmt_sys(llama2) : [INST] You are a helpful assistant

-------------------------
Failed to infer a tool call example (possible template bug)
fmt_sys(llama2-sys) : [INST] <<SYS>>
You are a helpful assistant
<</SYS>>


-------------------------
Failed to infer a tool call example (possible template bug)
fmt_sys(mistral) : [INST] You are a helpful assistant

-------------------------
Failed to infer a tool call example (possible template bug)
fmt_sys(gemma) :
-------------------------
Failed to infer a tool call example (possible template bug)
fmt_sys(llama3) : <|start_header_id|>system<|end_header_id|>

You are a helpful assistant<|eot_id|>
-------------------------
Failed to infer a tool call example (possible template bug)
fmt_sys(gigachat) : <s>You are a helpful assistant<|message_sep|>
-------------------------


=== llama_chat_format_single (user message) ===

fmt_single(chatml) :
<|im_start|>user
How are you<|im_end|>
<|im_start|>assistant

-------------------------
Failed to infer a tool call example (possible template bug)
fmt_single(mistral-v1) :  [INST] How are you [/INST]
-------------------------
Failed to infer a tool call example (possible template bug)
fmt_single(mistral-v3) : [INST] How are you[/INST]
-------------------------
Failed to infer a tool call example (possible template bug)
fmt_single(mistral-v3-tekken) : [INST]How are you[/INST]
-------------------------
Failed to infer a tool call example (possible template bug)
fmt_single(mistral-v7) : [INST] How are you[/INST]
-------------------------
Failed to infer a tool call example (possible template bug)
fmt_single(llama2) : [INST] How are you [/INST]
-------------------------
Failed to infer a tool call example (possible template bug)
fmt_single(mistral) : [INST] How are you [/INST]
-------------------------
Failed to infer a tool call example (possible template bug)
fmt_single(gemma) :
<start_of_turn>user
How are you<end_of_turn>
<start_of_turn>model

-------------------------
Failed to infer a tool call example (possible template bug)
fmt_single(llama3) : <|start_header_id|>user<|end_header_id|>

How are you<|eot_id|><|start_header_id|>assistant<|end_header_id|>


-------------------------
Failed to infer a tool call example (possible template bug)
fmt_single(gigachat) : user<|role_sep|>How are you<|message_sep|>available functions<|role_sep|>[]<|message_sep|>assistant<|role_sep|>
-------------------------
(root|~/llama.cpp.pascal)

@ServeurpersoCom ServeurpersoCom marked this pull request as draft October 12, 2025 18:13

Development

Successfully merging this pull request may close these issues.

Eval bug: Granite 4 template detection fails
