
Eval bug: Tools In Prompt Crashing On gpt-oss 20b #15102

@kennethgoodman

Description

Name and Version

Ran with:

$ git pull
$ ./build/bin/llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa --jinja --reasoning-format none -v --verbose-prompt > tmp.log 2>&1

This happens when I use Claude Code to route requests, so a lot of tokens are being sent here.
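
For reference, the requests the router forwards to llama-server are OpenAI-compatible chat completions that carry a tools array. A minimal sketch of the payload shape (the tool definition below is hypothetical; the real Claude Code request includes many more tools and a much larger system prompt):

{
  "model": "gpt-oss-20b",
  "stream": true,
  "messages": [
    { "role": "user", "content": "hi" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "read_file",
        "description": "hypothetical tool, shown only to illustrate the request shape",
        "parameters": {
          "type": "object",
          "properties": { "path": { "type": "string" } },
          "required": ["path"]
        }
      }
    }
  ]
}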

Llama.cpp verbose logs:

(base) kennethgoodman@kenneths-mbp-4 llama.cpp % tail -f tmp.log
slot process_toke: id  0 | task 652 | n_decoded = 54, n_remaining = 31946, next token: 200002 ''
slot      release: id  0 | task 652 | stop processing: n_past = 11197, truncated = 0
slot print_timing: id  0 | task 652 | 
prompt eval time =   13836.82 ms / 11063 tokens (    1.25 ms per token,   799.53 tokens per second)
       eval time =     823.03 ms /    54 tokens (   15.24 ms per token,    65.61 tokens per second)
      total time =   14659.85 ms / 11117 tokens
srv  update_chat_: Parsing chat message: <|channel|>analysis<|message|>The user says "hi". It's a greeting. We should respond simply, probably "Hello!" or similar. According to guidelines, greeting response: maybe just "Hello!" We should not mention todo or anything.<|start|>assistant<|channel|>final<|message|>Hello!
Parsing input with format GPT-OSS: <|channel|>analysis<|message|>The user says "hi". It's a greeting. We should respond simply, probably "Hello!" or similar. According to guidelines, greeting response: maybe just "Hello!" We should not mention todo or anything.<|start|>assistant<|channel|>final<|message|>Hello!
(lldb) process attach --pid 5320
libc++abi: terminating due to uncaught exception of type std::runtime_error: Unexpected content at end of input

Claude code logs:

[2025-08-05T21:37:04.228Z] Original Response: {
  "choices": [
    {
      "finish_reason": null,
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": null
      }
    }
  ],
  "created": 1754429824,
  "id": "chatcmpl-nRGxMwrj1j6M96P70YtFvR3TmCRbVNJJ",
  "model": "gpt-oss-20b",
  "system_fingerprint": "b6096-fd1234cb",
  "object": "chat.completion.chunk"
}
[2025-08-05T21:37:04.229Z] send data: event: message_start
data: {"type":"message_start","message":{"id":"msg_1754429809537","type":"message","role":"assistant","content":[],"model":"gpt-oss-20b","stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":1,"output_tokens":1}}}
[2025-08-05T21:37:05.802Z] cancle stream: null

Operating systems

Mac

GGML backends

Metal

Hardware

Apple M2 Max 96GB

Models

ggml-org/gpt-oss-20b-GGUF

Problem description & steps to reproduce

Build:

$ git pull
$ cmake --build build --config Release -j 8
$ ./build/bin/llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa --jinja --reasoning-format none -v --verbose-prompt > tmp.log 2>&1

Claude code router config:

{
  "LOG": true,
  "API_TIMEOUT_MS": 600000,
  "Providers": [
    {
      "name": "ollama",
      "api_base_url": "http://127.0.0.1:8080/v1/chat/completions",
      "api_key": "ollama",
      "models": ["gpt-oss-20b"],
    }
  ],
  "Router": {
    "default": "ollama,gpt-oss-20b",
  }
}

Running claude code router:

$ ccr code

Then sending any message, such as "hi", will crash the server.
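
It should presumably also be possible to trigger this without Claude Code by sending a streamed chat completion with a tools array straight to llama-server. An untested sketch, assuming the server listens on the default port 8080 and using a placeholder tool definition (the full Claude Code prompt is much larger, so this may not hit exactly the same path):

$ curl http://127.0.0.1:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "gpt-oss-20b",
          "stream": true,
          "messages": [{"role": "user", "content": "hi"}],
          "tools": [{"type": "function", "function": {"name": "get_time", "description": "placeholder tool for illustration", "parameters": {"type": "object", "properties": {}}}}]
        }'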

First Bad Commit

No response

Relevant log output

prompt eval time =   13836.82 ms / 11063 tokens (    1.25 ms per token,   799.53 tokens per second)
       eval time =     823.03 ms /    54 tokens (   15.24 ms per token,    65.61 tokens per second)
      total time =   14659.85 ms / 11117 tokens
srv  update_chat_: Parsing chat message: <|channel|>analysis<|message|>The user says "hi". It's a greeting. We should respond simply, probably "Hello!" or similar. According to guidelines, greeting response: maybe just "Hello!" We should not mention todo or anything.<|start|>assistant<|channel|>final<|message|>Hello!
Parsing input with format GPT-OSS: <|channel|>analysis<|message|>The user says "hi". It's a greeting. We should respond simply, probably "Hello!" or similar. According to guidelines, greeting response: maybe just "Hello!" We should not mention todo or anything.<|start|>assistant<|channel|>final<|message|>Hello!
(lldb) process attach --pid 5320
libc++abi: terminating due to uncaught exception of type std::runtime_error: Unexpected content at end of input
