
Eval bug: Tools In Prompt Crashing On gpt-oss 20b #15102

@kennethgoodman

Description

Name and Version

Ran with:

$ git pull
$ ./build/bin/llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa --jinja --reasoning-format none -v --verbose-prompt > tmp.log 2>&1

This happens when I use Claude Code to route requests, so a lot of tokens are being sent here.
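
For reference, the requests the router forwards to llama-server are OpenAI-compatible chat completions that carry a tools array. A minimal sketch of the payload shape (the tool definition below is hypothetical; the real Claude Code request includes many more tools and a much larger system prompt):

{
  "model": "gpt-oss-20b",
  "stream": true,
  "messages": [
    { "role": "user", "content": "hi" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "read_file",
        "description": "hypothetical tool, shown only to illustrate the request shape",
        "parameters": {
          "type": "object",
          "properties": { "path": { "type": "string" } },
          "required": ["path"]
        }
      }
    }
  ]
}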

Llama.cpp verbose logs:

(base) kennethgoodman@kenneths-mbp-4 llama.cpp % tail -f tmp.log
slot process_toke: id  0 | task 652 | n_decoded = 54, n_remaining = 31946, next token: 200002 ''
slot      release: id  0 | task 652 | stop processing: n_past = 11197, truncated = 0
slot print_timing: id  0 | task 652 | 
prompt eval time =   13836.82 ms / 11063 tokens (    1.25 ms per token,   799.53 tokens per second)
       eval time =     823.03 ms /    54 tokens (   15.24 ms per token,    65.61 tokens per second)
      total time =   14659.85 ms / 11117 tokens
srv  update_chat_: Parsing chat message: <|channel|>analysis<|message|>The user says "hi". It's a greeting. We should respond simply, probably "Hello!" or similar. According to guidelines, greeting response: maybe just "Hello!" We should not mention todo or anything.<|start|>assistant<|channel|>final<|message|>Hello!
Parsing input with format GPT-OSS: <|channel|>analysis<|message|>The user says "hi". It's a greeting. We should respond simply, probably "Hello!" or similar. According to guidelines, greeting response: maybe just "Hello!" We should not mention todo or anything.<|start|>assistant<|channel|>final<|message|>Hello!
(lldb) process attach --pid 5320
libc++abi: terminating due to uncaught exception of type std::runtime_error: Unexpected content at end of input

Claude code logs:

[2025-08-05T21:37:04.228Z] Original Response: {
  "choices": [
    {
      "finish_reason": null,
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": null
      }
    }
  ],
  "created": 1754429824,
  "id": "chatcmpl-nRGxMwrj1j6M96P70YtFvR3TmCRbVNJJ",
  "model": "gpt-oss-20b",
  "system_fingerprint": "b6096-fd1234cb",
  "object": "chat.completion.chunk"
}
[2025-08-05T21:37:04.229Z] send data: event: message_start
data: {"type":"message_start","message":{"id":"msg_1754429809537","type":"message","role":"assistant","content":[],"model":"gpt-oss-20b","stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":1,"output_tokens":1}}}
[2025-08-05T21:37:05.802Z] cancle stream: null

Operating systems

Mac

GGML backends

Metal

Hardware

Apple M2 Max 96GB

Models

ggml-org/gpt-oss-20b-GGUF

Problem description & steps to reproduce

Build:

$ git pull
$ cmake --build build --config Release -j 8
$ ./build/bin/llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa --jinja --reasoning-format none -v --verbose-prompt > tmp.log 2>&1

Claude code router config:

{
  "LOG": true,
  "API_TIMEOUT_MS": 600000,
  "Providers": [
    {
      "name": "ollama",
      "api_base_url": "http://127.0.0.1:8080/v1/chat/completions",
      "api_key": "ollama",
      "models": ["gpt-oss-20b"],
    }
  ],
  "Router": {
    "default": "ollama,gpt-oss-20b",
  }
}

Running claude code router:

$ ccr code

Then sending any message, such as "hi", will crash the server.
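
It should presumably also be possible to trigger this without Claude Code by sending a streamed chat completion with a tools array straight to llama-server. An untested sketch, assuming the server listens on the default port 8080 and using a placeholder tool definition (the full Claude Code prompt is much larger, so this may not hit exactly the same path):

$ curl http://127.0.0.1:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "gpt-oss-20b",
          "stream": true,
          "messages": [{"role": "user", "content": "hi"}],
          "tools": [{"type": "function", "function": {"name": "get_time", "description": "placeholder tool for illustration", "parameters": {"type": "object", "properties": {}}}}]
        }'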

First Bad Commit

No response

Relevant log output

prompt eval time =   13836.82 ms / 11063 tokens (    1.25 ms per token,   799.53 tokens per second)
       eval time =     823.03 ms /    54 tokens (   15.24 ms per token,    65.61 tokens per second)
      total time =   14659.85 ms / 11117 tokens
srv  update_chat_: Parsing chat message: <|channel|>analysis<|message|>The user says "hi". It's a greeting. We should respond simply, probably "Hello!" or similar. According to guidelines, greeting response: maybe just "Hello!" We should not mention todo or anything.<|start|>assistant<|channel|>final<|message|>Hello!
Parsing input with format GPT-OSS: <|channel|>analysis<|message|>The user says "hi". It's a greeting. We should respond simply, probably "Hello!" or similar. According to guidelines, greeting response: maybe just "Hello!" We should not mention todo or anything.<|start|>assistant<|channel|>final<|message|>Hello!
(lldb) process attach --pid 5320
libc++abi: terminating due to uncaught exception of type std::runtime_error: Unexpected content at end of input
