Open
Description
Name and Version
Ran with:
$ git pull
$ ./build/bin/llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa --jinja --reasoning-format none -v --verbose-prompt > tmp.log 2>&1
This happens when I route requests from Claude Code to llama-server, so a lot of tokens are sent with each request.
Llama.cpp verbose logs:
(base) kennethgoodman@kenneths-mbp-4 llama.cpp % tail -f tmp.log
slot process_toke: id 0 | task 652 | n_decoded = 54, n_remaining = 31946, next token: 200002 ''
slot release: id 0 | task 652 | stop processing: n_past = 11197, truncated = 0
slot print_timing: id 0 | task 652 |
prompt eval time = 13836.82 ms / 11063 tokens ( 1.25 ms per token, 799.53 tokens per second)
eval time = 823.03 ms / 54 tokens ( 15.24 ms per token, 65.61 tokens per second)
total time = 14659.85 ms / 11117 tokens
srv update_chat_: Parsing chat message: <|channel|>analysis<|message|>The user says "hi". It's a greeting. We should respond simply, probably "Hello!" or similar. According to guidelines, greeting response: maybe just "Hello!" We should not mention todo or anything.<|start|>assistant<|channel|>final<|message|>Hello!
Parsing input with format GPT-OSS: <|channel|>analysis<|message|>The user says "hi". It's a greeting. We should respond simply, probably "Hello!" or similar. According to guidelines, greeting response: maybe just "Hello!" We should not mention todo or anything.<|start|>assistant<|channel|>final<|message|>Hello!
(lldb) process attach --pid 5320
libc++abi: terminating due to uncaught exception of type std::runtime_error: Unexpected content at end of input
Claude code logs:
[2025-08-05T21:37:04.228Z] Original Response: {
  "choices": [
    {
      "finish_reason": null,
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": null
      }
    }
  ],
  "created": 1754429824,
  "id": "chatcmpl-nRGxMwrj1j6M96P70YtFvR3TmCRbVNJJ",
  "model": "gpt-oss-20b",
  "system_fingerprint": "b6096-fd1234cb",
  "object": "chat.completion.chunk"
}
[2025-08-05T21:37:04.229Z] send data: event: message_start
data: {"type":"message_start","message":{"id":"msg_1754429809537","type":"message","role":"assistant","content":[],"model":"gpt-oss-20b","stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":1,"output_tokens":1}}}
[2025-08-05T21:37:05.802Z] cancle stream: null
Operating systems
Mac
GGML backends
Metal
Hardware
Apple M2 Max 96GB
Models
ggml-org/gpt-oss-20b-GGUF
Problem description & steps to reproduce
Build:
$ git pull
$ cmake --build build --config Release -j 8
$ ./build/bin/llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 -fa --jinja --reasoning-format none -v --verbose-prompt > tmp.log 2>&1
claude code router config:
{
  "LOG": true,
  "API_TIMEOUT_MS": 600000,
  "Providers": [
    {
      "name": "ollama",
      "api_base_url": "http://127.0.0.1:8080/v1/chat/completions",
      "api_key": "ollama",
      "models": ["gpt-oss-20b"],
    }
  ],
  "Router": {
    "default": "ollama,gpt-oss-20b",
  }
}
running claude code router:
$ ccr code
Then any message, even just "hi", crashes llama-server.
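To take the router out of the picture, a request can be sent straight to the endpoint from the config. The following is only a minimal sketch: `build_request` and `stream_response` are hypothetical helper names, the payload is a plain OpenAI-style streaming chat request, and a bare "hi" may well not trigger the crash, since the log shows Claude Code sending a much larger prompt (11063 prompt tokens).

```python
import json
import urllib.request

# Endpoint and model name are taken from the router config above.
URL = "http://127.0.0.1:8080/v1/chat/completions"
MODEL = "gpt-oss-20b"


def build_request(message: str, stream: bool = True) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request (hypothetical helper)."""
    payload = {
        "model": MODEL,
        "stream": stream,  # the router log shows chat.completion.chunk objects
        "messages": [{"role": "user", "content": message}],
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_response(request: urllib.request.Request, timeout: float = 600.0) -> None:
    """Send the request and print each raw line of the streamed response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        for line in response:
            print(line.decode("utf-8", errors="replace").rstrip())
```

With llama-server running as above, `stream_response(build_request("hi"))` sends the request. If the server survives this but not the routed request, that would point at something specific to the prompt Claude Code sends.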
First Bad Commit
No response
Relevant log output
prompt eval time = 13836.82 ms / 11063 tokens ( 1.25 ms per token, 799.53 tokens per second)
eval time = 823.03 ms / 54 tokens ( 15.24 ms per token, 65.61 tokens per second)
total time = 14659.85 ms / 11117 tokens
srv update_chat_: Parsing chat message: <|channel|>analysis<|message|>The user says "hi". It's a greeting. We should respond simply, probably "Hello!" or similar. According to guidelines, greeting response: maybe just "Hello!" We should not mention todo or anything.<|start|>assistant<|channel|>final<|message|>Hello!
Parsing input with format GPT-OSS: <|channel|>analysis<|message|>The user says "hi". It's a greeting. We should respond simply, probably "Hello!" or similar. According to guidelines, greeting response: maybe just "Hello!" We should not mention todo or anything.<|start|>assistant<|channel|>final<|message|>Hello!
(lldb) process attach --pid 5320
libc++abi: terminating due to uncaught exception of type std::runtime_error: Unexpected content at end of input
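For reference, the string the parser rejected contains two segments, an `analysis` channel followed by a `final` channel, separated by `<|start|>assistant`. The sketch below is not llama.cpp's parser (which is C++); it is a rough Python illustration of that structure, using only the special tokens visible in the log. `split_channels` and the regular expression are hypothetical.

```python
import re

# Raw model output copied from the "Parsing chat message" log line above.
RAW = (
    '<|channel|>analysis<|message|>The user says "hi". '
    "It's a greeting. "
    'We should respond simply, probably "Hello!" or similar. '
    'According to guidelines, greeting response: maybe just "Hello!" '
    "We should not mention todo or anything."
    "<|start|>assistant<|channel|>final<|message|>Hello!"
)

# One segment: <|channel|>NAME<|message|>TEXT, where TEXT runs until the
# next <|start|> token or the end of the string.
SEGMENT = re.compile(
    r"<\|channel\|>(?P<channel>\w+)<\|message\|>(?P<text>.*?)(?=<\|start\|>|\Z)",
    re.S,
)


def split_channels(raw: str) -> list[tuple[str, str]]:
    """Return (channel, text) pairs in the order they appear (hypothetical helper)."""
    return [(m["channel"], m["text"]) for m in SEGMENT.finditer(raw)]


for channel, text in split_channels(RAW):
    print(f"{channel}: {text}")
```

Split this way, the `final` channel holds only `Hello!`, which is the reply I expected back instead of the `Unexpected content at end of input` exception.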