
Bug: stream response without <think> token #776

@calvin2021y

Description


What happened?

When using DeepSeek, Qwen, or GLM in thinking mode, the response starts without the <think> token.

I tried the unsloth and ubergarm GGUFs; they all have this problem.

Name and Version

For the last 2 weeks I have been rebuilding at every new commit; all builds have this problem.

What operating system are you seeing the problem on?

Linux

Relevant log output

llama-server --port 1024 -a a --threads 40 --threads-batch 80 --no-mmap -vq --no-display-prompt -m unsloth/DeepSeek-V3.1-UD-IQ1_M-00001-of-00005.gguf --jinja --chat-template-kwargs {"thinking":true} --temp 0.6 --top-p 0.95 -c 65536 -np 1 -mla 3 -fmoe -ctk q8_0 -fa -ub 4096 -b 4096



curl -v -N http://127.0.0.1:1024/v1/chat/completions -H 'Connection: keep-alive' -H 'Content-Type: application/json' --data-raw '{"messages":[{"role":"user","content":"hi"}],"stream":false,"cache_prompt":false,"timings_per_token":false}'

*   Trying 127.0.0.1:1024...
* Connected to 127.0.0.1 (127.0.0.1) port 1024 (#0)
> POST /v1/chat/completions HTTP/1.1
> Host: 127.0.0.1:1024
> User-Agent: curl/7.88.1
> Accept: */*
> Connection: keep-alive
> Content-Type: application/json
> Content-Length: 107
> 
< HTTP/1.1 200 OK
< Access-Control-Allow-Origin: 
< Content-Length: 1011
< Content-Type: application/json; charset=utf-8
< Keep-Alive: timeout=5, max=5
< Server: llama.cpp
< 
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"Hmm, the user just said \"hi\" which is a simple greeting. No additional context or request provided. \n\nSince it's a casual opening, a warm and friendly response would be appropriate. No need to overcomplicate it. \n\nI can keep it simple with a cheerful greeting and an open-ended offer to help. That way the user can choose to elaborate or ask something specific. \n\nAdding a bit of emoji might make it feel more natural and engaging.</think>Hi! How can I help you today? 😊"}}],"created":1757612171,"model":"gpt-3.5-turbo-0613","object":"chat.completion","usage":{"completion_tokens":108,"prompt_tokens":5,"total_tokens":113},"id":"chatcmpl-sQiTyk5PlTbcQwji39vAqpblZvECeGHG","timings":{"prompt_n":5,"prompt_ms":450.934,"prompt_per_token_ms":90.1868,"prompt_per_second":11.088097149471984,"predicted_n":108,"predicted_ms":12810.345,"predicted_per_token_ms":118.61430555555555,"predicted_per_second":8.43068629299211}}* Connection #0 to host 127.0.0.1 left intact
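As the log shows, the assistant's `content` carries the reasoning text and a closing `</think>`, but the opening `<think>` tag is missing, so clients that split reasoning from the final answer on the tag pair will misparse the response. Until this is fixed server-side, a minimal client-side workaround is to re-insert the opening tag when only the closing tag is present. This is a sketch of that idea, not part of llama-server; the function name is hypothetical:

```python
def normalize_reasoning(content: str) -> str:
    """Hypothetical client-side workaround for this bug.

    If the content contains a closing </think> but does not start with an
    opening <think>, assume the server dropped the opening tag (as seen in
    the log above) and prepend it so tag-based parsers work again.
    """
    if "</think>" in content and not content.lstrip().startswith("<think>"):
        return "<think>" + content
    return content
```

Applied to the response above, `normalize_reasoning("Hmm, the user just said ...</think>Hi! ...")` yields a string that begins with `<think>`, so the reasoning block can be split off cleanly; content that already has the opening tag, or has no reasoning at all, passes through unchanged.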
