Closed as not planned
Description
Git commit
git checkout -b b4667
Operating systems
Linux
GGML backends
CUDA
Problem description & steps to reproduce
In streaming output mode, the `delta` object in the second-to-last `data:` event (the last chunk before `data: [DONE]`) has no `content` field. Some third-party applications expect `content` to be present and raise errors when they consume the stream. The tail of the stream looks like this:
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":""}}],"created":1738980577,"id":"zhp","model":"DeepSeek-R1-UD-IQ1_M","system_fingerprint":"b0-unknown","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":"stop","index":0,"delta":{}}],"created":1738980577,"id":"hp","model":"DeepSeek-R1-UD-IQ1_M","system_fingerprint":"b0-unknown","object":"chat.completion.chunk","usage":{"completion_tokens":206,"prompt_tokens":12,"total_tokens":218},"timings":{"prompt_n":10,"prompt_ms":970.0,"prompt_per_token_ms":97.0,"prompt_per_second":10.309278350515463,"predicted_n":206,"predicted_ms":25605.215,"predicted_per_token_ms":124.29716019417477,"predicted_per_second":8.045236097412188}}
data: [DONE]
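For anyone hitting this on the client side, here is a minimal sketch of a tolerant parser, assuming the stream arrives as plain `data:` lines like the ones above. The function name `iter_content` and the shortened `sample` events (including the first chunk with `"Hi"`) are made up for illustration and are not part of llama.cpp or of any client library.

```python
import json

def iter_content(sse_lines):
    """Yield text pieces from streamed chat chunks, tolerating a delta without `content`."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel, not JSON
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            text = delta.get("content")  # None when the key is absent
            if text:
                yield text

# Hypothetical, shortened events shaped like the ones in this report.
sample = [
    'data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":"Hi"}}]}',
    'data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":""}}]}',
    'data: {"choices":[{"finish_reason":"stop","index":0,"delta":{}}]}',
    'data: [DONE]',
]
print("".join(iter_content(sample)))
```

Reading the field with `delta.get("content")` instead of `delta["content"]` avoids a `KeyError` on the final chunk, where `delta` is `{}` and `finish_reason` is `"stop"`.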
First Bad Commit
No response
Compile command
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j96
Relevant log output
no log