
Conversation

@TeoZosa (Contributor) commented Aug 20, 2025

Fixes OpenAI Streaming API spec compatibility for chat completion streams that include usage statistics (the default in llama-server).
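
For illustration, with this change the tail of a streamed chat completion should look roughly like the following (field values here are made up; the shape follows the OpenAI spec and the hunk discussed below):

    data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"created":1755676800,"id":"chatcmpl-abc123","model":"gemma-3n-E4B-it","system_fingerprint":"b1234-abcdef0","object":"chat.completion.chunk"}
    data: {"choices":[],"created":1755676800,"id":"chatcmpl-abc123","model":"gemma-3n-E4B-it","system_fingerprint":"b1234-abcdef0","object":"chat.completion.chunk","usage":{"completion_tokens":12,"prompt_tokens":34,"total_tokens":46}}
    data: [DONE]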

Closes:

@TeoZosa requested a review from ngxson as a code owner on August 20, 2025 07:58
@TeoZosa force-pushed the server/openai-api-spec-compatibility/chat-completion-chunk-usage-statistics-chunk branch from 9d92f7b to 6c37034 on August 20, 2025 07:59
Comment on lines +915 to 927

// OpenAI API spec for chat.completion.chunks specifies an empty `choices` array for the last chunk when including usage
// https://platform.openai.com/docs/api-reference/chat_streaming/streaming#chat_streaming/streaming-choices
deltas.push_back({
{"choices", json::array()},
{"created", t},
{"id", oaicompat_cmpl_id},
{"model", oaicompat_model},
{"system_fingerprint", build_info},
{"object", "chat.completion.chunk"},
{"usage", json {
{"completion_tokens", n_decoded},
{"prompt_tokens", n_prompt_tokens},
@TeoZosa (Contributor, Author) commented Aug 20, 2025

The only (non-test) PR change: adding an extra chunk with an empty choices array and setting usage stats there.

@TeoZosa (Contributor, Author) commented Aug 20, 2025

Signposting that this change looks to be backwards-compatible with the bench script, which checks whether a chunk contains a usage field independently of the choices content:

    if (chunk.usage) {
        prompt_tokens = chunk.usage.prompt_tokens
        llamacpp_prompt_tokens.add(prompt_tokens)
        llamacpp_prompt_tokens_total_counter.add(prompt_tokens)
        completions_tokens = chunk.usage.completion_tokens
        llamacpp_completion_tokens.add(completions_tokens)
        llamacpp_completion_tokens_total_counter.add(completions_tokens)
    }

@TeoZosa force-pushed the server/openai-api-spec-compatibility/chat-completion-chunk-usage-statistics-chunk branch from 6c37034 to d4cca6b on August 20, 2025 08:15
The github-actions bot added the python (python script changes) label on Aug 20, 2025
@TeoZosa force-pushed the server/openai-api-spec-compatibility/chat-completion-chunk-usage-statistics-chunk branch 3 times, most recently from c4f2bc1 to ba37940 on August 20, 2025 09:59
@TeoZosa force-pushed the server/openai-api-spec-compatibility/chat-completion-chunk-usage-statistics-chunk branch from ba37940 to 3ec1bc7 on August 20, 2025 11:58
@ngxson merged commit 1bc664a into ggml-org:master on Aug 20, 2025
49 checks passed
@h9j6k commented Aug 21, 2025

Hello,

I am getting the error can't access property "delta", ie.choices[0] is undefined in a browser pop-up message when using llama-server as before.

./llama-server -m ~/llm/models/google_gemma-3n-E4B-it-Q4_0.gguf -ot per_layer_token_embd.weight=CPU --host HOSTNAME --port PORT -c 8192 -b 8192 -e -ngl 99 -t 8 -n -1 --no-mmap -fa --jinja

Could it be related to this commit? Thanks.

@TeoZosa (Contributor, Author) commented Aug 21, 2025

Most likely! My fault for not catching what other compatibility was affected beyond the tests and direct model calls. The culprit is probably this line:

I can make a PR later today (assuming no one else gets to it first).
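
For anyone hitting the webui error above before a follow-up lands, the client-side fix amounts to a guard like the sketch below (TypeScript, hypothetical names only; this is not the actual webui code):

    // Hedged sketch: skip delta handling for the new usage-only chunk,
    // whose choices array is empty.
    interface Delta { content?: string }
    interface Choice { delta?: Delta }
    interface Chunk {
      choices: Choice[]
      usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number }
    }

    function handleChunk(chunk: Chunk, append: (text: string) => void) {
      if (chunk.choices.length > 0) { // guard: choices may now be empty
        const content = chunk.choices[0].delta?.content
        if (content) append(content)
      }
      // usage (if present) can still be read regardless of choices
      if (chunk.usage) console.debug("usage:", chunk.usage)
    }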

qnixsynapse pushed a commit to menloresearch/llama.cpp that referenced this pull request Aug 22, 2025
doringeman added a commit to doringeman/model-runner that referenced this pull request Sep 11, 2025
The "choices" in the last chunk can be empty, so save the last non-empty in order to record the streaming response properly. Without this patch we don't properly record a streaming response after llama.cpp has been bumped to include ggml-org/llama.cpp#15444.

Signed-off-by: Dorin Geman <[email protected]>
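
The pattern described in the commit message, sketched in TypeScript for illustration (hypothetical names; the actual model-runner change is in Go):

    // Hedged sketch: remember the last chunk whose choices array is non-empty,
    // since the final usage-only chunk now arrives with an empty one.
    interface Choice { index: number; finish_reason?: string | null }
    interface Chunk { choices: Choice[]; usage?: { total_tokens: number } }

    async function lastNonEmptyChoices(stream: AsyncIterable<Chunk>): Promise<Choice[]> {
      let last: Choice[] = []
      for await (const chunk of stream) {
        if (chunk.choices.length > 0) last = chunk.choices
      }
      return last
    }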

Labels

examples, python (python script changes), server
