
Conversation

@doringeman (Contributor)

Fix a bug where OpenAIRecorder doesn't properly capture a streaming response because the last valid "choices" field arrives in the second-to-last chunk of the response.

The "choices" in the last chunk can be empty, so save the last non-empty one in order to record the streaming response properly. Without this patch we don't properly record a streaming response after llama.cpp has been bumped to include ggml-org/llama.cpp#15444.

You can try it out with:

MODEL_RUNNER_PORT=8080 make run
$ curl http://localhost:8080/engines/v1/chat/completions \
 -X POST \
 -H "Content-Type: application/json" \
 -d '{
   "model": "ai/smollm2",
   "messages": [
     {
       "role": "user",
       "content": "Capital of Romania?"
     }
   ], "stream": true,
   "stream_options": {
     "include_usage": true
   }
 }'


Signed-off-by: Dorin Geman <[email protected]>
@kiview (Member) commented Sep 11, 2025

Can we add an integration test of the correct behavior in conjunction with the bundled llama.cpp in this repo?

@doringeman (Contributor, Author)

@kiview We'll have a separate PR for setting up the initial plumbing for e2e/integration tests.
Do you mind if we merge this PR now?

@kiview (Member) commented Sep 11, 2025

I don't mind at all.

I am more wondering whether we can have such integration tests in this repo, instead of relying on e2e pinata tests for this (which are much harder to maintain and brittle).

If possible, we should aim to have this integration test here already (since it covers breaking behavior of this component).

@doringeman (Contributor, Author)

> If possible, we should aim at having this integration test already here (since it is a breaking behavior of this component).

Totally agree!

@doringeman doringeman merged commit 239d3e1 into docker:main Sep 11, 2025
1 check passed
doringeman pushed a commit to doringeman/model-runner that referenced this pull request Oct 2, 2025