
Conversation

mglambda (Contributor) commented Mar 7, 2025

I noticed that the /chat/completions and /v1/completions endpoints do not return the "__verbose" field in the final server response when running llama-server with -lv 10 and streaming enabled. This is inconsistent with the non-streaming behaviour of those endpoints.

This PR adds verbose output to server_task_result_cmpl_final::to_json_oaicompat_chat_stream, making it consistent with server_task_result_cmpl_final::to_json_oaicompat_chat as well as the other to_json methods.

This was motivated by my wanting to read tokens_cached in streaming mode on those endpoints. If anyone knows another way to get that, preferably without verbose mode, I'd be very interested.
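
To illustrate the intent, here's a minimal, self-contained sketch of the pattern (using nlohmann::json like the server does). The struct and field names below are hypothetical stand-ins, not the actual server types; the real change mirrors what the non-streaming to_json_oaicompat_chat already does:

```cpp
#include <iostream>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Hypothetical stand-in for server_task_result_cmpl_final; only the
// verbose-attachment pattern is meant to match the real code.
struct completion_result {
    bool verbose       = true; // set when llama-server runs with -lv 10
    int  tokens_cached = 42;   // example of a field only exposed verbosely

    // raw, non-OAI-compatible dump of the result
    json to_json_non_oaicompat() const {
        return json{{"tokens_cached", tokens_cached}};
    }

    // final chunk of a streamed OAI-compatible response: with this PR it
    // attaches "__verbose" the same way the non-streaming variant does
    json to_json_oaicompat_chat_stream() const {
        json ret = {{"object", "chat.completion.chunk"}};
        if (verbose) {
            ret["__verbose"] = to_json_non_oaicompat();
        }
        return ret;
    }
};

int main() {
    completion_result res;
    // prints a chunk containing "__verbose": {"tokens_cached": 42}
    std::cout << res.to_json_oaicompat_chat_stream().dump(2) << "\n";
}
```

With this in place, a streaming client can read tokens_cached from the "__verbose" object on the last streamed event instead of having to issue a separate non-streaming request.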

ngxson (Collaborator) commented Mar 7, 2025

I can merge only if you can fix the CI

mglambda requested a review from ggerganov as a code owner, March 8, 2025 06:30
github-actions bot added the script, devops, ggml, and Apple Metal labels, Mar 8, 2025
mglambda (Contributor, Author) commented Mar 8, 2025

Ok, let me try to figure out how to run the CI locally.

mglambda force-pushed the server-verbose-result-oaicompat-stream branch from 7a54db2 to 46cca0d, March 8, 2025 09:21
github-actions bot added the documentation, build, testing, android, Nvidia GPU, nix, Vulkan, python, SYCL, and Kompute labels, Mar 8, 2025
mglambda force-pushed the server-verbose-result-oaicompat-stream branch from 46cca0d to 35d63f1, March 8, 2025 09:41
mglambda (Contributor, Author) commented Mar 8, 2025

Ok, I saw the whitespace problem. Hope things are fine now. Baby's first PR.

Add verbose output to server_task_result_cmpl_final::to_json_oaicompat_chat_stream, making it conform with server_task_result_cmpl_final::to_json_oaicompat_chat, as well as the other to_json methods.
mglambda force-pushed the server-verbose-result-oaicompat-stream branch from 35d63f1 to d261b13, March 23, 2025 15:55
ngxson merged commit 77f9c6b into ggml-org:master on Mar 23, 2025. 47 checks passed.