server : Add verbose output to OAI compatible chat endpoint. #12246
+5
−0
I noticed that the /chat/completions and /v1/completions endpoints do not return the "__verbose" field in the final server response when running llama-server with -lv 10 and streaming enabled. This is inconsistent with the non-streaming behaviour of those endpoints.
This PR adds verbose output to server_task_result_cmpl_final::to_json_oaicompat_chat_stream, bringing it in line with server_task_result_cmpl_final::to_json_oaicompat_chat and the other to_json methods. A sketch of the idea follows below.
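For illustration, here is a minimal standalone sketch of the idea (not the actual five-line diff): attach a "__verbose" payload to the last chunk of a streamed chat response, mirroring what the non-streaming response already does. It uses nlohmann/json, which the server already depends on; the chunk shapes and the values inside "__verbose" are illustrative assumptions, not the server's exact output.

```cpp
#include <nlohmann/json.hpp>
#include <iostream>
#include <vector>

using json = nlohmann::json;

int main() {
    // Simulated stream chunks, roughly the shape an OAI-compatible
    // /v1/chat/completions stream produces.
    json chunk1;
    chunk1["object"] = "chat.completion.chunk";
    chunk1["choices"][0]["delta"]["content"] = "Hello";

    json final_chunk;
    final_chunk["object"] = "chat.completion.chunk";
    final_chunk["choices"][0]["delta"] = json::object();
    final_chunk["choices"][0]["finish_reason"] = "stop";

    std::vector<json> deltas = {chunk1, final_chunk};

    const bool verbose = true; // e.g. llama-server started with -lv 10

    if (verbose && !deltas.empty()) {
        // Expose the raw (non-OAI-compatible) result on the final chunk, the
        // same way the non-streaming response does. tokens_cached is the field
        // of interest here; the values are placeholders for illustration.
        deltas.back()["__verbose"] = {
            {"tokens_cached", 42},
            {"truncated",     false},
        };
    }

    for (const auto & d : deltas) {
        std::cout << "data: " << d.dump() << "\n\n"; // SSE-style framing
    }
}
```

With this, a streaming client can read fields like tokens_cached from the "__verbose" object on the final chunk, just as it can from a non-streaming response.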
This was motivated by wanting to know tokens_cached when streaming on those endpoints. If there is another way to get that, preferably without verbose mode, I'd be very interested.