feat: parse and return token usage in Chat Completions stream #7000
This pull request enhances the SSE (Server-Sent Events) processing logic in `chat_completions.rs` to improve how response metadata is handled, specifically by capturing and propagating the `response_id` and detailed token usage statistics from OpenAI API responses. The changes ensure that this information is correctly parsed, stored, and included in `ResponseEvent::Completed` messages, making downstream processing more robust and informative.

**Improvements to response metadata handling:**
- Added a `parse_openai_usage` function to extract detailed token usage statistics (including input, output, cached, and reasoning tokens) from the OpenAI API response and populate a `TokenUsage` struct.
- Updated `process_chat_sse` to capture and store the `response_id` and parsed `token_usage` from each incoming chunk, ensuring these fields are preserved across the session.

**Enhancements to event emission:**
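The mapping described above can be sketched roughly as follows. This is a minimal illustration, not the actual PR code: the field names on both structs are assumptions based on the shape of the OpenAI usage payload (`prompt_tokens`, `completion_tokens`, nested cached/reasoning detail counts), and the real `TokenUsage` type in the repository may differ.

```rust
/// Hypothetical aggregate usage type; field names are assumptions,
/// not the actual codex definitions.
#[derive(Debug, Default, Clone, PartialEq)]
struct TokenUsage {
    input_tokens: u64,
    cached_input_tokens: u64,
    output_tokens: u64,
    reasoning_output_tokens: u64,
    total_tokens: u64,
}

/// Simplified stand-in for the usage object on a Chat Completions chunk.
/// The cached/reasoning counts live in nested `*_tokens_details` objects
/// in the real API and may be absent, hence `Option`.
#[derive(Debug, Default)]
struct OpenAiUsage {
    prompt_tokens: u64,
    completion_tokens: u64,
    total_tokens: u64,
    cached_tokens: Option<u64>,
    reasoning_tokens: Option<u64>,
}

/// Fold the raw usage payload into a `TokenUsage`, defaulting the
/// optional detail counters to zero when the API omits them.
fn parse_openai_usage(u: &OpenAiUsage) -> TokenUsage {
    TokenUsage {
        input_tokens: u.prompt_tokens,
        cached_input_tokens: u.cached_tokens.unwrap_or(0),
        output_tokens: u.completion_tokens,
        reasoning_output_tokens: u.reasoning_tokens.unwrap_or(0),
        total_tokens: u.total_tokens,
    }
}
```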
- Updated `ResponseEvent::Completed` emissions to include the actual `response_id` and `token_usage` instead of defaulting to an empty string or `None`, improving traceability and observability for clients consuming these events.

Link Issue: #6834
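The capture-then-emit flow can be sketched as below, assuming hypothetical names throughout (`StreamState`, `observe_chunk`; the `ResponseEvent` variant shape is an assumption based on the description, not the repository's actual definition). The idea is simply that the stream loop records metadata as chunks arrive and hands the stored values to the terminal event rather than emitting defaults.

```rust
// Hypothetical TokenUsage stand-in; not the actual codex type.
#[derive(Debug, Clone, PartialEq)]
struct TokenUsage {
    input_tokens: u64,
    output_tokens: u64,
}

// Assumed shape of the terminal event described in the PR.
#[derive(Debug, PartialEq)]
enum ResponseEvent {
    Completed {
        response_id: String,
        token_usage: Option<TokenUsage>,
    },
}

/// Per-stream state accumulated while processing SSE chunks.
#[derive(Default)]
struct StreamState {
    response_id: String,
    token_usage: Option<TokenUsage>,
}

impl StreamState {
    /// Record metadata from an incoming chunk. Later values overwrite
    /// earlier ones, so the final chunk's usage statistics win.
    fn observe_chunk(&mut self, id: Option<&str>, usage: Option<TokenUsage>) {
        if let Some(id) = id {
            self.response_id = id.to_string();
        }
        if usage.is_some() {
            self.token_usage = usage;
        }
    }

    /// Emit the terminal event with the captured metadata instead of
    /// an empty id and `None` usage.
    fn completed(self) -> ResponseEvent {
        ResponseEvent::Completed {
            response_id: self.response_id,
            token_usage: self.token_usage,
        }
    }
}
```

In the real stream loop, `observe_chunk` would be fed from each deserialized SSE chunk, and `completed` would be called once the `[DONE]` sentinel (or stream end) is reached.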