What happened:
For streaming responses, Prometheus metrics for token counts were recorded with an empty `model_name` label, while the `target_model_name` label was correctly populated. This corrupts observability data, making it impossible to filter metrics by the public-facing model name.
Additionally, the token counting logic itself was brittle. It only parsed the final message in a stream for the `usage` block, meaning if token counts appeared in an earlier message, they would be missed.
What you expected to happen:
Metrics for streaming responses should be recorded with all labels, including `model_name` and `target_model_name`, correctly populated from the request context. The token counting logic should also be robust and accumulate usage data from all messages in the stream.
How to reproduce it (as minimally and precisely as possible):
This was discovered when refactoring the hermetic integration tests. (An illustrative client sketch follows the steps below.)
- Send a streaming request (e.g., a chat completion request) where the model name is not present in the top-level JSON body.
- Ensure the request includes the `x-gateway-api-inference-objective-key` header.
- Observe the `inference_objective_input_tokens_bucket` metric after the request completes.
- The metric will be present, but the `model_name` label will be empty (`model_name=""`).
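
As a rough illustration of the steps above, here is a minimal Go client sketch. The gateway URL, objective key value, and request body shape are assumptions for illustration only; they are not taken from the project's tests:

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Hypothetical gateway endpoint; adjust to your deployment.
	url := "http://localhost:8080/v1/chat/completions"

	// Streaming chat completion body with no top-level "model" field,
	// which is what triggers the empty model_name label described above.
	body := `{"messages":[{"role":"user","content":"hi"}],` +
		`"stream":true,"stream_options":{"include_usage":true}}`

	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	// The objective header that should populate the model_name label.
	req.Header.Set("x-gateway-api-inference-objective-key", "my-objective")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Drain the SSE stream so the response completes and metrics are recorded.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}

	// Per the report, scraping metrics afterwards shows something like:
	//   inference_objective_input_tokens_bucket{model_name="",target_model_name="...",...}
	// when model_name should have been populated from the objective header.
}
```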
Anything else we need to know?:
Root Cause Analysis:
Two related issues were discovered:
- The `director.HandleRequest` function incorrectly overwrites `RequestContext.IncomingModelName` (which is correctly set from the objective header) with a value parsed from the request body's `model` field. For requests like chat completions, this field doesn't exist at the top level, causing `IncomingModelName` to be reset to an empty string. This corrupted context persists for the life of the stream and is used when the final response metrics are recorded.
- The token counting logic in `HandleResponseBodyModelStreaming` only checked the final `[DONE]` message for a `usage` block. It did not accumulate token counts from earlier messages in the stream, making it possible to miss metrics entirely. (A sketch of both fixes follows this list.)
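
A minimal sketch of both fixes, assuming OpenAI-style SSE framing and JSON shapes. The helper names (`resolveIncomingModelName`, `scanUsage`) are hypothetical and do not mirror the actual director or streaming-handler code:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// resolveIncomingModelName illustrates a fix for the first issue: prefer the
// header-derived name and only let a non-empty top-level "model" field in the
// body override it, instead of overwriting unconditionally.
func resolveIncomingModelName(fromHeader string, body []byte) string {
	var parsed struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &parsed); err == nil && parsed.Model != "" {
		return parsed.Model
	}
	return fromHeader
}

// usage mirrors an OpenAI-style usage block carried in stream messages.
type usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

// scanUsage illustrates a fix for the second issue: it inspects every "data:"
// line in a streamed body chunk, not just the final [DONE] message, and keeps
// the most recent usage block seen. (Requires Go 1.20+ for bytes.CutPrefix.)
func scanUsage(chunk []byte, totals *usage) {
	for _, line := range bytes.Split(chunk, []byte("\n")) {
		data, ok := bytes.CutPrefix(bytes.TrimSpace(line), []byte("data: "))
		if !ok || bytes.Equal(data, []byte("[DONE]")) {
			continue
		}
		var msg struct {
			Usage *usage `json:"usage"`
		}
		if err := json.Unmarshal(data, &msg); err == nil && msg.Usage != nil {
			*totals = *msg.Usage
		}
	}
}

func main() {
	// The usage block arrives before [DONE]; logic that only parses the
	// final message would miss it.
	chunk := []byte("data: {\"usage\":{\"prompt_tokens\":12,\"completion_tokens\":34,\"total_tokens\":46}}\n\ndata: [DONE]\n")
	var totals usage
	scanUsage(chunk, &totals)
	fmt.Println(resolveIncomingModelName("my-objective", []byte(`{"messages":[]}`)), totals)
	// Output: my-objective {12 34 46}
}
```

Keeping the latest usage block (rather than summing) matches upstreams that report cumulative totals in a late chunk; a stream that emits per-message deltas would need to sum instead, so the right accumulation strategy depends on the upstream's semantics.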
Production Risk / Impact:
The production risk is high. This bug leads to corrupted and unusable observability data for common streaming use cases. Metrics for streaming token usage will have a missing `model_name` label, making it impossible to accurately filter, aggregate, or alert on a per-model basis, potentially breaking monitoring, billing, and capacity planning.
Environment:
- Discovered during hermetic integration test refactoring.