OpenAI Responses API Observability #5192

@gyliu513

Description

🚀 Describe the new functionality needed

Add observability support for the OpenAI Responses API in llama-stack. This involves integrating OpenTelemetry instrumentation so that calls made through the Responses API (both sync and streaming) produce proper traces and spans, consistent with the existing chat completions observability.
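Once the updated instrumentation package is available, wiring it into a llama-stack deployment would likely follow the standard zero-code OpenTelemetry pattern. A rough sketch, where the version pin, endpoint, and entry-point module are placeholders, not final values:

```shell
# Assumed setup once the new release lands; package versions are illustrative.
pip install opentelemetry-instrumentation-openai-v2 opentelemetry-distro opentelemetry-exporter-otlp

# Standard OTel environment variables; the collector endpoint is a placeholder.
export OTEL_SERVICE_NAME=llama-stack
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

# Auto-instrument the server process so Responses API calls emit spans.
opentelemetry-instrument python -m my_app  # my_app is a placeholder entry point
```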

This feature depends on upstream work in the OpenTelemetry Python contrib repository. The following PRs need to be merged and released first:

Once these upstream PRs are merged and a new opentelemetry-instrumentation-openai-v2 release is available, llama-stack can integrate the updated package to enable Responses API observability.

This is a subtask of #2596.

@leseb @iamemilio @cdoern ^^

💡 Why is this needed? What if we don't build it?

Llama-stack already supports the OpenAI Responses API, but there is currently no telemetry coverage for it. Without this, users have no visibility into Responses API call latency, token usage, errors, or streaming behavior through their observability stack (e.g., Jaeger, Grafana). This makes debugging and performance monitoring significantly harder for anyone using the Responses API.

Other thoughts

  • This is blocked on the upstream OTel contrib PRs; no implementation work should start until those are merged and released, though we can do some early testing with patched code.
  • Once available, integration should be straightforward: bump the opentelemetry-instrumentation-openai-v2 dependency version and verify traces are emitted for Responses API calls.
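For the verification step above, an early patched-code test could check that captured Responses API spans carry the OpenTelemetry GenAI semantic-convention attributes. A minimal stdlib-only sketch; the attribute names follow the published GenAI conventions, but the exact set the new instrumentation will emit is an assumption:

```python
# Hypothetical helper for verifying spans captured by an in-memory exporter
# during early patched-code testing. The required attribute set is an
# assumption based on the OTel GenAI semantic conventions, not a spec of
# what opentelemetry-instrumentation-openai-v2 will actually emit.
REQUIRED_ATTRIBUTES = (
    "gen_ai.system",             # e.g. "openai"
    "gen_ai.operation.name",     # operation kind recorded by the instrumentation
    "gen_ai.request.model",      # model requested by the caller
    "gen_ai.usage.input_tokens",
    "gen_ai.usage.output_tokens",
)

def missing_genai_attributes(span_attributes: dict) -> list:
    """Return the required GenAI attribute keys absent from a span."""
    return [key for key in REQUIRED_ATTRIBUTES if key not in span_attributes]

# Example: a span represented as a plain attribute dict, shaped like what a
# test harness might capture for one Responses API call.
captured = {
    "gen_ai.system": "openai",
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "gpt-4o-mini",
    "gen_ai.usage.input_tokens": 12,
    "gen_ai.usage.output_tokens": 48,
}
assert missing_genai_attributes(captured) == []
```

The same check works for streaming calls, since token usage should still land on the finished span once the stream completes.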
