Description
🚀 Describe the new functionality needed
Add observability support for the OpenAI Responses API in llama-stack. This involves integrating OpenTelemetry instrumentation so that calls made through the Responses API (both sync and streaming) produce proper traces and spans, consistent with the existing chat completions observability.
This feature depends on upstream work in the OpenTelemetry Python contrib repository. The following PRs need to be merged and released first:
- Implement OpenAI Responses API instrumentation and examples: open-telemetry/opentelemetry-python-contrib#4166
- Add response wrappers for OpenAI Responses API streams: open-telemetry/opentelemetry-python-contrib#4280
- feat: OpenAI responses extractors: open-telemetry/opentelemetry-python-contrib#4337
Once these upstream PRs are merged and a new opentelemetry-instrumentation-openai-v2 release is available, llama-stack can integrate the updated package to enable Responses API observability.
This is a sub-task of #2596.
💡 Why is this needed? What if we don't build it?
Llama-stack already supports the OpenAI Responses API, but there is currently no telemetry coverage for it. Without this, users have no visibility into Responses API call latency, token usage, errors, or streaming behavior through their observability stack (e.g., Jaeger, Grafana). This makes debugging and performance monitoring significantly harder for anyone using the Responses API.
Other thoughts
- This is blocked on the upstream OTel contrib PRs: no implementation work should start until those are merged and released, but we can do some early testing with patched code.
- Once available, integration should be straightforward: bump the opentelemetry-instrumentation-openai-v2 dependency version and verify traces are emitted for Responses API calls.