
feat(openai): OpenAI responses minimal instrumentation#3052

Merged
nirga merged 8 commits into traceloop:main from lmnr-ai:openai-responses
Jul 2, 2025

Conversation


@dinmukhamedm dinmukhamedm commented Jun 28, 2025

This is a basic instrumentation of OpenAI responses, supporting the following:

  • text inputs and outputs
  • (function) tool calls
  • (function) tool inputs
  • (function) tool definitions
  • computer use tool calls
  • async versions of all of the above

Things NOT supported, but that would be great to add:

  • input image parsing
  • other types of tool calls, such as MCP
  • emitting log events alongside spans
  • streaming responses
PR checklist:

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.
[Screenshots attached: 2025-06-28 at 16:44:28 and 17:02:52]

Important

Add instrumentation for OpenAI responses, including text inputs/outputs, tool calls, and async operations, with tests for verification.

  • Instrumentation:
    • Add support for OpenAI responses in responses_wrappers.py, handling text inputs/outputs, tool calls, and async operations.
    • Update __init__.py to integrate new response wrappers for synchronous and asynchronous operations.
  • Testing:
    • Add test_responses.py to test response creation, input history, and tool calls.
    • Include VCR cassettes for test scenarios in test_responses.yaml, test_responses_tool_calls.yaml, and test_responses_with_input_history.yaml.

This description was created by Ellipsis for 5714d08. You can customize this summary. It will automatically update as commits are pushed.

@dinmukhamedm dinmukhamedm marked this pull request as ready for review June 28, 2025 23:23

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to 5714d08 in 1 minute and 34 seconds.
  • Reviewed 1289 lines of code in 6 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py:303
  • Draft comment:
    The _try_wrap calls for 'Responses.create' and 'Responses.retrieve' both use the same wrapper (responses_get_or_create_wrapper). Ensure this is intentional since these methods could potentially require different instrumentation logic. Consider adding a comment to document the rationale.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment violates several rules. It's asking for confirmation of intention ("Ensure this is intentional") and requesting documentation. It's speculative ("could potentially require different instrumentation logic"). Without seeing the implementation of responses_get_or_create_wrapper, we can't know if this is actually an issue. The author likely had a good reason for using the same wrapper. Maybe there really is a bug here and these operations should be handled differently? The shared wrapper name is unusual. Even if there was a bug, asking for confirmation isn't helpful - the comment should point out the specific issue. The shared wrapper could be intentional if these operations are similar enough to share instrumentation logic. This comment should be deleted as it's primarily asking for confirmation and documentation rather than pointing out a clear issue that needs to be fixed.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:446
  • Draft comment:
    The code uses the dictionary merge operator (|) to combine output_blocks from the parsed response and existing data. This requires Python 3.9 or later; ensure that this version requirement is documented and acceptable for the project.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is informative and suggests ensuring documentation of a version requirement, which is not allowed by the rules. It doesn't provide a specific code suggestion or ask for a test to be written.
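For context on the operator flagged in that draft comment: `existing | new` merges two dicts with the right-hand operand's keys winning, and is only available on Python 3.9+; the unpacking form behaves identically on older interpreters. A minimal illustration (the dict values below are placeholders, not data from the PR):

```python
new_blocks = {"msg_1": "updated"}
existing_blocks = {"msg_0": "kept", "msg_1": "stale"}

merged_39 = existing_blocks | new_blocks         # Python 3.9+ merge operator
merged_any = {**existing_blocks, **new_blocks}   # equivalent on Python 3.5+

# Right-hand keys win in both forms.
assert merged_39 == merged_any == {"msg_0": "kept", "msg_1": "updated"}
```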

Workflow ID: wflow_TgkmsYc5QmFqJGlM

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

response_model: Optional[str] = pydantic.Field(default=None)


responses: dict[str, TracedData] = {}

The module uses a global 'responses' dictionary to store traced data. In a multi-threaded or async environment, this could potentially lead to race conditions. Consider using a thread-safe or async-safe data structure, or document why this is acceptable in your use case.
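The thread-safety concern above can be sketched as a lock-guarded accessor pair. The lock and the helper functions are assumptions for illustration, not code from the PR; only the module-level `responses` dict comes from the diff.

```python
import threading
from typing import Optional

# Hedged sketch: one way to make the module-level cache safe under
# concurrent access. A single lock serializes readers and writers.
_responses_lock = threading.Lock()
responses: dict[str, dict] = {}

def put_traced(response_id: str, data: dict) -> None:
    # Serialize writers so concurrent wrappers cannot interleave updates.
    with _responses_lock:
        responses[response_id] = data

def get_traced(response_id: str) -> Optional[dict]:
    with _responses_lock:
        return responses.get(response_id)
```

An async-safe variant could use `asyncio.Lock` instead, at the cost of making the accessors coroutines.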

@dinmukhamedm dinmukhamedm requested a review from nirga June 29, 2025 17:28
@dinmukhamedm dinmukhamedm changed the title from "OpenAI responses" to "feat(openai) OpenAI responses minimal instrumentation" Jun 30, 2025
@dinmukhamedm dinmukhamedm changed the title from "feat(openai) OpenAI responses minimal instrumentation" to "feat(openai): OpenAI responses minimal instrumentation" Jun 30, 2025
@nirga nirga left a comment


Thanks @dinmukhamedm! Left a small nit


@cursor cursor bot left a comment


Bug: Tracer API Parameter Errors

The tracer.start_span() calls within responses_cancel_wrapper and async_responses_cancel_wrapper contain two bugs:

  1. They incorrectly pass record_exception=True, a parameter not supported by the OpenTelemetry Tracer API, which will cause a TypeError. Exceptions are already correctly recorded via span.record_exception().
  2. The start_time parameter is passed as a float (existing_data.start_time) instead of being converted to an integer (nanoseconds), which is inconsistent with other start_span calls and may lead to a TypeError or incorrect span timing.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py#L581-L587

if existing_data is not None:
    span = tracer.start_span(
        SPAN_NAME,
        kind=SpanKind.CLIENT,
        start_time=existing_data.start_time,
        record_exception=True,
    )

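A minimal sketch of the `start_time` part of the fix, assuming (as the report states) that the stored float already encodes nanoseconds since the epoch and only needs casting before being handed to `start_span`. The helper name is hypothetical.

```python
def start_time_ns(raw: float) -> int:
    # OpenTelemetry span start times are integer nanoseconds since the
    # epoch; the wrapper stores a float, so cast before use. Fractional
    # nanoseconds are truncated.
    return int(raw)

# Hypothetical usage, mirroring the snippet above (tracer, SPAN_NAME,
# SpanKind, and existing_data are not defined here):
# span = tracer.start_span(
#     SPAN_NAME,
#     kind=SpanKind.CLIENT,
#     start_time=start_time_ns(existing_data.start_time),
# )
```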


Bug: Sync/Async Wrapper Default Value Mismatch

The responses_get_or_create_wrapper (sync) and async_responses_get_or_create_wrapper (async) functions have inconsistent default value handling within their exception blocks.

Specifically:

  • instructions: The async version defaults to an empty string (""), while the sync version does not (resulting in None).
  • output_text and request_model: The sync version defaults to "", while the async version does not (resulting in None).
  • response_model: The sync version uses parsed_response.model as a fallback, whereas the async version does not (resulting in None).

This inconsistency can lead to different TracedData object creation and potential TypeError when None values are passed where strings are expected.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py#L495-L511

try:
    traced_data = TracedData(
        start_time=existing_data.get("start_time", start_time),
        response_id=response_id or "",
        input=process_input(
            kwargs.get("input", existing_data.get("input", []))
        ),
        instructions=kwargs.get(
            "instructions", existing_data.get("instructions", "")
        ),
        tools=get_tools_from_kwargs(kwargs) or existing_data.get("tools", []),
        output_blocks=existing_data.get("output_blocks", {}),
        usage=existing_data.get("usage"),
        output_text=kwargs.get("output_text", existing_data.get("output_text")),
        request_model=kwargs.get("model", existing_data.get("request_model")),
        response_model=existing_data.get("response_model"),
    )

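One way to close the sync/async gap described above is to route every fallback through a single helper shared by both wrappers, so the defaults cannot drift. The helper and its name are assumptions for illustration, not code from the PR; the field names come from the bug report.

```python
from typing import Optional

def resolve(kwargs: dict, existing: dict, key: str,
            alias: Optional[str] = None, default: str = "") -> str:
    # Look in the call kwargs first (under the wire name, e.g. "model"),
    # then in previously traced data, then fall back to one shared default
    # so sync and async code paths always agree.
    value = kwargs.get(alias or key, existing.get(key))
    return value if value is not None else default

# Hypothetical usage inside either wrapper:
# instructions = resolve(kwargs, existing_data, "instructions")
# request_model = resolve(kwargs, existing_data, "request_model", alias="model")
```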




@nirga nirga merged commit cfcbe1e into traceloop:main Jul 2, 2025
10 checks passed
@nirga nirga deleted the openai-responses branch July 2, 2025 18:33
amitalokbera pushed a commit to amitalokbera/openllmetry that referenced this pull request Jul 15, 2025
Co-authored-by: Nir Gazit <nirga@users.noreply.github.com>
nina-kollman pushed a commit that referenced this pull request Aug 11, 2025
Co-authored-by: Nir Gazit <nirga@users.noreply.github.com>
