
feat(openai): OpenAI responses minimal instrumentation#3052

Merged
nirga merged 8 commits into traceloop:main from lmnr-ai:openai-responses
Jul 2, 2025

Conversation


@dinmukhamedm dinmukhamedm commented Jun 28, 2025

This is a basic instrumentation of OpenAI responses, supporting the following:

  • text inputs and outputs
  • (function) tool calls
  • (function) tool inputs
  • (function) tool definitions
  • computer use tool calls
  • async versions of all of the above

Things NOT supported, but that would be great to add:

  • input image parsing
  • other types of tool calls, such as MCP
  • emitting log events alongside spans
  • streaming responses
PR checklist:

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.
[Screenshots attached: 2025-06-28 at 16:44:28 and 17:02:52]

Important

Add instrumentation for OpenAI responses, including text inputs/outputs, tool calls, and async operations, with tests for verification.

  • Instrumentation:
    • Add support for OpenAI responses in responses_wrappers.py, handling text inputs/outputs, tool calls, and async operations.
    • Update __init__.py to integrate new response wrappers for synchronous and asynchronous operations.
  • Testing:
    • Add test_responses.py to test response creation, input history, and tool calls.
    • Include VCR cassettes for test scenarios in test_responses.yaml, test_responses_tool_calls.yaml, and test_responses_with_input_history.yaml.

This description was created by Ellipsis for 5714d08. You can customize this summary. It will automatically update as commits are pushed.

@dinmukhamedm dinmukhamedm marked this pull request as ready for review June 28, 2025 23:23

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to 5714d08 in 1 minute and 34 seconds.
  • Reviewed 1289 lines of code in 6 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py:303
  • Draft comment:
    The _try_wrap calls for 'Responses.create' and 'Responses.retrieve' both use the same wrapper (responses_get_or_create_wrapper). Ensure this is intentional since these methods could potentially require different instrumentation logic. Consider adding a comment to document the rationale.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment violates several rules. It's asking for confirmation of intention ("Ensure this is intentional") and requesting documentation. It's speculative ("could potentially require different instrumentation logic"). Without seeing the implementation of responses_get_or_create_wrapper, we can't know if this is actually an issue. The author likely had a good reason for using the same wrapper. Maybe there really is a bug here and these operations should be handled differently? The shared wrapper name is unusual. Even if there was a bug, asking for confirmation isn't helpful - the comment should point out the specific issue. The shared wrapper could be intentional if these operations are similar enough to share instrumentation logic. This comment should be deleted as it's primarily asking for confirmation and documentation rather than pointing out a clear issue that needs to be fixed.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:446
  • Draft comment:
    The code uses the dictionary merge operator (|) to combine output_blocks from the parsed response and existing data. This requires Python 3.9 or later; ensure that this version requirement is documented and acceptable for the project.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is informative and suggests ensuring documentation of a version requirement, which is not allowed by the rules. It doesn't provide a specific code suggestion or ask for a test to be written.
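For context on the operator flagged in that draft comment: `existing | new` merges two dicts with the right-hand operand's keys winning, and is only available on Python 3.9+; the unpacking form behaves identically on older interpreters. A minimal illustration (the dict values below are placeholders, not data from the PR):

```python
new_blocks = {"msg_1": "updated"}
existing_blocks = {"msg_0": "kept", "msg_1": "stale"}

merged_39 = existing_blocks | new_blocks         # Python 3.9+ merge operator
merged_any = {**existing_blocks, **new_blocks}   # equivalent on Python 3.5+

# Right-hand keys win in both forms.
assert merged_39 == merged_any == {"msg_0": "kept", "msg_1": "updated"}
```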

Workflow ID: wflow_TgkmsYc5QmFqJGlM

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

response_model: Optional[str] = pydantic.Field(default=None)


responses: dict[str, TracedData] = {}

The module uses a global 'responses' dictionary to store traced data. In a multi-threaded or async environment, this could potentially lead to race conditions. Consider using a thread-safe or async-safe data structure, or document why this is acceptable in your use case.
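The thread-safety concern above can be sketched as a lock-guarded accessor pair. The lock and the helper functions are assumptions for illustration, not code from the PR; only the module-level `responses` dict comes from the diff.

```python
import threading
from typing import Optional

# Hedged sketch: one way to make the module-level cache safe under
# concurrent access. A single lock serializes readers and writers.
_responses_lock = threading.Lock()
responses: dict[str, dict] = {}

def put_traced(response_id: str, data: dict) -> None:
    # Serialize writers so concurrent wrappers cannot interleave updates.
    with _responses_lock:
        responses[response_id] = data

def get_traced(response_id: str) -> Optional[dict]:
    with _responses_lock:
        return responses.get(response_id)
```

An async-safe variant could use `asyncio.Lock` instead, at the cost of making the accessors coroutines.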

@dinmukhamedm dinmukhamedm requested a review from nirga June 29, 2025 17:28
@dinmukhamedm dinmukhamedm changed the title from "OpenAI responses" to "feat(openai) OpenAI responses minimal instrumentation" Jun 30, 2025
@dinmukhamedm dinmukhamedm changed the title from "feat(openai) OpenAI responses minimal instrumentation" to "feat(openai): OpenAI responses minimal instrumentation" Jun 30, 2025
@nirga nirga left a comment


Thanks @dinmukhamedm! Left a small nit


@cursor cursor bot left a comment


Bug: Tracer API Parameter Errors

The tracer.start_span() calls within responses_cancel_wrapper and async_responses_cancel_wrapper contain two bugs:

  1. They incorrectly pass record_exception=True, a parameter not supported by the OpenTelemetry Tracer API, which will cause a TypeError. Exceptions are already correctly recorded via span.record_exception().
  2. The start_time parameter is passed as a float (existing_data.start_time) instead of being converted to an integer (nanoseconds), which is inconsistent with other start_span calls and may lead to a TypeError or incorrect span timing.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py#L581-L587

if existing_data is not None:
    span = tracer.start_span(
        SPAN_NAME,
        kind=SpanKind.CLIENT,
        start_time=existing_data.start_time,
        record_exception=True,
    )

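A minimal sketch of the `start_time` part of the fix, assuming (as the report states) that the stored float already encodes nanoseconds since the epoch and only needs casting before being handed to `start_span`. The helper name is hypothetical.

```python
def start_time_ns(raw: float) -> int:
    # OpenTelemetry span start times are integer nanoseconds since the
    # epoch; the wrapper stores a float, so cast before use. Fractional
    # nanoseconds are truncated.
    return int(raw)

# Hypothetical usage, mirroring the snippet above (tracer, SPAN_NAME,
# SpanKind, and existing_data are not defined here):
# span = tracer.start_span(
#     SPAN_NAME,
#     kind=SpanKind.CLIENT,
#     start_time=start_time_ns(existing_data.start_time),
# )
```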


Bug: Sync/Async Wrapper Default Value Mismatch

The responses_get_or_create_wrapper (sync) and async_responses_get_or_create_wrapper (async) functions have inconsistent default value handling within their exception blocks.

Specifically:

  • instructions: The async version defaults to an empty string (""), while the sync version does not (resulting in None).
  • output_text and request_model: The sync version defaults to "", while the async version does not (resulting in None).
  • response_model: The sync version uses parsed_response.model as a fallback, whereas the async version does not (resulting in None).

This inconsistency can lead to different TracedData object creation and potential TypeError when None values are passed where strings are expected.

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py#L495-L511

try:
    traced_data = TracedData(
        start_time=existing_data.get("start_time", start_time),
        response_id=response_id or "",
        input=process_input(
            kwargs.get("input", existing_data.get("input", []))
        ),
        instructions=kwargs.get(
            "instructions", existing_data.get("instructions", "")
        ),
        tools=get_tools_from_kwargs(kwargs) or existing_data.get("tools", []),
        output_blocks=existing_data.get("output_blocks", {}),
        usage=existing_data.get("usage"),
        output_text=kwargs.get("output_text", existing_data.get("output_text")),
        request_model=kwargs.get("model", existing_data.get("request_model")),
        response_model=existing_data.get("response_model"),
    )

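One way to close the sync/async gap described above is to route every fallback through a single helper shared by both wrappers, so the defaults cannot drift. The helper and its name are assumptions for illustration, not code from the PR; the field names come from the bug report.

```python
from typing import Optional

def resolve(kwargs: dict, existing: dict, key: str,
            alias: Optional[str] = None, default: str = "") -> str:
    # Look in the call kwargs first (under the wire name, e.g. "model"),
    # then in previously traced data, then fall back to one shared default
    # so sync and async code paths always agree.
    value = kwargs.get(alias or key, existing.get(key))
    return value if value is not None else default

# Hypothetical usage inside either wrapper:
# instructions = resolve(kwargs, existing_data, "instructions")
# request_model = resolve(kwargs, existing_data, "request_model", alias="model")
```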




@nirga nirga merged commit cfcbe1e into traceloop:main Jul 2, 2025
10 checks passed
@nirga nirga deleted the openai-responses branch July 2, 2025 18:33
amitalokbera pushed a commit to amitalokbera/openllmetry that referenced this pull request Jul 15, 2025
Co-authored-by: Nir Gazit <nirga@users.noreply.github.com>
nina-kollman pushed a commit that referenced this pull request Aug 11, 2025
Co-authored-by: Nir Gazit <nirga@users.noreply.github.com>
