feat(endpoints): add OpenAI Responses API endpoint support #695

ajcasagrande wants to merge 4 commits into main
Conversation
Try out this PR

Quick install:

```shell
pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@991ebccb8958173b53bed1ec5cfce72a44aa89e7
```

Recommended with a virtual environment (using uv):

```shell
uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@991ebccb8958173b53bed1ec5cfce72a44aa89e7
```

Last updated for commit:
Force-pushed from b02db9d to acd6f4a (Compare)
Walkthrough

A new OpenAI Responses API endpoint has been integrated into AIPerf, adding the ResponsesEndpoint class with support for multi-modal inputs (text, images, audio) and both streaming and non-streaming responses. This includes the endpoint implementation, plugin configuration, comprehensive unit tests, and documentation, plus a minor update to CLI options.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
🧹 Nitpick comments (2)
docs/tutorials/openai-responses.md (1)
58-70: `--endpoint /v1/responses` is redundant — the plugin metadata already sets this path.

Since the `responses` plugin entry defines `endpoint_path: /v1/responses`, specifying `--endpoint /v1/responses` on every command is unnecessary. Including it in every example may mislead users into thinking it's required. Consider removing it from most examples and mentioning in a note that it's only needed for custom paths.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `docs/tutorials/openai-responses.md` around lines 58-70: remove the redundant --endpoint /v1/responses flag from the aiperf profile example, since the plugin metadata already sets endpoint_path: /v1/responses; update the example invocation under the aiperf profile block to omit the --endpoint option, and add a brief note near the examples that --endpoint is only required when using a custom endpoint path.

src/aiperf/endpoints/openai_responses.py (1)

108-154: Shortcut path silently ignores videos.

The fast path (lines 110–115) checks for single text + no images + no audios, but doesn't check `len(turn.videos) == 0`. If a turn has both a single text and video data, it takes the string shortcut and silently drops the video content. While `supports_videos` isn't declared in the plugin metadata, the `Turn` model still carries a `videos` field, and a defensive check avoids silent data loss if the API adds video support later.

Similarly, the multi-modal branch (lines 121–152) doesn't process `turn.videos` at all. This is consistent with the plugin not declaring video support, but a log warning when videos are present would prevent silent surprises.

Add a video guard to the shortcut condition:

```diff
 if (
     len(turn.texts) == 1
     and len(turn.texts[0].contents) == 1
     and len(turn.images) == 0
     and len(turn.audios) == 0
+    and len(turn.videos) == 0
 ):
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/aiperf/endpoints/openai_responses.py` around lines 108-154: the _set_item_content shortcut currently ignores turn.videos and can drop video data; update the initial fast-path conditional in _set_item_content to also require len(turn.videos) == 0 so a turn with a single text plus videos does not take the string shortcut, and in the multi-modal branch add a defensive check that emits a warning (via the module's logger) stating that video content is present but ignored, before building item["content"] from texts/images/audios.
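The guard and warning suggested above can be sketched end-to-end. The `Turn`/`Text` dataclasses below are illustrative stand-ins for AIPerf's models, and `set_item_content` is a simplified stand-in for `_set_item_content`, not the actual implementation:

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)


@dataclass
class Text:
    contents: list[str] = field(default_factory=list)


@dataclass
class Turn:
    # Illustrative stand-in for AIPerf's Turn model.
    texts: list[Text] = field(default_factory=list)
    images: list[str] = field(default_factory=list)
    audios: list[str] = field(default_factory=list)
    videos: list[str] = field(default_factory=list)


def set_item_content(item: dict, turn: Turn) -> None:
    # Fast path: a single plain-text content and no other modalities,
    # now including the videos guard proposed in the review.
    if (
        len(turn.texts) == 1
        and len(turn.texts[0].contents) == 1
        and len(turn.images) == 0
        and len(turn.audios) == 0
        and len(turn.videos) == 0
    ):
        item["content"] = turn.texts[0].contents[0]
        return

    # Multi-modal branch: warn instead of silently dropping videos.
    if turn.videos:
        logger.warning(
            "Video content present but not supported; ignoring %d video(s)",
            len(turn.videos),
        )

    item["content"] = [
        {"type": "input_text", "text": t}
        for text in turn.texts
        for t in text.contents
    ]
```

With this shape, a turn carrying a single text plus a video no longer takes the string shortcut; it falls through to the multi-modal branch and logs the dropped video.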
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/tutorials/openai-responses.md`:
- Line 74: The fenced code block showing the sample output (the block containing
the line "INFO Starting AIPerf System") lacks a language identifier and
triggers markdownlint MD040; update that code fence to include a language tag
(e.g., change ``` to ```text or ```console) so the block becomes ```text (or
```console) followed by the sample output and the closing ```, ensuring the
fenced block now has an explicit language identifier.
In `@src/aiperf/endpoints/openai_responses.py`:
- Around line 246-290: The _extract_response_content function currently only
inspects output[] items with type "message" so it never sees top-level
"reasoning" items; update it to also iterate output[] for items with type
"reasoning" and extract any summary text from that item's summary[] entries
where part.get("type") == "summary_text", then include that text in the returned
ReasoningResponseData.reasoning (and keep existing message content extraction
for TextResponseData/ReasoningResponseData.content). Ensure you reference the
same return types (ReasoningResponseData, TextResponseData) and preserve the
existing fallback to json_obj["output_text"].
Codecov Report: ❌ Patch coverage is
Add ResponsesEndpoint for benchmarking servers implementing the OpenAI Responses API (POST /v1/responses), with multi-modal input formatting, streaming SSE event parsing, and plugin registration.

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Force-pushed from 4be3a9a to 991ebcc (Compare)
Actionable comments posted: 1
🧹 Nitpick comments (5)
tests/unit/endpoints/test_responses_endpoint.py (2)
620-660: `test_parse_response_extracts_usage` doesn't verify `parsed.data` when both text and usage are present.

The `full_response_with_text` case (lines 625–633) sends a response with both `output` and `usage`, but the test only asserts on `parsed.usage`. A combined assertion would catch regressions where `_parse_full_response` returns `data=None` despite a valid output.

♻️ Suggested addition

```diff
 assert parsed is not None
 assert parsed.usage == Usage(expected_usage)
+if "output" in json_data and any(
+    item.get("type") == "message"
+    for item in json_data.get("output", [])
+    if isinstance(item, dict)
+):
+    assert parsed.data is not None
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `tests/unit/endpoints/test_responses_endpoint.py` around lines 620-660: the test test_parse_response_extracts_usage currently only asserts parsed.usage for the "full_response_with_text" case; update that case to also assert that parsed.data is not None and contains the expected message ("Hello", the value created by _msg("Hello")), so that failures where endpoint.parse_response / _parse_full_response return data=None are caught.
349-384: No test case for `stream_options=None` in `extra` with `use_server_token_count=True`.

If the optional fix to `format_payload` (guarding against `stream_options=None`) is applied, add a corresponding parametrize entry to ensure `include_usage` is still injected:

♻️ Suggested additional param

```diff
 param(False, [], _NOT_PRESENT, id="not_added_when_not_streaming"),
+param(
+    True,
+    [("stream_options", None)],
+    {"include_usage": True},
+    id="none_stream_options_replaced",
+),
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `tests/unit/endpoints/test_responses_endpoint.py` around lines 349-384: add a parametrize case to test_format_payload_stream_options_with_server_token_count that covers extra containing ("stream_options", None) when streaming is True and use_server_token_count=True, so the test verifies format_payload (ResponsesEndpoint.format_payload via create_endpoint_with_mock_transport and _responses_model_endpoint) will inject include_usage into a None stream_options; use a param like streaming=True, extra=[("stream_options", None)], expected={"include_usage": True} with a clear id, asserting payload["stream_options"] == expected.

src/aiperf/endpoints/openai_responses.py (3)
110-118: Dead `else ""` guard is unreachable.

The outer condition already guarantees `len(turn.texts[0].contents) == 1`, so `turn.texts[0].contents` is always truthy here.

♻️ Proposed simplification

```diff
-item["content"] = (
-    turn.texts[0].contents[0] if turn.texts[0].contents else ""
-)
+item["content"] = turn.texts[0].contents[0]
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/aiperf/endpoints/openai_responses.py` around lines 110-118: the conditional assigns item["content"] using a redundant ternary fallback because the outer checks already ensure turn.texts[0].contents has length 1; remove the unreachable else "" and directly assign item["content"] = turn.texts[0].contents[0] inside the same if block.
39: Consider extracting long exception messages into custom exception classes (Ruff TRY003).

Both `ValueError` strings at lines 39 and 140 are flagged by ruff (TRY003). Moving them into dedicated exception subclasses improves testability and consistency.

Also applies to: 140-142

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/aiperf/endpoints/openai_responses.py` at line 39 (and around lines 140-142): extract the long inline ValueError messages into dedicated exception classes and raise those instead; create two new exception subclasses (e.g., MissingTurnsError and MissingResponsesError) in the same module or a central exceptions module, move the messages currently used at the raise sites into the respective classes as default messages, replace the raise ValueError(...) calls with raise MissingTurnsError() and raise MissingResponsesError(), and update any imports/tests to reference the new exception types so behavior and tests remain consistent.
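If that refactor is adopted, it might look like the following sketch. The class names follow the prompt's examples; the message strings and the `require_turns` helper are hypothetical, not AIPerf's actual code:

```python
class MissingTurnsError(ValueError):
    """Raised when a conversation contains no turns (illustrative message)."""

    def __init__(self) -> None:
        super().__init__("Conversation must contain at least one turn.")


class MissingResponsesError(ValueError):
    """Raised when a payload contains no usable output (illustrative message)."""

    def __init__(self) -> None:
        super().__init__("Response payload did not contain any output content.")


def require_turns(turns: list) -> None:
    # Raise the dedicated exception instead of an inline ValueError (Ruff TRY003).
    if not turns:
        raise MissingTurnsError()
```

Subclassing `ValueError` keeps any existing `except ValueError` handlers working while giving tests a precise type to assert on.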
61-71: Silent `include_usage` omission when `stream_options` is explicitly `None` in `extra`.

If a caller passes `extra=[("stream_options", None)]`, `payload["stream_options"]` is `None` after the update. The first `elif` then short-circuits (`isinstance(None, dict)` is `False`), so `include_usage` is never injected even though `use_server_token_count=True`.

♻️ Proposed fix

```diff
-if "stream_options" not in payload:
+if "stream_options" not in payload or payload["stream_options"] is None:
     payload["stream_options"] = {"include_usage": True}
 elif (
     isinstance(payload["stream_options"], dict)
     and "include_usage" not in payload["stream_options"]
 ):
     payload["stream_options"]["include_usage"] = True
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/aiperf/endpoints/openai_responses.py` around lines 61-71: when model_endpoint.endpoint.streaming and model_endpoint.endpoint.use_server_token_count are true, ensure payload["stream_options"] gets include_usage even if extra set stream_options to None; treat non-dict values as absent, e.g., if "stream_options" not in payload or not isinstance(payload["stream_options"], dict) then assign payload["stream_options"] = {"include_usage": True}; otherwise set payload["stream_options"]["include_usage"] = True.
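The same normalization can be expressed as a single check. This standalone helper is a sketch operating on a bare payload dict, not the actual `format_payload` signature:

```python
def ensure_include_usage(payload: dict) -> dict:
    """Guarantee stream_options.include_usage is set.

    Treats None (or any non-dict) stream_options as absent, so an explicit
    stream_options=None coming from extra does not suppress usage reporting.
    """
    opts = payload.get("stream_options")
    if not isinstance(opts, dict):
        payload["stream_options"] = {"include_usage": True}
    else:
        # Respect a caller-provided include_usage value, set it only if missing.
        opts.setdefault("include_usage", True)
    return payload
```

Using `setdefault` preserves an explicit `include_usage: False` from the caller, which matches the original `"include_usage" not in payload["stream_options"]` guard.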
```python
if event_type == "response.completed":
    resp = json_obj.get("response", {})
    usage = resp.get("usage") or None
    if usage:
        return ParsedResponse(perf_ns=perf_ns, data=None, usage=usage)
    return None
```
`resp.get("usage")` raises AttributeError when `"response": null` is explicit in the event.

`dict.get(key, default)` only uses the default when the key is absent — if `json_obj["response"]` is explicitly null/None, `resp` is `None` and `resp.get("usage")` throws.

🛡️ Proposed fix

```diff
-resp = json_obj.get("response", {})
+resp = json_obj.get("response") or {}
 usage = resp.get("usage") or None
```
usage = resp.get("usage") or None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/aiperf/endpoints/openai_responses.py` around lines 216 - 221, The code
assumes json_obj["response"] is a dict and calls resp.get("usage"), which raises
AttributeError if "response" is explicitly null; in the event handler where
event_type == "response.completed" (variables: json_obj, resp, usage,
ParsedResponse) ensure resp is a dict before calling .get — e.g., set resp =
json_obj.get("response") or {} or check isinstance(resp, dict) and only then
read usage; if no usage, return None, otherwise return
ParsedResponse(perf_ns=perf_ns, data=None, usage=usage).
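The failure mode is easy to reproduce in isolation, since `dict.get(key, default)` only applies the default when the key is missing, not when its value is null:

```python
import json

# An event where "response" is present but explicitly null.
event = json.loads('{"type": "response.completed", "response": null}')

# The default is NOT used: the key exists, with value None.
resp = event.get("response", {})
assert resp is None  # resp.get("usage") would raise AttributeError here

# The proposed fix coerces any falsy value to an empty dict.
resp = event.get("response") or {}
assert resp.get("usage") is None  # safe: no exception, usage is simply absent
```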
Hey @ajcasagrande, this is Andrew with Meta. Any plans to merge this PR? We are evaluating aiperf for our benchmarking and would like to use the Responses API. I tested your PR on a deployment and it works great.
```markdown
<!--
SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->
```
This needs to be updated to use the fern mdx formatting now.
debermudez left a comment

Inline review comments for two issues found during branch review.
```python
        return None

    # All other events (response.created, response.in_progress, etc.)
    return None
```
Potential compatibility issue: this ignores all *.done events, including response.output_text.done. Some providers may emit final text only in done events (or miss deltas under load), which can lead to empty recorded content. Consider parsing response.output_text.done as a fallback when it includes text.
```markdown
SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->
```
Docs discoverability: this tutorial is useful, but I could not find it linked from the README.md tutorial index tables. Please add a README tutorial entry so users can discover responses endpoint guidance from the main docs landing page.
Summary by CodeRabbit

Release Notes

- New Features: OpenAI Responses API endpoint support (ResponsesEndpoint) with multi-modal inputs (text, images, audio) and both streaming and non-streaming responses.
- Documentation: tutorial for benchmarking the Responses API endpoint.
- Tests: comprehensive unit tests for the new endpoint.