
feat(endpoints): add OpenAI Responses API endpoint support #695

Open
ajcasagrande wants to merge 4 commits into main from ajc/responses

Conversation

@ajcasagrande
Contributor

@ajcasagrande ajcasagrande commented Feb 20, 2026

Add ResponsesEndpoint for benchmarking servers implementing the OpenAI Responses API (POST /v1/responses), with multi-modal input formatting, streaming SSE event parsing, and plugin registration.
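For context, a rough sketch of the kind of multi-modal request body a POST /v1/responses benchmark would send. Field names follow the public OpenAI Responses API; the model name and image data are placeholders, and the payload this PR actually builds may differ in detail.

```python
# Hypothetical /v1/responses request body (public OpenAI Responses API shape).
payload = {
    "model": "gpt-4o",          # placeholder model name
    "stream": True,             # SSE streaming; omit or set False for one JSON response
    "input": [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Describe this image."},
                {"type": "input_image", "image_url": "data:image/png;base64,AAAA"},
            ],
        }
    ],
}
print(payload["input"][0]["role"])  # → user
```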

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for profiling the OpenAI Responses API with multi-modal input capabilities (text, images, audio).
    • Available in both streaming and non-streaming modes.
  • Documentation

    • New comprehensive tutorial for profiling the OpenAI Responses API, covering setup, configuration, and advanced scenarios.
    • Updated CLI endpoint options to include the new Responses API endpoint type.
  • Tests

    • Added comprehensive unit test coverage for the new Responses API endpoint functionality.

@github-actions

github-actions bot commented Feb 20, 2026

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@991ebccb8958173b53bed1ec5cfce72a44aa89e7

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@991ebccb8958173b53bed1ec5cfce72a44aa89e7

Last updated for commit: 991ebcc

@github-actions github-actions bot added the feat label Feb 20, 2026
@ajcasagrande ajcasagrande marked this pull request as draft February 20, 2026 18:49
@coderabbitai

coderabbitai bot commented Feb 20, 2026

Walkthrough

A new OpenAI Responses API endpoint has been integrated into AIPerf, adding the ResponsesEndpoint class with support for multi-modal inputs (text, images, audio) and both streaming and non-streaming responses. This includes endpoint implementation, plugin configuration, comprehensive unit tests, and documentation, plus a minor update to CLI options.

Changes

  • Documentation (docs/cli_options.md, docs/tutorials/openai-responses.md)
    Added a tutorial for profiling the OpenAI Responses API with configuration details for streaming, multi-modal inputs, and concurrency control. Updated CLI options to include the responses endpoint type choice.
  • Endpoint implementation (src/aiperf/endpoints/openai_responses.py)
    New ResponsesEndpoint class handling payload formatting for multi-modal inputs, streaming/non-streaming response parsing, content assembly from turns, and usage data extraction.
  • Plugin configuration (src/aiperf/plugin/plugins.yaml)
    Registered the responses endpoint with metadata including path (/v1/responses), streaming support, image/audio capability flags, and token handling configuration.
  • Endpoint tests (tests/unit/endpoints/test_responses_endpoint.py)
    Comprehensive unit tests for ResponsesEndpoint.format_payload covering payload construction, multi-modal content handling, system instructions, streaming modes, and error conditions across various scenarios.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Whiskers twitching with delight so bright,
A Responses endpoint hopped into sight!
With images, audio, streaming dreams true,
Tests comprehensive, docs fresh and new—
Multi-modal magic, now AIPerf's flight! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled)
  • Title check: ✅ Passed. The title 'feat(endpoints): add OpenAI Responses API endpoint support' directly and clearly describes the main change: adding a new endpoint for the OpenAI Responses API.
  • Docstring coverage: ✅ Passed. Docstring coverage is 90.20%, above the required 80.00% threshold.





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
docs/tutorials/openai-responses.md (1)

58-70: --endpoint /v1/responses is redundant — the plugin metadata already sets this path.

Since the responses plugin entry defines endpoint_path: /v1/responses, specifying --endpoint /v1/responses on every command is unnecessary. Including it in every example may mislead users into thinking it's required. Consider removing it from most examples and mentioning in a note that it's only needed for custom paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/tutorials/openai-responses.md` around lines 58 - 70, Remove the
redundant --endpoint /v1/responses flag from the aiperf profile example since
the plugin metadata already sets endpoint_path: /v1/responses; update the
example invocation under the aiperf profile block to omit the --endpoint option
and add a brief note (near the examples) that --endpoint is only required when
using a custom endpoint path (referencing the --endpoint flag and the plugin
metadata key endpoint_path: /v1/responses for clarity).
src/aiperf/endpoints/openai_responses.py (1)

108-154: Shortcut path silently ignores videos.

The fast path (lines 110–115) checks for single text + no images + no audios, but doesn't check len(turn.videos) == 0. If a turn has both a single text and video data, it takes the string shortcut and silently drops the video content. While supports_videos isn't declared in the plugin metadata, the Turn model still carries a videos field, and a defensive check avoids silent data loss if the API adds video support later.

Similarly, the multi-modal branch (lines 121–152) doesn't process turn.videos at all. This is consistent with the plugin not declaring video support, but a log warning when videos are present would prevent silent surprises.

Add video guard to the shortcut condition
         if (
             len(turn.texts) == 1
             and len(turn.texts[0].contents) == 1
             and len(turn.images) == 0
             and len(turn.audios) == 0
+            and len(turn.videos) == 0
         ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aiperf/endpoints/openai_responses.py` around lines 108 - 154, The
_set_item_content shortcut currently ignores turn.videos and can drop video
data; update the initial fast-path conditional in _set_item_content to also
require len(turn.videos) == 0 so a turn with a single text plus videos does not
take the string shortcut, and in the multimodal branch add a defensive check
that if turn.videos is non-empty you emit a warning (e.g., logging.warning or
the module's logger) stating that video content is present but not
supported/ignored before continuing to build item["content"] from
texts/images/audios; reference the Turn model, the _set_item_content function,
and item["content"] when making these changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/tutorials/openai-responses.md`:
- Line 74: The fenced code block showing the sample output (the block containing
the line "INFO     Starting AIPerf System") lacks a language identifier and
triggers markdownlint MD040; update that code fence to include a language tag
(e.g., change ``` to ```text or ```console) so the block becomes ```text (or
```console) followed by the sample output and the closing ```, ensuring the
fenced block now has an explicit language identifier.

In `@src/aiperf/endpoints/openai_responses.py`:
- Around line 246-290: The _extract_response_content function currently only
inspects output[] items with type "message" so it never sees top-level
"reasoning" items; update it to also iterate output[] for items with type
"reasoning" and extract any summary text from that item's summary[] entries
where part.get("type") == "summary_text", then include that text in the returned
ReasoningResponseData.reasoning (and keep existing message content extraction
for TextResponseData/ReasoningResponseData.content). Ensure you reference the
same return types (ReasoningResponseData, TextResponseData) and preserve the
existing fallback to json_obj["output_text"].


@codecov

codecov bot commented Feb 20, 2026

Codecov Report

❌ Patch coverage is 99.18699% with 1 line in your changes missing coverage. Please review.

Files with missing lines:
src/aiperf/endpoints/openai_responses.py: 99.18% patch coverage (0 missing, 1 partial ⚠️)


Add ResponsesEndpoint for benchmarking servers implementing the OpenAI
Responses API (POST /v1/responses), with multi-modal input formatting,
streaming SSE event parsing, and plugin registration.

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
@ajcasagrande ajcasagrande marked this pull request as ready for review February 21, 2026 07:08

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
tests/unit/endpoints/test_responses_endpoint.py (2)

620-660: test_parse_response_extracts_usage doesn't verify parsed.data when both text and usage are present.

The full_response_with_text case (lines 625–633) sends a response with both output and usage, but the test only asserts on parsed.usage. A combined assertion would catch regressions where _parse_full_response returns data=None despite a valid output.

♻️ Suggested addition
     assert parsed is not None
     assert parsed.usage == Usage(expected_usage)
+    if "output" in json_data and any(
+        item.get("type") == "message" for item in json_data.get("output", []) if isinstance(item, dict)
+    ):
+        assert parsed.data is not None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/endpoints/test_responses_endpoint.py` around lines 620 - 660, The
test test_parse_response_extracts_usage currently only asserts parsed.usage for
the "full_response_with_text" case but must also verify parsed.data is populated
when output is present; update the test case (the param with id
"full_response_with_text") to assert that parsed.data is not None and contains
the expected message ("Hello") or matches the original output (e.g., check
parsed.data includes the value created by _msg("Hello")), so that
endpoint.parse_response / _parse_full_response failures returning data=None will
be caught.

349-384: No test case for stream_options=None in extra with use_server_token_count=True.

If the optional fix to format_payload (guarding against stream_options=None) is applied, add a corresponding parametrize entry to ensure include_usage is still injected:

♻️ Suggested additional param
     param(False, [], _NOT_PRESENT, id="not_added_when_not_streaming"),
+    param(
+        True,
+        [("stream_options", None)],
+        {"include_usage": True},
+        id="none_stream_options_replaced",
+    ),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/endpoints/test_responses_endpoint.py` around lines 349 - 384, Add
a parametrize case to test_format_payload_stream_options_with_server_token_count
that covers extra containing ("stream_options", None) when streaming is True and
use_server_token_count=True so the test verifies format_payload
(ResponsesEndpoint.format_payload via create_endpoint_with_mock_transport and
_responses_model_endpoint) will inject include_usage into a None stream_options;
add a param like: streaming=True, extra=[("stream_options", None)],
expected={"include_usage": True} (give a clear id such as
"none_adds_include_usage") so the test asserts payload["stream_options"] ==
expected.
src/aiperf/endpoints/openai_responses.py (3)

110-118: Dead else "" guard is unreachable.

The outer condition already guarantees len(turn.texts[0].contents) == 1, so turn.texts[0].contents is always truthy here.

♻️ Proposed simplification
-        item["content"] = (
-            turn.texts[0].contents[0] if turn.texts[0].contents else ""
-        )
+        item["content"] = turn.texts[0].contents[0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aiperf/endpoints/openai_responses.py` around lines 110 - 118, The
conditional assigns item["content"] using a redundant ternary fallback because
the outer checks already ensure turn.texts[0].contents has length 1; remove the
unreachable else "" and directly assign item["content"] =
turn.texts[0].contents[0] inside the same if block (locate the assignment near
the if that checks len(turn.texts)==1, len(turn.texts[0].contents)==1,
len(turn.images)==0, len(turn.audios)==0).

39-39: Consider extracting long exception messages into custom exception classes (Ruff TRY003).

Both ValueError strings at lines 39 and 140 are flagged by ruff (TRY003). Moving them into dedicated exception subclasses improves testability and consistency.

Also applies to: 140-142

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aiperf/endpoints/openai_responses.py` at line 39, Extract the long inline
ValueError messages in src/aiperf/endpoints/openai_responses.py into dedicated
exception classes and raise those instead: create two new exception subclasses
(e.g., MissingTurnsError and MissingResponsesError) in the same module or a
central exceptions module, move the messages currently used at the raise sites
in the function(s) that raise ValueError (the one at the top of
openai_responses.py and the one around lines 140-142) into the respective
exception class docstring or default message, replace the raise ValueError(...)
calls with raise MissingTurnsError() and raise MissingResponsesError(), and
update any imports/tests to reference the new exception types so behavior and
tests remain consistent.

61-71: Silent include_usage omission when stream_options is explicitly None in extra.

If a caller passes extra=[("stream_options", None)], payload["stream_options"] is None after the update. The first elif then short-circuits (isinstance(None, dict) is False), so include_usage is never injected even though use_server_token_count=True.

♻️ Proposed fix
-        if "stream_options" not in payload:
+        if "stream_options" not in payload or payload["stream_options"] is None:
             payload["stream_options"] = {"include_usage": True}
         elif (
             isinstance(payload["stream_options"], dict)
             and "include_usage" not in payload["stream_options"]
         ):
             payload["stream_options"]["include_usage"] = True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aiperf/endpoints/openai_responses.py` around lines 61 - 71, When
model_endpoint.endpoint.streaming and
model_endpoint.endpoint.use_server_token_count are true, ensure
payload["stream_options"] gets include_usage even if extra set stream_options to
None: replace the current three-branch logic with a check that treats non-dict
values as absent—e.g., if "stream_options" not in payload or not
isinstance(payload["stream_options"], dict) then assign
payload["stream_options"] = {"include_usage": True}; otherwise set
payload["stream_options"]["include_usage"] = True. This touches the block
referencing payload, stream_options, and include_usage.
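The proposed guard can be sketched as a standalone function (the helper name is hypothetical; the real code operates inline on the payload inside format_payload):

```python
def ensure_include_usage(payload: dict) -> dict:
    """Inject stream_options.include_usage, treating an explicit None as absent."""
    opts = payload.get("stream_options")
    if not isinstance(opts, dict):
        # Covers both a missing key and an explicit None injected via `extra`
        payload["stream_options"] = {"include_usage": True}
    elif "include_usage" not in opts:
        # Respect a caller-provided include_usage value; only fill the gap
        opts["include_usage"] = True
    return payload

print(ensure_include_usage({"stream_options": None}))
# → {'stream_options': {'include_usage': True}}
```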

Comment on lines +216 to +221
if event_type == "response.completed":
resp = json_obj.get("response", {})
usage = resp.get("usage") or None
if usage:
return ParsedResponse(perf_ns=perf_ns, data=None, usage=usage)
return None

⚠️ Potential issue | 🟡 Minor

resp.get("usage") raises AttributeError when "response": null is explicit in the event.

dict.get(key, default) only uses the default when the key is absent — if json_obj["response"] is explicitly null/None, resp is None and resp.get("usage") throws.

🛡️ Proposed fix
-        resp = json_obj.get("response", {})
+        resp = json_obj.get("response") or {}
         usage = resp.get("usage") or None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aiperf/endpoints/openai_responses.py` around lines 216 - 221, The code
assumes json_obj["response"] is a dict and calls resp.get("usage"), which raises
AttributeError if "response" is explicitly null; in the event handler where
event_type == "response.completed" (variables: json_obj, resp, usage,
ParsedResponse) ensure resp is a dict before calling .get — e.g., set resp =
json_obj.get("response") or {} or check isinstance(resp, dict) and only then
read usage; if no usage, return None, otherwise return
ParsedResponse(perf_ns=perf_ns, data=None, usage=usage).
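The pitfall is easy to reproduce in isolation: dict.get(key, default) only falls back to the default when the key is absent, not when its stored value is None.

```python
event = {"type": "response.completed", "response": None}  # explicit null

# dict.get(key, default) returns the stored None because the key exists
resp = event.get("response", {})
assert resp is None  # the {} default was never used; resp.get(...) would raise

# The proposed `or {}` form normalizes both a missing key and null to a dict
resp = event.get("response") or {}
usage = resp.get("usage")  # safe now, no AttributeError
print(usage)  # → None
```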

@aarkharov
Copy link
Contributor

Hey @ajcasagrande, this is Andrew with Meta. Any plans to merge this PR? We are evaluating AIPerf for our benchmarking and would like to use the Responses API. I tested your PR on a deployment and it works great.

<!--
SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->
Contributor


This needs to be updated to use the Fern MDX formatting now.

Contributor

@debermudez debermudez left a comment


Inline review comments for two issues found during branch review.

return None

# All other events (response.created, response.in_progress, etc.)
return None
Contributor


Potential compatibility issue: this ignores all *.done events, including response.output_text.done. Some providers may emit final text only in done events (or miss deltas under load), which can lead to empty recorded content. Consider parsing response.output_text.done as a fallback when it includes text.

SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->

Contributor


Docs discoverability: this tutorial is useful, but I could not find it linked from the README.md tutorial index tables. Please add a README tutorial entry so users can discover responses endpoint guidance from the main docs landing page.
