
fix: #1104 preserve incomplete status for rejected function tool outputs #1110

Draft · seratch wants to merge 3 commits into main from fix/issue-1104-incomplete-tool-status

Conversation

seratch (Member) commented Mar 20, 2026

This pull request resolves #1104. It fixes inconsistent approval-rejection handling across agents-core, Conversations-backed OpenAI sessions, and Realtime transport/history flows, so rejected function tool outputs are no longer persisted or projected as if they had completed successfully.

Behaviorally, this change:

  • marks manually rejected function tool outputs as status: 'incomplete' while keeping successful executions on completed
  • keeps OpenAI Responses/Conversations round-trips aligned by verifying the rejected status survives persistence and reload
  • aligns Realtime outbound function_call_output payloads and local item_update projections with the same rejection status
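The rejection path can be sketched as follows. This is a hypothetical sketch: the item shape and the `buildToolOutputItem` helper are illustrative stand-ins, not the SDK's actual internals from this PR.

```typescript
// Hypothetical sketch: the item shape and buildToolOutputItem are
// illustrative, not the SDK's actual internals.
type FunctionCallOutputStatus = 'completed' | 'incomplete';

interface FunctionCallOutputItem {
  type: 'function_call_output';
  callId: string;
  output: string;
  status: FunctionCallOutputStatus;
}

// Project a tool result into a history item, preserving a manual rejection
// as 'incomplete' because the tool never actually ran.
function buildToolOutputItem(
  callId: string,
  result: { approved: boolean; output: string },
): FunctionCallOutputItem {
  return {
    type: 'function_call_output',
    callId,
    output: result.output,
    status: result.approved ? 'completed' : 'incomplete',
  };
}
```

The key point is that only the rejection branch changes; successful executions keep reporting `completed` exactly as before.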

changeset-bot commented Mar 20, 2026

🦋 Changeset detected

Latest commit: a45d3f6

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 5 packages:

  Name                        Type
  @openai/agents-core         Patch
  @openai/agents-openai       Patch
  @openai/agents-realtime     Patch
  @openai/agents-extensions   Patch
  @openai/agents              Patch


seratch (Member, Author) commented Mar 20, 2026

Quick update on the investigation so far:

  • function_call_output.status = "incomplete" is accepted by the OpenAI APIs as a valid request value.
  • However, in our live probes, OpenAI-managed history did not preserve that value:
    • Responses API input_items came back as completed
    • Conversations API items.create / items.list also came back as completed
  • We also did not see reliable evidence that the model behavior changes based on incomplete versus completed in these flows.

So at this point, the client-side SDK can represent incomplete locally, but once the flow depends on OpenAI-managed server-side history, the behavior appears inconsistent. We plan to discuss these server-side findings with the owning team before deciding on the right SDK behavior. Because of that, we should not merge this PR yet.
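The round-trip discrepancy described above can be framed as a simple check. Everything here is illustrative: `PersistedItem` and `statusSurvivedRoundTrip` are hypothetical names, not part of the SDK or the OpenAI API surface.

```typescript
// Illustrative round-trip check; these names are hypothetical and do not
// correspond to real SDK or OpenAI API calls.
interface PersistedItem {
  callId: string;
  status: 'completed' | 'incomplete';
}

// True only if server-managed history returned the same status we sent.
function statusSurvivedRoundTrip(
  sent: PersistedItem,
  reloaded: PersistedItem,
): boolean {
  return sent.callId === reloaded.callId && sent.status === reloaded.status;
}

// The live probes behaved like this: 'incomplete' was accepted on the
// request, but the reloaded item came back as 'completed'.
const sent: PersistedItem = { callId: 'call_1', status: 'incomplete' };
const reloaded: PersistedItem = { callId: 'call_1', status: 'completed' };
// statusSurvivedRoundTrip(sent, reloaded) evaluates to false here.
```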

github-actions Bot commented:

This PR is stale because it has been open for 10 days with no activity.

@github-actions github-actions Bot added this to the 0.8.x milestone Apr 13, 2026
wsk-builds (Contributor) left a comment:


Thanks for writing this up and for leaving the investigation note about managed-history behavior. I did a cross-PR pass against #1104 and #1191, and also ran the targeted tests locally:

  • pnpm -F @openai/agents-core test runner/toolExecution.test.ts
  • pnpm -F @openai/agents-openai test openaiResponsesModel.helpers.test.ts hitlOpenAIConversationsSession.test.ts
  • pnpm -F @openai/agents-realtime test openaiRealtimeBase.test.ts realtimeSession.test.ts

All passed.

My read: using status: "incomplete" for a human-rejected approval is consistent with #1104. The important distinction is that the tool did not actually run, so completed gives the model a structurally successful signal that conflicts with the rejection text. The core and Realtime changes both align with that interpretation, and the helper tests cover preserving the value through our local conversion layer.
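The preservation property those helper tests cover can be sketched like this. The `toProtocolItem` function is a hypothetical stand-in for the PR's conversion layer, which may differ in names and shape.

```typescript
// Hypothetical conversion sketch: the real helpers live in the PR's
// conversion layer; this only illustrates the preservation property.
type Status = 'completed' | 'incomplete';

interface LocalToolResult {
  callId: string;
  output: string;
  status?: Status; // older items may omit status entirely
}

function toProtocolItem(item: LocalToolResult) {
  return {
    type: 'function_call_output' as const,
    call_id: item.callId,
    output: item.output,
    // Preserve an explicit status; only default when it is absent.
    status: item.status ?? 'completed',
  };
}
```

The design point is the default direction: an explicit `'incomplete'` must pass through untouched, while a missing status falls back to `'completed'` for backward compatibility.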

I do think it is worth keeping the server-managed-history caveat visible before merge. The note in this PR says Responses/Conversations managed history may normalize incomplete back to completed, so this PR improves the SDK-local / request-side representation but should not be presented as a guaranteed persisted signal when OpenAI owns the transcript. A short comment in the PR description, release note, or a nearby test name would help avoid future confusion.

For #1191: I do see a semantic tension, but not necessarily a direct conflict if we frame them separately. #1110 is about a deliberate human rejection, where incomplete is the more honest model-visible state. #1191 is trying to rebalance an already-aborted managed conversation by synthesizing an output for a provider-persisted call; if that PR keeps completed for compatibility, it should explicitly justify that as an abort-recovery/server-transcript constraint rather than treating aborts and rejected approvals as the same status category.
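Framed as code, that separation might look like the following. The outcome kinds and the `statusFor` mapping are purely illustrative, and #1191's final choice is still open.

```typescript
// Illustrative only: these outcome kinds are a way to frame the distinction
// between the PRs, not an API in any of them.
type ToolOutcome =
  | { kind: 'executed' }                  // tool actually ran
  | { kind: 'rejected-by-human' }         // #1110: approval denied, tool never ran
  | { kind: 'synthesized-after-abort' };  // #1191: output synthesized for a provider-persisted call

function statusFor(outcome: ToolOutcome): 'completed' | 'incomplete' {
  switch (outcome.kind) {
    case 'executed':
      return 'completed';
    case 'rejected-by-human':
      return 'incomplete';
    case 'synthesized-after-abort':
      // If #1191 keeps 'completed', that is an abort-recovery /
      // server-transcript constraint, not the same category as a rejection.
      return 'completed';
  }
}
```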

Changeset coverage looks reasonable for the packages whose behavior/API surface changes, and the unit coverage is good for the SDK-local behavior. I would not block this PR on broader live API probes, but I would keep the managed-history limitation documented because the current tests cannot prove that server-side history preserves incomplete.



Development

Successfully merging this pull request may close these issues:

  • Rejected tool calls use status: 'completed' in function_call_result, causing model hallucinations
