
fix(core): replay aborted managed tool outputs #1191

Draft
MukundaKatta wants to merge 2 commits into openai:main from MukundaKatta:codex/agents-js-abort-tool-cleanup

Conversation

@MukundaKatta

Summary

  • queue synthetic function_call_result items when a streamed managed-conversation run aborts after OpenAI Responses function_call events have already been emitted
  • replay those synthetic outputs on the next conversationId turn so the server-side conversation store stays balanced
  • add a streaming regression test that aborts after a raw function_call event and verifies the next run sends the synthetic output
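The queue-and-replay flow described above can be sketched roughly as follows. The type shape and function names here are illustrative assumptions for this summary, not the actual agents-core internals:

```typescript
// Hypothetical sketch of the replay mechanism: queue synthetic
// function_call_result items on abort, drain them on the next turn.
type AgentInputItem = {
  type: 'function_call_result';
  callId: string;
  status: 'completed';
  output: { type: 'text'; text: string };
};

const pendingAbortItems = new Map<string, AgentInputItem[]>();

// Called when a streamed run aborts after function_call events were emitted.
function queueAbortOutputs(conversationId: string, callIds: string[]): void {
  const items: AgentInputItem[] = callIds.map((callId) => ({
    type: 'function_call_result',
    callId,
    status: 'completed',
    output: { type: 'text', text: 'aborted' },
  }));
  const existing = pendingAbortItems.get(conversationId) ?? [];
  pendingAbortItems.set(conversationId, existing.concat(items));
}

// Called at the start of the next turn for the same conversationId: drain
// the queued synthetic outputs so they are prepended to the model input.
function takeAbortOutputs(conversationId: string): AgentInputItem[] {
  const items = pendingAbortItems.get(conversationId) ?? [];
  pendingAbortItems.delete(conversationId);
  return items;
}
```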

Testing

  • git diff --check
  • npm test -- --run packages/agents-core/test/run.stream.test.ts (blocked locally because vitest is not installed in this clone)

@changeset-bot

changeset-bot (Bot) commented Apr 21, 2026

🦋 Changeset detected

Latest commit: 6e5df89

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 5 packages:

  • @openai/agents-core (Patch)
  • @openai/agents-extensions (Patch)
  • @openai/agents-openai (Patch)
  • @openai/agents-realtime (Patch)
  • @openai/agents (Patch)



@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 081a5a386c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +1321 to +1324
if (serverConversationTracker?.conversationId && inputMarked) {
  clearManagedConversationSupplementalItems(
    serverConversationTracker.conversationId,
  );


P1: Clear pending abort outputs on every successful turn

clearManagedConversationSupplementalItems is called only in the streaming path, so abort outputs queued in pendingManagedConversationAbortItems survive successful non-streaming conversationId turns. Because getManagedConversationSupplementalItems always prepends those pending items, a later fresh runner.run(..., { conversationId }) can resend the same synthetic function_call_result again, which can rebalance the transcript incorrectly or trigger duplicate-call errors from the provider. Please clear the queue after successful non-streaming model calls as well.
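The suggested fix can be sketched as a shared helper that both the streaming and non-streaming paths invoke after a successful turn. The names below mirror the quoted snippet, but the surrounding wiring is hypothetical:

```typescript
// Illustrative sketch: one shared post-turn hook so a successful
// non-streaming turn also clears the pending abort queue, instead of
// only the streaming path doing so.
const pendingManagedConversationAbortItems = new Map<string, unknown[]>();

function clearManagedConversationSupplementalItems(conversationId: string): void {
  pendingManagedConversationAbortItems.delete(conversationId);
}

// Call this from both the streaming and non-streaming code paths after a
// successful model call for a managed (conversationId) turn.
function onTurnSucceeded(
  conversationId: string | undefined,
  inputMarked: boolean,
): void {
  if (conversationId && inputMarked) {
    clearManagedConversationSupplementalItems(conversationId);
  }
}
```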



@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6e5df89e5d


ProcessedResponse<any>,
AgentInputItem[]
>();
const pendingManagedConversationAbortItems = new Map<string, AgentInputItem[]>();


P2: Bound the pending abort cache to avoid unbounded growth

The new pendingManagedConversationAbortItems cache is a process-global Map keyed by conversationId, but this commit adds no general eviction path for entries that are never replayed successfully. Since queueManagedConversationSupplementalItems keeps inserting IDs and cleanup currently depends on later turn success, aborted conversations that are never resumed can accumulate indefinitely in long-lived workers, causing memory growth over time. Please add a bounded eviction strategy (for example TTL/size cap, and/or broader clear paths) so orphaned conversation IDs do not stay resident forever.
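A bounded cache of the kind this comment asks for could look roughly like the following: a size cap with insertion-order eviction plus a TTL. This is an illustration of the strategy, not the actual fix:

```typescript
// Illustrative bounded map: evicts the oldest entries past a size cap and
// lazily expires entries past a TTL, so orphaned conversation IDs cannot
// stay resident forever in a long-lived worker.
interface Entry<V> {
  value: V;
  expiresAt: number;
}

class BoundedTtlMap<V> {
  private entries = new Map<string, Entry<V>>();

  constructor(private maxSize: number, private ttlMs: number) {}

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key); // refresh insertion order for this key
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
    // Evict oldest-inserted entries beyond the size cap.
    while (this.entries.size > this.maxSize) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // lazily expire stale entries
      return undefined;
    }
    return entry.value;
  }

  delete(key: string): void {
    this.entries.delete(key);
  }
}
```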


@seratch marked this pull request as draft April 23, 2026 23:11
@wsk-builds (Contributor) left a comment


Thanks for working on this. I checked out the current draft locally and ran:

  • pnpm i
  • pnpm -F @openai/agents-core build-check
  • pnpm -F @openai/agents-core dist:check
  • CI=1 pnpm test -- packages/agents-core/test/run.stream.test.ts

All completed successfully; the Vitest command ended up running the full suite in this checkout and reported 111 test files / 1783 tests passing.

I agree this is an important edge case for conversationId + streaming aborts: if a Responses function call has already been persisted server-side, the next turn needs some way to avoid leaving the managed conversation in an unbalanced state.

I think the main risk is the current ownership/lifetime of the pending abort items. pendingManagedConversationAbortItems is a module-level Map keyed only by conversationId, so the cleanup state can outlive a Runner/request and can also be shared across unrelated callers in the same process. That makes a few cases hard to reason about: concurrent runs using the same conversation id, aborted runs that are never resumed, and long-lived server processes where pending entries may remain indefinitely. I think this state should either be tied to RunState / the runner/session lifecycle, or the PR should explicitly document why process-global conversation-id state is safe here.

I also think the synthesized output semantics need another look. The PR turns a raw function_call into a function_call_result with status: "completed" and output text "aborted". A user abort is not really a completed tool execution, and this overlaps with the status/history concerns discussed in #1104 / #1110. If completed is required for server-managed history compatibility, it would be good to call that out and add a test/probe explaining why incomplete or another representation is not viable. Otherwise the model-visible history may incorrectly imply the tool succeeded.

For tests, the current regression test covers the basic replay path, but I think this needs a broader matrix before merge:

  • multiple function calls emitted before abort
  • abort before a usable call_id is available
  • previousResponseId mode remains a no-op
  • pending items are cleared after a successful replay
  • pending items do not get lost or duplicated if the next run fails
  • concurrent runs with the same conversationId
  • interaction with existing managed conversation supplemental handoff items

One smaller thing: the changeset package looks right as @openai/agents-core, but the summary should probably use the repo's Conventional Commit style, e.g. fix(core): replay aborted managed conversation tool outputs.

Overall, I think the PR is pointed at a real bug, but I'd prefer to see the pending-item ownership and synthesized-result semantics clarified before merge.

