-
Notifications
You must be signed in to change notification settings - Fork 3.1k
fix: preserve multimodal content during handoffs with nest_handoff_hi… #2233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: preserve multimodal content during handoffs with nest_handoff_hi… #2233
Conversation
…story When nest_handoff_history=True (the default), multimodal content (images, files, audio) from user messages was being lost during handoffs because the content was converted to a plain text summary. This fix: - Extracts multimodal content from user messages before summarization - Adds the multimodal content as a separate user message after the summary - Improves the text summary to show '[N image(s) attached]' instead of raw JSON Fixes issue where target agents lose access to uploaded images during same-turn handoffs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Add marker system to track preserved multimodal content: - Add _PRESERVED_MULTIMODAL_MARKER constant to mark preserved messages - Skip already-preserved messages during extraction - Add _collect_preserved_multimodal_content() to carry forward existing preserved content across chained handoffs - Add test for chained handoffs scenario This addresses the P1 code review feedback about duplicate conversation turns across chained handoffs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4223ce12f5
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes an issue where multimodal content (images, files, audio) was being lost during agent handoffs when using nest_handoff_history=True (the default behavior). The solution preserves multimodal content from user messages by extracting it and including it as a separate user message alongside the conversation summary.
Changes:
- Modified
default_handoff_history_mapper()to extract and preserve multimodal content from user messages during handoffs - Added helper functions to extract, collect, and format multimodal content with deduplication support for chained handoffs
- Enhanced
_stringify_content_list()to provide human-readable summaries indicating presence of images, files, and audio
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/agents/handoffs/history.py | Core implementation: added multimodal content preservation logic including extraction, collection, and formatting functions; uses marker to prevent duplication in chained handoffs |
| tests/test_extension_filters.py | Added 7 comprehensive tests covering image, file, and audio preservation; backward compatibility; and chained handoff deduplication |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
… content validation - Filter out messages with __multimodal_preserved__ marker from summaries to prevent confusing output and duplicate lines in chained handoffs - Add validation for multimodal content types in _collect_preserved_multimodal_content to ensure consistency - Handle output_text type in _stringify_content_list for consistent behavior with assistant messages - All 34 related tests pass with no linting errors
t log --oneline -5 Merge branch 'fix/image-loss-during-handoffs' of https://github.com/saksham-1304/openai-agents-python into fix/image-loss-during-handoffs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@seratch Please review the PR |
|
Thanks for sending this patch! I can see this issue caused by nested handoff approach, but I am not convinced this is the best solution. Given the nature of nested handoffs, it does make sense to include different types of items as you suggest, but using a marker name feels like a fragile workaround, even if it happens to work with the server right now. My understanding is that this is a fundamental limitation of nested handoffs, and we plan to move this logic to an opt-in feature starting from the next minor release. Because of that, I do not plan to pursue this enhancement in the short term. For reference, here is a thorough review by Codex, and I agree with most of the points raised there as well: Findings
Open questions / assumptions
Alternative approaches
Overall, the goal (preventing multimodal loss and avoiding huge JSON summaries) is solid, and the tests cover the main scenarios. The main risk is the name marker compatibility and the loss of per-turn attachment grouping. If those two are addressed, this would be a clean and reasonable solution. |
Fix: Preserve multimodal content during same-turn handoffs
Fix #2220
Summary
This PR fixes an issue where multimodal content (images, files, audio) uploaded by users was being lost when the agent performed a handoff to another agent in the same turn with
nest_handoff_history=True(the default behavior).Problem: When a user uploaded an image and the first agent handed off to a specialist agent, the target agent lost access to the uploaded image because the
nest_handoff_historyfunction converted all content to a plain text summary, discarding the multimodal data.Solution: Modified
nest_handoff_historyto:Changes
Core Fix
_MULTIMODAL_CONTENT_TYPESconstant to identify multimodal types_PRESERVED_MULTIMODAL_MARKERconstant to mark preserved messages and prevent duplicationdefault_handoff_history_mapper()to extract, preserve, and carry forward multimodal content_extract_multimodal_content()function to scan user messages for images, files, and audio_collect_preserved_multimodal_content()function to collect already-preserved content from previous handoffs_stringify_content_list()function to provide human-readable summaries of multimodal contentTests
Testing
✅ All 20 existing extension filter tests pass
✅ All 14 handoff tool tests pass
✅ All 17 run step processing tests pass
✅ All 7 new multimodal preservation tests pass (including chained handoffs test)
✅ Code passes
ruff formatandruff check✅ Code passes
mypy --ignore-missing-imports --no-warn-unused-ignoresTest Plan:
pytest tests/test_extension_filters.py -vpytest tests/test_handoff_tool.py tests/test_run_step_processing.py -vruff format && ruff checkmypy src/agents/handoffs/history.py tests/test_extension_filters.py --ignore-missing-importsImpact
Backward Compatible: Yes. Text-only conversations work exactly as before. Multimodal content is now additionally preserved when present.
Example Usage:
Related Issue
Resolves the issue reported about multimodal content being lost during same-turn handoffs with
nest_handoff_history=True.Checklist
ruff formatruff checkmypy