Label attached images so agent can understand in-message labels #8950
Merged
charley-oai
merged 13 commits into
main
from
dev/ccunningham/improve-pasted-images-prompt
Jan 10, 2026
+798
−606
aibrahim-oai
reviewed
Jan 9, 2026
aibrahim-oai
approved these changes
Jan 9, 2026
Collaborator
aibrahim-oai
left a comment
Awesome. Code feels cleaner. Let's add integration tests to make sure we send the right request shape to the model and persist the right thing to the rollout file. Maybe a UI snapshot as well.
The agent wouldn't "see" attached images and would instead try to use the view_file tool.

In this PR, we wrap image content items in XML tags carrying the name of each image (now just a numbered name like [Image #1]), so that the model can resolve inline image references by name. We also put the image content items above the user message, which the model seems to prefer (maybe it's more used to definitions coming before references). We also tweak the view_file tool description, which seemed to help a bit.
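A rough sketch of the labeling and ordering described above, not the code from this PR. The function name `build_user_content`, the exact tag text `<image name="...">`, and the `input_text` / `input_image` item shapes are all assumptions made for illustration; only the numbered `[Image #N]` names and the "images before the user message" ordering come from the description.

```python
from typing import Dict, List


def build_user_content(message: str, image_urls: List[str]) -> List[Dict[str, str]]:
    """Return content items with labeled images placed before the user text.

    Hypothetical helper: the tag format and item shapes are assumptions.
    """
    items: List[Dict[str, str]] = []
    for index, url in enumerate(image_urls, start=1):
        # Numbered name, matching what the user sees inline in the message.
        label = f"[Image #{index}]"
        # Opening tag, the image itself, then the closing tag.
        items.append({"type": "input_text", "text": f'<image name="{label}">'})
        items.append({"type": "input_image", "image_url": url})
        items.append({"type": "input_text", "text": "</image>"})
    # The user message goes last so every label is defined before it is referenced.
    items.append({"type": "input_text", "text": message})
    return items
```

With two attachments, a message such as "Compare [Image #1] and [Image #2]." ends up as the seventh and final item, after both wrapped images.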
Results on a simple eval set of images:
Before

After

[
  {
    "id": "single_describe",
    "prompt": "Describe the attached image in one sentence.",
    "images": ["image_a.png"]
  },
  {
    "id": "single_color",
    "prompt": "What is the dominant color in the image? Answer with a single color word.",
    "images": ["image_b.png"]
  },
  {
    "id": "orientation_check",
    "prompt": "Is the image portrait or landscape? Answer in one sentence.",
    "images": ["image_c.png"]
  },
  {
    "id": "detail_request",
    "prompt": "Look closely at the image and call out any small details you notice.",
    "images": ["image_d.png"]
  },
  {
    "id": "two_images_compare",
    "prompt": "I attached two images. Are they the same or different? Briefly explain.",
    "images": ["image_a.png", "image_b.png"]
  },
  {
    "id": "two_images_captions",
    "prompt": "Provide a short caption for each image (Image 1, Image 2).",
    "images": ["image_c.png", "image_d.png"]
  },
  {
    "id": "multi_image_rank",
    "prompt": "Rank the attached images from most colorful to least colorful.",
    "images": ["image_a.png", "image_b.png", "image_c.png"]
  },
  {
    "id": "multi_image_choice",
    "prompt": "Which image looks more vibrant? Answer with 'Image 1' or 'Image 2'.",
    "images": ["image_b.png", "image_d.png"]
  }
]

Fixes issue #8523