fix: forward MCP tool images to LLM context #2180

Open
gary149 wants to merge 1 commit into main from fix/mcp-tool-images-in-llm-context
@gary149 gary149 commented Mar 13, 2026

Summary

  • MCP tools can return image content blocks ({ type: "image", data, mimeType }), but these were only displayed in the UI — never forwarded to the LLM in the follow-up turn
  • OpenAI's role: "tool" messages only accept string | Array<TextPart>, so a separate role: "user" message is the only way to inject images into the LLM context
  • Adds a toToolImagePart helper that converts MCP ImageContent blocks into OpenAI-compatible image_url parts, and injects them as a user message when the model supports multimodal input
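
The conversion described in the last bullet can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the names `toToolImagePart` and `ToolImagePart` come from the description, while the `McpImageContent` interface, the `isMcpImageContent` guard, and the exact validation rules are assumptions made for the example.

```typescript
// Shape of an MCP image content block, as described in the summary.
// (Hypothetical local interface; the real code may import a type instead.)
interface McpImageContent {
  type: "image";
  data: string; // base64-encoded image bytes
  mimeType: string;
}

// OpenAI-compatible image part for a chat message's content array.
interface ToolImagePart {
  type: "image_url";
  image_url: { url: string; detail: "auto" | "low" | "high" };
}

// Hypothetical guard: accepts only blocks with type "image" and
// non-empty string `data` and `mimeType` fields.
function isMcpImageContent(block: unknown): block is McpImageContent {
  if (typeof block !== "object" || block === null) return false;
  const obj = block as Record<string, unknown>;
  return (
    obj.type === "image" &&
    typeof obj.data === "string" &&
    obj.data.length > 0 &&
    typeof obj.mimeType === "string" &&
    obj.mimeType.length > 0
  );
}

// Converts an MCP image block into an image_url part backed by a data URL.
// Returns undefined for anything that is not a well-formed image block.
function toToolImagePart(block: unknown): ToolImagePart | undefined {
  if (!isMcpImageContent(block)) return undefined;
  return {
    type: "image_url",
    image_url: { url: `data:${block.mimeType};base64,${block.data}`, detail: "auto" },
  };
}
```

Returning `undefined` for non-image blocks lets a caller run every content block through the helper and keep only the defined results.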

Changes

  • toolInvocation.ts: New ToolImagePart type, toToolImagePart() converter, image extraction in the collation loop, placeholder text when output is empty but images exist
  • runMcpFlow.ts: When mmEnabled and tool images are present, appends a role: "user" message with image parts after tool results; adds toolImageCount to logger
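
A rough sketch of the injection step in `runMcpFlow.ts`, assuming simplified message types. The `appendToolImages` helper, the `ChatMessage` union, and the lead-in text are hypothetical and chosen for illustration; only `mmEnabled`, the `role: "user"` message, and the placement after tool results come from the description above.

```typescript
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string; detail: "auto" } };

// Simplified message union (assumption): tool messages carry text only,
// user messages may carry a mixed array of text and image parts.
type ChatMessage =
  | { role: "tool"; tool_call_id: string; content: string }
  | { role: "user"; content: string | Array<TextPart | ImagePart> };

// Appends one user message holding all tool images after the tool results.
// Returns the input array unchanged when the model is not multimodal
// or when no images were collected.
function appendToolImages(
  messages: ChatMessage[],
  toolImages: ImagePart[],
  mmEnabled: boolean
): ChatMessage[] {
  if (!mmEnabled || toolImages.length === 0) return messages;
  return [
    ...messages,
    {
      role: "user",
      content: [
        // Hypothetical lead-in text so the model knows where the images came from.
        { type: "text", text: "Images returned by the tool call(s) above:" },
        ...toolImages,
      ],
    },
  ];
}
```

Keeping the tool message as plain text and adding the images in a separate message means the `role: "tool"` entries still satisfy the text-only constraint.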

Test plan

  • npm run check — no type errors
  • npm run lint — passes
  • Manual test: use an MCP tool that returns images (e.g., screenshot tool), verify the model can describe the image content in its follow-up response

MCP tools can return image content blocks, but these were only passed to
the UI for display and never forwarded to the LLM in the follow-up turn.
Since OpenAI's `role: "tool"` messages only accept text, inject a
separate `role: "user"` message containing the image parts when the
model supports multimodal input.
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2a6780d301

return undefined;
return {
type: "image_url",
image_url: { url: `data:${obj.mimeType};base64,${obj.data}`, detail: "auto" },

[P2] Bound MCP image payloads before adding data URLs

This constructs image_url parts from raw MCP image blocks without any size or MIME-type normalization. A tool that returns a large or unsupported image (for example, a full-resolution screenshot) will be forwarded verbatim, which can cause the follow-up chat.completions.create call to fail payload or image validation. In the same flow, user-uploaded images are constrained via makeImageProcessor (maxSizeInMB, width/height), so tool-returned images need equivalent checks or preprocessing before being appended to toolImages.
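
One possible shape for such a check is sketched below. It only rejects oversized or unlisted images and does not resize them the way makeImageProcessor does. The allowlist, the 5 MiB limit, and the function names are assumptions for illustration, not values taken from this repository or from any provider's documentation.

```typescript
// Hypothetical allowlist and size cap; real limits depend on the provider.
const ALLOWED_MIME_TYPES = new Set(["image/png", "image/jpeg", "image/webp", "image/gif"]);
const MAX_IMAGE_BYTES = 5 * 1024 * 1024;

// Estimates the decoded byte length of a padded base64 string
// without decoding it. Assumes no whitespace or line breaks in `data`.
function decodedBase64Size(data: string): number {
  const padding = data.endsWith("==") ? 2 : data.endsWith("=") ? 1 : 0;
  return Math.floor((data.length * 3) / 4) - padding;
}

// Returns true when the image has an allowed MIME type and fits the size cap.
function isForwardableImage(
  mimeType: string,
  data: string,
  maxBytes: number = MAX_IMAGE_BYTES
): boolean {
  return ALLOWED_MIME_TYPES.has(mimeType.toLowerCase()) && decodedBase64Size(data) <= maxBytes;
}
```

A caller could skip any block for which `isForwardableImage` returns false, or route it through a resizing step before building the data URL.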
