Bug Report
Description
When a ToolMessage contains multimodal content (text plus images via image_url blocks), the @langchain/google converter stuffs the inlineData (base64 images) inside functionResponse.response.result as serialized JSON. Gemini's functionResponse only accepts JSON, so it cannot interpret inlineData nested inside it. The images are effectively invisible to the model.
Environment
- @langchain/google: 0.1.5
- @langchain/core: 1.1.31
- Model: gemini-3-flash-preview (also affects gemini-2.5-flash)
Reproduction
import { ToolMessage, AIMessage, HumanMessage } from '@langchain/core/messages'

const messages = [
  new HumanMessage('What is in the attachment?'),
  new AIMessage({
    content: '',
    tool_calls: [{ id: 'call_123', name: 'read_file', args: { path: 'image.png' } }]
  }),
  new ToolMessage({
    content: [
      { type: 'text', text: 'Image: screenshot.png (133KB)' },
      {
        type: 'image_url',
        image_url: { url: 'data:image/png;base64,iVBORw0KGgoAAAANSUhEUg==' }
      }
    ],
    tool_call_id: 'call_123',
    name: 'read_file'
  })
]
Expected Behavior
The converted Gemini content should have inlineData as a sibling part alongside functionResponse, so Gemini can visually interpret the image:
{
  "role": "function",
  "parts": [
    {
      "functionResponse": {
        "name": "read_file",
        "response": { "result": "Image: screenshot.png (133KB)" }
      }
    },
    {
      "inlineData": {
        "mimeType": "image/png",
        "data": "iVBORw0KGgoAAAANSUhEUg=="
      }
    }
  ]
}
Actual Behavior
inlineData is serialized as JSON inside functionResponse.response.result:
{
  "role": "function",
  "parts": [
    {
      "functionResponse": {
        "name": "read_file",
        "response": {
          "result": "[{\"type\":\"text\",\"text\":\"Image: screenshot.png (133KB)\"},{\"inlineData\":{\"mimeType\":\"image/png\",\"data\":\"iVBORw0KGgoAAAANSUhEUg==\"}}]"
        }
      }
    }
  ]
}
The model receives the base64 string as text, not as visual data. It either hallucinates the image content or fails to interpret it.
Root Cause
In src/converters/messages.ts, both convertStandardContentMessageToGeminiContent and convertLegacyContentMessageToGeminiContent convert all ToolMessage content parts into a single functionResponse. When the content is multimodal (contains image_url blocks), the inlineData parts get serialized into the response.result JSON string rather than being extracted as sibling Gemini content parts.
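The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the library's code: the GeminiPart type and the dataUrlToInlineData and naiveFunctionResponse helpers are hypothetical names with simplified shapes, used only to show what happens when already-converted parts are collapsed into one JSON string.

```typescript
// Hypothetical, simplified part shapes; the real types in @langchain/google may differ.
type GeminiPart =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

// Parse a base64 data URL into a Gemini-style inlineData part.
function dataUrlToInlineData(url: string): GeminiPart {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(url);
  if (!match) throw new Error("not a base64 data URL");
  return { inlineData: { mimeType: match[1], data: match[2] } };
}

// Collapse every part into a single JSON string (the behavior this issue reports).
function naiveFunctionResponse(name: string, parts: GeminiPart[]) {
  return {
    functionResponse: { name, response: { result: JSON.stringify(parts) } },
  };
}

const parts: GeminiPart[] = [
  { text: "Image: screenshot.png (133KB)" },
  dataUrlToInlineData("data:image/png;base64,iVBORw0KGgoAAAANSUhEUg=="),
];
const collapsed = naiveFunctionResponse("read_file", parts);

// The image now exists only as characters inside a string value.
console.log(typeof collapsed.functionResponse.response.result); // "string"
```

Once the parts are stringified, nothing in the request marks the payload as image data, which matches the "Actual Behavior" output shown earlier.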
Suggested Fix
When processing a ToolMessage, extract inlineData and fileData parts before building the functionResponse, then add them as sibling parts:
if (ToolMessage.isInstance(message) && message.tool_call_id) {
  // Split the already-converted parts into media (images/files) and everything else.
  const mediaParts = parts.filter((p) => "inlineData" in p || "fileData" in p);
  const textParts = parts.filter((p) => !("inlineData" in p) && !("fileData" in p));
  // Build a text-only result; fall back to the raw content if no text parts exist.
  const responseContent = typeof message.content === "string"
    ? message.content
    : textParts.map((p) => p.text || "").filter(Boolean).join("\n") || JSON.stringify(message.content);
  // Replace the parts in place: functionResponse first, then media as sibling parts.
  parts.length = 0;
  parts.push(
    {
      functionResponse: {
        id: message.tool_call_id,
        name: message.name || "unknown",
        response: { result: responseContent }
      }
    },
    ...mediaParts
  );
}
This keeps functionResponse as pure JSON (text only) and places inlineData/fileData as sibling parts where Gemini can interpret them visually.
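For testing the idea outside the converter, the same split can be written as a pure function. This is a sketch under assumptions: the Part type is a simplified stand-in for the library's part types, and splitToolParts is a hypothetical helper name that does not exist in @langchain/google.

```typescript
// Hypothetical, simplified part shapes; not the library's actual types.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } }
  | { fileData: { mimeType: string; fileUri: string } }
  | { functionResponse: { id: string; name: string; response: { result: string } } };

const isMedia = (p: Part): boolean => "inlineData" in p || "fileData" in p;

// Hypothetical helper: returns a text-only functionResponse followed by media siblings.
function splitToolParts(toolCallId: string, name: string | undefined, parts: Part[]): Part[] {
  const media = parts.filter(isMedia);
  const text = parts
    .filter((p): p is { text: string } => "text" in p)
    .map((p) => p.text)
    .filter(Boolean)
    .join("\n");
  return [
    { functionResponse: { id: toolCallId, name: name ?? "unknown", response: { result: text } } },
    ...media,
  ];
}

const out = splitToolParts("call_123", "read_file", [
  { text: "Image: screenshot.png (133KB)" },
  { inlineData: { mimeType: "image/png", data: "iVBORw0KGgoAAAANSUhEUg==" } },
]);
console.log(out.length); // 2
```

Unlike the in-place snippet, this version does not mutate its input and does not fall back to JSON.stringify when there is no text, so a media-only tool result yields an empty result string. Whether that fallback is needed is a design choice for the maintainers.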
Impact
Any agent that returns images from tool calls (e.g., reading files, fetching screenshots, downloading attachments) silently loses those images as visual input when using Gemini models through @langchain/google. The model receives the base64 payload only as text and either hallucinates the image description or fails.
Workaround
We are currently using a pnpm patch on @langchain/google to extract media parts from ToolMessage content before building the functionResponse.