BinaryImage causes token explosion when passed to marvin.cast() #1246

@zzstoatzz

Problem

When passing a BinaryImage to marvin.cast(), the resulting API request exceeds token limits even for moderately-sized images.

Reproduction

Using a 1600x1200 PNG image (~200KB on disk):

from pathlib import Path
from pydantic_ai.messages import BinaryImage
import marvin
from pydantic import BaseModel

class ModerationResult(BaseModel):
    is_safe: bool
    explanation: str

image_bytes = Path("handguns.png").read_bytes()
binary_image = BinaryImage(data=image_bytes, media_type="image/png")

moderator_agent = marvin.Agent(
    name="Content Moderator",
    model="anthropic:claude-sonnet-4-5",
    model_settings={"temperature": 0},
)

result = marvin.cast(
    binary_image,
    target=ModerationResult,
    instructions="Analyze this image",
    agent=moderator_agent,
)

Error:

anthropic.BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'prompt is too long: 207768 tokens > 200000 maximum'}}

Expected Behavior

The same image works perfectly when using pydantic-ai directly:

from pydantic_ai import Agent
from pydantic_ai.messages import BinaryImage

# ModerationResult and binary_image are defined in the reproduction above.
moderator_agent = Agent[None, ModerationResult](
    "gateway/anthropic:claude-sonnet-4-5",
    output_type=ModerationResult,
)

# Top-level await: run this inside an async function or an asyncio-aware REPL.
result = await moderator_agent.run(
    ["Analyze this image", binary_image],
    instructions="Analyze this image",
    model_settings={"temperature": 0},
)
# Works fine - no token issues

Analysis

Something in marvin's request construction for BinaryImage sends the image data in a form that consumes 207K+ tokens, whereas the same image sent through pydantic-ai directly uses a normal number of tokens.

Possibly marvin is encoding the image differently, duplicating it in the request, or not going through Anthropic's native image handling.
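
As a rough sanity check on the encoding hypothesis, the sketch below compares two sizes for an image like the one above: the character count of its base64 encoding if it were inlined into the prompt as plain text, and the approximate token cost if it were sent as a native image block. The helper names are hypothetical, the 200 KiB payload is a stand-in for the real file, and the `width * height / 750` estimate is an assumption based on Anthropic's published rule of thumb for image tokens. It does not show what marvin actually does.

```python
import base64
import math


def base64_length(num_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of num_bytes bytes."""
    return 4 * math.ceil(num_bytes / 3)


def approx_image_tokens(width: int, height: int) -> int:
    """Rough token cost of a native image block.

    Assumption: Anthropic's rule of thumb of about width * height / 750 tokens,
    ignoring any server-side downscaling of large images.
    """
    return round(width * height / 750)


# Stand-in for the ~200KB PNG from the reproduction (contents do not matter here).
raw_size = 200 * 1024
encoded = base64.b64encode(bytes(raw_size))

# The closed-form length matches what the base64 module actually produces.
assert len(encoded) == base64_length(raw_size)

print(f"base64 characters if inlined as text: {len(encoded)}")
print(f"approx. tokens as a native image:     {approx_image_tokens(1600, 1200)}")
```

The base64 character count lands in the same order of magnitude as the 207,768 tokens reported in the error, while the native-image estimate is roughly two orders of magnitude smaller. That is consistent with (but does not prove) the image bytes ending up in the prompt as text.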

Environment

  • marvin: 2.3.9
  • pydantic-ai: latest
  • anthropic: latest (via pydantic-ai)
  • model: claude-sonnet-4-5

Impact

This makes image analysis via marvin.cast() unusable for real-world images: even a moderately-sized ~200KB image exceeds the model's token limit.
