
design: Add 0004-multimodal-i2t proposal #674

Open
sangminwoo wants to merge 3 commits into strands-agents:main from sangminwoo:main

Conversation

sangminwoo commented Mar 17, 2026

Description

Add design doc for multimodal image-to-text evaluation support in strands-evals SDK.

Introduces MultimodalOutputEvaluator extending OutputEvaluator to enable MLLM-as-a-Judge evaluation for image/document-to-text tasks. The evaluator constructs multimodal prompts using strands SDK ContentBlock format and supports both reference-based and reference-free evaluation across four dimensions: Overall Quality (P0), Correctness (P0), Faithfulness (P1), and Instruction Following (P1).

Key design decisions:

  • Extends OutputEvaluator to reuse rubric/model/system_prompt management
  • Built-in rubric templates + convenience subclasses per dimension
  • InputT=dict carries {"image": ImageData, "instruction": str}
  • ImageData supports file paths, base64, data URLs, bytes, PIL Images with JSON-safe serialization

Related Issues

Type of Change

  • New content

Checklist

  • I have read the CONTRIBUTING document
  • My changes follow the project's documentation style
  • I have tested the documentation locally using npm run dev
  • Links in the documentation are valid and working

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

sangminwoo marked this pull request as draft March 18, 2026 00:08
sangminwoo marked this pull request as ready for review March 18, 2026 00:08

* `InputT=dict` is less type-safe than a dataclass (`MultimodalInput` TypedDict provides partial typing)
* Multimodal judge calls are more expensive/slower than text-only (image tokens cost more)
* Remote image sources (S3, HTTP URLs) require user to download before evaluation — no built-in fetching to avoid heavy dependencies (boto3, requests)
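The first trade-off above (`InputT=dict` vs. a dataclass) can be narrowed with the `MultimodalInput` TypedDict the proposal mentions. A minimal sketch, with a stand-in `ImageData` class since the real wrapper lives in the SDK:

```python
from typing import TypedDict


class ImageData:
    """Stand-in for the SDK's ImageData wrapper (hypothetical shape)."""

    def __init__(self, source: str) -> None:
        self.source = source


class MultimodalInput(TypedDict):
    """Partial static typing for the evaluator's InputT=dict payload."""

    image: ImageData
    instruction: str


# Type checkers will flag missing or misspelled keys, while the value
# remains a plain dict at runtime.
case_input: MultimodalInput = {
    "image": ImageData("chart.png"),
    "instruction": "What is the revenue trend?",
}
```

This keeps the runtime representation JSON-friendly while recovering most of the type safety a dataclass would provide.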
Contributor

I think we can probably support remote images in this:

# Define cases with image data in input dict
cases = [Case[dict, str](
    input={"image": ImageData(source="chart.png"), "instruction": "What is the revenue trend?"},
)]

Author

Good point. We can support HTTP URLs using urllib.request (stdlib), so no new dependency is needed. For S3 URIs, we can make boto3 an optional dependency:

  • HTTP/HTTPS: auto fetched via urllib.request
  • S3: auto fetched if boto3 is installed, error message otherwise
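The two bullets above could be sketched roughly like this; the function name and error wording are placeholders, not the final API:

```python
import urllib.request
from urllib.parse import urlparse


def fetch_image_bytes(source: str) -> bytes:
    """Resolve a remote image source to raw bytes.

    HTTP/HTTPS is fetched via the stdlib urllib.request; s3:// URIs
    require boto3, treated here as an optional dependency.
    """
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        with urllib.request.urlopen(source) as resp:
            return resp.read()
    if parsed.scheme == "s3":
        try:
            import boto3  # optional dependency, imported lazily
        except ImportError as exc:
            raise ImportError(
                "Reading s3:// image sources requires boto3; "
                "install it or download the object first."
            ) from exc
        obj = boto3.client("s3").get_object(
            Bucket=parsed.netloc, Key=parsed.path.lstrip("/")
        )
        return obj["Body"].read()
    raise ValueError(f"Unsupported image source scheme: {parsed.scheme!r}")
```

Lazy-importing boto3 keeps the base install light while still giving a clear error when an S3 URI is passed without it.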

Does this make sense to you?

