How to transport multimodal messages like images and files? #126

Currently, AG-UI only supports text-based messages. However, in practical agent applications, the ability to handle multimodal input and output messages, and even general files, has become increasingly critical. Imagine an agent designed for creating presentations: users provide text, images, and video materials, and the agent retrieves relevant information to generate a complete PowerPoint file. Custom extensions can support such functionality in the meantime, but they introduce fragmentation at the protocol level.

I propose establishing standardized specifications at the protocol layer so that multimodal support is unified and implementations do not diverge.

Proposed Supported Content:

User Input:
- Text
- Audio
- Images
- General Files

Agent Output (`AssistantMessage`, `UserMessage`, `ToolMessage`):
- Text
- Audio
- Images
- General Files
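To make the proposal concrete, here is a minimal sketch of how the content types listed above could be modeled as a discriminated union of content parts. Everything in it is hypothetical and not part of the current AG-UI protocol: the names `ContentPart`, `MultimodalMessage`, and `toTextFallback`, the `type` discriminator values, the choice between a `url` reference and inline base64 `data`, and the text fallback for text-only clients are all illustrative assumptions.

```typescript
// HYPOTHETICAL types illustrating the proposal; NOT the current AG-UI API.
type TextPart = { type: "text"; text: string };

// Assumption: a binary payload is either referenced by URL or inlined as base64 `data`.
type BinarySource =
  | { url: string; data?: never }
  | { data: string; url?: never };

type ImagePart = { type: "image"; mimeType: string } & BinarySource;
type AudioPart = { type: "audio"; mimeType: string } & BinarySource;
type FilePart = { type: "file"; mimeType: string; filename?: string } & BinarySource;

type ContentPart = TextPart | ImagePart | AudioPart | FilePart;

// Assumption: a plain string stays valid so existing text-only messages are unchanged.
interface MultimodalMessage {
  id: string;
  role: "user" | "assistant" | "tool";
  content: string | ContentPart[];
}

// Hypothetical helper: collapse content into plain text for clients that only render text.
function toTextFallback(msg: MultimodalMessage): string {
  if (typeof msg.content === "string") return msg.content;
  return msg.content
    .map((part) => {
      switch (part.type) {
        case "text":
          return part.text;
        case "image":
        case "audio":
          return `[${part.type}: ${part.mimeType}]`;
        case "file":
          return `[file: ${part.filename ?? part.mimeType}]`;
      }
    })
    .join("\n");
}

// Usage: the presentation scenario, where a user sends text, an image, and a file.
const request: MultimodalMessage = {
  id: "msg-1",
  role: "user",
  content: [
    { type: "text", text: "Make slides from these materials" },
    { type: "image", mimeType: "image/png", url: "https://example.com/chart.png" },
    { type: "file", mimeType: "application/pdf", filename: "notes.pdf", data: "JVBERi0=" },
  ],
};

console.log(toTextFallback(request));
```

Keeping `content: string` as an accepted form means existing text-only senders and receivers keep working, while the `type` discriminator lets a client skip or degrade part kinds it does not support instead of failing on the whole message.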

Similar issues:
#26 #77 #280
