How to transport multimodal messages like images and files? #126

@NiuBlibing

Currently, AG-UI only supports text-based messages. In practical agent applications, however, the ability to handle multimodal input and output messages, including general files, has become increasingly critical. Consider an agent designed for creating presentations: users supply text, images, and video materials, and the agent retrieves relevant information and generates a complete PowerPoint file. Custom extensions can support this temporarily, but they introduce fragmentation at the protocol level.

I propose standardizing this at the protocol layer, so that support is unified and implementations do not diverge.

Supported Content:

User Input:

  • Text
  • Audio
  • Images
  • General Files

Agent Output (AssistantMessage, UserMessage, ToolMessage):

  • Text
  • Audio
  • Images
  • General Files
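
To make the proposal more concrete, here is a minimal sketch of what a unified message shape could look like. This is not an existing AG-UI API: `TextPart`, `BinaryPart`, `Message`, `inline_part`, `to_wire`, and all field names are hypothetical and only illustrate the idea of a message carrying a list of typed content parts, where binary content is either inlined as base64 or referenced by URL.

```python
import base64
from dataclasses import asdict, dataclass
from typing import List, Literal, Optional, Union


@dataclass
class TextPart:
    """Plain text content (hypothetical shape)."""
    text: str
    type: Literal["text"] = "text"


@dataclass
class BinaryPart:
    """Audio, image, or general file content (hypothetical shape)."""
    type: Literal["audio", "image", "file"]
    mime_type: str
    data: Optional[str] = None      # base64-encoded payload, for small inline content
    url: Optional[str] = None       # reference to externally hosted content
    filename: Optional[str] = None

    def __post_init__(self) -> None:
        # Require exactly one transport: inline data or a URL reference.
        if (self.data is None) == (self.url is None):
            raise ValueError("set exactly one of 'data' or 'url'")


ContentPart = Union[TextPart, BinaryPart]


@dataclass
class Message:
    """A message whose content is a list of typed parts instead of a single string."""
    role: Literal["user", "assistant", "tool"]
    content: List[ContentPart]


def inline_part(kind: str, mime_type: str, raw: bytes,
                filename: Optional[str] = None) -> BinaryPart:
    """Build an inline part by base64-encoding raw bytes."""
    encoded = base64.b64encode(raw).decode("ascii")
    return BinaryPart(type=kind, mime_type=mime_type, data=encoded, filename=filename)


def to_wire(message: Message) -> dict:
    """Convert a message to a JSON-serializable dict, omitting unset fields."""
    def strip(part: dict) -> dict:
        return {k: v for k, v in part.items() if v is not None}

    return {
        "role": message.role,
        "content": [strip(asdict(part)) for part in message.content],
    }


# Example: a user message mixing text with an inline image.
msg = Message(
    role="user",
    content=[
        TextPart("Build slides from this logo."),
        inline_part("image", "image/png", b"\x89PNG", filename="logo.png"),
    ],
)
wire = to_wire(msg)
```

The same `content` list would work for every message role, so user input and agent output could share one representation. Whether large files should be inlined or always passed by reference is an open design question that this sketch does not settle.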

Similar issues:
#26 #77 #280

Labels: Roadmap, proposal

Status: In progress
