
Python: [Bug]: AG-UI reasoning and multimodal media parsing doesn't correctly follow the specification #5340

@Rickyneer

Description


The AG-UI specification states that the role for reasoning messages must always be "reasoning", but the framework currently emits "assistant" (see lines 599 and 616 of https://github.com/microsoft/agent-framework/blob/main/python/packages/ag-ui/agent_framework_ag_ui/_run_common.py, which use "assistant" as the role).

I traced this back to an issue in the AG-UI SDK itself, which appears to have been fixed in version 0.1.16 per ag-ui-protocol/ag-ui#1193.
As a result, thinking messages are not rendered correctly when the reasoning message events are exchanged with an AG-UI-compatible client.
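
To illustrate the expected behavior, here is a minimal sketch of what emitting a reasoning message per the spec would look like. The event shapes are simplified dicts rather than the actual agent-framework types, and the helper name is hypothetical; the point is only that the role field must be "reasoning", not "assistant".

```python
# Hypothetical sketch: build AG-UI text-message events for a reasoning
# chunk. Event dicts are simplified stand-ins for the real SDK types.

def reasoning_message_events(message_id: str, text: str) -> list[dict]:
    """Return start/content/end events for one reasoning message."""
    return [
        # Per the AG-UI spec, reasoning content uses role "reasoning",
        # not "assistant" as the framework currently emits.
        {"type": "TEXT_MESSAGE_START", "messageId": message_id, "role": "reasoning"},
        {"type": "TEXT_MESSAGE_CONTENT", "messageId": message_id, "delta": text},
        {"type": "TEXT_MESSAGE_END", "messageId": message_id},
    ]

events = reasoning_message_events("msg-1", "Thinking about the answer...")
print(events[0]["role"])  # -> reasoning
```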

Additionally, the function _parse_multimodal_media_part (at https://github.com/microsoft/agent-framework/blob/main/python/packages/ag-ui/agent_framework_ag_ui/_message_adapters.py#L266) parses multimodal media incorrectly. Per the AG-UI specification describing this format, the payload lives in the value field, not data (an older, deprecated format did store the payload in data, but that does not align with the intent of the implementation, which appears to target the newer, non-deprecated format).
As a result, any AI agent reachable through the AG-UI endpoint will not see multimodal media submitted by the user.

Code Sample

Not applicable

Error Messages / Stack Traces

Not applicable

Package Versions

agent-framework-core: 1.0.1, agent-framework-ag-ui: 1.0.0b260409, agent-framework-openai: 1.0.1

Python Version

Python 3.12

Additional Context

-
