Skip to content

[FEATURE] Implement Vision Capabilities for A2A servers OpenAI-Compatible Agent #117

@edenreich

Description

@edenreich

Summary

Some OpenAI-compatible providers models support vision, it would be great if an A2A server gets out of the box vision capabilities. For example a browser-agent might want to solve captchas, and it needs vision.

It's important to note that while the Inference Gateway supports those models, not all providers have them, so an error will be thrown when the chosen model is attaching an image to the payload - the operator have to choose the right model for the right tasks.

Acceptance Criteria

  • The A2A internal agent supports sending images as base64 or image urls to OpenAI-compatible APIs
  • The default agent support it without the need to implement custom code
  • It's also possible to run it with custom tasks - the user gets the OpenAI compatible agent and they can decide how to use it
  • It's documented
  • It's tested

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions