Replies: 1 comment 3 replies
They can return any object
Feature request
With the dawn of multimodal LLMs, it is important not only that they can output multiple content types but also that they can accept them as input.
In agents, tools are one of the main communication mechanisms with the LLM, but they are currently limited to text-only output. It would be ideal to support tools that can return a combination of output types, such as text and images.
Motivation
I have run into scenarios where a tool retrieves images that need to be passed back to the LLM. This is currently not possible.
Proposal (If applicable)
All of the most popular LLMs (Claude, Gemini, OpenAI) accept mixed multimedia prompts, combining multiple text segments and multiple images in the same prompt. LangChain already supports this kind of input in messages through the `MessageContentComplex[]` type. It would be ideal and practical to add, in a backwards-compatible fashion, a new method to the Tool interface that supports a type similar to `MessageContentComplex[]` as its return type. This would let tools be as dynamic in their output as messages already are, and allow them to evolve naturally as LLMs themselves improve their multimodal capabilities.