Replies: 1 comment 3 replies
They can return any object
Feature request
With the dawn of multimodal LLMs, it is important not only that they can output multiple content types but also that they can accept them as input.
In agents, tools are one of the main communication mechanisms with the LLM, but they are currently limited to text-only output. It would be ideal to support tools that can return a combination of output types, such as text and images.
Motivation
I have run into scenarios where a tool retrieves images that need to be passed back to the LLM. This is currently not possible.
Proposal (If applicable)
All of the most popular LLMs (Claude, Gemini, OpenAI) accept mixed multimedia prompts, combining multiple text segments and multiple images in the same prompt. LangChain already supports this kind of input in messages through the `MessageContentComplex[]` type. It would be ideal and practical to add, in a backwards-compatible fashion, a new method to the Tool interface that supports a type similar to `MessageContentComplex[]` as its return type. This would let tools be as dynamic in their output as messages already are, and allow them to evolve naturally as LLMs themselves improve their multimodal capabilities.