Open
Labels
enhancement (New feature or request)
Description
https://python.langchain.com/docs/how_to/multimodal_inputs/
There are currently several leading approaches to representing a series of messages:
AnthropicMessage
    Content = IList<ContentBlock = OneOf<Text, Image, ToolUse, ToolResult>>
    Other non-content properties
OllamaMessage
    Content = string
    Images = IList<string>
    ToolCalls = IList<ToolCall>
OpenAiMessage // each role has different content
    System
        Content = IList<ContentPart = OneOf<Text>>
    User
        Content = IList<ContentPart = OneOf<Text, Image>>
    Assistant
        Content = IList<ContentPart = OneOf<Text, Refusal>>
        ToolCalls
    Tool
        Content = IList<ContentPart = OneOf<Text>>
GoogleMessage
    Content = IList<ContentPart = OneOf<Text, Blob = (byte[], string MimeType)>>
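To make the difference between these shapes concrete, here is a minimal Python sketch that expresses the flat Ollama-style shape (one string plus a list of images) as an ordered list of content parts, as in the Anthropic-style shape. All class and function names (`FlatMessage`, `to_parts`, and so on) are hypothetical and belong to no provider SDK. Placing the text before the images is also an assumption, since the flat shape does not record how they were interleaved.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Text:
    text: str


@dataclass
class Image:
    data: str  # assumed to be an encoded image string, mirroring Images = IList<string>


# Hypothetical union of the part kinds used in this sketch.
ContentPart = Union[Text, Image]


@dataclass
class FlatMessage:
    """Hypothetical stand-in for the flat shape: one string plus images."""
    content: str
    images: List[str] = field(default_factory=list)


def to_parts(message: FlatMessage) -> List[ContentPart]:
    """Express the flat shape as an ordered list of content parts."""
    parts: List[ContentPart] = []
    if message.content:
        parts.append(Text(message.content))
    # Assumption: images follow the text, because the flat shape has no ordering.
    parts.extend(Image(data) for data in message.images)
    return parts
```

The reverse direction loses information: a parts list that interleaves text and images cannot be restored from a single string and an image list.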
I like the simplicity of the Anthropic approach, but I would rename ContentBlock to ContentPart.
So far, for LangChain I picture it as:
Message
    Content = IList<ContentPart = OneOf<Text, Image, ToolUse, ToolResult, Blob, Video>>
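The single-message variant could be sketched as follows. This is only an illustration in Python, not the proposed API: the field names (`url`, `arguments`, `tool_use_id`, `mime_type`), the `role` field, and the `text_of` helper are assumptions that do not appear in the outline above.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:
    text: str


@dataclass
class Image:
    url: str


@dataclass
class ToolUse:
    id: str
    name: str
    arguments: str  # assumed to be JSON-encoded


@dataclass
class ToolResult:
    tool_use_id: str
    output: str


@dataclass
class Blob:
    data: bytes
    mime_type: str


@dataclass
class Video:
    url: str


# One union type plays the role of OneOf<...> in the outline.
ContentPart = Union[Text, Image, ToolUse, ToolResult, Blob, Video]


@dataclass
class Message:
    role: str  # assumed field; the outline lists only Content
    content: List[ContentPart]


def text_of(message: Message) -> str:
    """Hypothetical helper: concatenate the text parts, skipping the others."""
    return "".join(p.text for p in message.content if isinstance(p, Text))
```

With this shape, a user turn that mixes text and an image stays in one message, for example `Message("user", [Text("What is "), Image("https://example.com/cat.png"), Text("this?")])`.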
Or as separate message types that do not allow parts inside:
TextMessage
ImageMessage
ToolUseMessage
ToolResultMessage
BlobMessage // allows specifying a MimeType
VideoMessage
When the user supplies multiple parts, we simply emit several messages in a row.
OpenAI has also changed the message structure in its Realtime API; I will extend this proposal to take those changes into account.
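The "several messages in a row" idea can be sketched as below, again in Python and only for two of the six message types. The `(kind, payload)` tuples, the `role` field, and the function name `to_message_sequence` are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class TextMessage:
    role: str
    text: str


@dataclass
class ImageMessage:
    role: str
    url: str


SingleMessage = Union[TextMessage, ImageMessage]

# Maps a part kind to the message type that carries it (hypothetical registry).
_MESSAGE_TYPES = {"text": TextMessage, "image": ImageMessage}


def to_message_sequence(role: str, parts: List[Tuple[str, str]]) -> List[SingleMessage]:
    """Turn one multi-part input into consecutive single-part messages."""
    messages: List[SingleMessage] = []
    for kind, payload in parts:
        message_type = _MESSAGE_TYPES[kind]  # raises KeyError for unknown kinds
        messages.append(message_type(role, payload))
    return messages
```

One consequence of this design is that consumers have to treat consecutive messages with the same role as a single turn when a provider expects one multi-part message.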