
New design for Multimodal Messages #73

@HavenDV

Description


https://python.langchain.com/docs/how_to/multimodal_inputs/

There are currently several leading approaches to representing a sequence of multimodal messages:

AnthropicMessage
  Content = IList<ContentBlock = OneOf<Text, Image, ToolUse, ToolResult>>
  Other non-content properties

OllamaMessage
  Content = string
  Images = IList<string>
  ToolCalls = IList<ToolCall>

OpenAiMessage  // each role has a different content shape
  System
    Content = IList<ContentPart = OneOf<Text>>
  User
    Content = IList<ContentPart = OneOf<Text, Image>>
  Assistant
    Content = IList<ContentPart = OneOf<Text, Refusal>>
    ToolCalls
  Tool
    Content = IList<ContentPart = OneOf<Text>>

GoogleMessage
  Content = IList<ContentPart = OneOf<Text, Blob = (byte[], string MimeType)>>

I like the simplicity of Anthropic's approach, but I would rename ContentBlock to ContentPart.

So far, in LangChain I see it either as a single message type with parts:

Message
  Content = IList<ContentPart = OneOf<Text, Image, ToolUse, ToolResult, Blob, Video>>
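
A minimal C# sketch of that shape, using the OneOf discriminated-union library already referenced above (all part type names and properties here are assumptions, not a final API):

```csharp
using System.Collections.Generic;
using OneOf;

public enum MessageRole { System, User, Assistant, Tool }

// Hypothetical part types; names and properties are illustrative only.
public record Text(string Value);
public record Image(byte[] Data, string MimeType);
public record ToolUse(string Id, string Name, string ArgumentsJson);
public record ToolResult(string Id, string Content);
public record Blob(byte[] Data, string MimeType);
public record Video(byte[] Data, string MimeType);

// A single Message type for all roles; Content is a list of typed parts.
public record Message(
    MessageRole Role,
    IList<OneOf<Text, Image, ToolUse, ToolResult, Blob, Video>> Content);
```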

or as separate message types that don't allow parts inside (see the sketch after this list):

TextMessage
ImageMessage
ToolUseMessage
ToolResultMessage
BlobMessage // allows specifying a MimeType
VideoMessage
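
A hypothetical sketch of this alternative as a small class hierarchy; the base is named ChatMessage here only to avoid clashing with the parts-based Message sketch above, and all properties are assumptions:

```csharp
// Alternative to the parts-based design: one message type per content kind.
public abstract record ChatMessage(MessageRole Role);

public record TextMessage(MessageRole Role, string Text) : ChatMessage(Role);
public record ImageMessage(MessageRole Role, byte[] Data, string MimeType) : ChatMessage(Role);
public record ToolUseMessage(MessageRole Role, string Id, string Name, string ArgumentsJson) : ChatMessage(Role);
public record ToolResultMessage(MessageRole Role, string Id, string Content) : ChatMessage(Role);
// Blob carries an explicit MimeType so arbitrary payloads can be described.
public record BlobMessage(MessageRole Role, byte[] Data, string MimeType) : ChatMessage(Role);
public record VideoMessage(MessageRole Role, byte[] Data, string MimeType) : ChatMessage(Role);
```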

When the user sends multiple parts, we simply use two (or more) messages in a row.
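
For example, a user turn combining text and an image would become two consecutive messages (hypothetical usage of the sketch above):

```csharp
using System.Collections.Generic;
using System.IO;

var imageBytes = File.ReadAllBytes("picture.png");

var history = new List<ChatMessage>
{
    new TextMessage(MessageRole.User, "What is shown in this picture?"),
    new ImageMessage(MessageRole.User, imageBytes, "image/png"),
};
```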

OpenAI has also changed the message structure in its Realtime API; I will update this proposal to take those changes into account.
