Area(s)
area:gen-ai
What's missing?
- As pointed out in #1883 (GenAI: handle tool message embedded within user message), the current events don't match the API request structure when a message contains a combination of text and tool call responses; a sketch of the mismatch follows this list.
- The naming and separation of events are messy and confusing:
  - There are different events for user and system messages, but the distinction is very artificial. The bodies have the same structure; the only difference is the role, which is also present in the body anyway, so a single event name would have worked.
  - #1877 (genai: handle system role renamed to developer in openai) shows that these roles change over time and aren't reliable enough to be embedded into the event name, which needs to be very stable. One day, new developers working with OpenAI will only be familiar with the role being called `developer` instead of `system` and won't understand why the event is called `gen_ai.system.message`.
  - Assistant message events can simultaneously contain text content with any number of tool calls, but for user messages these are split into multiple events. Why the inconsistency? Why not an event per tool call?
  - The event name `gen_ai.tool.message` doesn't make it clear that it means the result of a tool call, rather than the tool call itself. In other words, it's not clear at a glance whether it's sent by the user or the assistant.
- There's no clear place for multi-modal content (Support for multi-modal inputs and generations, #1556), e.g. a message containing both text and images.
- Tool calls have a `type` field which should apparently always be `function`, so its purpose is not clear.
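
For illustration, here's a hedged sketch of the kind of message #1883 describes, loosely following the shape of Anthropic's Messages API, where tool results are content blocks inside a user message (the field names and IDs here are incidental):

```python
# A single user message that mixes a tool result with follow-up text,
# roughly in the shape of Anthropic's Messages API (illustrative only):
user_message = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": "call_abc123",  # hypothetical ID
            "content": "22°C, sunny",
        },
        {"type": "text", "text": "Given that weather, what should I wear?"},
    ],
}

# Under the current conventions, this one message has to be flattened into
# two unrelated events (`gen_ai.tool.message` and `gen_ai.user.message`),
# losing the fact that both pieces belong to the same message.
```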
Describe the solution you'd like
There needs to be a conceptual hierarchy, where a request consists of a list of messages and a message consists of a list of 'parts'. Here's one way the events could look (a concrete sketch of the event bodies follows the list):
- One event per message in the request, each with the same event name, e.g. `gen_ai.message`.
- Each message event has `role` and `content` keys in the body, similar to the current events.
  - `role` is required and is used to distinguish between user, system, and assistant messages.
  - `content` in the body is an array of parts.
- Each part is an object with a `type` field. Some possible values for `type` are `text`, `image`, `tool_call`, and `tool_response`.
  - If needed and possible, this field should account for the multiple different types of tool call that the existing `type` field seems to be meant for.
- The separate `tool_calls` array in the bodies of assistant and choice events is removed in favour of `tool_call` parts in the `content` array.
- User and assistant messages (including the response `choice` events) all have the same structure, except that the choice events have additional fields (`index`, `finish_reason`) that don't make sense in the request events.
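
To make that concrete, here's a rough sketch of two `gen_ai.message` event bodies under this model. `role` and the `content` array of typed parts are as described above; the other part fields (`id`, `name`, `arguments`, `result`) are assumptions for illustration, not a worked-out schema:

```python
# Hypothetical body for a user message mixing a tool response with text:
user_message_body = {
    "role": "user",
    "content": [
        {"type": "tool_response", "id": "call_abc123", "result": "22°C, sunny"},
        {"type": "text", "text": "Given that weather, what should I wear?"},
    ],
}

# Hypothetical body for a response choice: same structure, same event shape,
# plus the response-only fields.
choice_body = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Checking the forecast first."},
        {
            "type": "tool_call",
            "id": "call_abc123",
            "name": "get_weather",
            "arguments": {"city": "London"},
        },
    ],
    # Only present on response (choice) events:
    "index": 0,
    "finish_reason": "tool_calls",
}
```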
Support for this data model:
- The structure of Google's Gemini/Vertex request messages: https://github.com/googleapis/googleapis/blob/58be301346758c9a342de5632c3f9284d05c4b95/google/cloud/aiplatform/v1/content.proto#L80-L100
- OpenAI uses an array of parts in each message for image and audio inputs: https://platform.openai.com/docs/api-reference/chat/create
- `pydantic_ai` has `ModelMessage`, which is either `ModelRequest` or `ModelResponse`, each of which has a `parts` field which is a list of things like `TextPart` or `ToolCallPart`. This may seem like a biased example, but the point is that `pydantic_ai` works with many different underlying AI APIs from different companies, and it translates between all those schemas and `ModelMessage` out of necessity (a structural sketch follows).
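
As a self-contained illustration of that shape, here's a minimal set of plain dataclasses mirroring the request/response-with-parts structure. This is a structural sketch, not `pydantic_ai`'s actual API (its real message types have more part kinds and fields):

```python
from dataclasses import dataclass
from typing import Union

# Structural sketch of the message/parts hierarchy, mirroring the shape of
# pydantic_ai's ModelMessage types (not the library's actual definitions).

@dataclass
class TextPart:
    text: str

@dataclass
class ToolCallPart:
    tool_name: str
    args: dict

@dataclass
class ToolResponsePart:
    tool_call_id: str
    result: str

Part = Union[TextPart, ToolCallPart, ToolResponsePart]

@dataclass
class ModelRequest:  # a message sent to the model
    parts: list  # list[Part]

@dataclass
class ModelResponse:  # a message produced by the model
    parts: list  # list[Part]

ModelMessage = Union[ModelRequest, ModelResponse]

# Example conversation: every message, whatever its role or direction,
# is just a list of typed parts.
history = [
    ModelResponse(parts=[
        ToolCallPart(tool_name="get_weather", args={"city": "London"}),
    ]),
    ModelRequest(parts=[
        ToolResponsePart(tool_call_id="call_abc123", result="22°C, sunny"),
        TextPart(text="Given that weather, what should I wear?"),
    ]),
]
```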