Skip to content

Model AI chat events as a list of request/response messages, with each message containing a list of parts #1913

@alexmojaki

Description

@alexmojaki

Area(s)

area:gen-ai

What's missing?

  • As pointed out in GenAI handle tool message embedded within user message #1883, the current events don't match the API request structure when a message contains a combination of text and tool call responses.
  • The naming and separation of events is messy and confusing:
    • There's different events for user and system messages, but the distinction is very artificial. The bodies have the same structure, the only difference is the role, which is also present in the body anyway, so a single event name would have worked.
    • genai: handle system role renamed to developer in openai #1877 shows that these roles change over time and aren't reliable enough to be embedded into the event name which needs to be very stable. One day new developers working with OpenAI will only be familiar with the role being called developer instead of system and won't understand why the event is called gen_ai.system.message.
    • Assistant message events can simultaneously contain text content with any number of tool calls, but for user messages these are split into multiple events. Why the inconsistency? Why not an event per tool call?
    • The event name gen_ai.tool.message doesn't make it clear that it means the result of a tool call, rather than the tool call itself. In other words, it's not clear at a glance whether it's sent by the user or assistant.
  • There's no clear place for multi-modal content (Support for Multi modal inputs and generations #1556), e.g. a message containing both text and images.
  • Tool calls have a type field which should apparently always be function, so its purpose is not clear.

Describe the solution you'd like

There needs to be a conceptual hierarchy, where a request consists of a list of messages, and a message consists of a list of 'parts'. Here's one way the events could look:

  • One event per message in the request, each with the same event name, e.g. gen_ai.message.
  • Each message event has role and content keys in the body, similar to the current events.
  • role is required and is used to distinguish between user, system, and assistant messages.
  • content in the body is an array of parts.
  • Each part is an object with a type field. Some possible values for type are text, image, tool_call, and tool_response.
    • If needed and possible, this field should account for multiple different types of tool call that the existing type field seems to be meant for.
  • The separate tool_calls array in the bodies of assistant and choice events are removed in favour of tool_call parts in the content array.
  • User and assistant messages (including the response choice events) all have the same structure, except that the choice events have additional fields (index, finish_reason) that don't make sense in the request events.

Support for this data model:

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions