Replies: 5 comments
---
My overall feeling is that this is just the Artifact wrapper by another name -- we already have the concept of an identifier for an Artifact, and the
I'm not so sure this is actually a subpar experience. Developers can choose the level of chunking they provide on the TextPart -- it doesn't need to be 2-3 word parts. I expect this will probably come directly from their LLM API of choice, rather than a decision they make directly. LLM APIs have effectively the same design, where one text block is streamed in parts. The primary difference here appears to be whether the resulting Part chunks are concatenated together in the stored Artifact or stored separately, which comes down to overhead of the Part wrapper. I don't think that overhead is so high that it means streaming large text is intractable.
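Concretely, the difference comes down to where concatenation happens. A minimal sketch (the dict shapes here are illustrative, not the actual A2A SDK types) of chunks arriving as separate text pieces and being stored either merged or as separate parts:

```python
# Sketch: chunks stream in as separate text pieces; the server can either
# concatenate them into one stored part, or keep one Part per chunk.
# These dict shapes are illustrative, not the real A2A types.

def store_concatenated(chunks):
    """Concatenate streamed text chunks into a single stored part."""
    return {"kind": "text", "text": "".join(chunks)}

def store_as_parts(chunks):
    """Alternative: keep each chunk as its own part (repeats the wrapper)."""
    return [{"kind": "text", "text": c} for c in chunks]

chunks = ["The quick ", "brown fox ", "jumps."]
merged = store_concatenated(chunks)
separate = store_as_parts(chunks)
print(merged["text"])   # The quick brown fox jumps.
print(len(separate))    # 3
```

The per-chunk wrapper is a constant overhead per Part object, which is the cost being weighed here.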
It seems like both of these can be accomplished with the existing Artifact + Parts. You send multiple DataParts or FileParts with

That said, I do think that there are some outstanding questions on how best to use Parts with Artifacts. Another interpretation of multiple TextParts in a single Artifact is that they are alternates, perhaps the same text but in different languages. It's not clear to me if that grouping is better than just adding multiple Artifacts, one per language (for example). If we see clear patterns or advantages for adding another layer of hierarchy to Parts, then I think this is a good option. It's a low-cost addition that's backwards compatible, which is ideal.
---
Both Artifact and Part have their importance in providing structure to an agent response. Artifact purpose: a task might ask for multiple things from an agent; an artifact defines one of the resultant objects that fulfill the overall goal. E.g.: plan a trip to Finland and do the bookings for flight, hotel & activities. In this case, the expected artifacts would be one each for the flight booking, the hotel booking, and the activities.
Part purpose: they provide semantic structure to the artifact, and allow each part to contain one logical unit of information within the overall artifact. E.g.: Artifact: Flight booking.
Treating parts as chunks, and artifacts as wrappers over a result type (text, file or data), has two downsides:
Hence the proposal for part-level streaming. Even OpenAI has a delta for text parts, though a little convoluted: https://platform.openai.com/docs/api-reference/responses-streaming/response/output_text/delta Streaming chunks is a way to send partial data to the client, so that they don't have to wait for the complete task response. I see three levels of benefits to streaming:
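For reference, the linked OpenAI mechanism can be sketched like this. No API call is made here; the events are simulated as plain dicts, and the field names (`output_index`, `content_index`, `delta`) are abbreviated from the linked docs rather than guaranteed exact:

```python
# Sketch: accumulating OpenAI-style response.output_text.delta events.
# Deltas for the same text block share an (output item, content index)
# pair, which plays the same role as the proposed partId.
from collections import defaultdict

def accumulate(events):
    texts = defaultdict(str)
    for ev in events:
        if ev["type"] == "response.output_text.delta":
            # Key by position so deltas for the same block concatenate.
            texts[(ev["output_index"], ev["content_index"])] += ev["delta"]
    return dict(texts)

events = [
    {"type": "response.output_text.delta",
     "output_index": 0, "content_index": 0, "delta": "Hel"},
    {"type": "response.output_text.delta",
     "output_index": 0, "content_index": 0, "delta": "lo"},
]
print(accumulate(events))  # {(0, 0): 'Hello'}
```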
---
I second @mikeas1's opinion here. The `TaskArtifactUpdateEvent` has

The `Artifact` has an `artifactId`, an optional `name`, and even an optional `description` for everything related to the artifact. This combination IMHO works even for streaming a PDF artifact or an image artifact.
---
Ok, so breaking it down, I think there are two components to what you're proposing:
I think my primary critique is on the first, but I'm not entirely sure the second is necessary. You can technically do the second without the first: you just identify a Part by its index in the parts array. That's how the OpenAI API does it.

My concern with names on Parts is that it's not clear they're actually more useful. I'm not clear on why Artifact["Flight booking"].Part["flight summary"] is better than Artifact["Flight Booking/Summary"]. Any organization you make out of named chunks in artifacts can be equivalently created with just artifacts -- you just flatten out the hierarchy. Giving chunks names just makes Artifacts a 2-level organizational hierarchy (level 1: the containing artifact; level 2: the named chunks within it). That raises the question of why 2 levels is the magic number -- why not 3? Artifact > Component > Chunk? You could say that Artifact["Flight Booking"] > Component["Directions"] > (Chunk["Directions to Airport"], Chunk["Map to Airport"]) is a better organization. Why not arbitrary? Artifact > [Component > [Component]]?
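The flattening equivalence is mechanical. A sketch, with hypothetical artifact and part names, of turning a 2-level Artifact > named-Part hierarchy into flat artifacts whose names encode the path:

```python
# Sketch: any named-chunk hierarchy maps to flat artifact names by
# joining path segments. All names here are illustrative.

def flatten(artifacts):
    """Turn {artifact_name: {part_name: text}} into {path: text}."""
    flat = {}
    for art_name, parts in artifacts.items():
        for part_name, text in parts.items():
            flat[f"{art_name}/{part_name}"] = text
    return flat

nested = {"Flight Booking": {"flight summary": "AY123 ...",
                             "receipt": "EUR 240"}}
print(flatten(nested))
# {'Flight Booking/flight summary': 'AY123 ...',
#  'Flight Booking/receipt': 'EUR 240'}
```

The reverse mapping (splitting on the separator) recovers the hierarchy, which is the sense in which the two organizations are equivalent.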
This is true, but I don't know that it's a significant burden. You already need to do this when you're actively listening to the stream. You just do the same thing again when you retrieve it offline.
Ok, interesting, so it looks like

I guess I'm just not sure that this actually simplifies anything. If you set aside the organization of parts within an artifact, the only functional difference here is whether we concat the
---
I think we need to agree on the meaning of each construct: Artifact and Part.

Artifacts: They denote one complete, meaningful resultant entity. Ex: within Google Drive, a user asked to create 3 images. Then there will be 3 artifacts; each artifact will be one image.

Part: Parts are the individual sections of an artifact which hold a complete meaning within that artifact. That's why there are different types of Part supported: Text, File & Data (dict). This allows an artifact to compose over multiple heterogeneous Parts. Ex: plan a trip:

If we flatten out the parts into multiple artifacts, then we are asking the client to be intelligent enough to infer the grouping from the flattened "artifacts". The point of the A2A protocol is to provide structure, so that simpler clients can benefit from it. This becomes worse when there are multiple artifacts: then all parts would need to be flattened into artifacts to be streamable under the current spec.

To your point about a 3rd level of hierarchy: we are not proposing another level of hierarchy, just support for part chunks to be streamable, and hence the need for a unique identifier to tie the part-chunks together. This identifier can be a part-ID, a part-name, or just a simple part-index. From the protocol's perspective, these chunks are ephemeral and do not exist in the synchronous API or the storage layer.

Why we should not reuse parts for streaming: The protocol dictates "parts" as the structure of an artifact, so they need to be stored as-is for the protocol's synchronous APIs as well as the REST layer. We cannot club them together as one part and store it. Saving the chunks as-is, as parts, would cause a blow-up of the storage space needed, as well as bandwidth for sync calls to getTask and get task history. From a chunk which can be streamed as

Even the concatenation overhead is fine for first-time streaming, as the agent itself is generating the artifact parts on the fly. But once the artifact has been fully generated, the overhead of concatenating the chunks on each API call is wasteful.
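The storage-blow-up point can be made concrete. A sketch (the part shapes are illustrative, not the real A2A schema) comparing the serialized size of storing every streamed chunk as its own Part versus concatenating once:

```python
# Sketch: per-Part wrapper overhead repeats for every chunk, so storing
# token-sized chunks as Parts inflates both storage and sync-call
# payloads relative to concatenating once. Shapes are illustrative.
import json

chunks = ["word "] * 200  # e.g. token-sized chunks from an LLM stream

as_chunk_parts = [{"kind": "text", "text": c} for c in chunks]
as_one_part = [{"kind": "text", "text": "".join(chunks)}]

size_chunked = len(json.dumps(as_chunk_parts))
size_merged = len(json.dumps(as_one_part))
print(size_chunked > size_merged)  # True
```

The same text is carried either way; only the repeated wrapper differs, and that wrapper is re-sent on every `getTask` if chunks are stored as Parts.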
---
Background
The A2A protocol currently supports streaming task updates and result artifacts. The artifacts can be streamed as follows:
But a `Part` needs to be sent as a whole in each `TaskArtifactUpdateEvent`. The protocol doesn't allow streaming the chunks of a part over multiple `TaskArtifactUpdateEvent`s.

Reason: since there's no `partId` to uniquely identify a part, there's no way to identify chunks belonging to the same part across `TaskArtifactUpdateEvent` stream updates.

Need for Part chunk streaming
TextPart: a major streaming use case is large text paragraphs. Within the current paradigm, the text result is one artifact and possibly just one text part, but that would not allow it to be streamed unless the agent breaks the entire paragraph into 2-3 word parts. This is a subpar experience, as it creates far too many part objects in the task artifact.
Inline image streaming: the current FilePart allows sending images inline as bytes. Today the entire image's bytes need to be sent at once, but with streaming, the client can progressively render the image.
DataPart: currently the user has to wait for the whole dictionary (key, value pairs) to be sent. While streaming the keys of a dictionary is tricky and probably specific to how an agent does it, it still allows quicker feedback for clients.
Example for text chunk streaming in current protocol:
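A sketch of what this forces today (the event and part shapes are simplified and assumed, not the exact A2A schema; `append` and `lastChunk` follow the existing `TaskArtifactUpdateEvent` fields):

```python
# Sketch: streaming "Hello world" under the current protocol means every
# chunk must be a complete TextPart in its own update event.

def make_update(artifact_id, text, append, last_chunk):
    return {
        "artifact": {
            "artifactId": artifact_id,
            "parts": [{"kind": "text", "text": text}],
        },
        "append": append,
        "lastChunk": last_chunk,
    }

updates = [
    make_update("a1", "Hello ", append=False, last_chunk=False),
    make_update("a1", "world", append=True, last_chunk=True),
]

# Each update carries a whole Part; nothing marks the two TextParts as
# chunks of one logical part, so the stored artifact keeps one Part per
# chunk.
parts = [u["artifact"]["parts"][0]["text"] for u in updates]
print(parts)  # ['Hello ', 'world']
```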
Design
Introduce 2 new optional fields in the Part object:

1. `partId` to uniquely identify parts across `TaskArtifactUpdateEvent` stream updates.
2. `lastChunk` to indicate that no more chunk updates will be streamed by the agent.
3. The client can safely assemble the streamed chunks for this `partId` and process them.
4. The benefit of `lastChunk` is that the client doesn't need to wait for the next update event to be streamed.
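Client-side assembly under the proposal could look like the following sketch. Only `partId` and `lastChunk` come from the proposal itself; the chunk dicts and function names are illustrative:

```python
# Sketch: assembling streamed part-chunks keyed by the proposed partId,
# and using lastChunk to know when a part is complete.
from collections import defaultdict

def assemble(part_chunks):
    buffers = defaultdict(str)
    done = set()
    for chunk in part_chunks:
        pid = chunk["partId"]
        buffers[pid] += chunk["text"]
        if chunk.get("lastChunk"):
            done.add(pid)  # this part can be processed immediately
    return buffers, done

stream = [
    {"partId": "p1", "text": "Par", "lastChunk": False},
    {"partId": "p1", "text": "is", "lastChunk": True},
]
buffers, done = assemble(stream)
print(buffers["p1"], "p1" in done)  # Paris True
```

The point of `lastChunk` shows up in the `done` set: the client can act on `p1` as soon as its final chunk arrives, without waiting for the next event.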
Backward Compatibility
Both `partId` & `lastChunk` fields are optional.

- If `partId` is not provided, the client can treat each part-chunk as a new unique part.
- `lastChunk` is optional, since the client can infer this from the `TaskArtifactUpdateEvent`.