Replies: 5 comments
---
My overall feeling is that this is just the Artifact wrapper by another name -- we already have the concept of an identifier for an Artifact, and the
I'm not so sure this is actually a subpar experience. Developers can choose the level of chunking they provide on the TextPart -- it doesn't need to be 2-3 word parts. I expect this will probably come directly from their LLM API of choice, rather than a decision they make directly. LLM APIs have effectively the same design, where one text block is streamed in parts. The primary difference here appears to be whether the resulting Part chunks are concatenated together in the stored Artifact or stored separately, which comes down to overhead of the Part wrapper. I don't think that overhead is so high that it means streaming large text is intractable.
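Concretely, the difference comes down to where concatenation happens. A minimal sketch (the dict shapes here are illustrative, not the actual A2A SDK types) of chunks arriving as separate text pieces and being stored either merged or as separate parts:

```python
# Sketch: chunks stream in as separate text pieces; the server can either
# concatenate them into one stored part, or keep one Part per chunk.
# These dict shapes are illustrative, not the real A2A types.

def store_concatenated(chunks):
    """Concatenate streamed text chunks into a single stored part."""
    return {"kind": "text", "text": "".join(chunks)}

def store_as_parts(chunks):
    """Alternative: keep each chunk as its own part (repeats the wrapper)."""
    return [{"kind": "text", "text": c} for c in chunks]

chunks = ["The quick ", "brown fox ", "jumps."]
merged = store_concatenated(chunks)
separate = store_as_parts(chunks)
print(merged["text"])   # The quick brown fox jumps.
print(len(separate))    # 3
```

The per-chunk wrapper is a constant overhead per Part object, which is the cost being weighed here.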
It seems like both of these can be accomplished with the existing Artifact + Parts. You send multiple DataParts or FileParts with

That said, I do think that there are some outstanding questions on how best to use Parts with Artifacts. Another interpretation of multiple TextParts in a single Artifact is that they are alternates, perhaps the same text but in different languages. It's not clear to me if that grouping is better than just adding multiple Artifacts, one per language (for example). If we see clear patterns or advantages for adding another layer of hierarchy to Parts, then I think this is a good option. It's a low-cost addition that's backwards compatible, which is ideal.
---
Both Artifact and Part have their importance in providing structure to an agent response. Artifact purpose: a task might ask for multiple things from an agent; an artifact defines one of the resultant objects that fulfill the overall goal. E.g.: plan a trip to Finland and do the bookings for flight, hotel & activities. In this case, the expected artifacts would be one each for the flight booking, the hotel booking, and the activities.
Part purpose: they provide semantic structure to the artifact, and allow each part to contain one logical unit of information within the overall artifact. E.g.: Artifact: Flight booking.
Treating parts as chunks, and artifacts as wrappers over a result type (text, file or data), has two downsides:
Hence the proposal for part-level streaming. Even OpenAI has a delta for text parts, though a little convoluted: https://platform.openai.com/docs/api-reference/responses-streaming/response/output_text/delta Streaming chunks is a way to send partial data to the client, so that they don't have to wait for the complete task response. I see three levels of benefits to streaming:
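For reference, the linked OpenAI mechanism can be sketched like this. No API call is made here; the events are simulated as plain dicts, and the field names (`output_index`, `content_index`, `delta`) are abbreviated from the linked docs rather than guaranteed exact:

```python
# Sketch: accumulating OpenAI-style response.output_text.delta events.
# Deltas for the same text block share an (output item, content index)
# pair, which plays the same role as the proposed partId.
from collections import defaultdict

def accumulate(events):
    texts = defaultdict(str)
    for ev in events:
        if ev["type"] == "response.output_text.delta":
            # Key by position so deltas for the same block concatenate.
            texts[(ev["output_index"], ev["content_index"])] += ev["delta"]
    return dict(texts)

events = [
    {"type": "response.output_text.delta",
     "output_index": 0, "content_index": 0, "delta": "Hel"},
    {"type": "response.output_text.delta",
     "output_index": 0, "content_index": 0, "delta": "lo"},
]
print(accumulate(events))  # {(0, 0): 'Hello'}
```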
---
I second @mikeas1's opinion here. The `TaskArtifactUpdateEvent` has

The `Artifact` has an `artifactId`, an optional `name`, and even an optional `description` for everything related to the artifact. This combination IMHO works even for streaming a PDF artifact or an image artifact.
---
Ok, so breaking it down, I think there are two components to what you're proposing:
I think my primary critique is on the first, but I'm not entirely sure the second is necessary. You can technically do the second without the first: you just identify a Part by its index in the parts array. That's how the OpenAI API does it.

My concern with names on Parts is that it's not clear they're actually more useful. I'm not clear on why Artifact["Flight booking"].Part["flight summary"] is better than Artifact["Flight Booking/Summary"]. Any organization you make out of named chunks in artifacts can be equivalently created with just artifacts -- you just flatten out the hierarchy. Giving chunks names just makes Artifacts a 2-level organizational hierarchy (level 1: the containing artifact; level 2: the named chunks within it). That raises the question of why 2 levels is the magic number -- why not 3? Artifact > Component > Chunk? You could say that Artifact["Flight Booking"] > Component["Directions"] > (Chunk["Directions to Airport"], Chunk["Map to Airport"]) is a better organization. Why not arbitrary? Artifact > [Component > [Component]]?
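The flattening equivalence is mechanical. A sketch, with hypothetical artifact and part names, of turning a 2-level Artifact > named-Part hierarchy into flat artifacts whose names encode the path:

```python
# Sketch: any named-chunk hierarchy maps to flat artifact names by
# joining path segments. All names here are illustrative.

def flatten(artifacts):
    """Turn {artifact_name: {part_name: text}} into {path: text}."""
    flat = {}
    for art_name, parts in artifacts.items():
        for part_name, text in parts.items():
            flat[f"{art_name}/{part_name}"] = text
    return flat

nested = {"Flight Booking": {"flight summary": "AY123 ...",
                             "receipt": "EUR 240"}}
print(flatten(nested))
# {'Flight Booking/flight summary': 'AY123 ...',
#  'Flight Booking/receipt': 'EUR 240'}
```

The reverse mapping (splitting on the separator) recovers the hierarchy, which is the sense in which the two organizations are equivalent.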
This is true, but I don't know that it's a significant burden. You already need to do this when you're actively listening to the stream. You just do the same thing again when you retrieve it offline.
Ok, interesting, so it looks like

I guess I'm just not sure that this actually simplifies anything. If you set aside the organization of parts within an artifact, the only functional difference here is whether we concat the
---
I think we need to agree on the meaning of each construct: Artifact and Part.

Artifacts: They denote one complete, meaningful resultant entity. Ex: within Google Drive, a user asked to create 3 images. Then there will be 3 artifacts; each artifact will be one image.

Part: Parts are the individual sections of an artifact which hold a complete meaning within that artifact. That's why there are different types of Part supported: Text, File & Data (dict). This allows an artifact to compose over multiple heterogeneous Parts. Ex: plan a trip:

If we flatten out the parts into multiple artifacts, then we are asking the client to be intelligent enough to infer the grouping from the flattened "artifacts". The point of the A2A protocol is to provide structure, so that simpler clients can benefit from it. This becomes worse when there are multiple artifacts: then all parts would need to be flattened into artifacts to be streamable under the current spec.

To your point about a 3rd level of hierarchy: we are not proposing another level of hierarchy, just support for part chunks to be streamable, and hence the need for a unique identifier to tie the part-chunks together. This identifier can be a part-ID, a part-name, or just a simple part-index. From the protocol's perspective, these chunks are ephemeral and do not exist in the synchronous API or the storage layer.

Why we should not reuse parts for streaming: The protocol dictates "parts" as the structure of an artifact, so they need to be stored as-is for the protocol's synchronous APIs as well as the REST layer. We cannot club them together as one part and store it. Saving the chunks as-is, as parts, would cause a blow-up of the storage space needed, as well as bandwidth for sync calls to getTask and get task history. From a chunk which can be streamed as

Even the concatenation overhead is fine for first-time streaming, as the agent itself is generating the artifact parts on the fly. But once the artifact has been fully generated, the overhead of concatenating the chunks on each API call is wasteful.
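The storage-blow-up point can be made concrete. A sketch (the part shapes are illustrative, not the real A2A schema) comparing the serialized size of storing every streamed chunk as its own Part versus concatenating once:

```python
# Sketch: per-Part wrapper overhead repeats for every chunk, so storing
# token-sized chunks as Parts inflates both storage and sync-call
# payloads relative to concatenating once. Shapes are illustrative.
import json

chunks = ["word "] * 200  # e.g. token-sized chunks from an LLM stream

as_chunk_parts = [{"kind": "text", "text": c} for c in chunks]
as_one_part = [{"kind": "text", "text": "".join(chunks)}]

size_chunked = len(json.dumps(as_chunk_parts))
size_merged = len(json.dumps(as_one_part))
print(size_chunked > size_merged)  # True
```

The same text is carried either way; only the repeated wrapper differs, and that wrapper is re-sent on every `getTask` if chunks are stored as Parts.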
---
Background
The A2A protocol currently supports streaming task updates and result artifacts. The artifacts can be streamed as follows:
But a `Part` needs to be sent as a whole in each `TaskArtifactUpdateEvent`. The protocol doesn't allow streaming the chunks of a part over multiple `TaskArtifactUpdateEvent`s.

Reason: since there's no `partId` to uniquely identify a part, there's no way to identify chunks belonging to the same part across `TaskArtifactUpdateEvent` stream updates.

Need for Part chunk streaming
TextPart: a major streaming use case is large text paragraphs. Within the current paradigm, the text result is one artifact and possibly just one text part, but that would not allow it to be streamed unless the agent breaks the entire paragraph into 2-3 word parts. This is a subpar experience, as it creates far too many part objects in the task artifact.
Inline image streaming: the current FilePart allows sending images inline as bytes. Today the entire image's bytes need to be sent at once, but with streaming, the client can progressively render the image.
DataPart: currently the user has to wait for the whole dictionary (key, value pairs) to be sent. While streaming the keys of a dictionary is tricky and probably specific to how an agent does it, it still allows quicker feedback for clients.
Example for text chunk streaming in current protocol:
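A sketch of what this forces today (the event and part shapes are simplified and assumed, not the exact A2A schema; `append` and `lastChunk` follow the existing `TaskArtifactUpdateEvent` fields):

```python
# Sketch: streaming "Hello world" under the current protocol means every
# chunk must be a complete TextPart in its own update event.

def make_update(artifact_id, text, append, last_chunk):
    return {
        "artifact": {
            "artifactId": artifact_id,
            "parts": [{"kind": "text", "text": text}],
        },
        "append": append,
        "lastChunk": last_chunk,
    }

updates = [
    make_update("a1", "Hello ", append=False, last_chunk=False),
    make_update("a1", "world", append=True, last_chunk=True),
]

# Each update carries a whole Part; nothing marks the two TextParts as
# chunks of one logical part, so the stored artifact keeps one Part per
# chunk.
parts = [u["artifact"]["parts"][0]["text"] for u in updates]
print(parts)  # ['Hello ', 'world']
```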
Design
Introduce 2 new optional fields in the Part object:

1. `partId` to uniquely identify parts across `TaskArtifactUpdateEvent` stream updates.
2. `lastChunk` to indicate that no more chunk updates will be streamed by the agent.
3. The client can safely assemble the streamed chunks for this `partId` and process them.
4. The benefit of `lastChunk` is that the client doesn't need to wait for the next update event to be streamed.
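Client-side assembly under the proposal could look like the following sketch. Only `partId` and `lastChunk` come from the proposal itself; the chunk dicts and function names are illustrative:

```python
# Sketch: assembling streamed part-chunks keyed by the proposed partId,
# and using lastChunk to know when a part is complete.
from collections import defaultdict

def assemble(part_chunks):
    buffers = defaultdict(str)
    done = set()
    for chunk in part_chunks:
        pid = chunk["partId"]
        buffers[pid] += chunk["text"]
        if chunk.get("lastChunk"):
            done.add(pid)  # this part can be processed immediately
    return buffers, done

stream = [
    {"partId": "p1", "text": "Par", "lastChunk": False},
    {"partId": "p1", "text": "is", "lastChunk": True},
]
buffers, done = assemble(stream)
print(buffers["p1"], "p1" in done)  # Paris True
```

The point of `lastChunk` shows up in the `done` set: the client can act on `p1` as soon as its final chunk arrives, without waiting for the next event.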
Backward Compatibility
Both `partId` & `lastChunk` fields are optional.

- If `partId` is not provided, the client can treat each part-chunk as a new unique part.
- `lastChunk` is optional, since the client can infer this from the `TaskArtifactUpdateEvent`.