A2A-Go SDK distributed mode #139
yarolegovich
started this conversation in
Ideas
Replies: 1 comment
-
|
The proposal was implemented #115 |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Objective
Make the SDK suitable for distributed deployments where multiple servers run behind a load-balancer. This means providing extension points which have enough information and controls for:
re #115
PoC
There are four PRs which implement the changes suggested in this proposal:
EventQueueuse for task update notifications (refactor: event queue semantics #120)TaskVersionconcept and extendedTaskStoreAPI (feat: introduce and integrate TaskVersion #127)WorkQueueabstraction (feat: introduce and integrate TaskVersion #127)Background
The current high levels interactions looks like the following (with extension points highlighted in green):

API
Solution
The proposal is to change this to the following:
Where
ExecutionFrontendandExecutionBackendare logical entities. Every process runs both a frontend which writes work to aWorkQueueand backend which invokesAgentExecutorafter reading work from the queue.WorkQueue
A new extension point responsible for addressing work ownership and concurrent request detection. Implementations might do work distribution by (taskID, payloadType), keep track of concurrent executions, reject duplicate messages (using IDs as a deduplication key), implement idempotency (using message hashing), implement heartbeats and retries supporting both push and pull models.
workqueue.Queueis the single dependency whicha2asrv.RequestHandlertakes on construction.ExecutorBackend when created registers itself as a handler:
Which happens when a
a2asrv.RequestHandleris created in the new “cluster mode”:Push-based
This API provides a high level of control and no overhead for integrations where things like heartbeats or load-based work distribution is already handled by the framework. A push-based work queue adapter takes only a writer as a dependency and returns a handler function which will be delegating to the handler (i.e.
ExecutorBackend) which registers itself with the queue:Pull-based queue
Pull-based API makes it possible to provide support for utilities related to work polling strategies, concurrency control, retry policies etc. A pull-based workqueue adapter takes a pull-able source as a dependency and starts a poller thread when the execution handler is registered:
TaskStore
TaskStore API will be extended to handle:
TaskVersionwhich can be used for optimistic concurrency control.Savewill take the event which triggered an update, so that it can be stored transactionally with the new task state.TaskVersionshould be used to detect when another update (eg. task cancellation) was applied concurrently with task execution. In this case execution should be immediately aborted.EventQueue
EventQueuegets repurposed by rearranging component connections. Instead of serving as a connector ofAgentExecutorand event processor, it will act as an event bus for already processed and applied updates. The interface will also be extended to work withTaskVersion.With the new API the task resubscription logic will be roughly:
AgentExecutor
In the next major SDK version update the suggestion is to change the abstraction to something independent of other extension points (i.e. EventQueue), for example:
Renaming
AgentExecutortoAgentAdapteris a change which subjectively will make the purpose of the extension point more clear.Migration
To make the existing implementation migration trivial we can export a
TaskVersionMissingtype which can be used returned and passed to all the changed methods:If implementations did not use versioning in any way they should continue working with the version placeholder object.
Beta Was this translation helpful? Give feedback.
All reactions