A2A: async task dispatch with SSE streaming and progress polling#1066
Open
pbranchu wants to merge 17 commits intoRightNow-AI:mainfrom
Open
A2A: async task dispatch with SSE streaming and progress polling#1066pbranchu wants to merge 17 commits intoRightNow-AI:mainfrom
pbranchu wants to merge 17 commits intoRightNow-AI:mainfrom
Conversation
jaberjaber23
requested changes
Apr 17, 2026
Member
jaberjaber23
left a comment
There was a problem hiding this comment.
Thanks for the A2A async dispatch work @pbranchu. The direction (SSE-streaming task dispatch + progress polling) is useful, but there's work to do before merge.
Blockers
- Tool description contradicts the code.
a2a_sendtool description says "quick tasks expected to complete in <30s" while the client default timeout is 300s and the streaming path has no per-request timeout. Pick one story — either (a) enforce <30s and bail otherwise, or (b) update the description to match a longer timeout and document backpressure behavior. std::sync::Mutexinside async paths.A2A_TASK_PROGRESSusesArc<Mutex<String>>held across awaits / locked from sync tool paths. Under contention this blocks the executor. Replace withtokio::sync::Mutexor anArcSwap<String>if the payload is immutable per-write.- Global
ASYNC_TASKS/A2A_TASK_PROGRESSwith no TTL, eviction, or size cap. Process-wideDashMapmeans unbounded growth on long-running agents. Add a max-entry cap, a TTL, or per-session scoping. - No tests in the diff. SSE parsing has a lot of edge cases (chunk boundaries mid-event, missing
final, malformed JSON, server disconnect). Please add unit tests for at least: normal completion, disconnect mid-stream, malformed JSON event, final-event absent. - CI is red across the board. Check / Test / Clippy / Security Audit all failing. Must be green before merge.
Concerns
- If the spawned task panics before the cleanup
remove()calls run, theASYNC_TASKSandA2A_TASK_PROGRESSentries leak. Wrap the async block in a guard (e.g.,scopeguard::defer) so cleanup always runs. - Interaction with #1054: both touch the channel-bridge dispatch ordering. After #1054's
set_channel_contextis merged, confirmthread_idin the captured context is the thread created by smart-thread, not the parent channel. - Remote task output is fed back through
inject_async_callback. That content is untrusted from the remote agent's perspective — treat it as a prompt-injection source. Sanitize or at minimum tag it in the agent history. agent_urlgoes through an SSRF check — good. Confirmcheck_ssrfhere uses the same canonical implementation as #1060 (now merged).
Recommendation
Fix the blockers, add tests, rebase, and re-request review. Happy to help nail down the Mutex/TTL design if useful.
Claude Code tasks (file edits, test runs, multi-step implementation) easily exceed the previous 30-second limit, causing spurious connection errors when Grace delegates work via tasks/send. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ubscribe
Replaces the synchronous `tasks/send` call in `tool_a2a_send` with a new
`send_task_streaming` method that consumes the `tasks/sendSubscribe` SSE
stream, accumulating text chunks until the server emits `"final": true`.
This eliminates the 300 s hard timeout and unblocks the executor during
long-running Claude Code delegations.
Closes pbranchu/openfang-1#15
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…c_callback, bridge context capture - Add `channel_contexts: DashMap<AgentId, ChannelCallbackContext>` field to `OpenFangKernel` - Implement `get_channel_context`, `set_channel_context`, and `inject_async_callback` in `impl KernelHandle for OpenFangKernel`; inject sends the callback message to the agent, then delivers the agent response to the originating channel via `send_channel_message` - Add `set_channel_context` default method to `ChannelBridgeHandle` trait (no-op default) - Override it in `KernelBridgeAdapter` to call through to the kernel - Call `set_channel_context` in `dispatch_message` (bridge.rs) just before dispatching to the agent so every inbound channel message captures its reply context Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add ChannelCallbackContext to openfang-types - Add get_channel_context / inject_async_callback to KernelHandle trait - Add ASYNC_TASKS + A2A_TASK_PROGRESS statics to tool_runner - Implement tool_a2a_send_async: fires SSE task in background, accumulates progress chunks, delivers final result via inject_async_callback - Implement tool_a2a_check_task: polls live accumulated progress buffer - Implement tool_a2a_cancel_task: aborts JoinHandle, cleans both maps - Add send_task_streaming_with_progress to A2aClient: streams SSE and appends agent text chunks to shared Arc<Mutex<String>> progress buffer - Register all three tools in the dispatch match and as ToolDefinitions
…ed "claude-code" Co-Authored-By: Philippe Branchu <philippe@branchu.com>
- tool_runner: pass missing &[] allowed_hosts arg to check_ssrf in a2a_send_async - copilot: remove unused DEVICE_FLOW_POLL_INTERVAL constant, fetched_at field, refresh_token_expires_in field, and prompt_line function; change &PathBuf params to &Path; remove unnecessary i64 cast - openai: replace map_or(false, ..) with is_some_and(..) - subprocess_sandbox: replace iter().any() with slice::contains() - api/ws: replace redundant closure with function reference - kernel: remove unnecessary to_path_buf() call after &Path signature fix - cli/init_wizard: collapse nested if into single condition
…ed tool turns, strip @default from agent IDs - agent_loop.rs: log LLM response text (500 chars), tool call name+input (300 chars), and tool result content+error at INFO level (was debug or missing) - tool_runner.rs: strip @<suffix> from agent_id in tool_agent_send to handle hallucinated @default qualifier - session_repair.rs: add prune_failed_tool_turns; called from both EndTurn save paths so failed tool call+result pairs never persist to session history
- RwLock: replace Arc<Mutex<String>> with Arc<RwLock<String>> for
A2A_TASK_PROGRESS — eliminates exclusive lock in read-heavy path
- SSE parsing: extract pure parse_sse_data_line fn + SseLineOutcome enum;
refactor both send_task_streaming and send_task_streaming_with_progress
to use it; add timeout: Option<Duration> to both; 8 unit tests covering
all outcomes (final, update, empty, malformed, error, unknown)
- Bounded maps: add MAX_CONCURRENT_ASYNC_TASKS=256 cap with clear error;
AsyncTaskEntry{handle,inserted_at} tracks task age; OnceLock-based
background sweep (10 min interval, 2 h TTL) aborts stale handles;
TaskCleanupGuard RAII ensures cleanup on panic
- Prompt injection: rewrite inject_async_callback to deliver async results
via structural ToolUse+ToolResult pair instead of text framing; remote
agent content lands in a ToolResult block where LLM API semantics enforce
the data boundary; add prepend_turns: Option<Vec<Message>> to both
run_agent_loop and run_agent_loop_streaming so the synthetic ToolUse is
inserted AFTER validate_and_repair (prevents orphan removal of the
ToolResult user turn)
- Strip @default suffix from agent IDs in tool_agent_send (pre-existing fix)
- Update test to correctly verify prune_failed_tool_turns behavior
SSRF (RightNow-AI#1060): already using canonical crate::web_fetch::check_ssrf
Thread context (RightNow-AI#1054): context.thread_id passed through; will work
correctly once smart-thread sets it
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…0099 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…orkspace-wide Pre-existing lint violations in CLI TUI keyboard handlers, channels, api, and kernel. All triggered by Rust 1.95 collapsible_match enforcement. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor
Author
|
All reviewer concerns from the initial review have been addressed: Blockers resolved:
Other concerns addressed:
All 11 CI jobs passing (Clippy, Format, Check, Test on all platforms, Security Audit, Secrets Scan). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Motivation
Blocking A2A times out on any task taking more than a few seconds. Agents delegating to external coding CLIs need fire-and-forget dispatch with live polling and automatic result delivery. The SSE upgrade also makes synchronous `a2a_send` stream progressively rather than blocking until completion.
Changes from review
Blockers addressed
Timeout story clarified. `send_task_streaming` now takes `timeout: Option`. The sync `a2a_send` path passes `Some(300s)` and the tool description documents that. The async path passes `None` — fire-and-forget tasks have no per-request timeout by design.
`Mutex` replaced with `RwLock`. `A2A_TASK_PROGRESS` now uses `Arc<RwLock>`. The async write path takes a write lock only for final/incremental appends; `a2a_check_task` takes a read lock. No lock held across awaits.
TTL, eviction cap, and RAII cleanup added.
Unit tests added. Extracted `parse_sse_data_line` as a pure function with `SseLineOutcome` enum (`Skip | Update(A2aTask) | Final(A2aTask)`). 8 unit tests cover: empty/whitespace lines, final event, intermediate update, explicit `final: false`, malformed JSON, server error event, unknown structure, and result-not-a-task.
CI is green. All Check / Test / Clippy / Format / Security Audit jobs passing.
Concerns addressed
Test plan