feat(agent-server): add token-level streaming support to agent server WebSocket #2751
VascoSch92 wants to merge 3 commits into main from
Conversation
Python API breakage checks — ✅ PASSED
REST API breakage checks (OpenAPI) — ✅ PASSED
all-hands-bot left a comment
🟢 Good taste - Elegant solution that eliminates special cases
KEY INSIGHT: This properly reuses existing PubSub infrastructure without introducing complexity. The streaming events flow through the same path as regular events, and the opt-in nature (stream: true) means zero risk to existing deployments.
VERDICT: ✅ Worth merging - Core design is sound, addresses a real UX problem (responsive feedback vs. batch-oriented waiting).
Summary
This PR adds:

- `StreamingDeltaEvent`, a lightweight transient event carrying `content` and `reasoning_content` text chunks for real-time WebSocket delivery
- a `token_callbacks` handler in `EventService.start()` that publishes `StreamingDeltaEvent`s directly to `PubSub`, bypassing event persistence
- opt-in via `stream: true`; no behavior change for existing clients

How it works
The token callback extracts `content`/`reasoning_content` from `ModelResponseStream` chunks and publishes them via `asyncio.run_coroutine_threadsafe` to the existing `PubSub`. The `_WebSocketSubscriber` already forwards any `Event` subclass, so no changes are needed to `PubSub`, `sockets.py`, or `AsyncCallbackWrapper`.

Why this matters

The SDK already supports token-level streaming in standalone mode (`token_callbacks` in `Conversation`), but this capability was not exposed through the agent server. Without it, WebSocket clients (TUIs, CLIs, web UIs) must wait for the entire LLM response to complete before displaying anything. This change closes that gap: clients now receive text as it is generated, enabling responsive, typewriter-style output that makes the agent feel interactive rather than batch-oriented.

Closes #2735
Checklist
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- `eclipse-temurin:17-jdk`
- `nikolaik/python-nodejs:python3.13-nodejs22-slim`
- `golang:1.21-bookworm`

Pull (multi-arch manifest)

```shell
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:e3d5c3a-python
```

Run
All tags pushed for this build
About Multi-Architecture Support
- Each variant tag (e.g. `e3d5c3a-python`) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g. `e3d5c3a-python-amd64`) are also available if needed