feat(agent-server): add token-level streaming support to agent server WebSocket #2751

Open
VascoSch92 wants to merge 3 commits into main from vasco/streaming

@VascoSch92 VascoSch92 commented Apr 8, 2026

Summary

  • Add StreamingDeltaEvent, a lightweight transient event carrying content and reasoning_content text chunks for real-time WebSocket delivery
  • Wire a token_callbacks handler in EventService.start() that publishes StreamingDeltaEvents directly to PubSub, bypassing event persistence
  • Streaming is only active when the LLM config has stream: true; no behavior change for existing clients
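
The two delivery paths in the bullets above (persisted events versus transient deltas, gated by the stream flag) can be sketched roughly as follows. This is a toy model under stated assumptions, not the PR's code: EventServiceSketch, persisted, publish, on_event, and the on_delta signature are hypothetical names, and StreamingDeltaEvent is reduced to the two text fields the summary mentions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StreamingDeltaEvent:
    """Hypothetical shape: the summary names only these two text fields."""

    content: Optional[str] = None
    reasoning_content: Optional[str] = None


@dataclass
class EventServiceSketch:
    """Toy model of the two delivery paths; not the real EventService."""

    stream: bool = False  # mirrors the LLM config's `stream` flag
    persisted: List[object] = field(default_factory=list)  # stands in for the event log
    subscribers: List[Callable[[object], None]] = field(default_factory=list)

    def publish(self, event: object) -> None:
        # Fan the event out to every subscriber (e.g. a WebSocket forwarder).
        for subscriber in self.subscribers:
            subscriber(event)

    def on_event(self, event: object) -> None:
        # Regular events are stored first, then broadcast.
        self.persisted.append(event)
        self.publish(event)

    def token_callbacks(self) -> list:
        # With streaming disabled no callback is registered at all,
        # so existing clients see exactly the same event sequence as before.
        if not self.stream:
            return []

        def on_delta(content=None, reasoning_content=None):
            # Transient path: broadcast only, never appended to `persisted`.
            self.publish(StreamingDeltaEvent(content, reasoning_content))

        return [on_delta]
```

In this sketch a subscriber sees both deltas and regular events, while persisted only ever contains the regular ones, which is the property the second bullet describes.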

How it works

The token callback extracts content/reasoning_content from ModelResponseStream chunks and publishes them via asyncio.run_coroutine_threadsafe to the existing PubSub. The _WebSocketSubscriber already forwards any Event subclass, so no changes are needed to PubSub, sockets.py, or AsyncCallbackWrapper.
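
A minimal sketch of that flow, using only the standard library. The chunk layout (choices[0].delta.content and choices[0].delta.reasoning_content) is assumed to follow the usual streaming-chunk shape and is faked here with SimpleNamespace; PubSubSketch, make_token_callback, fake_chunk, and demo are hypothetical names, and the real handler in EventService.start() may differ.

```python
import asyncio
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass
class StreamingDeltaEvent:
    # Stand-in for the PR's event; only these two fields are named in the PR.
    content: Optional[str] = None
    reasoning_content: Optional[str] = None


class PubSubSketch:
    """Toy stand-in for the server's PubSub: awaits each async subscriber."""

    def __init__(self):
        self.subscribers = []

    async def publish(self, event):
        for subscriber in self.subscribers:
            await subscriber(event)


def make_token_callback(pub_sub, loop):
    """Build a sync callback that is safe to call from a non-event-loop thread."""
    pending = []  # concurrent.futures.Future objects, kept so callers can wait on them

    def on_token(chunk):
        for choice in getattr(chunk, "choices", None) or []:
            delta = getattr(choice, "delta", None)
            content = getattr(delta, "content", None)
            reasoning = getattr(delta, "reasoning_content", None)
            if not content and not reasoning:
                continue  # skip chunks that carry no text
            event = StreamingDeltaEvent(content=content, reasoning_content=reasoning)
            # Hand the coroutine to the server's loop; nothing is persisted here.
            pending.append(asyncio.run_coroutine_threadsafe(pub_sub.publish(event), loop))

    return on_token, pending


def fake_chunk(content=None, reasoning_content=None):
    # Assumed chunk layout; a real ModelResponseStream has more fields.
    delta = SimpleNamespace(content=content, reasoning_content=reasoning_content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


async def demo():
    received = []
    pub_sub = PubSubSketch()

    async def subscriber(event):
        received.append(event)

    pub_sub.subscribers.append(subscriber)
    on_token, pending = make_token_callback(pub_sub, asyncio.get_running_loop())

    def llm_worker():
        chunks = [fake_chunk(reasoning_content="thinking"), fake_chunk("Hel"),
                  fake_chunk(), fake_chunk("lo")]
        for chunk in chunks:
            on_token(chunk)

    # The "LLM" runs in a worker thread, as a blocking SDK call would.
    await asyncio.to_thread(llm_worker)
    # Wait for every scheduled publish before reading the results.
    await asyncio.gather(*(asyncio.wrap_future(f) for f in pending))
    return received
```

Calling asyncio.run(demo()) delivers one event per non-empty chunk; the empty chunk is dropped before anything is scheduled. run_coroutine_threadsafe is the relevant choice because the callback fires on a thread that has no running event loop, so it cannot await the publish directly.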

Why this matters

The SDK already supports token-level streaming in standalone mode (token_callbacks in Conversation), but this capability was not exposed through the agent server. Without it, WebSocket clients — TUIs, CLIs, web UIs — must wait for the entire LLM response to complete before displaying anything. This change closes that gap: clients now receive text as it's generated, enabling responsive, typewriter-style output that makes the agent feel interactive rather than batch-oriented.

Closes #2735

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the GitHub CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---------|---------------|------------|-------------|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22-slim | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:e3d5c3a-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-e3d5c3a-python \
  ghcr.io/openhands/agent-server:e3d5c3a-python

All tags pushed for this build

ghcr.io/openhands/agent-server:e3d5c3a-golang-amd64
ghcr.io/openhands/agent-server:e3d5c3a-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:e3d5c3a-golang-arm64
ghcr.io/openhands/agent-server:e3d5c3a-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:e3d5c3a-java-amd64
ghcr.io/openhands/agent-server:e3d5c3a-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:e3d5c3a-java-arm64
ghcr.io/openhands/agent-server:e3d5c3a-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:e3d5c3a-python-amd64
ghcr.io/openhands/agent-server:e3d5c3a-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:e3d5c3a-python-arm64
ghcr.io/openhands/agent-server:e3d5c3a-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:e3d5c3a-golang
ghcr.io/openhands/agent-server:e3d5c3a-java
ghcr.io/openhands/agent-server:e3d5c3a-python

About Multi-Architecture Support

  • Each variant tag (e.g., e3d5c3a-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., e3d5c3a-python-amd64) are also available if needed

github-actions bot commented Apr 8, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log

@VascoSch92 VascoSch92 requested a review from all-hands-bot April 8, 2026 05:43

github-actions bot commented Apr 8, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

github-actions bot commented Apr 8, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|------|-------|------|-------|---------|
| openhands-agent-server/openhands/agent_server | | | | |
| event_service.py | 352 | 90 | 74% | 57–58, 76–78, 87–91, 94–97, 117, 221, 238, 279, 289, 313–314, 318, 326, 329, 369–370, 386, 388, 392–394, 398, 407–408, 410, 414, 420, 422, 452–457, 488, 492, 624, 626–627, 631, 645–647, 649, 653–656, 660–663, 671–674, 693–694, 696–703, 705–706, 715–716, 718–719, 726–727, 729–730, 734, 740, 749–750 |
| TOTAL | 22613 | 5568 | 75% | |

@all-hands-bot all-hands-bot left a comment

🟢 Good taste - Elegant solution that eliminates special cases

KEY INSIGHT: This properly reuses existing PubSub infrastructure without introducing complexity. The streaming events flow through the same path as regular events, and the opt-in nature (stream: true) means zero risk to existing deployments.

VERDICT: ✅ Worth merging - Core design is sound, addresses a real UX problem (responsive feedback vs. batch-oriented waiting).

@VascoSch92 VascoSch92 marked this pull request as ready for review April 8, 2026 05:52
@VascoSch92 VascoSch92 requested review from rbren and tofarr April 8, 2026 05:52

@all-hands-bot all-hands-bot left a comment

🟢 Good taste - Elegant solution that reuses existing infrastructure

VERDICT: ✅ Worth merging - Core design is sound, addresses a real UX problem

KEY INSIGHT: This properly reuses existing PubSub infrastructure without introducing complexity. The streaming events flow through the same path as regular events, and the opt-in nature (stream: true) means zero risk to existing deployments.

⚠️ Note for maintainers: Per repo guidelines, I'm flagging this as potentially needing lightweight eval verification since it changes how LLM output is delivered (though agent decision-making is unchanged). If you want to be extra cautious, consider a quick benchmark run to confirm streaming delivery doesn't affect agent performance.
