Call ensure_thread_subscription in open_existing_thread_sync #53348
Closed
lukemarsden wants to merge 163 commits into zed-industries:main from
Conversation
Ports all Helix-specific changes from the helixml/zed fork onto a clean upstream Zed checkout, dropping changes that upstream now covers natively:

## Ported changes:
- External WebSocket sync crate (Zed <-> Helix bidirectional sync)
- Enterprise TLS skip (ZED_HTTP_INSECURE_TLS env var)
- UI/branding: hide sign-in, remove SaaS agent upsells, hide onboarding
- Title bar: Helix WebSocket connection status indicator
- Qwen Code: shell output formatting (markdown tables)
- --allow-multiple-instances CLI flag
- show_onboarding and auto_open_panel settings

## Dropped (upstream now covers):
- acp_runtime local fork (upstream uses agent-client-protocol 0.9.4)
- ACP session listing/resume (upstream has list_sessions/resume_session)
- Anthropic token counting API rewrite
- Claude Opus 4.5/4.6 model support (upstream has all models)

## Stats:
- 725 lines modified in 17 upstream files (down from 18,700 in 84 files)
- external_websocket_sync crate: ~5,500 lines (new addition)
- All changes behind cfg feature gates where possible

Co-Authored-By: Claude <noreply@anthropic.com>
- PROTOCOL_SPEC.md: Complete bidirectional WebSocket protocol documentation
covering Zed→Helix (SyncMessage) and Helix→Zed (ExternalAgentCommand)
message formats, readiness protocol, and authentication
- mock_helix_server.rs: Full mock Helix WebSocket server for testing with:
  - Auth token validation
  - Readiness protocol (queue until agent_ready)
  - Multiple concurrent sessions
  - Event recording and assertion helpers
- 33 comprehensive tests covering protocol serialization,
server lifecycle, command sending, and full protocol flow
- Fix upstream workspace test-support compilation issue
(RemoteConnectionOptions::Mock variant not covered)
- Mark 2 legacy env-dependent protocol tests as #[ignore]
Test results: 37 passed, 0 failed, 2 ignored
Co-Authored-By: Claude <noreply@anthropic.com>
- Dockerfile (Ubuntu 25.10): Builds Zed with WebSocket sync feature, includes xvfb, software Vulkan (lavapipe), Claude Code agent
- run_e2e.sh: Orchestrates full E2E test flow:
  1. Starts mock Helix WebSocket server (Python)
  2. Starts Zed with WebSocket sync pointing at mock server
  3. Mock server sends chat_message after agent_ready
  4. Validates thread_created, message_added, message_completed events
  5. Checks assistant response content

Run with: docker build -t zed-ws-e2e -f e2e-test/Dockerfile ../.. && docker run --rm -e ANTHROPIC_API_KEY=... zed-ws-e2e

Co-Authored-By: Claude <noreply@anthropic.com>
- Add .dockerignore to exclude target/ (15GB+) and .git/ from build context
- Use BuildKit cache mounts matching Helix's Dockerfile.zed-build pattern
- Fix runtime deps: add libasound2-dev, libxkbcommon-x11-0, correct VK ICD path
- Add D-Bus session daemon (required by Zed GPU init notification path)
- Set ZED_ALLOW_ROOT, ZED_ALLOW_EMULATED_GPU for container environment
- Configure Zed via env vars (ZED_EXTERNAL_SYNC_ENABLED, ZED_HELIX_URL)
- Write settings.json with agent.default_model for native agent
- Fix mock server: only send chat_message once (prevent feedback loop)
- Fix WebSocket path matching for query string in websockets 15.x

Zed starts and renders headlessly, and the WebSocket protocol flow works end-to-end. Remaining: model selection race condition (auth is async).

Co-Authored-By: Claude <noreply@anthropic.com>
- Pre-authenticate language model providers in thread_service before creating the NativeAgent connection, ensuring the model list is populated
- Explicitly select the default model from settings in the registry
- Add agent_settings dependency for reading model configuration
- Fix model ID in E2E test: use claude-sonnet-4-5-latest (not -4-5)
- Add language_models.anthropic.api_url to E2E settings.json
- Add ca-certificates to Dockerfile runtime stage for TLS

Note: E2E test model auth works, but event forwarding (message_added, message_completed) from ThreadView needs debugging — events aren't reaching the WebSocket mock server in the new fork.

Co-Authored-By: Claude <noreply@anthropic.com>
The thread display notification handler was a no-op placeholder. Threads created by thread_service (via WebSocket chat_message) were never subscribed to for AcpThread events, so Stopped/MessageAdded/MessageCompleted events never reached the WebSocket.

Fix: Subscribe to AcpThread events directly from the AgentPanel when a thread display notification arrives. On Stopped, collect the assistant response content and send MessageAdded + MessageCompleted via WebSocket.

E2E test now passes on both old and new forks with the real Anthropic API.

Co-Authored-By: Claude <noreply@anthropic.com>
Three-phase test covering what Helix spectasks need - a single session spanning multiple Zed threads:

Phase 1: Basic thread creation (Helix sends chat_message, no thread ID)
Phase 2: Follow-up on existing thread (same acp_thread_id)
Phase 3: Thread transition (new chat_message without ID = new thread)

Validates:
- 2 unique threads created (phases 1 and 3)
- Phase 2 follow-up uses the same thread (no thread_created event)
- All 3 completions arrive with correct thread/request IDs
- Assistant responses are correct (4, 6, 20)

Co-Authored-By: Claude <noreply@anthropic.com>
Only ignore target/, target-ubuntu25/, and .git/ — the previous extensive ignore list was inherited from upstream and excluded files that aren't present or relevant in our Docker builds. Co-Authored-By: Claude <noreply@anthropic.com>
Cargo.lock was out of sync — cargo needed to add the agent_settings crate dependency, causing --locked builds to fail in CI. Co-Authored-By: Claude <noreply@anthropic.com>
Two fixes for the rebased Zed fork:

1. ThreadDisplayNotification handler now calls open_thread() to create a proper AcpServerView, restoring the old fork's behavior of:
   - Auto-opening the agent panel on the active thread
   - Live UI updates as the thread progresses
   - Full WebSocket event forwarding (NewEntry, EntryUpdated, Stopped)

   The previous handler only subscribed to Stopped events without displaying the thread, so users saw nothing in the agent sidebar.

2. Set trust_all_worktrees: true in default settings to disable restricted mode, which was blocking MCP servers and project settings from loading automatically.

Co-Authored-By: Claude <noreply@anthropic.com>
The previous open_thread() approach failed because it creates a new ACP connection that can't find threads created by thread_service (different NativeAgent instance). This is the same problem the old fork solved with AcpThreadView::from_existing_thread().

New approach for the rebased fork architecture:
- Add AcpServerView::from_existing_thread() that wraps an existing Entity<AcpThread> directly, bypassing the connection/loading path
- Creates EntryViewState, syncs existing entries, subscribes to thread events via handle_thread_event for both UI rendering and WebSocket event forwarding (NewEntry, EntryUpdated, Stopped, TitleUpdated, etc.)
- Add HeadlessConnection no-op AgentConnection for the ConnectedServerState
- ThreadDisplayNotification handler now creates an AcpServerView via from_existing_thread and sets it as the active view

Co-Authored-By: Claude <noreply@anthropic.com>
Add a query_ui_state WebSocket command that lets the E2E test verify the agent panel actually displays threads with live updates. This catches regressions in from_existing_thread/ThreadDisplayNotification that the protocol-only tests miss (WebSocket events flow from thread_service independently of the UI layer).

Changes:
- types.rs: UiStateResponse variant on SyncEvent
- external_websocket_sync.rs: callback channel for UI state queries
- websocket_sync.rs: query_ui_state command handler
- agent_panel.rs: responds with active_view, thread_id, entry_count
- run_e2e.sh: mock server queries UI state after each phase

Co-Authored-By: Claude <noreply@anthropic.com>
…tion

Root cause: The rebased Zed fork's NativeAgentConnection bridge emits AcpThreadEvent from cx.spawn() without a window context. GPUI's subscribe_in requires window context to deliver events, so the AcpServerView subscription silently dropped all assistant entry events. The old fork worked because AcpThreadView was the direct panel item (a different subscription ownership pattern).

Fix: Send message_added and message_completed WebSocket events directly from thread_service.rs after the send task completes, for both new threads and follow-up messages. This bypasses the GPUI subscription issue entirely and is more reliable since thread_service has direct access to the AcpThread entity.

Also updated the from_existing_thread comment to document why subscribe_in doesn't work for bridge events and that WebSocket forwarding is handled by thread_service.

Co-Authored-By: Claude <noreply@anthropic.com>
Two fixes for the headless thread display:

1. Channel-based event forwarding: Use windowless cx.subscribe to catch AcpThreadEvent from the NativeAgentConnection bridge (which emits from cx.spawn without window context), forward through a tokio mpsc channel, and process in a cx.spawn_in task that has window context for entry syncing. subscribe_in silently dropped these events.

2. Focus panel ordering: Move workspace.focus_panel() AFTER set_active_view to prevent set_active() from creating a default thread when the view is Uninitialized. The default thread's session_id was being returned by UI state queries instead of the correct headless thread.

E2E test now passes all protocol + UI state checks across all 3 phases.

Co-Authored-By: Claude <noreply@anthropic.com>
- Add cx.subscribe() for streaming message_added events during the LLM response (previously only sent at completion). The subscription fires for bridge events from cx.spawn() since it's windowless.
- Set NativeAgentSessionList on AcpThreadHistory in the ThreadDisplayNotification handler so headless threads appear in "View All" history.
- Make NativeAgentSessionList::new() public for cross-crate access.
- Dismiss OnboardingUpsell in the ThreadDisplayNotification handler to prevent "Welcome to Zed AI" from showing on external thread display.
- Add streaming validation to the E2E test: verifies message_added events arrive BEFORE message_completed.

Co-Authored-By: Claude <noreply@anthropic.com>
Wrap the hardcoded Claude Code, Codex CLI, and Gemini CLI menu items in the agent panel's "New Thread" dropdown with #[cfg(not(feature = "external_websocket_sync"))]. In Helix builds, these agents are managed externally via the platform, so they should not appear as selectable options in the Zed UI. Custom agents (like Qwen Code) still appear via the dynamic agent list. Co-Authored-By: Claude <noreply@anthropic.com>
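The gating described above relies on Cargo feature flags evaluated at compile time. A minimal, self-contained sketch of the pattern (the menu structure and function name here are illustrative, not the actual Zed code; only the `external_websocket_sync` feature name comes from the commit):

```rust
// Sketch of cfg-based feature gating for menu entries. When built with
// --features external_websocket_sync, the gated pushes are compiled out,
// so Helix builds never show the hardcoded first-party agents.
fn new_thread_menu_items() -> Vec<&'static str> {
    let mut items = vec!["New Thread"];

    // Compiled out entirely in Helix builds.
    #[cfg(not(feature = "external_websocket_sync"))]
    {
        items.push("Claude Code");
        items.push("Codex CLI");
        items.push("Gemini CLI");
    }

    items
}

fn main() {
    let items = new_thread_menu_items();
    // Built without the feature (e.g. plain `rustc`), all entries are present.
    assert!(items.contains(&"New Thread"));
    println!("{items:?}");
}
```

Because the exclusion happens at compile time rather than at runtime, the gated code contributes nothing to the Helix binary.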
Enable AcpBetaFeatureFlag for all users by overriding enabled_for_all() to return true. This enables the session/list, session/load, and session/resume ACP capabilities which are needed for Helix's external agent session management (Qwen Code session persistence across Zed restarts). Without this, these features are gated behind staff-only access in release builds, preventing session resume functionality. Co-Authored-By: Claude <noreply@anthropic.com>
Phase 4 tests sending a chat_message back to Thread A after Phase 3 switched display to Thread B. This catches the 'entity released' regression where sending to a non-visible thread fails. Fix: when a follow-up message targets a registered thread, notify AgentPanel to display it (via ThreadDisplayNotification) before sending the message. Also send ThreadLoadError back to Helix if the follow-up fails, so the error is visible to the user. Co-Authored-By: Claude <noreply@anthropic.com>
When load_session was called via WebSocket's load_thread_from_agent, the Rc<NativeAgentConnection> was consumed, dropping the only strong reference to Entity<NativeAgent>. The spawned task inside open_thread captures a WeakEntity, which then fails with "entity released" when trying to update it. This fix was originally in commit bc721cd on the old fork but was lost during the rebase to upstream Zed. Re-applying it: clone the Entity before spawning the async task so it stays alive until open_thread completes. Co-Authored-By: Claude <noreply@anthropic.com>
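The failure mode above can be reproduced with plain `Rc`/`Weak`, standing in for GPUI's `Entity`/`WeakEntity` (the real types and the `NativeAgent` internals differ; this is a sketch of the ownership bug, not the actual fix code):

```rust
use std::rc::{Rc, Weak};

// Stand-in for the agent entity whose only strong reference was dropped.
struct NativeAgent {
    name: String,
}

fn main() {
    // Bug: the only strong reference is consumed before the async task runs,
    // so the weak upgrade fails — the "entity released" error.
    let agent: Rc<NativeAgent> = Rc::new(NativeAgent { name: "native".into() });
    let weak: Weak<NativeAgent> = Rc::downgrade(&agent);
    drop(agent);
    assert!(weak.upgrade().is_none());

    // Fix: clone the strong handle and move the clone into the task, so the
    // entity stays alive until the work completes.
    let agent = Rc::new(NativeAgent { name: "native".into() });
    let weak = Rc::downgrade(&agent);
    let keep_alive = Rc::clone(&agent); // would be moved into the spawned task
    drop(agent); // the caller's handle may still go away...
    let upgraded = weak.upgrade(); // ...but the clone keeps the entity alive
    assert!(upgraded.is_some());
    assert_eq!(upgraded.unwrap().name, "native");
    drop(keep_alive);
}
```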
Add content_only() method to AssistantMessage that returns just the content without the "## Assistant\n\n" heading wrapper. Use it in all WebSocket sync paths in thread_service.rs so messages sent to Helix don't include the heading. This fix was on the old fork but was lost during the rebase. Co-Authored-By: Claude <noreply@anthropic.com>
WebSocket events (MessageAdded, MessageCompleted) were being sent from both thread_service.rs AND thread_view.rs, causing duplicate messages in the Helix Sessions chat. The thread_service.rs handlers are the canonical source (added in cc037db). Remove the duplicate sends from thread_view.rs, keeping only UI-specific events (UserCreatedThread, ThreadTitleChanged). Also removes the now-unused last_tool_call_status field that was only used for boundary-based WebSocket updates in thread_view.rs. Co-Authored-By: Claude <noreply@anthropic.com>
Comprehensive documentation of all Helix-specific changes, critical fixes that must be preserved during rebases, and rebase checklist. Co-Authored-By: Claude <noreply@anthropic.com>
The streaming EntryUpdated events already deliver complete content with cumulative updates using a consistent message_id. The final summary message_added used message_id="response" which the API treated as a new distinct message, causing it to append the entire response again. Co-Authored-By: Claude <noreply@anthropic.com>
Add a new WebSocket command type `simulate_user_input` that injects a message into a thread WITHOUT marking it as external-originated. This allows the NewEntry subscription to fire, syncing the user message back to Helix — testing the Zed → Helix direction, which previously had zero E2E coverage.

Changes:
- Add `simulate_input` field to ThreadCreationRequest
- Add `simulate_user_input` handler in websocket_sync.rs
- Conditionally skip mark_external_originated_entry when simulate_input=true
- Add Phase 5 to E2E test verifying bidirectional sync

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The NewEntry subscriptions in create_new_thread_sync() and handle_follow_up_message() were temporary — dropped after each message exchange completed. This meant that when a user typed directly in Zed's agent panel after an initial exchange, no subscription was listening and the message never synced back to Helix.

Fix: use .detach() instead of dropping subscriptions so they live as long as the thread entity. Track which threads have persistent subscriptions to avoid duplicates when follow-up messages arrive.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
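The duplicate-subscription tracking mentioned above can be sketched with a set of thread IDs. This is a minimal sketch under assumed names (`SyncState`, `ensure_thread_subscription`); the real code registers detached GPUI subscriptions rather than returning a bool:

```rust
use std::collections::HashSet;

// Tracks which threads already have a persistent (detached) subscription,
// so follow-up messages don't subscribe twice and duplicate events.
struct SyncState {
    subscribed_threads: HashSet<String>,
}

impl SyncState {
    // Returns true only when a new subscription was actually created.
    fn ensure_thread_subscription(&mut self, thread_id: &str) -> bool {
        if !self.subscribed_threads.insert(thread_id.to_string()) {
            return false; // already subscribed; avoid duplicate events
        }
        // ...the real code would cx.subscribe(...).detach() here...
        true
    }
}

fn main() {
    let mut state = SyncState { subscribed_threads: HashSet::new() };
    assert!(state.ensure_thread_subscription("thread-a")); // first call subscribes
    assert!(!state.ensure_thread_subscription("thread-a")); // follow-up is a no-op
    assert!(state.ensure_thread_subscription("thread-b")); // distinct thread subscribes
}
```

`HashSet::insert` returning `false` for an existing key gives the check-and-record step in one call, so there is no window between checking and marking.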
Move Claude Code menu item outside the cfg(external_websocket_sync) gate so it appears in Helix builds. Codex and Gemini remain hidden. Claude Code works in Helix via the Claude subscription provider, which syncs credentials into the container for Claude Code ACP to use. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ude Code

Remove the session_modes.is_none() guard that prevented always_allow_tool_actions from working with external agents like Claude Code. The setting now auto-approves permission prompts for all agents.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The m_2025_11_25 migration removes "source" from context_servers at the JSON value level, but update_value_in_json_text can't remove keys from JSON text. This causes the deprecation banner to show forever since the migration detects changes but produces identical output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Change should_be_following from false to true in AcpThreadView::new so the editor automatically follows agent activity by default. Spec-Ref: helix-specs@3e1016f73:001315_in-zed-lets-default-to
Adds a new setting `remote.suggest_dev_container` (default: true) that controls whether a notification is shown suggesting to open the project in a dev container when a .devcontainer directory is detected. When set to false, the notification is suppressed. This is useful for platforms like Helix that run Zed inside their own managed containers.

Changes:
- Added suggest_dev_container field to RemoteSettingsContent schema
- Added check in suggest_on_worktree_updated() to respect the setting
- Added suggest_dev_container to RemoteSettings struct
- Added default value in default.json

Spec-Ref: helix-specs@de1a8873b:001318_in-zed-do-not-offer-to
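Given the setting path above, a user-level settings.json entry to suppress the notification would plausibly look like this (the nesting under a `remote` object is inferred from the `remote.suggest_dev_container` path and the RemoteSettingsContent schema mentioned in the commit, so treat it as a sketch):

```json
{
  "remote": {
    "suggest_dev_container": false
  }
}
```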
Add request_id to message_added events
Two fixes to prevent response_entries leaking across interactions:

1. Capture request_id when the turn starts (first assistant entry), and use it for the Stopped flush and message_completed instead of the global THREAD_REQUEST_MAP. This prevents a race where a follow-up or interrupt message updates the global request_id before the old turn's Stopped handler runs, causing old entries to be tagged with the new request_id.

2. Only flush entries from the current turn (after the last UserMessage), not the entire thread history. Previous turns' entries are already persisted and don't need re-sending.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
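The turn-scoping in point 2 boils down to "everything after the most recent user message". A self-contained sketch, with simplified stand-ins for the real AcpThread entry variants (names here are illustrative):

```rust
// Simplified stand-ins for the thread's entry types.
#[derive(Debug, PartialEq)]
enum Entry {
    UserMessage(&'static str),
    AssistantMessage(&'static str),
    ToolCall(&'static str),
}

// Return the slice of entries produced by the current turn, i.e. everything
// after the most recent UserMessage. Earlier turns are already persisted and
// must not be re-sent (that was the leak).
fn current_turn_entries(entries: &[Entry]) -> &[Entry] {
    let start = entries
        .iter()
        .rposition(|e| matches!(e, Entry::UserMessage(_)))
        .map(|i| i + 1)
        .unwrap_or(0);
    &entries[start..]
}

fn main() {
    let history = vec![
        Entry::UserMessage("first question"),
        Entry::AssistantMessage("first answer"),
        Entry::UserMessage("follow-up"),
        Entry::ToolCall("read_file"),
        Entry::AssistantMessage("second answer"),
    ];
    let turn = current_turn_entries(&history);
    // Only the two entries after the last UserMessage get flushed.
    assert_eq!(turn.len(), 2);
    assert_eq!(turn[0], Entry::ToolCall("read_file"));
}
```

Scanning from the end with `rposition` keeps the flush O(n) in the thread length and needs no extra bookkeeping about turn boundaries.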
…uest-id Fix Stopped flush: turn-scoped request_id, only flush current turn
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Spec-Ref: helix-specs@e70794aff:001547_when-using-the-claude
When using the Claude agent that you can launch from insi...
…cated streaming

When tool calls arrived during streaming, the preceding text appeared truncated because the NewEntry handler sent the tool_call entry without flushing the text entry's pending throttled content first. The stale-pending flush only existed in throttled_send_message_added, which NewEntry bypasses.

Changes:
- Extract flush_stale_pending_for_thread() helper from the inline stale-pending flush
- Call it in the NewEntry handler before sending, ensuring text content is complete
- Bypass the 100ms throttle for tool_call entries (infrequent, need prompt delivery)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spec-Ref: helix-specs@df2833ab1:001694_during-streaming-i-still
… stale pending

The previous fix (flush_stale_pending_for_thread) was insufficient because the throttle buffer's pending_content is a snapshot from the last EntryUpdated event. When AcpThread::push_entry() calls flush_streaming_text() before emitting NewEntry, the final tokens get flushed into the Markdown entity but no EntryUpdated is emitted — so the pending snapshot is stale.

Fix: in the NewEntry handler, re-read ALL preceding entries' current content directly from the thread model (which has the complete text after flush_streaming_text()) and send fresh message_added events for each.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spec-Ref: helix-specs@393a7da43:001694_during-streaming-i-still
Upstream PR zed-industries#51499 added a StreamingTextBuffer that rate-limits how many bytes get revealed into the Markdown entity per tick, creating a smooth typewriter animation. However, content_only(cx) reads markdown.source(), which only returns revealed bytes. This means EntryUpdated -> WebSocket sync sends incomplete content, causing the Go accumulator's baseline to drift behind reality. When patches are computed from this stale baseline, the frontend sees backwards edits that truncate text before tool calls.

Fix: each 16ms timer tick now drains the entire pending buffer instead of limiting to bytes_to_reveal_per_tick. The timer still batches (avoiding per-character markdown.append calls), but no bytes are withheld. This ensures content_only(cx) always returns all received content.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spec-Ref: helix-specs@b244b04d4:001694_during-streaming-i-still
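The drain-everything-per-tick behavior can be sketched in isolation. Names are illustrative stand-ins (the real StreamingTextBuffer in upstream Zed appends into a Markdown entity, not a String):

```rust
// Minimal model of the streaming reveal buffer after the fix: each tick
// drains ALL pending bytes, so readers of `revealed` never see less than
// what has actually been received.
struct StreamingTextBuffer {
    pending: String,  // bytes received but not yet revealed
    revealed: String, // what content_only()-style readers would see
}

impl StreamingTextBuffer {
    fn push(&mut self, chunk: &str) {
        self.pending.push_str(chunk);
    }

    // Before the fix: revealed at most bytes_to_reveal_per_tick bytes here,
    // leaving content hidden. After: drain everything. The tick still
    // batches appends, avoiding per-character updates.
    fn on_tick(&mut self) {
        self.revealed.push_str(&self.pending);
        self.pending.clear();
    }
}

fn main() {
    let mut buf = StreamingTextBuffer {
        pending: String::new(),
        revealed: String::new(),
    };
    buf.push("The answer is 42, ");
    buf.push("and here is why.");
    buf.on_tick();
    // All received content is visible after a single tick; nothing withheld.
    assert_eq!(buf.revealed, "The answer is 42, and here is why.");
    assert!(buf.pending.is_empty());
}
```

The trade-off named in the commit is preserved here: batching per tick keeps append calls cheap, while dropping the per-tick byte cap removes the lag that made the sync baseline stale.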
…eaks

The NewEntry handler re-sent ALL preceding entries (from index 0) when a new entry arrived, but this included entries from previous turns. The Go server added these old message_ids to the current interaction's response_entries, causing isolation violations in the store.

Apply the same turn-scoping logic already used by the Stopped handler: find the last UserMessage and only re-send entries after it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spec-Ref: helix-specs@b244b04d4:001694_during-streaming-i-still
…-streaming-i-still Spec-Ref: helix-specs@15bac144f:001694_during-streaming-i-still
…-still During streaming, I still see truncated sentences. The se...
The notify_thread_display handler in agent_panel creates a ConversationView
with the correct Agent type (e.g. Custom("claude-acp")), but didn't update
selected_agent_type. This caused serialization to save NativeAgent, and on
restart the restoration check went through the sqlite path (which fails for
ACP agents), preventing thread restoration and creating a rogue thread.
Also adds debug logging for:
- ensure_thread_subscription creation and firing (entity IDs)
- register_thread calls from ConversationView (entity IDs)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three changes to fix the race condition on container restart:

1. Fix selected_agent_type not being updated when notify_thread_display creates a from_existing_thread ConversationView. This caused the panel to serialize as NativeAgent, and on restart the restoration went through the sqlite path (which fails for ACP agents like claude-acp), creating a rogue thread that raced with the server's open_thread.

2. Wait for the WebSocket to connect before panel restoration starts. This ensures the agent_ready → open_thread protocol handshake can complete, and that thread loading is serialized (no parallel loads).

3. Add debug logging for ensure_thread_subscription creation/firing and register_thread calls (with entity IDs) to diagnose subscription issues.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…path

Panel restoration via initial_state bypasses thread_service, so neither agent_ready nor ensure_thread_subscription was called. This meant:

1. The server waited out the 60s timeout before flushing the open_thread queue
2. No WebSocket sync subscription existed for the loaded thread

Now initial_state's completion handler calls ensure_thread_subscription (made public) and send_agent_ready when resuming a thread.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…agent_ready

1. Share the THREAD_LOAD_IN_PROGRESS lock: initial_state now acquires the same lock that open_existing_thread_sync uses, preventing parallel thread loads from panel restoration and the WebSocket open_thread command. Uses a drop guard for automatic cleanup on all exit paths.

2. Use agent.agent_id() instead of string matching for the agent_ready name. The server uses agent_name only for logging/metadata, so the actual value doesn't matter as long as it's consistent.

3. Made ensure_thread_subscription and the lock helpers public so conversation_view can call them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
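The drop-guard pattern in point 1 releases the lock on every exit path, including early returns and panics. A self-contained sketch under assumed names (the real lock and its helpers live in thread_service and are shared across async contexts; only THREAD_LOAD_IN_PROGRESS is named in the commit):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Shared "a thread load is in flight" flag, guarding against parallel loads
// from panel restoration and the WebSocket open_thread command.
static THREAD_LOAD_IN_PROGRESS: AtomicBool = AtomicBool::new(false);

struct LoadGuard;

impl LoadGuard {
    // Acquire the lock, or return None if another load is already running.
    fn try_acquire() -> Option<LoadGuard> {
        // compare_exchange succeeds only if we flipped false -> true.
        THREAD_LOAD_IN_PROGRESS
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| LoadGuard)
    }
}

impl Drop for LoadGuard {
    // Runs on every exit path — normal return, early return, or unwind —
    // so the lock can never be leaked.
    fn drop(&mut self) {
        THREAD_LOAD_IN_PROGRESS.store(false, Ordering::Release);
    }
}

fn main() {
    {
        let guard = LoadGuard::try_acquire();
        assert!(guard.is_some());
        // A second loader (e.g. the WebSocket open_thread handler) is
        // refused while the first load is in flight.
        assert!(LoadGuard::try_acquire().is_none());
    } // guard dropped here; lock released automatically on scope exit

    assert!(LoadGuard::try_acquire().is_some());
}
```

Tying the release to `Drop` is what makes "automatic cleanup on all exit paths" hold without any explicit unlock calls at each return site.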
If the panel was closed or has no last_active_thread, nobody was sending agent_ready. The server would wait 60s before timing out and flushing the readiness queue. Now we send agent_ready immediately after the WebSocket wait completes when there's no thread to restore. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After a Zed restart, open_existing_thread_sync loads the thread and sends agent_ready but never called ensure_thread_subscription. Without it, the thread's NewEntry / EntryUpdated / Stopped events had no listener, so message_added was never emitted back to Helix and the interaction stayed stuck in waiting forever. Mirrors the call already present in create_new_thread_sync (~line 1218). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
We require contributors to sign our Contributor License Agreement, and we don't have @lukemarsden on file. You can sign our CLA at https://zed.dev/cla. Once you've signed, post a comment here that says '@cla-bot check'.

This diff has a lot of unnecessary changes in it that will need to be cleaned up before someone on the team can review it.
Problem

After a Zed restart, when Helix sends `open_thread` for an existing thread followed by `chat_message`, the interaction stays stuck in `waiting` state forever. Zed makes LLM calls and does work but never emits `message_added` events back to Helix.

Root Cause

`open_existing_thread_sync` loaded the thread and sent `agent_ready` but never called `ensure_thread_subscription`. Without that call the thread's `NewEntry`/`EntryUpdated`/`Stopped` events have no listener, so no `message_added` events are ever emitted back to Helix. `create_new_thread_sync` always called `ensure_thread_subscription` (correctly). The slow path in `open_existing_thread_sync` — used when the thread is not in the in-memory registry after a restart — was missing it.

Why Phase 12 E2E Test Didn't Catch This

Phase 12 tests the restart-reconnect flow but was passing via a different code path: after `f3a2622736` (2026-04-06), Zed's panel restoration reruns on restart and restores spec task threads, calling `ensure_thread_subscription` via that path. The `open_thread` command then hits the fast path (`get_thread()` returns Some, already subscribed). Production spec-task sessions don't go through panel restoration, so they always hit the slow path.

Fix

Add the missing `ensure_thread_subscription` call in `open_existing_thread_sync` before `send_agent_ready`, mirroring `create_new_thread_sync`.

Test plan

`E2E_AGENTS="zed-agent,claude" ./run_docker_e2e.sh`

🤖 Generated with Claude Code