Improve reconnection reliability and surface disconnect events in chat #91

Open
zmanian wants to merge 1 commit into mcintyre94:main from zmanian:reconnect-reliability-clean

zmanian (Contributor) commented Mar 15, 2026

Summary

  • Server-side keepalive: Pass max_run_after_disconnect=300 on service creation so Claude keeps running for 5 minutes even if the iOS client disconnects (phone lock, background, network blip)
  • Fewer false timeouts: Increase idle timeout from 30s to 45s — requires 2+ missed heartbeats (every 20s) instead of just 1
  • More post-stop retries: Increase from 1 to 3 retries with exponential backoff (1s, 2s, 4s)
  • Exponential backoff on reconnect polling: Starts at 2s, doubles up to 30s max, resets when new events arrive
  • WebSocket ping: ExecSession sends ping every 15s to detect dead connections early instead of waiting for TCP timeout
  • Disconnect visibility in chat: New .systemNotice content type renders inline capsule banners showing "Connection lost — reconnecting...", "Reconnected — response complete", or "Reconnection gave up — response may be incomplete"
  • Claude disconnect awareness: When resuming after a failed reconnect, the next prompt is prefixed with context telling Claude its previous output may be incomplete
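
The timing rules in the summary can be sketched in a few lines. This is only an illustration under assumptions: `retryDelays`, `PollingBackoff`, and `heartbeatsMissedBeforeTimeout` are hypothetical names that do not come from the PR, and the actual implementation may be structured quite differently.

```swift
import Foundation

/// Hypothetical helper: delays for the post-stop retries.
/// With a 1s base and 3 retries this yields 1s, 2s, 4s.
func retryDelays(base: TimeInterval = 1, retries: Int = 3) -> [TimeInterval] {
    (0..<retries).map { base * pow(2, Double($0)) }
}

/// Hypothetical reconnect-polling backoff: starts at 2s, doubles up to
/// a 30s cap, and resets when new events arrive.
struct PollingBackoff {
    let initial: TimeInterval
    let maximum: TimeInterval
    private(set) var current: TimeInterval

    init(initial: TimeInterval = 2, maximum: TimeInterval = 30) {
        self.initial = initial
        self.maximum = maximum
        self.current = initial
    }

    /// Returns the delay to wait now, then doubles the stored delay (capped).
    mutating func next() -> TimeInterval {
        let delay = current
        current = min(current * 2, maximum)
        return delay
    }

    /// Call when new events arrive so polling becomes responsive again.
    mutating func reset() {
        current = initial
    }
}

/// Number of whole heartbeat intervals that fit inside the idle timeout,
/// i.e. how many heartbeats must be missed before the timeout fires.
func heartbeatsMissedBeforeTimeout(idleTimeout: TimeInterval,
                                   heartbeatInterval: TimeInterval) -> Int {
    Int(idleTimeout / heartbeatInterval)
}
```

With 20s heartbeats, a 30s idle timeout fires after a single missed heartbeat, while 45s needs two, which is the reasoning behind the timeout change.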

Test plan

  • Build succeeds
  • All 267 tests pass
  • Updated existing reconnect retry test to expect 4 attempts (was 2)
  • Updated content count assertions in replay/reconnect tests to account for new system notices
  • Manual: background app during long Claude response, verify "Connection lost" notice appears and reconnect recovers
  • Manual: verify system notices persist across app restart (reload chat from SwiftData)
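
To show why the content count assertions change and what "persist across app restart" implies, here is a minimal sketch. It assumes a hypothetical `ChatContent` enum and uses a JSON round trip as a stand-in for persistence; the app itself stores chats in SwiftData, and its real model types are not shown in this PR description.

```swift
import Foundation

/// Hypothetical stand-in for the app's chat content model.
/// Only the `.systemNotice` case name is taken from the PR description.
enum ChatContent: Codable, Equatable {
    case text(String)
    case systemNotice(String)
}

/// Simulates saving and reloading chat history (JSON here, not SwiftData).
func roundTrip(_ items: [ChatContent]) throws -> [ChatContent] {
    let data = try JSONEncoder().encode(items)
    return try JSONDecoder().decode([ChatContent].self, from: data)
}

/// A reconnect that succeeds now adds two notices next to the response,
/// so a test that used to expect 1 content item would expect 3.
let history: [ChatContent] = [
    .text("Partial response"),
    .systemNotice("Connection lost — reconnecting..."),
    .systemNotice("Reconnected — response complete"),
]
```

Because notices are ordinary content items in this sketch, they are stored and reloaded through the same path as message text.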

🤖 Generated with Claude Code

mcintyre94 (Owner) commented

I'm hoping that #74 fixes a lot of the stability here in a different way. We now persist our long streaming connections instead of killing them when you leave the chat, and reconnect to them all when the app resumes from background. Hoping this is going to make it a lot more robust!

- Pass max_run_after_disconnect=300 on service creation so Claude keeps
  running for 5 minutes even if the client disconnects
- Increase idle timeout from 30s to 45s to avoid false timeouts during
  Claude thinking pauses (requires 2+ missed heartbeats)
- Increase post-stop retries from 1 to 3 with exponential backoff (1s, 2s, 4s)
- Add exponential backoff to reconnect polling (2s to 30s, resets on new events)
- Add WebSocket ping every 15s in ExecSession to detect dead connections
- Add .systemNotice content type to show disconnect/reconnect events inline
  in chat history (persisted across sessions)
- Prepend disconnect context to next --resume prompt so Claude knows its
  previous output may be incomplete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
zmanian force-pushed the reconnect-reliability-clean branch from 7a7842e to f70d26d on March 15, 2026 19:53
zmanian (Contributor, Author) commented Mar 16, 2026

So I think one thing you might want to extract from this is Claude's awareness of disconnects and other chat issues.

I've found that Claude adapts how it works in ways that also help.
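
For reference, the disconnect-awareness piece could be extracted as something like the sketch below. The function name, parameter names, and the wording of the notice are all assumptions for illustration, not the code in this PR.

```swift
/// Hypothetical: builds the prompt sent with the next --resume.
/// If the previous reconnect gave up, prepend context so Claude knows
/// its earlier output may be incomplete. The notice wording is assumed.
func promptForResume(userPrompt: String, previousReconnectFailed: Bool) -> String {
    guard previousReconnectFailed else { return userPrompt }
    let notice = "[Note: the connection dropped during your previous response "
        + "and reconnection failed, so that output may be incomplete.]"
    return notice + "\n\n" + userPrompt
}
```

The prompt passes through unchanged when no reconnect failure occurred, so normal conversations are unaffected.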
