
[Gastown] Design exploration: local-mode (compute on user's machine, data in cloud) #3190

@jrf0110


Summary

Explore shipping a "local gastown" mode where users run agents on their own machine while town SQL state continues to live in the Cloudflare Durable Object. The compute moves; the data plane is unchanged.

The architecture today is already shaped favourably for this: the cloud → container boundary is a single chokepoint (getTownContainerStub(env, townId).fetch(...) in services/gastown/src/dos/town/container-dispatch.ts), and the container itself has zero cloudflare:* imports — it's a Bun + Hono process that reads ~5 env vars at boot and calls back to the worker REST API for everything stateful. The container is pure compute; the data already lives 100% in the cloud.

This issue is exploratory. Not committing to either option, not committing to a timeline. Documenting the design space so we can decide whether to invest, and if so, in what shape.


Why this might be worth doing

  • Enterprise wedge: "your code, your machine, our control plane" — addresses the legitimate concern about source code touching cloud workers.
  • Debuggability: getting a shell into a Cloudflare container during a stuck bead is currently impossible. Local-mode means cd ~/.kilo/gastown/<townId>/<rigId>/... and you can inspect what the polecat is doing.
  • Latency: local repos, local LLM gateway hops, local git operations all get faster.
  • Resource flexibility: a user running on a 32-core M3 Max can throw way more parallel polecats at a town than a 1-CPU Cloudflare container slot.

The four real frictions

The architecture is clean but four things break a strict "drop-in replacement" mental model:

  1. Worker-initiated push. The cloud doesn't just respond to the container; it actively pushes (/refresh-token, /sync-config, dispatch). Any local mode needs a transport where the cloud can reach the laptop on demand. NAT-busting tunnel is mandatory.
  2. WebSocket upgrades. /ws, /agents/:id/stream, /agents/:id/pty/:ptyId/connect all need bidirectional streaming. The transport must preserve Upgrade: websocket end-to-end.
  3. Inbound auth doesn't exist. The control-server has no inbound auth today because Cloudflare's network isolation provides it. Exposing it via a tunnel requires adding inbound auth.
  4. Per-town singleton. idFromName(townId) assumes one container per town. Two laptops trying to run the same town locally need explicit arbitration.

None are showstoppers. All have known-good solutions. But they shape the design.


Option A — kilo gastown serve + WSS tunnel + DO broker

Shape

  • User runs kilo gastown serve and picks a town. CLI authenticates against kilo-pass, opens an outbound WSS to a new TunnelBrokerDO, flips town.runtime to 'local'.
  • CLI starts the existing services/gastown/container/src/ Bun process bound to 127.0.0.1:<rand>.
  • container-dispatch.ts gets one branch: if town.runtime === 'local', route the fetch through the broker DO instead of the container stub. Broker DO frames the fetch as a JSON envelope, sends it over the hibernating WS to the CLI, CLI proxies to localhost, response streams back.
  • The user keeps using the web dashboard at https://gastown.app/town/:id exactly as today. The CLI is just a daemon — it doesn't replace the UI. PTY streams, agent event websockets, log tails all terminate at the dashboard worker, traverse the broker DO's WS to the CLI, and proxy to the local control-server. Closer to how tailscale up or ngrok feel — start it, forget it, use the same product as before.

Tunnel mechanics decision

  • Cloudflare Tunnel (cloudflared): rejected. Adds a Cloudflare Zero Trust dependency that grows with the user base; per-session token provisioning is operationally painful at scale.
  • Plain WS to a dedicated broker DO: recommended. Hibernating WebSockets fit perfectly. Framing protocol is small (~4 message types: request / response / cancel / ping). The DO is a natural single-writer point that matches the "one town in local mode at a time" invariant.
  • Reuse Town.do: rejected. Conflates control-plane state with a hot data path; one bad WS frame stalls scheduling.
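
The ~4-message framing could be sketched as a TypeScript discriminated union plus a JSON codec. All names here are illustrative, not existing code:

```typescript
// Hypothetical wire format for the broker DO <-> CLI tunnel.
// Four frame kinds, matching the request / response / cancel / ping set above.
type TunnelFrame =
  | { kind: "request"; id: string; method: string; path: string; headers: Record<string, string>; body?: string }
  | { kind: "response"; id: string; status: number; headers: Record<string, string>; body?: string }
  | { kind: "cancel"; id: string }
  | { kind: "ping"; at: number };

// Plain JSON envelope codec; a production version would also chunk
// streaming bodies and multiplex WebSocket sub-frames for /ws upgrades.
function encodeFrame(f: TunnelFrame): string {
  return JSON.stringify(f);
}

function decodeFrame(raw: string): TunnelFrame {
  const f = JSON.parse(raw) as TunnelFrame;
  if (!["request", "response", "cancel", "ping"].includes(f.kind)) {
    throw new Error(`unknown frame kind: ${(f as { kind: string }).kind}`);
  }
  return f;
}
```

A request/response pair is correlated by `id`, which is also what `cancel` references when an in-flight fetch is abandoned.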

Auth

  • CLI → worker: reuse kilo-pass session token. POST /tunnel/claim {townId} returns a short-lived tunnelTicket JWT, presented on WS upgrade.
  • Worker → local control-server: existing GASTOWN_CONTAINER_TOKEN pattern survives unchanged. The token is minted by the worker, sent over the tunnel as an env var when the control-server boots, and presented on every callback. Symmetrically, requests into the control-server are framed by the broker DO and never hit the public internet — the localhost port is bound to 127.0.0.1, no spoofing surface.
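
Sketched from the bullets above, the CLI side might assemble the tunnel upgrade like this. Only the `/tunnel/claim` ticket shape comes from this issue; the broker host and URL path are assumptions:

```typescript
// Illustrative: build the WS upgrade request to the broker DO, presenting
// the short-lived tunnelTicket JWT returned by POST /tunnel/claim.
const BROKER_HOST = "gastown.app"; // assumed host, not confirmed by the issue

function tunnelUpgradeRequest(townId: string, ticket: string) {
  return {
    url: `wss://${BROKER_HOST}/tunnel/${encodeURIComponent(townId)}`, // assumed path
    headers: {
      Upgrade: "websocket",
      Authorization: `Bearer ${ticket}`, // ticket presented on WS upgrade
    },
  };
}
```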

Filesystem & UX

  • Workspace lives at ~/.kilo/gastown/<townId>/<rigId>/... — same layout as the container today, just rooted under the user's home. Make it user-visible — cd-able workspaces are a feature, not a bug.
  • CLI is a daemon. Optional --tui flag for status pane (tunnel state, RTT, agent list, log tail). Default is fire-and-forget.

Failure modes

  • Sleep / network loss: WS closes. Broker DO marks the tunnel disconnected with a 90s grace window. After the grace window expires, the town flips back to "needs-dispatch" and beads either re-dispatch to a cloud container or are held, per town policy. The existing scheduler retry path handles in-flight work.
  • CLI restart: workspace persists on disk. Re-claim within grace window → agents resume via the existing bootHydration path.
  • Multi-machine: second connect rejected with 409 already_connected. Show a "take over" affordance.
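
The grace-window behaviour above can be expressed as a small state transition, assuming the broker DO drives it from WS close events and a DO alarm. Names and shapes are illustrative:

```typescript
// Sketch of the broker DO's grace-window decision: after a WS close, hold
// the town for 90s before flipping it back to needs-dispatch.
const GRACE_MS = 90_000;

type TunnelState = "connected" | "grace" | "needs-dispatch";

function nextState(
  state: TunnelState,
  event:
    | { type: "ws-close" }
    | { type: "reclaim" } // CLI reconnects within the window
    | { type: "alarm"; sinceCloseMs: number },
): TunnelState {
  if (event.type === "ws-close") return "grace";
  if (event.type === "reclaim") return "connected"; // agents resume via bootHydration
  // Alarm fired: if grace expired, beads re-dispatch (or hold, per town policy).
  if (state === "grace" && event.sinceCloseMs >= GRACE_MS) return "needs-dispatch";
  return state;
}
```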

Effort estimate

| Workstream | Size |
| --- | --- |
| WSS framing + reconnect + in-flight replay | Large (the only large piece) |
| Broker DO + worker dispatch branch | Medium |
| CLI UX + auth flow | Small |
| Container-side changes | None — Bun process runs unchanged |

Total: ~1 engineer-quarter to a usable beta. No code-signing, no auto-update, no Docker support tickets.

Honest assessment

Smallest possible change that delivers the value. Reuses existing kilo-pass auth, container code unchanged. The CLI is a power-user dogfooding tool; mainstream adoption ceiling is modest. Strongest argument: it builds the WSS tunnel + broker DO, which is the same transport work Option B needs.


Option B — Tauri/Electron desktop app + local Docker

Shape

  • Desktop app (recommend Tauri over Electron: 10MB vs 150MB shell, Rust core sits naturally next to the bridge, better signing story).
  • Bundled "Local Agent Bridge" daemon owns Docker Engine API calls (bollard/dockerode), holds a hibernating WSS to a new TownContainerProxyDO, proxies each fetch envelope to a docker run-spawned container on localhost:<rand>.
  • The container image is identical to the Cloudflare one — pulled from a registry on first run.
  • Webview points at https://gastown.app/town/:id (cloud-hosted, unchanged) with a small JS bridge object exposing native affordances ("open in editor", "show in Finder", "reveal local container logs"). Zero rework on the Next.js side.

Container runtime decision

  • The image is already laptop-portable: the Dockerfile is oven/bun:1-slim + apt packages + @kilocode/cli. No Cloudflare-specific runtime ties.
  • Runtime floor: Docker Desktop, Podman Desktop, OrbStack, or Colima. Use bollard (Rust) or dockerode (Node) and target the Docker Engine API directly. Don't shell out to docker.
  • Skipping the container (running kilo SDK on bare host) was considered and rejected: the image packages a curated dev environment (gh, ripgrep, jq, build-essential, default-jdk, libvips, ruby/python build deps, pinned Kilo CLI). Reproducibility is the whole point; bare-host execution loses it.
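
As a sketch of what the bridge would hand to the Docker Engine API (dockerode's `createContainer` accepts this payload shape; the image name, port, and label key here are assumptions):

```typescript
// Illustrative Docker Engine API "create container" payload the bridge
// might build before each docker run.
function containerCreateConfig(townId: string, hostPort: number, env: Record<string, string>) {
  return {
    Image: "registry.example.com/gastown/container:latest", // assumed registry/tag
    Env: Object.entries(env).map(([k, v]) => `${k}=${v}`),
    ExposedPorts: { "8080/tcp": {} }, // assumed control-server port
    HostConfig: {
      // Bind the control-server to loopback only -- no public surface.
      PortBindings: { "8080/tcp": [{ HostIp: "127.0.0.1", HostPort: String(hostPort) }] },
    },
    Labels: { "gastown.town": townId }, // assumed label key, for cleanup/discovery
  };
}
```

Targeting the Engine API directly (rather than shelling out to `docker`) means this same payload works against Docker Desktop, Podman's Docker-compatible socket, OrbStack, or Colima.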

Data plane — worker still owns dispatch

The desktop app is a thin remote executor. The reconciler in Town.do.ts (convoy DAG traversal, dependency unblocking, dispatch alarms, idle-stop, mayor wake-up, escalation handling) stays cloud-side. Implementation: a town.runtime: 'cloud' | 'local' column in Town.do SQL, a single dispatch helper that returns either TownContainerDO stub or TownContainerProxyDO stub. Everything downstream is unchanged.
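
A minimal sketch of that single dispatch helper, with illustrative stub types (the real code would return the existing TownContainerDO stub or the new TownContainerProxyDO stub):

```typescript
// Anything with a fetch() is dispatchable; both DO stubs satisfy this.
interface Dispatchable {
  fetch(req: Request): Promise<Response>;
}

// Branch on the town.runtime column; everything downstream is unchanged.
function getDispatchStub(
  runtime: "cloud" | "local",
  containerStub: Dispatchable, // TownContainerDO stub (today's path)
  proxyStub: Dispatchable,     // hypothetical TownContainerProxyDO stub
): Dispatchable {
  return runtime === "local" ? proxyStub : containerStub;
}
```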

Auth

  • Web auth: identical (webview loads the same Next.js app behind the same session cookie).
  • GitHub tokens: worker still owns git-token-service. When dispatching, the proxy DO sends a setEnvVar envelope over the WS; the bridge stores in-memory and injects on next docker run. Tokens transit the WSS; never pass through the user-visible webview.
  • Secrets at rest:
    • User's gastown session JWT → OS keychain (Keychain on macOS, Credential Manager on Windows, Secret Service / libsecret on Linux). Tauri has first-class plugins.
    • Bridge's WSS tunnel auth token → keychain.
    • Per-job GitHub tokens → never persisted, held in bridge process memory only.
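
The "held in bridge process memory only" rule could be a small drain-on-launch store, sketched here with illustrative names:

```typescript
// Sketch of the bridge's per-job token handling: tokens arriving via the
// setEnvVar envelope live only in process memory and are consumed when the
// next container launches. Nothing is ever written to disk.
class EphemeralTokenStore {
  private tokens = new Map<string, string>();

  set(name: string, value: string): void {
    this.tokens.set(name, value);
  }

  // Drain into docker-run env vars and forget.
  drainForLaunch(): string[] {
    const env = [...this.tokens.entries()].map(([k, v]) => `${k}=${v}`);
    this.tokens.clear();
    return env;
  }
}
```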

Distribution & install

| Item | Detail |
| --- | --- |
| Platforms | macOS (universal), Windows x64, Linux x64 (AppImage + .deb) |
| Code-signing | Apple Developer ID + notarization ($99/yr); Windows EV cert ($300/yr, hardware token); Linux GPG sig |
| Auto-update | tauri-plugin-updater against an S3/R2 manifest |
| Install footprint | Tauri shell ~10–15MB; container image pull ~800MB–1.5GB on first run |
| Docker Desktop license | Free for personal/small-business; paid plan required for >250 employees / >$10M revenue |

Failure modes

  • Quit mid-bead: bridge sends goodbye frame → proxy DO marks in-flight agents failed with runtime_disconnected → reconciler reschedules. Same path as a CF container OOM today.
  • Laptop sleep: WSS dies, dispatcher retries with backoff. On wake, bridge reconnects. "Park beads as held" vs "fail fast" should be configurable per town.
  • Docker not running: first-run wizard checks the daemon. At runtime, bridge surfaces "Docker stopped" banner via the bridge object.
  • Multi-machine: second connect rejected with 409 already_connected. Don't try to load-balance across laptops.
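
The retry-with-backoff behaviour mentioned above could be as simple as a capped exponential delay; the constants here are assumptions:

```typescript
// Illustrative reconnect backoff for the bridge's WSS after laptop sleep:
// exponential growth from a 1s base, capped at 60s between attempts.
function reconnectDelayMs(attempt: number, baseMs = 1_000, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** Math.min(attempt, 30));
}
```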

Effort estimate

| Workstream | Size | Notes |
| --- | --- | --- |
| Desktop shell (Tauri) | M | |
| Container orchestration (Docker Engine API) | M | |
| Worker-side branching + TownContainerProxyDO + WS protocol | L | The gnarly bit. Hibernating WS, fetch envelope serialization, streaming bodies, /ws agent event sub-multiplex, reconnection with in-flight replay |
| Auth / secrets / token plumbing | S–M | |
| Cross-platform CI + signing + auto-update | L | Apple notarization, Windows EV cert in CI, three update feeds. Underestimated by every team that ships a desktop app for the first time |
| First-run UX (Docker check, image pull progress, login) | M | |

Total: medium-large project, ~2–3 engineers × ~3 months to a usable beta, plus a long tail of platform-specific paper cuts.

Honest assessment

Strongest argument for: the container/worker split is already designed exactly the way you'd design it for this. Replacing the DO-to-container hop with a DO-to-WSS-to-laptop-to-Docker hop is a localized change, container image needs zero modifications. Real product wedge for enterprise teams.

Strongest argument against: cross-platform desktop distribution is an underestimated tax. The WSS tunnel layer is identical work to Option A — but you take on desktop CI on top.


Side-by-side

| | Option A (CLI + tunnel) | Option B (Desktop + Docker) |
| --- | --- | --- |
| Web dashboard | Unchanged, used as-is | Unchanged, embedded in webview |
| New transport (WS tunnel + broker DO) | Required | Required (same work) |
| Container image work | Run unchanged Bun process | Run unchanged Docker image |
| New product surface | A CLI subcommand | A desktop app + auto-updater + signing |
| UX | Browser dashboard + background daemon (start it, forget it) | Browser-in-webview + native menu bar + notifications + "Open in VS Code" |
| Workspace location | ~/.kilo/gastown/<townId>/... | ~/Library/Application Support/Gastown/towns/<id>/... (or equivalent) |
| Container isolation | Bun process on host | Docker container |
| Adoption ceiling | Power users, dogfooding | Mainstream |
| Effort to beta | ~1 engineer-quarter | ~2–3 engineers × 3 months |
| Long-tail cost | Low | High (cross-platform support burden) |

Recommended sequencing

The hard part for either option is the same: build the WSS tunnel + broker DO + worker dispatch branch. Option B isn't 3× more transport work than Option A; it's the same transport work plus a desktop app product.

Suggested path: build A first as the proof-of-concept that exercises the tunnel + broker DO, then build B on top of the same transport. A Tauri shell over a working kilo gastown serve is most of B. This:

  • Gives a usable thing in a quarter
  • Ships the tunnel work in production (where it gets exercised by power users + dogfooding)
  • Defers the desktop investment until after the protocol shape is proven
  • Lets us decide the desktop-app investment based on real demand signal, not speculation

Or stop at A — if local-mode is primarily a power-user / dogfooding / debuggability tool, A is sufficient on its own.


Out of scope for this issue

  • Specific implementation tickets — this is the design-space exploration, not the build plan.
  • LLM provider routing in local mode (does inference go through the cloud LLM gateway, the user's own credits, or BYO API key?). Worth its own issue if either option moves forward.
  • Billing implications (compute moves to user's machine — does that change pricing? credits accrual? usage caps?). Worth a separate conversation.
  • Multi-tenant local-mode (one machine running multiple towns). Out of scope for v1; broker DO is per-town anyway.

Decision needed

Whether/when to invest, and in what shape:

  1. Build A only (CLI dogfooding tool, ship behind feature flag)
  2. Build A then B (staged, A in Q+1, B in Q+2 if signal supports)
  3. Build B directly (skip A, accept the desktop-distribution tax)
  4. Defer entirely (architecture is favourable; revisit when we have the cycles)

No urgency. Architecture stays favourable as long as we don't accidentally couple the container to Cloudflare-specific runtime APIs in future work.

References

  • Cloud → container boundary: services/gastown/src/dos/town/container-dispatch.ts (single chokepoint via getTownContainerStub(env, townId).fetch(...))
  • Container code: services/gastown/container/src/ (no cloudflare:* imports)
  • Container HTTP API: services/gastown/container/src/control-server.ts
  • Container env contract: ~5 vars at boot + X-Town-Config JSON header per request
  • Container callbacks: all worker REST endpoints under /api/towns/:townId/... in services/gastown/src/gastown.worker.ts

Metadata

Labels

gt:container (Container management, agent processes, SDK, heartbeat), gt:core (Reconciler, state machine, bead lifecycle, convoy flow)
