
feat(kiloclaw): add KiloClawRegistry DO and complete instance-keyed identity migration #1706

Draft
pandemicsyn wants to merge 3 commits into main from florian/feat/registry-do-proxy-cutover


Summary

  • Adds the KiloClawRegistry Durable Object — a per-owner SQLite-backed index (via Drizzle ORM) that maps instance IDs to DO keys. Keyed by user:{userId} or org:{orgId}. Supports lazy migration from Postgres for legacy instances on first access.
  • All new provisions (personal and org) now create Instance DOs keyed by idFromName(instanceId) with ki_-prefixed sandboxIds. Legacy instances remain at idFromName(userId) and are backfilled into the registry via lazy migration.
  • The catch-all proxy reads sandboxId from the DO's getStatus() for gateway token derivation, instead of the middleware-derived value. This is critical: instance-keyed DOs derive sandboxId from instanceId, which differs from sandboxIdFromUserId(). Without this, all new provisions would have gateway token mismatches.
  • Threads instanceId end-to-end through every lifecycle caller: all ~30 platform routes, all ~30 internal client methods, all tRPC router methods, admin routers, controller heartbeat, and snapshot-restore queue. This makes the PR self-contained — no follow-up PR required before deploy.
  • Adds isInstanceKeyedSandboxId() and instanceIdFromSandboxId() to @kilocode/worker-utils/instance-id for reverse-mapping ki_ sandboxIds to instance UUIDs (used by controller heartbeat).
  • restoreFromPostgres accepts opts.sandboxId for precise multi-instance lookup instead of ambiguous getActiveInstance(db, userId).
  • ensureActiveInstance supports org instances with instance-keyed sandboxId derivation (sandboxIdFromInstanceId).
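A minimal sketch of how the `ki_` helpers named above could fit together. The PR only states that instance-keyed sandboxIds carry a `ki_` prefix and can be reverse-mapped to instance UUIDs; the concrete encoding used here (the UUID with hyphens stripped) is an assumption for illustration, not the actual implementation in `@kilocode/worker-utils/instance-id`.

```typescript
// ASSUMPTION: the real encoding is not shown in this PR. This sketch uses
// "ki_" + the lowercase UUID with hyphens removed.
const INSTANCE_KEYED_PREFIX = "ki_";

function sandboxIdFromInstanceId(instanceId: string): string {
  return INSTANCE_KEYED_PREFIX + instanceId.replace(/-/g, "").toLowerCase();
}

// Discriminates instance-keyed sandboxIds from legacy ones by prefix.
function isInstanceKeyedSandboxId(sandboxId: string): boolean {
  return sandboxId.startsWith(INSTANCE_KEYED_PREFIX);
}

// Reverse-maps a ki_ sandboxId to the instance UUID it was derived from.
function instanceIdFromSandboxId(sandboxId: string): string {
  if (!isInstanceKeyedSandboxId(sandboxId)) {
    throw new Error("not an instance-keyed sandboxId");
  }
  const hex = sandboxId.slice(INSTANCE_KEYED_PREFIX.length);
  if (!/^[0-9a-f]{32}$/.test(hex)) {
    throw new Error("malformed instance-keyed sandboxId");
  }
  // Re-insert hyphens in the canonical 8-4-4-4-12 UUID layout.
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}
```

Under this assumed encoding the mapping is lossless for lowercase UUIDs, which is what lets the controller heartbeat recover the DO key from the sandboxId alone.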

Verification

  • pnpm typecheck (kiloclaw worker) — pass
  • pnpm typecheck (root / Next.js) — pass
  • pnpm test (kiloclaw) — 48 files, 1125 tests, all passing
  • pnpm lint (kiloclaw) — 0 warnings, 0 errors
  • pnpm format:check — pass (pre-push hook)
  • Manual provision + proxy test against staging

Visual Changes

N/A

Reviewer Notes

  • Deviations from plan documented at ~/fd-plans/kiloclaw/multi-instance-deviations.md — 24 deviations logged with rationale, including: Drizzle instead of gastown raw SQL, all new provisions instance-keyed (scope expansion), catch-all proxy sandboxId from DO status, ownerKey-as-param pattern, and full lifecycle threading pulled from PR 3 into PR 2.
  • Registry operations are best-effort — provision and destroy succeed even if the registry is unavailable. The resolveRegistryEntry fallback to idFromName(userId) only helps legacy instances; for instance-keyed DOs it returns "not provisioned" until the registry recovers.
  • Lazy migration retry has a 60-second cooldown to avoid hammering Hyperdrive during outages.
  • Controller getStatus() call — the controller checkin now makes one extra DO RPC to resolve the real userId for PostHog attribution and instance-ready emails. This is on the checkin hot path (~every 60s per instance) but getStatus() is a lightweight in-memory read.
  • The ki_ prefix on sandboxIds is the discriminator between legacy (base64url) and instance-keyed identity. All gateway token derivation, controller routing, and metadata recovery depend on it.
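The 60-second retry cooldown on lazy migration could look roughly like the sketch below. The class name `MigrationCooldown`, its methods, and the injected clock are hypothetical; the PR only states that retries are throttled to avoid hammering Hyperdrive during outages.

```typescript
// ASSUMPTION: names and structure are illustrative, not taken from the PR.
const MIGRATION_RETRY_COOLDOWN_MS = 60_000;

class MigrationCooldown {
  private migrated = false;
  private lastFailureAt: number | null = null;
  private readonly now: () => number;

  // The clock is injectable so the cooldown can be tested deterministically.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // True when a migration attempt is allowed right now.
  shouldAttempt(): boolean {
    if (this.migrated) return false;
    if (this.lastFailureAt === null) return true;
    return this.now() - this.lastFailureAt >= MIGRATION_RETRY_COOLDOWN_MS;
  }

  recordSuccess(): void {
    this.migrated = true;
  }

  recordFailure(): void {
    this.lastFailureAt = this.now();
  }
}
```

A caller would check `shouldAttempt()` before reading from Postgres, then call `recordSuccess()` or `recordFailure()` depending on the outcome, so a failing backend sees at most one attempt per cooldown window per registry.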

…y through registry

Add the KiloClawRegistry Durable Object (SQLite-backed via Drizzle ORM)
that indexes instances per owner (user or org). Wire provision, destroy,
and catch-all proxy flows through the registry. Enable lazy migration of
legacy instances from Postgres on first access.

Key changes:
- KiloClawRegistry DO with listInstances, createInstance, destroyInstance,
  resolveDoKey, findInstancesForUser methods
- Lazy migration: reads legacy instance from Postgres via Hyperdrive on
  first listInstances() call, with 60s retry cooldown
- Catch-all proxy reads sandboxId from DO status (not middleware) for
  gateway token derivation — critical for instance-keyed DOs using ki_
  sandboxIds
- Registry create/destroy are best-effort (non-fatal errors)
- resolveRegistryEntry falls back to legacy idFromName(userId) on
  registry failure
- ensureActiveInstance supports org instances with instance-keyed
  sandboxId derivation
- restoreFromPostgres accepts opts.sandboxId for precise multi-instance
  lookup
- tRPC router threads instanceId to worker for all provisions/destroys

Complete the instance-keyed DO migration by threading instanceId
through every caller that resolves a KiloClawInstance DO stub:

Worker:
- All ~30 platform.ts routes now parse ?instanceId= and pass to
  instanceStubFactory (3-arg calls)
- controller.ts handles ki_ sandboxIds via isInstanceKeyedSandboxId
  to resolve the correct DO key
- Snapshot-restore queue message includes optional instanceId;
  consumer uses it as DO key when present

Internal client:
- All ~30 instance-scoped methods accept optional instanceId as
  last parameter, forwarded as ?instanceId= query param
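As a rough illustration of the optional-last-parameter pattern described above, the sketch below shows a client-side URL builder and the matching worker-side parse. The function names `buildInstanceUrl` and `parseInstanceId`, the base URL, and the route path are hypothetical; only the `?instanceId=` query parameter comes from this commit.

```typescript
// ASSUMPTION: helper names and the example route are illustrative only.

// Client side: append ?instanceId= only when the caller supplied one, so
// legacy callers keep producing the same URLs as before.
function buildInstanceUrl(baseUrl: string, path: string, instanceId?: string): string {
  const url = new URL(path, baseUrl);
  if (instanceId) {
    url.searchParams.set("instanceId", instanceId);
  }
  return url.toString();
}

// Worker side: read the optional parameter, treating an empty value as absent.
function parseInstanceId(requestUrl: string): string | undefined {
  return new URL(requestUrl).searchParams.get("instanceId") || undefined;
}
```

Keeping the parameter optional on both ends means an absent `instanceId` can still resolve to the legacy userId-keyed DO.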

Next.js callers:
- All tRPC router methods call getActiveInstance(userId) and pass
  instance?.id to internal client
- Admin router methods pass instance.id from DB lookups
- Billing cron + autoResumeIfSuspended already had instanceId
  (verified pre-existing)

New exports from @kilocode/worker-utils/instance-id:
- isInstanceKeyedSandboxId(sandboxId): boolean
- instanceIdFromSandboxId(sandboxId): string

The controller checkin route used instanceId as a placeholder for userId
when handling ki_ sandboxIds. This caused PostHog attribution and
instance-ready emails to silently fail for instance-keyed DOs.

Fix: call stub.getStatus() after auth to read the real userId from the
DO, which always stores it during provision.