feat(kiloclaw): add KiloClawRegistry DO and complete instance-keyed identity migration #1706
Draft
pandemicsyn wants to merge 3 commits into main
…y through registry

Add the KiloClawRegistry Durable Object (SQLite-backed via Drizzle ORM) that indexes instances per owner (user or org). Wire provision, destroy, and catch-all proxy flows through the registry. Enable lazy migration of legacy instances from Postgres on first access.

Key changes:
- KiloClawRegistry DO with listInstances, createInstance, destroyInstance, resolveDoKey, findInstancesForUser methods
- Lazy migration: reads legacy instance from Postgres via Hyperdrive on first listInstances() call, with 60s retry cooldown
- Catch-all proxy reads sandboxId from DO status (not middleware) for gateway token derivation; this is critical for instance-keyed DOs using ki_ sandboxIds
- Registry create/destroy are best-effort (non-fatal errors)
- resolveRegistryEntry falls back to legacy idFromName(userId) on registry failure
- ensureActiveInstance supports org instances with instance-keyed sandboxId derivation
- restoreFromPostgres accepts opts.sandboxId for precise multi-instance lookup
- tRPC router threads instanceId to worker for all provisions/destroys
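The 60s retry cooldown on lazy migration can be pictured as a small gate that decides whether a listInstances() call should try the Postgres read again. The following is a minimal sketch only: the `MigrationState` shape and both function names are hypothetical and are not taken from the PR; only the 60s figure comes from the commit message.

```typescript
// Hypothetical bookkeeping a registry might keep about lazy migration.
// The real KiloClawRegistry schema is not shown in this PR description.
interface MigrationState {
  migrated: boolean; // true once the legacy instance has been backfilled
  lastAttemptAt: number | null; // epoch ms of the last attempt, null if never tried
}

// 60s retry cooldown, as stated in the commit message.
const RETRY_COOLDOWN_MS = 60_000;

// Decide whether this call should attempt the legacy Postgres read.
function shouldAttemptMigration(state: MigrationState, now: number): boolean {
  if (state.migrated) return false; // already backfilled, nothing to do
  if (state.lastAttemptAt === null) return true; // first access
  return now - state.lastAttemptAt >= RETRY_COOLDOWN_MS; // wait out the cooldown
}

// Record the outcome of an attempt so later calls honor the cooldown.
function recordAttempt(
  state: MigrationState,
  now: number,
  succeeded: boolean,
): MigrationState {
  return { migrated: state.migrated || succeeded, lastAttemptAt: now };
}
```

Keeping the decision a pure function of stored state and the current time makes the cooldown easy to unit-test without a real database or clock.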
Complete the instance-keyed DO migration by threading instanceId through every caller that resolves a KiloClawInstance DO stub:

Worker:
- All ~30 platform.ts routes now parse ?instanceId= and pass it to instanceStubFactory (3-arg calls)
- controller.ts handles ki_ sandboxIds via isInstanceKeyedSandboxId to resolve the correct DO key
- Snapshot-restore queue message includes optional instanceId; consumer uses it as DO key when present

Internal client:
- All ~30 instance-scoped methods accept optional instanceId as last parameter, forwarded as ?instanceId= query param

Next.js callers:
- All tRPC router methods call getActiveInstance(userId) and pass instance?.id to internal client
- Admin router methods pass instance.id from DB lookups
- Billing cron + autoResumeIfSuspended already had instanceId (verified pre-existing)

New exports from @kilocode/worker-utils/instance-id:
- isInstanceKeyedSandboxId(sandboxId): boolean
- instanceIdFromSandboxId(sandboxId): string
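To illustrate how the two new exports might relate to each other, here is a sketch under a loud assumption: that a ki_ sandboxId is simply `ki_` followed by the instance UUID with its dashes removed. The PR description does not state the actual encoding, so the bodies below are hypothetical stand-ins, not the code in @kilocode/worker-utils/instance-id. Only the function names and the `ki_` prefix come from the PR.

```typescript
// ASSUMPTION: sandboxId = "ki_" + lowercase UUID hex without dashes.
// The real encoding used by the PR is not shown in its description.
const INSTANCE_KEYED_PREFIX = "ki_";

// Forward mapping (name taken from the PR summary, body is illustrative).
function sandboxIdFromInstanceId(instanceId: string): string {
  return INSTANCE_KEYED_PREFIX + instanceId.replace(/-/g, "").toLowerCase();
}

// The prefix check is the discriminator between legacy and instance-keyed IDs.
function isInstanceKeyedSandboxId(sandboxId: string): boolean {
  return sandboxId.startsWith(INSTANCE_KEYED_PREFIX);
}

// Reverse mapping: recover the dashed UUID from a ki_ sandboxId.
function instanceIdFromSandboxId(sandboxId: string): string {
  if (!isInstanceKeyedSandboxId(sandboxId)) {
    throw new Error("not an instance-keyed sandboxId");
  }
  const hex = sandboxId.slice(INSTANCE_KEYED_PREFIX.length);
  if (!/^[0-9a-f]{32}$/.test(hex)) {
    throw new Error("malformed instance-keyed sandboxId");
  }
  // Re-insert dashes in the standard 8-4-4-4-12 UUID layout.
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}
```

Under this assumed encoding, a caller such as the controller could check `isInstanceKeyedSandboxId(sandboxId)` first and only then reverse-map to the instance UUID used as the DO key.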
The controller checkin route used instanceId as a placeholder for userId when handling ki_ sandboxIds. This caused PostHog attribution and instance-ready emails to silently fail for instance-keyed DOs. Fix: call stub.getStatus() after auth to read the real userId from the DO, which always stores it during provision.
Summary
- `KiloClawRegistry` Durable Object: a per-owner SQLite-backed index (via Drizzle ORM) that maps instance IDs to DO keys. Keyed by `user:{userId}` or `org:{orgId}`. Supports lazy migration from Postgres for legacy instances on first access.
- New provisions use `idFromName(instanceId)` with `ki_`-prefixed sandboxIds. Legacy instances remain at `idFromName(userId)` and are backfilled into the registry via lazy migration.
- The catch-all proxy reads `sandboxId` from the DO's `getStatus()` for gateway token derivation, instead of the middleware-derived value. This is critical: instance-keyed DOs derive sandboxId from instanceId, which differs from `sandboxIdFromUserId()`. Without this, all new provisions would have gateway token mismatches.
- Threads `instanceId` end-to-end through every lifecycle caller: all ~30 platform routes, all ~30 internal client methods, all tRPC router methods, admin routers, controller heartbeat, and snapshot-restore queue. This makes the PR self-contained; no follow-up PR is required before deploy.
- Adds `isInstanceKeyedSandboxId()` and `instanceIdFromSandboxId()` to `@kilocode/worker-utils/instance-id` for reverse-mapping `ki_` sandboxIds to instance UUIDs (used by controller heartbeat).
- `restoreFromPostgres` accepts `opts.sandboxId` for precise multi-instance lookup instead of the ambiguous `getActiveInstance(db, userId)`.
- `ensureActiveInstance` supports org instances with instance-keyed sandboxId derivation (`sandboxIdFromInstanceId`).

Verification
- `pnpm typecheck` (kiloclaw worker): pass
- `pnpm typecheck` (root / Next.js): pass
- `pnpm test` (kiloclaw): 48 files, 1125 tests, all passing
- `pnpm lint` (kiloclaw): 0 warnings, 0 errors
- `pnpm format:check`: pass (pre-push hook)

Visual Changes
N/A
Reviewer Notes
- `~/fd-plans/kiloclaw/multi-instance-deviations.md`: 24 deviations logged with rationale, including: Drizzle instead of gastown raw SQL, all new provisions instance-keyed (scope expansion), catch-all proxy sandboxId from DO status, ownerKey-as-param pattern, and full lifecycle threading pulled from PR 3 into PR 2.
- The `resolveRegistryEntry` fallback to `idFromName(userId)` only helps legacy instances; for instance-keyed DOs it returns "not provisioned" until the registry recovers.
- `getStatus()` call: the controller checkin now makes one extra DO RPC to resolve the real userId for PostHog attribution and instance-ready emails. This is on the checkin hot path (~every 60s per instance), but `getStatus()` is a lightweight in-memory read.
- The `ki_` prefix on sandboxIds is the discriminator between legacy (base64url) and instance-keyed identity. All gateway token derivation, controller routing, and metadata recovery depend on it.
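The fallback behavior noted above can be sketched roughly as follows. This is an illustration only: the `ResolvedEntry` shape, the `resolveWithFallback` name, and the synchronous `lookupDoKey` callback are all hypothetical, since the PR description does not show the real `resolveRegistryEntry` signature (which is presumably asynchronous).

```typescript
// Hypothetical result shape; not the PR's actual types.
interface ResolvedEntry {
  doKey: string; // name that would be passed to idFromName(...)
  source: "registry" | "legacy-fallback";
}

// lookupDoKey stands in for a registry call and is assumed to throw on failure.
function resolveWithFallback(
  lookupDoKey: () => string,
  userId: string,
): ResolvedEntry {
  try {
    return { doKey: lookupDoKey(), source: "registry" };
  } catch {
    // Registry unavailable: use the legacy key. This only finds a provisioned
    // DO for legacy instances; an instance-keyed DO lives under its instanceId,
    // so the DO at idFromName(userId) would report "not provisioned".
    return { doKey: userId, source: "legacy-fallback" };
  }
}
```

Tagging the result with its source is one way a caller could log or surface that it is running in degraded mode while the registry is down.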