| 1 | +# Design Prompt: Hot Module Replacement for Smithers Workflows |
| 2 | + |
| 3 | +## What is Smithers? |
| 4 | + |
| 5 | +Smithers is a **workflow orchestration engine built on React**. Not React-DOM for browsers — it uses a **custom React reconciler** (`react-reconciler`) to render a JSX component tree into an XML-like structure that describes tasks to execute. Tasks are dispatched to AI agents (Claude, Codex, Gemini, etc.) that run in parallel. |
| 6 | + |
| 7 | +**Key insight: Smithers IS React.** The workflow definition is a React component tree. The engine renders it with a real React reconciler. State lives in SQLite, not in the React fiber tree. |
| 8 | + |
| 9 | +### Architecture overview |
| 10 | + |
| 11 | +A user defines a workflow as a `.tsx` file that exports a React component tree: |
| 12 | + |
| 13 | +```tsx |
| 14 | +// plue/workflow/components/workflow.tsx — a real user workflow |
| 15 | +export default smithers((ctx) => { |
| 16 | + return ( |
| 17 | + <Workflow name="plue-slop-factory"> |
| 18 | + <SuperRalph |
| 19 | + ctx={ctx} |
| 20 | + focuses={focuses} |
| 21 | + agents={{ |
| 22 | + opus: { |
| 23 | + agent: new ClaudeCodeAgent({ |
| 24 | + model: "claude-opus-4-6", |
| 25 | + systemPrompt: PLANNING_PROMPT, // ← a string constant, frozen at import time |
| 26 | + }), |
| 27 | + }, |
| 28 | + codex: { |
| 29 | + agent: new CodexAgent({ |
| 30 | + model: "gpt-5.3-codex", |
| 31 | + systemPrompt: "Implement with TDD.", |
| 32 | + }), |
| 33 | + }, |
| 34 | + }} |
| 35 | + /> |
| 36 | + </Workflow> |
| 37 | + ); |
| 38 | +}); |
| 39 | +``` |
| 40 | + |
| 41 | +The workflow file imports things from other files — prompts, config, focus lists, etc.: |
| 42 | + |
| 43 | +```ts |
| 44 | +// These are all module-level constants, evaluated once at import() time |
| 45 | +import { focuses } from "./focuses"; // list of work categories |
| 46 | +import { getTarget } from "../targets"; // build/test commands, code style |
| 47 | +import { WORKFLOW_MAX_CONCURRENCY } from "../config"; // concurrency limit |
| 48 | + |
| 49 | +const PLANNING_PROMPT = `Plan and research. PRIORITY: ...`; // prompt text |
| 50 | +``` |
| 51 | + |
| 52 | +### The engine loop |
| 53 | + |
| 54 | +The engine loads the workflow module once, then runs a `while(true)` loop. Each iteration: |
| 55 | + |
| 56 | +1. Loads current state from SQLite (inputs, outputs, ralph iterations) |
| 57 | +2. **Re-renders the React component tree** by calling `workflow.build(ctx)` |
| 58 | +3. The custom reconciler diffs the tree and produces `TaskDescriptor[]` |
| 59 | +4. The scheduler determines which tasks are runnable |
| 60 | +5. Launches runnable tasks as agent subprocesses |
| 61 | +6. Waits for at least one task to finish (`await Promise.race(inflight)`) |
| 62 | +7. Loops back to step 1 |
| 63 | + |
| 64 | +Here is the actual engine loop (abbreviated): |
| 65 | + |
| 66 | +```ts |
| 67 | +// src/engine/index.ts — the main engine loop |
| 68 | +const renderer = new SmithersRenderer(); // custom React reconciler |
| 69 | + |
| 70 | +while (true) { |
| 71 | + // 1. Load current state |
| 72 | + const inputRow = await loadInput(db, inputTable, runId); |
| 73 | + const outputs = await loadOutputs(db, schema, runId); |
| 74 | + const ctx = buildContext({ runId, iteration, input: inputRow, outputs }); |
| 75 | + |
| 76 | + // 2. Re-render the React tree — this calls the user's workflow function |
| 77 | + const { xml, tasks, mountedTaskIds } = await renderer.render( |
| 78 | + workflow.build(ctx), // ← workflow.build is the user's (ctx) => <Workflow>...</Workflow> |
| 79 | + { ralphIterations, defaultIteration, baseRootDir: rootDir }, |
| 80 | + ); |
| 81 | + |
| 82 | + // 3-4. Build plan tree, compute task states, schedule |
| 83 | + const { plan, ralphs } = buildPlanTree(xml); |
| 84 | + const stateMap = await computeTaskStates(adapter, db, runId, tasks, ...); |
| 85 | + const schedule = scheduleTasks(plan, stateMap, descriptorMap, ralphState); |
| 86 | + const runnable = applyConcurrencyLimits(schedule.runnable, stateMap, maxConcurrency, tasks); |
| 87 | + |
| 88 | + if (runnable.length === 0 && inflight.size > 0) { |
| 89 | + // 6. Nothing new to launch — wait for an in-flight task to finish |
| 90 | + await Promise.race(inflight); |
| 91 | + continue; |
| 92 | + } |
| 93 | + |
| 94 | + // 5. Launch new tasks |
| 95 | + for (const task of runnable) { |
| 96 | + const p = executeTask(adapter, db, runId, task, ...).finally(() => inflight.delete(p)); |
| 97 | + inflight.add(p); |
| 98 | + } |
| 99 | + // 6. Wait for at least one to finish, then re-render |
| 100 | + await Promise.race(inflight); |
| 101 | +} |
| 102 | +``` |
| 103 | + |
| 104 | +### The custom React reconciler |
| 105 | + |
| 106 | +Smithers uses `react-reconciler` to render JSX into a host tree of `HostElement`/`HostText` nodes, then extracts `TaskDescriptor[]` from it: |
| 107 | + |
| 108 | +```ts |
| 109 | +// src/dom/renderer.ts |
| 110 | +import Reconciler from "react-reconciler"; |
| 111 | + |
| 112 | +const reconciler = Reconciler(hostConfig); // standard react-reconciler with custom host config |
| 113 | + |
| 114 | +export class SmithersRenderer { |
| 115 | + private container: HostContainer; |
| 116 | + private root: any; |
| 117 | + |
| 118 | + constructor() { |
| 119 | + this.container = { root: null }; |
| 120 | + this.root = reconciler.createContainer(this.container, 0, null, false, ...); |
| 121 | + } |
| 122 | + |
| 123 | + async render(element: React.ReactElement, opts?: ExtractOptions) { |
| 124 | + reconciler.updateContainerSync(element, this.root, null, () => {}); |
| 125 | + reconciler.flushSyncWork(); |
| 126 | + return extractFromHost(this.container.root, opts); // → { xml, tasks, mountedTaskIds } |
| 127 | + } |
| 128 | +} |
| 129 | +``` |
| 130 | + |
| 131 | +### The workflow type |
| 132 | + |
| 133 | +```ts |
| 134 | +// src/SmithersWorkflow.ts |
| 135 | +export type SmithersWorkflow<Schema> = { |
| 136 | + db: unknown; // Drizzle SQLite DB |
| 137 | + build: (ctx: SmithersCtx<Schema>) => React.ReactElement; // the render function |
| 138 | + opts: SmithersWorkflowOptions; |
| 139 | + schemaRegistry?: Map<string, SchemaRegistryEntry>; |
| 140 | + zodToKeyName?: Map<ZodObject<any>, string>; |
| 141 | +}; |
| 142 | +``` |
| 143 | + |
| 144 | +### How workflows are loaded today |
| 145 | + |
| 146 | +The CLI loads the workflow module exactly **once** via dynamic `import()`: |
| 147 | + |
| 148 | +```ts |
| 149 | +// src/cli/index.ts |
| 150 | +async function loadWorkflow(path: string): Promise<SmithersWorkflow<any>> { |
| 151 | + const abs = resolve(process.cwd(), path); |
| 152 | + const mod = await import(pathToFileURL(abs).href); // cached by Bun's module system forever |
| 153 | + if (!mod.default) throw new Error("Workflow must export default"); |
| 154 | + return mod.default as SmithersWorkflow<any>; |
| 155 | +} |
| 156 | +``` |
| 157 | + |
| 158 | +The engine then receives the workflow object and calls `workflow.build` on every loop iteration: |
| 159 | + |
| 160 | +```ts |
| 161 | +// src/cli/index.ts |
| 162 | +const workflow = await loadWorkflow(workflowPath); // loaded once |
| 163 | +const result = await runWorkflow(workflow, { ... }); // passed into engine |
| 164 | +``` |
| 165 | + |
| 166 | +### Where state lives |
| 167 | + |
| 168 | +**All workflow state is in SQLite**, not in the React tree: |
| 169 | +- Runs, attempts, frames, nodes, ralph iterations → `_smithers_*` tables |
| 170 | +- Task outputs → user-defined Drizzle tables |
| 171 | +- The React fiber tree holds no workflow state; each render is recomputed from the SQLite-backed `ctx` |
| 172 | + |
| 173 | +This means there is **no React state to lose** during a hot reload. The reconciler is essentially stateless between renders (unlike browser React where component state lives in fibers). |
| 174 | + |
| 175 | +### How consumers run workflows |
| 176 | + |
| 177 | +A typical consumer (e.g., the `plue` project) has a runner script: |
| 178 | + |
| 179 | +```ts |
| 180 | +// plue/workflow/run.ts |
| 181 | +const smithersCli = findSmithersCli(); |
| 182 | +await $`bun run ${smithersCli} run components/workflow.tsx --root ${ROOT_DIR} --max-concurrency 16`; |
| 183 | +``` |
| 184 | + |
| 185 | +The dependency tree of a typical workflow looks like: |
| 186 | + |
| 187 | +``` |
| 188 | +components/workflow.tsx ← the root workflow component |
| 189 | + ├── ../smithers.ts ← createSmithers() call, DB setup |
| 190 | + ├── ../config.ts ← WORKFLOW_MAX_CONCURRENCY, TASK_RETRIES |
| 191 | + ├── ../targets.ts ← build commands, test commands, code style |
| 192 | + ├── ./focuses.ts ← list of work categories |
| 193 | + ├── ./focusDirs.ts ← directory mappings |
| 194 | + ├── ./focusTestSuites.ts ← test suite mappings |
| 195 | + ├── @smithers-orchestrator/super-ralph ← smithers package (stable, not user code) |
| 196 | + └── smithers-orchestrator ← smithers core (stable, not user code) |
| 197 | +``` |
| 198 | + |
| 199 | +The user frequently wants to change: |
| 200 | +- **Prompt strings** — the `PLANNING_PROMPT`, `TESTING_PROMPT` constants, or prompts that live in `.md`/`.mdx` files |
| 201 | +- **Focus lists** — adding/removing/reprioritizing work categories in `focuses.ts` |
| 202 | +- **Config values** — changing concurrency, retries in `config.ts` |
| 203 | +- **Agent configuration** — changing models, timeouts, adding/removing agents |
| 204 | +- **Component structure** — changing the JSX tree (adding tasks, reordering sequences) |
| 205 | + |
| 206 | +Today, **none of these changes take effect until the entire process is killed and restarted** (or the run finishes and a new one starts). This is because `import()` caches the module and all its dependencies permanently. |
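To make the caching behaviour concrete: an ES module loader keys its cache on the fully resolved specifier, so importing the same file URL twice returns the same module instance. The sketch below builds a specifier whose cache key changes per reload. It assumes the runtime treats a different query string as a different cache entry, which should be verified on the Bun version in use. `cacheBustedHref` is a hypothetical helper, not part of Smithers.

```typescript
import { resolve } from "node:path";
import { pathToFileURL } from "node:url";

// Hypothetical helper: turn a workflow path into a file URL whose query
// string changes with every reload, so the module cache key differs.
export function cacheBustedHref(
  path: string,
  version: number,
  cwd: string = process.cwd(),
): string {
  const url = pathToFileURL(resolve(cwd, path));
  url.searchParams.set("t", String(version));
  return url.href;
}

// Intended use (not executed here):
//   const mod = await import(cacheBustedHref("components/workflow.tsx", Date.now()));
```

A query string on the entry module does not change the specifiers of its static imports such as `./focuses`, so those dependencies may still be served from the cache; a design would need to handle them separately.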
| 207 | + |
| 208 | +--- |
| 209 | + |
| 210 | +## The Feature: Hot Module Replacement for Workflows |
| 211 | + |
| 212 | +### What we want |
| 213 | + |
| 214 | +When a user edits any file in their workflow's dependency tree and saves, the **running workflow should pick up the changes on the next render cycle** — without: |
| 215 | +- Restarting the process |
| 216 | +- Losing the current run state (which is in SQLite anyway) |
| 217 | +- Interrupting in-flight tasks (they continue with their old prompts; only newly scheduled tasks use new code) |
| 218 | + |
| 219 | +This is closely analogous to how **Vite + React Fast Refresh** works in a web app: |
| 220 | +- Vite watches files → detects change → invalidates module graph → sends updated module to browser |
| 221 | +- React Fast Refresh swaps component implementations in the fiber tree → reconciler re-renders → state preserved |
| 222 | + |
| 223 | +**Smithers already has the React side of this.** The reconciler re-renders every loop iteration. State lives in SQLite, not fibers. What's missing is the **Vite dev server equivalent** — the file watching and module invalidation layer. |
| 224 | + |
| 225 | +### How Vite/Bun HMR works (for reference) |
| 226 | + |
| 227 | +Vite's HMR, and Bun's compatible `import.meta.hot` API, work as follows: |
| 228 | + |
| 229 | +1. **File watcher** detects a change to `foo.ts` |
| 230 | +2. **Module graph** is walked to find the HMR boundary (the nearest module that calls `import.meta.hot.accept()`) |
| 231 | +3. The changed module (and anything between it and the boundary) is **re-evaluated** with a cache-busting query string (`?t=1234567890`) |
| 232 | +4. The `accept()` callback receives the new module and swaps the relevant references |
| 233 | +5. **React Fast Refresh** (a special case) automatically registers component updates so React can swap function implementations without losing state |
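The boundary walk in steps 2 and 3 can be modelled in a few lines. This is a simplified illustration, not Vite's or Bun's actual implementation; the `HmrNode` shape and the `planUpdate` name are assumptions made for the example. Starting from the changed file, it walks up through importers until it reaches modules that accept updates, and reports that a full reload is needed when it hits a root with no accepting module on the way.

```typescript
// Simplified model of an HMR module graph (hypothetical types).
export interface HmrNode {
  importers: string[]; // modules that import this one
  accepts: boolean;    // true if this module is an HMR boundary
}

export interface UpdatePlan {
  boundaries: string[];  // modules whose accept handler should run
  invalidated: string[]; // modules that must be re-evaluated
}

// Returns null when no boundary is found, meaning a full reload is needed.
export function planUpdate(
  graph: Map<string, HmrNode>,
  changed: string,
): UpdatePlan | null {
  const boundaries = new Set<string>();
  const invalidated = new Set<string>();
  const queue: string[] = [changed];

  while (queue.length > 0) {
    const id = queue.shift()!;
    if (invalidated.has(id)) continue;
    invalidated.add(id);

    const node = graph.get(id);
    if (!node) return null; // unknown module: be conservative
    if (node.accepts) {
      boundaries.add(id); // stop walking at the boundary
      continue;
    }
    if (node.importers.length === 0) return null; // dead end, no boundary
    queue.push(...node.importers);
  }

  return {
    boundaries: Array.from(boundaries),
    invalidated: Array.from(invalidated),
  };
}
```

For Smithers, one option would be for the engine to treat the workflow entry file as the only boundary, so a change to `focuses.ts` invalidates `focuses.ts` and the entry file and nothing else.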
| 234 | + |
| 235 | +Bun supports this same `import.meta.hot` API in `Bun.serve()` with `development: true`. However, Smithers is a **CLI process**, not a `Bun.serve()` server. So we need to implement the equivalent mechanism ourselves. |
| 236 | + |
| 237 | +### Key design principle |
| 238 | + |
| 239 | +**This is a feature of the Smithers engine, not the consumer.** The consumer shouldn't need to write `import.meta.hot.accept()` calls or any HMR-aware code. They just write normal `.tsx` workflow files and edit them. The engine handles everything. |
| 240 | + |
| 241 | +### What needs to be designed |
| 242 | + |
| 243 | +1. **File watching**: How do we discover and watch the workflow file's dependency tree? Options include: |
| 244 | + - `fs.watch` / `fs.watchFile` on known files |
| 245 | + - Bun's built-in file watching |
| 246 | + - `Bun.build()` to get the dependency graph, then watch all files in it |
| 247 | + - Walking `import` statements manually |
| 248 | + - Watching entire directories |
| 249 | + |
| 250 | +2. **Module invalidation**: How do we make `import()` return fresh code? Options include: |
| 251 | + - Cache-busting query string: `import(path + '?t=' + Date.now())` (same technique Vite uses) |
| 252 | + - Bun-specific module cache APIs if they exist |
| 253 | + - Clearing `require.cache` (CJS only, may not work with ESM) |
| 254 | + |
| 255 | +3. **Workflow hot-swap**: What parts of the `SmithersWorkflow` object can/should be swapped? |
| 256 | + - `workflow.build` — definitely yes, this is the render function |
| 257 | + - `workflow.db` — probably NOT, the DB connection should be preserved |
| 258 | + - `workflow.schemaRegistry` / `workflow.zodToKeyName` — need to think about this; schema changes mid-run could be dangerous |
| 259 | + - `workflow.opts` — maybe, but carefully |
| 260 | + |
| 261 | +4. **Wake signal**: The engine loop currently blocks on `await Promise.race(inflight)` waiting for a task to finish. A file change should wake the loop immediately so it re-renders with the new code. This probably means adding a file-change promise to the `Promise.race` set. |
| 262 | + |
| 263 | +5. **Safety boundaries**: What changes are safe to hot-reload vs. what should trigger a warning or require a restart? |
| 264 | + - Safe: prompt text, config values, focus lists, agent config, JSX tree structure |
| 265 | + - Unsafe: DB schema changes, output table changes, input table changes |
| 266 | + - Edge case: changing a task's `id` — the scheduler uses node IDs to track state; changing an ID effectively creates a "new" task and orphans the old one |
| 267 | + |
| 268 | +6. **CLI interface**: How does the user opt in? |
| 269 | + - `smithers run workflow.tsx --hot` flag? |
| 270 | + - `smithers dev workflow.tsx` command (like Vite's `vite dev`)? |
| 271 | + - Always-on in development? |
| 272 | + |
| 273 | +7. **Consumer API surface**: Should there be any new APIs for consumers? |
| 274 | + - A way to read prompt files that are automatically watched? e.g., `useFile("./prompts/planning.md")` |
| 275 | + - Or is just re-importing the module enough? |
| 276 | + |
| 277 | +8. **Scope of watched files**: The workflow imports the `smithers-orchestrator` and `@smithers-orchestrator/super-ralph` packages. These are **library code** and should NOT be watched (just like Vite doesn't watch `node_modules/`). Only the user's workflow files should be watched. How do we distinguish? |
| 278 | + |
| 279 | +9. **Error handling**: What happens if the user saves a file with a syntax error? |
| 280 | + - The old workflow should keep running |
| 281 | + - The error should be reported (logged, shown in UI) |
| 282 | + - When the error is fixed and the file is saved again, the new code should be picked up |
| 283 | + |
| 284 | +10. **Events/observability**: Should HMR events be part of the event bus? |
| 285 | + - `WorkflowReloaded` event with changed files list? |
| 286 | + - `WorkflowReloadFailed` event with error? |
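The wake signal in item 4 is the most mechanical of these questions, so here is a minimal sketch of one way it might look. `WakeSignal` and `waitForTaskOrChange` are hypothetical names, and the file watcher that would call `notify()` is left out. The signal hands out a single shared promise that resolves on the next file change, so it can sit alongside the in-flight task promises in `Promise.race`.

```typescript
// Hypothetical wake signal: a file watcher calls notify(), the engine loop
// races wait() against in-flight tasks and then drains the changed files.
export class WakeSignal {
  private changed = new Set<string>();
  private pending: Promise<void> | null = null;
  private resolvePending: (() => void) | null = null;

  // Record a changed file and wake anyone currently waiting.
  notify(file: string): void {
    this.changed.add(file);
    const resolve = this.resolvePending;
    this.pending = null;
    this.resolvePending = null;
    if (resolve) resolve();
  }

  // Resolves immediately if changes are already queued, otherwise on the
  // next notify(). Repeated calls share one promise, so nothing accumulates.
  wait(): Promise<void> {
    if (this.changed.size > 0) return Promise.resolve();
    if (!this.pending) {
      this.pending = new Promise<void>((resolve) => {
        this.resolvePending = resolve;
      });
    }
    return this.pending;
  }

  // Return and clear the set of files changed since the last drain.
  drain(): string[] {
    const files = Array.from(this.changed);
    this.changed.clear();
    return files;
  }
}

// Wait until a task settles or a file changes; returns the changed files
// (empty when a task finished first).
export async function waitForTaskOrChange(
  inflight: Iterable<Promise<unknown>>,
  wake: WakeSignal,
): Promise<string[]> {
  await Promise.race([...Array.from(inflight), wake.wait()]);
  return wake.drain();
}
```

In the engine loop this would stand in for `await Promise.race(inflight)`; a non-empty result would mean "reload before the next render". Because changes are collected in a set, several quick saves of the same file collapse into one entry.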
| 287 | + |
| 288 | +### Constraints |
| 289 | + |
| 290 | +- This runs in Bun (not Node.js) — leverage Bun-specific APIs where beneficial |
| 291 | +- The smithers engine code (`src/`) is the only thing that changes. Consumer workflow code should work as-is without modifications. |
| 292 | +- Must not affect production behavior. HMR should be opt-in or dev-only. |
| 293 | +- Must handle the case where the workflow file has side effects at module scope (e.g., `createSmithers()` creates a DB connection — we don't want to create a new DB connection on every reload) |
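One conservative way to honour the last constraint is to take only the render function from a freshly imported workflow and keep everything else from the original. The sketch below assumes a simplified `WorkflowLike` shape (the real `SmithersWorkflow` type has more fields), and `hotSwap` is a hypothetical name. It also refuses the swap when the schema registry keys differ, which is one possible policy rather than a settled decision.

```typescript
// Simplified stand-in for SmithersWorkflow (assumption for this sketch).
export interface WorkflowLike<Ctx, El> {
  db: unknown;
  build: (ctx: Ctx) => El;
  opts: Record<string, unknown>;
  schemaRegistry?: Map<string, unknown>;
}

export interface SwapResult<Ctx, El> {
  workflow: WorkflowLike<Ctx, El>;
  swapped: boolean;
  warnings: string[];
}

// Stable fingerprint of the registered schema names.
function schemaKeys(w: { schemaRegistry?: Map<string, unknown> }): string {
  const keys = w.schemaRegistry ? Array.from(w.schemaRegistry.keys()) : [];
  return keys.sort().join("\n");
}

// Keep db, opts and schemaRegistry from `current`; take only `build` from
// `next`. If the schema names changed, keep `current` untouched and warn.
export function hotSwap<Ctx, El>(
  current: WorkflowLike<Ctx, El>,
  next: WorkflowLike<Ctx, El>,
): SwapResult<Ctx, El> {
  if (schemaKeys(current) !== schemaKeys(next)) {
    return {
      workflow: current,
      swapped: false,
      warnings: ["schema registry keys changed; restart to apply"],
    };
  }
  return {
    workflow: { ...current, build: next.build },
    swapped: true,
    warnings: [],
  };
}
```

This only keeps the engine on the original connection. Re-importing the module would still run `createSmithers()` at module scope, so avoiding the extra connection itself remains an open design question.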
| 294 | + |
| 295 | +### Deliverable |
| 296 | + |
| 297 | +Please produce a detailed engineering design that covers: |
| 298 | + |
| 299 | +1. **Architecture**: How the file watching, module invalidation, and hot-swap mechanism work together. Include a diagram. |
| 300 | +2. **API design**: The CLI interface, any new `RunOptions`, any new consumer-facing APIs. |
| 301 | +3. **Implementation plan**: Which files in `src/` need to change and how. Be specific about the changes to the engine loop, CLI, and any new modules. |
| 302 | +4. **Safety model**: What changes are safe, what triggers warnings, what requires restart. |
| 303 | +5. **Edge cases**: Syntax errors, schema changes, side effects, race conditions between file changes and in-flight tasks. |
| 304 | +6. **Testing strategy**: How to test HMR behavior. |
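As an illustration of the syntax-error edge case and of how reload behaviour might be made testable, the sketch below keeps the last good workflow when a reload fails and picks up the fix on the next attempt. `ReloadController` is a hypothetical name; the loader is injected so a test can simulate a broken save without touching the file system.

```typescript
// Hypothetical controller: holds the last successfully loaded workflow and
// only replaces it when a reload succeeds.
export class ReloadController<W> {
  private current: W;
  private lastError: unknown = null;
  private readonly load: () => Promise<W>;

  constructor(initial: W, load: () => Promise<W>) {
    this.current = initial;
    this.load = load;
  }

  // The workflow the engine should render with right now.
  get workflow(): W {
    return this.current;
  }

  // The error from the most recent failed reload, or null.
  get error(): unknown {
    return this.lastError;
  }

  // Try to load a new version. On failure the previous workflow stays
  // active and the error is kept for reporting.
  async reload(): Promise<boolean> {
    try {
      this.current = await this.load();
      this.lastError = null;
      return true;
    } catch (err) {
      this.lastError = err;
      return false;
    }
  }
}
```

The boolean result maps naturally onto the `WorkflowReloaded` and `WorkflowReloadFailed` events suggested earlier, though how those events are emitted is left to the design.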