
Commit c7d7ecd

feat: hot module replacement for workflows

Add --hot flag to smithers run/resume that watches workflow source files and reloads on save without restarting the process or losing run state.

Architecture:

- Generation overlay system: copies source tree to fresh URLs so all transitive deps are re-evaluated (no bundler needed)
- createSmithers hot cache: prevents duplicate DB connections and preserves Zod schema reference identity across reloads
- Engine wake signal: file changes wake the loop immediately via Promise.race, even while tasks are in-flight
- Safety model: schema changes are blocked; in-flight tasks continue with their original code

New files:

- src/hot/watch.ts — recursive dir watcher with debounce
- src/hot/overlay.ts — generation overlay builder (hardlink/copy)
- src/hot/HotWorkflowController.ts — reload lifecycle manager

Modified:

- src/engine/index.ts — workflowRef swap, wake signal, skip unmount-cancel in hot mode
- src/create.ts — module-local hot cache keyed by dbPath
- src/RunOptions.ts — HotReloadOptions type
- src/SmithersEvent.ts — 4 reload event types
- src/cli/index.ts — --hot flag, SMITHERS_HOT env, progress logging

Docs:

- README.md — feature highlight
- docs/guides/hot-reload.mdx — comprehensive guide
- docs/cli/overview.mdx — --hot in options tables
- docs/runtime/events.mdx — reload events documented

Amp-Thread-ID: https://ampcode.com/threads/T-019c90c4-b379-7205-865d-a85d57095186
Co-authored-by: Amp <amp@ampcode.com>
1 parent a2a62dd commit c7d7ecd
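
The commit message lists `src/hot/watch.ts` as a recursive directory watcher with debounce. Its code is not shown on this page, so the following is only a minimal sketch of the debounce half under stated assumptions: the class name `ChangeDebouncer`, the quiet-period semantics (every new event restarts the timer), and the sorted, de-duplicated batch are illustrative choices, not the actual implementation. It could be fed from something like `fs.watch(dir, { recursive: true }, (_event, filename) => { if (filename) debouncer.notify(filename); })`, keeping in mind that recursive `fs.watch` support varies by platform and runtime version.

```ts
// Hypothetical sketch: batch bursts of file-change events into one callback.
// Not the actual src/hot/watch.ts; names and timing semantics are assumptions.
type FlushHandler = (paths: string[]) => void;

class ChangeDebouncer {
  private readonly delayMs: number;
  private readonly onFlush: FlushHandler;
  private readonly pending = new Set<string>();
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;

  constructor(delayMs: number, onFlush: FlushHandler) {
    this.delayMs = delayMs;
    this.onFlush = onFlush;
  }

  // Record a changed path; every new event restarts the quiet-period timer.
  notify(path: string): void {
    this.pending.add(path);
    this.cancelTimer();
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Deliver all pending paths (sorted, de-duplicated) now; no-op if none are pending.
  flush(): void {
    this.cancelTimer();
    if (this.pending.size === 0) return;
    const paths = [...this.pending].sort();
    this.pending.clear();
    this.onFlush(paths);
  }

  // Drop pending paths without delivering them, e.g. on shutdown.
  dispose(): void {
    this.cancelTimer();
    this.pending.clear();
  }

  private cancelTimer(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
  }
}
```

Batching matters because a single save often produces several events (write, rename, metadata), and the goal is one reload per save.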


15 files changed (+1224 additions, -27 deletions)


README.md

Lines changed: 31 additions & 0 deletions
````diff
@@ -11,6 +11,7 @@
 * Re-renders the workflow after each step
 * Resumes exactly where it left off after crashes
 * Supports subscriptions
+* Hot-reloads workflow code on file save (prompts, config, components) without restarting
 
 There is no hidden in-memory state. Every task result is stored as:
 
@@ -226,6 +227,36 @@ smithers list workflow.tsx
 smithers approve workflow.tsx --run-id abc123 --node-id review
 ```
 
+## Hot Module Replacement
+
+Edit your workflow files while a run is executing. Smithers watches your source tree and hot-reloads changes on save — prompts, config, agent settings, and component structure — without restarting the process or losing run state.
+
+```bash
+smithers run workflow.tsx --hot
+```
+
+In-flight tasks continue with their original code. Only newly scheduled tasks pick up the changes.
+
+```
+[00:05:12] ⟳ File change detected: 1 file(s)
+[00:05:12] ⟳ Workflow reloaded (generation 1)
+[00:05:13] → implement-cat-12 (attempt 1, iteration 0)
+```
+
+**What you can change live:**
+
+* Prompt strings and `.md`/`.mdx` prompt files
+* Focus lists, config values, concurrency settings
+* Agent models, timeouts, system prompts
+* JSX tree structure (add/remove/reorder tasks)
+
+**What requires a restart:**
+
+* Output schema changes (Zod shapes)
+* Database path changes
+
+See the [Hot Reload Guide](/guides/hot-reload) for details.
+
 ---
 
 ## Built-in Tools
````
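
The README lists output schema (Zod shape) changes as requiring a restart, and the commit message says schema changes are blocked. One way to detect such a change before swapping code is to compare old and new shapes and refuse the reload if anything differs. This sketch assumes each output table can be reduced to its name plus a list of field names (the hypothetical `SchemaShape`); a real check would derive that from the Zod schemas, and `schemaChanges` is an illustrative name.

```ts
// Hypothetical shape: output table name -> field names. A real check would derive
// this from the workflow's Zod schemas; plain records keep the sketch self-contained.
type SchemaShape = Record<string, readonly string[]>;

// Order-independent key for one table's fields.
function fieldKey(fields: readonly string[]): string {
  return [...fields].sort().join(",");
}

// Lists every reason a reload must be rejected; an empty array means shapes match.
// Only field names are compared; a stricter check would also compare field types.
function schemaChanges(previous: SchemaShape, next: SchemaShape): string[] {
  const problems: string[] = [];
  const tables = new Set([...Object.keys(previous), ...Object.keys(next)]);
  for (const table of [...tables].sort()) {
    const before: readonly string[] | undefined = previous[table];
    const after: readonly string[] | undefined = next[table];
    if (before === undefined) problems.push(`table added: ${table}`);
    else if (after === undefined) problems.push(`table removed: ${table}`);
    else if (fieldKey(before) !== fieldKey(after)) problems.push(`fields changed: ${table}`);
  }
  return problems;
}
```

A reload controller could call `schemaChanges(current, candidate)` on every attempt and, when the result is non-empty, keep the running workflow and report the listed problems instead of swapping.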

docs/cli/overview.mdx

Lines changed: 3 additions & 1 deletion
````diff
@@ -30,6 +30,7 @@ smithers run <workflow.tsx> [options]
 | `--allow-network` | Allow the built-in `bash` tool to make network requests. **Note:** This flag only applies to Smithers' built-in tools. CLI agent wrappers (`ClaudeCodeAgent`, `CodexAgent`, `GeminiAgent`) manage their own sandboxing independently -- use the agent's own sandbox options (e.g. `sandbox: "read-only"` for CodexAgent) to control their network access. |
 | `--max-output-bytes N` | Maximum bytes a single tool call can return. Default: `200000`. |
 | `--tool-timeout-ms N` | Maximum wall-clock time per tool call in milliseconds. Default: `60000`. |
+| `--hot` | Enable hot module replacement. Watches workflow source files and reloads on change without restarting. See [Hot Reload](/guides/hot-reload). |
 
 **Example:**
 
@@ -49,7 +50,7 @@ Resume a paused or crashed run. Smithers reloads persisted state from SQLite and
 smithers resume <workflow.tsx> --run-id ID [options]
 ```
 
-Accepts the same options as `run` (except `--input`, which is loaded from the database).
+Accepts the same options as `run` (except `--input`, which is loaded from the database). Use `--hot` to enable hot reload during resumed runs.
 
 **Example:**
 
@@ -289,6 +290,7 @@ These options apply to `run` and `resume` commands:
 | `--max-concurrency N` | Maximum number of tasks running in parallel. | `4` |
 | `--input JSON` | Input data as a JSON string (for `run` only). | `{}` |
 | `--run-id ID` | Explicit run identifier. | Auto-generated |
+| `--hot` | Enable [hot module replacement](/guides/hot-reload). Watches workflow files and reloads on save. In-flight tasks continue; new tasks use updated code. | Disabled |
 | `--version`, `-v` | Print version and exit. | -- |
 | `--help`, `-h` | Print help and exit. | -- |
 
````
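
The commit message says the CLI accepts both the `--hot` flag and a `SMITHERS_HOT` environment variable, and the table above lists hot mode as disabled by default. How the variable is parsed is not shown, so this sketch assumes that the flag always wins and that `SMITHERS_HOT` enables hot mode only when it is `1` or `true` (case-insensitive); `isHotEnabled` is a hypothetical name.

```ts
// Hypothetical sketch of opting in to hot mode from argv or the environment.
// Assumptions: the flag always wins; SMITHERS_HOT enables hot mode only when it is
// "1" or "true" (case-insensitive); anything else leaves the default, disabled.
function isHotEnabled(
  argv: readonly string[],
  env: Readonly<Record<string, string | undefined>>,
): boolean {
  if (argv.includes("--hot")) return true;
  const raw = env["SMITHERS_HOT"]?.trim().toLowerCase();
  return raw === "1" || raw === "true";
}

// Typical call site in Node or Bun: isHotEnabled(process.argv.slice(2), process.env)
```

Keeping the default off matches the design constraint that hot reload must not affect production behavior.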

docs/design-prompts/hmr-design.md

Lines changed: 304 additions & 0 deletions
@@ -0,0 +1,304 @@
# Design Prompt: Hot Module Replacement for Smithers Workflows

## What is Smithers?

Smithers is a **workflow orchestration engine built on React**. Not React-DOM for browsers — it uses a **custom React reconciler** (`react-reconciler`) to render a JSX component tree into an XML-like structure that describes tasks to execute. Tasks are dispatched to AI agents (Claude, Codex, Gemini, etc.) that run in parallel.

**Key insight: Smithers IS React.** The workflow definition is a React component tree. The engine renders it with a real React reconciler. State lives in SQLite, not in the React fiber tree.

### Architecture overview

A user defines a workflow as a `.tsx` file that exports a React component tree:

```tsx
// plue/workflow/components/workflow.tsx — a real user workflow
export default smithers((ctx) => {
  return (
    <Workflow name="plue-slop-factory">
      <SuperRalph
        ctx={ctx}
        focuses={focuses}
        agents={{
          opus: {
            agent: new ClaudeCodeAgent({
              model: "claude-opus-4-6",
              systemPrompt: PLANNING_PROMPT, // ← a string constant, frozen at import time
            }),
          },
          codex: {
            agent: new CodexAgent({
              model: "gpt-5.3-codex",
              systemPrompt: "Implement with TDD.",
            }),
          },
        }}
      />
    </Workflow>
  );
});
```

The workflow file imports things from other files — prompts, config, focus lists, etc.:

```ts
// These are all module-level constants, evaluated once at import() time
import { focuses } from "./focuses"; // list of work categories
import { getTarget } from "../targets"; // build/test commands, code style
import { WORKFLOW_MAX_CONCURRENCY } from "../config"; // concurrency limit

const PLANNING_PROMPT = `Plan and research. PRIORITY: ...`; // prompt text
```

### The engine loop

The engine loads the workflow module once, then runs a `while(true)` loop. Each iteration:

1. Loads current state from SQLite (inputs, outputs, ralph iterations)
2. **Re-renders the React component tree** by calling `workflow.build(ctx)`
3. The custom reconciler diffs the tree and produces `TaskDescriptor[]`
4. The scheduler determines which tasks are runnable
5. Launches runnable tasks as agent subprocesses
6. Waits for at least one task to finish (`await Promise.race(inflight)`)
7. Loops back to step 1

Here is the actual engine loop (abbreviated):

```ts
// src/engine/index.ts — the main engine loop
const renderer = new SmithersRenderer(); // custom React reconciler

while (true) {
  // 1. Load current state
  const inputRow = await loadInput(db, inputTable, runId);
  const outputs = await loadOutputs(db, schema, runId);
  const ctx = buildContext({ runId, iteration, input: inputRow, outputs });

  // 2. Re-render the React tree — this calls the user's workflow function
  const { xml, tasks, mountedTaskIds } = await renderer.render(
    workflow.build(ctx), // ← workflow.build is the user's (ctx) => <Workflow>...</Workflow>
    { ralphIterations, defaultIteration, baseRootDir: rootDir },
  );

  // 3-4. Build plan tree, compute task states, schedule
  const { plan, ralphs } = buildPlanTree(xml);
  const stateMap = await computeTaskStates(adapter, db, runId, tasks, ...);
  const schedule = scheduleTasks(plan, stateMap, descriptorMap, ralphState);
  const runnable = applyConcurrencyLimits(schedule.runnable, stateMap, maxConcurrency, tasks);

  if (runnable.length === 0 && inflight.size > 0) {
    // 6. Nothing new to launch — wait for an in-flight task to finish
    await Promise.race(inflight);
    continue;
  }

  // 5. Launch new tasks
  for (const task of runnable) {
    const p = executeTask(adapter, db, runId, task, ...).finally(() => inflight.delete(p));
    inflight.add(p);
  }
  // 6. Wait for at least one to finish, then re-render
  await Promise.race(inflight);
}
```
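
Step 6 is where a file change currently cannot interrupt the loop: `await Promise.race(inflight)` settles only when a task does. The commit message describes waking the loop via `Promise.race`; a minimal sketch of that idea adds one extra promise to the race, which the watcher resolves on change and which is re-armed after every wake. `WakeSignal` and `waitForProgress` are hypothetical names, not the engine's actual API.

```ts
// Hypothetical sketch: a re-armable promise the engine loop can race against
// in-flight tasks, so a file change wakes the loop without waiting for a task.
class WakeSignal {
  private promise!: Promise<"wake">;
  private resolveCurrent!: (value: "wake") => void;

  constructor() {
    this.arm();
  }

  // Settles on the next wake(); stable until then, so repeated calls share it.
  wait(): Promise<"wake"> {
    return this.promise;
  }

  // Resolve the current waiter and arm a fresh promise for the next iteration.
  wake(): void {
    const resolve = this.resolveCurrent;
    this.arm();
    resolve("wake");
  }

  private arm(): void {
    this.promise = new Promise<"wake">((resolve) => {
      this.resolveCurrent = resolve;
    });
  }
}

// Stand-in for `await Promise.race(inflight)` that reports which event came first.
// Task rejections are mapped to "task" here; error handling stays elsewhere.
function waitForProgress(
  inflight: ReadonlySet<Promise<unknown>>,
  signal: WakeSignal,
): Promise<"task" | "wake"> {
  const settled = [...inflight].map((p) =>
    p.then(
      () => "task" as const,
      () => "task" as const,
    ),
  );
  return Promise.race([...settled, signal.wait()]);
}
```

In the loop, both `await Promise.race(inflight)` calls could become `await waitForProgress(inflight, signal)`, with the watcher callback calling `signal.wake()`. Running tasks are unaffected either way; the race only decides when the loop re-renders.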

### The custom React reconciler

Smithers uses `react-reconciler` to render JSX into a host tree of `HostElement`/`HostText` nodes, then extracts `TaskDescriptor[]` from it:

```ts
// src/dom/renderer.ts
import Reconciler from "react-reconciler";

const reconciler = Reconciler(hostConfig); // standard react-reconciler with custom host config

export class SmithersRenderer {
  private container: HostContainer;
  private root: any;

  constructor() {
    this.container = { root: null };
    this.root = reconciler.createContainer(this.container, 0, null, false, ...);
  }

  async render(element: React.ReactElement, opts?: ExtractOptions) {
    reconciler.updateContainerSync(element, this.root, null, () => {});
    reconciler.flushSyncWork();
    return extractFromHost(this.container.root, opts); // → { xml, tasks, mountedTaskIds }
  }
}
```

### The workflow type

```ts
// src/SmithersWorkflow.ts
export type SmithersWorkflow<Schema> = {
  db: unknown; // Drizzle SQLite DB
  build: (ctx: SmithersCtx<Schema>) => React.ReactElement; // the render function
  opts: SmithersWorkflowOptions;
  schemaRegistry?: Map<string, SchemaRegistryEntry>;
  zodToKeyName?: Map<ZodObject<any>, string>;
};
```

### How workflows are loaded today

The CLI loads the workflow module exactly **once** via dynamic `import()`:

```ts
// src/cli/index.ts
import { resolve } from "node:path";
import { pathToFileURL } from "node:url";

async function loadWorkflow(path: string): Promise<SmithersWorkflow<any>> {
  const abs = resolve(process.cwd(), path);
  const mod = await import(pathToFileURL(abs).href); // cached by Bun's module system forever
  if (!mod.default) throw new Error("Workflow must export default");
  return mod.default as SmithersWorkflow<any>;
}
```

And the engine receives the workflow object, using `workflow.build` on every loop iteration:

```ts
// src/cli/index.ts
const workflow = await loadWorkflow(workflowPath); // loaded once
const result = await runWorkflow(workflow, { ... }); // passed into engine
```
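
Because `workflow` is captured once and handed to the engine, swapping code means giving the engine something it re-reads on each iteration. The commit message mentions a `workflowRef` swap, and the design asks that a file saved with a syntax error leave the old workflow running. The sketch below combines both under assumptions: `Workflow` is a stand-in for `SmithersWorkflow` reduced to `build`, `load` is whatever produces a fresh module (for example an `import()` of a new URL), and `WorkflowRef`, `reloadInto`, and `ReloadResult` are hypothetical names.

```ts
// Hypothetical sketch: the engine reads the workflow through a mutable ref on
// every iteration, so a successful reload takes effect at the next render.
// `Workflow` stands in for SmithersWorkflow, reduced to the one field used here.
type Workflow = { build: (ctx: unknown) => unknown };
type WorkflowRef = { current: Workflow; generation: number };
type ReloadResult =
  | { ok: true; generation: number }
  | { ok: false; error: string };

// Try to load the next generation. On any failure (a syntax error thrown by
// load(), or a module without a build function) the ref is left untouched,
// so the previous workflow keeps running.
async function reloadInto(
  ref: WorkflowRef,
  load: () => Promise<Workflow>,
): Promise<ReloadResult> {
  try {
    const next = await load();
    if (typeof next?.build !== "function") {
      throw new Error("Workflow must export default");
    }
    ref.current = next;
    ref.generation += 1;
    return { ok: true, generation: ref.generation };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

The engine would then call `ref.current.build(ctx)` where it now calls `workflow.build(ctx)`. Tasks already launched were created from an earlier render, so the swap does not touch them.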

### Where state lives

**All workflow state is in SQLite**, not in the React tree:
- Runs, attempts, frames, nodes, ralph iterations → `_smithers_*` tables
- Task outputs → user-defined Drizzle tables
- The React fiber tree is discarded after each render and rebuilt from scratch

This means there is **no React state to lose** during a hot reload. The reconciler is essentially stateless between renders (unlike browser React where component state lives in fibers).

### How consumers run workflows

A typical consumer (e.g., the `plue` project) has a runner script:

```ts
// plue/workflow/run.ts
const smithersCli = findSmithersCli();
await $`bun run ${smithersCli} run components/workflow.tsx --root ${ROOT_DIR} --max-concurrency 16`;
```

The dependency tree of a typical workflow looks like:

```
components/workflow.tsx                 ← the root workflow component
├── ../smithers.ts                      ← createSmithers() call, DB setup
├── ../config.ts                        ← WORKFLOW_MAX_CONCURRENCY, TASK_RETRIES
├── ../targets.ts                       ← build commands, test commands, code style
├── ./focuses.ts                        ← list of work categories
├── ./focusDirs.ts                      ← directory mappings
├── ./focusTestSuites.ts                ← test suite mappings
├── @smithers-orchestrator/super-ralph  ← smithers package (stable, not user code)
└── smithers-orchestrator               ← smithers core (stable, not user code)
```
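
The tree above is what a watcher needs to cover: the relative imports, not the two packages at the bottom. One option listed later is walking `import` statements manually; a rough sketch of that follows, assuming source text is available in a map (`files`, a stand-in for reading from disk) and treating every bare specifier as library code. `userDependencies` is a hypothetical name, and the regular expression is an approximation: it misses side-effect imports (`import "./x"`) and path aliases, and it does not skip commented-out imports.

```ts
import * as path from "node:path";

// Hypothetical sketch: collect the user-code part of a workflow's import graph.
// Relative specifiers ("./", "../") are followed; bare specifiers such as
// "smithers-orchestrator" are treated as library code and skipped.
const IMPORT_RE =
  /(?:import|export)\s[^'"]*?\bfrom\s*["']([^"']+)["']|\bimport\(\s*["']([^"']+)["']\s*\)/g;
const EXTENSIONS = ["", ".ts", ".tsx", ".js", ".jsx"];

// `files` maps absolute path -> source text (a stand-in for reading from disk).
function userDependencies(entry: string, files: ReadonlyMap<string, string>): string[] {
  const seen = new Set<string>();
  const stack = [entry];
  while (stack.length > 0) {
    const file = stack.pop()!;
    if (seen.has(file)) continue;
    seen.add(file);
    for (const match of (files.get(file) ?? "").matchAll(IMPORT_RE)) {
      const specifier = match[1] ?? match[2];
      if (!specifier.startsWith("./") && !specifier.startsWith("../")) continue;
      // Resolve relative to the importing file, trying common extensions.
      const base = path.resolve(path.dirname(file), specifier);
      const resolved = EXTENSIONS.map((ext) => base + ext).find((p) => files.has(p));
      if (resolved !== undefined) stack.push(resolved);
    }
  }
  return [...seen].sort();
}
```

A real implementation would read files from disk and use a proper parser, but the split between followed relative imports and skipped bare specifiers is the part that answers which files belong to the user.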

The user frequently wants to change:
- **Prompt strings** — the `PLANNING_PROMPT`, `TESTING_PROMPT` constants, or prompts that live in `.md`/`.mdx` files
- **Focus lists** — adding/removing/reprioritizing work categories in `focuses.ts`
- **Config values** — changing concurrency, retries in `config.ts`
- **Agent configuration** — changing models, timeouts, adding/removing agents
- **Component structure** — changing the JSX tree (adding tasks, reordering sequences)

Today, **none of these changes take effect until the entire process is killed and restarted** (or the run finishes and a new one starts). This is because `import()` caches the module and all its dependencies permanently.
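
Why a cache-busting query string on the entry file alone does not reach `config.ts` or `focuses.ts` can be shown with plain URL resolution. Assuming, as the commit message's "fresh URLs" wording implies, that the runtime caches ES modules by resolved URL, the sketch compares a busted entry URL with an entry loaded from a per-generation copy of the tree. `resolveChild` is a hypothetical helper and `/tmp/hot/gen-2/` a hypothetical path.

```ts
// Shows which URL a relative import inside a module resolves to. Module caches
// keyed by resolved URL reuse a module whenever this URL is unchanged.
function resolveChild(parentUrl: string, specifier: string): string {
  return new URL(specifier, parentUrl).href;
}

// Entry re-imported with a cache-busting query: the query is not inherited by
// relative references, so "./config.ts" resolves exactly as on the first load.
const viaQuery = resolveChild("file:///w/workflow.tsx?t=1700000000", "./config.ts");
console.log(viaQuery); // file:///w/config.ts

// Entry imported from a per-generation copy of the tree (hypothetical path):
// every relative dependency now has a URL the cache has never seen.
const viaOverlay = resolveChild("file:///tmp/hot/gen-2/workflow.tsx", "./config.ts");
console.log(viaOverlay); // file:///tmp/hot/gen-2/config.ts
```

This is the gap a generation overlay closes: re-evaluating the whole user-code graph needs new URLs for every file, not just the entry.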

---

## The Feature: Hot Module Replacement for Workflows

### What we want

When a user edits any file in their workflow's dependency tree and saves, the **running workflow should pick up the changes on the next render cycle** — without:
- Restarting the process
- Losing the current run state (which is in SQLite anyway)
- Interrupting in-flight tasks (they continue with their old prompts; only newly-scheduled tasks use new code)

This is exactly analogous to how **Vite + React Fast Refresh** works in a web app:
- Vite watches files → detects change → invalidates module graph → sends updated module to browser
- React Fast Refresh swaps component implementations in the fiber tree → reconciler re-renders → state preserved

**Smithers already has the React side of this.** The reconciler re-renders every loop iteration. State lives in SQLite, not fibers. What's missing is the **Vite dev server equivalent** — the file watching and module invalidation layer.

### How Vite/Bun HMR works (for reference)

Vite's HMR (and Bun's `import.meta.hot` which is compatible):

1. **File watcher** detects a change to `foo.ts`
2. **Module graph** is walked to find the HMR boundary (the nearest module that calls `import.meta.hot.accept()`)
3. The changed module (and anything between it and the boundary) is **re-evaluated** with a cache-busting query string (`?t=1234567890`)
4. The `accept()` callback receives the new module and swaps the relevant references
5. **React Fast Refresh** (a special case) automatically registers component updates so React can swap function implementations without losing state

Bun supports this same `import.meta.hot` API in `Bun.serve()` with `development: true`. However, Smithers is a **CLI process**, not a `Bun.serve()` server. So we need to implement the equivalent mechanism ourselves.
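
The commit message describes the mechanism Smithers adopted in place of `import.meta.hot`: a generation overlay that copies the source tree to fresh URLs using hardlinks or copies. Its code is not on this page, so this is only a minimal sketch under stated assumptions: the `gen-<n>` directory layout, skipping `node_modules` and dot-directories, and trying a hardlink before falling back to a copy. `buildOverlay` and `mirror` are hypothetical names.

```ts
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical sketch of a generation overlay: mirror the source tree into
// <overlayRoot>/gen-<n>/ so the next import() sees brand-new file URLs.
// Hardlinks are tried first (cheap); copyFileSync is the fallback, e.g. when
// linking fails across filesystems. node_modules, dot-directories and the
// overlay root itself are skipped; symlinks are neither followed nor copied.
function buildOverlay(srcRoot: string, overlayRoot: string, generation: number): string {
  const dest = path.join(overlayRoot, `gen-${generation}`);
  mirror(srcRoot, dest, path.resolve(overlayRoot));
  return dest;
}

function mirror(src: string, dest: string, overlayRoot: string): void {
  fs.mkdirSync(dest, { recursive: true });
  for (const entry of fs.readdirSync(src, { withFileTypes: true })) {
    const from = path.join(src, entry.name);
    const to = path.join(dest, entry.name);
    if (entry.isDirectory()) {
      const skip =
        entry.name === "node_modules" ||
        entry.name.startsWith(".") ||
        path.resolve(from) === overlayRoot;
      if (!skip) mirror(from, to, overlayRoot);
    } else if (entry.isFile()) {
      try {
        fs.linkSync(from, to);
      } catch {
        fs.copyFileSync(from, to);
      }
    }
  }
}
```

Placing the overlay root inside the project (here a dot-directory, so it is never mirrored into itself) keeps bare imports such as `smithers-orchestrator` resolving to the project's own `node_modules`, so library modules are shared across generations instead of being re-evaluated.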

### Key design principle

**This is a feature of the Smithers engine, not the consumer.** The consumer shouldn't need to write `import.meta.hot.accept()` calls or any HMR-aware code. They just write normal `.tsx` workflow files and edit them. The engine handles everything.

### What needs to be designed

1. **File watching**: How do we discover and watch the workflow file's dependency tree? Options include:
   - `fs.watch` / `fs.watchFile` on known files
   - Bun's built-in file watching
   - `Bun.build()` to get the dependency graph, then watch all files in it
   - Walking `import` statements manually
   - Watching entire directories

2. **Module invalidation**: How do we make `import()` return fresh code? Options include:
   - Cache-busting query string: `import(path + '?t=' + Date.now())` (same technique Vite uses)
   - Bun-specific module cache APIs if they exist
   - Clearing `require.cache` (CJS only, may not work with ESM)

3. **Workflow hot-swap**: What parts of the `SmithersWorkflow` object can/should be swapped?
   - `workflow.build` — definitely yes, this is the render function
   - `workflow.db` — probably NOT, the DB connection should be preserved
   - `workflow.schemaRegistry` / `workflow.zodToKeyName` — need to think about this; schema changes mid-run could be dangerous
   - `workflow.opts` — maybe, but carefully

4. **Wake signal**: The engine loop currently blocks on `await Promise.race(inflight)` waiting for a task to finish. A file change should wake the loop immediately so it re-renders with the new code. This probably means adding a file-change promise to the `Promise.race` set.

5. **Safety boundaries**: What changes are safe to hot-reload vs. what should trigger a warning or require a restart?
   - Safe: prompt text, config values, focus lists, agent config, JSX tree structure
   - Unsafe: DB schema changes, output table changes, input table changes
   - Edge case: changing a task's `id` — the scheduler uses node IDs to track state; changing an ID effectively creates a "new" task and orphans the old one

6. **CLI interface**: How does the user opt in?
   - `smithers run workflow.tsx --hot` flag?
   - `smithers dev workflow.tsx` command (like Vite's `vite dev`)?
   - Always-on in development?

7. **Consumer API surface**: Should there be any new APIs for consumers?
   - A way to read prompt files that are automatically watched? e.g., `useFile("./prompts/planning.md")`
   - Or is just re-importing the module enough?

8. **Scope of watched files**: The workflow imports smithers-orchestrator and super-ralph packages. These are **library code** and should NOT be watched (just like Vite doesn't watch `node_modules/`). Only the user's workflow files should be watched. How do we distinguish?

9. **Error handling**: What happens if the user saves a file with a syntax error?
   - The old workflow should keep running
   - The error should be reported (logged, shown in UI)
   - When the error is fixed and the file is saved again, the new code should be picked up

10. **Events/observability**: Should HMR events be part of the event bus?
    - `WorkflowReloaded` event with changed files list?
    - `WorkflowReloadFailed` event with error?
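
For question 8, if the watcher covers whole directories (the last file-watching option in question 1), every event still needs a cheap decision about whether it should trigger a reload. This sketch assumes library code sits under `node_modules/`, overlay generations and tool state live in dot-directories, and only source and prompt extensions matter; the extension list and the name `shouldReload` are illustrative. The extension check also keeps writes to a SQLite file inside the watched tree from triggering reloads.

```ts
// Hypothetical sketch: should a path reported by a recursive directory watcher
// (relative to the watched root) trigger a reload?
// Assumptions: library code lives under node_modules/, overlay generations and
// tool state live in dot-directories, and only these extensions matter.
const WATCHED_EXTENSIONS = new Set([".ts", ".tsx", ".js", ".jsx", ".md", ".mdx", ".json"]);

function shouldReload(relativePath: string): boolean {
  const segments = relativePath.split(/[\\/]+/).filter((s) => s.length > 0);
  if (segments.length === 0) return false;
  // Skip library code, hidden files, and anything under a hidden directory.
  if (segments.some((s) => s === "node_modules" || s.startsWith("."))) return false;
  const name = segments[segments.length - 1];
  if (name.endsWith("~")) return false; // editor backup files such as "config.ts~"
  const dot = name.lastIndexOf(".");
  return dot > 0 && WATCHED_EXTENSIONS.has(name.slice(dot).toLowerCase());
}
```

Filtering by path rather than by import graph is cheaper and also catches `.md`/`.mdx` prompt files that are read at runtime instead of imported.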

### Constraints

- This runs in Bun (not Node.js) — leverage Bun-specific APIs where beneficial
- The smithers engine code (`src/`) is the only thing that changes. Consumer workflow code should work as-is without modifications.
- Must not affect production behavior. HMR should be opt-in or dev-only.
- Must handle the case where the workflow file has side effects at module scope (e.g., `createSmithers()` creates a DB connection — we don't want to create a new DB connection on every reload)
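
The last constraint is what the commit message's `createSmithers` hot cache addresses: a module-local cache keyed by `dbPath` that prevents duplicate DB connections and preserves Zod schema reference identity across reloads. A cache like this survives reloads only if the module holding it is not re-evaluated, which would hold for library code imported by bare specifier when only user files are copied to new URLs. A minimal sketch follows; `Resources`, `createResources`, and the counter-based `openConnection` stand-in are hypothetical.

```ts
// Hypothetical sketch of a module-local cache keyed by database path. The Map
// lives in library code, which is not re-evaluated when user files reload, so a
// re-evaluated workflow that calls the factory again gets the same instance.
type Connection = { id: number };
type Resources = {
  dbPath: string;
  connection: Connection;
  schemas: Record<string, object>; // stand-in for the Zod output schemas
};

const hotCache = new Map<string, Resources>();
let connectionsOpened = 0;

// Stand-in for opening a SQLite/Drizzle connection; counts real opens.
function openConnection(): Connection {
  connectionsOpened += 1;
  return { id: connectionsOpened };
}

function createResources(dbPath: string, schemas: Record<string, object>, hot: boolean): Resources {
  const cached = hot ? hotCache.get(dbPath) : undefined;
  // Reuse the connection and the first generation's schema objects, keeping
  // identity-keyed maps (such as a Zod-object -> table-name map) valid.
  if (cached !== undefined) return cached;
  const created: Resources = { dbPath, connection: openConnection(), schemas };
  if (hot) hotCache.set(dbPath, created);
  return created;
}
```

Returning the first generation's schema objects keeps identity-keyed lookups working, but it also means an edited schema is silently ignored by the cache, which is why schema changes have to be detected and blocked separately.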

### Deliverable

Please produce a detailed engineering design that covers:

1. **Architecture**: How the file watching, module invalidation, and hot-swap mechanism work together. Include a diagram.
2. **API design**: The CLI interface, any new `RunOptions`, any new consumer-facing APIs.
3. **Implementation plan**: Which files in `src/` need to change and how. Be specific about the changes to the engine loop, CLI, and any new modules.
4. **Safety model**: What changes are safe, what triggers warnings, what requires restart.
5. **Edge cases**: Syntax errors, schema changes, side effects, race conditions between file changes and in-flight tasks.
6. **Testing strategy**: How to test HMR behavior.

docs/docs.json

Lines changed: 1 addition & 0 deletions
```diff
@@ -71,6 +71,7 @@
 "guides/error-handling",
 "guides/patterns",
 "guides/resumability",
+"guides/hot-reload",
 "guides/debugging",
 "guides/monitoring-logs",
 "guides/best-practices",
```
