AGENTS.md
+7 Lines changed: 7 additions & 0 deletions
@@ -33,6 +33,13 @@ The system’s posture assumes **authors are cooperating** (instrumentation, not
33
33
34
34
This repo vendors its Rust dependencies under `vendor/` and forces offline builds via `.cargo/config.toml`. If you update dependencies, keep `Cargo.lock` and `vendor/` in sync by running `cargo vendor vendor --locked`.
docs/fencerunner-user.md
+21 -30 Lines changed: 21 additions & 30 deletions
@@ -20,15 +20,14 @@ Most of the guide is about making that stream pleasant to consume and easy to ev
20
20
- [Generating boundaries](#generating-boundaries)
21
21
- [Troubleshooting](#troubleshooting)
22
22
- [Advanced](#advanced)
23
-
- [Signal audit grab bag](#signal-audit-grab-bag)
24
23
25
24
## The contract
26
25
27
26
>stdout is reserved.
28
27
29
28
`fencerunner` treats a script’s stdout as its interface: the boundary record. Anything else on stdout (logs, progress, stray `echo`) is a contract break. Send diagnostics to stderr, capture any command output you care about, and emit exactly one boundary record when the script is done.
30
29
31
-
Each top-level `*.sh` is one script, and each script emits one record. The record must declare `script.id`, and that id must match the filename stem (`my_probe.sh` → `my_probe`), so identity is stable and deterministic. In a single run, script ids must be unique so the stream can be consumed without ambiguity. `fencerunner` also enforces a fixed result.outcome vocabulary (success|denied|partial|error) for all scripts so downstream tools can treat `result.outcome` as a stable enum.
30
+
Each top-level `*.sh` is one script, and each script emits one record. The record must declare `script.id`, and that id must match the filename stem (`my_probe.sh` → `my_probe`), so identity is stable and deterministic. In a single run, script ids must be unique so the stream can be consumed without ambiguity. `fencerunner` (and `emit-record`) enforce a fixed `result.outcome` vocabulary (`success|denied|partial|error`) for all scripts so downstream tools can treat it as a stable enum. Unknown outcomes are a contract break: strict mode fails and supervised mode emits a synthetic error record.
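As a rough illustration of the outcome rule, a script could check its own outcome value before it builds the record. This is only a sketch: `is_valid_outcome` and the `outcome` variable are hypothetical names, not part of `fencerunner` or `emit-record`, and the real enforcement happens in those tools.

```shell
# Hypothetical helper, not part of fencerunner or emit-record.
# Returns 0 only for the four outcomes in the fixed vocabulary.
is_valid_outcome() {
  case "$1" in
    success|denied|partial|error) return 0 ;;
    *) return 1 ;;
  esac
}

outcome="denied"   # example value your probe might compute
if ! is_valid_outcome "${outcome}"; then
  # Diagnostics go to stderr because stdout is reserved for the record.
  echo "unknown result.outcome: ${outcome}" >&2
  exit 1
fi
```

Checking early like this keeps the failure inside the script, where the message can name the offending value, instead of surfacing later as a contract break.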
32
31
33
32
That’s most of what `fencerunner` insists on. Everything else—what you encode in the record, how strict the schema is, which operation kinds exist—is defined by the contracts you choose to write in the run dir.
34
33
@@ -307,6 +306,8 @@ The recommended baseline treats every boundary record as the same five-part enve
307
306
308
307
In that envelope, downstream tools almost always care about the same few fields: `script.id` (filename stem), `operation.kind` and `operation.target` (what this record *is about*), `result.outcome` (fixed enum: `success|denied|partial|error`), and whatever you choose to put in `payload.raw` as your suite-specific structured payload.
309
308
309
+
If you need richer result semantics, keep `result.outcome` in this vocabulary and encode your extra meaning under `operation.*` or `payload.raw`.
310
+
310
311
`context.commitments` is where scripts record lightweight “I relied on / observed / emitted X” signals via `commit_help_me` (empty allowed). `payload.stdout_snippet` and `payload.stderr_snippet` are string evidence channels (empty allowed). If you want multiple run dirs to produce a unified stream, make `operation.kind` and `operation.target` mean the same thing everywhere.
311
312
312
313
### Using `emit-record`
@@ -365,6 +366,12 @@ head -c 2000 "${stderr_file}" > "${stderr_file}.trimmed"
365
366
366
367
Then pass the trimmed files to `emit-record` so stdout stays clean and payloads stay bounded.
367
368
369
+
`emit-record` (and supervised synthetic records) keep records compact and predictable by enforcing payload bounds:
370
+
371
+
- `payload` (as serialized JSON) is capped at 16 KiB (16384 bytes).
372
+
- `payload.stdout_snippet` and `payload.stderr_snippet` are NUL-stripped and truncated to 2000 characters (with an ellipsis).
373
+
- Common failure: `Payload exceeds 16384 bytes (got N)`; keep `payload.raw` summary-sized and write large artifacts to files (then reference paths/hashes in `payload.raw`).
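One way to catch an oversized payload before it reaches `emit-record` is a byte-count pre-check on a file holding the payload JSON you intend to send. The sketch below assumes such a file exists; `payload_fits` and `MAX_PAYLOAD_BYTES` are hypothetical names. Because `emit-record` does its own serialization, its byte count may differ slightly from the file's, so treat this as an approximation.

```shell
# Hypothetical pre-check, not part of fencerunner or emit-record.
MAX_PAYLOAD_BYTES=16384   # the 16 KiB cap described above

payload_fits() {
  # $1: path to a file containing the serialized payload JSON
  # tr strips the padding some wc implementations add around the count.
  size=$(wc -c < "$1" | tr -d ' ')
  [ "${size}" -le "${MAX_PAYLOAD_BYTES}" ]
}
```

A script might call `payload_fits "${payload_file}" || echo "payload too large" >&2` and then fall back to writing the large artifact to a file and referencing its path in `payload.raw`.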
374
+
368
375
---
369
376
370
377
## Run modes
@@ -375,13 +382,13 @@ Most users end up using both modes: strict when authoring and evolving a suite,
375
382
376
383
### Strict mode
377
384
378
-
Strict mode is the default. Use it when you want contract breaks to fail the run with a non-zero exit code. In strict mode a script must emit exactly one schema-valid boundary record on stdout; if it emits invalid JSON, violates `boundaries.json`, mismatches `script.id`, exits non-zero, or violates an enforced gate, the run fails and no record is emitted for that script.
385
+
Strict mode is the default. Use it when you want contract breaks to fail the run with a non-zero exit code. In strict mode a script must emit exactly one schema-valid boundary record on stdout; if it emits invalid JSON, violates `boundaries.json`, mismatches `script.id`, emits an unknown `result.outcome`, exits non-zero, or violates an enforced gate, the run fails and no record is emitted for that script.
379
386
380
387
Strict mode fails fast: the runner stops at the first script-level contract break, and any remaining scripts are not executed.
381
388
382
389
### Supervised mode
383
390
384
-
Supervised mode (`--supervised`) is for pipelines where a well-formed NDJSON stream matters more than perfect script behavior. `fencerunner` will output one record per script; when a script breaks the contract it emits a synthetic error record that captures stdout/stderr snippets and explains what happened. Supervised exits `0` unless preflight or the runner itself fails (missing contracts, invalid contracts, script not executable, duplicate script ids, and similar harness-level failures).
391
+
Supervised mode (`--supervised`) is for pipelines where a well-formed NDJSON stream matters more than perfect script behavior. `fencerunner` will output one record per script; when a script breaks the contract (including unknown outcomes) it emits a synthetic error record that captures stdout/stderr snippets and explains what happened. Supervised exits `0` unless preflight or the runner itself fails (missing contracts, invalid contracts, script not executable, duplicate script ids, and similar harness-level failures).
385
392
386
393
---
387
394
@@ -397,6 +404,8 @@ Some examples below use `jq` for reporting; `fencerunner` does not ship it.
397
404
398
405
Commitment ids are **simple tokens**: `^[A-Za-z0-9_.-]+$` (letters/digits plus `_`, `.`, `-`). If you need spaces, slashes, or other punctuation, put that detail into `payload.raw` or `operation.args` and keep the commitment id as the stable label. If you call `commit_help_me` with an invalid id, it fails and your script should treat that as a hard error.
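The token rule can be mirrored in plain shell if you want to validate ids before calling `commit_help_me`. This is a minimal sketch; `is_valid_commitment_id` is a hypothetical helper, and `commit_help_me` remains the authority on what it accepts.

```shell
# Hypothetical helper, not part of fencerunner.
# Mirrors ^[A-Za-z0-9_.-]+$ with a case pattern: reject the empty string
# and anything containing a character outside the allowed set.
LC_ALL=C; export LC_ALL   # keep A-Z / a-z ranges ASCII-only

is_valid_commitment_id() {
  case "$1" in
    ''|*[!A-Za-z0-9_.-]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, `is_valid_commitment_id "fs.read_ok"` succeeds, while `is_valid_commitment_id "fs/read ok"` fails because of the slash and the space.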
399
406
407
+
`commit_help_me` treats duplicate enrollments as a contract break. If a script calls the same `<verb> <commitment.id>` pair twice, `commit-help-me` exits non-zero and the script should fail fast.
408
+
400
409
### Branching canaries
401
410
402
411
Sometimes you want the thinnest possible instrumentation: “did this code path run?”
@@ -573,6 +582,8 @@ In both run dirs, create the triad by copying the same `gates.json`, `commitment
573
582
574
583
Now add one script per run dir (note: ids must be globally unique across the whole run):
575
584
585
+
Script ids must be globally unique in a single run. If you run `fencerunner ./dirA ./dirB` and both contain `probe.sh` (id `probe`), preflight fails with a “Duplicate script id …” error. Fix: rename one of the scripts (or split runs).
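To spot collisions before invoking the runner, you could list the filename stems of the top-level `*.sh` files in each run dir and print any stem that appears more than once. This is a sketch with a hypothetical helper name (`find_duplicate_ids`); it assumes ids equal filename stems, as the contract requires, and does not replicate the runner's actual preflight.

```shell
# Hypothetical helper, not part of fencerunner.
# Prints each script id (filename stem) that occurs more than once
# across the given run dirs; prints nothing when all ids are unique.
find_duplicate_ids() {
  for dir in "$@"; do
    for script in "${dir}"/*.sh; do
      [ -e "${script}" ] || continue   # skip the unexpanded glob in empty dirs
      basename "${script}" .sh
    done
  done | sort | uniq -d
}
```

With the layout from the example above, `find_duplicate_ids ./dirA ./dirB` would print `probe`.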
Tip: run the same invocation with `--supervised` to get a synthetic record that captures stdout/stderr snippets.
1060
1071
1072
+
### “Payload exceeds 16384 bytes …”
1073
+
1074
+
- `emit-record` caps `payload` (as serialized JSON) at 16 KiB (16384 bytes) to keep streams compact and predictable.
1075
+
- Keep `payload.raw` summary-sized and write large artifacts to files (then reference paths/hashes in `payload.raw`).
1076
+
- Keep `payload.stdout_snippet` / `payload.stderr_snippet` small by trimming captured output before passing it to `emit-record`.
1077
+
1061
1078
### “record violates boundaries.json”
1062
1079
1063
1080
- Your script emitted JSON, but it didn’t satisfy the schema in `boundaries.json`.
@@ -1094,29 +1111,3 @@ If you want supervised mode *and* strict schema validation of synthetic records,
1094
1111
- `operation.kind = "harness.supervised"`
1095
1112
1096
1113
Keep this in the “advanced” bucket unless you have consumers that require “every line validates against `boundaries.json` even for synthetic errors”.
1097
-
1098
-
---
1099
-
1100
-
## Signal audit grab bag
1101
-
1102
-
>Loose ends worth validating.
1103
-
1104
-
### Payload size limits
1105
-
1106
-
`emit-record` (and supervised synthetic records) keep records compact and predictable by enforcing a size limit and truncating snippets:
1107
-
1108
-
- `payload` (as serialized JSON) is capped at 16 KiB (16384 bytes).
1109
-
- `payload.stdout_snippet` and `payload.stderr_snippet` are NUL-stripped and truncated to 2000 characters (with an ellipsis).
1110
-
- Common failure: `Payload exceeds 16384 bytes (got N)`; keep `payload.raw` summary-sized and write large artifacts to files (then reference paths/hashes in `payload.raw`).
1111
-
1112
-
### Duplicate commitment enrollments
1113
-
1114
-
`commit_help_me` treats duplicates as a contract break. If a script calls the same `<verb> <commitment.id>` pair twice, `commit-help-me` exits non-zero and the script should fail fast.
1115
-
1116
-
### Duplicate script ids across run dirs
1117
-
1118
-
Script ids must be globally unique in a single run. If you run `fencerunner ./dirA ./dirB` and both contain `probe.sh` (id `probe`), preflight fails with a “Duplicate script id …” error. Fix: rename one of the scripts (or split runs).
1119
-
1120
-
### Outcome vocabulary
1121
-
1122
-
`fencerunner` enforces a fixed outcome vocabulary: `success|denied|partial|error` (and `emit-record` enforces it too). If a script emits any other `result.outcome`, strict mode fails and supervised mode emits a synthetic error record. If you need richer result semantics, keep `result.outcome` in this vocabulary and encode your extra meaning under `operation.*` or `payload.raw`.