- Enforce invariant-level proptest coverage across epistemic math, signature validation, fallback behavior, and router/accounting logic.
- Run with deterministic proptest configuration (`PROPTEST_CASES=96`, `PROPTEST_RNG_SEED=424242`, `PROPTEST_RNG_ALGORITHM=cc`) so CI/local results are reproducible.
- Fail fast if any scoped property-test suite accidentally runs zero tests (false-green guardrail).
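The false-green guardrail above can be sketched as a small shell helper. This is a minimal illustration, not the repository's actual gate script: the function names `count_executed_tests` and `assert_tests_ran` are hypothetical, and the sketch assumes the suite output contains the standard `running N tests` lines that `cargo test` prints per test binary.

```shell
#!/usr/bin/env bash
# Deterministic proptest configuration, using the values quoted in the purpose list.
export PROPTEST_CASES=96
export PROPTEST_RNG_SEED=424242
export PROPTEST_RNG_ALGORITHM=cc

# Hypothetical helper: sum the N from every "running N tests" line on stdin.
count_executed_tests() {
  awk '/^running [0-9]+ tests?$/ { total += $2 } END { print total + 0 }'
}

# Hypothetical helper: fail when a scoped suite ran zero tests.
# Usage (the filter variable is an assumption):
#   cargo test "$filter" 2>&1 | assert_tests_ran "suite name"
assert_tests_ran() {
  local suite="$1" count
  count="$(count_executed_tests)"
  if [ "$count" -eq 0 ]; then
    echo "FAIL: $suite executed zero tests" >&2
    return 1
  fi
  echo "OK: $suite executed $count tests"
}
```

Counting executed tests separately from the exit status matters because a test filter that matches nothing still exits successfully.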
### Layer 2.9: Claude adapter end-to-end efficacy gate
```bash
make claude-adapter-gate
```
Purpose:
- Validate realistic Claude adapter observe/orient/decide/act scenarios, not just activation plumbing.
-|`rlm-claude-code`|`54d88c085851fdc08028f3c1835527979645ffe5`| pinned `vendor/loop` = `6779cdbc970c70f3ce82a998d6dcda59cd171560`| Hard runtime/build vendoring (`rlm_core`) |`VG-RCC-001`, `VG-CONTRACT-001`|`evidence/2026-02-20/milestone-M7/M7-T10-VG-RCC-001.txt`|`supported`| Pin-aware scope only (D-008). Not a claim for `/Users/rand/src/loop` HEAD unless the pin is updated and rerun. |
-|`loop-agent`|active committed canonical: `30c1fa786d79e0984cf464ffb8e67cc7a1bfcaeb`; historical promotion candidate: `f2aeb1859592ef82f63f6ae416973854c381666b` (`/tmp/loop-agent-clean`) |`/Users/rand/src/loop` runtime seam contract | Optional runtime seam (classifier + trajectory + sensitivity guardrails) |`VG-LA-001`, `VG-CONTRACT-001`|`evidence/2026-02-20/milestone-M7/M7-T10-VG-LA-001.txt`|`supported`| D-017 clean-clone tuple policy remains in force; latest seam-critical run is green (`30 passed`) on `/tmp/loop-agent-clean` using shared toolchain interpreter with clean tuple source under test. |
-|`io-rflx`|`abf11ca4069bac7a740508d02242114483a6cf51`| schema-first interop with loop |`io_rflx_interop.v0`|`VG-RFLX-001`, `VG-RFLX-002`, `VG-CONTRACT-001`|`evidence/2026-02-20/milestone-M7/M7-T09-validation-summary.md`|`supported`| Compile + contract validation scope remains additive and schema-first. `VG-RFLX-002` now validates loop-owned fixture corpus + calibration policy and targeted io-rflx roundtrip serialization tests with isolated `CARGO_TARGET_DIR`. |
+|`rlm-claude-code`|`528f90018e0d464aa7e7459998191d8cfde27787`|loop candidate `75f806f85985302c498e9d8e4915af6f144ed6ad`; pinned `vendor/loop` = `6779cdbc970c70f3ce82a998d6dcda59cd171560`| Hard runtime/build vendoring (`rlm_core`) |`VG-RCC-001`, `VG-CONTRACT-001`|`evidence/2026-02-20/post-review-hardening/loop-5ut.6-weekly-cadence/weekly-cadence-m4/M4-T04-VG-RCC-001.txt`|`supported`| Pin-aware scope only (D-008). Candidate loop SHA differs from vendor pin; result scope is validated for the pinned vendor tuple plus compatibility check of the current loop candidate. |
+|`loop-agent`|`2f4e762fbdb6fe40a00fe40b5df67b00b85dbb29` (canonical `dp/loop-agent`) | loop tuple `75f806f85985302c498e9d8e4915af6f144ed6ad` via clean-clone committed mode | Optional runtime seam (classifier + trajectory + sensitivity guardrails) |`VG-LA-001`, `VG-CONTRACT-001`|`evidence/2026-02-20/post-review-hardening/loop-5ut.6-weekly-cadence/weekly-cadence-m4/M4-T04-VG-LA-001.txt`|`supported`| D-017 policy is in force; claim-grade run used `/tmp/loop-agent-clean-cadence` clean clone, with advisory `VG-LA-002` snapshot green (`1052 passed`). |
+|`io-rflx`|`abf11ca4069bac7a740508d02242114483a6cf51`|loop tuple `75f806f85985302c498e9d8e4915af6f144ed6ad` (schema-first interop) |`io_rflx_interop.v0`|`VG-RFLX-001`, `VG-RFLX-002`, `VG-CONTRACT-001`|`evidence/2026-02-20/post-review-hardening/loop-5ut.6-weekly-cadence/weekly-cadence-m4/M4-T04-VG-RFLX-001.txt` + `evidence/2026-02-20/post-review-hardening/loop-5ut.6-VG-RFLX-002.txt`|`supported`| Compile + contract validation remains additive and schema-first; fixture roundtrip/calibration checks rerun on refreshed tuple with isolated `CARGO_TARGET_DIR`. |
docs/execution-plan/VALIDATION-MATRIX.md
Lines changed: 9 additions & 2 deletions
@@ -18,6 +18,9 @@ This matrix defines mandatory validation gates for milestone completion.
- For `VG-LA-002` promotion claims, evidence must be tied to committed consumer SHA state (D-015), not a dirty working tree.
- For M7 gates, evidence must map each gate result to a specific M7 task ID (`M7-T01`..`M7-T10`).
- For `VG-COVERAGE-001`, CI evidence from `.github/workflows/rlm-core-coverage.yml` is canonical when local environments cannot install `cargo-llvm-cov`.
- For `VG-PY-INTEGRATION-001`, all-skipped or no-tests-ran outcomes are gate failures.
- For `VG-PROPTEST-001`, run with deterministic seed/config (`PROPTEST_RNG_SEED`, `PROPTEST_CASES`, `PROPTEST_RNG_ALGORITHM`) and fail if any scoped suite executes zero tests.
- For `VG-CLAUDE-ADAPTER-E2E-001`, fail if fewer than two scenario tests execute (filter drift guardrail).
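The all-skipped and minimum-count rules in the list above could be enforced with a check on the number of passing tests reported by the runner. The following is a sketch under stated assumptions: `count_passed` and `require_min_passed` are hypothetical names, and the parsing assumes the summary text contains `N passed` fragments, as both `cargo test` and pytest summaries normally do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum every "N passed" fragment found on stdin
# (first occurrence per line). An all-skipped or no-tests-ran summary
# contains no such fragment and therefore yields 0.
count_passed() {
  awk '
    match($0, /[0-9]+ passed/) { total += substr($0, RSTART, RLENGTH) + 0 }
    END { print total + 0 }'
}

# Hypothetical helper: fail the gate when fewer than MIN tests passed.
# Usage (the command is a placeholder):
#   <test command> 2>&1 | require_min_passed VG-CLAUDE-ADAPTER-E2E-001 2
require_min_passed() {
  local gate="$1" min="$2" passed
  passed="$(count_passed)"
  if [ "$passed" -lt "$min" ]; then
    echo "FAIL: $gate ran $passed passing tests, need at least $min" >&2
    return 1
  fi
  echo "OK: $gate ran $passed passing tests"
}
```

With a minimum of 1 this covers the `VG-PY-INTEGRATION-001` rule, and with a minimum of 2 it covers the filter-drift guardrail. A real gate would also need to honor the runner's own exit status, which this sketch does not inspect.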
## Core Loop Gates
@@ -28,7 +31,11 @@ This matrix defines mandatory validation gates for milestone completion.
| VG-LOOP-IGNORED-REPL-001 | Unattended ignored subprocess-integration health (`rlm_repl` + Lean REPL spawn paths) |`LOOP_MIN_AVAILABLE_MIB=3072 /Users/rand/src/loop/scripts/safe_run.sh bash -lc 'cd /Users/rand/src/loop/rlm-core && cargo test --no-default-features --features gemini test_repl_spawn -- --ignored --test-threads=1 && cargo test --no-default-features --features gemini test_lean_repl_spawn -- --ignored --test-threads=1'`| Commands complete deterministically (expected runtime: usually < 120s total); no orphaned `rlm_repl`/Lean `repl` subprocesses remain; environment failures fail fast with actionable stderr and are triaged via troubleshooting checklist |`.../VG-LOOP-IGNORED-REPL-001.txt`|
| VG-LOOP-CORE-001 | Full `rlm-core` regression |`LOOP_MIN_AVAILABLE_MIB=3072 /Users/rand/src/loop/scripts/safe_run.sh bash -lc 'cd /Users/rand/src/loop/rlm-core && cargo test --no-default-features --features gemini'`| No failing tests |`.../VG-LOOP-CORE-001.txt`|
| VG-COVERAGE-001 | Reproducible line-coverage gate (`rlm-core`) |`LOOP_MIN_AVAILABLE_MIB=4096 /Users/rand/src/loop/scripts/safe_run.sh bash -lc 'cd /Users/rand/src/loop && make coverage'`| Coverage run succeeds and line coverage is >= 80% (`COVERAGE_MIN_LINES`) |`.../VG-COVERAGE-001.txt` plus `coverage/lcov.info` and `coverage/summary.txt`|
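As an illustration of the `VG-COVERAGE-001` pass criterion, line coverage can be recomputed from an lcov tracefile by summing its `LF:` (lines found) and `LH:` (lines hit) records. This is only a sketch and not necessarily how `make coverage` evaluates the threshold: the function names are hypothetical, and the 80 default mirrors the value quoted for `COVERAGE_MIN_LINES`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print overall line coverage (percent, two decimals)
# from the LF:/LH: records of an lcov tracefile.
lcov_line_coverage() {
  awk -F: '
    /^LF:/ { found += $2 }
    /^LH:/ { hit += $2 }
    END {
      if (found == 0) { print "0.00"; exit }
      printf "%.2f\n", 100 * hit / found
    }' "$1"
}

# Hypothetical helper: fail when coverage is below the minimum.
# The minimum comes from the second argument, else COVERAGE_MIN_LINES, else 80.
check_line_coverage() {
  local file="$1" min="${2:-${COVERAGE_MIN_LINES:-80}}" pct
  pct="$(lcov_line_coverage "$file")"
  if awk -v p="$pct" -v m="$min" 'BEGIN { exit !(p + 0 >= m + 0) }'; then
    echo "OK: line coverage ${pct}% >= ${min}%"
  else
    echo "FAIL: line coverage ${pct}% < ${min}%" >&2
    return 1
  fi
}
```

For example, `check_line_coverage coverage/lcov.info` would apply the default threshold to the artifact path listed in the evidence column.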
docs/execution-plan/evidence/2026-02-20/full-system-validation/VG-CONTRACT-001.md
Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,7 @@
# VG-CONTRACT-001
Date: 2026-02-20
Scope: Consumer contract consistency check against active implementations and tuple evidence
Status: Historical baseline. Refreshed tuple evidence is in `/Users/rand/src/loop/docs/execution-plan/evidence/2026-02-20/post-review-hardening/loop-5ut.6-VG-CONTRACT-001.md`.
Status: Historical baseline. Current tuple refresh is tracked in `/Users/rand/src/loop/docs/execution-plan/evidence/2026-02-20/post-review-hardening/loop-5ut.6-full-system-validation-refresh.md`.
Empirically validate end-to-end system behavior across intended loop use cases (not just static review), map execution to OODA flows, and identify/track remaining implementation or operational gaps.
@@ -47,7 +48,7 @@ Empirically validate end-to-end system behavior across intended loop use cases (