Commit 44ba7f6

feature: Evidence Freshness Management (FPF B.3.4)
1 parent 57e7233 commit 44ba7f6

12 files changed

+866
-77
lines changed

CHANGELOG.md

Lines changed: 11 additions & 0 deletions
@@ -87,6 +87,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
8787
- Added SQL indexes for efficient WLNK traversal.
8888
- Documented structural relations (B.1.1) in CLAUDE.md.
8989

90+
- **Evidence Freshness Management (FPF B.3.4)**:
91+
- New `waivers` table for tracking temporary risk acceptance with full audit trail.
92+
- `quint_check_decay` now supports three modes:
93+
- **Report mode** (default): Shows freshness report with STALE/FRESH/WAIVED holons.
94+
- **Deprecate mode**: Downgrades hypothesis (L2→L1 or L1→L0) when evidence is terminally stale.
95+
- **Waive mode**: Records explicit risk acceptance with rationale and expiration date.
96+
- `quint_test` now accepts L2 hypotheses for evidence refresh (L2 + PASS stays L2 with fresh evidence).
97+
- Freshness report now shows individual evidence IDs (not just counts) for actionable output.
98+
- Implements WLNK principle: one expired evidence item = entire holon is STALE.
99+
- Updated command documentation: `q-decay.md` and `q3-validate.md`.
100+
90101
- **CI/CD Pipeline**:
91102
- New GitHub Actions workflow (`.github/workflows/ci.yml`) for pull requests.
92103
- Triggers on PRs and pushes to `main` and `dev` branches.

src/mcp/cmd/commands/q-decay.md

Lines changed: 252 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -1,33 +1,258 @@
1-
# q-decay: Assurance Maintenance
1+
# q-decay: Evidence Freshness Management
22

33
## Intent
4-
Identifies **Epistemic Debt (ED)** by finding holons with expired evidence that need re-validation.
54

6-
## Action (Run-Time)
7-
1. Call `quint_check_decay` to get a report of all holons with expired evidence.
8-
2. Review the output - it shows:
9-
- Which holons have stale evidence
10-
- How many evidence items are expired
11-
- How many days overdue
12-
3. Present findings to the user with recommendations.
5+
Manages **evidence freshness** by identifying stale decisions and providing governance actions. Implements FPF B.3.4 (Evidence Decay).
136

14-
## Tool Guide
7+
**Key principle:** Evidence is perishable. Decisions built on expired evidence carry hidden risk.
8+
9+
---
10+
11+
## Quick Concepts
12+
13+
### What is "stale" evidence?
14+
15+
Every piece of evidence has a `valid_until` date. A benchmark from 6 months ago may no longer reflect current system performance. A security audit from before a major dependency update doesn't account for new vulnerabilities.
16+
17+
When evidence expires, the decision it supports becomes **questionable** — not necessarily wrong, just unverified.
18+
19+
### What is "waiving"?
20+
21+
**Waiving = "I know this evidence is stale, I accept the risk temporarily."**
22+
23+
Use it when:
24+
- You're about to launch and don't have time to re-run all tests
25+
- The evidence is only slightly expired and probably still valid
26+
- You have a scheduled date to refresh it properly
27+
28+
A waiver is NOT ignoring the problem — it's **explicitly documenting** that you know about the risk and accept it until a specific date.
29+
30+
### The Three Actions
31+
32+
| Situation | Action | What it does |
33+
|-----------|--------|--------------|
34+
| Evidence is old but decision is still good | **Refresh** | Re-run the test, get fresh evidence |
35+
| Decision is obsolete, needs rethinking | **Deprecate** | Downgrade hypothesis, restart evaluation |
36+
| Accept risk temporarily | **Waive** | Record the risk acceptance with deadline |
37+
38+
---
39+
40+
## Natural Language Usage
41+
42+
**You don't need to memorize evidence IDs or tool parameters.** Just describe what you want.
43+
44+
### Example Workflow
45+
46+
```
47+
User: /q-decay
48+
49+
Agent shows report:
50+
## Evidence Freshness Report
51+
52+
### STALE (1 holon requires action)
53+
54+
#### Use Redis for Caching (L2)
55+
| ID | Type | Status | Details |
56+
|----|------|--------|---------|
57+
| ev-benchmark-2024-06-15 | validation | EXPIRED | 180 days overdue |
58+
| ev-load-test-2024-06-20 | validation | EXPIRED | 175 days overdue |
59+
60+
Actions:
61+
→ Refresh: /q3-validate hypothesis-redis-caching
62+
→ Deprecate: Tell me to deprecate this decision
63+
→ Waive: Tell me which evidence to waive and until when
64+
```
65+
66+
Now the user can respond naturally:
67+
68+
```
69+
User: Waive the benchmark until February, we'll re-run it after the migration.
70+
```
71+
72+
The agent reads the context, finds the matching evidence ID, and calls:
73+
```
74+
quint_check_decay(
75+
waive_id: "ev-benchmark-2024-06-15",
76+
waive_until: "2025-02-01",
77+
waive_rationale: "Will re-run after the migration"
78+
)
79+
```
80+
81+
**You don't need to copy IDs. Just reference what you mean.**
82+
83+
### More Natural Examples
84+
85+
```
86+
User: /q-decay
87+
88+
[sees report with stale evidence]
89+
90+
User: We're launching next week. Waive everything until January 15th,
91+
we have a full test cycle scheduled for January 20th.
92+
93+
Agent: I'll waive both pieces of expired evidence until 2025-01-15
94+
with rationale "Launch deadline, full test cycle scheduled Jan 20"
95+
96+
[calls quint_check_decay for each]
97+
98+
---
99+
100+
User: The vendor API is being discontinued. Deprecate that decision.
101+
102+
Agent: Deprecating hypothesis-vendor-api from L2 to L1.
103+
You'll need to run /q1-hypothesize to explore alternatives.
104+
105+
---
106+
107+
User: The load test is only 2 weeks overdue, I think it's still valid.
108+
Can we just refresh it?
109+
110+
Agent: Running /q3-validate to refresh the evidence for hypothesis-redis-caching.
111+
[calls quint_test]
112+
```
113+
114+
---
115+
116+
## How Evidence IDs Work
117+
118+
Evidence IDs are generated automatically when tests run:
119+
- Format: `ev-{type}-{date}` (e.g., `ev-benchmark-2024-06-15`)
120+
- Shown in the freshness report
121+
- **You don't need to remember them** — describe what you want, the agent finds the right ID
122+
123+
If you want to be explicit, you can use the full syntax:
124+
```
125+
User: /q-decay --waive ev-benchmark-2024-06-15 --until 2025-02-01 --rationale "Migration pending"
126+
```
127+
128+
But natural language works fine:
129+
```
130+
User: Waive the benchmark until February, migration is pending.
131+
```
132+
133+
---
134+
135+
## Detailed Actions
136+
137+
### Refresh (via /q3-validate)
138+
139+
**When:** Evidence is stale, but the decision is still correct. You just need fresh proof.
140+
141+
**What happens:**
142+
1. Agent calls `quint_test` on the hypothesis
143+
2. New evidence is recorded with a fresh `valid_until` date
144+
3. Holon status changes from STALE to FRESH
145+
146+
**Example:**
147+
```
148+
User: The Redis benchmark is 6 months old but Redis is still the right choice.
149+
Can we refresh the evidence?
150+
151+
Agent: Running validation for hypothesis-redis-caching...
152+
[executes benchmark]
153+
Evidence refreshed. Hypothesis remains at L2 with fresh evidence valid until 2025-06-21.
154+
```
155+
156+
### Deprecate
157+
158+
**When:** The decision itself is questionable. The world has changed, you need to reconsider.
159+
160+
**What happens:**
161+
1. Hypothesis moves down one level (L2→L1 or L1→L0)
162+
2. Audit log records the layer change (from, to), who deprecated it, and when
163+
3. You're prompted to run `/q1-hypothesize` to explore alternatives
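
The downgrade step above can be sketched as follows. This is a minimal illustration, not the project's implementation: the names `DOWNGRADE`, `DeprecationRecord`, and `deprecate` are hypothetical, the record fields mirror the audit fields listed in this document (from_layer, to_layer, who, when), and rejecting L0 is an assumption because the source only describes L2→L1 and L1→L0.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# One-level downgrades described in the text: L2 -> L1 and L1 -> L0.
DOWNGRADE = {"L2": "L1", "L1": "L0"}


@dataclass(frozen=True)
class DeprecationRecord:
    """Hypothetical audit entry for one deprecation."""
    hypothesis_id: str
    from_layer: str
    to_layer: str
    who: str
    when: datetime


def deprecate(hypothesis_id: str, layer: str, who: str) -> DeprecationRecord:
    """Move a hypothesis down exactly one layer and return the audit entry."""
    if layer not in DOWNGRADE:
        # Assumption: anything other than L1/L2 (including L0) cannot be downgraded.
        raise ValueError(f"cannot deprecate a hypothesis at layer {layer!r}")
    return DeprecationRecord(
        hypothesis_id=hypothesis_id,
        from_layer=layer,
        to_layer=DOWNGRADE[layer],
        who=who,
        when=datetime.now(timezone.utc),
    )
```

Calling `deprecate("hypothesis-aws-service", "L2", "alice")` would yield a record with `from_layer="L2"` and `to_layer="L1"`, matching the example transcript that follows.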
164+
165+
**Example:**
166+
```
167+
User: AWS announced they're deprecating this service. Deprecate our decision to use it.
168+
169+
Agent: Deprecated: hypothesis-aws-service L2 → L1
170+
171+
This decision now requires re-evaluation.
172+
Next step: Run /q1-hypothesize to explore alternatives.
173+
```
174+
175+
### Waive
176+
177+
**When:** You know the evidence is stale, you accept the risk temporarily.
178+
179+
**What happens:**
180+
1. Waiver is recorded with: who, until when, why
181+
2. Evidence no longer shows as EXPIRED (shows as WAIVED), so it no longer makes the holon STALE
182+
3. When the waiver expires, evidence returns to EXPIRED status
183+
4. Full audit trail preserved
184+
185+
**Example:**
186+
```
187+
User: We're launching Tuesday. Waive the security audit until the 15th,
188+
we have a re-audit scheduled for the 20th.
189+
190+
Agent: Waiver recorded:
191+
- Evidence: ev-security-audit-2024-03
192+
- Waived until: 2025-01-15
193+
- Rationale: Launch deadline. Re-audit scheduled for the 20th.
194+
195+
⚠️ This evidence returns to EXPIRED status after 2025-01-15.
196+
```
197+
198+
---
199+
200+
## Tool Reference (for understanding, not memorization)
15201

16202
### `quint_check_decay`
17-
Scans all evidence and identifies expired items. Implements **Evidence Decay (B.3.4)**.
18-
- *No parameters required.*
19-
- *Returns:* Markdown report listing:
20-
- Holons with expired evidence
21-
- Count of expired evidence per holon
22-
- Days overdue
23-
- Recommendation to run `/q3-validate` for affected holons
24-
25-
## Example
26-
User: `/q-decay`
27-
28-
Agent calls:
29-
1. `quint_check_decay()`
30-
31-
If expired evidence is found, recommend:
32-
- Run `/q3-validate <hypothesis_id>` to refresh evidence
33-
- Or run `/q4-audit <hypothesis_id>` to reassess reliability
203+
204+
The agent translates your natural language into these parameters:
205+
206+
| Parameter | What it means |
207+
|-----------|--------------|
208+
| (none) | Show the freshness report |
209+
| `deprecate` | Which hypothesis to downgrade |
210+
| `waive_id` | Which evidence to waive |
211+
| `waive_until` | When the waiver expires (YYYY-MM-DD) |
212+
| `waive_rationale` | Why you're accepting this risk |
213+
214+
---
215+
216+
## WLNK Principle
217+
218+
A holon is **STALE** if *any* of its evidence is expired (and not waived).
219+
220+
This is the Weakest Link (WLNK) principle: reliability = min(all evidence). One stale piece makes the whole decision questionable.
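
A minimal sketch of the weakest-link rule, assuming a simple in-memory model. The `Evidence` class, its `waived_until` field, and both function names are hypothetical, and treating the `valid_until` and waiver end dates as inclusive is an assumption. The example `valid_until` dates are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Evidence:
    evidence_id: str
    valid_until: date
    waived_until: Optional[date] = None  # hypothetical: end date of a waiver, if any


def evidence_status(ev: Evidence, today: date) -> str:
    """Classify one evidence item as FRESH, WAIVED, or EXPIRED."""
    if today <= ev.valid_until:
        return "FRESH"
    if ev.waived_until is not None and today <= ev.waived_until:
        return "WAIVED"
    return "EXPIRED"


def holon_status(evidence: List[Evidence], today: date) -> str:
    """Weakest link: one expired, unwaived item makes the whole holon STALE."""
    statuses = [evidence_status(ev, today) for ev in evidence]
    if "EXPIRED" in statuses:
        return "STALE"
    if "WAIVED" in statuses:
        return "WAIVED"
    return "FRESH"  # also the result for a holon with no evidence in this sketch


today = date(2025, 1, 10)
redis = [
    Evidence("ev-benchmark-2024-06-15", date(2024, 7, 15), waived_until=date(2025, 2, 1)),
    Evidence("ev-load-test-2024-06-20", date(2024, 7, 20)),
]
print(holon_status(redis, today))  # STALE: the load test is expired and not waived
```

Waiving only the benchmark is not enough here: the unwaived load test still drags the holon to STALE, which is the behavior the principle describes.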
221+
222+
---
223+
224+
## Audit Trail
225+
226+
All actions are logged for accountability:
227+
228+
| Action | What's Recorded |
229+
|--------|-----------------|
230+
| Deprecate | from_layer, to_layer, who, when |
231+
| Waive | evidence_id, until_date, rationale, who, when |
232+
233+
Waivers are stored in a dedicated table — you can query "who waived what and why" at any time.
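
As an illustration of the "who waived what and why" query, here is a sketch against an in-memory SQLite database. The real `waivers` schema is not shown in this diff, so the column names below (`evidence_id`, `waived_by`, `waived_until`, `rationale`, `created_at`) are assumptions derived from the fields in the table above, and `active_waivers` is a hypothetical helper. The sample row uses made-up values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE waivers (
        evidence_id  TEXT NOT NULL,
        waived_by    TEXT NOT NULL,
        waived_until TEXT NOT NULL,  -- ISO date (YYYY-MM-DD), so string comparison orders correctly
        rationale    TEXT NOT NULL,
        created_at   TEXT NOT NULL
    )
    """
)
conn.execute(
    "INSERT INTO waivers VALUES (?, ?, ?, ?, ?)",
    ("ev-benchmark-2024-06-15", "alice", "2025-02-01",
     "Will re-run after the migration", "2024-12-20"),
)


def active_waivers(conn: sqlite3.Connection, today: str):
    """Return (evidence_id, waived_by, waived_until, rationale) for waivers still in force."""
    rows = conn.execute(
        "SELECT evidence_id, waived_by, waived_until, rationale "
        "FROM waivers WHERE waived_until >= ? ORDER BY waived_until",
        (today,),
    )
    return rows.fetchall()


for row in active_waivers(conn, "2025-01-10"):
    print(row)
```

Storing dates as ISO strings keeps the sketch dependency-free; a real schema might use a native date or timestamp type instead.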
234+
235+
---
236+
237+
## Common Workflows
238+
239+
### Weekly Maintenance
240+
```
241+
/q-decay # See what's stale
242+
# For each stale item, tell the agent: refresh, deprecate, or waive
243+
```
244+
245+
### Pre-Release
246+
```
247+
/q-decay # Check for stale decisions
248+
# Either refresh evidence or explicitly waive with documented rationale
249+
# Waiver rationales become part of release documentation
250+
```
251+
252+
### After Major Change
253+
```
254+
# Dependency update, API change, security advisory...
255+
/q-decay # See what's affected
256+
# Deprecate obsolete decisions
257+
# Start new hypothesis cycle for replacements
258+
```

src/mcp/cmd/commands/q3-validate.md

Lines changed: 28 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,16 @@
11
---
22
description: "Validate (Induction)"
3-
pre: ">=1 L1 hypothesis exists"
4-
post: "each L1 processed → L2 (PASS) or invalid (FAIL) or L1 with feedback (REFINE)"
3+
pre: ">=1 L1 or L2 hypothesis exists"
4+
post: "L1 processed → L2 (PASS) or invalid (FAIL) or L1 with feedback (REFINE); L2 processed → refreshed evidence"
55
invariant: "test_type ∈ {internal, external}; verdict ∈ {PASS, FAIL, REFINE}"
66
required_tools: ["quint_test"]
77
---
88

99
# Phase 3: Induction (Validation)
1010

11-
You are the **Inductor** operating as a **state machine executor**. Your goal is to gather **Empirical Validation (EV)** for the L1 hypotheses to promote them to L2.
11+
You are the **Inductor** operating as a **state machine executor**. Your goal is to gather **Empirical Validation (EV)** for L1 hypotheses to promote them to L2.
12+
13+
**Also serves as the REFRESH action** in the Evidence Freshness governance loop (see `/q-decay`).
1214

1315
## Enforcement Model
1416

@@ -17,25 +19,27 @@ You are the **Inductor** operating as a **state machine executor**. Your goal is
1719
| Precondition | Tool | Postcondition |
1820
|--------------|------|---------------|
1921
| L1 hypothesis exists | `quint_test` | L1 → L2 (PASS) or → invalid (FAIL) |
22+
| L2 hypothesis exists (refresh) | `quint_test` | L2 → L2 with fresh evidence |
2023

2124
**RFC 2119 Bindings:**
22-
- You MUST have at least one L1 hypothesis before calling `quint_test`
23-
- You MUST call `quint_test` for EACH L1 hypothesis you want to validate
25+
- You MUST have at least one L1 or L2 hypothesis before calling `quint_test`
26+
- You MUST call `quint_test` for EACH hypothesis you want to validate or refresh
2427
- You MUST NOT call `quint_test` on L0 hypotheses — they must pass Phase 2 first
2528
- You SHALL specify `test_type` as "internal" (code test) or "external" (research/docs)
2629
- Verdict MUST be exactly "PASS", "FAIL", or "REFINE"
2730

28-
**If precondition fails:** Tool returns BLOCKED with message "hypothesis not found in L1". This is NOT a bug — it means you skipped Phase 2.
31+
**If precondition fails:** Tool returns BLOCKED with message "hypothesis not found in L1 or L2". This is NOT a bug — it means you skipped Phase 2.
2932

30-
**CRITICAL:** If you receive "not found in L1", you MUST NOT retry with the same hypothesis. Go back to Phase 2 first.
33+
**CRITICAL:** If you receive "not found in L1 or L2", you MUST NOT retry with the same hypothesis. Go back to Phase 2 first.
3134

3235
## Invalid Behaviors
3336

3437
- Calling `quint_test` on L0 hypothesis (WILL BE BLOCKED)
3538
- Calling `quint_test` on hypothesis that doesn't exist
3639
- Stating "validated via testing" without tool call
3740
- Proceeding to `/q4-audit` with zero L2 hypotheses
38-
- Attempting to validate a hypothesis that's already in L2
41+
42+
**Note:** Calling `quint_test` on L2 hypotheses is now VALID — it refreshes their evidence for the freshness governance loop.
3943

4044
## Context
4145
We have substantiated hypotheses (L1) that passed logical verification. We need evidence that they work in reality.
@@ -105,10 +109,25 @@ Result: Hypothesis remains L1. Phase 4 will find no L2 to audit. PROTOCOL VIOLAT
105109
## Checkpoint
106110

107111
Before proceeding to Phase 4, verify:
108-
- [ ] Queried L1 hypotheses (not L0, not L2)
112+
- [ ] Queried L1 hypotheses (not L0)
109113
- [ ] Called `quint_test` for EACH L1 hypothesis
110114
- [ ] Each call returned success (not BLOCKED)
111115
- [ ] At least one verdict was PASS (creating L2 holons)
112116
- [ ] Used valid test_type values (internal/external)
113117

114118
**If any checkbox is unchecked, you MUST complete it before proceeding.**
119+
120+
---
121+
122+
## Evidence Refresh (L2 → L2)
123+
124+
When called with an L2 hypothesis, `quint_test` adds fresh evidence without changing the layer.
125+
126+
**Use case:** `/q-decay` shows stale evidence on an L2 holon. Run `/q3-validate <hypothesis_id>` to refresh.
127+
128+
| Current Layer | Verdict | Outcome |
129+
|---------------|---------|---------|
130+
| L1 | PASS | Promotes to L2 |
131+
| L1 | FAIL | Stays L1 |
132+
| L2 | PASS | Stays L2, fresh evidence added |
133+
| L2 | FAIL | Stays L2, failure recorded, consider `/q-decay --deprecate` |
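
The outcome table can be read as a lookup from (layer, verdict) to the resulting layer. The sketch below encodes only the four rows shown; `OUTCOMES` and `apply_verdict` are hypothetical names, not part of `quint_test`, and raising an error for any other combination (for example L0, or a REFINE verdict) is an assumption made to keep the example small.

```python
from typing import Dict, Tuple

# (current layer, verdict) -> (resulting layer, note), copied from the table rows.
OUTCOMES: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("L1", "PASS"): ("L2", "promoted to L2"),
    ("L1", "FAIL"): ("L1", "stays L1"),
    ("L2", "PASS"): ("L2", "fresh evidence added"),
    ("L2", "FAIL"): ("L2", "failure recorded; consider deprecating"),
}


def apply_verdict(layer: str, verdict: str) -> Tuple[str, str]:
    """Look up the resulting layer for a test verdict; reject combinations not in the table."""
    try:
        return OUTCOMES[(layer, verdict)]
    except KeyError:
        raise ValueError(f"no outcome defined for layer={layer!r}, verdict={verdict!r}") from None


print(apply_verdict("L2", "PASS"))  # ('L2', 'fresh evidence added')
```

Keeping the transitions in a single table makes the refresh case explicit: an L2 hypothesis never changes layer through testing alone, so a downgrade has to go through the deprecate action.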

src/mcp/db/models.go

Lines changed: 9 additions & 0 deletions
