chore: docs update

m0n0x41d · m0n0x41d · commit ee32616c801b · 2025-12-21T21:53:23.000+04:00
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -96,6 +96,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   - `quint_test` now accepts L2 hypotheses for evidence refresh (L2 + PASS stays L2 with fresh evidence).
   - Freshness report now shows individual evidence IDs (not just counts) for actionable output.
   - Implements WLNK principle: one expired evidence item = entire holon is STALE.
+  - Natural language support: users can say "waive the benchmark until February" and the agent handles ID resolution.
+  - New documentation: `docs/evidence-freshness.md` — practical guide to managing stale evidence.
   - Updated command documentation: `q-decay.md` and `q3-validate.md`.
 
 - **CI/CD Pipeline**:
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -80,10 +80,18 @@ External evidence (documentation, benchmarks, research) is only valuable if it i
 
 Quint's assurance calculator applies a **Congruence Penalty** based on the CL, reducing the effective reliability of evidence that isn't a perfect match for your context.
 
-### Validity (Evidence Decay)
+### Validity (Evidence Freshness)
 
 **FPF Pattern:** B.3.4 Evidence Decay & Epistemic Debt
 
-Evidence is perishable. A performance benchmark from two years ago is less trustworthy than one from last week because the context (libraries, hardware, compilers) has likely changed.
+Evidence expires. That benchmark you ran six months ago? The library has been updated twice since then. Your numbers might not be accurate anymore.
 
-Every piece of evidence in Quint has a `valid_until` date. The `/q-decay` command scans for expired evidence, and the assurance calculator automatically penalizes the reliability of claims that depend on it. This system makes the "staleness" of knowledge visible and manageable, preventing you from making critical decisions based on outdated information.
+Every piece of evidence has a `valid_until` date. When evidence expires, the decision it supports becomes **questionable** — not necessarily wrong, just unverified. The `/q-decay` command shows you what's stale and lets you:
+
+- **Refresh** — Re-run tests to get fresh proof
+- **Deprecate** — Downgrade the hypothesis if the decision needs rethinking
+- **Waive** — Accept the risk temporarily with documented rationale
+
+This makes hidden risk visible. You know exactly which decisions are operating on outdated assumptions.
+
+See [Evidence Freshness](evidence-freshness.md) for the full guide.
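
The Deprecate action listed in this hunk is described in the guide added below as a one-level downgrade (L2→L1 or L1→L0), logged with `from_layer`, `to_layer`, `who`, and `when`. A minimal Python sketch of that rule follows. The names `LAYERS`, `DeprecationRecord`, and `deprecate` are hypothetical and are not Quint's actual API, and refusing to downgrade below L0 is an assumption, since the docs do not say what happens there.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# The three layers mentioned in this diff, lowest to highest.
LAYERS = ["L0", "L1", "L2"]


@dataclass(frozen=True)
class DeprecationRecord:
    """Audit fields the guide lists for a Deprecate action (hypothetical type)."""
    from_layer: str
    to_layer: str
    who: str
    when: datetime


def deprecate(layer: str, who: str, now: datetime) -> DeprecationRecord:
    """Downgrade by exactly one level: L2 -> L1 or L1 -> L0."""
    index = LAYERS.index(layer)  # raises ValueError for an unknown layer
    if index == 0:
        # Assumption: the docs do not cover deprecating an L0 hypothesis.
        raise ValueError("L0 cannot be downgraded further")
    return DeprecationRecord(layer, LAYERS[index - 1], who, now)


record = deprecate("L2", who="alice", now=datetime(2025, 1, 10, tzinfo=timezone.utc))
print(record.from_layer, "->", record.to_layer)  # prints "L2 -> L1"
```

Returning the record instead of mutating state keeps the sketch focused on the two things the docs specify: the one-step downgrade and the fields that get logged.
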
diff --git a/docs/evidence-freshness.md b/docs/evidence-freshness.md
@@ -0,0 +1,181 @@
+# Evidence Freshness
+
+Evidence has an expiration date. This guide explains why that matters and what to do about it.
+
+## Why Evidence Expires
+
+Imagine you benchmarked Redis vs Memcached six months ago. Redis won. You made the decision, recorded the DRR, moved on.
+
+Now it's six months later. The Memcached team shipped a major performance update. Your Node.js version changed. The benchmark numbers you relied on? They might not be accurate anymore.
+
+**The decision isn't necessarily wrong — it's just unverified.**
+
+This is what FPF calls **Evidence Decay**. Every piece of evidence has a `valid_until` date. When that date passes, the evidence is "stale" and the decisions built on it become questionable.
+
+## The Problem with Stale Evidence
+
+Stale evidence creates hidden risk. You're operating on assumptions that haven't been re-checked. Maybe they're still true. Maybe they're not. You don't know.
+
+Quint Code makes this visible instead of hiding it.
+
+## Checking Your Evidence
+
+Run `/q-decay` to see what's stale:
+
+```
+/q-decay
+```
+
+You'll get a freshness report showing which holons (here, the recorded decisions) have expired evidence:
+
+```
+## Evidence Freshness Report
+
+### STALE (1 holon requires action)
+
+#### Use Redis for Caching (L2)
+| Evidence ID | Type | Status | Details |
+|-------------|------|--------|---------|
+| ev-benchmark-2024-06-15 | validation | EXPIRED | 180 days overdue |
+| ev-load-test-2024-06-20 | validation | EXPIRED | 175 days overdue |
+
+### FRESH (3 holons)
+- Use TypeScript for API (L2)
+- PostgreSQL for primary storage (L2)
+- JWT for auth tokens (L2)
+```
+
+## Three Ways to Handle Stale Evidence
+
+When you see stale evidence, you have three options:
+
+### 1. Refresh — Re-run the test
+
+**When:** The decision is still correct; you just need fresh proof.
+
+**How:** Run `/q3-validate` on the hypothesis to gather new evidence.
+
+```
+User: The Redis benchmark is old but Redis is still the right choice.
+      Refresh the evidence.
+
+Agent: Running validation for hypothesis-redis-caching...
+       [executes benchmark]
+       Evidence refreshed. Valid until 2025-06-21.
+```
+
+### 2. Deprecate — Reconsider the decision
+
+**When:** The world has changed. The decision itself is questionable.
+
+**How:** Tell the agent to deprecate the hypothesis. It gets downgraded one level (L2→L1 or L1→L0).
+
+```
+User: AWS announced they're deprecating this service.
+      Deprecate that decision.
+
+Agent: Deprecated: hypothesis-aws-service L2 → L1
+       This decision now needs re-evaluation.
+       Run /q1-hypothesize to explore alternatives.
+```
+
+### 3. Waive — Accept the risk temporarily
+
+**When:** You know it's stale, but you're shipping Tuesday and can't re-test everything right now.
+
+**How:** Tell the agent to waive the evidence with a deadline and reason.
+
+```
+User: We're launching next week. Waive the benchmark until January 15th,
+      we have a full test cycle scheduled for the 20th.
+
+Agent: Waiver recorded:
+       - Evidence: ev-benchmark-2024-06-15
+       - Waived until: 2025-01-15
+       - Rationale: Launch deadline. Full test cycle Jan 20.
+
+       Warning: Returns to EXPIRED after 2025-01-15.
+```
+
+**A waiver is not ignoring the problem.** It's explicitly documenting that you know about the risk and accept it until a specific date. The waiver goes in the audit log — who waived what, why, and until when.
+
+## Natural Language Usage
+
+You don't need to memorize evidence IDs or parameters. Just describe what you want.
+
+The agent sees the freshness report and understands the context. When you say "waive the benchmark until February," it finds the right evidence ID and calls the tool for you.
+
+**These all work:**
+
+```
+"Waive everything until January 15th, we're launching"
+
+"The load test is only 2 weeks overdue, refresh it"
+
+"That API is being deprecated, deprecate our decision to use it"
+
+"Waive the security audit until the 15th with rationale: re-audit scheduled"
+```
+
+If you want to be explicit, you can spell out the evidence ID and parameters yourself:
+
+```
+/q-decay --waive ev-benchmark-2024-06-15 --until 2025-02-01 --rationale "Migration pending"
+```
+
+But natural language works fine.
+
+## The WLNK Principle
+
+A holon is **STALE** if *any* of its evidence is expired (and not waived).
+
+This is the Weakest Link (WLNK) principle. If you have three pieces of evidence and one is stale, the whole decision is questionable. You don't get to average it out.
+
+Think of it like a chain. Three strong links and one rusted link? The chain breaks at the rust.
+
+## Practical Workflows
+
+### Weekly Maintenance
+
+```
+/q-decay                    # What's stale?
+# For each item: refresh, deprecate, or waive
+```
+
+### Before a Release
+
+```
+/q-decay                    # Check for stale decisions
+# Either refresh evidence or explicitly waive with rationale
+# Waivers become part of release documentation
+```
+
+### After Major Changes
+
+Dependency update? API change? Security advisory?
+
+```
+/q-decay                    # What's affected?
+# Deprecate obsolete decisions
+# Start new hypothesis cycle for replacements
+```
+
+## Audit Trail
+
+Deprecate and waive actions are recorded in the audit log:
+
+| Action | What's Recorded |
+|--------|----------------|
+| Deprecate | from_layer, to_layer, who, when |
+| Waive | evidence_id, until_date, rationale, who, when |
+
+You can always answer: "Who waived what and why?"
+
+## Summary
+
+- Evidence expires. This is normal.
+- `/q-decay` shows you what's stale.
+- **Refresh** if the decision is still right and you just need new proof.
+- **Deprecate** if the decision needs rethinking.
+- **Waive** if you accept the risk temporarily (with documented rationale).
+- Talk naturally — the agent handles the details.
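
The WLNK rule and the waiver behaviour described in the guide above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Quint's implementation: the `Evidence` class, the `waived_until` field, and both function names are hypothetical, and both `valid_until` and the waiver deadline are treated as inclusive (evidence goes stale only once the date has passed).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Evidence:
    evidence_id: str
    valid_until: date
    waived_until: Optional[date] = None  # hypothetical field, set by a waiver


def evidence_status(ev: Evidence, today: date) -> str:
    """Return FRESH, WAIVED, or EXPIRED for one evidence item."""
    if today <= ev.valid_until:
        return "FRESH"
    # A waiver covers the item only up to and including its deadline.
    if ev.waived_until is not None and today <= ev.waived_until:
        return "WAIVED"
    return "EXPIRED"


def holon_is_stale(evidence: list[Evidence], today: date) -> bool:
    """WLNK: one expired, unwaived item makes the whole holon stale."""
    return any(evidence_status(ev, today) == "EXPIRED" for ev in evidence)


# Illustrative data only; the IDs and dates are made up for this sketch.
items = [
    Evidence("ev-benchmark", date(2024, 12, 1), waived_until=date(2025, 1, 15)),
    Evidence("ev-load-test", date(2025, 6, 1)),
]
print(holon_is_stale(items, date(2025, 1, 10)))  # False: the expired item is waived
print(holon_is_stale(items, date(2025, 1, 16)))  # True: the waiver has lapsed
```

Using `any` rather than a count or an average mirrors the chain analogy: a single expired, unwaived item is enough to flag the holon, no matter how many fresh items sit beside it.
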
diff --git a/docs/fpf-engine.md b/docs/fpf-engine.md
@@ -82,10 +82,19 @@ Compute trust scores using:
 | `/q-actualize` | Maintenance | Reconcile the knowledge base with recent code changes. |
 | `/q-reset` | Utility | Discard the current reasoning cycle. |
 
-### New Maintenance Commands
+### Maintenance Commands
 
-#### /q-decay (Evidence Decay)
-Over time, the evidence supporting your decisions can become stale. A benchmark from two years ago may not reflect the performance of a library today. This command implements the FPF principle of **Evidence Decay (B.3.4)**. It scans your evidence for expired `valid_until` dates and reports on the project's "Epistemic Debt"—the amount of risk you are carrying from outdated knowledge.
+#### /q-decay (Evidence Freshness)
+
+Evidence expires. A benchmark from six months ago might not reflect current performance. `/q-decay` shows you what's stale and gives you three options:
+
+- **Refresh** — Re-run tests to get fresh evidence
+- **Deprecate** — Downgrade the hypothesis if the decision needs rethinking
+- **Waive** — Accept the risk temporarily with documented rationale
+
+You can speak naturally: "waive the benchmark until February, we'll re-test after launch."
+
+See [Evidence Freshness](evidence-freshness.md) for the full guide.
 
 #### /q-actualize (Knowledge Reconciliation)
 This command serves as the **Observe** phase of the FPF's **Canonical Evolution Loop (B.4)**. It reconciles your documented knowledge with the current state of the codebase by: