docs: close Feature 143 - update all documentation

davidmatousek · claude · davidmatousek · commit 5181461ff688 · 2026-04-15T18:48:10.000-04:00
Product (PM): docs/product/02_PRD/INDEX.md (status to Delivered), docs/product/06_OKRs/README.md (Feature Delivery Log row F-143), docs/product/_backlog/BACKLOG.md (delivery flag)
Architecture (Architect): docs/architecture/README.md (ADR-024 added to inventory), CLAUDE.md (Recent Changes prepended with Feature 143 entry)
DevOps: no updates required (documentation-only ADR, zero CI/CD changes)
KB: docs/INSTITUTIONAL_KNOWLEDGE.md (KB-032: three-surface comparison pattern)
Delivery doc: specs/143-maestro-aivss-evaluation-adr/delivery.md
Cleanup: branch deleted (local + remote), 32 tasks complete + 1 N/A

Closes umbrella MAESTRO compliance discovery (Phases 1-3 = 084, 141, 082; Phase 4 = 143).
Follow-on: Issue #168 tracks AIVSS v1.0 + adopter watch.

Co-Authored-By: Claude <noreply@anthropic.com>
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -98,6 +98,18 @@ When invoked as a subagent (via Agent tool), return ONLY:
 - Review `agent-assignments.md` for workload distribution
 
 ## Recent Changes
+- **Feature 143**: MAESTRO Phase 4 — OWASP AIVSS Evaluation ADR
+  - **Documentation-only spike** closing the MAESTRO compliance umbrella (Phases 1-3 delivered in Features 136, 141, 082). Zero production code changes, zero schema changes, zero example regenerations.
+  - **New ADR-024** (`docs/architecture/02_ADRs/ADR-024-owasp-aivss-evaluation.md`, Status: Accepted 2026-04-15) records tachi's AIVSS posture as **diverge at the present time** (Option C). The existing four-dimensional weighted-sum composite (`(0.35 × CVSS 3.1) + (0.30 × Exploitability) + (0.20 × Reachability) + (0.15 × Scalability)`) remains the canonical scoring model. AIVSS v0.8 is documented as a peer agentic-AI scoring framework that tachi is aware of and intentionally non-aligned with. Cross-references ADR-020 (MAESTRO classification), ADR-019 (shared cross-agent definitions), and ADR-018 (baseline-aware scoring lineage).
+  - **Three-surface evaluation**: Surface A (dimension set): `Conflict` on CVSS (3.1 vs v4.0), `Gap` on tachi Exploitability and Scalability, and `No equivalent` on tachi Reachability and on the AIVSS 10-AARF agentic amplification set. Surface B (composite formula): tachi's weighted sum across 4 operational dimensions vs the AIVSS amplification model (`AIVSS = (CVSS_Base + AARS) × Mitigation_Factor`, where AARS consumes the headroom between the CVSS base score and 10.0). The two formulas cannot produce equivalent scores even with identical CVSS inputs. Surface C (severity bands): the Critical/High/Medium/Low thresholds align (AIVSS v0.8 §3.5.2 adopts the CVSS convention tachi also uses); **Surface C is the single point of structural alignment between the two frameworks**.
+  - **Five-criteria justification** (maturity, adoption, compatibility, effort, compliance value): AIVSS v0.8 is pre-1.0 (public review opens 2026-04-16), has no external adopter case studies, and adopting it would require a CVSS 3.1→4.0 migration plus a restructured composite formula. The decision rests most heavily on maturity (adopting a pre-1.0 framework into a stable pipeline introduces churn risk) and compatibility (structural divergence on Surfaces A and B).
+  - **Re-evaluation triggers**: tachi will re-evaluate when AIVSS publishes a stable v1.0 *and* at least one external adopter ships a case study. Neither condition currently holds.
+  - **Skill update**: `.claude/skills/tachi-risk-scoring/SKILL.md` gained a new `## AIVSS Relationship` section (80-200 words) with a relative link to ADR-024, giving skill consumers a runtime-adjacent pointer to the canonical ADR decision. Decision-noun consistency (SC-007) was verified between ADR-024 and SKILL.md.
+  - **Architecture component reference**: added to `docs/architecture/01_system_design/README.md` under "Feature 143". It documents the three artifacts in scope (ADR-024, the SKILL.md section, the conditional follow-on Issue) and the additive-only architectural posture. No existing agent, schema, script, or example file is modified.
+  - **Option C specifics**: because the decision is **diverge** (not Option A adoption or Option B supplementary field), no follow-on implementation Issue was filed per FR-007 conditionality. T023 (conditional Issue creation) was marked N/A.
+  - **Standards addition**: none. CVSS 3.1 remains the canonical base (no upgrade to CVSS v4.0); AIVSS v0.8 is documented as a peer framework in the ADR and is not listed in the Tech Stack as a dependency.
+  - **MAESTRO compliance umbrella closed**: Feature 084 (Phase 1: classification) + Feature 141 (Phase 2: cross-layer chains) + Feature 136 (Phase 3 correctness fix) + Feature 082 (detection tier lean refactor with MAESTRO ownership governance) + Feature 143 (Phase 4: AIVSS posture) together complete tachi's MAESTRO alignment stance.
+  - 32 tasks completed + 1 N/A (T023 skipped — Option C chosen). Governance: PM + Architect + Team-Lead sign-off. PR #167 squash-merged to main 2026-04-15.
 - **Feature 129**: Attack Tree Delta Sub-Agent
   - Extracted attack tree generation and delta reconciliation from the threat-report parent agent into a focused leaf sub-agent `tachi-attack-tree-delta` (`.claude/agents/tachi/attack-tree-delta.md`). Establishes the first parent-leaf agent decomposition in the tachi pipeline — threat-report remains the single entry point (Phase 5 dispatch unchanged); the delta sub-agent is invoked only by threat-report with four atomic inputs (Critical/High findings, delta_counts, baseline dir, output dir) and returns a structured JSON manifest. No orchestrator-level wiring changes.
   - Tools scope (least-privilege): `Read`, `Write`, `Glob`, `Grep`. Writes are confined to `attack-trees/` directory — sub-agent cannot modify `threats.md`, `threat-report.md`, or any file outside the attack tree subtree.
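The Feature 143 entry in the CLAUDE.md diff above contrasts tachi's weighted-sum composite with the AIVSS v0.8 amplification shape (Surface B). The following is a minimal Python sketch of both formulas under stated assumptions: every tachi dimension is taken to be on a 0.0 to 10.0 scale, and AARS is modelled (hypothetically) as an `amplification` fraction of the headroom between the CVSS base score and 10.0. The function names and the `amplification` / `mitigation_factor` parameters are illustrative only; they are not part of tachi or of the AIVSS specification, and the sketch ignores the CVSS 3.1 vs v4.0 version conflict recorded on Surface A.

```python
# Illustrative sketch only: function and parameter names are hypothetical,
# not tachi's or AIVSS's published API.

# Weights of the canonical tachi composite, as recorded in ADR-024.
TACHI_WEIGHTS = {
    "cvss": 0.35,
    "exploitability": 0.30,
    "reachability": 0.20,
    "scalability": 0.15,
}


def tachi_composite(cvss: float, exploitability: float,
                    reachability: float, scalability: float) -> float:
    """Weighted sum across tachi's four dimensions (each assumed 0.0-10.0)."""
    score = (
        TACHI_WEIGHTS["cvss"] * cvss
        + TACHI_WEIGHTS["exploitability"] * exploitability
        + TACHI_WEIGHTS["reachability"] * reachability
        + TACHI_WEIGHTS["scalability"] * scalability
    )
    return round(score, 2)


def aivss_v08(cvss_base: float, amplification: float,
              mitigation_factor: float) -> float:
    """AIVSS v0.8 shape: (CVSS_Base + AARS) x Mitigation_Factor.

    Assumption: AARS = amplification (0.0-1.0) times the headroom left
    between CVSS_Base and 10.0, so CVSS_Base + AARS never exceeds 10.0.
    """
    if not 0.0 <= amplification <= 1.0:
        raise ValueError("amplification must be in [0.0, 1.0]")
    aars = amplification * (10.0 - cvss_base)
    return round((cvss_base + aars) * mitigation_factor, 2)


if __name__ == "__main__":
    # Same CVSS base value (8.0) fed into both models.
    print(tachi_composite(8.0, 6.0, 5.0, 4.0))  # 6.2
    print(aivss_v08(8.0, 0.5, 1.0))             # 9.0
```

Under these assumptions the tachi composite lands at 6.2 while the AIVSS shape lands at 9.0 for the same CVSS base value. The exact AIVSS figure depends entirely on how the specification actually derives AARS and the mitigation factor; the point the sketch illustrates is only that one formula dilutes CVSS across operational dimensions while the other amplifies it toward 10.0.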
diff --git a/docs/INSTITUTIONAL_KNOWLEDGE.md b/docs/INSTITUTIONAL_KNOWLEDGE.md
@@ -663,6 +663,29 @@ Captured during structured delivery retrospective. Smooth sailing — everything
 
 ---
 
+### KB-032: Three-Surface Comparison Is a Reusable Pattern for External Scoring Framework Evaluation
+
+**Date**: 2026-04-15
+**Category**: Architecture / Decision Pattern
+**Source**: Feature 143 delivery retrospective
+**Severity**: Low (positive pattern)
+
+**Problem**: When evaluating whether tachi should adopt an external scoring framework (e.g., OWASP AIVSS), a single-axis comparison ("is the formula compatible?") collapses three structurally different concerns into one prose argument and obscures which axis actually drives the decision. Reviewers cannot replay the comparison, and the ADR ages poorly because no one can re-derive the decision from the summary.
+
+**Root Cause**: N/A — positive pattern observation from Feature 143's ADR-024 evaluation of OWASP AIVSS.
+
+**Solution**: Decompose the evaluation across **three orthogonal surfaces**: (1) **Dimension space**: which dimensions does each model score, and how does each one map onto the other model (Conflict / Gap / No equivalent)? (2) **Formula shape**: is the composite function a weighted sum, a product, or some other aggregation? (3) **Severity bands**: do the threshold ranges align even when the underlying numerics differ? Each surface produces an independent "compatible / incompatible / partial" verdict. The aggregate decision (Adopt-Primary / Adopt-Supplementary / Diverge) is justified by citing the per-surface verdicts. ADR-024 used this pattern to document tachi's divergence from AIVSS v0.8 with reproducible reasoning.
+
+**Result**: ADR-024 produced a CISO-readable Decision section that answers "does tachi align with AIVSS?" in one paragraph, backed by a worked-example table showing per-surface findings. Architect approval cited the three-surface decomposition as making the rationale auditable. The pattern is reusable for any future framework-evaluation ADR (CVSS variants, future AI risk frameworks, alternative composite scoring schemes).
+
+**When to Apply**: Any ADR that evaluates an external scoring framework, taxonomy, or numeric model against a tachi-native equivalent. Use the three-surface decomposition as the structural backbone of the Alternatives Considered and Decision sections. Combine with a "When to Re-Evaluate" trigger clause (concrete external event — version release, adopter case study, regulatory citation) so the ADR has a built-in expiration condition rather than aging silently.
+
+**Tags**: #architecture #decision-pattern #adr #scoring #framework-evaluation #three-surface #feature-143
+
+**Quality Score**: 7/10
+
+---
+
 ## Bug Fixes
 
 *No entries yet. Use `/kb-create` to add the first bug fix.*
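The KB-032 entry in the diff above describes one verdict per surface and an aggregate decision that cites those verdicts. The following is a hypothetical Python encoding of that structure. The aggregation rule is an assumed simplification, not something KB-032 or ADR-024 specifies: all surfaces compatible yields Adopt-Primary, no incompatible surface yields Adopt-Supplementary, and anything else yields Diverge. ADR-024 also weighed five criteria (maturity, adoption, compatibility, effort, compliance value) that a rule like this does not capture.

```python
from enum import Enum

# Hypothetical encoding of the KB-032 three-surface pattern; not a tachi API.


class Verdict(Enum):
    COMPATIBLE = "compatible"
    PARTIAL = "partial"
    INCOMPATIBLE = "incompatible"


SURFACES = ("dimension_space", "formula_shape", "severity_bands")


def aggregate(verdicts: dict[str, Verdict]) -> tuple[str, list[str]]:
    """Return (decision, rationale) from exactly one Verdict per surface.

    Assumed rule: all compatible -> Adopt-Primary; no incompatible
    surface -> Adopt-Supplementary; any incompatible surface -> Diverge.
    """
    missing = [s for s in SURFACES if s not in verdicts]
    if missing:
        raise ValueError(f"missing verdicts for: {', '.join(missing)}")
    ordered = [verdicts[s] for s in SURFACES]
    if all(v is Verdict.COMPATIBLE for v in ordered):
        decision = "Adopt-Primary"
    elif not any(v is Verdict.INCOMPATIBLE for v in ordered):
        decision = "Adopt-Supplementary"
    else:
        decision = "Diverge"
    # Cite every per-surface verdict so a reviewer can replay the decision.
    rationale = [f"{s}: {verdicts[s].value}" for s in SURFACES]
    return decision, rationale


if __name__ == "__main__":
    # Per-surface outcomes as summarised for ADR-024 (AIVSS v0.8).
    decision, rationale = aggregate({
        "dimension_space": Verdict.INCOMPATIBLE,
        "formula_shape": Verdict.INCOMPATIBLE,
        "severity_bands": Verdict.COMPATIBLE,
    })
    print(decision)  # Diverge
    print(rationale)
```

Keeping the rationale as a list of per-surface verdicts, rather than a single prose string, is what makes the decision replayable when a future ADR revisits the comparison.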
diff --git a/docs/architecture/README.md b/docs/architecture/README.md
@@ -46,6 +46,7 @@ Significant technical decisions with context and trade-offs
 - `ADR-021-source-date-epoch-for-deterministic-pdf-comparison.md` - SOURCE_DATE_EPOCH reproducible-builds convention for byte-deterministic PDF baseline comparison (Feature 128)
 - `ADR-022-mmdc-hard-prerequisite.md` - `mmdc` (Mermaid CLI) as hard prerequisite gated on attack-tree detection; establishes fail-loud-on-missing-CLI posture with defense-in-depth preflight gates — first ADR governing CLI-prerequisite posture (Feature 130)
 - `ADR-023-threat-agent-skill-references-pattern.md` - Detection variant of the lean + skill references pattern as a sibling to the methodology variant; single-point `**MANDATORY**: Read` at detection start for all 11 threat agents (6 STRIDE + 5 AI); MAESTRO classification remains orchestrator-owned; shared reference edits are additive-only; `finding-format-shared.md` gains "For Threat Agents" producer section. Completes the lean-agent architecture migration for all 17 tachi agents (Feature 082)
+- `ADR-024-owasp-aivss-evaluation.md` - **Accepted (2026-04-15)**: tachi diverges from OWASP AIVSS v0.8 at the present time; existing four-dimensional weighted-sum composite (`0.35 × CVSS 3.1 + 0.30 × Exploitability + 0.20 × Reachability + 0.15 × Scalability`) remains the canonical scoring model. Records the three-surface evaluation (dimension set, composite formula, severity bands) against AIVSS v0.8, establishes structural divergence on Surfaces A (dimensions) and B (formula) with alignment only on Surface C (severity bands), and defines re-evaluation triggers (AIVSS v1.0 + at least one external adopter case study). Cross-references ADR-020 (MAESTRO classification), ADR-019 (shared definitions), ADR-018 (baseline-aware scoring lineage). Closes the MAESTRO compliance umbrella (Phases 1-3 delivered in Features 136, 141, 082). Documentation-only spike — zero schema, script, or pipeline changes (Feature 143)
 - `ADR-NNN-decision-title.md` - Individual ADRs
 
 ### 03_patterns/
diff --git a/docs/product/02_PRD/INDEX.md b/docs/product/02_PRD/INDEX.md
@@ -1,12 +1,12 @@
 # PRD Index
 
-**Last Updated**: 2026-04-14
+**Last Updated**: 2026-04-15
 **Legend**: ✓=APPROVED, ⚠=APPROVED_WITH_CONCERNS, 🔄=CHANGES_REQUESTED, ⛔=BLOCKED, ⚠⚡=OVERRIDDEN
 
 
 | # | Feature | PM | Architect | Team-Lead | Status | Date |
 |---|---------|----|-----------|-----------| -------|------|
-| 143 | [MAESTRO Phase 4: OWASP AIVSS Evaluation ADR](143-maestro-aivss-evaluation-adr-2026-04-14.md) | ⚠ | ⚠ | ⚠ | Approved | 2026-04-14 |
+| 143 | [MAESTRO Phase 4: OWASP AIVSS Evaluation ADR](143-maestro-aivss-evaluation-adr-2026-04-14.md) | ⚠ | ⚠ | ⚠ | Delivered | 2026-04-15 |
 | 129 | [Attack Tree Delta Sub-Agent](129-attack-tree-delta-sub-agent-2026-04-13.md) | ✓ | ⚠ | ⚠ | Delivered | 2026-04-14 |
 | 141 | [MAESTRO Phase 2: Cross-Layer Attack Chains](141-maestro-cross-layer-attack-chains-2026-04-12.md) | ✓ | ⚠ | ⚠ | Delivered | 2026-04-12 |
 | 082 | [Threat Agent Skill References](082-threat-agent-skill-references-2026-04-11.md) | ✓ | ⚠ | ⚠ | Delivered | 2026-04-11 |
diff --git a/docs/product/06_OKRs/README.md b/docs/product/06_OKRs/README.md
@@ -173,3 +173,4 @@ OKRs align the team around measurable goals. They answer:
 | 2026-04-10 | F-136: MAESTRO Canonical Layer Correctness Fix | [136](../02_PRD/136-maestro-canonical-layer-correctness-fix-2026-04-10.md) | Correctness fix aligning tachi's MAESTRO seven-layer taxonomy with the canonical CSA Ken Huang reference. Three layers renamed: L5 Security → L5 Evaluation and Observability, L6 Agent Ecosystem → L6 Security and Compliance, L7 User Interface → L7 Agent Ecosystem. Acronym expansion corrected ("Multi-Agent Environment, Security, Threat, Risk, and Outcome"). Schema version bumped 1.2 → 1.3 with documented old → new enum migration path in CHANGELOG. Keyword sets rebalanced: new L5 observability keywords (audit log, monitoring, anomaly detection, SIEM, forensics, telemetry), L6 retains security keywords (auth, WAF, guardrail, RBAC), L7 merges agent-ecosystem and user-facing keywords. Typst template three-way divergence ("Integration Services" L6 bug) corrected. Wave 0 pre-edit grep discovery report committed for audit trail. All six example outputs regenerated; five non-agentic-app baselines byte-deterministic under SOURCE_DATE_EPOCH=1700000000 with test_backward_compatibility.py passing. Observability components (audit loggers, SIEM, anomaly detection) now have a canonical L5 Evaluation and Observability home instead of being misrouted to Security or lost in Unclassified. ADR-020 amended with revision note. 45/45 tasks complete; 8 user stories delivered. Release-please will cut v4.10.0 minor release (feat(136) prefix). |
 | 2026-04-11 | F-082: Threat Agent Skill References | [082](../02_PRD/082-threat-agent-skill-references-2026-04-11.md) | Completes the lean-agent architecture for all 17 tachi agents by migrating the remaining 11 threat detection agents (6 STRIDE + 5 AI) from self-contained inline shape to lean + skill references pattern. STRIDE agents reduced from 113-141 lines to 50-54 lines; AI agents reduced from 167-201 lines to 78-114 lines — every agent within FR-10 tier caps (STRIDE ≤120, AI ≤150, hard cap ≤180). 11 new companion skill directories created at `.claude/skills/tachi-<name>/references/` (spoofing, tampering, repudiation, info-disclosure, denial-of-service, privilege-escalation, prompt-injection, data-poisoning, model-theft, tool-abuse, agent-autonomy), each hosting a `detection-patterns.md` reference file loaded via a single `**MANDATORY**: Read` directive at detection start (new "detection variant" of the lean pattern, sibling to the methodology variant used by control-analyzer). Enrichment floor cleared: +30 new pattern categories added across the 11 agents against a ≥22 aggregate floor — +8 margin. Source attribution: OWASP Top 10 2021, OWASP LLM Top 10 2025, OWASP AI Exchange, MITRE ATT&CK v15+, MITRE ATLAS v5.1+ (including Oct 2025 agent techniques AML.T0058-T0062 — context poisoning, memory corruption, agent-in-the-middle, excessive agency runtime, cascading agent failures), CWE Top 25 2024, NIST AI 600-1. New ADR-023 records the sibling detection variant as a second documented lean-agent shape. Shared reference `finding-format-shared.md` gains a "For Threat Agents" producer section (additive-only); OWASP 3×3 risk matrix now lives in exactly one canonical file (`severity-bands-shared.md`, normalized to Unicode ×). T057 live regeneration on agentic-app confirmed +8 new findings (22 baseline → 30). Zero new runtime dependencies (SC-014 — empty diff on `pyproject.toml`, `requirements*.txt`, `package.json`). 68 tasks across 18 waves; 5 user stories delivered. PR #151 merged via squash (commit 6f9a40d). |
 | 2026-04-12 | F-141: MAESTRO Phase 2 — Cross-Layer Attack Chain Analysis | [141](../02_PRD/141-maestro-cross-layer-attack-chains-2026-04-12.md) | Cross-layer attack chain correlation engine identifying multi-layer MAESTRO attack paths from threat findings. New attack-chains.md artifact, threat report Section 6 narrative, PDF chain diagram pages, schema additions (attack-chain.yaml), parser additions (tachi_parsers.py), 800+ lines test coverage. Transforms tachi from "STRIDE tool with MAESTRO labels" to "full MAESTRO implementation" with canonical CSA cross-layer deliverables. 6 user stories delivered across 7 implementation waves. |
+| 2026-04-15 | F-143: MAESTRO Phase 4 — OWASP AIVSS Evaluation ADR | [143](../02_PRD/143-maestro-aivss-evaluation-adr-2026-04-14.md) | Documentation-only ADR spike evaluating OWASP AIVSS v0.8 against tachi's four-dimensional composite scoring model across three surfaces (dimensions, composite formula weights, severity band thresholds). Decision: **diverge** from AIVSS at the present time; tachi's existing `(0.35 × CVSS 3.1) + (0.30 × Exploitability) + (0.20 × Reachability) + (0.15 × Scalability)` composite remains canonical, and AIVSS v0.8 is documented as a peer agentic-AI scoring framework tachi is aware of and intentionally non-aligned with. New **ADR-024** (`docs/architecture/02_ADRs/ADR-024-owasp-aivss-evaluation.md`, Status: Accepted) delivers the three-surface side-by-side comparison with Overlap/Gap/Conflict/No equivalent row labels, two worked examples quantifying score divergence, an explicit *When to Re-Evaluate* clause (AIVSS v1.0 + at least one external adopter case study), and cross-references to ADR-020 (MAESTRO classification), ADR-019 (shared cross-agent definitions), and ADR-018 (baseline-aware scoring lineage). A companion 80–200-word **AIVSS Relationship** section added to `.claude/skills/tachi-risk-scoring/SKILL.md` reflects the ADR decision (decision-noun consistency verified: both files use "diverge"). **Zero production code changes**: no schemas, scripts, agents, example outputs, or pipeline dependencies modified. Closes umbrella MAESTRO compliance discovery [#136](https://github.com/davidmatousek/tachi/issues/136); Phases 1–3 delivered in F-136, F-141, F-082. Conditional follow-on (T023) N/A: Option C (diverge) was chosen, so no adopt-path implementation feature was filed. 32 tasks complete + 1 N/A (T023 conditionally skipped per decision). 4 user stories delivered. PR #167 squash-merged to main 2026-04-15. |
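The F-143 row above lists severity band thresholds as the third evaluation surface, and the CLAUDE.md entry in this commit records it as the only surface where tachi and AIVSS v0.8 align. The following is a minimal Python sketch, assuming the shared convention is the CVSS v3.1 qualitative severity rating scale (None 0.0, Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0). `severity_band` is a hypothetical helper, and the two scores in the usage are illustrative values, not figures from ADR-024's worked examples.

```python
# Hypothetical helper. Thresholds follow the CVSS v3.1 qualitative severity
# rating scale, assumed here to be the convention both frameworks share.
BANDS = (
    (9.0, "Critical"),
    (7.0, "High"),
    (4.0, "Medium"),
    (0.1, "Low"),
)


def severity_band(score: float) -> str:
    """Map a 0.0-10.0 score to its qualitative severity band."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    # Bands are checked from the highest floor down; first match wins.
    for floor, label in BANDS:
        if score >= floor:
            return label
    return "None"


if __name__ == "__main__":
    # Illustrative scores for one finding under each composite (not ADR data).
    tachi_score, aivss_score = 6.2, 9.0
    print(severity_band(tachi_score))  # Medium
    print(severity_band(aivss_score))  # Critical
```

Identical thresholds do not make the two scores interchangeable: if the composites disagree numerically, the same finding can land in different bands. Surface C alignment gives the frameworks a shared severity vocabulary, not a shared number.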
diff --git a/docs/product/_backlog/BACKLOG.md b/docs/product/_backlog/BACKLOG.md
@@ -1,13 +1,14 @@
 # Backlog
 
-> Auto-generated from GitHub Issues on 2026-04-15T22:07:10Z.
+> Auto-generated from GitHub Issues on 2026-04-15T22:46:48Z.
 > Source of truth: GitHub Issues with `stage:*` labels.
 > Regenerate: `/aod.status` or `.aod/scripts/bash/backlog-regenerate.sh`
 
 ## Discover
 
 | # | Title | ICE | Evidence | Updated |
 |---|-------|-----|----------|---------|
+| #168 | Track OWASP AIVSS v1.0 release and first external adopter case study | Impact: —, Confidence: —, Effort: — = **Not yet scored** | Retrospective: Emerged during delivery of Feature 143 (MA... | 2026-04-15 |
 | #145 | MAESTRO canonical worked example: multi-agent reference architecture in examples/ | Impact: 5, Confidence: 8, Effort: 7 = **20** | Identified during MAESTRO compliance audit alongside #136... | 2026-04-10 |
 | #144 | MAESTRO companion: NIST AI RMF integration evaluation ADR | Impact: 6, Confidence: 7, Effort: 7 = **20** | Identified during MAESTRO compliance audit alongside #136... | 2026-04-10 |
 | #142 | MAESTRO Phase 3: Agentic threat pattern expansion (Collusion, Emergent Behavior, Temporal, Trust, Communication, Resource) | Impact: 7, Confidence: 6, Effort: 5 = **18** | Follow-on from #136 per PM recommendation. Canonical MAES... | 2026-04-10 |
@@ -40,7 +41,7 @@
 
 | # | Title | Delivered | Retro | Updated |
 |---|-------|-----------|-------|---------|
-| — | *No items in this stage* | | |
+| #143 | MAESTRO Phase 4: OWASP AIVSS evaluation ADR | 2026-04-15 | — | 2026-04-15 |
 
 ## Untracked
 
@@ -49,7 +50,6 @@
 | # | Title | State | Updated |
 |---|-------|-------|---------|
 | #154 | fix: PDF security report — attack tree images missing, MAESTRO headings empty, landscape infographic whitespace | CLOSED | 2026-04-12 |
-| #143 | MAESTRO Phase 4: OWASP AIVSS evaluation ADR | OPEN | 2026-04-15 |
 | #27 | Developer Guide: Automated Threat Modeling for Your Architecture | CLOSED | 2026-03-24 |
 | #18 | Feature: Threat Infographic Agent | CLOSED | 2026-03-23 |
 | #15 | Feature 007: Threat Report Agent & Attack Trees | CLOSED | 2026-03-23 |
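Issue #168 in the Discover table above tracks the two conditions ADR-024 sets for re-evaluation: a stable AIVSS v1.0 *and* at least one external adopter case study. The conjunction can be written as an explicit check. The following sketch is hypothetical: the function names, the version-string format (`MAJOR.MINOR`, optionally followed by a `-suffix` marking a pre-release), and the case-study count are assumptions for illustration, not anything tachi or OWASP publishes.

```python
# Hypothetical check for ADR-024's re-evaluation trigger; not a tachi API.


def is_stable_v1(version: str) -> bool:
    """True for a release at or above 1.0 that carries no pre-release suffix.

    Assumed format: "MAJOR.MINOR" optionally followed by "-suffix",
    e.g. "0.8", "1.0-rc1", "1.0".
    """
    core, _, suffix = version.partition("-")
    major = int(core.split(".")[0])
    return major >= 1 and suffix == ""


def should_reevaluate(aivss_version: str, adopter_case_studies: int) -> bool:
    """Both conditions must hold; either one alone is not enough."""
    return is_stable_v1(aivss_version) and adopter_case_studies >= 1


if __name__ == "__main__":
    print(should_reevaluate("0.8", 0))      # False (state recorded in ADR-024)
    print(should_reevaluate("1.0", 0))      # False (no adopter case study yet)
    print(should_reevaluate("1.0-rc1", 2))  # False (pre-release)
    print(should_reevaluate("1.0", 1))      # True
```

Encoding the trigger as a conjunction makes the ADR's built-in expiration condition explicit: a v1.0 release alone, or an adopter case study against a pre-1.0 draft, does not reopen the decision.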
diff --git a/specs/143-maestro-aivss-evaluation-adr/delivery.md b/specs/143-maestro-aivss-evaluation-adr/delivery.md