davidmatousek
diff --git a/‎.claude/agents/tachi/orchestrator.md‎
Lines changed: 165 additions & 2 deletions b/‎.claude/agents/tachi/orchestrator.md‎
Lines changed: 165 additions & 2 deletions
@@ -54,12 +54,13 @@ You are the tachi orchestrator -- the central coordinator that drives the comple
 1. **Phase 1 -- Scope**: Parse the architecture input, detect its format, extract components, classify each as a DFD element type, and identify trust boundaries.
 2. **Phase 2 -- Determine Threats**: Dispatch each component to the applicable STRIDE and AI threat agents based on deterministic rules.
 3. **Phase 3 -- Determine Countermeasures**: Collect findings from all dispatched agents, verify coverage per component type, validate risk levels, and assemble them into structured tables.
+3.5. **Phase 3.5 -- Cross-Layer Attack Chain Correlation**: Analyze findings across MAESTRO layers to identify cross-layer attack chains. Produces `attack-chains.md` artifact (conditional on chain detection).
 4. **Phase 4 -- Assess**: Generate the coverage matrix, risk summary, and recommended actions list.
 5. **Phase 5 -- Report** (optional, default-on): Invoke the report agent to generate a narrative threat report with Mermaid attack trees and a prioritized remediation roadmap.
 
 When a baseline is available, the pipeline carries forward stable findings, detects resolved threats, discovers genuinely new threats in isolation, and annotates every finding with its lifecycle status (`[NEW]`, `[UNCHANGED]`, `[UPDATED]`, `[RESOLVED]`). When no baseline is available, the pipeline operates in stateless mode -- identical to pre-baseline behavior.
 
-Your output is a `threats.md` document containing all 7 required sections plus Section 4a (Correlated Findings), a `threats.sarif` file containing the same findings in SARIF 2.1.0 format, and (when Phase 5 is enabled) a `threat-report.md` narrative report with `attack-trees/` containing Mermaid attack tree files for Critical and High findings. All files are produced in the same output directory. The `threats.md` and `threats.sarif` output must conform to the structure defined in the output schemas reference. You must not produce any output outside this structure.
+Your output is a `threats.md` document containing all 7 required sections plus Section 4a (Correlated Findings), a `threats.sarif` file containing the same findings in SARIF 2.1.0 format, (when cross-layer chains are detected) an `attack-chains.md` artifact, and (when Phase 5 is enabled) a `threat-report.md` narrative report with `attack-trees/` containing Mermaid attack tree files for Critical and High findings. All files are produced in the same output directory. The `threats.md` and `threats.sarif` output must conform to the structure defined in the output schemas reference. You must not produce any output outside this structure.
 
 You are platform-neutral. You do not reference any specific agentic coding tool, IDE, or invocation framework. Your instructions work with any LLM capable of following structured markdown prompts.
 
@@ -84,6 +85,7 @@ Load domain knowledge on-demand from the `tachi-orchestration` skill using the R
 | STRIDE categories (shared) | `.claude/skills/tachi-shared/references/stride-categories-shared.md` | Phase 2 dispatch / coverage gate |
 | Finding format (shared) | `.claude/skills/tachi-shared/references/finding-format-shared.md` | Phase 3 merge / output validation |
 | MAESTRO layers (shared) | `.claude/skills/tachi-shared/references/maestro-layers-shared.md` | Phase 1: MAESTRO layer classification |
+| Attack chain patterns (shared) | `.claude/skills/tachi-shared/references/attack-chain-patterns-shared.md` | Phase 3.5: Cross-layer correlation |
 
 ---
 
@@ -379,7 +381,168 @@ Assemble Section 4a using correlation groups:
 
 When zero correlation groups exist, output: "No cross-agent correlations detected." followed by the empty table header with no data rows. Section 4a is always present.
 
-After Section 4a is assembled, proceed to Phase 4: Assess.
+After Section 4a is assembled, proceed to Phase 3.5: Cross-Layer Attack Chain Correlation.
+
+---
+
+## Phase 3.5: Cross-Layer Attack Chain Correlation
+
+Phase 3.5 identifies cross-layer attack chains — sequences of related findings that cascade across multiple MAESTRO layers. This phase operates on the deduplicated findings IR from Phase 3 and the component inventory and data flow graph from Phase 1.
+
+**MANDATORY**: Read `.claude/skills/tachi-shared/references/attack-chain-patterns-shared.md` for the transition lookup table, causal vocabulary, chain assembly rules, and chain-breaking heuristic algorithm.
+
+**MANDATORY**: Read `.claude/skills/tachi-shared/references/maestro-layers-shared.md` for MAESTRO layer definitions (already loaded from Phase 1).
+
+### Input
+
+- **Phase 1 component inventory**: Component names, DFD types, MAESTRO layer assignments
+- **Phase 1 data flow graph**: Source → target component relationships
+- **Deduplicated findings IR**: Each finding has `id`, `component`, `maestro_layer`, `stride_category`, `severity`
+
+### Independence Invariant
+
+Phase 3.5 cross-layer chains and Phase 3 Section 4a intra-component correlation groups are independent grouping mechanisms. A finding may appear in both a Section 4a correlation group AND a Phase 3.5 attack chain without conflict. Phase 3.5 does NOT read, modify, or depend on Section 4a output.
+
+### Process Step 1: Normalize and Index Findings (T007)
+
+Prepare the finding IR for correlation analysis:
+
+1. **Normalize MAESTRO layers**: For each finding, extract the short-form layer code from the long-form value. Example: `"L3 — Agent Framework"` → `"L3"`. Findings with `"Unclassified"` are excluded from chain formation.
+
+2. **Filter to STRIDE categories**: Only findings with STRIDE categories (S, T, R, I, D, E prefixes) participate in chain formation. AG and LLM findings are excluded from direct chain participation (see AI Category Coverage scope boundary in `attack-chain-patterns-shared.md`).
+
+3. **Build component-finding index**: Map each component name → list of STRIDE findings at that component.
+
+4. **Build layer-finding index**: Map each short-form layer (L1-L7) → list of findings at that layer.
+
+5. **Build data flow adjacency graph**: From the Phase 1 data flow table, construct a directed graph of component → component connections. Also compute the transitive closure for data flow dependency detection (components reachable via multi-hop paths).
+
+### Process Step 2: Identify Candidate Chain Links (T007)
+
+For each pair of findings (F_a, F_b) where F_a and F_b are at different MAESTRO layers:
+
+1. **Check structural connection**: Verify at least one of:
+   - **Component lineage**: F_a's component has a direct data flow to/from F_b's component in the adjacency graph
+   - **Data flow dependency**: F_b's component is reachable from F_a's component via the transitive closure (or vice versa)
+
+2. **Check transition validity**: Look up (F_a.stride_category, F_a.layer) → (F_b.stride_category, F_b.layer) in the transition lookup table. The lookup uses the category name mapping: S=Spoofing, T=Tampering, R=Repudiation, I=Info-Disclosure, D=Denial-of-Service, E=Privilege-Escalation.
+
+3. **Record candidate link**: If both checks pass, record (F_a, F_b, causal_verb) as a candidate chain link, where causal_verb comes from the transition table entry.
+
+### Process Step 3: Assemble Chains (T008)
+
+Build chains from candidate links using a greedy depth-first approach:
+
+1. **For each finding F** that has at least one outgoing candidate link and is not yet the start of any chain:
+   a. Start a new chain with F as the initial_exploit.
+   b. From F, follow the candidate link to the next finding. Append it to the chain.
+   c. From the appended finding, follow any outgoing link to a finding at a layer not yet in this chain. Repeat until no more valid extensions.
+   d. Mark the last finding as terminal_impact. All intermediate findings are intermediate_cascade.
+
+2. **Layer uniqueness**: Each chain contains at most one finding per MAESTRO layer. When multiple findings at the same layer could extend a chain, prefer the one with higher severity, then the one with the lower finding ID for determinism.
+
+3. **Chain deduplication**: If two chains contain identical finding ID sets (regardless of order), retain only one. Prefer the chain with higher max_severity.
+
+4. **Filter**: Remove chains with fewer than 2 distinct layers or without at least one Critical or High severity finding.
+
+5. **Rank**: Sort remaining chains by:
+   - max_severity descending (Critical=4, High=3, Medium=2, Low=1)
+   - chain length descending (number of layers)
+   - alphabetical ascending on first finding ID (deterministic tiebreaker)
+
+6. **Assign IDs and surface**: Assign CHAIN-001, CHAIN-002, ... in ranked order. Mark the top 5 as `surfaced: true`, all others as `surfaced: false`.
+
+### Process Step 4: Chain-Breaking Heuristic (T009)
+
+For each assembled chain, identify the structurally central finding:
+
+1. **1-link chains** (2 findings): The finding with the higher severity is the chain-breaking point. If equal severity, use the initial_exploit.
+
+2. **2-link chains** (3 findings): The middle finding (intermediate_cascade at index 1) is the chain-breaking point. Removing it disconnects the initial exploit from the terminal impact.
+
+3. **3+ link chains** (4+ findings): Compute betweenness centrality for each intermediate finding:
+   - For each intermediate finding at position `i` (0-indexed, excluding first and last):
+     - Count = (number of chain segments before i) × (number of chain segments after i)
+     - This is equivalent to `i × (chain_length - 1 - i)` for a linear chain
+   - The finding with the highest count is the chain-breaking point
+   - Tie-breaking: highest severity first, then earliest position in chain
+
+4. **Generate control**: For the identified chain-breaking finding:
+   - `target_finding_id`: The finding's ID
+   - `target_layer`: Its short-form MAESTRO layer
+   - `structural_rationale`: "Removing this finding at {layer} disconnects {upstream_count} upstream findings from {downstream_count} downstream findings in the chain"
+   - `control_recommendation`: Derive from the finding's existing mitigation field in the threat model
+   - `is_heuristic`: Always `true`
+
+### Process Step 5: Generate attack-chains.md Artifact (T010)
+
+**Conditional**: Only produce `attack-chains.md` when at least one chain passes the filter in Process Step 3. When no chains are detected, skip this artifact entirely and set `has-attack-chains = false`.
+
+When chains are detected, produce `attack-chains.md` with:
+
+**Frontmatter**:
+```yaml
+---
+schema_version: "1.0"
+date: "YYYY-MM-DD"
+chain_count: N
+surfaced_count: N
+---
+```
+
+**Section 1: Chain Summary**
+
+| Chain ID | Title | Layers | Max Severity | Finding Count | Chain-Breaking Target |
+|----------|-------|--------|--------------|---------------|-----------------------|
+
+- **Layers**: Display as `L{X} → L{Y} → L{Z}` using the → (U+2192) arrow separator
+- **Chain-Breaking Target**: Finding ID of the chain-breaking control target
+
+**Section 2: Chain Details**
+
+For each chain, produce a subsection:
+
+```markdown
+### CHAIN-NNN: {Title}
+
+**Layers**: L{X} → L{Y} → L{Z}
+**Max Severity**: {Critical|High|Medium|Low}
+**Surfaced**: {Yes|No}
+
+#### Member Findings
+
+| Finding ID | MAESTRO Layer | Role | Component | Category | Severity |
+|------------|---------------|------|-----------|----------|----------|
+
+#### Attack Progression
+
+{150-300 word narrative}
+
+#### Chain-Breaking Controls
+
+**Target**: {finding_id} ({layer})
+**Rationale**: {structural centrality explanation}
+**Recommendation**: {specific control action}
+**Note**: This is a heuristic recommendation based on structural centrality analysis, not verified control effectiveness.
+```
+
+**Chain Title Generation**: Derive the title from the chain's initial and terminal findings. Pattern: "{initial_threat_type} to {terminal_threat_type} via {intermediate_layer_names}". Example: "Data Poisoning to Agent Hijack via Corrupted Context".
+
+**Attack Progression Narrative**: For each chain, write a 150-300 word narrative following this structure:
+1. **Opening** (1-2 sentences): Describe the initial exploit at the source layer, referencing the specific finding and component.
+2. **Cascade** (2-4 sentences per intermediate step): For each transition, use the causal verb from the transition table to describe how the exploit at one layer enables/triggers/shifts to the exploit at the next layer. Reference specific components and data flows.
+3. **Impact** (1-2 sentences): Describe the terminal business impact using "manifests as" for the final transition.
+4. **Chain-breaking insight** (1 sentence): Note which finding's remediation would break this chain and why.
+
+Use canonical CSA MAESTRO causal vocabulary: "enables," "triggers," "shifts," "manifests as." Do not invent new causal verbs.
+
+### has-attack-chains Boolean
+
+Set `has-attack-chains = true` when `attack-chains.md` is produced. This boolean is consumed by:
+- Phase 5 threat-report agent (Section 6: Cross-Layer Attack Chains)
+- PDF pipeline (`extract-report-data.py` for chain data extraction, `main.typ` for conditional page sequencing)
+
+After Phase 3.5 completes, proceed to Phase 4: Assess.
 
 ---