fix(074): baseline auto-detection and downstream command context limits (#102)

davidmatousek · claude · web-flow · commit 5aeb10bf24ef · 2026-04-08T16:45:55.000-04:00
Baseline auto-detection looked for threats.md in the output directory,
but /threat-model creates a fresh timestamped subfolder per run, so
that directory is always empty when detection runs. Auto-detection now
scans the parent directory for the most recent sibling run folder that
contains a threats.md.
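
A minimal Python sketch of the sibling scan described above, for
illustration only. The real logic lives in the markdown command and
agent instructions, not in code; the function name `find_baseline` and
its signature are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_baseline(run_dir: Path) -> Optional[Path]:
    """Return threats.md from the most recent previous sibling run, or None.

    Hypothetical helper: assumes run folders are named with the
    YYYY-MM-DDTHH-MM-SS timestamp format used by /threat-model.
    """
    parent = run_dir.parent
    if not parent.is_dir():
        return None

    # All sibling directories, excluding the current run's own folder.
    siblings = [
        d for d in parent.iterdir()
        if d.is_dir() and d.name != run_dir.name
    ]

    # Timestamp names sort chronologically as plain strings, so a
    # descending lexicographic sort visits the most recent run first.
    for candidate in sorted(siblings, key=lambda d: d.name, reverse=True):
        threats = candidate / "threats.md"
        if threats.is_file():
            return threats

    # No previous run produced a threats.md: first run, stateless mode.
    return None
```

A `None` result corresponds to `baseline_path = null`. Validating the
YAML frontmatter of the file that is found is a separate step and is
not shown here.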

Also stopped /risk-score and /compensating-controls from embedding full
file contents in agent prompts, which exceeded subagent context limits
on large threat models (61+ findings). Both commands now pass file paths
and let agents read files on demand via their existing Read tool access.
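
The path-passing approach can be sketched in Python as below. This is
an illustration under assumptions, not the commands' implementation:
`build_risk_scorer_prompt` is a hypothetical helper, and the field
labels simply mirror the prompt template in risk-score.md.

```python
from pathlib import Path
from typing import Optional


def build_risk_scorer_prompt(
    input_file: str,
    input_format: str,
    output_dir: str,
    scoring_date: str,
    architecture_path: Optional[str] = None,
) -> str:
    """Build an agent prompt that references files by path only.

    Hypothetical helper: the prompt size stays roughly constant no
    matter how large the threat model is, because no file is opened.
    """
    arch = str(Path(architecture_path).resolve()) if architecture_path else "none"
    lines = [
        "Score the threat model output using your complete scoring pipeline.",
        "",
        f"Input file: {Path(input_file).resolve()}",
        f"Input format: {input_format}",
        f"Architecture file: {arch}",
        f"Output directory: {output_dir}",
        f"Scoring date: {scoring_date}",
        "",
        # The agent loads content itself, in sections if needed.
        "Read the input file yourself using the Read tool.",
    ]
    return "\n".join(lines)
```

Because only paths are interpolated, the agent decides how much of each
file to load into its own context window.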

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/.claude/agents/tachi/orchestrator.md b/.claude/agents/tachi/orchestrator.md
@@ -117,7 +117,7 @@ This optional phase detects whether a previous pipeline output exists and loads
 
 ### Baseline Detection
 
-Locate a previous pipeline output using two methods in priority order: (1) **explicit flag** `--baseline <path>` pointing to a valid `threats.md`, (2) **auto-detection** of an existing `threats.md` in the output directory.
+Locate a previous pipeline output using two methods in priority order: (1) **explicit flag** `--baseline <path>` pointing to a valid `threats.md`, (2) **auto-detection** by scanning the output directory's **parent** for sibling directories containing a `threats.md`. Since each run creates a unique timestamped subfolder (e.g., `docs/security/2026-04-08T15-16-21/`), auto-detection lists all sibling directories in the parent (e.g., `docs/security/`), sorts them lexicographically descending (ISO timestamps sort naturally), skips the current run's directory, and uses the `threats.md` from the most recent previous directory.
 
 **If neither method finds a baseline**: Set `baseline_present = false` and proceed to Phase 1 in stateless mode. **If a baseline file is found**: Validate it is parseable with YAML frontmatter. If corrupted, log a warning and proceed in stateless mode. The pipeline must never block on a bad baseline.
 
diff --git a/.claude/commands/compensating-controls.md b/.claude/commands/compensating-controls.md
@@ -88,40 +88,34 @@ Single-command entry point for tachi compensating controls analysis — the thir
 
 ## Step 2: Run Control Analysis
 
-1. Read the risk score input file at `{input_file}`.
+**IMPORTANT**: Do NOT read or embed the input files in the agent prompt. The control-analyzer agent has Read tool access and will load files on-demand to manage its own context window. Pass file **paths**, not file **contents**.
 
-2. If `architecture_path` is not null, read the architecture file.
-
-3. Invoke the `tachi-control-analyzer` agent with the following prompt:
+1. Invoke the `tachi-control-analyzer` agent with the following prompt:
 
    ```
-   Analyze the following scored threat findings against the target codebase to detect
-   existing security controls, classify each threat, recommend remediation for gaps,
-   and calculate residual risk. Execute your complete 6-phase analysis pipeline
-   (internal to the control-analyzer agent, not the threat-model command pipeline):
+   Analyze scored threat findings against the target codebase to detect existing
+   security controls, classify each threat, recommend remediation for gaps, and
+   calculate residual risk. Execute your complete 6-phase analysis pipeline:
    Phase 1 (Parse Input) → Phase 2 (Discover Codebase) → Phase 3 (Detect Controls) →
    Phase 4 (Map & Classify) → Phase 5 (Recommend & Calculate Residual Risk) →
    Phase 6 (Generate Output).
 
-   Write all output files to: {output_dir}
-   - compensating-controls.md
-   - compensating-controls.sarif
-
+   Input file: {absolute path to input_file}
    Input format: {input_format}
-   Analysis date: {current date YYYY-MM-DD}
+   Architecture file: {absolute path to architecture_path, or "none"}
    Target codebase: {target_path}
+   Output directory: {output_dir}
+   Analysis date: {current date YYYY-MM-DD}
 
-   <risk-score-input>
-   {contents of input file}
-   </risk-score-input>
+   Read the input file yourself using the Read tool. For large inputs,
+   read in sections to manage context.
 
-   {if architecture_path is not null:}
-   <architecture-input>
-   {contents of architecture file}
-   </architecture-input>
+   Write output files:
+   - compensating-controls.md
+   - compensating-controls.sarif
    ```
 
-4. Wait for the control-analyzer agent to complete all 6 pipeline phases.
+2. Wait for the control-analyzer agent to complete all 6 pipeline phases.
 
 ## Step 3: Report Results
 
diff --git a/.claude/commands/risk-score.md b/.claude/commands/risk-score.md
@@ -72,35 +72,31 @@ Single-command entry point for tachi quantitative risk scoring. Validates prereq
 
 ## Step 2: Run Risk Scoring
 
-1. Read the input file at `{input_file}`.
+**IMPORTANT**: Do NOT read or embed the input files in the agent prompt. The risk-scorer agent has Read tool access and will load files on-demand to manage its own context window. Pass file **paths**, not file **contents**.
 
-2. If `architecture_path` is not null, read the architecture file.
-
-3. Invoke the `tachi-risk-scorer` agent with the following prompt:
+1. Invoke the `tachi-risk-scorer` agent with the following prompt:
 
    ```
-   Score the following threat model output using your complete scoring pipeline
+   Score the threat model output using your complete scoring pipeline
    (Threat Parsing → Trust Zone Extraction → Dimensional Scoring → Composite
    Calculation → Governance Fields → Output Generation).
 
-   Write all output files to: {output_dir}
-   - risk-scores.md
-   - risk-scores.sarif
-
+   Input file: {absolute path to input_file}
    Input format: {input_format}
+   Architecture file: {absolute path to architecture_path, or "none"}
+   Output directory: {output_dir}
    Scoring date: {current date YYYY-MM-DD}
 
-   <threat-model-input>
-   {contents of input file}
-   </threat-model-input>
+   Read the input file yourself using the Read tool. For large threat models,
+   read in sections: parse finding tables (Sections 3, 4, 4a) and trust zones
+   (Section 2) first. You do not need to load the full file at once.
 
-   {if architecture_path is not null:}
-   <architecture-input>
-   {contents of architecture file}
-   </architecture-input>
+   Write output files:
+   - risk-scores.md
+   - risk-scores.sarif
    ```
 
-4. Wait for the risk-scorer agent to complete all 6 of its internal analysis phases.
+2. Wait for the risk-scorer agent to complete all 6 of its internal analysis phases.
 
 ## Step 3: Report Results
 
diff --git a/.claude/commands/threat-model.md b/.claude/commands/threat-model.md
@@ -28,10 +28,23 @@ Consider user input before proceeding (if not empty).
 
 7. Generate a unique run folder:
    - Compute timestamp: `YYYY-MM-DDTHH-MM-SS` (e.g., `2026-03-25T14-30-22`)
+   - Set `parent_dir` to the current `output_dir` value (before appending timestamp)
    - Append to output_dir: `{output_dir}/{timestamp}/`
    - This ensures each run produces output in a unique subfolder
    - Example: `examples/agentic-app/test-output/2026-03-25T14-30-22/`
 
+8. Auto-detect baseline from previous runs (unless `--baseline` was explicitly provided):
+   - List all subdirectories in `parent_dir`
+   - Exclude the current run's timestamp directory
+   - Sort remaining directories lexicographically descending (ISO timestamps sort naturally)
+   - Check each directory (most recent first) for a `threats.md` file
+   - If found: set `baseline_path` to that file
+   - If none found: `baseline_path = null` (first run — stateless mode)
+   - Display when detected:
+     ```
+     Baseline detected: {baseline_path}
+     ```
+
 ## Overview
 
 Single-command entry point for tachi threat modeling. Validates prerequisites, invokes the tachi orchestrator agent against an architecture description, and writes the full output suite to the target directory.
@@ -92,6 +105,8 @@ Single-command entry point for tachi threat modeling. Validates prerequisites, i
    - threat-report.md
    - attack-trees/ (one file per Critical/High finding)
 
+   Baseline: {baseline_path or "none (first run — stateless mode)"}
+
    Architecture input:
 
    <architecture-input>
@@ -110,6 +125,7 @@ THREAT MODEL COMPLETE
 Architecture: {architecture_path}
 Output: {output_dir}  ← includes timestamped subfolder
 Version: {version_tag or "unversioned"}
+Baseline: {baseline_path or "none (first run)"}
 
 Files generated:
   threats.md                          — Primary threat model
@@ -120,6 +136,9 @@ Files generated:
 Risk Summary:
   Critical: {count}    High: {count}    Medium: {count}    Low: {count}
 
+Delta Summary (when baseline present):
+  New: {count}    Unchanged: {count}    Updated: {count}    Resolved: {count}
+
 Next steps:
   1. Review Critical/High findings in {output_dir}/threats.md Section 7
   2. Run /risk-score to add quantitative risk scoring (CVSS, exploitability, scalability, reachability)
diff --git a/.claude/skills/tachi-orchestration/references/baseline-correlation.md b/.claude/skills/tachi-orchestration/references/baseline-correlation.md
@@ -13,7 +13,7 @@ Reference material for baseline handling and carry-forward logic in the baseline
 ### Priority Order
 
 1. **Explicit `--baseline <path>`**: Use the specified file directly.
-2. **Auto-detection**: Check the output directory for an existing `threats.md`.
+2. **Auto-detection**: Scan the output directory's **parent** for the most recent sibling directory containing a `threats.md`. Since each run creates a timestamped subfolder (e.g., `docs/security/2026-04-08T15-16-21/`), list all sibling directories, sort lexicographically descending (ISO timestamps sort naturally), exclude the current run's directory, and use the `threats.md` from the most recent match.
 3. **No baseline found**: Operate in stateless mode (identical to pre-baseline behavior).
 
 ### Validation