davidmatousek
diff --git a/.claude/skills/tachi-orchestration/references/dispatch-rules.md b/.claude/skills/tachi-orchestration/references/dispatch-rules.md
Lines changed: 2 additions & 2 deletions
diff --git a/.claude/skills/tachi-orchestration/references/output-schemas.md b/.claude/skills/tachi-orchestration/references/output-schemas.md
Lines changed: 1 addition & 1 deletion
diff --git a/.claude/skills/tachi-shared/references/finding-format-shared.md b/.claude/skills/tachi-shared/references/finding-format-shared.md
Lines changed: 1 addition & 1 deletion
diff --git a/.claude/skills/tachi-shared/references/maestro-layers-shared.md b/.claude/skills/tachi-shared/references/maestro-layers-shared.md
Lines changed: 31 additions & 17 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 72 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 3 additions & 3 deletions
@@ -146,14 +146,14 @@ Label this section clearly:
 |-----------|----------|---------------|-------------------|---------------|--------------|
 | LLM Agent Orchestrator | Process | L3 — Agent Framework | S, T, R, I, D, E | LLM, AG | 11 |
 | MCP Tool Server | Process | L3 — Agent Framework | S, T, R, I, D, E | AG | 8 |
-| User | External Entity | L7 — User Interface | S, R | — | 2 |
+| User | External Entity | L7 — Agent Ecosystem | S, R | — | 2 |
 | Knowledge Base | Data Store | L2 — Data Operations | T, I, D | — | 3 |
 | External API | External Entity | Unclassified | S, R | — | 2 |
 
 In this example:
 - "LLM Agent Orchestrator" is a Process (6 STRIDE agents) with dual-dispatch (3 LLM + 2 AG agents) = 11 total. Classified as L3 (Agent Framework) due to "orchestrator" keyword.
 - "MCP Tool Server" is a Process (6 STRIDE agents) with AG dispatch (2 AG agents) = 8 total. Classified as L3 due to "MCP server" keyword.
-- "User" is an External Entity (2 STRIDE agents) with no AI match = 2 total. Classified as L7 due to user-facing entity.
+- "User" is an External Entity (2 STRIDE agents) with no AI match = 2 total. Classified as L7 (Agent Ecosystem) due to the `user` keyword matching human-agent interaction scope.
 - "Knowledge Base" is a Data Store (3 STRIDE agents) with no AI match = 3 total. Classified as L2 due to "knowledge base" keyword.
 - "External API" is an External Entity with no matching MAESTRO keywords = Unclassified.
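
The agent totals in the example table can be reproduced with a small sketch. The lookup tables and the `total_agents` helper below are hypothetical, inferred only from the rows shown above (STRIDE categories per DFD element type, plus 3 agents for an LLM match and 2 for an AG match); the real dispatch rules may cover more cases.

```python
# Hypothetical tables inferred from the example rows; not the project's actual code.
STRIDE_BY_DFD_TYPE = {
    "Process": ["S", "T", "R", "I", "D", "E"],
    "External Entity": ["S", "R"],
    "Data Store": ["T", "I", "D"],
}

# Extra agents dispatched per AI match type, as stated in the example notes.
AI_AGENTS_PER_MATCH = {"LLM": 3, "AG": 2}


def total_agents(dfd_type, ai_matches=()):
    """Return STRIDE agent count plus AI agent count for one component."""
    stride_count = len(STRIDE_BY_DFD_TYPE[dfd_type])
    ai_count = sum(AI_AGENTS_PER_MATCH[match] for match in ai_matches)
    return stride_count + ai_count


if __name__ == "__main__":
    print(total_agents("Process", ("LLM", "AG")))  # LLM Agent Orchestrator row
    print(total_agents("Process", ("AG",)))        # MCP Tool Server row
    print(total_agents("External Entity"))         # User row
    print(total_agents("Data Store"))              # Knowledge Base row
```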
 
 
@@ -23,7 +23,7 @@ The output begins with YAML frontmatter containing exactly these fields:
 
 ```yaml
 ---
-schema_version: "1.2"
+schema_version: "1.3"
 date: "YYYY-MM-DD"
 input_format: "detected-or-declared-format"
 classification: "confidential"
 
@@ -61,7 +61,7 @@ These fields are present in specific pipeline contexts but not required for ever
 | `dfd_element_type` | string (enum) | Threat agent output | DFD classification of the target component. One of: `External Entity`, `Process`, `Data Store`, `Data Flow`. |
 | `delta_status` | string (enum) | Baseline-aware runs | Lifecycle status: `NEW`, `UNCHANGED`, `UPDATED`, `RESOLVED`. Defaults to `NEW` when no baseline is present. |
 | `baseline_run_id` | string (nullable) | Baseline-aware runs | Run ID of the baseline that first discovered this finding. Null for first-run findings. |
-| `maestro_layer` | string (enum) | Phase 1 classification | CSA MAESTRO architectural layer classification for the finding's target component. One of: `L1 — Foundation Model`, `L2 — Data Operations`, `L3 — Agent Framework`, `L4 — Deployment Infrastructure`, `L5 — Security`, `L6 — Agent Ecosystem`, `L7 — User Interface`, `Unclassified`. Defaults to `"Unclassified"` when not present or when the component matched no layer keywords. Assigned during Phase 1 and inherited by findings in Phase 3. |
+| `maestro_layer` | string (enum) | Phase 1 classification | CSA MAESTRO architectural layer classification for the finding's target component. One of: `L1 — Foundation Model`, `L2 — Data Operations`, `L3 — Agent Framework`, `L4 — Deployment Infrastructure`, `L5 — Evaluation and Observability`, `L6 — Security and Compliance`, `L7 — Agent Ecosystem`, `Unclassified`. Defaults to `"Unclassified"` when not present or when the component matched no layer keywords. Assigned during Phase 1 and inherited by findings in Phase 3. |
 | `correlation_group` | string | Correlated findings | Correlation group ID (e.g., `CG-1`). Present when the finding belongs to a cross-category correlation group. |
 | `fingerprints` | object | SARIF output | Partial fingerprints for cross-run correlation: `findingId/v1` (primary key) and `primaryLocationLineHash` (validation signal). |
 
 
@@ -14,7 +14,7 @@ consumers:
 
 Canonical CSA MAESTRO seven-layer taxonomy definitions for agentic AI architectures. This is the single source of truth for layer identifiers, descriptions, keyword mappings, and the classification algorithm used during Phase 1 (Scope). All consuming agents should Read this file rather than maintaining inline definitions.
 
-**Source**: Cloud Security Alliance — MAESTRO (Multi-Agent Environment Security Toolkit for Reasoning and Orchestration), February 2025.
+**Source**: Cloud Security Alliance — MAESTRO (Multi-Agent Environment, Security, Threat, Risk, and Outcome), February 2025.
 
 ---
 
@@ -33,13 +33,17 @@ MAESTRO layer classification runs during Phase 1 after DFD classification. It as
 
 ### Ordering Rationale
 
-The L1-L7 evaluation order is load-bearing — changing the order changes classification results. The ordering follows a specificity gradient:
+The L1-L7 evaluation order follows a specificity gradient from most specific to most general, with observability placed before security to ensure detective controls are classified correctly.
 
-- **L1 (Foundation Model)** is evaluated first because foundation model keywords are the most specific and least ambiguous.
-- **L7 (User Interface)** is evaluated last because UI keywords (e.g., "API endpoint") are the most general and could match components at other layers.
-- **L4 (Deployment Infrastructure)** and **L5 (Security)** are evaluated in the middle because their keywords have moderate specificity.
+- **L1 (Foundation Model)** is evaluated first because foundation model keywords (LLM, GPT, Claude, Gemini, inference engine) are the most specific and least ambiguous; they rarely match components at other layers.
+- **L2 (Data Operations)** follows because data pipeline keywords (vector, RAG, embedding, dataset) are domain-specific to data handling infrastructure.
+- **L3 (Agent Framework)** is next because agent orchestration keywords (orchestrator, planner, tool dispatch, MCP server) are specific to agentic orchestration layers.
+- **L4 (Deployment Infrastructure)** is evaluated in the middle because infrastructure keywords (container, load balancer, API gateway) are common but clearly scoped to runtime deployment.
+- **L5 (Evaluation and Observability)** is evaluated before L6 Security so that detective control keywords (audit log, monitoring, SIEM, anomaly detection, telemetry) classify to the observability layer rather than being misrouted to security. This ordering resolves the semantic ambiguity where "audit log" could match either layer — first-match-wins with L5-before-L6 gives the correct observability classification.
+- **L6 (Security and Compliance)** is evaluated after L5 because security keywords (auth, WAF, firewall, guardrail, RBAC, IAM) are specific to preventive controls and access enforcement. Components matching both L5 and L6 keywords classify to L5 first (e.g., "security audit log" → L5 via `audit log` match).
+- **L7 (Agent Ecosystem)** is evaluated last because it is the broadest catch-all — covering multi-agent coordination (multi-agent, swarm, delegation), agent-to-agent protocols, and human-agent interaction (chat UI, dashboard, API endpoint, web portal). Keywords here are the most general and could potentially match components at other layers, so L7 evaluates last to avoid capturing specific components that belong elsewhere.
 
-**WARNING**: Changing keyword order changes classification. Test against all example architectures after any modification to the keyword table.
+**WARNING**: Changing keyword order changes classification. The L5-before-L6 ordering is load-bearing for canonical MAESTRO observability/security separation. Test against all six example architectures after any modification to the keyword table.
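
A minimal sketch of why the L5-before-L6 order matters under first-match-wins. The `classify` function is hypothetical and assumes plain case-insensitive substring matching, which the reference does not confirm; the keyword lists are abridged to the L5 through L7 keywords visible in this diff.

```python
def classify(component_text, ordered_layers):
    """Return the first layer whose keyword occurs in the text (first match wins)."""
    text = component_text.lower()
    for layer, keywords in ordered_layers:
        # Assumption: substring matching; the real matcher may tokenize differently.
        if any(keyword.lower() in text for keyword in keywords):
            return layer
    return "Unclassified"


L5 = ("L5 — Evaluation and Observability", [
    "audit log", "monitoring", "SIEM", "anomaly detection", "telemetry",
    "log", "metrics", "tracing", "forensics", "alerting", "observability",
])
L6 = ("L6 — Security and Compliance", [
    "auth", "WAF", "firewall", "secrets manager", "guardrail",
    "content filter", "rate limit", "access control", "security",
])
L7 = ("L7 — Agent Ecosystem", [
    "multi-agent", "swarm", "delegation", "sub-agent", "agent registry",
    "agent mesh", "chat UI", "dashboard", "admin console",
])

CANONICAL_ORDER = [L5, L6, L7]
SWAPPED_ORDER = [L6, L5, L7]  # illustrates the misrouting the warning describes

if __name__ == "__main__":
    print(classify("Security Audit Log", CANONICAL_ORDER))
    print(classify("Security Audit Log", SWAPPED_ORDER))
```

With the canonical order, "Security Audit Log" matches `audit log` at L5 before `security` is ever consulted; with the swapped order, the same component lands in L6.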
 
 ---
 
@@ -51,9 +55,9 @@ The L1-L7 evaluation order is load-bearing — changing the order changes classi
 | L2 | Data Operations | Data pipelines, vector stores, embedding indexes, and training/retrieval data management | Vector DB, RAG pipeline, training dataset, embedding index, knowledge base |
 | L3 | Agent Framework | Orchestration layers, planning engines, tool dispatch, and agent execution frameworks | Agent orchestrator, tool server, MCP server, workflow engine, planner |
 | L4 | Deployment Infrastructure | Runtime environments, networking, and infrastructure services | API gateway, load balancer, Kubernetes cluster, container runtime, CDN |
-| L5 | Security | Authentication, authorization, content filtering, and security monitoring | WAF, auth service, secrets manager, guardrail, rate limiter, IAM |
-| L6 | Agent Ecosystem | Multi-agent coordination, delegation, and inter-agent communication | Multi-agent supervisor, agent mesh, delegation broker, swarm coordinator |
-| L7 | User Interface | User-facing surfaces and API endpoints for human interaction | Chat UI, admin dashboard, REST API, GraphQL endpoint, web portal |
+| L5 | Evaluation and Observability | Detective controls, logging, monitoring, anomaly detection, forensics, and telemetry collection | Audit logger, SIEM, observability stack, metrics collector, alerting system, forensic log store |
+| L6 | Security and Compliance | Authentication, authorization, content filtering, preventive controls, and compliance enforcement | WAF, auth service, secrets manager, guardrail, rate limiter, IAM, RBAC, encryption service |
+| L7 | Agent Ecosystem | Multi-agent coordination, delegation, agent-to-agent communication, and human-agent interaction surfaces | Multi-agent supervisor, agent mesh, delegation broker, swarm coordinator, chat UI, admin dashboard, REST API |
 
 ---
 
@@ -129,15 +133,30 @@ Keywords are matched case-insensitively against component name, description, and
 | queue |
 | registry |
 
-### L5 — Security
+### L5 — Evaluation and Observability
+
+| Keyword |
+|---------|
+| audit log |
+| monitoring |
+| SIEM |
+| anomaly detection |
+| telemetry |
+| log |
+| metrics |
+| tracing |
+| forensics |
+| alerting |
+| observability |
+
+### L6 — Security and Compliance
 
 | Keyword |
 |---------|
 | auth |
 | WAF |
 | firewall |
 | secrets manager |
-| audit log |
 | guardrail |
 | content filter |
 | rate limit |
@@ -147,7 +166,7 @@ Keywords are matched case-insensitively against component name, description, and
 | access control |
 | security |
 
-### L6 — Agent Ecosystem
+### L7 — Agent Ecosystem
 
 | Keyword |
 |---------|
@@ -160,11 +179,6 @@ Keywords are matched case-insensitively against component name, description, and
 | sub-agent |
 | agent registry |
 | agent mesh |
-
-### L7 — User Interface
-
-| Keyword |
-|---------|
 | chat UI |
 | dashboard |
 | admin console |
 
@@ -7,6 +7,78 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ---
 
+## [Unreleased]
+
+### Breaking Changes — Correctness Fix (#136)
+
+**MAESTRO Canonical Layer Alignment**: tachi's MAESTRO seven-layer taxonomy has been aligned with the canonical CSA reference by Ken Huang. The L5, L6, and L7 layer names and the acronym expansion have been corrected, along with a third, divergent L6 name ("Integration Services") in the Typst PDF template. This is a **correctness fix**, not a feature addition.
+
+#### Enum Value Migration (`schemas/finding.yaml` `maestro_layer`)
+
+The `maestro_layer` enum in `schemas/finding.yaml` has changed values. Downstream consumers (dashboards, scripts, tooling built on the enum) MUST update their code.
+
+| Old Value | New Value |
+|-----------|-----------|
+| `L5 — Security` | `L5 — Evaluation and Observability` |
+| `L6 — Agent Ecosystem` | `L6 — Security and Compliance` |
+| `L7 — User Interface` | `L7 — Agent Ecosystem` |
+| `L6 — Integration Services` (Typst template bug) | `L6 — Security and Compliance` |
+
+L1–L4 enum values are unchanged.
+
+#### Schema Version Bump
+
+`schemas/finding.yaml` schema version bumped from `1.2` to `1.3` to signal the breaking enum-value change. The schema shape and required fields are unchanged; only the allowed values for `maestro_layer` changed. Per ADR-020, an enum-value-only breaking change of this kind warrants a minor schema bump, not a major one.
+
+#### Acronym Correction
+
+The MAESTRO acronym expansion in `.claude/skills/tachi-shared/references/maestro-layers-shared.md` (line 17) and `docs/architecture/02_ADRs/ADR-020-maestro-layer-classification.md` (line 123) has been corrected from:
+
+- **Old**: "Multi-Agent Environment Security Toolkit for Reasoning and Orchestration"
+- **New**: "Multi-Agent Environment, Security, Threat, Risk, and Outcome"
+
+The new form matches the canonical CSA source.
+
+#### Typst PDF Template Fix
+
+`templates/tachi/security-report/maestro-findings.typ` fallback dictionary (lines 132-134) previously contained `"L6": "Integration Services"` — a third divergent name matching neither the canonical CSA spec nor the prior shared reference. This pre-existing bug was corrected as part of this fix.
+
+#### Regenerated Example Outputs
+
+All six example architectures in `examples/*` have had their threat model outputs regenerated with canonical layer names:
+
+- `examples/web-app/` — threats.md + security-report.pdf.baseline
+- `examples/microservices/` — threats.md + security-report.pdf.baseline
+- `examples/ascii-web-api/` — threats.md + security-report.pdf.baseline
+- `examples/free-text-microservice/` — threats.md + security-report.pdf.baseline
+- `examples/mermaid-agentic-app/` — threats.md + threat-report.md + threat-infographic-spec.md + attack-trees/ + security-report.pdf.baseline
+- `examples/agentic-app/sample-report/` — full pipeline (threats.md, risk-scores.md, compensating-controls.md, threat-report.md, infographic specs, security-report.pdf)
+
+The five PDF baselines outside `examples/agentic-app/` are byte-deterministic under `SOURCE_DATE_EPOCH=1700000000` per ADR-021. The agentic-app sample remains intentionally excluded from byte-determinism testing because Gemini infographic generation is non-deterministic.
+
+#### New L5 Keyword Set
+
+A new L5 Evaluation and Observability keyword section has been added covering: audit log, monitoring, SIEM, anomaly detection, telemetry, log, metrics, tracing, forensics, alerting, observability. Previously, findings targeting audit loggers and observability components had no dedicated layer and were misrouted or lost.
+
+#### Downstream Migration Guidance
+
+If you consume tachi output programmatically:
+
+1. Update any hardcoded references to the old layer names (see enum migration table above)
+2. Update any scripts parsing `maestro_layer` values from `threats.md`, `risk-scores.md`, or `compensating-controls.md`
+3. Regenerate any custom report templates that reference layer names
+4. Check `schema_version` field — expect `"1.3"` going forward
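
For consumers that store or parse older output, the enum migration table can be applied as a simple lookup. This is a sketch only: `OLD_TO_NEW` and `migrate_layer` are hypothetical names, and the mapping is copied from the table in this changelog entry.

```python
# Mapping copied from the enum migration table; L1 to L4 values are unchanged.
OLD_TO_NEW = {
    "L5 — Security": "L5 — Evaluation and Observability",
    "L6 — Agent Ecosystem": "L6 — Security and Compliance",
    "L7 — User Interface": "L7 — Agent Ecosystem",
    "L6 — Integration Services": "L6 — Security and Compliance",  # Typst template bug
}


def migrate_layer(value):
    """Translate a pre-1.3 maestro_layer value; pass other values through."""
    return OLD_TO_NEW.get(value, value)


if __name__ == "__main__":
    for old in ("L5 — Security", "L3 — Agent Framework", "Unclassified"):
        print(old, "->", migrate_layer(old))
```

No new value is also an old key, so applying `migrate_layer` twice gives the same result as applying it once. That property holds only for complete values such as `L6 — Agent Ecosystem`; matching on the bare layer name without its `L6` prefix would be ambiguous, because "Agent Ecosystem" is both the old L6 name and the new L7 name.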
+
+#### References
+
+- PRD: [docs/product/02_PRD/136-maestro-canonical-layer-correctness-fix-2026-04-10.md](docs/product/02_PRD/136-maestro-canonical-layer-correctness-fix-2026-04-10.md)
+- Spec: [specs/136-maestro-canonical-layer/spec.md](specs/136-maestro-canonical-layer/spec.md)
+- Plan: [specs/136-maestro-canonical-layer/plan.md](specs/136-maestro-canonical-layer/plan.md)
+- ADR-020 (canonical taxonomy rule): [docs/architecture/02_ADRs/ADR-020-maestro-layer-classification.md](docs/architecture/02_ADRs/ADR-020-maestro-layer-classification.md)
+- GitHub Issue: [#136](https://github.com/davidmatousek/tachi/issues/136)
+
+---
+
 ## [4.9.2](https://github.com/davidmatousek/tachi/compare/v4.9.1...v4.9.2) (2026-04-10)
 
 
 
@@ -257,9 +257,9 @@ For agentic AI systems, tachi maps each finding to the [CSA MAESTRO](https://clo
 | L2 | Data Operations | Vector stores, RAG pipelines, embeddings |
 | L3 | Agent Framework | Orchestrators, tool servers, MCP |
 | L4 | Deployment Infrastructure | API gateways, containers, networking |
-| L5 | Security | Auth, guardrails, rate limiting |
-| L6 | Agent Ecosystem | Multi-agent coordination, delegation |
-| L7 | User Interface | Chat UIs, dashboards, API endpoints |
+| L5 | Evaluation and Observability | Audit logging, monitoring, anomaly detection, forensics |
+| L6 | Security and Compliance | Auth, guardrails, rate limiting, encryption, IAM |
+| L7 | Agent Ecosystem | Multi-agent coordination, delegation, chat UIs, API endpoints |
 
 MAESTRO layers appear in `threats.md`, propagate through all downstream commands, and power the `maestro-stack` and `maestro-heatmap` infographic templates.