trailofbits
diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
Lines changed: 10 additions & 0 deletions
diff --git a/CODEOWNERS b/CODEOWNERS
Lines changed: 1 addition & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 1 addition & 0 deletions
diff --git a/plugins/trailmark/.claude-plugin/plugin.json b/plugins/trailmark/.claude-plugin/plugin.json
Lines changed: 9 additions & 0 deletions
diff --git a/plugins/trailmark/README.md b/plugins/trailmark/README.md
Lines changed: 55 additions & 0 deletions
diff --git a/plugins/trailmark/skills/audit-augmentation/SKILL.md b/plugins/trailmark/skills/audit-augmentation/SKILL.md
Lines changed: 158 additions & 0 deletions
diff --git a/plugins/trailmark/skills/audit-augmentation/references/formats.md b/plugins/trailmark/skills/audit-augmentation/references/formats.md
Lines changed: 121 additions & 0 deletions
@@ -212,6 +212,16 @@
       },
       "source": "./plugins/testing-handbook-skills"
     },
+    {
+      "name": "trailmark",
+      "version": "0.8.0",
+      "description": "Builds multi-language source code graphs for security analysis: call graphs, attack surface mapping, blast radius, taint propagation, complexity hotspots, and entry point enumeration. Generates Mermaid diagrams (call graphs, class hierarchies, dependency maps, heatmaps). Compares code graph snapshots for structural diff and evolution analysis. Runs graph-informed mutation testing triage (genotoxic). Generates mutation-driven test vectors (vector-forge). Extracts crypto protocol message flows and converts Mermaid diagrams to ProVerif models. Projects SARIF and weAudit findings onto code graphs. Use when analyzing call paths, mapping attack surface, visualizing code architecture, triaging survived mutants, generating cryptographic test vectors, diagramming crypto protocols, formally verifying protocols, or augmenting audits with static analysis findings.",
+      "author": {
+        "name": "Scott Arciszewski",
+        "url": "https://github.com/tob-scott-a"
+      },
+      "source": "./plugins/trailmark"
+    },
     {
       "name": "variant-analysis",
       "version": "1.0.0",
 
@@ -33,6 +33,7 @@
 /plugins/static-analysis/ @axelm-tob @dguido
 /plugins/supply-chain-risk-auditor/ @smichaels-tob @dguido
 /plugins/testing-handbook-skills/ @GrosQuildu @dguido
+/plugins/trailmark/ @tob-scott-a @pbottine @tob-joe @dguido
 /plugins/variant-analysis/ @axelm-tob @dguido
 /plugins/workflow-skill-design/ @bsamuels453 @dguido
 /plugins/yara-authoring/ @dguido
 
@@ -65,6 +65,7 @@ cd /path/to/parent  # e.g., if repo is at ~/projects/skills, be in ~/projects
 | [static-analysis](plugins/static-analysis/) | Static analysis toolkit with CodeQL, Semgrep, and SARIF parsing |
 | [supply-chain-risk-auditor](plugins/supply-chain-risk-auditor/) | Audit supply-chain threat landscape of project dependencies |
 | [testing-handbook-skills](plugins/testing-handbook-skills/) | Skills from the [Testing Handbook](https://appsec.guide): fuzzers, static analysis, sanitizers, coverage |
+| [trailmark](plugins/trailmark/) | Code graph analysis, Mermaid diagrams, mutation testing triage, and protocol verification |
 | [variant-analysis](plugins/variant-analysis/) | Find similar vulnerabilities across codebases using pattern-based analysis |
 
 ### Malware Analysis
 
@@ -0,0 +1,9 @@
+{
+  "name": "trailmark",
+  "version": "0.8.0",
+  "description": "Builds multi-language source code graphs for security analysis: call graphs, attack surface mapping, blast radius, taint propagation, complexity hotspots, and entry point enumeration. Generates Mermaid diagrams (call graphs, class hierarchies, dependency maps, heatmaps). Compares code graph snapshots for structural diff and evolution analysis. Runs graph-informed mutation testing triage (genotoxic). Generates mutation-driven test vectors (vector-forge). Extracts crypto protocol message flows and converts Mermaid diagrams to ProVerif models. Projects SARIF and weAudit findings onto code graphs. Use when analyzing call paths, mapping attack surface, visualizing code architecture, triaging survived mutants, generating cryptographic test vectors, diagramming crypto protocols, formally verifying protocols, or augmenting audits with static analysis findings.",
+  "author": {
+    "name": "Scott Arciszewski",
+    "url": "https://github.com/tob-scott-a"
+  }
+}
@@ -0,0 +1,55 @@
+# trailmark
+
+**Source code graph analysis for security auditing.** Parses code into queryable graphs of functions, classes, and calls, then uses that structure for diagram generation, mutation testing triage, protocol verification, and differential review.
+
+## Prerequisites
+
+[Trailmark](https://pypi.org/project/trailmark/) ([source](https://github.com/trailofbits/trailmark)) must be installed:
+
+```bash
+uv pip install trailmark
+```
+
+## Skills
+
+| Skill | Description |
+|-------|-------------|
+| `trailmark` | Build and query multi-language code graphs with pre-analysis passes (blast radius, taint, privilege boundaries, entrypoints) |
+| `diagramming-code` | Generate Mermaid diagrams from code graphs (call graphs, class hierarchies, complexity heatmaps, data flow) |
+| `crypto-protocol-diagram` | Extract protocol message flow from source code or specs (RFC, ProVerif, Tamarin) into sequence diagrams |
+| `genotoxic` | Triage mutation testing results using graph analysis — classify survived mutants as false positives, missing tests, or fuzzing targets |
+| `vector-forge` | Mutation-driven test vector generation — find coverage gaps via mutation testing, then generate Wycheproof-style vectors that close them |
+| `graph-evolution` | Compare code graphs at two snapshots to surface security-relevant structural changes text diffs miss |
+| `mermaid-to-proverif` | Convert Mermaid sequence diagrams into ProVerif formal verification models |
+| `audit-augmentation` | Project SARIF and weAudit findings onto code graphs as annotations and subgraphs |
+
+## Directory Structure
+
+```text
+trailmark/
+├── .claude-plugin/
+│   └── plugin.json
+├── README.md
+└── skills/
+    ├── trailmark/                    # Core graph querying
+    ├── diagramming-code/             # Mermaid diagram generation
+    │   └── scripts/diagram.py
+    ├── crypto-protocol-diagram/      # Protocol flow extraction
+    │   └── examples/
+    ├── genotoxic/                    # Mutation testing triage
+    ├── vector-forge/                 # Mutation-driven test vector generation
+    │   └── references/
+    ├── graph-evolution/              # Structural diff
+    │   └── scripts/graph_diff.py
+    ├── mermaid-to-proverif/          # Sequence diagram → ProVerif
+    │   └── examples/
+    └── audit-augmentation/           # SARIF/weAudit integration
+```
+
+## Related Skills
+
+| Skill | Use For |
+|-------|---------|
+| `mutation-testing` | Guidance for running mutation frameworks (mewt, muton) — use before genotoxic for triage |
+| `differential-review` | Text-level security diff review — complements graph-evolution's structural analysis |
+| `audit-context-building` | Deep architectural context before vulnerability hunting |
@@ -0,0 +1,158 @@
+---
+name: audit-augmentation
+description: >
+  Augments Trailmark code graphs with external audit findings from SARIF static
+  analysis results and weAudit annotation files. Maps findings to graph nodes by
+  file and line overlap, creates severity-based subgraphs, and enables
+  cross-referencing findings with pre-analysis data (blast radius, taint, etc.).
+  Use when projecting SARIF results onto a code graph, overlaying weAudit
+  annotations, cross-referencing Semgrep or CodeQL findings with call graph
+  data, or visualizing audit findings in the context of code structure.
+---
+
+# Audit Augmentation
+
+Projects findings from external tools (SARIF) and human auditors (weAudit)
+onto Trailmark code graphs as annotations and subgraphs.
+
+## When to Use
+
+- Importing Semgrep, CodeQL, or other SARIF-producing tool results into a graph
+- Importing weAudit audit annotations into a graph
+- Cross-referencing static analysis findings with blast radius or taint data
+- Querying which functions have high-severity findings
+- Visualizing audit coverage alongside code structure
+
+## When NOT to Use
+
+- Running static analysis tools (use semgrep/codeql directly, then import)
+- Building the code graph itself (use the `trailmark` skill)
+- Generating diagrams (use the `diagramming-code` skill after augmenting)
+
+## Installation
+
+**MANDATORY:** If `uv run trailmark` fails, install trailmark first:
+
+```bash
+uv pip install trailmark
+```
+
+## Quick Start
+
+### CLI
+
+```bash
+# Augment with SARIF
+uv run trailmark augment {targetDir} --sarif results.sarif
+
+# Augment with weAudit
+uv run trailmark augment {targetDir} --weaudit .vscode/alice.weaudit
+
+# Both at once, output JSON
+uv run trailmark augment {targetDir} \
+    --sarif results.sarif \
+    --weaudit .vscode/alice.weaudit \
+    --json
+```
+
+### Programmatic API
+
+```python
+from trailmark.query.api import QueryEngine
+
+engine = QueryEngine.from_directory("{targetDir}", language="python")
+
+# Run pre-analysis first for cross-referencing
+engine.preanalysis()
+
+# Augment with SARIF
+result = engine.augment_sarif("results.sarif")
+# result: {matched_findings: 12, unmatched_findings: 3, subgraphs_created: [...]}
+
+# Augment with weAudit
+result = engine.augment_weaudit(".vscode/alice.weaudit")
+
+# Query findings
+engine.findings()                                       # All findings
+engine.subgraph("sarif:error")                          # High-severity SARIF
+engine.subgraph("weaudit:high")                         # High-severity weAudit
+engine.subgraph("sarif:semgrep")                        # By tool name
+engine.annotations_of("function_name")                  # Per-node lookup
+```
+
+## Workflow
+
+```
+Augmentation Progress:
+- [ ] Step 1: Build graph and run pre-analysis
+- [ ] Step 2: Locate SARIF/weAudit files
+- [ ] Step 3: Run augmentation
+- [ ] Step 4: Inspect results and subgraphs
+- [ ] Step 5: Cross-reference with pre-analysis
+```
+
+**Step 1:** Build the graph and run pre-analysis for blast radius and taint
+context:
+
+```python
+engine = QueryEngine.from_directory("{targetDir}", language="{lang}")
+engine.preanalysis()
+```
+
+**Step 2:** Locate input files:
+- **SARIF**: Usually output by tools like `semgrep --sarif -o results.sarif`
+  or `codeql database analyze --format=sarif-latest`
+- **weAudit**: Stored in `.vscode/<username>.weaudit` within the workspace
+
+**Step 3:** Run augmentation via `engine.augment_sarif()` or
+`engine.augment_weaudit()`. Check `unmatched_findings` in the result — these
+are findings whose file/line locations didn't overlap any parsed code unit.
+
+**Step 4:** Query findings and subgraphs. Use `engine.findings()` to list all
+annotated nodes. Use `engine.subgraph_names()` to see available subgraphs.
+
+**Step 5:** Cross-reference with pre-analysis data to prioritize:
+- Findings on tainted nodes: overlap `sarif:error` with `tainted` subgraph
+- Findings on high blast radius nodes: overlap with `high_blast_radius`
+- Findings on privilege boundaries: overlap with `privilege_boundary`
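The overlaps in Step 5 amount to set intersections over node identifiers. The sketch below uses plain Python sets with made-up node IDs; it does not call Trailmark, and `prioritize` is a hypothetical helper, so how the node IDs are obtained from each subgraph is left as an assumption.

```python
def prioritize(findings: set[str], **context: set[str]) -> dict[str, list[str]]:
    """Intersect finding nodes with each named pre-analysis subgraph."""
    return {name: sorted(findings & nodes) for name, nodes in context.items()}


# Hypothetical node IDs standing in for the contents of each subgraph.
sarif_error = {"api.login", "db.run_query", "util.fmt"}
overlaps = prioritize(
    sarif_error,
    tainted={"api.login", "db.run_query"},
    high_blast_radius={"db.run_query", "core.dispatch"},
    privilege_boundary={"api.login"},
)
```

A node that appears in more than one overlap (here `db.run_query`) is a reasonable candidate to review first.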
+
+## Annotation Format
+
+Findings are stored as standard Trailmark annotations:
+
+- **Kind**: `finding` (SARIF results and weAudit findings) or `audit_note` (weAudit notes)
+- **Source**: `sarif:<tool_name>` or `weaudit:<author>`
+- **Description**: Compact single-line:
+  `[SEVERITY] rule-id: message (tool)`
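As an illustration of the description template, the sketch below builds the single-line string from its four parts. `format_description` is a hypothetical helper, and both the upper-casing of the severity and the whitespace collapsing are assumptions about how the template is filled in.

```python
def format_description(level: str, rule_id: str, message: str, tool: str) -> str:
    """Render `[SEVERITY] rule-id: message (tool)` on one line."""
    # Collapse runs of whitespace so multi-line messages stay on a single line.
    one_line = " ".join(message.split())
    return f"[{level.upper()}] {rule_id}: {one_line} ({tool})"


description = format_description(
    "warning",
    "python.lang.security.audit.exec-detected",
    "Detected use\n of exec()",
    "semgrep",
)
```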
+
+## Subgraphs Created
+
+| Subgraph | Contents |
+|----------|----------|
+| `sarif:error` | Nodes with SARIF error-level findings |
+| `sarif:warning` | Nodes with SARIF warning-level findings |
+| `sarif:note` | Nodes with SARIF note-level findings |
+| `sarif:<tool>` | Nodes flagged by a specific tool |
+| `weaudit:high` | Nodes with high-severity weAudit findings |
+| `weaudit:medium` | Nodes with medium-severity weAudit findings |
+| `weaudit:low` | Nodes with low-severity weAudit findings |
+| `weaudit:informational` | Nodes with informational-severity weAudit findings |
+| `weaudit:findings` | All weAudit findings (entryType=0) |
+| `weaudit:notes` | All weAudit notes (entryType=1) |
+
+## How Matching Works
+
+Findings are matched to graph nodes by file path and line range overlap:
+
+1. Finding file path is normalized relative to the graph's `root_path`
+2. Nodes whose `location.file_path` matches AND whose line range overlaps are
+   selected
+3. The tightest match (smallest span) is preferred
+4. If a finding's location doesn't overlap any node, it counts as unmatched
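The four matching steps can be sketched in a few lines of plain Python. This is an illustration of the rule as described, not Trailmark's implementation: `Node` and `match_finding` are hypothetical names, and paths are assumed to be normalized already.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a graph node's location."""
    node_id: str
    file_path: str
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive


def match_finding(nodes: list[Node], file_path: str,
                  start_line: int, end_line: int) -> Optional[Node]:
    """Return the smallest node overlapping the finding, or None if unmatched."""
    candidates = [
        n for n in nodes
        if n.file_path == file_path
        # Two inclusive ranges overlap when each starts before the other ends.
        and n.start_line <= end_line and start_line <= n.end_line
    ]
    if not candidates:
        return None
    # Prefer the tightest match: the node spanning the fewest lines.
    return min(candidates, key=lambda n: n.end_line - n.start_line)


nodes = [
    Node("handler.Handler", "src/handler.py", 10, 80),
    Node("handler.Handler.run", "src/handler.py", 40, 50),
    Node("util.helper", "src/util.py", 1, 20),
]
matched = match_finding(nodes, "src/handler.py", 42, 42)
unmatched = match_finding(nodes, "src/handler.py", 200, 201)
```

Here the finding on line 42 falls inside both the class and the method, and the method wins because its span is smaller; the finding on lines 200-201 overlaps nothing and would be counted as unmatched.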
+
+SARIF paths may be relative, absolute, or `file://` URIs — all are handled.
+weAudit uses 0-indexed lines, which are converted to 1-indexed automatically.
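One way to handle the three path forms is shown below using only the standard library. `normalize_path` is a hypothetical helper and the sketch assumes POSIX-style paths; Trailmark's own normalization may differ, for example on Windows paths or on URIs with a host component.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def normalize_path(uri: str, root_path: str) -> str:
    """Return `uri` as a POSIX path relative to `root_path` where possible."""
    parsed = urlparse(uri)
    # file:// URIs carry the filesystem path in the parsed `path` component.
    raw = unquote(parsed.path) if parsed.scheme == "file" else uri
    path = PurePosixPath(raw)
    if path.is_absolute():
        try:
            return str(path.relative_to(PurePosixPath(root_path)))
        except ValueError:
            # Absolute path outside the root: leave it unchanged.
            return str(path)
    return str(path)


from_uri = normalize_path("file:///repo/src/handler.py", "/repo")
from_abs = normalize_path("/repo/src/handler.py", "/repo")
from_rel = normalize_path("src/handler.py", "/repo")
outside = normalize_path("/elsewhere/x.py", "/repo")
```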
+
+## Supporting Documentation
+
+- **[references/formats.md](references/formats.md)** — SARIF 2.1.0 and
+  weAudit file format field reference
@@ -0,0 +1,121 @@
+# SARIF and weAudit Format Reference
+
+## SARIF 2.1.0
+
+SARIF (Static Analysis Results Interchange Format) is an OASIS standard for
+encoding static analysis results as JSON.
+
+### Structure Used by Trailmark
+
+```
+sarifLog
+├── version: "2.1.0"
+└── runs[]
+    ├── tool.driver.name          → source field ("sarif:<name>")
+    └── results[]
+        ├── ruleId                → included in description
+        ├── message.text          → included in description
+        ├── level                 → "error" | "warning" | "note"
+        └── locations[]
+            └── physicalLocation
+                ├── artifactLocation.uri   → matched to node file
+                └── region
+                    ├── startLine          → matched to node lines
+                    └── endLine            → matched to node lines
+```
+
+### Level Values
+
+| Level | Subgraph |
+|-------|----------|
+| `error` | `sarif:error` |
+| `warning` (default) | `sarif:warning` |
+| `note` | `sarif:note` |
+
+### Example SARIF Result
+
+```json
+{
+  "ruleId": "python.lang.security.audit.exec-detected",
+  "level": "warning",
+  "message": {"text": "Detected use of exec()"},
+  "locations": [{
+    "physicalLocation": {
+      "artifactLocation": {"uri": "src/handler.py"},
+      "region": {"startLine": 42, "endLine": 42}
+    }
+  }]
+}
+```
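To show how the fields in the structure above line up, the sketch below walks a small SARIF log with the standard `json` module and flattens each result location into one record. `iter_findings` and the record keys are hypothetical; the sample log and its `semgrep` driver name are made up for illustration. A missing `level` falls back to `warning` as in the table above, and a missing `endLine` falls back to `startLine`, which is the default the SARIF specification gives.

```python
import json

SARIF_LOG = """
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "semgrep"}},
    "results": [{
      "ruleId": "python.lang.security.audit.exec-detected",
      "message": {"text": "Detected use of exec()"},
      "locations": [{
        "physicalLocation": {
          "artifactLocation": {"uri": "src/handler.py"},
          "region": {"startLine": 42}
        }
      }]
    }]
  }]
}
"""


def iter_findings(log: dict):
    """Yield one flat record per result location in a SARIF log."""
    for run in log.get("runs", []):
        tool = run["tool"]["driver"]["name"]
        for result in run.get("results", []):
            level = result.get("level", "warning")
            for loc in result.get("locations", []):
                phys = loc["physicalLocation"]
                region = phys["region"]
                start = region["startLine"]
                yield {
                    "source": f"sarif:{tool}",
                    "subgraph": f"sarif:{level}",
                    "rule_id": result.get("ruleId"),
                    "message": result["message"]["text"],
                    "file": phys["artifactLocation"]["uri"],
                    "start_line": start,
                    "end_line": region.get("endLine", start),
                }


findings = list(iter_findings(json.loads(SARIF_LOG)))
```

Results without a `region` would raise `KeyError` in this sketch; a real importer would need to decide whether to skip them or count them as unmatched.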
+
+## weAudit
+
+weAudit is a VS Code extension by Trail of Bits for collaborative security
+auditing. Files are stored as `.vscode/<username>.weaudit`.
+
+### Structure Used by Trailmark
+
+```
+root
+├── clientRemote              → fallback author extraction
+├── treeEntries[]             → active findings/notes
+│   ├── label                 → included in description
+│   ├── entryType             → 0=Finding, 1=Note
+│   ├── author                → source field ("weaudit:<author>")
+│   ├── details
+│   │   ├── severity          → "High" | "Medium" | "Low" | "Informational"
+│   │   ├── type              → finding category
+│   │   └── description       → included in annotation
+│   └── locations[]
+│       ├── path              → relative to git root
+│       ├── startLine         → 0-indexed (converted to 1-indexed)
+│       └── endLine           → 0-indexed (converted to 1-indexed)
+└── resolvedEntries[]         → same structure as treeEntries
+```
+
+### Entry Types
+
+| entryType | AnnotationKind | Subgraph |
+|-----------|---------------|----------|
+| 0 (Finding) | `finding` | `weaudit:findings` |
+| 1 (Note) | `audit_note` | `weaudit:notes` |
+
+### Severity Values
+
+| Severity | Subgraph |
+|----------|----------|
+| `High` | `weaudit:high` |
+| `Medium` | `weaudit:medium` |
+| `Low` | `weaudit:low` |
+| `Informational` | `weaudit:informational` |
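Taken together, the two tables above imply that each weAudit entry lands in up to two subgraphs: one for its entry type and one for its severity. A minimal sketch of that mapping follows; `weaudit_subgraphs` is a hypothetical helper, and skipping entries with an empty or missing severity is an assumption.

```python
ENTRY_TYPE_SUBGRAPH = {0: "weaudit:findings", 1: "weaudit:notes"}


def weaudit_subgraphs(entry: dict) -> list[str]:
    """List the subgraph names a weAudit entry would belong to."""
    names = []
    kind = ENTRY_TYPE_SUBGRAPH.get(entry.get("entryType"))
    if kind:
        names.append(kind)
    severity = entry.get("details", {}).get("severity")
    if severity:
        # "High" -> "weaudit:high", "Informational" -> "weaudit:informational"
        names.append(f"weaudit:{severity.lower()}")
    return names


finding_groups = weaudit_subgraphs({"entryType": 0, "details": {"severity": "High"}})
note_groups = weaudit_subgraphs({"entryType": 1, "details": {}})
```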
+
+### Example weAudit Entry
+
+```json
+{
+  "label": "SQL Injection in user input",
+  "entryType": 0,
+  "author": "alice",
+  "details": {
+    "severity": "High",
+    "difficulty": "Low",
+    "type": "Data Validation",
+    "description": "User input not sanitized before SQL query.",
+    "exploit": "Attacker injects malicious SQL.",
+    "recommendation": "Use parameterized queries."
+  },
+  "locations": [{
+    "path": "src/database/queries.py",
+    "startLine": 41,
+    "endLine": 44,
+    "label": "executeQuery function",
+    "description": ""
+  }]
+}
+```
+
+### Line Indexing
+
+weAudit uses **0-indexed** line numbers. Trailmark uses **1-indexed** (from
+tree-sitter). The augmentation module adds 1 to both `startLine` and `endLine`
+during conversion.
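Applied to the location from the example entry above, the conversion looks like this. `to_one_indexed` is a hypothetical name used only for illustration.

```python
def to_one_indexed(location: dict) -> tuple[str, int, int]:
    """Convert a weAudit location (0-indexed lines) to 1-indexed lines."""
    return (location["path"], location["startLine"] + 1, location["endLine"] + 1)


loc = {"path": "src/database/queries.py", "startLine": 41, "endLine": 44}
converted = to_one_indexed(loc)
```

The weAudit range 41-44 therefore corresponds to lines 42-45 when compared against node line ranges.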