Commit 18beff8

feat(trailmark): skills that reason about code as graphs
1 parent 5c15f4f commit 18beff8

45 files changed: +10451 -0 lines changed

.claude-plugin/marketplace.json

Lines changed: 10 additions & 0 deletions
@@ -212,6 +212,16 @@
       },
       "source": "./plugins/testing-handbook-skills"
     },
+    {
+      "name": "trailmark",
+      "version": "0.8.0",
+      "description": "Builds multi-language source code graphs for security analysis: call graphs, attack surface mapping, blast radius, taint propagation, complexity hotspots, and entry point enumeration. Generates Mermaid diagrams (call graphs, class hierarchies, dependency maps, heatmaps). Compares code graph snapshots for structural diff and evolution analysis. Runs graph-informed mutation testing triage (genotoxic). Generates mutation-driven test vectors (vector-forge). Extracts crypto protocol message flows and converts Mermaid diagrams to ProVerif models. Projects SARIF and weAudit findings onto code graphs. Use when analyzing call paths, mapping attack surface, visualizing code architecture, triaging survived mutants, generating cryptographic test vectors, diagramming crypto protocols, formally verifying protocols, or augmenting audits with static analysis findings.",
+      "author": {
+        "name": "Scott Arciszewski",
+        "url": "https://github.com/tob-scott-a"
+      },
+      "source": "./plugins/trailmark"
+    },
     {
       "name": "variant-analysis",
       "version": "1.0.0",

CODEOWNERS

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@
 /plugins/static-analysis/ @axelm-tob @dguido
 /plugins/supply-chain-risk-auditor/ @smichaels-tob @dguido
 /plugins/testing-handbook-skills/ @GrosQuildu @dguido
+/plugins/trailmark/ @tob-scott-a @pbottine @tob-joe @dguido
 /plugins/variant-analysis/ @axelm-tob @dguido
 /plugins/workflow-skill-design/ @bsamuels453 @dguido
 /plugins/yara-authoring/ @dguido

README.md

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ cd /path/to/parent # e.g., if repo is at ~/projects/skills, be in ~/projects
 | [static-analysis](plugins/static-analysis/) | Static analysis toolkit with CodeQL, Semgrep, and SARIF parsing |
 | [supply-chain-risk-auditor](plugins/supply-chain-risk-auditor/) | Audit supply-chain threat landscape of project dependencies |
 | [testing-handbook-skills](plugins/testing-handbook-skills/) | Skills from the [Testing Handbook](https://appsec.guide): fuzzers, static analysis, sanitizers, coverage |
+| [trailmark](plugins/trailmark/) | Code graph analysis, Mermaid diagrams, mutation testing triage, and protocol verification |
 | [variant-analysis](plugins/variant-analysis/) | Find similar vulnerabilities across codebases using pattern-based analysis |
 
 ### Malware Analysis

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
{
  "name": "trailmark",
  "version": "0.8.0",
  "description": "Builds multi-language source code graphs for security analysis: call graphs, attack surface mapping, blast radius, taint propagation, complexity hotspots, and entry point enumeration. Generates Mermaid diagrams (call graphs, class hierarchies, dependency maps, heatmaps). Compares code graph snapshots for structural diff and evolution analysis. Runs graph-informed mutation testing triage (genotoxic). Generates mutation-driven test vectors (vector-forge). Extracts crypto protocol message flows and converts Mermaid diagrams to ProVerif models. Projects SARIF and weAudit findings onto code graphs. Use when analyzing call paths, mapping attack surface, visualizing code architecture, triaging survived mutants, generating cryptographic test vectors, diagramming crypto protocols, formally verifying protocols, or augmenting audits with static analysis findings.",
  "author": {
    "name": "Scott Arciszewski",
    "url": "https://github.com/tob-scott-a"
  }
}

plugins/trailmark/README.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
# trailmark

**Source code graph analysis for security auditing.** Parses code into queryable graphs of functions, classes, and calls, then uses that structure for diagram generation, mutation testing triage, protocol verification, and differential review.

## Prerequisites

[Trailmark](https://pypi.org/project/trailmark/) ([source](https://github.com/trailofbits/trailmark)) must be installed:

```bash
uv pip install trailmark
```

## Skills

| Skill | Description |
|-------|-------------|
| `trailmark` | Build and query multi-language code graphs with pre-analysis passes (blast radius, taint, privilege boundaries, entrypoints) |
| `diagramming-code` | Generate Mermaid diagrams from code graphs (call graphs, class hierarchies, complexity heatmaps, data flow) |
| `crypto-protocol-diagram` | Extract protocol message flow from source code or specs (RFC, ProVerif, Tamarin) into sequence diagrams |
| `genotoxic` | Triage mutation testing results using graph analysis: classify survived mutants as false positives, missing tests, or fuzzing targets |
| `vector-forge` | Mutation-driven test vector generation: find coverage gaps via mutation testing, then generate Wycheproof-style vectors that close them |
| `graph-evolution` | Compare code graphs at two snapshots to surface security-relevant structural changes that text diffs miss |
| `mermaid-to-proverif` | Convert Mermaid sequence diagrams into ProVerif formal verification models |
| `audit-augmentation` | Project SARIF and weAudit findings onto code graphs as annotations and subgraphs |

## Directory Structure

```text
trailmark/
├── .claude-plugin/
│   └── plugin.json
├── README.md
└── skills/
    ├── trailmark/                # Core graph querying
    ├── diagram/                  # Mermaid diagram generation
    │   └── scripts/diagram.py
    ├── crypto-protocol-diagram/  # Protocol flow extraction
    │   └── examples/
    ├── genotoxic/                # Mutation testing triage
    ├── vector-forge/             # Mutation-driven test vector generation
    │   └── references/
    ├── graph-evolution/          # Structural diff
    │   └── scripts/graph_diff.py
    ├── mermaid-to-proverif/      # Sequence diagram → ProVerif
    │   └── examples/
    └── audit-augmentation/       # SARIF/weAudit integration
```

## Related Skills

| Skill | Use For |
|-------|---------|
| `mutation-testing` | Guidance for running mutation frameworks (mewt, muton); use before genotoxic for triage |
| `differential-review` | Text-level security diff review; complements graph-evolution's structural analysis |
| `audit-context-building` | Deep architectural context before vulnerability hunting |

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
---
name: audit-augmentation
description: >
  Augments Trailmark code graphs with external audit findings from SARIF static
  analysis results and weAudit annotation files. Maps findings to graph nodes by
  file and line overlap, creates severity-based subgraphs, and enables
  cross-referencing findings with pre-analysis data (blast radius, taint, etc.).
  Use when projecting SARIF results onto a code graph, overlaying weAudit
  annotations, cross-referencing Semgrep or CodeQL findings with call graph
  data, or visualizing audit findings in the context of code structure.
---

# Audit Augmentation

Projects findings from external tools (SARIF) and human auditors (weAudit)
onto Trailmark code graphs as annotations and subgraphs.

## When to Use

- Importing Semgrep, CodeQL, or other SARIF-producing tool results into a graph
- Importing weAudit audit annotations into a graph
- Cross-referencing static analysis findings with blast radius or taint data
- Querying which functions have high-severity findings
- Visualizing audit coverage alongside code structure

## When NOT to Use

- Running static analysis tools (use semgrep/codeql directly, then import)
- Building the code graph itself (use the `trailmark` skill)
- Generating diagrams (use the `diagramming-code` skill after augmenting)

## Installation

**MANDATORY:** If `uv run trailmark` fails, install trailmark first:

```bash
uv pip install trailmark
```

## Quick Start

### CLI

```bash
# Augment with SARIF
uv run trailmark augment {targetDir} --sarif results.sarif

# Augment with weAudit
uv run trailmark augment {targetDir} --weaudit .vscode/alice.weaudit

# Both at once, output JSON
uv run trailmark augment {targetDir} \
  --sarif results.sarif \
  --weaudit .vscode/alice.weaudit \
  --json
```

### Programmatic API

```python
from trailmark.query.api import QueryEngine

engine = QueryEngine.from_directory("{targetDir}", language="python")

# Run pre-analysis first for cross-referencing
engine.preanalysis()

# Augment with SARIF
result = engine.augment_sarif("results.sarif")
# result: {matched_findings: 12, unmatched_findings: 3, subgraphs_created: [...]}

# Augment with weAudit
result = engine.augment_weaudit(".vscode/alice.weaudit")

# Query findings
engine.findings()                        # All findings
engine.subgraph("sarif:error")           # High-severity SARIF
engine.subgraph("weaudit:high")          # High-severity weAudit
engine.subgraph("sarif:semgrep")         # By tool name
engine.annotations_of("function_name")   # Per-node lookup
```

## Workflow

```
Augmentation Progress:
- [ ] Step 1: Build graph and run pre-analysis
- [ ] Step 2: Locate SARIF/weAudit files
- [ ] Step 3: Run augmentation
- [ ] Step 4: Inspect results and subgraphs
- [ ] Step 5: Cross-reference with pre-analysis
```

**Step 1:** Build the graph and run pre-analysis for blast radius and taint
context:

```python
from trailmark.query.api import QueryEngine

engine = QueryEngine.from_directory("{targetDir}", language="{lang}")
engine.preanalysis()
```

**Step 2:** Locate input files:
- **SARIF**: Usually output by tools like `semgrep --sarif -o results.sarif`
  or `codeql database analyze --format=sarif-latest`
- **weAudit**: Stored in `.vscode/<username>.weaudit` within the workspace

**Step 3:** Run augmentation via `engine.augment_sarif()` or
`engine.augment_weaudit()`. Check `unmatched_findings` in the result: these
are findings whose file/line locations didn't overlap any parsed code unit.

**Step 4:** Query findings and subgraphs. Use `engine.findings()` to list all
annotated nodes. Use `engine.subgraph_names()` to see available subgraphs.

**Step 5:** Cross-reference with pre-analysis data to prioritize:
- Findings on tainted nodes: overlap `sarif:error` with the `tainted` subgraph
- Findings on high blast radius nodes: overlap with `high_blast_radius`
- Findings on privilege boundaries: overlap with `privilege_boundary`
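
The overlaps in Step 5 amount to set intersections over node identifiers. The following is a minimal sketch, assuming each subgraph can be reduced to a set of node ID strings. The `prioritize` helper and the node IDs are hypothetical and are not part of the Trailmark API; the return type of `engine.subgraph()` is not specified here, so the sets are built by hand.

```python
from __future__ import annotations


def prioritize(findings: set[str], *contexts: set[str]) -> list[tuple[str, int]]:
    """Rank finding nodes by how many pre-analysis subgraphs they also appear in."""
    scored = [(node, sum(node in ctx for ctx in contexts)) for node in findings]
    # Highest overlap first; ties are broken alphabetically for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical node IDs standing in for the contents of each subgraph.
sarif_error = {"app.handler.run", "app.db.query", "app.util.fmt"}
tainted = {"app.handler.run", "app.db.query"}
high_blast_radius = {"app.db.query"}

ranked = prioritize(sarif_error, tainted, high_blast_radius)
for node, score in ranked:
    print(f"{score} overlapping subgraph(s): {node}")
```

A node that carries an error-level finding and also sits in both the tainted and high blast radius sets ranks first, which is the triage order Step 5 is aiming for.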

## Annotation Format

Findings are stored as standard Trailmark annotations:

- **Kind**: `finding` (tool-generated) or `audit_note` (human notes)
- **Source**: `sarif:<tool_name>` or `weaudit:<author>`
- **Description**: Compact single-line:
  `[SEVERITY] rule-id: message (tool)`
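
To make the description template concrete, here is a small sketch of how such a string could be assembled. The `format_description` helper is hypothetical, and both the upper-casing of the severity and the whitespace collapsing are assumptions inferred from the template, not confirmed Trailmark behavior.

```python
def format_description(severity: str, rule_id: str, message: str, tool: str) -> str:
    """Build a compact single-line description: [SEVERITY] rule-id: message (tool)."""
    # Collapse newlines and repeated spaces so the result stays on one line.
    one_line = " ".join(message.split())
    # Assumption: severity is rendered in upper case, as the template suggests.
    return f"[{severity.upper()}] {rule_id}: {one_line} ({tool})"


desc = format_description(
    "warning",
    "python.lang.security.audit.exec-detected",
    "Detected use of\nexec()",
    "semgrep",
)
print(desc)
```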

## Subgraphs Created

| Subgraph | Contents |
|----------|----------|
| `sarif:error` | Nodes with SARIF error-level findings |
| `sarif:warning` | Nodes with SARIF warning-level findings |
| `sarif:note` | Nodes with SARIF note-level findings |
| `sarif:<tool>` | Nodes flagged by a specific tool |
| `weaudit:high` | Nodes with high-severity weAudit findings |
| `weaudit:medium` | Nodes with medium-severity weAudit findings |
| `weaudit:low` | Nodes with low-severity weAudit findings |
| `weaudit:findings` | All weAudit findings (entryType=0) |
| `weaudit:notes` | All weAudit notes (entryType=1) |

## How Matching Works

Findings are matched to graph nodes by file path and line range overlap:

1. Finding file path is normalized relative to the graph's `root_path`
2. Nodes whose `location.file_path` matches AND whose line range overlaps are
   selected
3. The tightest match (smallest span) is preferred
4. If a finding's location doesn't overlap any node, it counts as unmatched
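
The matching steps above can be sketched as an interval-overlap check followed by a smallest-span selection. This is an illustration under stated assumptions, not Trailmark's implementation: the `Node` dataclass and `match_finding` function are hypothetical, paths are assumed to be already normalized (step 1), and line ranges are treated as 1-indexed and inclusive.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a graph node with a source location."""
    node_id: str
    file_path: str
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive


def match_finding(nodes: list[Node], file_path: str,
                  start_line: int, end_line: int) -> Node | None:
    """Return the tightest node overlapping the finding, or None if unmatched."""
    candidates = [
        n for n in nodes
        if n.file_path == file_path
        # Two inclusive ranges overlap when each starts before the other ends.
        and n.start_line <= end_line
        and start_line <= n.end_line
    ]
    if not candidates:
        return None  # counted as an unmatched finding
    # Prefer the smallest span, e.g. a method over its enclosing class.
    return min(candidates, key=lambda n: n.end_line - n.start_line)


nodes = [
    Node("Handler", "src/handler.py", 10, 80),      # class
    Node("Handler.run", "src/handler.py", 40, 50),  # method inside the class
    Node("helper", "src/util.py", 1, 20),
]

hit = match_finding(nodes, "src/handler.py", 42, 42)
miss = match_finding(nodes, "src/handler.py", 200, 201)
print(hit.node_id if hit else None, miss)
```

Choosing the smallest span means a finding on line 42 attaches to the method rather than to the class that contains it, which keeps annotations as specific as the graph allows.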

SARIF paths may be relative, absolute, or `file://` URIs; all are handled.
weAudit uses 0-indexed lines, which are converted to 1-indexed automatically.
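
As an illustration of the three SARIF path forms, the sketch below reduces each one to a path relative to a root directory. The `normalize_path` helper is hypothetical and handles POSIX-style paths only. How Trailmark treats paths outside the root, or Windows paths, is not described here, so this example simply leaves out-of-root paths unchanged.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def normalize_path(uri: str, root_path: str) -> str:
    """Reduce a relative path, absolute path, or file:// URI to a root-relative path."""
    parsed = urlparse(uri)
    # Only file:// URIs are unwrapped; anything else is treated as a plain path.
    path = unquote(parsed.path) if parsed.scheme == "file" else uri
    pure = PurePosixPath(path)
    if pure.is_absolute():
        try:
            pure = pure.relative_to(PurePosixPath(root_path))
        except ValueError:
            pass  # outside the root: leave as-is, so it will not match any node
    return str(pure)


root = "/repo"
normalized = [
    normalize_path("src/handler.py", root),
    normalize_path("/repo/src/handler.py", root),
    normalize_path("file:///repo/src/handler.py", root),
]
print(normalized)
```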

## Supporting Documentation

- **[references/formats.md](references/formats.md)**: SARIF 2.1.0 and
  weAudit file format field reference

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
# SARIF and weAudit Format Reference

## SARIF 2.1.0

SARIF (Static Analysis Results Interchange Format) is an OASIS standard for
encoding static analysis results as JSON.

### Structure Used by Trailmark

```
sarifLog
├── version: "2.1.0"
└── runs[]
    ├── tool.driver.name          → source field ("sarif:<name>")
    └── results[]
        ├── ruleId                → included in description
        ├── message.text          → included in description
        ├── level                 → "error" | "warning" | "note"
        └── locations[]
            └── physicalLocation
                ├── artifactLocation.uri → matched to node file
                └── region
                    ├── startLine → matched to node lines
                    └── endLine   → matched to node lines
```

### Level Values

| Level | Subgraph |
|-------|----------|
| `error` | `sarif:error` |
| `warning` (default) | `sarif:warning` |
| `note` | `sarif:note` |

### Example SARIF Result

```json
{
  "ruleId": "python.lang.security.audit.exec-detected",
  "level": "warning",
  "message": {"text": "Detected use of exec()"},
  "locations": [{
    "physicalLocation": {
      "artifactLocation": {"uri": "src/handler.py"},
      "region": {"startLine": 42, "endLine": 42}
    }
  }]
}
```
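
The fields listed in the structure above can be read with the standard library alone. The sketch below wraps the example result in a minimal SARIF log and flattens it into one record per location. The `iter_findings` function and its output keys are hypothetical, and the surrounding `runs`/`tool` wrapper with the tool name `semgrep` is sample data added for illustration. Defaulting `level` to `warning` follows the table above, and falling back from a missing `endLine` to `startLine` follows the SARIF 2.1.0 region defaults.

```python
import json

# Sample log: the example result wrapped in a minimal, assumed run/tool envelope.
SARIF_LOG = json.loads("""
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "semgrep"}},
    "results": [{
      "ruleId": "python.lang.security.audit.exec-detected",
      "level": "warning",
      "message": {"text": "Detected use of exec()"},
      "locations": [{
        "physicalLocation": {
          "artifactLocation": {"uri": "src/handler.py"},
          "region": {"startLine": 42, "endLine": 42}
        }
      }]
    }]
  }]
}
""")


def iter_findings(log: dict):
    """Yield one flat record per result location in a SARIF log."""
    for run in log.get("runs", []):
        tool = run["tool"]["driver"]["name"]
        for result in run.get("results", []):
            level = result.get("level", "warning")  # "warning" is the default level
            for loc in result.get("locations", []):
                phys = loc.get("physicalLocation", {})
                region = phys.get("region", {})
                start = region.get("startLine")
                yield {
                    "source": f"sarif:{tool}",
                    "subgraph": f"sarif:{level}",
                    "rule_id": result.get("ruleId"),
                    "message": result.get("message", {}).get("text", ""),
                    "file": phys.get("artifactLocation", {}).get("uri"),
                    "start_line": start,
                    "end_line": region.get("endLine", start),
                }


findings = list(iter_findings(SARIF_LOG))
print(findings[0])
```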

## weAudit

weAudit is a VSCode extension by Trail of Bits for collaborative security
auditing. Files are stored as `.vscode/<username>.weaudit`.

### Structure Used by Trailmark

```
root
├── clientRemote            → fallback author extraction
├── treeEntries[]           → active findings/notes
│   ├── label               → included in description
│   ├── entryType           → 0=Finding, 1=Note
│   ├── author              → source field ("weaudit:<author>")
│   ├── details
│   │   ├── severity        → "High" | "Medium" | "Low" | "Informational"
│   │   ├── type            → finding category
│   │   └── description     → included in annotation
│   └── locations[]
│       ├── path            → relative to git root
│       ├── startLine       → 0-indexed (converted to 1-indexed)
│       └── endLine         → 0-indexed (converted to 1-indexed)
└── resolvedEntries[]       → same structure as treeEntries
```

### Entry Types

| entryType | AnnotationKind | Subgraph |
|-----------|---------------|----------|
| 0 (Finding) | `finding` | `weaudit:findings` |
| 1 (Note) | `audit_note` | `weaudit:notes` |

### Severity Values

| Severity | Subgraph |
|----------|----------|
| `High` | `weaudit:high` |
| `Medium` | `weaudit:medium` |
| `Low` | `weaudit:low` |
| `Informational` | `weaudit:informational` |
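
The two tables above can be read as a pair of lookups from a weAudit entry to subgraph names. The following sketch shows that mapping. The `weaudit_subgraphs` helper is hypothetical, and whether entries without a recognized severity are simply left out of the severity subgraphs is an assumption.

```python
ENTRY_TYPE_SUBGRAPH = {0: "weaudit:findings", 1: "weaudit:notes"}
SEVERITIES = {"High", "Medium", "Low", "Informational"}


def weaudit_subgraphs(entry: dict) -> list[str]:
    """Return the subgraph names a weAudit entry would fall into."""
    names = [ENTRY_TYPE_SUBGRAPH[entry["entryType"]]]
    severity = entry.get("details", {}).get("severity")
    # Assumption: entries with a missing or unrecognized severity get no severity subgraph.
    if severity in SEVERITIES:
        names.append(f"weaudit:{severity.lower()}")
    return names


finding_groups = weaudit_subgraphs({"entryType": 0, "details": {"severity": "High"}})
note_groups = weaudit_subgraphs({"entryType": 1, "details": {}})
print(finding_groups, note_groups)
```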

### Example weAudit Entry

```json
{
  "label": "SQL Injection in user input",
  "entryType": 0,
  "author": "alice",
  "details": {
    "severity": "High",
    "difficulty": "Low",
    "type": "Data Validation",
    "description": "User input not sanitized before SQL query.",
    "exploit": "Attacker injects malicious SQL.",
    "recommendation": "Use parameterized queries."
  },
  "locations": [{
    "path": "src/database/queries.py",
    "startLine": 41,
    "endLine": 44,
    "label": "executeQuery function",
    "description": ""
  }]
}
```

### Line Indexing

weAudit uses **0-indexed** line numbers. Trailmark uses **1-indexed** (from
tree-sitter). The augmentation module adds 1 to both `startLine` and `endLine`
during conversion.
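
A short worked example of the conversion, using the location from the example entry above. The `to_one_indexed` helper is hypothetical and only illustrates the arithmetic described here.

```python
def to_one_indexed(location: dict) -> tuple[int, int]:
    """Convert a weAudit location's 0-indexed line range to 1-indexed."""
    return location["startLine"] + 1, location["endLine"] + 1


location = {"path": "src/database/queries.py", "startLine": 41, "endLine": 44}
span = to_one_indexed(location)
print(span)
```

The stored range 41 to 44 therefore refers to lines 42 to 45 as an editor or a SARIF `region` would number them.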
