feat(trailmark): skills that reason about code as graphs #133
Open: tob-scott-a wants to merge 14 commits into main from trailmark
Changes from 11 commits
Commits
18beff8 feat(trailmark): skills that reason about code as graphs (tob-scott-a)
ebbd30f Add Codex skill symlinks for trailmark plugin (tob-scott-a)
7286147 Address PR #133 review feedback (tob-scott-a)
6c20c18 Fix diagram skill to use uv run instead of plain python (tob-scott-a)
827f742 Address second round of PR #133 review feedback (tob-scott-a)
4ce07e6 Local skill-improver review pass across all 10 trailmark skills (tob-scott-a)
704c42f Address third round of PR #133 review feedback (tob-scott-a)
cc0838b Address fourth round of PR #133 review feedback (tob-scott-a)
6d9c211 Address fifth round of PR #133 review feedback (tob-scott-a)
8a821fc Fix ProVerif type consistency and graph-evolution template vars (tob-scott-a)
3df67e8 Comprehensive vivisect-style review of all trailmark skills (tob-scott-a)
dec734b Fix ProVerif type consistency and graph-evolution template vars (tob-scott-a)
b3addf8 Fix mermaid-to-proverif template: missing beginI event and secrecy wi… (tob-scott-a)
e6bd48a Address sixth round of PR #133 review feedback (tob-scott-a)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../../plugins/trailmark/skills/audit-augmentation |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../../plugins/trailmark/skills/crypto-protocol-diagram |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../../plugins/trailmark/skills/diagramming-code |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../../plugins/trailmark/skills/genotoxic |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../../plugins/trailmark/skills/graph-evolution |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../../plugins/trailmark/skills/mermaid-to-proverif |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../../plugins/trailmark/skills/trailmark |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../../plugins/trailmark/skills/trailmark-structural |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../../plugins/trailmark/skills/trailmark-summary |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../../plugins/trailmark/skills/vector-forge |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| { | ||
| "name": "trailmark", | ||
| "version": "0.8.0", | ||
| "description": "Builds multi-language source code graphs for security analysis: call graphs, attack surface mapping, blast radius, taint propagation, complexity hotspots, and entry point enumeration. Generates Mermaid diagrams (call graphs, class hierarchies, dependency maps, heatmaps). Compares code graph snapshots for structural diff and evolution analysis. Runs graph-informed mutation testing triage (genotoxic). Generates mutation-driven test vectors (vector-forge). Extracts crypto protocol message flows and converts Mermaid diagrams to ProVerif models. Projects SARIF and weAudit findings onto code graphs. Use when analyzing call paths, mapping attack surface, visualizing code architecture, triaging survived mutants, generating cryptographic test vectors, diagramming crypto protocols, formally verifying protocols, or augmenting audits with static analysis findings.", | ||
| "author": { | ||
| "name": "Scott Arciszewski", | ||
| "url": "https://github.com/tob-scott-a" | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| # trailmark | ||
|
|
||
| **Source code graph analysis for security auditing.** Parses code into queryable graphs of functions, classes, and calls, then uses that structure for diagram generation, mutation testing triage, protocol verification, and differential review. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| [Trailmark](https://pypi.org/project/trailmark/) ([source](https://github.com/trailofbits/trailmark)) must be installed: | ||
|
|
||
| ```bash | ||
| uv pip install trailmark | ||
| ``` | ||
|
|
||
| ## Skills | ||
|
|
||
| | Skill | Description | | ||
| |-------|-------------| | ||
| | `trailmark` | Build and query multi-language code graphs with pre-analysis passes (blast radius, taint, privilege boundaries, entrypoints) | | ||
| | `diagramming-code` | Generate Mermaid diagrams from code graphs (call graphs, class hierarchies, complexity heatmaps, data flow) | | ||
| | `crypto-protocol-diagram` | Extract protocol message flow from source code or specs (RFC, ProVerif, Tamarin) into sequence diagrams | | ||
| | `genotoxic` | Triage mutation testing results using graph analysis — classify survived mutants as false positives, missing tests, or fuzzing targets | | ||
| | `vector-forge` | Mutation-driven test vector generation — find coverage gaps via mutation testing, then generate Wycheproof-style vectors that close them | | ||
| | `graph-evolution` | Compare code graphs at two snapshots to surface security-relevant structural changes text diffs miss | | ||
| | `mermaid-to-proverif` | Convert Mermaid sequence diagrams into ProVerif formal verification models | | ||
| | `audit-augmentation` | Project SARIF and weAudit findings onto code graphs as annotations and subgraphs | | ||
| | `trailmark-summary` | Quick structural overview (language detection, entry points, dependencies) for vivisect/galvanize | | ||
| | `trailmark-structural` | Full structural analysis with all pre-analysis passes (blast radius, taint, privilege boundaries, complexity) | | ||
|
|
||
| ## Directory Structure | ||
|
|
||
| ```text | ||
| trailmark/ | ||
| ├── .claude-plugin/ | ||
| │ └── plugin.json | ||
| ├── README.md | ||
| └── skills/ | ||
| ├── trailmark/ # Core graph querying | ||
| ├── diagramming-code/ # Mermaid diagram generation | ||
| │ └── scripts/diagram.py | ||
| ├── crypto-protocol-diagram/ # Protocol flow extraction | ||
| │ └── examples/ | ||
| ├── genotoxic/ # Mutation testing triage | ||
| ├── vector-forge/ # Mutation-driven test vector generation | ||
| │ └── references/ | ||
| ├── graph-evolution/ # Structural diff | ||
| │ └── scripts/graph_diff.py | ||
| ├── mermaid-to-proverif/ # Sequence diagram → ProVerif | ||
| │ └── examples/ | ||
| ├── audit-augmentation/ # SARIF/weAudit integration | ||
| ├── trailmark-summary/ # Quick overview for vivisect/galvanize | ||
| └── trailmark-structural/ # Full structural analysis | ||
| ``` | ||
|
|
||
| ## Related Skills | ||
|
|
||
| | Skill | Use For | | ||
| |-------|---------| | ||
| | `mutation-testing` | Guidance for running mutation frameworks (mewt, muton) — use before genotoxic for triage | | ||
| | `differential-review` | Text-level security diff review — complements graph-evolution's structural analysis | | ||
| | `audit-context-building` | Deep architectural context before vulnerability hunting | | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,171 @@ | ||
| --- | ||
| name: audit-augmentation | ||
| description: > | ||
| Augments Trailmark code graphs with external audit findings from SARIF static | ||
| analysis results and weAudit annotation files. Maps findings to graph nodes by | ||
| file and line overlap, creates severity-based subgraphs, and enables | ||
| cross-referencing findings with pre-analysis data (blast radius, taint, etc.). | ||
| Use when projecting SARIF results onto a code graph, overlaying weAudit | ||
| annotations, cross-referencing Semgrep or CodeQL findings with call graph | ||
| data, or visualizing audit findings in the context of code structure. | ||
| --- | ||
|
|
||
| # Audit Augmentation | ||
|
|
||
| Projects findings from external tools (SARIF) and human auditors (weAudit) | ||
| onto Trailmark code graphs as annotations and subgraphs. | ||
|
|
||
| ## When to Use | ||
|
|
||
| - Importing Semgrep, CodeQL, or other SARIF-producing tool results into a graph | ||
| - Importing weAudit audit annotations into a graph | ||
| - Cross-referencing static analysis findings with blast radius or taint data | ||
| - Querying which functions have high-severity findings | ||
| - Visualizing audit coverage alongside code structure | ||
|
|
||
| ## When NOT to Use | ||
|
|
||
| - Running static analysis tools (use semgrep/codeql directly, then import) | ||
| - Building the code graph itself (use the `trailmark` skill) | ||
| - Generating diagrams (use the `diagramming-code` skill after augmenting) | ||
|
|
||
| ## Rationalizations to Reject | ||
|
|
||
| | Rationalization | Why It's Wrong | Required Action | | ||
| |-----------------|----------------|-----------------| | ||
| | "The user only asked about SARIF, skip pre-analysis" | Without pre-analysis, you can't cross-reference findings with blast radius or taint | Always run `engine.preanalysis()` before augmenting | | ||
| | "Unmatched findings don't matter" | Unmatched findings may indicate parsing gaps or out-of-scope files | Report unmatched count and investigate if high | | ||
| | "One severity subgraph is enough" | Different severities need different triage workflows | Query all severity subgraphs, not just `error` | | ||
| | "SARIF results speak for themselves" | Findings without graph context lack blast radius and taint reachability | Cross-reference with pre-analysis subgraphs | | ||
| | "weAudit and SARIF overlap, pick one" | Human auditors and tools find different things | Import both when available | | ||
| | "Tool isn't installed, I'll do it manually" | Manual analysis misses what tooling catches | Install trailmark first | | ||
|
|
||
| --- | ||
|
|
||
| ## Installation | ||
|
|
||
| **MANDATORY:** If `uv run trailmark` fails, install trailmark first: | ||
|
|
||
| ```bash | ||
| uv pip install trailmark | ||
| ``` | ||
|
|
||
| ## Quick Start | ||
|
|
||
| ### CLI | ||
|
|
||
| ```bash | ||
| # Augment with SARIF | ||
| uv run trailmark augment {targetDir} --sarif results.sarif | ||
|
|
||
| # Augment with weAudit | ||
| uv run trailmark augment {targetDir} --weaudit .vscode/alice.weaudit | ||
|
|
||
| # Both at once, output JSON | ||
| uv run trailmark augment {targetDir} \ | ||
| --sarif results.sarif \ | ||
| --weaudit .vscode/alice.weaudit \ | ||
| --json | ||
| ``` | ||
|
|
||
| ### Programmatic API | ||
|
|
||
| ```python | ||
| from trailmark.query.api import QueryEngine | ||
|
|
||
| engine = QueryEngine.from_directory("{targetDir}", language="python") | ||
|
|
||
| # Run pre-analysis first for cross-referencing | ||
| engine.preanalysis() | ||
|
|
||
| # Augment with SARIF | ||
| result = engine.augment_sarif("results.sarif") | ||
| # result: {matched_findings: 12, unmatched_findings: 3, subgraphs_created: [...]} | ||
|
|
||
| # Augment with weAudit | ||
| result = engine.augment_weaudit(".vscode/alice.weaudit") | ||
|
|
||
| # Query findings | ||
| engine.findings() # All findings | ||
| engine.subgraph("sarif:error") # High-severity SARIF | ||
| engine.subgraph("weaudit:high") # High-severity weAudit | ||
| engine.subgraph("sarif:semgrep") # By tool name | ||
| engine.annotations_of("function_name") # Per-node lookup | ||
| ``` | ||
|
|
||
| ## Workflow | ||
|
|
||
| ``` | ||
| Augmentation Progress: | ||
| - [ ] Step 1: Build graph and run pre-analysis | ||
| - [ ] Step 2: Locate SARIF/weAudit files | ||
| - [ ] Step 3: Run augmentation | ||
| - [ ] Step 4: Inspect results and subgraphs | ||
| - [ ] Step 5: Cross-reference with pre-analysis | ||
| ``` | ||
|
|
||
| **Step 1:** Build the graph and run pre-analysis for blast radius and taint | ||
| context: | ||
|
|
||
| ```python | ||
| engine = QueryEngine.from_directory("{targetDir}", language="{lang}") | ||
| engine.preanalysis() | ||
| ``` | ||
|
|
||
| **Step 2:** Locate input files: | ||
| - **SARIF**: Usually output by tools like `semgrep --sarif -o results.sarif` | ||
| or `codeql database analyze --format=sarif-latest` | ||
| - **weAudit**: Stored in `.vscode/<username>.weaudit` within the workspace | ||
|
|
||
| **Step 3:** Run augmentation via `engine.augment_sarif()` or | ||
| `engine.augment_weaudit()`. Check `unmatched_findings` in the result — these | ||
| are findings whose file/line locations didn't overlap any parsed code unit. | ||
|
|
||
| **Step 4:** Query findings and subgraphs. Use `engine.findings()` to list all | ||
| annotated nodes. Use `engine.subgraph_names()` to see available subgraphs. | ||
|
|
||
| **Step 5:** Cross-reference with pre-analysis data to prioritize: | ||
| - Findings on tainted nodes: overlap `sarif:error` with `tainted` subgraph | ||
| - Findings on high blast radius nodes: overlap with `high_blast_radius` | ||
| - Findings on privilege boundaries: overlap with `privilege_boundary` | ||
|
|
||
| ## Annotation Format | ||
|
|
||
| Findings are stored as standard Trailmark annotations: | ||
|
|
||
| - **Kind**: `finding` (tool-generated) or `audit_note` (human notes) | ||
| - **Source**: `sarif:<tool_name>` or `weaudit:<author>` | ||
| - **Description**: Compact single-line: | ||
| `[SEVERITY] rule-id: message (tool)` | ||
|
|
||
| ## Subgraphs Created | ||
|
|
||
| | Subgraph | Contents | | ||
| |----------|----------| | ||
| | `sarif:error` | Nodes with SARIF error-level findings | | ||
| | `sarif:warning` | Nodes with SARIF warning-level findings | | ||
| | `sarif:note` | Nodes with SARIF note-level findings | | ||
| | `sarif:<tool>` | Nodes flagged by a specific tool | | ||
| | `weaudit:high` | Nodes with high-severity weAudit findings | | ||
| | `weaudit:medium` | Nodes with medium-severity weAudit findings | | ||
| | `weaudit:low` | Nodes with low-severity weAudit findings | | ||
| | `weaudit:findings` | All weAudit findings (entryType=0) | | ||
| | `weaudit:notes` | All weAudit notes (entryType=1) | | ||
|
|
||
| ## How Matching Works | ||
|
|
||
| Findings are matched to graph nodes by file path and line range overlap: | ||
|
|
||
| 1. Finding file path is normalized relative to the graph's `root_path` | ||
| 2. Nodes whose `location.file_path` matches AND whose line range overlaps are | ||
| selected | ||
| 3. The tightest match (smallest span) is preferred | ||
| 4. If a finding's location doesn't overlap any node, it counts as unmatched | ||
|
|
||
| SARIF paths may be relative, absolute, or `file://` URIs — all are handled. | ||
| weAudit uses 0-indexed lines which are converted to 1-indexed automatically. | ||
|
|
||
| ## Supporting Documentation | ||
|
|
||
| - **[references/formats.md](references/formats.md)** — SARIF 2.1.0 and | ||
| weAudit file format field reference |
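
The "How Matching Works" section of the audit-augmentation diff describes four steps: normalize the finding's path against the graph root, select nodes whose file matches and whose line range overlaps, prefer the smallest span, and count everything else as unmatched. A minimal sketch of that logic follows. It is not Trailmark's implementation: `Node`, `normalize_path`, `match_finding`, and the `zero_indexed` flag are hypothetical names chosen for illustration, and the sketch assumes POSIX-style paths and inclusive, 1-indexed node line ranges.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional, Sequence
from urllib.parse import unquote, urlparse


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a parsed code unit (not Trailmark's type)."""
    name: str
    file_path: str   # relative to the graph root
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive


def normalize_path(raw: str, root: str) -> str:
    """Reduce a relative path, absolute path, or file:// URI to a root-relative path."""
    if raw.startswith("file://"):
        raw = unquote(urlparse(raw).path)
    path = PurePosixPath(raw)
    if path.is_absolute():
        try:
            path = path.relative_to(PurePosixPath(root))
        except ValueError:
            pass  # outside the root: leave it absolute so it fails to match
    return str(path)


def match_finding(
    nodes: Sequence[Node],
    file_path: str,
    start_line: int,
    end_line: int,
    root: str,
    zero_indexed: bool = False,
) -> Optional[Node]:
    """Return the tightest node overlapping the finding, or None if unmatched."""
    if zero_indexed:
        # Assumption: weAudit-style 0-indexed lines are shifted to 1-indexed.
        start_line += 1
        end_line += 1
    rel = normalize_path(file_path, root)
    candidates = [
        n for n in nodes
        if n.file_path == rel
        and n.start_line <= end_line   # node starts before the finding ends
        and start_line <= n.end_line   # finding starts before the node ends
    ]
    if not candidates:
        return None
    # Tightest match: the node covering the fewest lines.
    return min(candidates, key=lambda n: n.end_line - n.start_line)


if __name__ == "__main__":
    demo_nodes = [
        Node("Parser", "src/a.py", 1, 50),
        Node("Parser.parse", "src/a.py", 10, 20),
    ]
    hit = match_finding(demo_nodes, "file:///repo/src/a.py", 12, 12, root="/repo")
    print(hit.name if hit else "unmatched")
```

Preferring the smallest span means a finding inside a method is attached to the method rather than to its enclosing class, which keeps per-function queries precise. A finding that returns `None` here corresponds to what the skill reports under `unmatched_findings`.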