A small utility (library + CLI) that walks a Rust workspace / crate, parses each .rs file with Tree‑sitter (v0.25), and lets you:
- Dump a lightweight JSON representation of the concrete syntax tree (CST-ish) per file
- Run arbitrary Tree‑sitter queries across all source files and aggregate captures by crate

It aims to be fast, embeddable, and predictable (deterministic output ordering).

- Parallel parsing + querying via `rayon`
- Skips common noisy dirs automatically (`target/`, `generated/`)
- User‑configurable directory skipping with repeatable `--skip-dir <path>` (relative or absolute)
- Optional inclusion of tests / benches (`--include-tests`)
- Deterministic ordering of files & captures for reproducible diffs
- Optional line context for each capture (`--context`)
- Depth‑limited JSON CST dumping (`--max-depth`)
- Optional inlining of short node source spans (`--with-source`)
- Safe stdout writing (gracefully handles broken pipe)
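
The broken-pipe handling listed above can be pictured with a small sketch. This is not arbol's actual code: the helper name `write_ignoring_broken_pipe` is hypothetical, and it only assumes that a closed downstream pipe (for example when piping into `head`) should count as a clean exit.

```rust
use std::io::{self, ErrorKind, Write};

/// Write `data` to `out` and flush, treating a closed pipe as a normal
/// early exit rather than an error. Any other I/O error is passed through.
/// Hypothetical helper: the name and signature are illustrative only.
fn write_ignoring_broken_pipe<W: Write>(out: &mut W, data: &[u8]) -> io::Result<()> {
    match out.write_all(data).and_then(|_| out.flush()) {
        Err(e) if e.kind() == ErrorKind::BrokenPipe => Ok(()),
        other => other,
    }
}
```

A caller would typically pass a locked `std::io::stdout()` handle and exit quietly when the function returns `Ok(())`.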
From crates.io:

    cargo install --locked arbol

From the repo:

    cargo install --locked --git https://github.com/joaommartins/arbol

After installing, you can run it as a normal binary:

    arbol --help

Dump a shallow CST (installed binary assumed):

    arbol dump-json --max-depth 2 > ast.json

Run an inline query and emit JSON:

    arbol query --expr '(function_item name: (identifier) @fn.name)' --json > fns.json

Use a query file with line context:

    arbol query --query-file examples/functions.scm --context

Include tests / benches:

    arbol query --include-tests --expr '(macro_invocation macro: (identifier) @macro.name)'

Skip specific directories (repeat `--skip-dir` or pass multiple):

    arbol query \
      --skip-dir target \
      --skip-dir openapi/generated \
      --expr '(trait_item name: (type_identifier) @trait.name)' --json

Verbose tracing:

    arbol dump-json --verbose --max-depth 1

Subcommands:

`dump-json`

Dump a per‑file JSON listing of nodes (optionally including node source text).

Flags:
- `--with-source`: include short node snippets (<= 240 bytes)
- `--max-depth <n>`: limit traversal depth (0 = only root)
- `--output <path>`: write to file instead of stdout
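
To make the `--max-depth` semantics concrete (0 = only root), here is a minimal sketch of a depth-limited pre-order walk. It uses a toy `Node` type rather than real Tree‑sitter nodes, and `collect_kinds` is a hypothetical name, so treat it as an illustration of the rule rather than arbol's implementation.

```rust
/// Toy stand-in for a syntax node; a real Tree-sitter node carries far more.
struct Node {
    kind: &'static str,
    children: Vec<Node>,
}

/// Collect node kinds in pre-order, descending at most `max_depth` levels
/// below the root, so `max_depth == 0` yields only the root.
fn collect_kinds(node: &Node, depth: usize, max_depth: usize, out: &mut Vec<&'static str>) {
    out.push(node.kind);
    if depth >= max_depth {
        return; // depth limit reached: do not visit children
    }
    for child in &node.children {
        collect_kinds(child, depth + 1, max_depth, out);
    }
}
```

Calling `collect_kinds(&root, 0, 1, &mut out)` on a `source_file` root records the root and its direct children only.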

`query`

Run a raw Tree‑sitter query across all discovered Rust files.

Provide exactly one of:
- `--query-file <file.scm>`
- `--expr '<inline s-expression>'`

Optional flags:
- `--context`: include the full source line for each capture
- `--json`: emit structured JSON instead of plain grouped text
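
As a rough illustration of what `--context` adds, the sketch below looks up the full source line for a capture. It assumes the reported line numbers are 1-based, which this document does not state, and `line_text` is a hypothetical helper name.

```rust
/// Return the full text of a 1-based `line` in `source`, or `None` if the
/// line number is 0 or past the end of the file.
/// Assumption: capture line numbers are 1-based.
fn line_text(source: &str, line: usize) -> Option<&str> {
    line.checked_sub(1).and_then(|idx| source.lines().nth(idx))
}
```

`str::lines` strips the trailing `\n` or `\r\n`, so the returned text can be placed in a JSON field without a line terminator.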

Global flags:
- `--include-tests`: also scan `tests/` & `benches/`
- `--skip-dir <path>`: repeatable; omit any paths under these directories
- `--verbose`: enable tracing subscriber
- `--root <path>` (default `.`): directory to scan (should contain a Cargo.toml or nested crates)
- `--markdown-help`: emit Markdown help to stdout (or to file with `--help-output`)
- `--help-output <path>`: path to write Markdown help (implies `--markdown-help`)

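
One way to picture `--skip-dir` accepting both relative and absolute paths is the sketch below. It assumes relative values are resolved against `--root`, which is a guess about the behaviour rather than something confirmed here; `resolve_skip_dir` and `is_skipped` are hypothetical names.

```rust
use std::path::{Path, PathBuf};

/// Resolve a `--skip-dir` value: absolute paths are kept as-is, relative
/// ones are joined onto the scan root (an assumption of this sketch).
fn resolve_skip_dir(root: &Path, skip: &Path) -> PathBuf {
    if skip.is_absolute() {
        skip.to_path_buf()
    } else {
        root.join(skip)
    }
}

/// True if `file` lies under any of the skip directories.
/// `Path::starts_with` compares whole components, so `target2/` is not
/// mistaken for `target/`.
fn is_skipped(file: &Path, root: &Path, skip_dirs: &[PathBuf]) -> bool {
    skip_dirs
        .iter()
        .any(|s| file.starts_with(resolve_skip_dir(root, s)))
}
```

Component-wise prefix matching avoids the false positives that plain string prefix checks would give on sibling directories with similar names.
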
[{
"crate_path": "utilities/arbol",
"captures": [
{
"crate_path": "utilities/arbol",
"file": "src/lib.rs",
"line": 42,
"column": 5,
"name": "fn.name",
"text": "rust_language",
"line_text": "pub fn rust_language() -> Language {" // only with --context
}
]
}]

Queries are standard Tree‑sitter S‑expressions. Example: capture all public function names:

    ((function_item
      (visibility_modifier) @vis
      name: (identifier) @fn.name))

Capture trait names:

    ((trait_item name: (type_identifier) @trait.name))

You can combine them in one file; all captures are flattened then grouped by crate.
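
The "flattened then grouped by crate" step, together with deterministic ordering, could look roughly like this. The `Capture` struct is a simplified stand-in whose field names mirror the JSON output above; the sort key (file, line, column, name) and the function name `group_by_crate` are assumptions of this sketch, not confirmed details of arbol.

```rust
use std::collections::BTreeMap;

/// Simplified capture record; field names mirror the JSON output.
#[derive(Debug, Clone, PartialEq)]
struct Capture {
    crate_path: String,
    file: String,
    line: usize,
    column: usize,
    name: String,
}

/// Group flattened captures by crate. `BTreeMap` iterates in key order and
/// each group ends up sorted by (file, line, column, name), so the result
/// does not depend on the order in which parallel workers finished.
fn group_by_crate(mut captures: Vec<Capture>) -> BTreeMap<String, Vec<Capture>> {
    captures.sort_by(|a, b| {
        (&a.file, a.line, a.column, &a.name).cmp(&(&b.file, b.line, b.column, &b.name))
    });
    let mut groups: BTreeMap<String, Vec<Capture>> = BTreeMap::new();
    for c in captures {
        groups.entry(c.crate_path.clone()).or_default().push(c);
    }
    groups
}
```

Sorting before grouping keeps each per-crate list ordered without a second pass, and the ordered map gives stable crate ordering when serialised.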
- Parsing & querying parallelised over files (one parser per worker thread)
- Sorting captures ensures deterministic output (stable CI diffs)
- Source text for nodes is truncated by size threshold to avoid massive JSON
- No incremental parsing (fresh parse each run)
- No built‑in filtering by crate patterns yet
- Query diagnostics: only basic position caret reporting
- Large monolithic queries may allocate more; consider splitting
- Use smaller `--max-depth` for structural overviews
- Pipe into `jq` for quick ad‑hoc exploration: `... dump-json | jq '.[] | .path, .nodes[0]'`
- For speed in huge workspaces, start without `--context` then re‑run when refining
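
The size threshold on node source text mentioned above can be sketched as follows. The 240-byte limit comes from the `--with-source` description; whether over-limit spans are dropped or cut short is not spelled out here, so this sketch assumes they are dropped. `snippet` and `MAX_SNIPPET_BYTES` are hypothetical names.

```rust
/// Assumed threshold, taken from the `--with-source` description (<= 240 bytes).
const MAX_SNIPPET_BYTES: usize = 240;

/// Return the node's source text only when the byte span is short enough.
/// `str::get` yields `None` for out-of-range, reversed, or
/// non-UTF-8-boundary offsets instead of panicking.
fn snippet(source: &str, start_byte: usize, end_byte: usize) -> Option<&str> {
    if end_byte.saturating_sub(start_byte) > MAX_SNIPPET_BYTES {
        return None; // span too large: omit the text
    }
    source.get(start_byte..end_byte)
}
```

Returning `Option<&str>` lets a serialiser skip the field entirely for large nodes, which keeps the JSON small.
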
License: MIT

Small focused improvements welcome:
- Open an issue / PR
- Add tests / examples if changing behaviour
- Keep output ordering deterministic
- 0.1.0 – Initial release: dump / query, parallel execution, deterministic output, configurable directory skips.

Example `dump-json` output:

    [{
      "path": "src/lib.rs",
      "root_kind": "source_file",
      "nodes": [
        {
          "kind": "function_item",
          "start_byte": 120,
          "end_byte": 260,
          "start_line": 10,
          "end_line": 18,
          "child_count": 5,
          "text": "fn foo() {}" // present only with --with-source and short spans
        }
      ]
    }]