This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
# Install OCaml dependencies
opam install . --deps-only
# Build grammar libraries (requires npm)
cd grammars && ./build-grammars.sh && cd ..
# Build the project
dune build
# Run all tests
dune test
# Run specific test group (e.g., just Match tests)
dune exec tests/test_runner.exe -- test Match
# Run a single named test
dune exec tests/test_runner.exe -- test Match "simple expression match"
diffract is an OCaml library and CLI for parsing source files with tree-sitter and pattern matching. Key capabilities:
- Parse source to S-expressions using tree-sitter grammars
- Pattern matching with concrete syntax and metavariables
Core Library (lib/)
- diffract.ml - Main module, re-exports submodules
- tree.ml - Pure OCaml tree representation (eliminates FFI overhead during traversal)
- tree_sitter_bindings.ml - Low-level ctypes FFI bindings
- tree_sitter_helper.c - C helper layer wrapping TSNode in OCaml custom blocks (libffi can't handle 32-byte structs by value)
- node.ml - FFI-based tree traversal (internal, used during parsing)
- languages.ml - Static grammar registry (language name → external C binding)
- match.ml - Index-based pattern matching with concrete syntax
- pattern.ml - Pattern matching DSL
Tree-sitter Integration Flow:
- C helper layer wraps TSNode/TSTree in OCaml custom blocks with finalizers
- ctypes-foreign binds to libtree-sitter and C helpers
- languages.ml dispatches to grammar language functions statically linked into the binary
- tree.ml converts FFI nodes to pure OCaml representation once during parsing
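To illustrate why a one-time conversion helps, here is a minimal sketch of what a pure OCaml tree and an S-expression printer over it might look like. The record type and the names `node`, `leaf` and `to_sexp` are hypothetical and chosen for illustration; the actual definitions in lib/tree.ml may differ.

```ocaml
(* Hypothetical node type; the real definition in lib/tree.ml may differ. *)
type node = {
  kind : string;         (* tree-sitter node type, e.g. "call_expression" *)
  text : string;         (* source text covered by the node *)
  children : node list;  (* already converted, so traversal makes no FFI calls *)
}

(* Render a node as an S-expression of node kinds. *)
let rec to_sexp (n : node) : string =
  match n.children with
  | [] -> "(" ^ n.kind ^ ")"
  | cs -> "(" ^ n.kind ^ " " ^ String.concat " " (List.map to_sexp cs) ^ ")"

let leaf kind text = { kind; text; children = [] }

let example =
  { kind = "call_expression";
    text = "f(x)";
    children = [ leaf "identifier" "f"; leaf "arguments" "(x)" ] }

let () = print_endline (to_sexp example)
```

Once the tree is an ordinary OCaml value, the matcher can walk it with plain pattern matching and without touching the C layer again.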
- Add C wrapper and external binding to lib/tree_sitter_helper.c and lib/languages.ml:
/* lib/tree_sitter_helper.c */
extern const TSLanguage *tree_sitter_ruby(void);
CAMLprim value tsh_ruby_language(value v_unit) {
  CAMLparam1(v_unit);
  CAMLreturn(caml_copy_nativeint((intnat)tree_sitter_ruby()));
}
(* lib/languages.ml *)
external ruby_language : unit -> nativeint = "tsh_ruby_language"
(* add to canonical_info: ("ruby", [], ruby_language) *)
- Update grammars/build-grammars.sh with the compilation command:
npm install tree-sitter-ruby
cc -O2 -c -o "$TMPDIR_LOCAL/ruby_parser.o" \
-I node_modules/tree-sitter-ruby/src \
node_modules/tree-sitter-ruby/src/parser.c
cc -O2 -c -o "$TMPDIR_LOCAL/ruby_scanner.o" \
-I node_modules/tree-sitter-ruby/src \
node_modules/tree-sitter-ruby/src/scanner.c
ar rcs lib/libtree-sitter-ruby.a "$TMPDIR_LOCAL/ruby_parser.o" "$TMPDIR_LOCAL/ruby_scanner.o"
- Add copy rule and (foreign_archives ...) entry to lib/dune:
(rule (target libtree-sitter-ruby.a)
(deps ../grammars/lib/libtree-sitter-ruby.a)
(action (copy %{deps} %{target})))
And add tree-sitter-ruby to the (foreign_archives ...) list.
- Rebuild:
cd grammars && ./build-grammars.sh && cd .. && dune build
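The registry entry added in the first step can be pictured with the sketch below. It is only an illustration: the two language functions are stubs standing in for the real externals, `find_language` is a hypothetical helper, and reading the second tuple field as a list of aliases is an assumption based on the shape of the `("ruby", [], ruby_language)` entry.

```ocaml
(* Stubs standing in for the real externals, which return a pointer as nativeint. *)
let ruby_language () : nativeint = 1n
let php_language () : nativeint = 2n

(* (canonical name, extra names, language function).
   Treating the second field as aliases is an assumption. *)
let canonical_info : (string * string list * (unit -> nativeint)) list = [
  ("ruby", [], ruby_language);
  ("php", [], php_language);
]

(* Hypothetical lookup: return the language function registered under a name. *)
let find_language (name : string) : (unit -> nativeint) option =
  List.find_map
    (fun (canonical, aliases, f) ->
       if String.equal name canonical || List.mem name aliases
       then Some f
       else None)
    canonical_info
```

Because the table is a plain list built at compile time, a language missing from it is simply not found at lookup time.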
Patterns use @@ delimiters with a required match mode and optional metavariable declarations:
@@
match: strict
metavar $obj: single
metavar $method: single
@@
$obj.$method()
Types: single (one AST node), sequence (zero or more nodes)
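As a rough sketch of the preamble format above, the following code reads the lines between the two @@ delimiters into a record. All names here (`preamble`, `parse_preamble`, and so on) are hypothetical, and the library's own parser may be structured differently. Lines such as `on $VAR` are not handled in this sketch.

```ocaml
type metavar_kind = Single | Sequence

type preamble = {
  mode : string option;                     (* "strict", "partial" or "field" *)
  metavars : (string * metavar_kind) list;  (* in declaration order *)
}

(* "$obj:" -> "$obj" *)
let strip_colon s =
  let n = String.length s in
  if n > 0 && s.[n - 1] = ':' then String.sub s 0 (n - 1) else s

let kind_of_string = function
  | "single" -> Single
  | "sequence" -> Sequence
  | other -> failwith ("unknown metavar type: " ^ other)

(* Parse the lines found between the two @@ delimiters. *)
let parse_preamble (lines : string list) : preamble =
  let step acc line =
    match String.split_on_char ' ' (String.trim line) with
    | [ "" ] -> acc
    | [ "match:"; m ] -> { acc with mode = Some m }
    | [ "metavar"; name; kind ] ->
      { acc with
        metavars = acc.metavars @ [ (strip_colon name, kind_of_string kind) ] }
    | _ -> failwith ("unrecognised preamble line: " ^ line)
  in
  let p = List.fold_left step { mode = None; metavars = [] } lines in
  (* the match mode is required *)
  if p.mode = None then failwith "preamble must specify a match mode";
  p
```

For the example above, `parse_preamble ["match: strict"; "metavar $obj: single"; "metavar $method: single"]` yields the mode `Some "strict"` and two `Single` metavars.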
An ellipsis (...) can be used as an anonymous sequence match:
@@
match: strict
@@
<?php
function test() {
...
echo "middle";
...
}
- ... matches zero or more nodes (like sequence metavars)
- Auto-detects context: adds ; in statement position, not in argument position
- Does NOT replace ...$var (PHP spread operator is preserved)
- Each ... gets a unique binding name (..._0, ..._1, etc.)
- Sequence metavars (including ...) are not supported with match: partial.
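The naming rule for ellipses can be sketched as a plain text scan. This is an illustration only: `number_ellipses` is a hypothetical helper, it works on raw pattern text rather than on whatever representation the library uses, and it leaves out the context-dependent insertion of `;`.

```ocaml
(* Replace each "..." that is not followed by '$' with "..._N", numbering
   from 0. Returns the rewritten text and the number of ellipses found. *)
let number_ellipses (src : string) : string * int =
  let n = String.length src in
  let buf = Buffer.create (n + 8) in
  let count = ref 0 in
  let i = ref 0 in
  while !i < n do
    if !i + 3 <= n && String.sub src !i 3 = "..." then begin
      if !i + 3 < n && src.[!i + 3] = '$' then
        (* PHP spread operator such as ...$var: keep as-is *)
        Buffer.add_string buf "..."
      else begin
        Buffer.add_string buf (Printf.sprintf "..._%d" !count);
        incr count
      end;
      i := !i + 3
    end else begin
      Buffer.add_char buf src.[!i];
      incr i
    end
  done;
  (Buffer.contents buf, !count)
```

Each numbered name can then be treated like a declared sequence metavar, which is consistent with the restriction on match: partial.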
Matching modes (required - must specify one):
- match: strict - Exact positional matching (no extra children allowed, ordered). Use for function calls, arrays.
- match: partial - Subset matching (ignores extra children, unordered). Use for object literals, JSX attributes.
- match: field - Field-based matching (matches children by tree-sitter field name instead of position, ignores extra source fields not in pattern, preserves order within each field). Use for definitions with decorators/attributes.
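The difference between strict and partial matching can be sketched over plain lists of children. The functions `strict_match` and `partial_match` are hypothetical; in particular, requiring each pattern child to pair with a distinct source child is an assumption about partial mode, and field mode is not covered here.

```ocaml
(* Strict: same number of children, same order, every pair matches. *)
let strict_match (eq : 'a -> 'b -> bool) (pattern : 'a list) (source : 'b list) : bool =
  List.length pattern = List.length source
  && List.for_all2 eq pattern source

(* Partial: every pattern child matches some distinct source child;
   order is ignored and extra source children are allowed. *)
let rec partial_match eq pattern source =
  match pattern with
  | [] -> true
  | p :: rest ->
    (* try each source child as the partner of p, removing it if chosen *)
    let rec try_each before = function
      | [] -> false
      | s :: after ->
        (eq p s && partial_match eq rest (List.rev_append before after))
        || try_each (s :: before) after
    in
    try_each [] source
```

With string equality as `eq`, the pattern `["c"; "a"]` matches the source `["a"; "b"; "c"]` in partial mode but not in strict mode.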
A replacement line may use a separator character as its prefix instead of +,
followed by a space. The prefix character IS the join string between expanded
elements, except ~ which stands for newline. The line must reference at least
one sequence metavar.
Any punctuation character that is not a reserved spatch role marker (-, +,
space, tab) and not an identifier character ($, letters, digits, _) is valid.
Common conventions:
~ $VAR — expand $VAR, joining with newline (~ stands for \n)
, $VAR — expand $VAR, joining with ","
; $VAR — expand $VAR, joining with ";"
| $VAR — expand $VAR, joining with "|"
Other characters like !, &, . also work and are used literally as the separator.
The sequence variable(s) must be declared as metavar $VAR: sequence in the preamble.
Verbatim expansion (no following section): elements are joined with the specified separator and substituted directly. Use Match.transform as normal.
Transform expansion (with following @@ section): if a subsequent section declares on $VAR matching an expansion line's variable, that section's transform is applied to each element and the results are joined. Use Match.transform_nested to enable this.
@@
match: strict
metavar $BEFORE: sequence
metavar $AFTER: sequence
@@
- import { $BEFORE Stack $AFTER } from "@mui/system";
+ import {
, $BEFORE $AFTER
+ } from "@mui/system";
+ import { Stack } from "@mui/not.system";
$BEFORE and $AFTER elements are gathered and comma-joined.
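The prefix rules can be sketched as follows, using strings in place of matched nodes. The names `join_string`, `is_valid_prefix` and `expand` are hypothetical, and restricting prefixes to printable ASCII is a simplifying assumption for this sketch.

```ocaml
(* Map an expansion prefix to its join string: '~' stands for newline,
   any other accepted character is used literally. *)
let join_string (prefix : char) : string =
  if prefix = '~' then "\n" else String.make 1 prefix

(* Role markers and identifier characters cannot be used as a prefix. *)
let is_valid_prefix (c : char) : bool =
  match c with
  | '-' | '+' | ' ' | '\t' -> false
  | '$' | '_' | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' -> false
  | _ -> Char.code c > 32 && Char.code c < 127

(* Join the gathered elements with the separator chosen by the prefix. *)
let expand (prefix : char) (elements : string list) : string =
  if not (is_valid_prefix prefix) then
    invalid_arg "expand: reserved prefix character";
  String.concat (join_string prefix) elements
```

For instance, `expand ',' ["Box"; "Grid"]` gives `"Box,Grid"`, and `expand '~' ["a"; "b"]` puts the two elements on separate lines.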
@@
match: strict
metavar $TAG: single
metavar $PROPS: sequence
@@
- matchStringExhaustive($TAG, {
- $PROPS
- });
+ match($TAG)
~ $PROPS
+ .exhaustive();
@@
match: field
on $PROPS
metavar $KEY: single
metavar $VAL: single
@@
- $KEY: $VAL
+ .with("$KEY", $VAL)
The inner section's transform (- $KEY: $VAL / + .with("$KEY", $VAL)) is applied to each element node in $PROPS; results are joined with newline.
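A minimal sketch of this per-element step is shown below, again with strings standing in for AST nodes. `rewrite_prop` is a hypothetical stand-in for the inner section, and `expand_with` is not part of the library; leaving unmatched elements unchanged is an assumption.

```ocaml
(* Hypothetical stand-in for the inner section:
   "KEY: VAL" becomes .with("KEY", VAL). Returns None for other shapes. *)
let rewrite_prop (element : string) : string option =
  match String.index_opt element ':' with
  | None -> None
  | Some i ->
    let key = String.trim (String.sub element 0 i) in
    let value =
      String.trim (String.sub element (i + 1) (String.length element - i - 1))
    in
    Some (Printf.sprintf ".with(\"%s\", %s)" key value)

(* Apply the inner transform to each element, keep elements it does not
   match unchanged (an assumption), then join with the separator. *)
let expand_with (sep : string) (f : string -> string option)
    (elements : string list) : string =
  elements
  |> List.map (fun e -> match f e with Some r -> r | None -> e)
  |> String.concat sep
```

Calling `expand_with "\n" rewrite_prop ["a: handleA"; "b: handleB"]` produces one `.with(...)` call per line, which mirrors the `~ $PROPS` expansion in the example.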
- Expansion prefix lines are valid only in the replacement side (like + lines).
- Expansion vars must be declared as sequence metavars and must appear in the match side.
- Inner sections used for transform expansion cannot themselves contain expansion lines.
- on $VAR in an inner section targets an expansion slot when the var matches; this is distinct from context-nesting use of on $VAR.