
Commit d472529

rabbiveesh and claude authored
feat: workspace indexing + subprocess removal + workspace/symbol (#14)
* refactor: remove subprocess isolation for module resolution

tree-sitter-perl is stable — subprocess overhead and JSON serialization boundary no longer justified. Production now uses the same direct in-process parsing path that tests always used.

Deleted:
- parse_in_subprocess() — subprocess spawn + 5s timeout + SIGKILL
- subprocess_main() — JSON serialization of ExportedSub metadata
- --parse-exports CLI mode in main.rs
- subprocess_main forwarding in module_index.rs
- PARSE_TIMEOUT constant
- #[cfg(not(test))] / #[cfg(test)] split on parse_module()
- inferred_type_from_tag import (was only used for JSON deserialization)

Net: -253 lines. One code path for production and tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: workspace indexing with Rayon + workspace/symbol search

Workspace indexing:
- Scans workspace root for *.pm, *.pl, *.t files using `ignore` crate (respects .gitignore, skips blib/, node_modules/, etc.)
- Parallel parsing with Rayon (defaults to num_cpus threads)
- catch_unwind per file for defense-in-depth, 1MB file size cap
- Background-spawned from initialized(), non-blocking
- File watchers registered for create/change/delete via workspace/didChangeWatchedFiles

workspace/symbol:
- Searches open documents (freshest) then workspace index
- Fuzzy substring match on symbol name
- Returns subs, methods, packages, classes with location

New dependencies: rayon = "1", ignore = "0.4"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add progress reporting for workspace indexing + spawn_blocking for file watchers

- window/workDoneProgress begin/end for workspace indexing (matches cpanfile indexing pattern)
- did_change_watched_files now uses spawn_blocking to avoid blocking the async runtime on bulk file changes (branch switch, git checkout)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: cross-file rename, prepareRename, range formatting, linked editing

Cross-file rename (Part C):
- RenameKind enum determines single-file vs cross-file scope
- Variable rename stays single-file (lexical scope)
- Function/method/package rename searches open docs + workspace index
- rename_function, rename_method, rename_package on FileAnalysis

prepareRename:
- Returns range + placeholder for symbol/ref at cursor
- Capability registered with prepare_provider: true

Range formatting (Part E):
- Extracts selected lines, pipes to perltidy, returns edits
- Reuses the existing perltidy integration pattern

Linked editing range (Part F):
- When cursor is on a symbol, all references become editable ranges
- Reuses find_references for the range collection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: e2e tests for single-file and cross-file rename

Single-file: $pi → $tau in sample.pl — applies edit, verifies all occurrences renamed, then undoes to restore original content.

Cross-file: process → execute on $worker->process() in inheritance.pl — verifies WorkspaceEdit contains edits in both inheritance.pl AND BaseWorker.pm without applying (no file modification).

Added rename() and apply_workspace_edit() helpers to test/lsp.lua.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: method rename replaces only the method name, not the whole expression

MethodCall refs now store method_name_span separately from the expression span. rename_method uses it for precise name-only replacement.

Before: $self->process({ input => 1 }) → execute
After: $self->execute({ input => 1 })

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rename_sub searches both FunctionCall and MethodCall refs

In Perl, sub foo can be called as foo() or $obj->foo(). The old rename_function only searched FunctionCall, rename_method only searched MethodCall. Cross-file rename from a sub definition missed all method call sites.

Unified into rename_sub which searches both ref kinds. Backend dispatch uses rename_sub for both RenameKind::Function and RenameKind::Method.

Unit test confirms: sub emit + emit('event') + $self->emit('done') all found by rename_sub.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: CLI modes for rename and workspace-symbol + fixes

--rename <root> <file> <line> <col> <new_name>: index workspace, perform cross-file rename, output JSON edits. Canonicalizes paths so file_path matches workspace_index keys.

--workspace-symbol <root> <query>: index workspace, search symbols.

--help / --version for discoverability.

Fixes from review:
- Path canonicalization for workspace_index key matching
- Target file pulled from workspace_index (no separate parse)
- DRY span_to_json helper for edit serialization
- Variable/HashKey single-file cases consolidated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: hash key rename with tree context + Moo constructor arg keys

Bug A partial fix: rename_at_with_tree passes tree/source through to find_references so hash key owner resolution works when the tree is available (LSP path). Hash key rename for function-return keys works end-to-end. Direct hash assignment keys remain a known limitation (Bug B from the catalogue — needs symbol_at to find HashKeyDefs).

Bug D fix: visit_has_call now synthesizes HashKeyDef symbols owned by "new" for each Moo/Moose/Mojo has attribute. Foo->new(username => ...) connects to `has username`.

Also:
- symbols::rename gains tree/source params, backend passes them
- rename_at_with_tree public API for callers that have tree context
- Unit test for Moo constructor HashKeyDef synthesis
- Unit test for rename_sub covering both call kinds

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add ref coverage + provenance principles to architecture rules

Two new architecture rules in CLAUDE.md:

Rule 7: Every meaningful token gets a ref. If a user can put their cursor on a token and it means something, ref_at() must return a ref for it. Documents known gaps (fat-comma keys in call args, hash literal keys, framework constructor args) and the specificity rule (narrowest span wins when refs overlap).

Rule 8: Provenance — refs should trace back to their source. Derived refs (constant folding, import re-export, framework synthesis) need traceable derivation chains. Documents six provenance chains: constant folding backwards, has→accessor→constructor→hash key, import list, return hash key→caller deref, package→file path, inherited overrides.

Also updates: file map (CLI modes, no subprocess), cross-file section (subprocess removal, workspace indexing), LSP capabilities (rename cross-file, workspace/symbol, rangeFormatting, linkedEditingRange, use-line completion).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: spec ref coverage gaps + provenance for rename improvements

Part 1 — Ref coverage (Rule 7):
- Gap A: ref_at() specificity — narrowest span wins when refs overlap
- Gap B: Fat-comma keys in call args need HashKeyAccess refs
- Gap C: HashKeyDef symbols need correct selection_span for symbol_at
- Gap D: find_references for hash keys — collect defs + accesses by owner
- Gap E: Hash deref keys resolve via Gap D once owner chain works

Part 2 — Provenance (Rule 8):
- Chain 1: Constant folding backwards — folded_from span on Ref
- Chain 2: Import list rename — verify emit_refs_for_strings works
- Chain 3: has → accessor → constructor → hash key unified rename
- Chain 4: Return hash key → caller deref (works once Gap D fixed)
- Chain 5: Package rename → file rename (stretch)
- Chain 6: Inherited override tracking (stretch)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: e2e coverage for hash key rename + Moo constructor arg rename

E2e tests added:
- Hash key rename from $db_config->{host} — verifies def + access updated
- Moo constructor arg rename from MooApp->new(name => ...) — verifies has def + constructor arg updated

CLI fix: --rename HashKey branch now parses tree for owner resolution.

Cleanup: merged rename_at and rename_at_with_tree into single rename_at with tree/source params. Zero warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
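
The two rename fixes described in the commit message (name-only replacement via `method_name_span`, and searching both call kinds) can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Span`, `Ref`, and `RefKind` shapes, the assumption that a `FunctionCall` span covers only the name, and the helper names `rename_sub_spans` and `apply_rename` are all hypothetical.

```rust
/// Half-open byte range into the source text (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RefKind {
    /// Assumed here: the ref span covers just the function name.
    FunctionCall,
    /// The ref span covers the whole `$obj->name(args)` expression;
    /// `method_name_span` covers only `name`.
    MethodCall { method_name_span: Span },
    Variable,
}

#[derive(Debug, Clone)]
pub struct Ref {
    pub kind: RefKind,
    pub target: String,
    pub span: Span,
}

/// Collect the spans to rewrite when renaming sub `name`.
/// Both call kinds are searched, since `sub foo` may be called as
/// `foo()` or `$obj->foo()`.
pub fn rename_sub_spans(refs: &[Ref], name: &str) -> Vec<Span> {
    refs.iter()
        .filter(|r| r.target == name)
        .filter_map(|r| match &r.kind {
            RefKind::FunctionCall => Some(r.span),
            // Use the name-only span, never the whole expression.
            RefKind::MethodCall { method_name_span } => Some(*method_name_span),
            RefKind::Variable => None,
        })
        .collect()
}

/// Apply the edits back to front so earlier offsets stay valid.
pub fn apply_rename(source: &str, mut spans: Vec<Span>, new_name: &str) -> String {
    spans.sort_by(|a, b| b.start.cmp(&a.start));
    let mut out = source.to_string();
    for s in spans {
        out.replace_range(s.start..s.end, new_name);
    }
    out
}
```

Replacing the method-call ref's full `span` instead of `method_name_span` would reproduce the "Before" behaviour quoted above, where the whole expression collapsed to the new name.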
1 parent 1b92ea1 commit d472529

15 files changed, +1357 -276 lines changed

CLAUDE.md

Lines changed: 39 additions & 13 deletions
@@ -29,30 +29,45 @@ The codebase has four layers. Data flows **down** only. Each layer may only depe

 **Rules:**

-1. **All tree-sitter CST traversal happens inside `build()`.** No other file should walk tree-sitter nodes, call `child_by_field_name`, iterate children, or use `TreeCursor`. The `build()` function in `builder.rs` is the single entry point that takes a `Tree` and returns a `FileAnalysis` — everything that needs the CST lives inside that call. **To add new CST-derived data:** add extraction to the relevant `visit_*` method in `builder.rs` and store the result in `FileAnalysis` (as a field on `Symbol`, a new map, etc.). The builder already visits every sub, package, class, and variable node — use that pass, don't create a second one. If the builder grows too monolithic, we can introduce builder plugins (separate functions called from `build()` that take `&mut FileAnalysis` + `&Tree` + `&[u8]`) to decouple concerns while preserving the single-entry-point invariant.
+1. **All tree-sitter CST traversal happens inside `build()`.** No other file should walk tree-sitter nodes, call `child_by_field_name`, iterate children, or use `TreeCursor`. The `build()` function in `builder.rs` is the single entry point that takes a `Tree` and returns a `FileAnalysis` — everything that needs the CST lives inside that call. **To add new CST-derived data:** add extraction to the relevant `visit_*` method in `builder.rs` and store the result in `FileAnalysis` (as a field on `Symbol`, a new map, etc.). The builder already visits every sub, package, class, and variable node — use that pass, don't create a second one. Builder plugins (separate modules called from `build()` that take `&mut FileAnalysis` + `&Tree` + `&[u8]`) can decouple framework-specific concerns while preserving the single-entry-point invariant.

 2. **`file_analysis.rs` is the single source of truth.** All analysis results — symbols, refs, scopes, types, documentation, parameters — live in `FileAnalysis`. Query methods belong here. No `tree_sitter` imports allowed in this file.

 3. **`symbols.rs` is a thin adapter.** It converts `FileAnalysis` types to LSP protocol types. It does NOT perform analysis, walk trees, or make decisions about Perl semantics. If you find yourself writing an `if` about Perl language behavior in `symbols.rs`, it belongs in `builder.rs` or `file_analysis.rs`.

-4. **`module_resolver.rs` calls the builder, then queries `FileAnalysis`.** It should never walk the tree directly. The resolver's job is: find `.pm` file → call `builder::build()` → extract what it needs from the resulting `FileAnalysis` via query methods → serialize to `ExportedSub`/JSON.
+4. **`module_resolver.rs` calls the builder, then queries `FileAnalysis`.** It should never walk the tree directly. The resolver's job is: find `.pm` file → call `builder::build()` → extract what it needs from the resulting `FileAnalysis` via query methods.

-5. **DRY: shared extraction logic goes on `FileAnalysis`.** If two code paths (e.g., subprocess JSON serialization and direct parsing) need the same data from a `FileAnalysis`, add a method to `FileAnalysis` that both call. Never duplicate the extraction loop.
+5. **DRY: shared extraction logic goes on `FileAnalysis`.** If two code paths need the same data from a `FileAnalysis`, add a method to `FileAnalysis` that both call. Never duplicate the extraction loop.

 6. **`cursor_context.rs` is the exception:** it receives a tree + source for cursor-position analysis (completion context, signature help context). This is acceptable because cursor context is inherently position-dependent and runs on the already-parsed tree. It should NOT modify `FileAnalysis`.

+7. **Every meaningful token gets a ref.** Every token the user might put their cursor on should have a `Ref` that explains what it means in context. If a token is meaningful but `ref_at()` returns nothing (or returns a wrong/too-broad ref), the builder is missing a ref emission. This is how completion, goto-def, hover, rename, and references all work — they start with the ref at the cursor position.
+
+Common gaps to watch for: fat-comma keys in call arguments (`connect(timeout => 30)` — `timeout` needs its own `HashKeyAccess` ref, not just the enclosing `MethodCall`), hash literal keys (`{ status => 'ok' }` — the `HashKeyDef` must be findable via `symbol_at`/`ref_at`), and framework-synthesized entities (Moo `has name` should produce `HashKeyDef` entries for the constructor, not just accessor methods).
+
+When multiple refs overlap at a position, `ref_at` must return the **most specific** (narrowest span). A `HashKeyAccess` for `timeout` inside a `MethodCall` for `connect` should win over the `MethodCall`.
+
+8. **Provenance: refs should trace back to their source.** When a ref is derived from another value (constant folding, import re-export, framework synthesis), the derivation chain should be traceable for rename and cross-referencing. Key provenance chains:
+
+- **Constant folding:** `my $m = 'process'; $self->$m()` → the `MethodCall` ref targeting `"process"` was derived from the string literal `'process'`. Renaming `process` should update the source string.
+- **`has` declarations:** `has name => (is => 'ro')` is a single source of truth that produces: an accessor Method symbol, a `HashKeyDef` for the constructor (`->new(name => ...)`), and a `HashKeyDef` for the internal hash (`$self->{name}`). Renaming any one should update all.
+- **Import lists:** `use Foo qw(bar)` — the string `bar` in the import list should rename when `sub bar` in Foo is renamed.
+- **Return hash keys → caller derefs:** `sub get_config { return { host => ... } }` then `$cfg->{host}` — the key `host` in the caller is derived from the return hash. `HashKeyOwner::Sub("get_config")` links them.
+- **Package name → file path:** Renaming `MyApp::Controller::Users` could offer to rename/move the `.pm` file.
+- **Inherited overrides:** Renaming `Animal::speak` should surface `Dog::speak` for coordinated rename.
+
 ### File map

-- `src/main.rs` — Entry point, stdio transport, `--parse-exports` subprocess mode
-- `src/backend.rs` — `LanguageServer` trait implementation (tower-lsp), request routing
+- `src/main.rs` — Entry point, stdio transport, CLI modes (`--rename`, `--workspace-symbol`, `--version`)
+- `src/backend.rs` — `LanguageServer` trait implementation (tower-lsp), request routing, workspace indexing
 - `src/document.rs` — Document store with tree-sitter parsing
 - `src/file_analysis.rs` — Data model: scopes, symbols, refs, imports, type inference, priority constants
 - `src/builder.rs` — Single-pass CST → FileAnalysis builder (the ONLY tree-sitter consumer)
 - `src/pod.rs` — POD→markdown converter (tree-sitter-pod AST walk, handles nested formatting/lists/data regions)
 - `src/cursor_context.rs` — Cursor position analysis: completion/signature/selection context
 - `src/symbols.rs` — LSP adapter layer (converts FileAnalysis types to LSP types)
 - `src/module_index.rs` — Cross-file: public API, reverse index (`func → modules`), concurrent cache
-- `src/module_resolver.rs` — Background resolver thread, subprocess isolation, export extraction
+- `src/module_resolver.rs` — Background resolver thread, in-process parsing, workspace indexing (Rayon)
 - `src/module_cache.rs` — SQLite persistence, schema migrations, mtime validation
 - `src/cpanfile.rs` — cpanfile parsing via tree-sitter queries
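
The specificity requirement in Rule 7 (narrowest span wins when refs overlap) could look something like the sketch below. The `Span`, `Ref`, and `RefKind` types here are simplified stand-ins invented for the example, and the real `ref_at` in `file_analysis.rs` may use a different signature and data layout.

```rust
/// Half-open byte range (hypothetical shape for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

impl Span {
    pub fn contains(&self, offset: usize) -> bool {
        self.start <= offset && offset < self.end
    }

    pub fn len(&self) -> usize {
        self.end - self.start
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RefKind {
    MethodCall,
    HashKeyAccess,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Ref {
    pub kind: RefKind,
    pub name: String,
    pub span: Span,
}

/// Return the most specific ref covering `offset`: among all refs whose
/// span contains the offset, pick the one with the shortest span.
pub fn ref_at(refs: &[Ref], offset: usize) -> Option<&Ref> {
    refs.iter()
        .filter(|r| r.span.contains(offset))
        .min_by_key(|r| r.span.len())
}
```

With a `MethodCall` ref spanning all of `$db->connect(timeout => 30)` and a `HashKeyAccess` ref spanning only `timeout`, a cursor on `timeout` resolves to the hash key, while a cursor on `connect` still resolves to the method call.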

@@ -93,7 +108,7 @@ E2e tests use Neovim headless mode. They exercise the full LSP protocol over std
 - `ModuleIndex` uses a dedicated `std::thread` for filesystem I/O (never blocks tokio)
 - `Arc<DashMap>` shared between resolver thread and async LSP handlers
 - Reverse index: `DashMap<func_name, Vec<module_name>>` for O(1) exporter lookup
-- Export extraction uses tree-sitter in isolated subprocesses (5s timeout + SIGKILL)
+- Export extraction runs in-process (no subprocess isolation — tree-sitter-perl grammar is stable)
 - Subprocess runs the full builder on each module, then queries `FileAnalysis` for per-export metadata
 - `ModuleExports` stores `subs: HashMap<String, ExportedSub>` — unified per-export metadata (def_line, params, is_method, return_type, hash_keys, doc) — and `parents: Vec<String>` for inheritance chain
 - cpanfile parsed with tree-sitter queries at startup, deps pre-resolved with progress reporting
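
The reverse index mentioned in this hunk (`func_name → modules`) can be illustrated with a small single-threaded sketch. It uses `std::collections::HashMap` as a stand-in for the `DashMap` the notes describe, and the `ReverseIndex` type with its `add_module` and `exporters` methods is hypothetical, so the concurrency aspects are not shown.

```rust
use std::collections::HashMap;

/// Maps an exported function name to the modules that export it.
/// Stand-in for the concurrent map described in the notes.
#[derive(Default)]
pub struct ReverseIndex {
    by_func: HashMap<String, Vec<String>>,
}

impl ReverseIndex {
    /// Record that `module` exports each name in `exports`.
    /// Re-adding the same module does not create duplicates.
    pub fn add_module(&mut self, module: &str, exports: &[&str]) {
        for func in exports {
            let mods = self.by_func.entry((*func).to_string()).or_default();
            if !mods.iter().any(|m| m == module) {
                mods.push(module.to_string());
            }
        }
    }

    /// One hash lookup to find every module exporting `func`.
    pub fn exporters(&self, func: &str) -> &[String] {
        self.by_func.get(func).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

A lookup like this is what lets an auto-import code action propose candidate modules for an unresolved function without scanning every cached module.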
@@ -121,7 +136,7 @@ Post-build enrichment propagates imported return types and hash keys into the lo
 - `resolve_method_in_ancestors()` does DFS parent walk (matching Perl's default MRO), depth limit 20
 - `MethodResolution` enum: `Local { class, sym_id }` for same-file, `CrossFile { class }` for module index lookup
 - `complete_methods_for_class` walks ancestors, deduplicates by name (child methods shadow parent)
-- Cross-file: `ModuleExports.parents` stored in SQLite `parents` TEXT column (JSON array) and subprocess JSON output
+- Cross-file: `ModuleExports.parents` stored in SQLite `parents` TEXT column (JSON array)
 - `ModuleIndex.parents_cached(module_name)` returns parent list for cross-file inheritance walking

 ### Framework accessor synthesis
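
The depth-limited DFS parent walk described in this hunk might be sketched as below. The `ClassInfo` record, the `resolve_method_dfs` name, and the choice to check the starting class before its parents are assumptions made for the example; only the DFS order and the depth limit of 20 come from the notes.

```rust
use std::collections::HashMap;

/// Depth cap taken from the notes; guards against inheritance cycles.
pub const MAX_DEPTH: usize = 20;

/// Hypothetical per-class record: direct parents in declaration order,
/// plus the methods defined locally in the class.
pub struct ClassInfo {
    pub parents: Vec<String>,
    pub methods: Vec<String>,
}

impl ClassInfo {
    pub fn new(parents: &[&str], methods: &[&str]) -> Self {
        ClassInfo {
            parents: parents.iter().map(|s| s.to_string()).collect(),
            methods: methods.iter().map(|s| s.to_string()).collect(),
        }
    }
}

/// Return the name of the first class that defines `method`, walking
/// parents depth-first and left to right.
pub fn resolve_method_dfs(
    classes: &HashMap<String, ClassInfo>,
    class: &str,
    method: &str,
) -> Option<String> {
    walk(classes, class, method, 0)
}

fn walk(
    classes: &HashMap<String, ClassInfo>,
    class: &str,
    method: &str,
    depth: usize,
) -> Option<String> {
    if depth > MAX_DEPTH {
        return None;
    }
    let info = classes.get(class)?;
    if info.methods.iter().any(|m| m == method) {
        return Some(class.to_string());
    }
    // First parent is exhausted completely before the second is tried.
    info.parents
        .iter()
        .find_map(|p| walk(classes, p, method, depth + 1))
}
```

In a diamond where `D` inherits from `B` and `C`, both of which inherit from `A`, this order visits `D, B, A, C`, so a method defined on both `A` and `C` resolves to `A`. A C3 linearization would pick `C` instead, which is why matching the default MRO matters.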
@@ -132,28 +147,39 @@ Post-build enrichment propagates imported return types and hash keys into the lo
 - Mojo::Base: `has 'name'` produces rw accessor with fluent return type `ClassName(current_package)`; `use Mojo::Base 'Parent'` also feeds `package_parents`
 - DBIC: `__PACKAGE__->add_columns(...)` synthesizes column accessors; `has_many`/`belongs_to`/`has_one`/`might_have` synthesize relationship accessors with typed returns
 - Synthesized methods are standard symbols — completion, hover, goto-def, inheritance all work automatically
-- Cross-file: subprocess runs full builder, so framework accessors appear in `ModuleExports.subs` for cross-file resolution
+- Cross-file: resolver runs full builder on each module, so framework accessors appear in `ModuleExports.subs` for cross-file resolution

 ### Cross-file param types

 - `ExportedParam.inferred_type: Option<String>` carries body-inferred param types across file boundaries
-- Subprocess serializes param type as `"type"` field in JSON; deserialized in both subprocess and direct-parse paths
-- `SignatureInfo.param_types` delivers pre-resolved types for cross-file signature help (avoids meaningless `body_end` query)
+- Param type serialized as `"type"` field in JSON for SQLite storage; `SignatureInfo.param_types` delivers pre-resolved types for cross-file signature help
+
+### Workspace indexing
+
+- `workspace_index: Arc<DashMap<PathBuf, FileAnalysis>>` — full `FileAnalysis` for every `.pm`/`.pl`/`.t` in the workspace
+- Indexed at startup with Rayon `par_iter` + `ignore` crate for `.gitignore` respect. `catch_unwind` per file, 1MB size cap
+- File watcher via `workspace/didChangeWatchedFiles` for incremental re-indexing (runs in `spawn_blocking`)
+- Query priority: `documents` (open files, freshest) → `workspace_index` (all project files) → `module_index` (external `@INC` modules)
+- Enables `workspace/symbol` search and cross-file rename across all project files, not just open ones
+- Benchmarks: 274-file Mojolicious in 204ms, 657-file DBIx-Class in 167ms (release build)

 ## LSP Capabilities

 - `textDocument/documentSymbol` — outline of subs, packages, variables, classes (with fields/methods as children)
 - `textDocument/definition` — go-to-def for variables (scope-aware), subs, methods (type-inferred), packages/classes, hash keys; resolves through expression chains
 - `textDocument/references` — scope-aware for variables, file-wide for functions/packages/hash keys; resolves through expression chains
 - `textDocument/hover` — shows declaration line, inferred types, return types, class-aware for methods
-- `textDocument/rename` — scope-aware for variables, file-wide for functions/packages/hash keys
-- `textDocument/completion` — scope-aware variables (cross-sigil forms), subs, methods (type-inferred with return type detail), packages, hash keys, auto-import from cached modules, deref snippets for typed references
+- `textDocument/rename` — scope-aware for variables; cross-file for functions/methods/packages (searches documents + workspace index); `prepareRename` support
+- `textDocument/completion` — scope-aware variables (cross-sigil forms), subs, methods (type-inferred with return type detail), packages, hash keys, auto-import from cached modules, deref snippets for typed references, module names on `use` lines, import lists inside `qw()`
 - `textDocument/signatureHelp` — parameter info with inferred types for subs/methods (signature syntax + legacy @_ pattern), triggers on `(` and `,`
 - `textDocument/inlayHint` — type annotations for variables (Object/HashRef/ArrayRef/CodeRef) and sub return types
 - `textDocument/documentHighlight` — highlight all occurrences with read/write distinction
 - `textDocument/selectionRange` — expand/shrink selection via tree-sitter node hierarchy
 - `textDocument/foldingRange` — blocks, subs, classes, pod sections
 - `textDocument/formatting` — shells out to perltidy (respects .perltidyrc)
+- `textDocument/rangeFormatting` — perltidy on selected line range
 - `textDocument/semanticTokens/full` — variable tokens with modifiers: scalar/array/hash, declaration, modification
 - `textDocument/codeAction` — auto-import for unresolved functions
+- `textDocument/linkedEditingRange` — simultaneous editing of all references in scope
+- `workspace/symbol` — search symbols across all workspace-indexed files
 - Diagnostics — unresolved function/method warnings (skips builtins, local subs, imported functions)
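
The query priority for `workspace/symbol` (open documents first, then the workspace index) combined with substring matching might be sketched like this. It is a simplified illustration: each file is reduced to a list of symbol names, `HashMap` stands in for the concurrent maps, and both the case-insensitive matching and the rule that an open document shadows its indexed copy are assumptions for the example rather than confirmed behaviour. `SymbolHit` and `workspace_symbols` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};
use std::path::PathBuf;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SymbolHit {
    pub name: String,
    pub path: PathBuf,
}

/// Search open documents first, then the workspace index, skipping any
/// indexed file that is also open (assumed: the open copy is fresher).
pub fn workspace_symbols(
    documents: &HashMap<PathBuf, Vec<String>>,
    workspace_index: &HashMap<PathBuf, Vec<String>>,
    query: &str,
) -> Vec<SymbolHit> {
    let needle = query.to_lowercase();
    let mut seen: HashSet<&PathBuf> = HashSet::new();
    let mut hits = Vec::new();

    // `chain` guarantees every open document is visited before any
    // workspace-index entry, so the open copy claims the path first.
    for (path, names) in documents.iter().chain(workspace_index.iter()) {
        if !seen.insert(path) {
            continue;
        }
        for name in names {
            if name.to_lowercase().contains(&needle) {
                hits.push(SymbolHit {
                    name: name.clone(),
                    path: path.clone(),
                });
            }
        }
    }

    // Map iteration order is arbitrary; sort for stable output.
    hits.sort_by(|a, b| (&a.path, &a.name).cmp(&(&b.path, &b.name)));
    hits
}
```

The third tier from the notes, `module_index` for external `@INC` modules, is left out here because the commit message describes `workspace/symbol` as searching only the first two.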

Cargo.lock

Lines changed: 114 additions & 0 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 2 additions & 0 deletions
@@ -12,5 +12,7 @@ ts-parser-pod = "1"
 dashmap = "6"
 serde_json = "1"
 rusqlite = { version = "0.32", features = ["bundled"] }
+rayon = "1"
+ignore = "0.4"
 log = "0.4"
 env_logger = "0.11"
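
The two new dependencies support workspace indexing, which the commit message says guards each file with `catch_unwind` and a 1MB size cap. The per-file guard can be sketched with the standard library alone, as below. The `IndexOutcome` enum, the `index_one` function, and the `analyze` callback (which stands in for the parse-and-build step) are hypothetical; the real code would run this step in parallel across files with Rayon over paths discovered by the `ignore` crate.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Size cap from the commit message: files above 1MB are skipped.
pub const MAX_FILE_BYTES: usize = 1024 * 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum IndexOutcome {
    /// Analysis finished; the payload is whatever `analyze` returned
    /// (a symbol count in this sketch).
    Indexed(usize),
    SkippedTooLarge,
    /// The analysis panicked; the panic was contained to this file.
    Panicked,
}

/// Run `analyze` on one file's bytes, isolating panics so that a single
/// bad file cannot abort the whole indexing pass.
pub fn index_one<F>(source: &[u8], analyze: F) -> IndexOutcome
where
    F: FnOnce(&[u8]) -> usize,
{
    if source.len() > MAX_FILE_BYTES {
        return IndexOutcome::SkippedTooLarge;
    }
    // AssertUnwindSafe: the closure only reads `source`, so no shared
    // state can be left half-updated if it unwinds.
    match panic::catch_unwind(AssertUnwindSafe(|| analyze(source))) {
        Ok(count) => IndexOutcome::Indexed(count),
        Err(_) => IndexOutcome::Panicked,
    }
}
```

Note that `catch_unwind` only helps when the binary is built with unwinding panics; under `panic = "abort"` the process would still terminate.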
