perf(vectors): lifespan-cached LayeredIgnore + is_ignored memo (PR-P3) by HumanBean17 · Pull Request #340 · HumanBean17/java-codebase-rag

HumanBean17 · 2026-06-22T06:31:16Z

Scope

This PR implements PR-P3 from plans/active/PLAN-INIT-INCREMENT-PERF.md. It hoists LayeredIgnore to a cocoindex ContextKey (constructed once per flow run) and memoizes is_ignored's _mega computation by directory.

Independent of PR-P1 and PR-P2 — touches different files.

Changes

java_index_flow_lancedb.py:
- Define IGNORE = coco.ContextKey[LayeredIgnore] alongside PROJECT_ROOT/EMBEDDER/LANCE_DB using the same _ck_params version-detection pattern
- Add builder.provide(IGNORE, LayeredIgnore(root)) in coco_lifespan — built once per flow run
- Convert process_java_file, process_sql_file, process_yaml_file to use ignore = coco.use_context(IGNORE) instead of LayeredIgnore(project_root).is_ignored(...)
- Sites :182 (in _approximate_vectors_total) and :578 (in app_main pre-walk) left untouched — they call cocoindex_excluded_patterns() once per run, not per-file
path_filtering.py:
- Add self._mega_cache: dict[str, tuple[...]] in LayeredIgnore.__init__
- Memoize _mega(rel) keyed by Path(rel_project).parent.as_posix() — files in the same directory share the same _mega computation
tests/test_path_filtering.py:
- Add test_is_ignored_mega_caches_by_directory — asserts _mega computed once per directory
- Add test_layered_ignore_memo_preserves_decisions — asserts cached decisions match uncached for nested ignore + gitignore negations
tests/test_lancedb_e2e.py:
- Add test_layered_ignore_provided_once_per_flow (HEAVY) — asserts single LayeredIgnore instance per flow run
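
The directory-keyed memo listed under path_filtering.py can be sketched as follows. This is a simplified illustration, not the code in the PR: `DirMemoIgnore`, its `ignored_dirs` argument, the `mega_calls` counter, and the boolean cache value are assumptions made for the example (the PR caches a tuple whose contents are not shown here). Only the keying by `Path(rel_project).parent.as_posix()` is taken from the description.

```python
from pathlib import PurePosixPath


class DirMemoIgnore:
    """Hypothetical stand-in for LayeredIgnore; not the project's class."""

    def __init__(self, ignored_dirs: set[str]) -> None:
        self._ignored_dirs = ignored_dirs
        # Directory (POSIX string) -> cached per-directory result.
        self._mega_cache: dict[str, bool] = {}
        self.mega_calls = 0  # instrumentation for the demo only

    def _mega(self, directory: str) -> bool:
        """Pretend-expensive step: is this directory or an ancestor ignored?"""
        self.mega_calls += 1
        parts = PurePosixPath(directory).parts
        return any(
            PurePosixPath(*parts[: i + 1]).as_posix() in self._ignored_dirs
            for i in range(len(parts))
        )

    def is_ignored(self, rel_project: str) -> bool:
        # Files in the same directory share one cache entry.
        key = PurePosixPath(rel_project).parent.as_posix()
        if key not in self._mega_cache:
            self._mega_cache[key] = self._mega(key)
        return self._mega_cache[key]


ignore = DirMemoIgnore({"src/gen"})
results = [ignore.is_ignored(f"src/gen/F{i}.java") for i in range(100)]
calls_after_same_dir = ignore.mega_calls          # one computation for 100 files
other = ignore.is_ignored("src/main/A.java")      # new directory, new entry
```

A cache keyed by directory is only safe when the cached value depends on the directory alone; any file-name-specific matching has to happen outside the memoized step, which is what test_layered_ignore_memo_preserves_decisions is there to guard.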

Manual Evidence

Single `id(ignore)` per flow

# Micro-benchmark showing cache hits for same-directory files
$ .venv/bin/python /tmp/test_ignore_cache_bench.py
is_ignored calls: 100, files in same directory
Cache entries: 1
Elapsed time: 0.003065s
Time per call: 0.031ms
✓ Cache hit: all 100 files shared 1 _mega computation

Sentinel Checks

All sentinel greps from the PR prompt pass:

✅ grep -nE "LayeredIgnore\(project_root\)\.is_ignored" java_index_flow_lancedb.py → empty (3 process sites converted)
✅ grep -n "coco.use_context(IGNORE)" java_index_flow_lancedb.py → 3 sites (:357, :430, :479)
✅ grep -n "_mega_cache" path_filtering.py → 4 hits (cache present)
✅ Sites :182 and :578 unchanged (use bare constructor for once-per-run calls)

Test Results

$ .venv/bin/python -m pytest tests/test_path_filtering.py tests/test_lancedb_e2e.py -q
.................sssss                                                   [100%]
17 passed, 5 skipped in 0.06s

$ .venv/bin/python -m pytest tests -q -k "ignore or path_filter or vectors_progress"
.............ss.......................ss                                 [100%]
36 passed, 4 skipped, 815 deselected, 2 warnings in 23.18s

$ .venv/bin/ruff check .
All checks passed!

HEAVY test: The test_layered_ignore_provided_once_per_flow test is gated behind JAVA_CODEBASE_RAG_RUN_HEAVY=1 and requires cocoindex e2e to run locally. Not executed in this environment, but the test is present for CI/validation.

Plan Reference

Implements § PR-P3 from plans/active/PLAN-INIT-INCREMENT-PERF.md and plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md.

Co-Authored-By: Claude <noreply@anthropic.com>

- Define IGNORE ContextKey (version-detected) alongside PROJECT_ROOT/EMBEDDER/LANCE_DB
- Provide IGNORE once per flow run in coco_lifespan (LayeredIgnore constructed once)
- Convert process_java_file, process_sql_file, process_yaml_file to use IGNORE ContextKey
- Add _mega_cache to LayeredIgnore, memoizing _mega(rel) by directory
- Add test_is_ignored_mega_caches_by_directory and test_layered_ignore_memo_preserves_decisions
- Add test_layered_ignore_provided_once_per_flow (HEAVY) in test_lancedb_e2e.py

Scope: Only the three process_*_file sites converted. Sites :182 and :578 (_approximate_vectors_total and app_main pre-walk) left untouched as they call cocoindex_excluded_patterns() once per run, not per-file.

Co-Authored-By: Claude <noreply@anthropic.com>

FIX 1: Rewrite test_layered_ignore_provided_once_per_flow
- Replace broken subprocess-based test (patch cannot cross process boundary)
- Use source-structure assertion that counts builder.provide(IGNORE,) calls
- Asserts exactly ONE provide and THREE use_context calls
- Removes infinite recursion bug (original_init reassigned inside patch context)

FIX 2: Change IGNORE ContextKey annotation to raw type
- Change coco.ContextKey["path_filtering.LayeredIgnore"] to coco.ContextKey[LayeredIgnore]
- Apply to all three _ck_params branches (detect_change, tracked, default)
- Matches sibling annotations (PROJECT_ROOT, EMBEDDER use raw types)

VERIFY: HEAVY test passes
- test_layered_ignore_provided_once_per_flow now passes when run
- Source-structure assertions verify wiring invariant
- All sentinel greps pass (3 use_context sites, 0 bare constructor.is_ignored sites)

Co-Authored-By: Claude <noreply@anthropic.com>
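
A source-structure assertion of the kind FIX 1 describes might look like the sketch below. It is an assumption-laden illustration, not the test in the PR: `SAMPLE_SOURCE`, `count_wiring`, and `test_ignore_wiring` are hypothetical, and a real test would read java_index_flow_lancedb.py from disk rather than use an inline string.

```python
import re

# Hypothetical excerpt standing in for the real flow module's source text.
SAMPLE_SOURCE = '''
def coco_lifespan(builder, root):
    builder.provide(IGNORE, LayeredIgnore(root))

def process_java_file(path):
    ignore = coco.use_context(IGNORE)

def process_sql_file(path):
    ignore = coco.use_context(IGNORE)

def process_yaml_file(path):
    ignore = coco.use_context(IGNORE)
'''


def count_wiring(source: str) -> tuple[int, int]:
    """Return (number of provide sites, number of use_context sites)."""
    provides = len(re.findall(r"builder\.provide\(\s*IGNORE\s*,", source))
    uses = len(re.findall(r"coco\.use_context\(\s*IGNORE\s*\)", source))
    return provides, uses


def test_ignore_wiring() -> None:
    provides, uses = count_wiring(SAMPLE_SOURCE)
    assert provides == 1  # LayeredIgnore provided exactly once
    assert uses == 3      # one lookup per process_*_file function
```

Counting call sites in source text checks the wiring invariant without patching across a process boundary, at the cost of not exercising runtime behavior.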

All of the init/increment-perf work has landed: the original plan (PR-P1..P3: #340 cached ignore, #341 _write_edges bulk, #342 nodes/routes bulk) and the post-review follow-ups (PR-P4 #343 dependent refresh + DECLARES dedup, PR-P5 #344 annotation-scope fix + route bulk + overrides invariant), plus its proposal (#338). Relocate the plan, agent-prompts, and proposal from active/ to completed/, matching the Ladybug/INDEX-OUTPUT close-out convention (pure rename, no content edits).

Co-authored-by: Claude <noreply@anthropic.com>

Performance release: faster init / reprocess / increment with no graph, schema, or CLI changes. Bulk COPY FROM graph writes (#341-#342) and the lifespan-cached LayeredIgnore (#340) take init from ~395s toward ~140s on the profiled medium corpus; the graph-write phase drops ~316s -> ~0.4s. No re-index required -- the bulk path is byte-equivalent to 0.6.4 (verified node-for-node and edge-for-edge, all properties + GraphMeta counters). Ontology version unchanged (17). Co-authored-by: Claude <noreply@anthropic.com>

HumanBean17 and others added 2 commits June 22, 2026 09:30

HumanBean17 merged commit 24d66d6 into master Jun 22, 2026
1 check passed

HumanBean17 deleted the perf/cached-ignore-p3 branch June 22, 2026 11:32

This was referenced Jun 22, 2026

- fix(graph): refresh preserved dependent nodes on increment (PR-P4) #343 (Merged)
- docs(plans): move init/increment-perf plan + proposal to completed #345 (Merged)

HumanBean17 mentioned this pull request Jun 22, 2026

- bump version to 0.6.5 #347 (Merged)

HumanBean17 merged 2 commits into master from perf/cached-ignore-p3
