
perf(vectors): lifespan-cached LayeredIgnore + is_ignored memo (PR-P3)#340

Merged
HumanBean17 merged 2 commits into master from perf/cached-ignore-p3
Jun 22, 2026

@HumanBean17

Owner

Scope

This PR implements PR-P3 from plans/active/PLAN-INIT-INCREMENT-PERF.md. It hoists LayeredIgnore to a cocoindex ContextKey (constructed once per flow run) and memoizes is_ignored's _mega computation by directory.

Independent of PR-P1 and PR-P2 — touches different files.
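
The "construct once per flow run, look up per file" idea from the Scope can be sketched with the standard library. This is only an illustration: `contextvars.ContextVar` stands in for the cocoindex ContextKey, and `run_flow`, `process_file`, and the toy `LayeredIgnore` below are hypothetical names, not the project's code or the cocoindex API.

```python
from contextvars import ContextVar


class LayeredIgnore:
    """Toy stand-in that counts constructions (not the real class)."""

    constructed = 0

    def __init__(self, root: str) -> None:
        LayeredIgnore.constructed += 1
        self.root = root

    def is_ignored(self, rel_path: str) -> bool:
        # Placeholder rule for the sketch, not the project's ignore logic.
        return rel_path.startswith("build/")


# Hypothetical stand-in for the IGNORE context key.
IGNORE = ContextVar("IGNORE")


def process_file(rel_path: str) -> bool:
    """Per-file step: look up the shared instance instead of rebuilding it."""
    ignore = IGNORE.get()
    return not ignore.is_ignored(rel_path)


def run_flow(root: str, files: list[str]) -> list[str]:
    """Lifespan step: build LayeredIgnore once, then process every file."""
    token = IGNORE.set(LayeredIgnore(root))
    try:
        return [f for f in files if process_file(f)]
    finally:
        IGNORE.reset(token)
```

With this shape, the number of `LayeredIgnore` constructions no longer grows with the number of files processed.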

Changes

  • java_index_flow_lancedb.py:

    • Define IGNORE = coco.ContextKey[LayeredIgnore] alongside PROJECT_ROOT/EMBEDDER/LANCE_DB using the same _ck_params version-detection pattern
    • Add builder.provide(IGNORE, LayeredIgnore(root)) in coco_lifespan — built once per flow run
    • Convert process_java_file, process_sql_file, process_yaml_file to use ignore = coco.use_context(IGNORE) instead of LayeredIgnore(project_root).is_ignored(...)
    • Sites :182 (in _approximate_vectors_total) and :578 (in app_main pre-walk) left untouched — they call cocoindex_excluded_patterns() once per run, not per-file
  • path_filtering.py:

    • Add self._mega_cache: dict[str, tuple[...]] in LayeredIgnore.__init__
    • Memoize _mega(rel) keyed by Path(rel_project).parent.as_posix() — files in the same directory share the same _mega computation
  • tests/test_path_filtering.py:

    • Add test_is_ignored_mega_caches_by_directory — asserts _mega computed once per directory
    • Add test_layered_ignore_memo_preserves_decisions — asserts cached decisions match uncached for nested ignore + gitignore negations
  • tests/test_lancedb_e2e.py:

    • Add test_layered_ignore_provided_once_per_flow (HEAVY) — asserts single LayeredIgnore instance per flow run
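
The per-directory memo described for path_filtering.py can be sketched as follows. This is a simplified model under stated assumptions: `MemoizedIgnore`, its `rules` mapping, and the `mega_calls` counter are hypothetical, `_mega` here just collects file-name patterns from the root down to a directory, and `PurePosixPath` is used in place of `Path` so the sketch behaves the same on every platform. Only the cache key (`parent.as_posix()`) and the `_mega_cache` name come from the PR.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


class MemoizedIgnore:
    def __init__(self, rules: dict[str, list[str]]) -> None:
        # rules maps a directory ("." for the root) to patterns declared there.
        self._rules = rules
        self._mega_cache: dict[str, tuple[str, ...]] = {}
        self.mega_calls = 0  # instrumentation for the sketch only

    def _mega(self, directory: str) -> tuple[str, ...]:
        """Collect patterns from the root down to `directory` (the costly part)."""
        self.mega_calls += 1
        patterns = list(self._rules.get(".", []))
        current = PurePosixPath()
        for part in PurePosixPath(directory).parts:
            current = current / part
            patterns.extend(self._rules.get(current.as_posix(), []))
        return tuple(patterns)

    def is_ignored(self, rel_project: str) -> bool:
        path = PurePosixPath(rel_project)
        key = path.parent.as_posix()  # files in one directory share this key
        if key not in self._mega_cache:
            self._mega_cache[key] = self._mega(key)
        # The per-file match still runs on every call; only _mega is cached.
        return any(fnmatch(path.name, p) for p in self._mega_cache[key])
```

Because only the directory-level computation is cached and the file name is still matched on each call, the memo should not change any individual decision, which is the property test_layered_ignore_memo_preserves_decisions is described as checking.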

Manual Evidence

Single id(ignore) per flow

# Micro-benchmark showing cache hits for same-directory files
$ .venv/bin/python /tmp/test_ignore_cache_bench.py
is_ignored calls: 100, files in same directory
Cache entries: 1
Elapsed time: 0.003065s
Time per call: 0.031ms
✓ Cache hit: all 100 files shared 1 _mega computation

Sentinel Checks

All sentinel greps from the PR prompt pass:

  • grep -nE "LayeredIgnore\(project_root\)\.is_ignored" java_index_flow_lancedb.py → empty (3 process sites converted)
  • grep -n "coco.use_context(IGNORE)" java_index_flow_lancedb.py → 3 sites (:357, :430, :479)
  • grep -n "_mega_cache" path_filtering.py → 4 hits (cache present)
  • ✅ Sites :182 and :578 unchanged (use bare constructor for once-per-run calls)
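
The same sentinel checks can be expressed in Python as a source-structure assertion. The sketch below is an assumption about how such a check might look, not the test that was merged: `sentinel_counts` is a hypothetical helper, and `SAMPLE_SOURCE` is a made-up stand-in for java_index_flow_lancedb.py. The regular expressions and the expected counts (0, 3, 1) mirror the figures quoted in this PR.

```python
import re

# Patterns taken from the sentinel greps and the wiring invariant.
SENTINELS = {
    "bare_is_ignored": r"LayeredIgnore\(project_root\)\.is_ignored",
    "use_context": r"coco\.use_context\(IGNORE\)",
    "provide": r"builder\.provide\(IGNORE,",
}


def sentinel_counts(source: str) -> dict[str, int]:
    """Count matching lines per sentinel, like `grep -c` over the file text."""
    lines = source.splitlines()
    return {
        name: sum(1 for line in lines if re.search(pattern, line))
        for name, pattern in SENTINELS.items()
    }


# Made-up excerpt for illustration; the real file is not reproduced here.
SAMPLE_SOURCE = """
builder.provide(IGNORE, LayeredIgnore(root))
def process_java_file(): ignore = coco.use_context(IGNORE)
def process_sql_file(): ignore = coco.use_context(IGNORE)
def process_yaml_file(): ignore = coco.use_context(IGNORE)
"""

counts = sentinel_counts(SAMPLE_SOURCE)
assert counts == {"bare_is_ignored": 0, "use_context": 3, "provide": 1}
```

In a real test the source text would come from reading the module file rather than from an inline string.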

Test Results

$ .venv/bin/python -m pytest tests/test_path_filtering.py tests/test_lancedb_e2e.py -q
.................sssss                                                   [100%]
17 passed, 5 skipped in 0.06s

$ .venv/bin/python -m pytest tests -q -k "ignore or path_filter or vectors_progress"
.............ss.......................ss                                 [100%]
36 passed, 4 skipped, 815 deselected, 2 warnings in 23.18s

$ .venv/bin/ruff check .
All checks passed!

HEAVY test: The test_layered_ignore_provided_once_per_flow test is gated behind JAVA_CODEBASE_RAG_RUN_HEAVY=1 and requires cocoindex e2e to run locally. Not executed in this environment, but the test is present for CI/validation.
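
One way an environment-variable gate like this could be written is sketched below. The repository's actual gating code is not shown in this PR, so `heavy_enabled` and `HeavyWiringTest` are hypothetical, and `unittest.skipUnless` is used only to keep the sketch standard-library; a pytest suite would more likely use a skip marker.

```python
import os
import unittest


def heavy_enabled(environ=None) -> bool:
    """Return True only when JAVA_CODEBASE_RAG_RUN_HEAVY is set to "1"."""
    env = os.environ if environ is None else environ
    return env.get("JAVA_CODEBASE_RAG_RUN_HEAVY") == "1"


class HeavyWiringTest(unittest.TestCase):
    @unittest.skipUnless(
        heavy_enabled(), "set JAVA_CODEBASE_RAG_RUN_HEAVY=1 to run"
    )
    def test_heavy_placeholder(self) -> None:
        # Placeholder body; the real HEAVY test checks the IGNORE wiring.
        self.assertTrue(heavy_enabled())
```

Under this sketch, the test is reported as skipped unless the variable is exported before the test run.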

Plan Reference

Implements § PR-P3 from plans/active/PLAN-INIT-INCREMENT-PERF.md and plans/AGENT-PROMPTS-INIT-INCREMENT-PERF.md.

Co-Authored-By: Claude <noreply@anthropic.com>

HumanBean17 and others added 2 commits June 22, 2026 09:30
- Define IGNORE ContextKey (version-detected) alongside PROJECT_ROOT/EMBEDDER/LANCE_DB
- Provide IGNORE once per flow run in coco_lifespan (LayeredIgnore constructed once)
- Convert process_java_file, process_sql_file, process_yaml_file to use IGNORE ContextKey
- Add _mega_cache to LayeredIgnore, memoizing _mega(rel) by directory
- Add test_is_ignored_mega_caches_by_directory and test_layered_ignore_memo_preserves_decisions
- Add test_layered_ignore_provided_once_per_flow (HEAVY) in test_lancedb_e2e.py

Scope: Only the three process_*_file sites converted. Sites :182 and :578
(_approximate_vectors_total and app_main pre-walk) left untouched as they call
cocoindex_excluded_patterns() once per run, not per-file.

Co-Authored-By: Claude <noreply@anthropic.com>
FIX 1: Rewrite test_layered_ignore_provided_once_per_flow
- Replace broken subprocess-based test (patch cannot cross process boundary)
- Use source-structure assertion that counts builder.provide(IGNORE,) calls
- Asserts exactly ONE provide and THREE use_context calls
- Removes infinite recursion bug (original_init reassigned inside patch context)

FIX 2: Change IGNORE ContextKey annotation to raw type
- Change coco.ContextKey["path_filtering.LayeredIgnore"] to coco.ContextKey[LayeredIgnore]
- Apply to all three _ck_params branches (detect_change, tracked, default)
- Matches sibling annotations (PROJECT_ROOT, EMBEDDER use raw types)

VERIFY: HEAVY test passes
- test_layered_ignore_provided_once_per_flow now passes when run
- Source-structure assertions verify wiring invariant
- All sentinel greps pass (3 use_context sites, 0 bare constructor.is_ignored sites)

Co-Authored-By: Claude <noreply@anthropic.com>
@HumanBean17 HumanBean17 merged commit 24d66d6 into master Jun 22, 2026
1 check passed
@HumanBean17 HumanBean17 deleted the perf/cached-ignore-p3 branch June 22, 2026 11:32
HumanBean17 added a commit that referenced this pull request Jun 22, 2026
)

All of the init/increment-perf work has landed — the original plan
(PR-P1..P3: #340 cached ignore, #341 _write_edges bulk, #342 nodes/routes
bulk) and the post-review follow-ups (PR-P4 #343 dependent refresh +
DECLARES dedup, PR-P5 #344 annotation-scope fix + route bulk + overrides
invariant), plus its proposal (#338). Relocate the plan, agent-prompts,
and proposal from active/ to completed/, matching the Ladybug/INDEX-OUTPUT
close-out convention (pure rename, no content edits).

Co-authored-by: Claude <noreply@anthropic.com>
HumanBean17 added a commit that referenced this pull request Jun 22, 2026
Performance release: faster init / reprocess / increment with no graph,
schema, or CLI changes. Bulk COPY FROM graph writes (#341-#342) and the
lifespan-cached LayeredIgnore (#340) take init from ~395s toward ~140s on
the profiled medium corpus; the graph-write phase drops ~316s -> ~0.4s.

No re-index required -- the bulk path is byte-equivalent to 0.6.4 (verified
node-for-node and edge-for-edge, all properties + GraphMeta counters).
Ontology version unchanged (17).

Co-authored-by: Claude <noreply@anthropic.com>