perf(graph): bulk COPY FROM for nodes, routes, clients/producers (PR-P2) #342
Merged
- Convert _write_nodes_impl (shared workhorse) to bulk COPY FROM for Symbol nodes
- Convert _write_routes_and_exposes (shared) to bulk COPY FROM for Route/EXPOSES/Client/Producer/DECLARES_*/HTTP_CALLS/ASYNC_CALLS
- Convert _write_clients_producers_and_calls (incremental-only) to bulk COPY FROM for Client/Producer/edges
- Delete _CREATE_SYMBOL, _MERGE_SYMBOL, _CREATE_ROUTE, _CREATE_EXPOSES, and 6 shared _CREATE_* constants
- Retain MERGE (r:Route) dedup in _write_clients_producers_and_calls, with a comment
- Add _existing_node_ids helper to generalize referential integrity filtering to all node types
- Add column constants: _ROUTE_COLUMNS, _CLIENT_COLUMNS, _PRODUCER_COLUMNS, _REL_*_COLUMNS for routes/clients/producers
- Filter existing nodes in the incremental path to avoid duplicate primary key errors
- Re-fetch valid_ids after bulk-loading Route/Client/Producer nodes, before edge staging
- Add test_incremental_bulk_write_equivalent_to_full_rebuild and test_incremental_route_merge_dedup_preserved to TestIncrementalOrchestrator

Co-Authored-By: Claude <noreply@anthropic.com>
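Switching from per-row CREATE/MERGE to bulk loads changes two failure modes that the commit works around: a bulk COPY FROM rejects duplicate primary keys, and it rejects edges whose endpoints do not exist. The ordering that avoids both can be sketched without a database. This is a minimal sketch under stated assumptions: FakeGraph, bulk_copy, merge_node, write_nodes_then_edges, NODE_COLUMNS and REL_COLUMNS are hypothetical names that only mirror the roles of _bulk_copy, _existing_node_ids, the MERGE (r:Route) dedup and the column constants; they are not the real build_ast_graph.py signatures or LadybugDB's API.

```python
# Hypothetical in-memory stand-in for a graph store whose bulk load behaves
# like COPY FROM: it rejects duplicate primary keys and dangling edge
# endpoints. None of these names are the real build_ast_graph.py/LadybugDB API.

NODE_COLUMNS = ("id", "name", "kind")  # assumed schema order for node tables
REL_COLUMNS = ("from_id", "to_id")     # assumed schema order for rel tables


class FakeGraph:
    def __init__(self):
        self.nodes = {}  # table -> {primary key: row tuple}
        self.rels = {}   # table -> [row tuple, ...]

    def all_ids(self):
        # Analogue of _existing_node_ids: ids across every node table.
        return {pk for table in self.nodes.values() for pk in table}

    def bulk_copy(self, table, columns, rows, *, rel=False):
        # Project each row dict onto the schema column order, so the dicts'
        # own key order never leaks into the load.
        tuples = [tuple(row[c] for c in columns) for row in rows]
        if rel:
            known = self.all_ids()
            bad = [t for t in tuples if t[0] not in known or t[1] not in known]
            if bad:
                raise ValueError(f"{table}: dangling endpoints {bad}")
            self.rels.setdefault(table, []).extend(tuples)
            return
        existing = self.nodes.setdefault(table, {})
        keys = [t[0] for t in tuples]
        if len(set(keys)) != len(keys) or any(k in existing for k in keys):
            raise ValueError(f"{table}: duplicate primary key")
        # All-or-nothing: insert only after the whole batch has been validated.
        existing.update((t[0], t) for t in tuples)

    def merge_node(self, table, columns, row):
        # MERGE-style upsert: idempotent, so a repeated key is deduplicated
        # instead of failing the way bulk_copy does.
        t = tuple(row[c] for c in columns)
        self.nodes.setdefault(table, {}).setdefault(t[0], t)


def write_nodes_then_edges(graph, node_table, node_rows, rel_table, edge_rows):
    # 1. Incremental path: skip nodes that already exist to avoid PK errors.
    present = graph.all_ids()
    new_rows = [r for r in node_rows if r["id"] not in present]
    graph.bulk_copy(node_table, NODE_COLUMNS, new_rows)
    # 2. Re-fetch ids AFTER the node load, then keep only edges whose
    #    endpoints both exist.
    valid_ids = graph.all_ids()
    kept = [e for e in edge_rows
            if e["from_id"] in valid_ids and e["to_id"] in valid_ids]
    graph.bulk_copy(rel_table, REL_COLUMNS, kept, rel=True)
    return len(new_rows), len(kept)


g = FakeGraph()
g.bulk_copy("Route", NODE_COLUMNS, [{"kind": "route", "id": "r1", "name": "GET /x"}])
symbols = [{"kind": "class", "name": "A", "id": "s1"},
           {"name": "B", "id": "s2", "kind": "class"}]
edges = [{"to_id": "r1", "from_id": "s1"}, {"from_id": "s2", "to_id": "r9"}]
print(write_nodes_then_edges(g, "Symbol", symbols, "EXPOSES", edges))  # (2, 1)
print(write_nodes_then_edges(g, "Symbol", symbols, "EXPOSES", []))     # (0, 0)
```

The second call shows the incremental case: both Symbol rows already exist, so they are filtered out instead of tripping the duplicate-key check. Projecting rows onto an explicit column tuple plays the same role as tbl.select(columns) in this PR's _bulk_copy: load order comes from the schema constant rather than from how each row dict happened to be built.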
PR-P2's _write_clients_producers_and_calls (incremental global passes 5-6) applied the edge-endpoint filter to NODE writes: it filtered Client/Producer node rows against the existing-id set. But the caller DETACH DELETEs every Client/Producer node immediately before invoking this function, so the pre-load id set holds no Client/Producer ids by construction. The filter therefore dropped ALL Client and Producer nodes (and their DECLARES_*/HTTP_CALLS/ASYNC_CALLS edges) on every increment. Repro on http_caller_smoke: Client=5, Producer=3 -> 0, 0 after a single increment.

Load Client/Producer nodes unconditionally (matching _write_routes_and_exposes and the old per-row path); keep the endpoint filter on EDGES only, against the post-load id set.

The bug was latent: test_incremental_with_http_clients_does_not_fall_back used a client-bearing corpus but only asserted mode=="incremental". Strengthen it to assert that Client/Producer nodes survive the increment. Also drop a redundant, unused module-level write_ladybug import that tripped ruff F401/F811.

Co-Authored-By: Claude <noreply@anthropic.com>
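The failure is easy to reproduce in miniature. The sketch below is hypothetical: plain dicts and lists stand in for the graph, and detach_delete, load_edges, load_buggy, load_fixed and run are illustrative names, not functions in build_ast_graph.py. Because the caller removes every Client/Producer node first, an id set captured before the load holds none of their ids, so keeping only node rows found in that set keeps nothing, and the edges then lose their endpoints too.

```python
# Hypothetical miniature of the pass-5/6 bug; not the real graph code.


def detach_delete(graph, labels):
    # Mirror the caller: drop every node with these labels and every edge
    # that touches one of them.
    doomed = {nid for nid, lbl in graph["nodes"].items() if lbl in labels}
    graph["nodes"] = {n: lbl for n, lbl in graph["nodes"].items() if n not in doomed}
    graph["edges"] = [(a, b) for a, b in graph["edges"]
                      if a not in doomed and b not in doomed]


def load_edges(graph, edge_rows):
    # Endpoint filter, applied to EDGES only, against the post-load id set.
    valid = set(graph["nodes"])
    graph["edges"] += [(a, b) for a, b in edge_rows if a in valid and b in valid]


def load_buggy(graph, node_rows, edge_rows):
    pre_load = set(graph["nodes"])          # no Client/Producer ids left here
    for nid, lbl in node_rows:
        if nid in pre_load:                 # BUG: endpoint filter on NODES
            graph["nodes"][nid] = lbl
    load_edges(graph, edge_rows)


def load_fixed(graph, node_rows, edge_rows):
    for nid, lbl in node_rows:              # nodes load unconditionally
        graph["nodes"][nid] = lbl
    load_edges(graph, edge_rows)


def fresh_graph():
    return {"nodes": {"s1": "Symbol", "c1": "Client", "p1": "Producer"},
            "edges": [("s1", "c1"), ("s1", "p1")]}


def run(loader):
    # Delete-then-reload, as in the incremental pass; return (nodes, edges).
    g = fresh_graph()
    detach_delete(g, {"Client", "Producer"})
    loader(g, [("c1", "Client"), ("p1", "Producer")], [("s1", "c1"), ("s1", "p1")])
    return len(g["nodes"]), len(g["edges"])


print(run(load_buggy))  # (1, 0): only the Symbol survives
print(run(load_fixed))  # (3, 2)
```

A test that only checks which code path ran would pass for both loaders; asserting on surviving node counts, as the strengthened test now does, is what separates them.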
This was referenced Jun 22, 2026
HumanBean17 added a commit that referenced this pull request Jun 22, 2026
All of the init/increment-perf work has landed: the original plan (PR-P1..P3: #340 cached ignore, #341 _write_edges bulk, #342 nodes/routes bulk) and the post-review follow-ups (PR-P4 #343 dependent refresh + DECLARES dedup, PR-P5 #344 annotation-scope fix + route bulk + overrides invariant), plus its proposal (#338). Relocate the plan, agent prompts, and proposal from active/ to completed/, matching the Ladybug/INDEX-OUTPUT close-out convention (pure rename, no content edits).

Co-authored-by: Claude <noreply@anthropic.com>
Merged
HumanBean17 added a commit that referenced this pull request Jun 22, 2026
Performance release: faster init / reprocess / increment with no graph, schema, or CLI changes. Bulk COPY FROM graph writes (#341-#342) and the lifespan-cached LayeredIgnore (#340) take init from ~395s toward ~140s on the profiled medium corpus; the graph-write phase drops from ~316s to ~0.4s. No re-index required: the bulk path is byte-equivalent to 0.6.4 (verified node-for-node and edge-for-edge, all properties + GraphMeta counters). Ontology version unchanged (17).

Co-authored-by: Claude <noreply@anthropic.com>
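The byte-equivalence claim amounts to comparing two graph dumps as multisets. A minimal sketch, assuming each graph has already been exported into plain Python structures; the dump layout, the diff_graphs/equivalent helpers and the symbol_count meta key are hypothetical, not the project's verification script or test. Property dicts are canonicalized to sorted tuples so key order cannot cause false mismatches, and Counters keep duplicated rows visible instead of collapsing them as sets would.

```python
from collections import Counter


def _canon(props):
    # Canonical, hashable form of a property dict; assumes scalar values.
    return tuple(sorted(props.items()))


def diff_graphs(a, b):
    """Compare two dumps shaped like
    {"nodes": [(label, props)], "edges": [(rel, src, dst, props)], "meta": {...}}.
    Returns rows missing from / extra in b, plus mismatched meta counters."""
    na = Counter((label, _canon(p)) for label, p in a["nodes"])
    nb = Counter((label, _canon(p)) for label, p in b["nodes"])
    ea = Counter((rel, s, d, _canon(p)) for rel, s, d, p in a["edges"])
    eb = Counter((rel, s, d, _canon(p)) for rel, s, d, p in b["edges"])
    keys = sorted(set(a["meta"]) | set(b["meta"]))
    meta = {k: (a["meta"].get(k), b["meta"].get(k))
            for k in keys if a["meta"].get(k) != b["meta"].get(k)}
    return {"nodes_missing": na - nb, "nodes_extra": nb - na,
            "edges_missing": ea - eb, "edges_extra": eb - ea, "meta": meta}


def equivalent(a, b):
    # Every section empty means node-for-node, edge-for-edge and
    # counter-for-counter equality.
    return not any(diff_graphs(a, b).values())


full = {"nodes": [("Symbol", {"id": "s1", "name": "A"}), ("Route", {"id": "r1"})],
        "edges": [("EXPOSES", "s1", "r1", {})],
        "meta": {"symbol_count": 1}}
bulk = {"nodes": [("Route", {"id": "r1"}), ("Symbol", {"name": "A", "id": "s1"})],
        "edges": [("EXPOSES", "s1", "r1", {})],
        "meta": {"symbol_count": 1}}
print(equivalent(full, bulk))  # True
```

Returning the per-section diffs rather than a bare boolean makes a failing comparison point straight at the offending rows or counters.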
Scope
This PR implements PR-P2 from plans/active/PLAN-INIT-INCREMENT-PERF.md § PR-P2, converting the remaining graph write helpers to bulk COPY FROM:
- _write_nodes_impl (shared workhorse): stage Symbol rows, then _bulk_copy(conn, "Symbol", _NODE_COLUMNS, rows)
- _write_routes_and_exposes (shared): per-table staging + _bulk_copy for Route/EXPOSES/Client/Producer/DECLARES_*/HTTP_CALLS/ASYNC_CALLS
- _write_clients_producers_and_calls (incremental-only): Client/Producer/edges bulk; MERGE (r:Route) retained and commented
- _write_meta: left UNTOUCHED (stays on MERGE, as planned)

Deletes _CREATE_SYMBOL, _MERGE_SYMBOL, _CREATE_ROUTE, _CREATE_EXPOSES, and 6 shared _CREATE_* constants (CLIENT/PRODUCER/DECLARES_*/HTTP_CALL/ASYNC_CALL).

Referential integrity handling
Applies the lesson from PR-P1's CI failure: bulk COPY FROM enforces referential integrity. This PR:
- Generalizes _existing_symbol_ids to _existing_node_ids (queries all node types: Symbol, Route, Client, Producer)
- Uses _existing_node_ids(conn) for EXPOSES, DECLARES_CLIENT, DECLARES_PRODUCER, HTTP_CALLS, ASYNC_CALLS
- Re-fetches the id set between node and edge staging (valid_ids after node bulk-load)
- Filters existing nodes in _write_nodes_impl to avoid duplicate primary key errors in the incremental path

Column ordering fix
Enhanced _bulk_copy to select columns in exact schema order: tbl.select(columns). Without this, pyarrow's from_pylist creates columns in dict key order (alphabetical), which doesn't match LadybugDB's schema and causes type coercion errors.

Manual evidence
Full rebuild of tests/bank-chat-system with bulk writes:

rm -rf /tmp/p2-evidence && .venv/bin/python build_ast_graph.py \
  --source-root tests/bank-chat-system \
  --ladybug-path /tmp/p2-evidence/code_graph.lbug --verbose
.venv/bin/java-codebase-rag meta --source-root tests/bank-chat-system --index-dir /tmp/p2-evidence

Graph-write phase timing (from JCIRAG_PROGRESS lines):
Meta counts (from counts_json):
Tests
All tests pass:
- tests/test_incremental_graph.py: 30 tests (including 2 new: test_incremental_bulk_write_equivalent_to_full_rebuild, test_incremental_route_merge_dedup_preserved)
- tests/test_ast_graph_build.py: 28 tests
- tests/test_schema_consistency.py: 8 tests

Sentinel greps (all pass)
Must return zero:
Must be non-zero:
Files changed
- build_ast_graph.py: bulk conversions + column constants + _existing_node_ids helper
- tests/test_incremental_graph.py: 2 new tests in TestIncrementalOrchestrator