Summary
Follow-up to #346 / PR #348. PR #348 fixed erase to actually delete code_graph.lbug, cocoindex.db, and .graph_hashes.json. But erase still leaves the incremental-rebuild crash marker (.graph_increment_in_progress) on disk. After erase → init, the marker survives init (the full-rebuild path never touches it), so the next increment sees the stale marker and silently falls back to a full rebuild — a one-time perf hit that is only explained under --verbose.
This matters for this codebase specifically because incremental builds are known to crash (kuzu/ladybug SIGSEGV — see project history), so a leftover crash marker is not purely hypothetical.
Reproduce
IDX=/tmp/erase-marker
java-codebase-rag init --source-root tests/bank-chat-system --index-dir "$IDX" --quiet
# Simulate a crashed increment leaving its marker behind:
: > "$IDX/.graph_increment_in_progress"
java-codebase-rag erase --source-root tests/bank-chat-system --index-dir "$IDX" --yes
ls -a "$IDX" | grep graph_increment # STILL PRESENT
java-codebase-rag init --source-root tests/bank-chat-system --index-dir "$IDX" --quiet # full rebuild (does NOT clear the marker)
java-codebase-rag increment --source-root tests/bank-chat-system --index-dir "$IDX" --verbose # logs: "[increment] crash marker exists; falling back to full rebuild"
erase prints success: true, yet afterward:
$ ls -a "$IDX" | grep graph_increment
.graph_increment_in_progress ← survives erase AND init
Expected vs actual
- Expected: erase removes all graph-builder bookkeeping so a subsequent lifecycle starts clean; increment after erase → init runs incrementally.
- Actual: .graph_increment_in_progress survives erase (and init), so the next increment does an unnecessary full rebuild, explained only by a --verbose stderr line.
Root cause
java_codebase_rag/cli.py::_cmd_erase enumerates a fixed list of artifacts (code_graph.lbug, cocoindex.db, .graph_hashes.json) and removes only those. The crash marker lives at index_dir / ".graph_increment_in_progress" (build_ast_graph.py:3814). incremental_rebuild creates it and removes it only on a clean exit or in its except handler, so a SIGKILL, OOM kill, or native crash leaves it behind. erase never targets it, and the full-rebuild path (write_ladybug) never clears it, so it persists across erase and init until the next increment clears it via the full-rebuild fallback (build_ast_graph.py:3815-3821).
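To make the marker lifecycle concrete, here is a minimal sketch of the behavior described above. It is not the code in build_ast_graph.py: the function body, the do_work callback, and the "full" / "incremental" return values are assumptions for illustration. Only the marker file name comes from this issue.

```python
import tempfile
from pathlib import Path

MARKER_NAME = ".graph_increment_in_progress"  # file name taken from this issue


def incremental_rebuild(index_dir: Path, do_work) -> str:
    """Hypothetical stand-in for the builder's incremental path.

    Returns "full" when a stale marker forces the fallback, else "incremental".
    """
    marker = index_dir / MARKER_NAME
    if marker.exists():
        # A previous run never reached its cleanup: fall back to a full
        # rebuild and clear the marker so the next run can be incremental.
        marker.unlink()
        return "full"
    marker.touch()
    try:
        do_work()
    except Exception:
        # A Python-level failure still reaches this handler and cleans up.
        marker.unlink(missing_ok=True)
        raise
    # Clean exit. A SIGKILL, OOM kill, or native crash inside do_work()
    # reaches neither this line nor the handler, so the marker stays on disk.
    marker.unlink(missing_ok=True)
    return "incremental"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        index_dir = Path(tmp)
        (index_dir / MARKER_NAME).touch()  # simulate a killed increment
        print(incremental_rebuild(index_dir, lambda: None))  # full
        print(incremental_rebuild(index_dir, lambda: None))  # incremental
```

In this model, nothing outside the incremental path ever looks at the marker. That is why a stale one outlives both erase and init.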
Suggested fix
Have erase also remove .graph_increment_in_progress. Rather than hardcoding another literal in cli.py, prefer a single source of truth: export the builder-owned file paths from build_ast_graph.py (or declare them on ResolvedOperatorConfig) so erase and the builder can't drift. See the related .graph_hashes.json.tmp follow-up.
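A rough sketch of the single-source-of-truth idea, under stated assumptions: BUILDER_ARTIFACTS and erase_artifacts are hypothetical names, not existing project APIs, and the file-versus-directory handling is a guess at what "by type" means in the PR #348 title. The four artifact names are the ones listed in this issue.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical constant. In the real project this would live next to the
# builder (build_ast_graph.py) or on ResolvedOperatorConfig, so that erase
# and the builder read the same list.
BUILDER_ARTIFACTS = (
    "code_graph.lbug",
    "cocoindex.db",
    ".graph_hashes.json",
    ".graph_increment_in_progress",
)


def erase_artifacts(index_dir: Path) -> list[str]:
    """Remove every known artifact under index_dir; return the names removed."""
    removed = []
    for name in BUILDER_ARTIFACTS:
        path = index_dir / name
        if path.is_dir() and not path.is_symlink():
            shutil.rmtree(path)  # assumption: an artifact may be a directory
        elif path.exists() or path.is_symlink():
            path.unlink()  # regular file or (possibly dangling) symlink
        else:
            continue  # artifact absent, nothing to do
        removed.append(name)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        index_dir = Path(tmp)
        (index_dir / ".graph_increment_in_progress").touch()
        print(erase_artifacts(index_dir))
```

With a shared list like this, adding a new bookkeeping file to the builder is a one-line change that erase picks up automatically, and a regression test can assert that none of the listed names remain after erase.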
Environment
.venv, master@c92d9fb + PR #348 (679d22c), "fix(cli): erase removes graph/cocoindex.db/.graph_hashes.json by type (#346)".