Summary
`java-codebase-rag erase` reports `success: true` but does not delete the LadybugDB graph (`code_graph.lbug`). The next `init` then refuses with exit code 2 and points the user back to `erase --yes`, creating an endless loop. The documented "clean slate" workflow (`erase --yes`, then `init`) is broken.
Reproduce
```sh
IDX=/tmp/erase-bug
java-codebase-rag init --source-root tests/bank-chat-system --index-dir "$IDX" --quiet
java-codebase-rag erase --source-root tests/bank-chat-system --index-dir "$IDX" --yes
ls "$IDX"  # code_graph.lbug STILL PRESENT
java-codebase-rag init --source-root tests/bank-chat-system --index-dir "$IDX" --quiet  # → exit 2
```
erase prints {"message": "erase completed", "success": true}, yet afterward:
```
$ ls -la /tmp/erase-bug
code_graph.lbug      8429568 bytes   ← survives
.graph_hashes.json     20475 bytes   ← survives (never targeted by erase)
cocoindex.db/        (dir)           ← survives
```
Only the *.lance tables were dropped. The follow-up init refuses:
```
rc=2
{"message": "init refused: index paths already exist. Use `java-codebase-rag reprocess` ... or `java-codebase-rag erase --yes` then `init` for a clean slate.",
 "non_empty_paths": ["/private/tmp/erase-bug/code_graph.lbug"], "success": false}
```
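To check the post-`erase` state without eyeballing `ls`, a small helper can list which artifacts survive. This is a sketch: `leftover_index_artifacts` is a hypothetical name, and the artifact names are taken from the `ls` output above rather than read from the tool's config.

```python
from pathlib import Path
from typing import List

# Artifact names as they appear in the repro's ls output (assumed, not read from cfg).
GRAPH_ARTIFACTS = ("code_graph.lbug", ".graph_hashes.json", "cocoindex.db")


def leftover_index_artifacts(index_dir: Path) -> List[str]:
    """Return names of index artifacts still present under index_dir (hypothetical helper)."""
    # exists() is True for both files and directories, so one check covers both layouts.
    leftovers = [name for name in GRAPH_ARTIFACTS if (index_dir / name).exists()]
    # Lance tables live in "<table>.lance" entries; list any that remain, sorted for stable output.
    leftovers += sorted(p.name for p in index_dir.glob("*.lance"))
    return leftovers
```

On the repro above, calling it on `/tmp/erase-bug` after `erase` would be expected to list `code_graph.lbug`, `.graph_hashes.json`, and `cocoindex.db`, and nothing once `erase` works as documented.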
Expected vs actual
- Expected: `erase` removes the LadybugDB graph (its help string: "Runs cocoindex drop, removes LadybugDB, and drops Lance tables"), so `init` afterward starts clean.
- Actual: only the `.lance` tables are dropped; the `.lbug` graph, `.graph_hashes.json`, and `cocoindex.db/` all remain, and `init` exits 2.
Root cause
`java_codebase_rag/cli.py::_cmd_erase` mixes directory-only and file-only filesystem APIs, and the failures they hit on the wrong file type are silently swallowed:
- L625: `shutil.rmtree(cfg.ladybug_path, ignore_errors=True)`, but `code_graph.lbug` is a single regular file, not a directory. `shutil.rmtree` fails on a file and `ignore_errors=True` swallows the error, so the file is never deleted. (Verified: `os.path.isfile(...) == True`, and `shutil.rmtree(<file>, ignore_errors=True)` is a confirmed no-op.)
- L628: `cfg.cocoindex_db.unlink()`, but `cocoindex.db` is a directory. `unlink()` on a directory raises an `OSError` (`IsADirectoryError` on Linux, `PermissionError` on macOS), which the surrounding `except OSError` swallows, so `cocoindex.db/` survives too.

`.graph_hashes.json` is never targeted by `erase` at all.
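Both swallowed failures can be reproduced in isolation with only the standard library. The sketch below (`demonstrate_silent_noops` is a hypothetical name; the file names only mirror the index layout) deletes a regular file with `shutil.rmtree(..., ignore_errors=True)` and a directory with `Path.unlink()` wrapped in `except OSError`, then reports whether each path survived.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Tuple


def demonstrate_silent_noops() -> Tuple[bool, bool]:
    """Reproduce both swallowed failures from _cmd_erase in a scratch directory."""
    with tempfile.TemporaryDirectory() as tmp:
        graph_file = Path(tmp) / "code_graph.lbug"  # a regular file, like the real graph
        graph_file.write_bytes(b"graph")
        # rmtree cannot delete a regular file; ignore_errors=True hides the failure.
        shutil.rmtree(graph_file, ignore_errors=True)
        file_survives = graph_file.exists()

        coco_dir = Path(tmp) / "cocoindex.db"  # a directory, like the real cocoindex.db
        coco_dir.mkdir()
        try:
            coco_dir.unlink()  # raises an OSError subclass for a directory
        except OSError:
            pass  # mirrors the surrounding `except OSError` in _cmd_erase
        dir_survives = coco_dir.exists()
    return file_survives, dir_survives
```

`demonstrate_silent_noops()` returns `(True, True)`: both paths survive and no exception reaches the caller, which is how `erase` can end up reporting `success: true`.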
Existing test is a false green
tests/test_java_codebase_rag_cli.py::test_init_after_erase_succeeds passes, but it doesn't catch this: it creates an empty index dir, runs erase (nothing to delete), then init. It never builds a real index first, so it never exercises "erase a real graph → re-init". A build-then-erase-then-re-init regression case is missing.
Suggested fix
Remove each path according to its type. This handles both file and directory LadybugDB layouts: the `.lbug` graph is a file in this repo, but kuzu can also use a directory.
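```python
import shutil
from pathlib import Path


def _rm_any(p: Path) -> None:
    """Remove p whether it is a directory, a regular file, or a symlink."""
    if p.is_dir() and not p.is_symlink():
        shutil.rmtree(p)  # no ignore_errors: a failed delete must not look like success
    else:
        p.unlink(missing_ok=True)  # regular file or symlink; no-op if already absent


# Inside _cmd_erase, where cfg is in scope; an OSError here should surface as success: false.
for path in (cfg.ladybug_path, cfg.cocoindex_db, cfg.index_dir / ".graph_hashes.json"):
    _rm_any(path)
```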
…plus a regression test that builds a real index, runs `erase`, and asserts both that `code_graph.lbug` is gone and that a subsequent `init` succeeds.
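The missing case can also be pinned down at the filesystem level, as a complement to an end-to-end CLI test. This is a sketch under assumptions: `check_erase_removes_artifacts` is a hypothetical helper that takes the removal function as a parameter, and the on-disk shapes (file or directory graph, populated `cocoindex.db/`, `.graph_hashes.json`) are placeholders modeled on the repro, not real LadybugDB or cocoindex data.

```python
import tempfile
from pathlib import Path
from typing import Callable


def check_erase_removes_artifacts(remove: Callable[[Path], None], graph_as_dir: bool = False) -> None:
    """Fail if any index artifact still exists after `remove` runs on it (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        index_dir = Path(tmp)
        graph = index_dir / "code_graph.lbug"
        if graph_as_dir:  # kuzu-style directory layout
            graph.mkdir()
            (graph / "data").write_bytes(b"x")
        else:  # single-file layout, as in this repo
            graph.write_bytes(b"x")
        coco = index_dir / "cocoindex.db"  # directory with placeholder content
        coco.mkdir()
        (coco / "state").write_bytes(b"x")
        hashes = index_dir / ".graph_hashes.json"
        hashes.write_text("{}")

        artifacts = (graph, coco, hashes)
        for path in artifacts:
            remove(path)

        leftovers = [p.name for p in artifacts if p.exists()]
        assert not leftovers, f"erase left behind: {leftovers}"
```

The `_rm_any` from the suggested fix should pass with both `graph_as_dir=False` and `graph_as_dir=True`, while a remover that only calls `shutil.rmtree(p, ignore_errors=True)` fails with `erase left behind: ['code_graph.lbug', '.graph_hashes.json']`.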
Environment
- `.venv`, `master` @ `6307c19` (post PRs perf(graph): bulk COPY FROM for `_write_edges` (PR-P1) #341–345).
- `tests/bank-chat-system` fixture (8.4 MB graph).