
Bug: erase leaves the LadybugDB graph on disk; subsequent init refuses (exit 2) #346

Description

@HumanBean17

Summary

java-codebase-rag erase reports success: true but does not delete the LadybugDB graph (code_graph.lbug). The next init then refuses with exit code 2 and points the user back to erase --yes, which produces an endless loop. The documented "clean slate" workflow (erase --yes, then init) is broken.

Reproduce

IDX=/tmp/erase-bug
java-codebase-rag init  --source-root tests/bank-chat-system --index-dir "$IDX" --quiet
java-codebase-rag erase --source-root tests/bank-chat-system --index-dir "$IDX" --yes
ls "$IDX"                  # code_graph.lbug STILL PRESENT
java-codebase-rag init  --source-root tests/bank-chat-system --index-dir "$IDX" --quiet   # → exit 2

erase prints {"message": "erase completed", "success": true}, yet afterward:

$ ls -la /tmp/erase-bug
code_graph.lbug     8429568 bytes   ← survives
.graph_hashes.json    20475 bytes   ← survives (never targeted by erase)
cocoindex.db/         (dir)         ← survives

Only the *.lance tables were dropped. The follow-up init refuses:

rc=2
{"message": "init refused: index paths already exist. Use `java-codebase-rag reprocess` ... or `java-codebase-rag erase --yes` then `init` for a clean slate.",
 "non_empty_paths": ["/private/tmp/erase-bug/code_graph.lbug"], "success": false}

Expected vs actual

  • Expected: erase removes the LadybugDB graph (its help string: "Runs cocoindex drop, removes LadybugDB, and drops Lance tables"), so init afterward starts clean.
  • Actual: Only the .lance tables are dropped; the .lbug graph (plus .graph_hashes.json and cocoindex.db/) remain, and init exits 2.

Root cause

java_codebase_rag/cli.py::_cmd_erase mixes directory-only and file-only filesystem APIs that silently no-op on the wrong type:

  • L625 shutil.rmtree(cfg.ladybug_path, ignore_errors=True): code_graph.lbug is a single regular file, not a directory. shutil.rmtree fails on a file, and ignore_errors=True swallows the error, so the file is never deleted. (Verified: os.path.isfile(...) == True, and shutil.rmtree(<file>, ignore_errors=True) is a confirmed no-op.)
  • L628 cfg.cocoindex_db.unlink(): cocoindex.db is a directory, and unlink() on a directory raises an OSError (IsADirectoryError on Linux, PermissionError on macOS), which the surrounding except OSError swallows. So cocoindex.db/ survives too.
  • .graph_hashes.json is never targeted by erase at all.
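A minimal standalone reproduction of the two silent no-ops. It uses throwaway paths in a temporary directory, not a real index; the names only mimic the index layout, and the two calls mirror the ones quoted from L625 and L628.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    graph = Path(tmp) / "code_graph.lbug"  # regular file, like the real graph
    graph.write_bytes(b"\0" * 16)
    db = Path(tmp) / "cocoindex.db"        # directory, like the real cocoindex.db/
    db.mkdir()

    # Same call shape as L625: rmtree cannot remove a regular file,
    # and ignore_errors=True discards the resulting error.
    shutil.rmtree(graph, ignore_errors=True)

    # Same call shape as L628: unlink() on a directory raises an OSError
    # subclass, which erase's surrounding `except OSError` would swallow.
    swallowed = None
    try:
        db.unlink()
    except OSError as exc:
        swallowed = type(exc).__name__

    graph_survived = graph.is_file()
    db_survived = db.is_dir()

print(graph_survived, db_survived, swallowed)
```

On Linux this prints `True True IsADirectoryError`; on macOS the last value is typically `PermissionError`. Either way both paths survive and nothing is reported.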

Existing test is a false green

tests/test_java_codebase_rag_cli.py::test_init_after_erase_succeeds passes, but it doesn't catch this: it creates an empty index dir, runs erase (nothing to delete), then init. It never builds a real index first, so it never exercises "erase a real graph → re-init". A build-then-erase-then-re-init regression case is missing.
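To illustrate why the empty-directory test stays green, here is a sketch with a hypothetical `buggy_erase` that is a simplified stand-in for the two erase steps quoted above (it is not the real `_cmd_erase` and skips the Lance tables). Run against an empty index dir it leaves nothing behind; run against a populated layout, all three artifacts survive.

```python
import shutil
import tempfile
from pathlib import Path


def buggy_erase(index_dir: Path) -> None:
    """Hypothetical stand-in mirroring the quoted erase calls."""
    shutil.rmtree(index_dir / "code_graph.lbug", ignore_errors=True)  # file: silently skipped
    try:
        (index_dir / "cocoindex.db").unlink()  # directory (or missing): raises
    except OSError:
        pass  # ...and the error is swallowed


def leftovers(index_dir: Path) -> list[str]:
    return sorted(p.name for p in index_dir.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    # What the existing test does: erase an index dir that was never built.
    empty = Path(tmp) / "empty"
    empty.mkdir()
    buggy_erase(empty)
    empty_left = leftovers(empty)

    # What the missing test should do: erase a dir that looks like a real index.
    built = Path(tmp) / "built"
    built.mkdir()
    (built / "code_graph.lbug").write_bytes(b"\0" * 16)
    (built / ".graph_hashes.json").write_text("{}")
    (built / "cocoindex.db").mkdir()
    buggy_erase(built)
    built_left = leftovers(built)

print(empty_left)  # []
print(built_left)  # ['.graph_hashes.json', 'cocoindex.db', 'code_graph.lbug']
```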

Suggested fix

Remove each path according to its type. This handles both file and directory LadybugDB layouts: the .lbug is a single file in this repo, but kuzu can also use a directory.

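import shutil
from pathlib import Path


def _rm_any(p: Path) -> None:
    """Delete p by type: directory tree, regular file, or symlink."""
    if p.is_dir() and not p.is_symlink():
        # Real directory (kuzu directory layout, cocoindex.db/).
        # No ignore_errors: a failed delete should surface, not yield success: true.
        shutil.rmtree(p)
    else:
        # Regular file (code_graph.lbug, .graph_hashes.json) or symlink;
        # missing_ok=True turns an already-absent path into a no-op (Python 3.8+).
        p.unlink(missing_ok=True)


# In _cmd_erase:
for path in (cfg.ladybug_path, cfg.cocoindex_db, cfg.index_dir / ".graph_hashes.json"):
    _rm_any(path)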

…plus a regression test that builds a real index, erases, asserts code_graph.lbug is gone, and that a subsequent init succeeds.

Environment
