Summary
Follow-up to #346 / PR #348. erase now removes .graph_hashes.json, but not its atomic-write temp file, .graph_hashes.json.tmp. FileHashTracker.save (build_ast_graph.py:494-498) writes .graph_hashes.json.tmp and then calls os.replace to move it onto .graph_hashes.json; a crash between the write and the replace orphans the .tmp file. erase does not remove it, so the orphan persists in the index dir across erase cycles. It is pure cruft with no functional impact (the next save overwrites it), but it defeats the "clean slate" that erase advertises.
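The failure mode can be sketched in isolation. The snippet below is a minimal stand-in, not the actual FileHashTracker.save code: save_hashes, its crash_before_replace flag, and demo_orphan are hypothetical names introduced only for illustration, and the "erase" step is reduced to unlinking the final file name, as the issue describes.

```python
import json
import os
import tempfile
from pathlib import Path


def save_hashes(index_dir: Path, hashes: dict, crash_before_replace: bool = False) -> None:
    """Hypothetical stand-in for an atomic save: write the temp, then os.replace it."""
    final = index_dir / ".graph_hashes.json"
    tmp = index_dir / ".graph_hashes.json.tmp"
    tmp.write_text(json.dumps(hashes))
    if crash_before_replace:
        # Simulates the process dying after the write but before the replace.
        raise RuntimeError("simulated crash between write and replace")
    os.replace(tmp, final)


def demo_orphan() -> list:
    """Return the file names left in the index dir after a crashed save plus erase."""
    with tempfile.TemporaryDirectory() as d:
        index_dir = Path(d)
        save_hashes(index_dir, {"A.java": "abc"})  # normal save, no temp left behind
        try:
            save_hashes(index_dir, {"A.java": "def"}, crash_before_replace=True)
        except RuntimeError:
            pass  # the temp file is now orphaned next to the final file
        # Simplified erase: removes only the final name, as described above.
        (index_dir / ".graph_hashes.json").unlink(missing_ok=True)
        return sorted(p.name for p in index_dir.iterdir())


if __name__ == "__main__":
    print(demo_orphan())
```

After the simulated erase, the only file left in the directory is the orphaned temp.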
Reproduce
IDX=/tmp/erase-tmp
java-codebase-rag init --source-root tests/bank-chat-system --index-dir "$IDX" --quiet
# Simulate a crashed hash-store save leaving the temp behind:
printf '{}' > "$IDX/.graph_hashes.json.tmp"
java-codebase-rag erase --source-root tests/bank-chat-system --index-dir "$IDX" --yes
ls -a "$IDX" | grep graph_hashes # .graph_hashes.json.tmp STILL PRESENT
erase prints success: true, yet afterward:
$ ls -a "$IDX" | grep graph_hashes
.graph_hashes.json.tmp ← survives (orphan)
Expected vs actual
- Expected: erase removes the hash store and its write temp, leaving an empty index dir.
- Actual: .graph_hashes.json.tmp survives erase; describe_path_sizes / the Will delete: preview never list it, so the orphan is invisible.
Root cause
java_codebase_rag/cli.py::_cmd_erase targets index_dir / ".graph_hashes.json" only. FileHashTracker (build_ast_graph.py:475) owns both .graph_hashes.json and the transient .graph_hashes.json.tmp, but erase hardcodes the final name only.
Suggested fix
Simplest: also remove .graph_hashes.json.tmp, or glob .graph_hashes.json*. Cleaner / shared with the crash-marker follow-up: export builder-owned file paths from build_ast_graph.py (or declare them on ResolvedOperatorConfig) so erase clears all builder state from one list and the two sides cannot drift.
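A rough sketch of the shared-list variant, under stated assumptions: HASH_STORE_NAME, BUILDER_STATE_FILES, and remove_builder_state are hypothetical names that do not exist in the codebase, and the real _cmd_erase may structure its deletion and preview differently.

```python
import tempfile
from pathlib import Path

# Hypothetical constants; the idea is that the builder module would export them
# so erase and the hash tracker read the same list.
HASH_STORE_NAME = ".graph_hashes.json"
BUILDER_STATE_FILES = (HASH_STORE_NAME, HASH_STORE_NAME + ".tmp")


def remove_builder_state(index_dir: Path) -> list:
    """Delete every builder-owned file that exists; return the names removed."""
    removed = []
    for name in BUILDER_STATE_FILES:
        path = index_dir / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        idx = Path(d)
        (idx / ".graph_hashes.json").write_text("{}")
        (idx / ".graph_hashes.json.tmp").write_text("{}")
        print(remove_builder_state(idx))
```

An explicit list is used here instead of index_dir.glob(".graph_hashes.json*") because the glob would also match any unrelated file sharing that prefix, such as a hand-made .graph_hashes.json.bak. The same list could also feed the Will delete: preview so the temp file is no longer invisible.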
Environment
.venv, master@c92d9fb + PR #348 (679d22c), "fix(cli): erase removes graph/cocoindex.db/.graph_hashes.json by type (#346)".