fix(config): consistent index_dir/source_root resolution for CLI and MCP #316

Merged
HumanBean17 merged 1 commit into master from bugfix/yml-config-dirs
Jun 14, 2026
@HumanBean17 (Owner)

Problem

A .java-codebase-rag.yml living in a subdirectory of the Java tree (e.g. my-project-context/) resolved its relative paths inconsistently between the CLI (init / increment / reprocess) and the MCP server. No single index_dir value worked for both.

Given the layout:

MyProject/
    my-project-context/.java-codebase-rag.yml   <- cwd
    .java-codebase-rag/                          <- the real index
    microservice-a/ …

| Config | init | MCP server |
| --- | --- | --- |
| source_root: ../ + index_dir: ../.java-codebase-rag | index lands in ~/ (one level too high) | ✅ finds the index |
| source_root: ../ + index_dir: .java-codebase-rag | ✅ finds the index | ❌ "index cannot be found" |

After this PR, both rows resolve identically and correctly: source_root=MyProject, index_dir=MyProject/.java-codebase-rag.

Root cause

Two compounding bugs in config resolution.

1. index_dir and source_root used different bases (java_codebase_rag/config.py)

source_root resolved relative to the config file's directory (config_dir), but index_dir resolved relative to the already-resolved source_root. So a ../ written in index_dir (intended relative to the config file) was re-applied on top of source_root and overshot by one level, landing at ~/.java-codebase-rag. The docs even contradicted themselves: CONFIGURATION.md said source_root is "relative to the config file's parent directory" but index_dir is "relative to source_root".

Fix: a YAML index_dir now resolves against config_dir — the same base as source_root. CLI/env index_dir and the default ./.java-codebase-rag stay source_root-relative (unchanged), so the common case (config at project root) is unaffected.
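
The overshoot can be reproduced with plain path arithmetic. The following is a minimal sketch, not the project's resolver: the `resolve` helper and the `/home/user` location are assumptions made for illustration, and only the `../` form of index_dir from the first table row is shown.

```python
import posixpath


def resolve(base: str, relative: str) -> str:
    """Join a relative path onto a base and collapse any '..' segments.

    Hypothetical helper for illustration; it does not touch the filesystem.
    """
    return posixpath.normpath(posixpath.join(base, relative))


# Assumed location of the layout above: ~/MyProject with ~ = /home/user.
config_dir = "/home/user/MyProject/my-project-context"

# source_root has always resolved against the config file's directory.
source_root = resolve(config_dir, "../")

# Old behaviour: YAML index_dir resolved against the resolved source_root,
# so the '../' is applied a second time.
old_index_dir = resolve(source_root, "../.java-codebase-rag")

# New behaviour: YAML index_dir resolves against config_dir, like source_root.
new_index_dir = resolve(config_dir, "../.java-codebase-rag")

print(source_root)
print(old_index_dir)
print(new_index_dir)
```

Under these assumptions the old base yields a path outside MyProject, while the new base yields the index directory that sits next to the microservices.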

2. The MCP server ignored the YAML source_root field (server.py)

main() called resolve_operator_config(source_root=_project_root()), and _project_root() returns the walk-up-discovered config dir (non-None). A non-None source_root routes into the "explicit override" branch that skips the YAML source_root field. The CLI passes source_root=None, which honors the field — so the same config file produced a different effective root for init vs the MCP server.

Fix: main() now passes _source_root_for_operator_config() (JAVA_CODEBASE_RAG_SOURCE_ROOT-or-None). When the env override is unset, the MCP server runs the same walk-up + YAML-source_root-honoring path as the CLI. JAVA_CODEBASE_RAG_SOURCE_ROOT still wins when set. _project_root() is kept for the _resolve_lancedb_uri() fallback only.
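
The branching described above can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the code in server.py or config.py: `source_root_override` and `effective_source_root` are hypothetical names, treating an empty env value as unset is an assumption, and so is falling back to the config dir when the YAML has no source_root.

```python
import posixpath
from typing import Mapping, Optional

ENV_VAR = "JAVA_CODEBASE_RAG_SOURCE_ROOT"


def source_root_override(env: Mapping[str, str]) -> Optional[str]:
    """Return the env override, or None when it is unset (env-or-None).

    Assumption: an empty or whitespace-only value counts as unset.
    """
    value = env.get(ENV_VAR, "").strip()
    return value or None


def effective_source_root(
    explicit: Optional[str],
    config_dir: str,
    yaml_source_root: Optional[str],
) -> str:
    """Toy model of the override-vs-YAML decision described in the PR."""
    if explicit is not None:
        # "Explicit override" branch: the YAML source_root field is skipped.
        return posixpath.normpath(explicit)
    if yaml_source_root is not None:
        # YAML field honored, resolved against the config file's directory.
        return posixpath.normpath(posixpath.join(config_dir, yaml_source_root))
    # Assumed fallback for this sketch only.
    return posixpath.normpath(config_dir)


config_dir = "/home/user/MyProject/my-project-context"

# Before: the MCP server passed the discovered config dir as an explicit root.
mcp_before = effective_source_root(config_dir, config_dir, "../")

# After: it passes env-or-None, so the YAML field is honored like the CLI.
mcp_after = effective_source_root(source_root_override({}), config_dir, "../")

# The env var still wins when set.
mcp_env = effective_source_root(
    source_root_override({ENV_VAR: "/srv/java"}), config_dir, "../"
)
```

In this model the only difference between the old and new MCP behaviour is the first argument: a discovered directory is indistinguishable from a deliberate override once it is passed as a non-None value.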

Verification

  • TDD: wrote failing tests first, watched them fail for the right reason, then implemented.
  • Reproduced the original divergence against the real resolve_operator_config before and after the fix; init and MCP now agree.
  • tests/test_config.py — 2 new tests (YAML index_dir resolves against config dir, for both ../ and bare forms).
  • tests/test_mcp_server_project_root.py — 3 new tests (_source_root_for_operator_config() env-or-None semantics + init/MCP parity regression).
  • .venv/bin/ruff check . — clean.
  • .venv/bin/python -m pytest tests (no JAVA_CODEBASE_RAG_RUN_HEAVY) — 774 passed, 11 skipped.

User-visible behaviour changes

  • A YAML index_dir written relative to the config file now resolves against the config file's directory (previously: against source_root). This is a breaking change for configs where the config file lives in a subdirectory and a relative index_dir was specified — but the old behaviour was already inconsistent between init and MCP, so any such config was already broken for one of the two. The common case (config at project root, or no explicit index_dir) is unchanged.
  • The MCP server now honors a YAML source_root field it previously ignored (fixes the init/MCP divergence; also fixes the source_root used for microservice/scope detection).
  • Recommended config for a config-in-subdir layout:
    source_root: ../
    index_dir: ../.java-codebase-rag

Scope / impact

  • No ontology bump, no embedding change, no re-index required — this only changes path resolution; existing indexes remain valid at their existing locations.
  • No env-var contract change. mcp.json.example and the README zero-env-var note remain accurate.
  • Skipped a formal propose/active/ doc: this is a bounded 2-file bugfix (plus docs + tests) with the approach pre-approved in the investigation, not a feature/schema change.

🤖 Generated with Claude Code

…urce_root

A config file living in a subdirectory of the Java tree resolved
inconsistently between the CLI and the MCP server, so no single
index_dir value worked for both. Two compounding bugs:

1. _resolve_index_dir_path resolved a YAML `index_dir` relative to the
   already-resolved `source_root`, while `source_root` itself resolved
   relative to the config file's directory. A `../` in index_dir was
   re-applied on top of source_root and overshot by one level (the
   "init indexes ~/" symptom). YAML index_dir now resolves against the
   config file's directory, the same base as source_root. CLI/env
   index_dir and the default stay source_root-relative (unchanged).

2. server.main() passed source_root=_project_root() (the walk-up-
   discovered config dir) to resolve_operator_config, routing into the
   branch that treats it as an explicit override and skips the YAML
   source_root field. The CLI passes source_root=None, which honors the
   field -- so the same config produced a different effective root for
   init vs MCP (the "mcp can't find the index" symptom). main() now
   passes _source_root_for_operator_config() (env-or-None), so the MCP
   server honors YAML source_root exactly like the CLI;
   JAVA_CODEBASE_RAG_SOURCE_ROOT still wins when set.

With both fixes a config in my-context/ next to
  source_root: ../
  index_dir: ../.java-codebase-rag
resolves identically for init and the MCP server.

Docs: CONFIGURATION.md index_dir base comment + tips updated. No
ontology/embedding change; existing indexes remain valid.

Co-Authored-By: Claude <noreply@anthropic.com>
HumanBean17 merged commit 229c544 into master on Jun 14, 2026
1 check failed
HumanBean17 added a commit that referenced this pull request Jun 14, 2026
… index

run_update passed the discovered config dir as an explicit source_root to
resolve_operator_config, routing it into the branch that SKIPS the YAML
source_root field. With a config living in a subdir next to
`source_root: ../`, update then indexed that subdir (no Java) against the
real index one level up, so cocoindex treated every indexed file as removed
and deleted them — the "Updating index (Lance + graph)..." hang, and the
ever-growing Lance `_deletions` + 1000s+ increment after a ctrl+C left
cocoindex.db mid-reconcile.
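
Why a wrong source root turns into mass deletion can be seen in a toy reconcile step. This sketch is an assumption-laden model, not cocoindex's algorithm: `paths_to_delete` is a hypothetical function and the file names are invented for illustration.

```python
from typing import Set


def paths_to_delete(previously_indexed: Set[str], found_now: Set[str]) -> Set[str]:
    """Toy reconcile rule: anything indexed before but not seen now is removed."""
    return previously_indexed - found_now


# Files recorded in the real index (invented names, relative to the Java root).
indexed = {
    "microservice-a/src/main/java/App.java",
    "microservice-a/src/main/java/Service.java",
}

# Scanning the config subdirectory finds no Java sources at all.
found_in_config_dir: Set[str] = set()

# Scanning the YAML source_root finds the same files as before.
found_in_yaml_root = set(indexed)

deleted_with_wrong_root = paths_to_delete(indexed, found_in_config_dir)
deleted_with_yaml_root = paths_to_delete(indexed, found_in_yaml_root)
```

With the config dir as the root every indexed path falls into the deletion set; with the YAML root the set is empty.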

This is the same bug class #316 fixed for the MCP server (its docstring
warns that a non-None source_root skips the YAML field); run_update was the
last production caller still passing a discovered dir. Pass source_root=None
so the YAML source_root is honored exactly like increment/init/reprocess.
run_install is unaffected (it passes the user-confirmed Java root).

Adds a regression test mirroring the reported layout (config in
my-project-context/, source_root: ../, real index one level up) that
captures the env handed to cocoindex and asserts SOURCE_ROOT resolves to
the YAML root, not the config dir.

No schema, ontology, embedding, or env-var change. Existing indexes remain
valid; no reindex required.

Co-Authored-By: Claude <noreply@anthropic.com>
HumanBean17 added a commit that referenced this pull request Jun 14, 2026
… index (#320)
