LLM‑Assisted Code Index Search (Query Rewriting + Reranking) with Shared Sub‑LLM Layer #7820
Replies: 3 comments
-
I'm exploring a potential implementation for the LLM-assisted code index search feature. I'll work on creating the shared sub-LLM layer and integrating query rewriting and reranking capabilities as outlined in the requirements. Will open a PR to share progress.
-
I've made initial progress on implementing the LLM-assisted code index search feature. The core sub-LLM layer is in place with the following components completed:
Questions before proceeding:
The branch is pushed to feature/llm-assisted-search with the initial implementation. I'd appreciate your guidance on these integration points before I complete the implementation, so it aligns with your architectural preferences.
-
Hey @Michaelzag, thanks for the detailed proposal. Improving result relevance is definitely valuable, but I don't think introducing an extra LLM layer for reranking is the right fit here. The purpose of code search is to provide semantically similar candidates to the query. The main model, which already has the full conversation and task context, is in a better position to judge whether those candidates are relevant. Adding a second model that only sees partial context risks introducing noise and extra complexity without clear benefit. In our view, the search index should stay focused on fast, broad discovery, while the main model handles interpretation and filtering. I'm open to discussion; feel free to ping me on Discord if you have any thoughts on this!
-
What specific problem does this solve?
Users struggle to find relevant code with plain vector search when queries are vague, natural‑language, or use synonyms that don’t match raw code text. This leads to trial‑and‑error search phrasing, extra time scanning irrelevant results, and “I know it’s here but can’t find it” frustration.
Additional context (optional)
This complements, not replaces, Issue #6223 (vector DB alternatives). This proposal is provider‑agnostic and focuses on retrieval quality rather than storage engines.
The LLM utility introduced here is deliberately generic so it can later be reused by Conversation/Chat Memory (#7537) for extraction, episode titles, and memory search reranking.
Roo Code Task Links (Optional)
No response
Request checklist
Interested in implementing this?
Implementation requirements
How should this be solved? (REQUIRED if contributing, optional otherwise)
Introduce a shared “sub‑LLM” layer and apply two high‑ROI improvements to Code Index search in a single PR. All enhancements are opt‑in and disabled by default (no LLM calls, no added cost unless explicitly enabled). Keep behavior gated via settings and fall back gracefully on error/timeouts.
Shared sub‑LLM component (new)
- New `src/services/llm-utils/` module exposing `generateJson` / `generateText` with backoff and budgets (see the sketch after this list)
- Strict JSON scoring output, e.g. `{ score: 0..1, reason? }`
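A minimal sketch of what this shared layer's surface could look like, assuming a thin wrapper around the existing provider handling. Only `generateJson`, `generateText`, and the `{ score: 0..1, reason? }` shape come from the proposal; the interface names, budget fields, and option bag are illustrative assumptions:

```ts
// Hypothetical surface for the shared sub-LLM layer (src/services/llm-utils/).
// Only generateJson/generateText and the { score, reason? } shape come from the
// proposal; the remaining names and fields are illustrative assumptions.

export interface LlmBudget {
  maxTokensPerOp: number // hard cap on tokens per call
  dailyCostCapUSD: number // stop issuing calls once the daily spend is reached
}

export interface RerankScore {
  score: number // 0..1, higher means more relevant
  reason?: string // optional short justification
}

export interface SubLlmClient {
  /** Free-form completion, e.g. query rewriting or summaries. */
  generateText(prompt: string, opts?: { timeoutMs?: number }): Promise<string>

  /** Strict-JSON completion: retries with backoff until the output parses and
   *  passes the validator, or the timeout/budget is exhausted. */
  generateJson<T>(
    prompt: string,
    validate: (value: unknown) => value is T,
    opts?: { timeoutMs?: number },
  ): Promise<T>
}
```

Keeping the surface to these two calls is what would let the same layer back the Code Index features now and Chat Memory later without adding API.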
Code Index integration
- Query rewriting and reranking in `src/services/code-index/search-service.ts` (blend: `final = 0.7*rerank + 0.3*embedding`); keep timeouts and fallbacks (see the sketch after this list)
- Augment `src/services/code-index/processors/scanner.ts` and `CacheManager` with an "aux‑hash" to recompute summaries only when code changes
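As a hedged sketch of the search-service hook, reusing the `SubLlmClient` and `RerankScore` types from the sketch above: the 0.7/0.3 blend and the fall-back-to-embedding-order behavior follow the bullets, while `SearchResult`, `rerankResults`, and the prompt wording are hypothetical:

```ts
// Hypothetical reranking hook for search-service.ts, reusing SubLlmClient and
// RerankScore from the sketch above. Prompt wording and helper names are illustrative.
interface SearchResult {
  filePath: string
  snippet: string
  embeddingScore: number // normalized 0..1 similarity from the existing vector search
}

async function rerankResults(
  query: string,
  results: SearchResult[],
  subLlm: SubLlmClient,
  maxKForRerank = 20,
): Promise<SearchResult[]> {
  const topK = results.slice(0, maxKForRerank)
  try {
    const scored = await Promise.all(
      topK.map(async (r) => {
        const { score } = await subLlm.generateJson<RerankScore>(
          `Rate 0..1 how relevant this snippet is to "${query}":\n${r.snippet}`,
          (v): v is RerankScore => typeof (v as RerankScore)?.score === "number",
          { timeoutMs: 3000 },
        )
        // Blend the LLM judgment with the original embedding similarity.
        return { result: r, final: 0.7 * score + 0.3 * r.embeddingScore }
      }),
    )
    return scored
      .sort((a, b) => b.final - a.final)
      .map((s) => s.result)
      .concat(results.slice(maxKForRerank)) // results beyond top-K keep their order
  } catch {
    // On error or timeout, fall back to the plain embedding ordering.
    return results
  }
}
```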
Settings (opt‑in; safe defaults)
- `roo-cline.subLlm.enabled` (default false; see the gating sketch after this list)
- `roo-cline.subLlm.model.mode`: `mirror | custom`
- `roo-cline.subLlm.model.provider/modelId`: used when `custom`
- `roo-cline.codeIndex.llm.rewriter` (default false)
- `roo-cline.codeIndex.llm.reranker` (default false)
- `roo-cline.codeIndex.llm.summaries` (default false; reindex required)
- `roo-cline.codeIndex.llm.tags` (default false; reindex required)
- Budget controls: `maxKForRerank`, `maxTokensPerOp`, `dailyCostCapUSD`
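A sketch of the opt-in gating, assuming settings are read through the standard `vscode.workspace.getConfiguration` API. The keys mirror the list above, except the full key for `maxKForRerank`, whose placement is an assumption; the helper name is hypothetical:

```ts
import * as vscode from "vscode"

// Hypothetical gate: returns undefined when the sub-LLM features are disabled,
// so the search and scanner paths can short-circuit with zero LLM usage.
function getCodeIndexLlmOptions():
  | { rewriter: boolean; reranker: boolean; maxKForRerank: number }
  | undefined {
  const cfg = vscode.workspace.getConfiguration("roo-cline")
  if (!cfg.get<boolean>("subLlm.enabled", false)) return undefined
  return {
    rewriter: cfg.get<boolean>("codeIndex.llm.rewriter", false),
    reranker: cfg.get<boolean>("codeIndex.llm.reranker", false),
    // Assumed key placement for the budget setting listed above.
    maxKForRerank: cfg.get<number>("codeIndex.llm.maxKForRerank", 20),
  }
}
```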
Reuse for Chat Memory (later; no extra surface)
- `LlmClient` + `JsonRunner` (strict JSON)
- `Summarizer`
- `Reranker` for top‑K facts/episodes (see the reuse sketch after this list)
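To make the reuse concrete, a hypothetical call that points the same client at memory episodes instead of code chunks, again reusing `SubLlmClient`/`RerankScore` from the first sketch; `MemoryEpisode` and `rerankEpisodes` are illustrative names only:

```ts
// Hypothetical reuse for Chat Memory (#7537): the same client reranks episodes
// instead of code chunks. MemoryEpisode and rerankEpisodes are illustrative names.
interface MemoryEpisode {
  title: string
  summary: string
}

async function rerankEpisodes(
  query: string,
  episodes: MemoryEpisode[],
  subLlm: SubLlmClient,
): Promise<MemoryEpisode[]> {
  const scored = await Promise.all(
    episodes.map(async (e) => ({
      episode: e,
      rank: await subLlm.generateJson<RerankScore>(
        `Rate 0..1 how relevant this memory is to "${query}":\n${e.title}\n${e.summary}`,
        (v): v is RerankScore => typeof (v as RerankScore)?.score === "number",
      ),
    })),
  )
  return scored.sort((a, b) => b.rank.score - a.rank.score).map((s) => s.episode)
}
```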
How will we know it works? (Acceptance Criteria - REQUIRED if contributing, optional otherwise)
Technical considerations (REQUIRED if contributing, optional otherwise)
Keep new logic isolated in `src/services/llm-utils/` with unit tests; add small hooks in `search-service.ts`; augment in `scanner.ts` (gated by setting). When disabled, these paths short‑circuit and incur zero LLM usage.
Trade‑offs and risks (REQUIRED if contributing, optional otherwise)
Related issues