[BUG] Codebase indexing and embedding is extremely slow and unstable on low-end hardware (local and cloud models, including OpenAI-compatible local endpoints) #8875

@DScoNOIZ

Description

Problem (one or two sentences)

On low-end hardware (e.g., a 4-core i5, 16GB RAM, GTX 1050), Roo-Code's codebase indexing and embedding process is extremely slow, unstable, and often fails to complete. The application (including the VS Code extension UI) becomes unresponsive during indexing, especially with large codebases or when using local embedding models (Ollama, KoboldCPP, OpenAI-compatible local APIs, etc.). Indexing gets stuck, restarts, or never finishes, making the tool almost unusable in such environments.

Context (who is affected and when)

All users with modest or older hardware are affected, especially when indexing large repositories or using local embedding providers. The issue is most severe for those who rely on open-source, privacy-friendly, or offline models (e.g., Ollama, KoboldCPP, or any LLM behind an OpenAI-compatible API), but even cloud and hosted OpenAI-compatible models are slow and unstable. This impacts both individual developers and small teams who cannot afford top-end hardware or cloud GPU subscriptions.

Reproduction steps

  1. Environment: Windows 10/11, Intel i5 (4-core), 8GB RAM, GTX 1050, local disk, latest version of the Roo-Code extension.
  2. Add a large (or even medium-sized) repository to the workspace.
  3. Configure Roo-Code to use a local embedding model (Ollama, KoboldCPP, an OpenAI-compatible local endpoint) or a cloud embedding provider (OpenAI-compatible, Gemini, etc.).
  4. Start codebase indexing and observe:
    • UI and VS Code extension freeze or become unresponsive.
    • Indexing process gets stuck, restarts, or fails to complete.
    • System resources (CPU, RAM, disk) are maxed out.
    • Even after hours, indexing may not finish, or it indexes only part of the codebase.
    • The same is observed for local OpenAI-compatible endpoints running local models.

Expected result

The indexing process should adapt to system resources, throttle itself, and remain stable and responsive even on weaker hardware. The UI should stay usable, and indexing should complete (even if slowly) without frequent failures or freezes. There should be clear progress feedback and actionable error messages when problems occur.

Actual result

Indexing is extremely slow, often freezes, causes the VS Code UI to hang, and may never finish. Local models (Ollama, KoboldCPP, OpenAI-compatible endpoints running local models) are especially problematic, but even cloud models can stall. Large repos are nearly impossible to index. No clear progress or helpful error messages are shown, and the indexing process frequently resets or aborts. The same symptoms (constant errors, resets, freezes, and incomplete indexing) occur with local embedding models served via OpenAI-compatible API endpoints.

Variations tried (optional)

Tried different local and cloud models (Ollama, KoboldCPP, OpenAI, Gemini, etc.); adjusted model dimension and size; tested on Linux and Windows; tried with smaller repos (slightly better but still problematic); disabled other extensions; no improvement. The problem is equally bad for OpenAI-compatible local endpoints.

App Version

v3.3.1 (latest at time of reporting)

API Provider (optional)

Local API (KoboldCPP), local LLMs via OpenAI-compatible endpoints, etc.

Model Used (optional)

Various (nomic-embed-text, text-embedding-3-small, KoboldCPP, local LLMs via OpenAI-compatible endpoints, etc.)

Roo Code Task Links (optional)

No response

Relevant logs or errors (optional)

Key technical findings and suggestions (deep dive):

- Codebase indexing uses hardcoded batching and concurrency parameters (`BATCH_SEGMENT_THRESHOLD`, `BATCH_PROCESSING_CONCURRENCY`, etc.) that do not adapt to system capabilities. On weak systems, too many parallel tasks or large batches lead to RAM/CPU exhaustion and freezing.
- There is no auto-throttling or "safe mode": if system resources are exhausted, the tool does not reduce batch size or parallelism, and the process can stall or crash, especially with local models (including OpenAI-compatible endpoints running local models). A resource-aware budget along the lines of the first sketch after this list would address this.
- The indexing queue, batching, and embedding logic all run in the main extension process, which can block the VS Code UI when overloaded.
- There are no visible configuration or UI settings to reduce batch size or concurrency, or to enable a "low resource" mode; these should be user-adjustable (see the second sketch below).
- Error handling and progress feedback are insufficient: when things break, the user sees only freezes or generic errors, not actionable advice.
- Even with cloud models, large codebases are almost impossible to index on modest hardware.
- Recommendation: expose concurrency/batch parameters in the UI; add auto-throttling based on resource usage; provide progress, resource stats, and smart error messages; and consider a mode for gradual, background indexing that never freezes the UI.
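
For illustration, a minimal sketch of what resource-aware throttling could look like. The constant names `BATCH_SEGMENT_THRESHOLD` and `BATCH_PROCESSING_CONCURRENCY` are taken from the findings above, but their values here are placeholders, and `computeIndexingBudget` is a hypothetical helper, not existing Roo-Code code:

```typescript
import * as os from "os";

// Placeholder values; the real defaults live in Roo-Code's indexing code.
const BATCH_SEGMENT_THRESHOLD = 60;
const BATCH_PROCESSING_CONCURRENCY = 10;

interface IndexingBudget {
  batchSize: number;
  concurrency: number;
}

// Hypothetical helper: derive batch size and concurrency from what the
// machine can actually spare instead of using fixed constants.
function computeIndexingBudget(): IndexingBudget {
  const cores = os.cpus().length;
  const freeMemRatio = os.freemem() / os.totalmem();

  // Leave at least one core free so the extension host stays responsive.
  let concurrency = Math.max(1, Math.min(BATCH_PROCESSING_CONCURRENCY, cores - 1));
  let batchSize = BATCH_SEGMENT_THRESHOLD;

  // Back off aggressively under memory pressure ("safe mode").
  if (freeMemRatio < 0.15) {
    concurrency = 1;
    batchSize = Math.max(1, Math.floor(batchSize / 4));
  } else if (freeMemRatio < 0.3) {
    concurrency = Math.max(1, Math.floor(concurrency / 2));
    batchSize = Math.floor(batchSize / 2);
  }

  return { batchSize, concurrency };
}
```

Recomputing this budget between batches (rather than once at startup) would let indexing slow down gracefully when other workloads compete for the machine.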

Code references and details available on request.
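
In the same spirit, the second sketch referenced above shows how the batch-size knob could be read from user settings and how the indexing loop could yield between batches so the extension host stays responsive. The configuration section name, setting keys, and helper signatures here are hypothetical, not existing Roo-Code APIs:

```typescript
import * as vscode from "vscode";

// Hypothetical settings; Roo-Code does not currently expose these.
function readIndexingSettings() {
  const cfg = vscode.workspace.getConfiguration("rooCode.codebaseIndex");
  return {
    batchSize: cfg.get<number>("batchSize", 60),
    lowResourceMode: cfg.get<boolean>("lowResourceMode", false),
  };
}

// Sketch of gradual, background indexing: embed one small batch, report
// progress, then yield to the event loop before continuing.
async function indexInBackground(
  segments: string[],
  embedBatch: (batch: string[]) => Promise<void>,
  progress: vscode.Progress<{ message: string }>,
  token: vscode.CancellationToken
): Promise<void> {
  const { batchSize, lowResourceMode } = readIndexingSettings();
  const effective = lowResourceMode ? Math.max(1, Math.floor(batchSize / 4)) : batchSize;

  for (let i = 0; i < segments.length && !token.isCancellationRequested; i += effective) {
    await embedBatch(segments.slice(i, i + effective));
    const done = Math.min(i + effective, segments.length);
    progress.report({ message: `Indexed ${done}/${segments.length} segments` });
    // Give the extension host a chance to handle other work between batches.
    await new Promise((resolve) => setImmediate(resolve));
  }
}
```

A loop like this could be driven from `vscode.window.withProgress`, which would also give the user the visible, cancellable progress reporting requested above.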

Metadata

Assignees

No one assigned

Labels

Issue/PR - Triage: New issue. Needs quick review to confirm validity and assign labels.
bug: Something isn't working

Type

No type

Projects

Status: Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
