@NaccOll NaccOll commented Aug 14, 2025

Related GitHub Issue

Closes #5682

Related:
#6262
#6223

Description

Support LanceDB store for code index.

Test Procedure

Env

OS: Windows 11
embedder provider: Gemini
Model: text-embedding-004(768 dimensions)
Codebase: RooCode(~72k blocks) and vscode-copilot-chat(~106k blocks)

After enabling code indexing for both RooCode and vscode-copilot-chat, close VSCode completely. Then reopen it, ensuring only the RooCode instance remains active.

Task

Use codebase_search to find the DeepSeek provider in the project.

Qdrant(Docker)

Disk Size

Total: 1.7GB
RooCode: 864.9MB
Copilot: 914MB

Memory

Memory: 1.7GB

Docker itself uses about 1GB of memory at startup. It then launches Qdrant, which loads all indexes into memory (because of a permissions issue between Windows and Docker, on_disk does not work; see #6262). At that point Docker's memory usage reaches 2.68GB. Memory usage does not grow further while a search is executing.

Search Performance

No standardized performance testing has been conducted yet. When the task is run in VSCode, the search typically completes within 500ms.

LanceDB

Disk Size

Total: 929MB
RooCode: 355MB
Copilot: 574MB
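
As a rough sanity check on the disk figures, the sketch below estimates the size of the raw vectors alone. It assumes each embedding is stored as 768 float32 values (4 bytes each), which the PR does not state, and it uses the approximate block counts from the Env section. The function name is hypothetical.

```typescript
// Assumption: 768-dimensional embeddings stored as float32 (4 bytes per value).
const DIMENSIONS = 768;
const BYTES_PER_FLOAT32 = 4;

/** Raw vector size in MB (10^6 bytes), excluding payload text and index overhead. */
function rawVectorMegabytes(blockCount: number, dimensions: number = DIMENSIONS): number {
  return (blockCount * dimensions * BYTES_PER_FLOAT32) / 1_000_000;
}

const rooCodeMb = rawVectorMegabytes(72_000); // ~72k blocks
const copilotMb = rawVectorMegabytes(106_000); // ~106k blocks

console.log(rooCodeMb.toFixed(1), copilotMb.toFixed(1)); // prints "221.2 325.6"
```

Under that assumption the raw vectors account for roughly 221MB and 326MB of the reported 355MB and 574MB; the remainder is presumably stored payload and index data, though the PR does not break this down.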

Memory

Memory: 200MB

Since everything resides on disk, the index uses no additional memory when no search is running. During a search, data is read from the database, which results in approximately 200MB of memory usage. Memory consumption should not grow linearly with the size of the codebase, because LanceDB does not load all files into memory for calculation and filtering.

Search Performance

LanceDB is a file-system-based vector database that still offers decent query performance. I haven't done rigorous benchmarking, but the same task typically takes around 800ms to complete.

Build Time

10 minutes

I just tested on a machine with an N100 processor (4C 4T 1.8GHz) and only 8GB of RAM. Using Gemini's text-embedding-004 for testing and storing data on a mechanical hard drive (while VSCode itself runs on an SSD), it took just 10 minutes to complete the indexing of Roo Code (72k blocks). During the process, CPU usage was around 50%, and memory consumption stayed at about 1GB.

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes (if applicable).
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

Qdrant
[Screenshot: Qdrant]

LanceDB
[Screenshot: LanceDB]

Both implementations produce identical search results.

Documentation Updates

Yes, documentation updates are required.

Additional Notes

Advantages

  • Ready to use out of the box (based on npm)
  • Smaller disk usage
  • Lower memory usage
  • Inactive project indexes won't occupy your memory at all

Disadvantages

  • Users must have npm installed so that the LanceDB-related dependencies can be downloaded dynamically

Important

This PR adds support for LanceDB as a local vector store for code indexing, updating configurations, UI, and localization to accommodate the new option.

  • Behavior:
    • Adds support for LanceDB as a local vector store in local-vector-store.ts.
    • Updates config-manager.ts to include vectorStoreProvider and localVectorStoreDirectory options.
    • Modifies ClineProvider.ts and webviewMessageHandler.ts to handle new vector store settings.
  • UI and Configuration:
    • Updates CodeIndexPopover.tsx to include UI elements for selecting vector store provider and directory.
    • Adds localization strings for new settings in multiple language files, including embeddings.json and settings.json.
  • Testing and Dependencies:
    • Adds tests for LocalVectorStore in local-vector-store.spec.ts.
    • Introduces @lancedb/lancedb as a new dependency in package.json.
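
To illustrate how the two new options might fit together, here is a minimal sketch. Only the option names vectorStoreProvider and localVectorStoreDirectory come from the summary above; the provider values ("qdrant" and "local"), the default behaviour, and the function name are assumptions made for illustration, not the PR's actual config-manager.ts logic.

```typescript
// Assumed provider values; the PR summary does not list the accepted strings.
type VectorStoreProvider = "qdrant" | "local";

interface VectorStoreSettings {
  vectorStoreProvider: VectorStoreProvider;
  localVectorStoreDirectory?: string;
}

/**
 * Hypothetical helper: normalises raw settings, falling back to Qdrant when the
 * provider is unset and to a default directory when the local path is blank.
 */
function resolveVectorStoreSettings(
  raw: Partial<VectorStoreSettings>,
  defaultDirectory: string,
): VectorStoreSettings {
  if (raw.vectorStoreProvider !== "local") {
    return { vectorStoreProvider: "qdrant" };
  }
  const directory = raw.localVectorStoreDirectory?.trim();
  return {
    vectorStoreProvider: "local",
    localVectorStoreDirectory: directory ? directory : defaultDirectory,
  };
}

const resolved = resolveVectorStoreSettings(
  { vectorStoreProvider: "local", localVectorStoreDirectory: "  " },
  "/tmp/roo-index",
);
console.log(resolved.localVectorStoreDirectory); // prints "/tmp/roo-index"
```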

This description was created by Ellipsis for 2d2cafc1f89c3aa169c9079c2ca65f13308e8dc4.

@NaccOll NaccOll requested review from cte, jr and mrubens as code owners August 14, 2025 03:09
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. documentation Improvements or additions to documentation enhancement New feature or request labels Aug 14, 2025

Typographical inconsistency: This message uses a half-width colon (:) after "失敗しました" whereas other messages in the file use a full-width colon (:). For consistency, consider replacing ':' with ':'.

Suggested change
"localStoreInitFailed": "ローカルベクトルストアの初期化に失敗しました: {{errorMessage}}",
"localStoreInitFailed": "ローカルベクトルストアの初期化に失敗しました:{{errorMessage}}",

This comment was generated because it violated a code review rule: irule_C0ez7Rji6ANcGkkX.


Typo/lexicographical suggestion: In German, compound nouns are typically formed without spaces. Consider changing "Lokaler Vektorspeicher Pfad" to "Lokaler Vektorspeicherpfad".

Suggested change
"localVectorStoreDirectoryLabel": "Lokaler Vektorspeicher Pfad",
"localVectorStoreDirectoryLabel": "Lokaler Vektorspeicherpfad",

@roomote roomote bot left a comment


Thank you for your contribution! I've reviewed the LanceDB implementation for local vector storage. The implementation shows good performance characteristics (10 minutes for 72k blocks) and comprehensive platform support.

Review Findings

Critical Issues (Must Fix):

  1. Security concern with dynamic require() (src/services/code-index/vector-store/local-vector-store.ts:59)

    • Using require("@lancedb/lancedb") dynamically could pose security risks
    • Consider using static imports or implementing additional validation
  2. Missing error handling for npm dependency (src/services/lancedb-manager.ts:159)

    • The code assumes npm is available but doesn't gracefully handle cases where npm is not installed
    • This could break the extension for users without npm
    • Should check for npm availability before attempting to use it
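
One way the npm availability check from critical issue 2 could look is sketched below. This is an illustration, not the code in lancedb-manager.ts: the function and type names are hypothetical, and the probe is injectable so the logic can be exercised without npm present. It runs the command through a shell because npm is typically a .cmd shim on Windows.

```typescript
import { spawnSync } from "node:child_process";

// Minimal shape of the result we care about; spawnSync's return value satisfies it.
type VersionProbe = (command: string) => { status: number | null; error?: Error | undefined };

// Default probe: run "<command> --version" through a shell and discard the output.
const defaultProbe: VersionProbe = (command) =>
  spawnSync(command, ["--version"], { shell: true, stdio: "ignore", timeout: 10_000 });

/** Hypothetical helper: true only if "npm --version" ran and exited with code 0. */
function isNpmAvailable(probe: VersionProbe = defaultProbe): boolean {
  try {
    const result = probe("npm");
    return result.error === undefined && result.status === 0;
  } catch {
    return false;
  }
}

if (!isNpmAvailable()) {
  console.warn("npm was not found; the LanceDB dependencies cannot be installed.");
}
```

A caller could use the result to show an actionable message and keep the existing vector store active instead of failing during installation.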

Important Suggestions (Should Consider):

  1. Memory leak potential (src/services/code-index/vector-store/local-vector-store.ts)

    • The db and table properties are cached but not always properly cleaned up
    • closeConnect() should be called more consistently in error paths
  2. Inconsistent file system operations

    • Mix of sync (fs) and async (fs/promises) operations throughout
    • Should standardize on async operations for better performance
  3. Missing documentation updates

    • PR mentions "documentation updates are required" but no docs are included
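
The cleanup concern in suggestion 1 can be addressed with a try/finally wrapper, sketched below. Only the method name closeConnect() comes from the review; its signature, the interface, and the withStore helper are assumptions for illustration.

```typescript
// Assumed shape: the review names closeConnect() but not its signature.
interface ClosableStore {
  closeConnect(): Promise<void>;
}

/**
 * Hypothetical helper: opens a store, runs the work, and always closes the
 * store afterwards, including when the work throws.
 */
async function withStore<S extends ClosableStore, T>(
  open: () => Promise<S>,
  work: (store: S) => Promise<T>,
): Promise<T> {
  const store = await open();
  try {
    return await work(store);
  } finally {
    await store.closeConnect();
  }
}
```

Routing every operation through a wrapper like this keeps error paths from leaving the cached db and table handles open, at the cost of reopening the connection for each call unless the wrapper is applied at a coarser level.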

Minor Improvements (Nice to Have):

  1. Test coverage gaps

    • Tests heavily mock LanceDB - consider adding integration tests
  2. Platform detection robustness (src/services/lancedb-manager.ts:82)

    • Doesn't handle edge cases like FreeBSD, other Unix variants
  3. Configuration validation

    • localVectorStoreDirectory should validate path is writable before use
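
For the configuration validation point, a minimal sketch of a writability check using Node's async file system API follows. The function name is hypothetical, and the check is advisory only: the directory can change between the check and the first write, and on Windows fs.access does not consult ACLs, so the write itself still needs error handling.

```typescript
import { constants } from "node:fs";
import { access, stat } from "node:fs/promises";
import { tmpdir } from "node:os";

/** Hypothetical helper: true if the path exists, is a directory, and appears writable. */
async function isWritableDirectory(dir: string): Promise<boolean> {
  try {
    const info = await stat(dir);
    if (!info.isDirectory()) {
      return false;
    }
    await access(dir, constants.W_OK);
    return true;
  } catch {
    // Missing path, permission error, or any other failure counts as "not usable".
    return false;
  }
}

// Usage: validate a candidate localVectorStoreDirectory before saving the setting.
isWritableDirectory(tmpdir()).then((ok) => {
  console.log(ok ? "directory is usable" : "directory is not writable");
});
```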

Positive Aspects

  • Excellent performance metrics provided
  • Comprehensive platform support (Windows, macOS, Linux with different architectures)
  • Proper cleanup and optimization methods included
  • Well-structured code following existing patterns
  • Good use of safeWriteJson for atomic writes

Overall, this is a valuable addition that addresses the community's request for local vector storage. With the critical issues addressed, this will be a great enhancement to the codebase.

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Aug 14, 2025
@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Aug 16, 2025
@hannesrudolph hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Aug 16, 2025
- Added LanceDBManager class to handle installation and verification of LanceDB binaries.
- Introduced methods for checking current platform, installing dependencies, and cleaning up.
- Updated WebviewMessage interface to include new vector store provider options.
- Enhanced storage utility functions to support synchronous operations for custom storage paths.
- Updated CodeIndexPopover component to handle new settings for vector store provider and local directory.
- Added internationalization support for new settings in multiple languages.
@NaccOll NaccOll force-pushed the local-lance-vector-store branch from 2d2cafc to 60f5146 Compare August 19, 2025 11:23
@daniel-lxs daniel-lxs marked this pull request as draft August 21, 2025 18:54
@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Draft / In Progress] in Roo Code Roadmap Aug 21, 2025
@daniel-lxs (Member) commented

I'll move this to draft while I test it locally for now.

@hannesrudolph (Collaborator) commented

I tested this locally and saw significant CPU spikes. We don’t want to include this in Roo since it adds unnecessary complexity that’s better handled by existing services or products.

@github-project-automation github-project-automation bot moved this from PR [Draft / In Progress] to Done in Roo Code Roadmap Aug 21, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 21, 2025

Labels

documentation Improvements or additions to documentation enhancement New feature or request PR - Draft / In Progress size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Local Embedding and Local Vector Store for Indexing

3 participants