LanceDB For Code Index Vector Store #7076
src/i18n/locales/ja/embeddings.json
Outdated
Typographical inconsistency: This message uses a half-width colon (:) after "失敗しました" whereas other messages in the file use a full-width colon (:). For consistency, consider replacing ':' with ':'.
- "localStoreInitFailed": "ローカルベクトルストアの初期化に失敗しました: {{errorMessage}}",
+ "localStoreInitFailed": "ローカルベクトルストアの初期化に失敗しました:{{errorMessage}}",
This comment was generated because it violated a code review rule: irule_C0ez7Rji6ANcGkkX.
Typo/lexicographical suggestion: In German, compound nouns are typically formed without spaces. Consider changing "Lokaler Vektorspeicher Pfad" to "Lokaler Vektorspeicherpfad".
- "localVectorStoreDirectoryLabel": "Lokaler Vektorspeicher Pfad",
+ "localVectorStoreDirectoryLabel": "Lokaler Vektorspeicherpfad",
Thank you for your contribution! I've reviewed the LanceDB implementation for local vector storage. The implementation shows good performance characteristics (10 minutes for 72k blocks) and comprehensive platform support.
Review Findings
Critical Issues (Must Fix):
- Security concern with dynamic require() (src/services/code-index/vector-store/local-vector-store.ts:59)
  - Using require("@lancedb/lancedb") dynamically could pose security risks
  - Consider using static imports or implementing additional validation
- Missing error handling for npm dependency (src/services/lancedb-manager.ts:159)
  - The code assumes npm is available but doesn't gracefully handle cases where npm is not installed
  - This could break the extension for users without npm
  - Should check for npm availability before attempting to use it
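One way to act on the npm availability point is to probe for the executable before invoking it. The sketch below is only an illustration: the helper name isCommandAvailable and the use of a `--version` probe are assumptions, not code from this PR.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: returns true when `command --version` can be spawned
// and exits with status 0. Pass only fixed, trusted command names, because
// the Windows branch runs the command through a shell.
function isCommandAvailable(command: string): boolean {
  const result = spawnSync(command, ["--version"], {
    stdio: "ignore",
    // On Windows, npm is a .cmd shim that has to be launched through a shell.
    shell: process.platform === "win32",
    timeout: 10_000,
  });
  // `error` is set when the process could not be spawned (e.g. ENOENT) or timed out.
  return result.error === undefined && result.status === 0;
}

if (!isCommandAvailable("npm")) {
  console.warn("npm was not found; skipping the LanceDB installation step.");
}
```

With a check like this, the installer can report a clear message instead of failing partway through.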
Important Suggestions (Should Consider):
- Memory leak potential (src/services/code-index/vector-store/local-vector-store.ts)
  - The db and table properties are cached but not always properly cleaned up
  - closeConnect() should be called more consistently in error paths
- Inconsistent file system operations
  - Mix of sync (fs) and async (fs/promises) operations throughout
  - Should standardize on async operations for better performance
- Missing documentation updates
  - PR mentions "documentation updates are required" but no docs are included
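The cleanup concern above is commonly handled with a try/finally wrapper, so the handle is released on both success and failure. This is a minimal sketch under stated assumptions: the Closable interface and withResource helper are hypothetical and do not mirror the PR's db, table, or closeConnect() members.

```typescript
// Hypothetical minimal shape of a resource that must be released after use.
interface Closable {
  close(): void | Promise<void>;
}

// Opens a resource, runs `use` with it, and always closes it afterwards.
async function withResource<R extends Closable, T>(
  open: () => Promise<R>,
  use: (resource: R) => Promise<T>,
): Promise<T> {
  const resource = await open();
  try {
    return await use(resource);
  } finally {
    // Runs on success and on error, so the handle is never left open.
    await resource.close();
  }
}
```

Routing every operation through one wrapper of this kind keeps the error paths from having to remember the cleanup call individually.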
Minor Improvements (Nice to Have):
- Test coverage gaps
  - Tests heavily mock LanceDB - consider adding integration tests
- Platform detection robustness (src/services/lancedb-manager.ts:82)
  - Doesn't handle edge cases like FreeBSD, other Unix variants
- Configuration validation
  - localVectorStoreDirectory should validate path is writable before use
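For the configuration validation item, a writability check could look roughly like the following. The helper name isWritableDirectory and the example directory name are assumptions for illustration; note that the helper creates the directory if it is missing, and that fs.access does not inspect Windows ACLs, so a real write attempt is a stricter test.

```typescript
import { constants } from "node:fs";
import { access, mkdir } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: resolves to true when `dir` exists (or can be created)
// and the current process is reported as allowed to write to it.
async function isWritableDirectory(dir: string): Promise<boolean> {
  try {
    await mkdir(dir, { recursive: true }); // no-op if the directory already exists
    await access(dir, constants.W_OK);
    return true;
  } catch {
    return false;
  }
}

// Example: probe a scratch directory under the OS temp folder.
isWritableDirectory(join(tmpdir(), "vector-store-example")).then((ok) => {
  console.log("writable:", ok);
});
```

Using the async fs/promises API here also lines up with the suggestion to standardize on async file system operations.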
Positive Aspects
- Excellent performance metrics provided
- Comprehensive platform support (Windows, macOS, Linux with different architectures)
- Proper cleanup and optimization methods included
- Well-structured code following existing patterns
- Good use of safeWriteJson for atomic writes
Overall, this is a valuable addition that addresses the community's request for local vector storage. With the critical issues addressed, this will be a great enhancement to the codebase.
- Added LanceDBManager class to handle installation and verification of LanceDB binaries.
- Introduced methods for checking current platform, installing dependencies, and cleaning up.
- Updated WebviewMessage interface to include new vector store provider options.
- Enhanced storage utility functions to support synchronous operations for custom storage paths.
- Updated CodeIndexPopover component to handle new settings for vector store provider and local directory.
- Added internationalization support for new settings in multiple languages.
2d2cafc to 60f5146
I'll move this to draft while I test it locally for now.
I tested this locally and saw significant CPU spikes. We don't want to include this in Roo since it adds unnecessary complexity that's better handled by existing services or products.
Related GitHub Issue
Closes #5682
Related:
#6262
#6223
#6223
Description
Add support for LanceDB as a vector store for the code index.
Test Procedure
Env
OS: Windows 11
Embedder provider: Gemini
Model: text-embedding-004 (768 dimensions)
Codebase: RooCode (~72k blocks) and vscode-copilot-chat (~106k blocks)
After enabling code indexing for both RooCode and vscode-copilot-chat, close VSCode completely. Then reopen it, ensuring only the RooCode instance remains active.
Task
use codebase_search to search Deepseek provider in the project?
Qdrant (Docker)
Disk Size
Total: 1.7GB
RooCode: 864.9MB
Copilot: 914MB
Memory
Memory: 1.7GB
Docker alone consumes about 1GB of memory at startup. Once Qdrant launches, it loads all indexes into memory (on_disk does not work because of a permissions issue with Windows and Docker; see #6262), which brings total Docker memory usage to 2.68GB, so roughly 1.7GB is attributable to Qdrant. Memory usage does not grow while a search is executing.
Search Performance
No standardized performance testing has been conducted yet. When the task above is started in VSCode, the search typically completes within 500ms.
LanceDB
Disk Size
Total: 929MB
RooCode: 355MB
Copilot: 574MB
Memory
Memory: 200MB
Since everything resides on disk, there is no additional memory usage when no search is running. During a search, data is read from the database, which results in approximately 200MB of memory usage. Memory consumption does not grow linearly with the size of the codebase, because LanceDB does not load all files into memory for calculation and filtering.
Search Performance
LanceDB is a vector database that, while based on the file system, still offers decent query performance. I haven't done rigorous benchmarking, but the same task typically takes around 800 milliseconds to complete.
Build Time
10 minutes
I just tested on a machine with an N100 processor (4C 4T 1.8GHz) and only 8GB of RAM. Using Gemini's text-embedding-004 for testing and storing data on a mechanical hard drive (while VSCode itself runs on an SSD), it took just 10 minutes to complete the indexing of Roo Code (72k blocks). During the process, CPU usage was around 50%, and memory consumption stayed at about 1GB.
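As a rough back-of-the-envelope summary of the figures reported above, the snippet below derives the disk-usage ratio and the indexing throughput. It only restates the numbers from this description; the block count is approximate (~72k), and the build time was measured on the N100 machine, so treat the results as indicative rather than as a benchmark.

```typescript
// Figures copied from the measurements in this description.
const qdrantDiskMb = 864.9 + 914; // RooCode + Copilot, in MB
const lanceDbDiskMb = 355 + 574; // RooCode + Copilot, in MB
const diskRatio = lanceDbDiskMb / qdrantDiskMb;

const indexedBlocks = 72_000; // approximate block count for Roo Code
const buildSeconds = 10 * 60; // reported build time of 10 minutes
const blocksPerSecond = indexedBlocks / buildSeconds;

console.log(`LanceDB disk usage: ${(diskRatio * 100).toFixed(1)}% of Qdrant`);
console.log(`Indexing throughput: ${blocksPerSecond} blocks/s`);
```

In other words, the LanceDB indexes take roughly half the disk space of the Qdrant ones, and indexing ran at about 120 blocks per second on that machine.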
Pre-Submission Checklist
Screenshots / Videos
Qdrant

LanceDB

Both implementations produce identical search results.
Documentation Updates
Yes, documentation updates are required.
Additional Notes
Advantages
Disadvantages
Important
This PR adds support for LanceDB as a local vector store for code indexing, updating configurations, UI, and localization to accommodate the new option.
- Adds the local vector store in local-vector-store.ts.
- Updates config-manager.ts to include vectorStoreProvider and localVectorStoreDirectory options.
- Updates ClineProvider.ts and webviewMessageHandler.ts to handle new vector store settings.
- Updates CodeIndexPopover.tsx to include UI elements for selecting vector store provider and directory.
- Updates embeddings.json and settings.json.
- Adds tests for LocalVectorStore in local-vector-store.spec.ts.
- Adds @lancedb/lancedb as a new dependency in package.json.
This description was generated for 2d2cafc1f89c3aa169c9079c2ca65f13308e8dc4. It will automatically update as commits are pushed.