LanceDB For Code Index Vector Store #6535
- Added LanceDBManager class to handle installation and verification of LanceDB binaries.
- Introduced methods for checking current platform, installing dependencies, and cleaning up.
- Updated WebviewMessage interface to include new vector store provider options.
- Enhanced storage utility functions to support synchronous operations for custom storage paths.
- Updated CodeIndexPopover component to handle new settings for vector store provider and local directory.
- Added internationalization support for new settings in multiple languages.
Thank you for your contribution! I've reviewed the LanceDB implementation and found several areas that need attention before merging.
Security concern: Using require() with dynamic module loading could pose security risks. Consider using static imports or implementing additional validation to ensure only the intended module is loaded.
this.lancedbModule = require("@lancedb/lancedb")
// Dynamically import LanceDB
this.lancedbModule = await import("@lancedb/lancedb")
return this.lancedbModule
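A minimal sketch of how the dynamic import could be wrapped so that the module specifier stays a fixed string literal and the module is loaded only once. `createLazyLoader` is a hypothetical helper name, not something taken from the PR, and the retry-on-failure behavior is an assumption about what the caller would want.

```typescript
// Hypothetical helper (not from the PR): loads a module lazily and caches the result.
function createLazyLoader<T>(importer: () => Promise<T>): () => Promise<T> {
	let cached: Promise<T> | undefined
	return () => {
		if (!cached) {
			cached = importer().catch((err) => {
				// Forget the failed attempt so a later call can retry the import.
				cached = undefined
				throw err
			})
		}
		return cached
	}
}

// Assumed usage: the specifier is a literal, so no caller-controlled string reaches import().
// const loadLanceDB = createLazyLoader(() => import("@lancedb/lancedb"))
// const lancedb = await loadLanceDB()
```

Keeping the specifier inside the closure means the rest of the code can only ever request the one intended module, which is the concern the comment above raises.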
Missing error handling for disk space. The implementation should check available disk space before creating or expanding the database to prevent failures in low-disk scenarios. Consider adding:
// Check available disk space
const stats = await fs.statfs(this.dbPath)
const availableSpace = stats.bavail * stats.bsize
if (availableSpace < MIN_REQUIRED_SPACE) {
  throw new Error(t("embeddings:vectorStore.insufficientDiskSpace"))
}
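The suggestion above can be made self-contained roughly as follows. This is a sketch under stated assumptions: the 500 MiB threshold is a placeholder (the review comment does not define `MIN_REQUIRED_SPACE`), `availableBytes` and `assertDiskSpace` are hypothetical names, and `statfs` from `node:fs/promises` requires Node.js 18.15 or later.

```typescript
import { statfs } from "node:fs/promises"

// Placeholder threshold for illustration only; the PR discussion gives no value.
const MIN_REQUIRED_SPACE = 500 * 1024 * 1024

// Free bytes available to unprivileged users: free block count times block size.
function availableBytes(stats: { bavail: number; bsize: number }): number {
	return stats.bavail * stats.bsize
}

// Throws when the file system that contains `dir` has less than `minBytes` free.
async function assertDiskSpace(dir: string, minBytes: number = MIN_REQUIRED_SPACE): Promise<void> {
	const stats = await statfs(dir)
	if (availableBytes(stats) < minBytes) {
		throw new Error(`Insufficient disk space in ${dir}`)
	}
}
```

Note that `statfs` rejects if the path does not exist yet, so a check that runs before the database directory is created would need to target an existing parent directory.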
Path normalization might not handle all edge cases correctly, especially on Windows. Consider using a more robust path handling approach:
const normalizedPaths = filePaths.map((fp) => {
  const absolutePath = path.isAbsolute(fp) ? fp : path.join(workspaceRoot, fp)
  return path.relative(workspaceRoot, absolutePath).split(path.sep).join('/')
})
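To see how this normalization behaves on both platforms without switching machines, the same logic can be parameterized over Node's `path.win32` and `path.posix` implementations. The function name `toWorkspaceRelative` and the sample paths are illustrative assumptions, not code from the PR.

```typescript
import * as path from "node:path"

// Same logic as the suggestion above, with the path implementation injectable
// so Windows and POSIX behavior can be exercised on any host.
function toWorkspaceRelative(
	workspaceRoot: string,
	filePath: string,
	p: path.PlatformPath = path,
): string {
	const absolutePath = p.isAbsolute(filePath) ? filePath : p.join(workspaceRoot, filePath)
	// Convert the platform separator to "/" so stored keys match across systems.
	return p.relative(workspaceRoot, absolutePath).split(p.sep).join("/")
}

// Windows-style inputs, absolute and relative, both normalize to "src/a.ts".
console.log(toWorkspaceRelative("C:\\repo", "C:\\repo\\src\\a.ts", path.win32))
console.log(toWorkspaceRelative("C:\\repo", "src\\a.ts", path.win32))
// POSIX-style input.
console.log(toWorkspaceRelative("/repo", "/repo/src/a.ts", path.posix))
```

A path outside the workspace root comes back with leading `../` segments, so callers that must stay inside the workspace would still need to reject such results.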
76c1465 to 4b6db6b
Closing, see #5682 (comment)
@daniel-lxs Could you please reopen this PR?
Related GitHub Issue
Closes #5682
Related:
#6262
#6223
Description
Adds support for LanceDB as a vector store for the code index.
Test Procedure
Env
OS: Windows 11
Embedder provider: Gemini
Model: text-embedding-004 (768 dimensions)
Codebase: RooCode (~72k blocks) and vscode-copilot-chat (~106k blocks)
After enabling code indexing for both RooCode and vscode-copilot-chat, close VSCode completely. Then reopen it, ensuring only the RooCode instance remains active.
Task
Use codebase_search to find the Deepseek provider in the project.
Qdrant (Docker)
Disk Size
Total: 1.7GB
RooCode: 864.9MB
Copilot: 914MB
Memory
Memory: 1.7GB
Docker alone consumes about 1GB of memory at startup. It then launches Qdrant, which loads all indexes into memory, and Docker's memory usage reaches 2.68GB, so Qdrant accounts for roughly 1.7GB. Memory usage does not grow while a search is executing.
Search Performance
No standardized performance testing has been conducted yet. Tasks initiated in VSCode typically complete within 500ms.
LanceDB
Disk Size
Total: 929MB
RooCode: 355MB
Copilot: 574MB
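As a rough sanity check on these numbers, the raw size of the embedding vectors alone can be estimated from the block counts and the 768 dimensions listed in the test environment. The sketch below assumes each component is stored as a 4-byte float32, which the PR does not state, and uses the approximate block counts given above; code text, metadata, and any index structures add to the on-disk total in both stores.

```typescript
// Raw vector payload in bytes: blocks x dimensions x bytes per component.
// The float32 (4-byte) default is an assumption, not a figure from the PR.
function rawVectorBytes(blocks: number, dimensions: number, bytesPerComponent: number = 4): number {
	return blocks * dimensions * bytesPerComponent
}

const MB = 1_000_000
const rooCodeRaw = rawVectorBytes(72_000, 768) // ~72k blocks
const copilotRaw = rawVectorBytes(106_000, 768) // ~106k blocks

console.log(`RooCode raw vectors: ${(rooCodeRaw / MB).toFixed(1)} MB`)
console.log(`Copilot raw vectors: ${(copilotRaw / MB).toFixed(1)} MB`)
```

Under that assumption the raw vectors come to a few hundred megabytes per codebase, which is the same order of magnitude as the LanceDB sizes reported here.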
Memory
Memory: 200MB
Because everything resides on disk, there is no additional memory usage when no search is running. During a search, data is read from the database, which results in approximately 200MB of memory usage. Memory consumption should not grow linearly with the size of the codebase, because LanceDB does not load all files into memory for calculation and filtering.
Search Performance
LanceDB is a file-system-based vector database that still offers decent query performance. I haven't done rigorous benchmarking, but the same task typically takes around 800 milliseconds to complete.
Build Time
10 minutes
I just tested on a machine with an N100 processor (4C 4T 1.8GHz) and only 8GB of RAM. Using Gemini's text-embedding-004 for testing and storing data on a mechanical hard drive (while VSCode itself runs on an SSD), it took just 10 minutes to complete the indexing of Roo Code (72k blocks). During the process, CPU usage was around 50%, and memory consumption stayed at about 1GB.
Pre-Submission Checklist
Screenshots / Videos
Qdrant

LanceDB

Both implementations produce identical search results.
Documentation Updates
Yes, documentation updates are required.
Additional Notes
Advantages
Disadvantages
Other
I will also submit a SQLite-based implementation.
Important
Adds LanceDB as a new vector store option for code indexing, with configuration, UI, and backend updates to support it alongside Qdrant.
- codebase-index.ts, ClineProvider.ts, and webviewMessageHandler.ts.
- codebaseIndexVectorStoreProvider and codebaseIndexLocalVectorStoreDirectory.
- LocalVectorStore in local-vector-store.ts for LanceDB integration.
- LanceDBManager in lancedb-manager.ts to manage LanceDB dependencies.
- service-factory.ts to create vector store instances based on configuration.
- LocalVectorStore in local-vector-store.spec.ts.
- config-manager.spec.ts to test new configuration options.
- CodeIndexPopover.tsx to allow selection between Qdrant and LanceDB.
- embeddings.json and settings.json files.
- @lancedb/lancedb dependency in package.json.
- esbuild.mjs to exclude @lancedb/lancedb from the bundle.
This description was created for 76c1465ac2810f1076795085d651f63e6f7d2af0. You can customize this summary. It will automatically update as commits are pushed.
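The summary mentions that service-factory.ts creates vector store instances based on configuration. A minimal sketch of what such a selection step might look like is shown below. Only the two configuration key names come from the summary; the provider values ("qdrant", "lancedb"), the `qdrantUrl` field, the Qdrant default, and all type and function names are assumptions made for illustration and may differ from the actual implementation.

```typescript
// Assumed provider values; the PR summary names the setting but not its values.
type VectorStoreProvider = "qdrant" | "lancedb"

interface CodeIndexConfig {
	codebaseIndexVectorStoreProvider?: VectorStoreProvider
	codebaseIndexLocalVectorStoreDirectory?: string
	qdrantUrl?: string // hypothetical field for this sketch
}

type VectorStoreChoice =
	| { kind: "qdrant"; url: string }
	| { kind: "lancedb"; directory: string }

// Decides which store to build; a real factory would construct the client here.
function resolveVectorStore(config: CodeIndexConfig, defaultDirectory: string): VectorStoreChoice {
	// Assumption: Qdrant stays the default when no provider is configured.
	const provider = config.codebaseIndexVectorStoreProvider ?? "qdrant"
	switch (provider) {
		case "lancedb":
			return {
				kind: "lancedb",
				directory: config.codebaseIndexLocalVectorStoreDirectory || defaultDirectory,
			}
		case "qdrant":
			return { kind: "qdrant", url: config.qdrantUrl ?? "http://localhost:6333" }
		default:
			throw new Error(`Unknown vector store provider: ${String(provider)}`)
	}
}
```

Returning a plain description of the choice keeps the decision testable without a running Qdrant instance or an installed LanceDB binary.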