@NaccOll (Contributor) commented Aug 1, 2025

Related GitHub Issue

Closes #5682

Related:
#6262
#6223

Description

Adds support for LanceDB as a vector store for the code index.

Test Procedure

Env

OS: Windows 11
Embedder provider: Gemini
Model: text-embedding-004 (768 dimensions)
Codebases: RooCode (~72k blocks) and vscode-copilot-chat (~106k blocks)

After enabling code indexing for both RooCode and vscode-copilot-chat, close VSCode completely. Then reopen it, ensuring only the RooCode instance remains active.

Task

use codebase_search to search Deepseek provider in the project?

Qdrant (Docker)

Disk Size

Total: 1.7GB
RooCode: 864.9MB
Copilot: 914MB

Memory

Memory: 1.7GB

Docker alone uses about 1GB of memory at startup. After Qdrant launches and loads all indexes into memory, Docker's memory usage reaches 2.68GB, so Qdrant accounts for roughly 1.7GB. Memory usage does not grow while a search is executing.

Search Performance

No standardized performance testing has been done yet. When the task is started from VSCode, the search typically completes within 500ms.

LanceDB

Disk Size

Total: 929MB
RooCode: 355MB
Copilot: 574MB

Memory

Memory: 200MB

Since everything resides on disk, there is no additional memory usage when no search is running. During a search, data is read from the database, which uses approximately 200MB of memory. Memory consumption should not grow linearly with the size of the codebase, because LanceDB does not load all files into memory for calculation and filtering.
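To illustrate why a batch-at-a-time scan keeps memory bounded, here is a minimal sketch. It is not LanceDB's implementation, and every name in it (`Row`, `Hit`, `topKOverBatches`, `fakeDiskBatches`) is hypothetical. It only shows the general idea: read one batch, score it, keep the k best hits, and discard the rest.

```typescript
// Illustrative only: NOT LanceDB's implementation. All names are hypothetical.
type Row = { id: string; vector: number[] }
type Hit = { id: string; score: number }

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
	let dot = 0
	let na = 0
	let nb = 0
	for (let i = 0; i < a.length; i++) {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb)
}

// Scan batches one at a time and keep only the k best hits seen so far,
// so at most (k + one batch) scored rows are held at any moment.
function topKOverBatches(batches: Iterable<Row[]>, query: number[], k: number): Hit[] {
	let best: Hit[] = []
	for (const batch of batches) {
		for (const row of batch) {
			best.push({ id: row.id, score: cosine(row.vector, query) })
		}
		best.sort((x, y) => y.score - x.score)
		best = best.slice(0, k)
	}
	return best
}

// A generator stands in for reading row batches from disk.
function* fakeDiskBatches(): Generator<Row[]> {
	yield [
		{ id: "a", vector: [1, 0] },
		{ id: "b", vector: [0, 1] },
	]
	yield [{ id: "c", vector: [0.9, 0.1] }]
}
```

Calling `topKOverBatches(fakeDiskBatches(), [1, 0], 2)` returns the hits for `a` and `c`, in that order, without ever holding all rows at once.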

Search Performance

LanceDB is a file-system-based vector database that still offers decent query performance. I haven't done rigorous benchmarking, but the same task typically takes around 800 milliseconds to complete.

Build Time

10 minutes

I just tested on a machine with an N100 processor (4C 4T 1.8GHz) and only 8GB of RAM. Using Gemini's text-embedding-004 for testing and storing data on a mechanical hard drive (while VSCode itself runs on an SSD), it took just 10 minutes to complete the indexing of Roo Code (72k blocks). During the process, CPU usage was around 50%, and memory consumption stayed at about 1GB.

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes (if applicable).
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

Qdrant
[Screenshot: Capture-QDRANT]

LanceDB
[Screenshot: Capture-Lancedb]

Both implementations produce identical search results.

Documentation Updates

Yes, documentation updates are required.

Additional Notes

Advantages

  • Ready to use out of the box (based on npm)
  • Smaller disk usage
  • Lower memory usage
  • Inactive project indexes won't occupy your memory at all

Disadvantages

  • Users must have npm installed so that the LanceDB-related dependencies can be downloaded dynamically

Other

I will also submit a SQLite-based implementation.


Important

Adds LanceDB as a new vector store option for code indexing, with configuration, UI, and backend updates to support it alongside Qdrant.

  • Behavior:
    • Adds LanceDB as a new vector store option for code indexing in codebase-index.ts, ClineProvider.ts, and webviewMessageHandler.ts.
    • Updates configuration to include codebaseIndexVectorStoreProvider and codebaseIndexLocalVectorStoreDirectory.
    • Supports both Qdrant and LanceDB, with LanceDB using local storage.
  • Backend:
    • Implements LocalVectorStore in local-vector-store.ts for LanceDB integration.
    • Adds LanceDBManager in lancedb-manager.ts to manage LanceDB dependencies.
    • Updates service-factory.ts to create vector store instances based on configuration.
  • Testing:
    • Adds tests for LocalVectorStore in local-vector-store.spec.ts.
    • Updates config-manager.spec.ts to test new configuration options.
  • UI:
    • Updates CodeIndexPopover.tsx to allow selection between Qdrant and LanceDB.
    • Adds localization for new settings in multiple embeddings.json and settings.json files.
  • Misc:
    • Adds @lancedb/lancedb dependency in package.json.
    • Updates esbuild.mjs to exclude @lancedb/lancedb from the bundle.
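
As a rough sketch of how a factory could pick a vector store from the two new settings, consider the following. Only the setting names `codebaseIndexVectorStoreProvider` and `codebaseIndexLocalVectorStoreDirectory` come from the summary above. The types, the `qdrantUrl` field, the fallback URL, and the returned handle are assumptions for illustration, not the PR's actual code in service-factory.ts.

```typescript
// Hypothetical shapes for illustration; the real interfaces live in the PR's source.
type VectorStoreProvider = "qdrant" | "lancedb"

interface CodeIndexStoreConfig {
	codebaseIndexVectorStoreProvider: VectorStoreProvider
	codebaseIndexLocalVectorStoreDirectory?: string // only meaningful for LanceDB
	qdrantUrl?: string // assumed field name, not taken from the PR
}

// Stand-in for a real store instance; just records what would be created.
interface VectorStoreHandle {
	kind: VectorStoreProvider
	location: string
}

function createVectorStore(config: CodeIndexStoreConfig): VectorStoreHandle {
	switch (config.codebaseIndexVectorStoreProvider) {
		case "lancedb": {
			const dir = config.codebaseIndexLocalVectorStoreDirectory
			if (!dir) {
				throw new Error("LanceDB requires a local vector store directory")
			}
			return { kind: "lancedb", location: dir }
		}
		case "qdrant":
			// Assumed fallback: Qdrant's default REST port on localhost.
			return { kind: "qdrant", location: config.qdrantUrl ?? "http://localhost:6333" }
		default:
			throw new Error(`Unknown vector store provider: ${String(config.codebaseIndexVectorStoreProvider)}`)
	}
}
```

Failing fast when the LanceDB directory is missing keeps a misconfiguration from surfacing later as an obscure file-system error.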

This description was created by Ellipsis for 76c1465ac2810f1076795085d651f63e6f7d2af0. It will automatically update as commits are pushed.

- Added LanceDBManager class to handle installation and verification of LanceDB binaries.
- Introduced methods for checking current platform, installing dependencies, and cleaning up.
- Updated WebviewMessage interface to include new vector store provider options.
- Enhanced storage utility functions to support synchronous operations for custom storage paths.
- Updated CodeIndexPopover component to handle new settings for vector store provider and local directory.
- Added internationalization support for new settings in multiple languages.
@NaccOll NaccOll requested review from cte, jr and mrubens as code owners August 1, 2025 03:27
@dosubot bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.), documentation (Improvements or additions to documentation), and enhancement (New feature or request) labels Aug 1, 2025
@hannesrudolph added the Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.) label Aug 1, 2025

@roomote roomote bot left a comment

Thank you for your contribution! I've reviewed the LanceDB implementation and found several areas that need attention before merging.

Security concern: Using require() with dynamic module loading could pose security risks. Consider using static imports or implementing additional validation to ensure only the intended module is loaded.

Suggested change
- this.lancedbModule = require("@lancedb/lancedb")
+ // Dynamically import LanceDB
+ this.lancedbModule = await import("@lancedb/lancedb")
+ return this.lancedbModule

Missing error handling for disk space. The implementation should check available disk space before creating or expanding the database to prevent failures in low-disk scenarios. Consider adding:

// Check available disk space
const stats = await fs.statfs(this.dbPath)
const availableSpace = stats.bavail * stats.bsize
if (availableSpace < MIN_REQUIRED_SPACE) {
  throw new Error(t("embeddings:vectorStore.insufficientDiskSpace"))
}
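
A self-contained variant of the check above might look like the sketch below. It assumes Node.js 18.15 or later (where `statfsSync` is available). The threshold value and the function names `availableBytes` and `assertEnoughDiskSpace` are hypothetical, since the review comment does not specify them.

```typescript
import { statfsSync } from "node:fs"

// Hypothetical threshold; the review comment does not give a value.
const MIN_REQUIRED_SPACE = 100 * 1024 * 1024 // 100 MiB

// Bytes available to unprivileged users on the file system that contains `dir`.
function availableBytes(dir: string): number {
	const stats = statfsSync(dir)
	return stats.bavail * stats.bsize
}

// Throws if the file system containing `dir` has less than `minBytes` free.
function assertEnoughDiskSpace(dir: string, minBytes: number = MIN_REQUIRED_SPACE): void {
	const available = availableBytes(dir)
	if (available < minBytes) {
		throw new Error(`Insufficient disk space in ${dir}: ${available} bytes available, ${minBytes} required`)
	}
}
```

Note that `statfsSync` fails if the path does not exist, so before the database directory is first created the check would have to target an existing parent directory.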

Path normalization might not handle all edge cases correctly, especially on Windows. Consider using a more robust path handling approach:

const normalizedPaths = filePaths.map((fp) => {
  const absolutePath = path.isAbsolute(fp) ? fp : path.join(workspaceRoot, fp)
  return path.relative(workspaceRoot, absolutePath).split(path.sep).join('/')
})
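
To make the Windows behaviour of this suggestion checkable on any OS, the same logic can be written against an explicit platform path module. This is a sketch under that assumption; the function name `toWorkspaceRelative` is hypothetical and does not appear in the PR.

```typescript
import * as path from "node:path"

// `p` is the platform-specific path module (path.win32 or path.posix), passed
// in explicitly so Windows behaviour can be exercised on any host OS.
function toWorkspaceRelative(p: path.PlatformPath, workspaceRoot: string, filePath: string): string {
	// Resolve relative inputs against the workspace root first.
	const absolute = p.isAbsolute(filePath) ? filePath : p.join(workspaceRoot, filePath)
	// Make the path workspace-relative, then force forward slashes.
	return p.relative(workspaceRoot, absolute).split(p.sep).join("/")
}

// Example inputs covering backslash, forward-slash, and absolute Windows paths.
const examples = [
	toWorkspaceRelative(path.win32, "C:\\ws", "src\\a.ts"),
	toWorkspaceRelative(path.win32, "C:\\ws", "src/a.ts"),
	toWorkspaceRelative(path.win32, "C:\\ws", "C:\\ws\\src\\a.ts"),
	toWorkspaceRelative(path.posix, "/ws", "src/a.ts"),
]
```

All four examples normalize to `src/a.ts`. Paths outside the workspace keep their `../` prefix, and on Windows a path on a different drive stays absolute because `path.win32.relative` cannot express it relative to the root, so callers may want to reject those cases explicitly.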

@NaccOll force-pushed the local-lance-vector-store branch from 76c1465 to 4b6db6b August 1, 2025 07:13
@NaccOll NaccOll mentioned this pull request Aug 1, 2025
6 tasks
@daniel-lxs (Member)

Closing, see #5682 (comment)

@daniel-lxs daniel-lxs closed this Aug 2, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Aug 2, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 2, 2025
@NaccOll (Contributor, Author) commented Aug 13, 2025

@daniel-lxs Could you please reopen this PR?

@daniel-lxs (Member) commented Aug 13, 2025

Hey, I can't reopen it anymore. I think you need to create a new PR:
[Screenshot]

Labels

documentation, enhancement, Issue/PR - Triage, size:XXL

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Local Embedding and Local Vector Store for Indexing

3 participants