Sqlite For Code Index Vector Store #6471
Conversation
Thank you for implementing SQLite support for code indexing! This is a valuable addition that addresses the concerns about external dependencies. I've reviewed the implementation and found several issues that need attention:
Critical Issues:
- Node.js version mismatch between .nvmrc (22.15.1) and package.json (22.17.1)
- Potential memory issues in parallel batch processing
- Missing validation for local vector store configuration
Important Suggestions:
- Error handling improvements in database operations
- Test coverage gaps for the parallel search algorithm
- Magic numbers should be extracted to constants
Minor Improvements:
- Validation logic could be extracted to shared utilities
- Consider adding more comprehensive error handling
Overall, this is a solid implementation that provides a good alternative to Qdrant. The main concerns are around memory usage optimization and configuration validation.
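To illustrate the missing-validation point, a minimal guard could look like the sketch below. The config key names mirror the ones added in this PR, but the helper function itself is hypothetical, not the PR's actual code:

```typescript
// Hypothetical validation helper -- the config keys mirror this PR's names,
// but this function is an illustrative sketch, not the actual implementation.
interface VectorStoreConfig {
	codebaseIndexVectorStoreProvider?: "qdrant" | "local"
	codebaseIndexLocalVectorStoreDirectory?: string
}

function validateVectorStoreConfig(cfg: VectorStoreConfig): string[] {
	const errors: string[] = []
	if (
		cfg.codebaseIndexVectorStoreProvider === "local" &&
		!cfg.codebaseIndexLocalVectorStoreDirectory?.trim()
	) {
		errors.push("A storage directory is required when the local vector store is selected.")
	}
	return errors
}
```

A check like this could live in a shared utility so both the webview handler and the service factory reuse it, which also addresses the "extract to shared utilities" suggestion.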
The parallel batch processing here could consume excessive memory with large datasets. All batches are loaded into memory simultaneously. Could we consider implementing a streaming approach or limiting concurrent batches to prevent memory issues?
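One way to bound concurrency is a small worker-pool helper like the sketch below. This is illustrative only (`limitConcurrency` and its usage are not names from the PR); it shows the general pattern of starting a fixed number of workers that pull batches off a shared queue, so only `limit` batches are in flight at once:

```typescript
// Illustrative sketch: process items with at most `limit` in flight at a time,
// so memory is bounded by the concurrent batch count rather than the dataset size.
async function limitConcurrency<T, R>(
	items: T[],
	limit: number,
	worker: (item: T) => Promise<R>,
): Promise<R[]> {
	const results: R[] = new Array(items.length)
	let next = 0
	// Start `limit` runners; each repeatedly claims the next unprocessed index.
	const runners = Array.from({ length: Math.min(limit, items.length) }, async () => {
		while (next < items.length) {
			const i = next++ // safe: JS is single-threaded between awaits
			results[i] = await worker(items[i])
		}
	})
	await Promise.all(runners)
	return results
}
```

For example, `await limitConcurrency(batches, 8, processBatch)` (where `processBatch` is whatever handles one batch) would keep at most 8 batches resident at a time.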
The topResults data comes from activeTasks execution, but the number of concurrent activeTasks is strictly limited to half the core count. On a 16-core machine, that means at most 8 parallel tasks processing the 80K records; with an embedding dimension of just 768, the memory footprint stays at only around 300MB.
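As a rough cross-check of that figure, assuming the vectors are stored as float32 (an assumption; the constants below are just the numbers quoted above):

```typescript
// Back-of-envelope estimate of raw vector memory, assuming float32 storage.
const records = 80_000
const dimensions = 768
const bytesPerFloat32 = 4
const rawBytes = records * dimensions * bytesPerFloat32
// ~246MB of raw vector data; runtime object overhead plausibly accounts for
// the difference up to the observed ~300MB.
console.log(`${(rawBytes / 1_000_000).toFixed(0)} MB`) // prints "246 MB"
```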
Looking forward to this.
- Updated the WebviewMessage interface to include options for the local vector store provider and directory.
- Implemented a synchronous function to retrieve the storage path for conversations.
- Enhanced the CodeIndexPopover component to manage settings for the local and Qdrant vector stores.
- Added translations for the new settings in multiple languages.
…toreDirectory for consistency
Closing, see #5682 (comment)
Related GitHub Issue
Closes #5682
Related:
#6262
#6223
Description
Support a SQLite-backed vector store for the code index.
Test Procedure
Env
OS: Windows 11
Embedder provider: Gemini
Model: text-embedding-004 (768 dimensions)
Codebase: RooCode (~72k blocks) and vscode-copilot-chat (~106k blocks)
After enabling code indexing for both RooCode and vscode-copilot-chat, close VSCode completely. Then reopen it, ensuring only the RooCode instance remains active.
Task
Use codebase_search to find the Deepseek provider in the project.
Qdrant (Docker)
Disk Size
Total: 1.7GB
RooCode: 864.9MB
Copilot: 914MB
Memory
Memory: 1.7GB
My Docker setup consumes 1GB of memory at startup. It then launches Qdrant, which loads all indexes into memory, bringing total Docker memory usage to 2.68GB. Memory usage does not grow while a search is executing.
Search Performance
No standardized performance testing has been conducted yet. When initiating tasks in VSCode, they typically complete within 500ms.
SQLite
Disk Size
Total: 718MB
RooCode: 293MB
Copilot: 424MB
Memory
Memory: 300MB
Since everything resides on disk, there's no additional memory usage when searches aren't being performed. During search operations, data is retrieved from the database, resulting in approximately 300MB of memory usage. However, you don't need to worry about memory consumption growing linearly with codebase size, as the queries are processed in batches and lower-relevance vectors are discarded.
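That batched, discard-as-you-go pattern can be sketched as follows. This is illustrative only (none of these names come from the PR): rows are scored batch by batch with brute-force cosine similarity, and only the top K survivors are kept, so peak memory is bounded by one batch plus K results rather than the whole index:

```typescript
// Illustrative sketch (not the PR's code): score rows batch by batch and keep
// only the K best, so memory stays bounded regardless of codebase size.
function cosineSimilarity(a: number[], b: number[]): number {
	let dot = 0
	let na = 0
	let nb = 0
	for (let i = 0; i < a.length; i++) {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

function topK(
	query: number[],
	batches: Iterable<{ id: string; vector: number[] }[]>,
	k: number,
): { id: string; score: number }[] {
	let best: { id: string; score: number }[] = []
	for (const batch of batches) {
		for (const row of batch) {
			best.push({ id: row.id, score: cosineSimilarity(query, row.vector) })
		}
		// Discard lower-relevance vectors after each batch.
		best.sort((x, y) => y.score - x.score)
		best = best.slice(0, k)
	}
	return best
}
```

In the real store the batches would come from paged SQLite reads rather than an in-memory iterable.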
Search Performance
Since we're not using VSS (Vector Similarity Search), search performance is inferior to Qdrant's. While no rigorous benchmarking has been conducted, the same tasks typically take around 1500ms to complete.
Pre-Submission Checklist
Screenshots / Videos
Qdrant

Sqlite

Both implementations produce identical search results.
Documentation Updates
Yes, documentation updates are required.
Additional Notes
Advantages
Disadvantages
Other
I will also submit a LanceDB-based implementation. It delivers faster search performance with comparable memory and disk usage, while requiring no Node.js upgrade. The only drawback is that it needs to dynamically download dependencies during the user's first local-mode operation.
Important
Adds support for SQLite as a vector store for code indexing, updates configurations, and modifies UI for vector store selection.
- Updates codebaseIndexConfigSchema in codebase-index.ts to include codebaseIndexVectorStoreProvider and codebaseIndexLocalVectorStoreDirectory.
- Updates ClineProvider and webviewMessageHandler to handle the new vector store configurations.
- Adds LocalVectorStore in local-vector-store.ts for handling vector storage using SQLite.
- Adds tests for LocalVectorStore in local-vector-store.spec.ts.
- Updates service-factory.ts to instantiate LocalVectorStore based on configuration.
- Updates Node.js to 22.17.1 in .nvmrc, .tool-versions, and package.json.
- Updates setup-node-pnpm/action.yml to use the new Node.js version.
- Updates CodeIndexPopover.tsx to include UI elements for selecting the vector store provider and configuring the local vector store directory.