@NaccOll NaccOll commented Jul 31, 2025

Related GitHub Issue

Closes #5682

Related:
#6262
#6223

Description

Adds SQLite as a vector store for the code index.

Test Procedure

Env

OS: Windows 11
Embedder provider: Gemini
Model: text-embedding-004 (768 dimensions)
Codebases: RooCode (~72k blocks) and vscode-copilot-chat (~106k blocks)

After enabling code indexing for both RooCode and vscode-copilot-chat, close VSCode completely. Then reopen it, ensuring only the RooCode instance remains active.

Task

Use codebase_search to search for the Deepseek provider in the project.

Qdrant (Docker)

Disk Size

Total: 1.7GB
RooCode: 864.9MB
Copilot: 914MB

Memory

Memory: 1.7GB

Docker alone consumes about 1GB of memory at startup. Once it launches Qdrant, which loads all indexes into memory, Docker's total memory usage reaches 2.68GB, so Qdrant accounts for roughly 1.7GB. Memory usage does not grow while a search is executing.

Search Performance

No standardized performance testing has been conducted yet. When run from VSCode, the search task above typically completes within 500ms.

SQLite

Disk Size

Total: 718MB
RooCode: 293MB
Copilot: 424MB

Memory

Memory: 300MB

Since everything resides on disk, the index uses no additional memory when no search is running. During a search, vectors are read from the database, which results in approximately 300MB of memory usage. Memory consumption does not grow linearly with codebase size, because queries are processed in batches and lower-relevance vectors are discarded along the way.
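
To illustrate why batching keeps memory bounded, here is a minimal sketch of a brute-force top-k scan. It is not the PR's code: the `StoredVector` shape, the `searchInBatches` name, the use of cosine similarity, and the `limit`/`minScore` parameters are all assumptions made for illustration.

```typescript
// Hypothetical row shape; the PR's actual table schema is not shown in this thread.
interface StoredVector {
  id: string;
  vector: Float32Array;
}

interface Scored {
  id: string;
  score: number;
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scans one batch at a time and keeps only the best `limit` hits, so the
// retained results depend on `limit`, not on the total number of rows.
function searchInBatches(
  query: Float32Array,
  batches: Iterable<StoredVector[]>,
  limit: number,
  minScore = 0,
): Scored[] {
  let top: Scored[] = [];
  for (const batch of batches) {
    for (const row of batch) {
      const score = cosineSimilarity(query, row.vector);
      if (score >= minScore) top.push({ id: row.id, score });
    }
    // Discard lower-relevance candidates before reading the next batch.
    top.sort((x, y) => y.score - x.score);
    top = top.slice(0, limit);
  }
  return top;
}
```

In a real store the `batches` iterable would be fed by paged database reads, so only one page of vectors plus the current top results is held at a time.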

Search Performance

Since this implementation does not use VSS (Vector Similarity Search), search is slower than with Qdrant. No rigorous benchmarking has been conducted, but the same task typically takes around 1500ms to complete.

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes (if applicable).
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

Qdrant
[Screenshot: Capture-QDRANT]

SQLite
[Screenshot: Capture-SQLITE]

Both implementations produce identical search results.

Documentation Updates

Yes, documentation updates are required.

Additional Notes

Advantages

  • Ready-to-use out of the box
  • Smaller disk usage
  • Lower memory usage
  • Inactive project indexes won't occupy your memory at all

Disadvantages

  • Requires a Node.js upgrade, which may introduce extension instability (though it won't increase the extension size)
  • Weaker search performance due to the non-VSS implementation

Other

I will also submit a LanceDB-based implementation. It delivers faster search performance with comparable memory and disk usage, and it requires no Node.js upgrade. The only drawback is that it needs to download dependencies dynamically the first time the user runs Local mode.


Important

Adds support for SQLite as a vector store for code indexing, updates configurations, and modifies UI for vector store selection.

  • Behavior:
    • Adds support for SQLite as a vector store for code indexing, providing an alternative to Qdrant.
    • Updates codebaseIndexConfigSchema in codebase-index.ts to include codebaseIndexVectorStoreProvider and codebaseIndexLocalVectorStoreDirectory.
    • Modifies ClineProvider and webviewMessageHandler to handle new vector store configurations.
  • Implementation:
    • Introduces LocalVectorStore in local-vector-store.ts for handling vector storage using SQLite.
    • Adds tests for LocalVectorStore in local-vector-store.spec.ts.
    • Updates service-factory.ts to instantiate LocalVectorStore based on configuration.
  • Configuration:
    • Updates Node.js version to 22.17.1 in .nvmrc, .tool-versions, and package.json.
    • Adjusts setup-node-pnpm/action.yml to use the new Node.js version.
  • UI:
    • Modifies CodeIndexPopover.tsx to include UI elements for selecting vector store provider and configuring local vector store directory.
    • Updates i18n files to include new labels and descriptions for vector store settings.
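
As a rough illustration of how the two new settings might be validated before a store is created, consider the sketch below. Only the field names `codebaseIndexVectorStoreProvider` and `codebaseIndexLocalVectorStoreDirectory` come from the summary above. The `"qdrant"` and `"local"` values, the default provider, the rule that a local store needs a directory, and the `validateVectorStoreSettings` helper are assumptions, not the PR's actual schema.

```typescript
// Assumed provider values; the real schema may use different names.
type VectorStoreProvider = "qdrant" | "local";

interface VectorStoreSettings {
  codebaseIndexVectorStoreProvider?: string;
  codebaseIndexLocalVectorStoreDirectory?: string;
}

type ValidationResult =
  | { ok: true; provider: VectorStoreProvider; directory?: string }
  | { ok: false; error: string };

function isProvider(value: string): value is VectorStoreProvider {
  return value === "qdrant" || value === "local";
}

// Hypothetical validation: defaults to "qdrant" (assumption) and requires a
// non-blank directory when the local store is selected (also an assumption).
function validateVectorStoreSettings(settings: VectorStoreSettings): ValidationResult {
  const provider = settings.codebaseIndexVectorStoreProvider ?? "qdrant";
  if (!isProvider(provider)) {
    return { ok: false, error: `Unknown vector store provider: ${provider}` };
  }
  if (provider === "local") {
    const directory = settings.codebaseIndexLocalVectorStoreDirectory?.trim();
    if (!directory) {
      return { ok: false, error: "Local vector store requires a directory" };
    }
    return { ok: true, provider, directory };
  }
  return { ok: true, provider };
}
```

A factory such as the one in service-factory.ts could call a check like this first and only then decide which store to instantiate.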

This description was created by Ellipsis for 72ced93d7577a04aa499b2b5ee64f8a7eb641f57.

@NaccOll NaccOll requested review from cte, jr and mrubens as code owners July 31, 2025 03:25
@dosubot dosubot bot added the size:XXL, documentation, and enhancement labels Jul 31, 2025
Contributor

@roomote roomote bot left a comment

Thank you for implementing SQLite support for code indexing! This is a valuable addition that addresses the concerns about external dependencies. I've reviewed the implementation and found several issues that need attention:

Critical Issues:

  1. Node.js version mismatch between .nvmrc (22.15.1) and package.json (22.17.1)
  2. Potential memory issues in parallel batch processing
  3. Missing validation for local vector store configuration

Important Suggestions:

  4. Error handling improvements in database operations
  5. Test coverage gaps for parallel search algorithm
  6. Magic numbers should be extracted to constants

Minor Improvements:

  7. Validation logic could be extracted to shared utilities
  8. Consider adding more comprehensive error handling

Overall, this is a solid implementation that provides a good alternative to Qdrant. The main concerns are around memory usage optimization and configuration validation.

Contributor

The parallel batch processing here could consume excessive memory with large datasets. All batches are loaded into memory simultaneously. Could we consider implementing a streaming approach or limiting concurrent batches to prevent memory issues?

Contributor Author

The topResults data is produced by the activeTasks, and the number of concurrent activeTasks is strictly limited to at most half the core count. On a 16-core machine, this means at most 8 parallel tasks process the 80K records. With an embedding dimension of just 768, the memory footprint stays at about 300MB.
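
The cap described above can be sketched as follows. This is an illustration under stated assumptions rather than the PR's implementation: `maxConcurrentTasks`, `runWithLimit`, and `estimateVectorBytes` are hypothetical names, the core count is passed in explicitly (in Node.js it could come from `os.cpus().length`), and the size estimate assumes 4-byte float32 components.

```typescript
// Half the core count, but never fewer than one task.
function maxConcurrentTasks(coreCount: number): number {
  return Math.max(1, Math.floor(coreCount / 2));
}

// Runs `tasks` with at most `limit` in flight; results keep the input order.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  // Each worker pulls the next unclaimed task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }
  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Raw size of the vectors alone, assuming float32 storage (an assumption).
function estimateVectorBytes(
  rows: number,
  dimensions: number,
  bytesPerComponent = 4,
): number {
  return rows * dimensions * bytesPerComponent;
}
```

Under the float32 assumption, 80,000 vectors of 768 dimensions come to 245,760,000 bytes, about 246MB of raw vector data, which is in the same range as the roughly 300MB observed.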

@NaccOll NaccOll force-pushed the local-sqlite-vector-store branch from 72ced93 to b6882a2 Compare July 31, 2025 03:29
@hannesrudolph hannesrudolph added the Issue/PR - Triage label Jul 31, 2025
@hujianxin

Looking forward to this.

@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Aug 1, 2025
@hannesrudolph hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage labels Aug 1, 2025
@NaccOll NaccOll mentioned this pull request Aug 1, 2025
6 tasks
NaccOll added 5 commits August 1, 2025 11:30
- Updated WebviewMessage interface to include options for local vector store provider and directory.
- Implemented synchronous function to retrieve storage path for conversations.
- Enhanced CodeIndexPopover component to manage settings for local and Qdrant vector stores.
- Added translations for new settings in multiple languages.
@NaccOll NaccOll force-pushed the local-sqlite-vector-store branch from 1f4e105 to 82f6f3d Compare August 1, 2025 03:36
@daniel-lxs
Member

Closing, see #5682 (comment)

@daniel-lxs daniel-lxs closed this Aug 2, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 2, 2025
@github-project-automation github-project-automation bot moved this from PR [Needs Prelim Review] to Done in Roo Code Roadmap Aug 2, 2025
@NaccOll NaccOll deleted the local-sqlite-vector-store branch August 25, 2025 00:56

Labels

documentation, enhancement, PR - Needs Preliminary Review, size:XXL

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Local Embedding and Local Vector Store for Indexing

4 participants