Sqlite For Code Index Vector Store #6471
Conversation
Thank you for implementing SQLite support for code indexing! This is a valuable addition that addresses the concerns about external dependencies. I've reviewed the implementation and found several issues that need attention:
Critical Issues:
- Node.js version mismatch between .nvmrc (22.15.1) and package.json (22.17.1)
- Potential memory issues in parallel batch processing
- Missing validation for local vector store configuration
Important Suggestions:
- Error handling improvements in database operations
- Test coverage gaps for the parallel search algorithm
- Magic numbers should be extracted to constants
Minor Improvements:
- Validation logic could be extracted to shared utilities
- Consider adding more comprehensive error handling
Overall, this is a solid implementation that provides a good alternative to Qdrant. The main concerns are around memory usage optimization and configuration validation.
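To illustrate the missing-validation point, a minimal guard could look like the sketch below. The config key names mirror the ones added in this PR, but the helper function itself is hypothetical, not the PR's actual code:

```typescript
// Hypothetical validation helper -- the config keys mirror this PR's names,
// but this function is an illustrative sketch, not the actual implementation.
interface VectorStoreConfig {
	codebaseIndexVectorStoreProvider?: "qdrant" | "local"
	codebaseIndexLocalVectorStoreDirectory?: string
}

function validateVectorStoreConfig(cfg: VectorStoreConfig): string[] {
	const errors: string[] = []
	if (
		cfg.codebaseIndexVectorStoreProvider === "local" &&
		!cfg.codebaseIndexLocalVectorStoreDirectory?.trim()
	) {
		errors.push("A storage directory is required when the local vector store is selected.")
	}
	return errors
}
```

A check like this could live in a shared utility so both the webview handler and the service factory reuse it, which also addresses the "extract to shared utilities" suggestion.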
The parallel batch processing here could consume excessive memory with large datasets. All batches are loaded into memory simultaneously. Could we consider implementing a streaming approach or limiting concurrent batches to prevent memory issues?
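One way to bound concurrency is a small worker-pool helper like the sketch below. This is illustrative only (`limitConcurrency` and its usage are not names from the PR); it shows the general pattern of starting a fixed number of workers that pull batches off a shared queue, so only `limit` batches are in flight at once:

```typescript
// Illustrative sketch: process items with at most `limit` in flight at a time,
// so memory is bounded by the concurrent batch count rather than the dataset size.
async function limitConcurrency<T, R>(
	items: T[],
	limit: number,
	worker: (item: T) => Promise<R>,
): Promise<R[]> {
	const results: R[] = new Array(items.length)
	let next = 0
	// Start `limit` runners; each repeatedly claims the next unprocessed index.
	const runners = Array.from({ length: Math.min(limit, items.length) }, async () => {
		while (next < items.length) {
			const i = next++ // safe: JS is single-threaded between awaits
			results[i] = await worker(items[i])
		}
	})
	await Promise.all(runners)
	return results
}
```

For example, `await limitConcurrency(batches, 8, processBatch)` (where `processBatch` is whatever handles one batch) would keep at most 8 batches resident at a time.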
The topResults data comes from activeTasks execution, but the number of concurrent activeTasks is strictly limited to half the core count. On a 16-core machine, that means at most 8 parallel tasks processing the 80K records; with an embedding dimension of just 768, the memory footprint stays at only around 300MB.
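As a rough cross-check of that figure, assuming the vectors are stored as float32 (an assumption; the constants below are just the numbers quoted above):

```typescript
// Back-of-envelope estimate of raw vector memory, assuming float32 storage.
const records = 80_000
const dimensions = 768
const bytesPerFloat32 = 4
const rawBytes = records * dimensions * bytesPerFloat32
// ~246MB of raw vector data; runtime object overhead plausibly accounts for
// the difference up to the observed ~300MB.
console.log(`${(rawBytes / 1_000_000).toFixed(0)} MB`) // prints "246 MB"
```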
Looking forward to this.
- Updated the WebviewMessage interface to include options for the local vector store provider and directory.
- Implemented a synchronous function to retrieve the storage path for conversations.
- Enhanced the CodeIndexPopover component to manage settings for the local and Qdrant vector stores.
- Added translations for the new settings in multiple languages.
…toreDirectory for consistency
Closing, see #5682 (comment)
Related GitHub Issue
Closes #5682
Related:
#6262
#6223
Description
Support a SQLite-backed vector store for the code index.
Test Procedure
Env
OS: Windows 11
Embedder provider: Gemini
Model: text-embedding-004 (768 dimensions)
Codebase: RooCode (~72k blocks) and vscode-copilot-chat (~106k blocks)
After enabling code indexing for both RooCode and vscode-copilot-chat, close VSCode completely. Then reopen it, ensuring only the RooCode instance remains active.
Task
Use codebase_search to find the Deepseek provider in the project.
Qdrant (Docker)
Disk Size
Total: 1.7GB
RooCode: 864.9MB
Copilot: 914MB
Memory
Memory: 1.7GB
My Docker setup consumes 1GB of memory at startup. It then launches Qdrant, which loads all indexes into memory, bringing total Docker memory usage to 2.68GB. Memory usage does not grow while a search is executing.
Search Performance
No standardized performance testing has been conducted yet. When initiating tasks in VSCode, they typically complete within 500ms.
SQLite
Disk Size
Total: 718MB
RooCode: 293MB
Copilot: 424MB
Memory
Memory: 300MB
Since everything resides on disk, there's no additional memory usage when searches aren't being performed. During search operations, data is retrieved from the database, resulting in approximately 300MB of memory usage. However, you don't need to worry about memory consumption growing linearly with codebase size, as the queries are processed in batches and lower-relevance vectors are discarded.
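That batched, discard-as-you-go pattern can be sketched as follows. This is illustrative only (none of these names come from the PR): rows are scored batch by batch with brute-force cosine similarity, and only the top K survivors are kept, so peak memory is bounded by one batch plus K results rather than the whole index:

```typescript
// Illustrative sketch (not the PR's code): score rows batch by batch and keep
// only the K best, so memory stays bounded regardless of codebase size.
function cosineSimilarity(a: number[], b: number[]): number {
	let dot = 0
	let na = 0
	let nb = 0
	for (let i = 0; i < a.length; i++) {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

function topK(
	query: number[],
	batches: Iterable<{ id: string; vector: number[] }[]>,
	k: number,
): { id: string; score: number }[] {
	let best: { id: string; score: number }[] = []
	for (const batch of batches) {
		for (const row of batch) {
			best.push({ id: row.id, score: cosineSimilarity(query, row.vector) })
		}
		// Discard lower-relevance vectors after each batch.
		best.sort((x, y) => y.score - x.score)
		best = best.slice(0, k)
	}
	return best
}
```

In the real store the batches would come from paged SQLite reads rather than an in-memory iterable.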
Search Performance
Since we're not using VSS (Vector Similarity Search), search performance is inferior to Qdrant's. While no rigorous benchmarking has been conducted, the same tasks typically take around 1500ms to complete.
Pre-Submission Checklist
Screenshots / Videos
Qdrant

Sqlite

Both implementations produce identical search results.
Documentation Updates
Yes, documentation updates are required.
Additional Notes
Advantages
Disadvantages
Other
I will also submit a LanceDB-based implementation. It delivers faster search performance with comparable memory and disk usage, while requiring no Node.js upgrade. The only drawback is that it needs to dynamically download dependencies during the user's first local-mode operation.
Important
Adds support for SQLite as a vector store for code indexing, updates configurations, and modifies UI for vector store selection.
- Updates codebaseIndexConfigSchema in codebase-index.ts to include codebaseIndexVectorStoreProvider and codebaseIndexLocalVectorStoreDirectory.
- Updates ClineProvider and webviewMessageHandler to handle the new vector store configurations.
- Adds LocalVectorStore in local-vector-store.ts for handling vector storage using SQLite.
- Adds tests for LocalVectorStore in local-vector-store.spec.ts.
- Updates service-factory.ts to instantiate LocalVectorStore based on configuration.
- Updates Node.js to 22.17.1 in .nvmrc, .tool-versions, and package.json.
- Updates setup-node-pnpm/action.yml to use the new Node.js version.
- Updates CodeIndexPopover.tsx to include UI elements for selecting the vector store provider and configuring the local vector store directory.