-
Notifications
You must be signed in to change notification settings - Fork 2.5k
feat: Add support for local vector store configuration in settings #6024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
daniel-lxs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @NaccOll, I saw in the description that Node 22 is required for this implementation. That makes sense, but updating such a core dependency can have side effects in other parts of the project.
Is there any alternative approach we could explore, maybe using an external library, that wouldn't require changing the Node version?
|
I have tried it better\sqlite3 has better performance, but needs to be packaged by platform, which will greatly increase the size of the extension sql.js only needs one wasm file, about 1MB. But the performance is worse (this PR uses the RooCode code base for testing, codebase_search takes 2-3 seconds) And it will cause the database to be loaded into memory. If you want to save it, you need to save the entire database file at once. If you are concerned about upgrading NodeJS, I suggest you check out Libsql first It has better performance than Sqlite, but its problem is that the generated db file is larger (the RooCode DB file generated by this solution is 1.1g), and it will also greatly increase the size of the extension. It is currently planning to temporarily download the platform node file when the extension starts. Upgrading NodeJs brings risks. The other side of the scale is expansion size and cross-platform compatibility. |
7e03915 to
e161d57
Compare
- Upgrade Node.JS 22.17.1 - Updated WebviewMessage interface to include options for local vector store provider and directory. - Implemented synchronous function to retrieve storage path for conversations. - Enhanced CodeIndexPopover component to handle local vector store settings. - Added validation tests for new settings in CodeIndexPopover. - Updated ExtensionStateContext to initialize new settings. - Translated new settings labels and descriptions into multiple languages.
e161d57 to
5a00745
Compare
|
@daniel-lxs Please review |
|
Please see #5682 (comment) |
|
@daniel-lxs
In fact, I don't know how qdrant avoids memory exhaustion. When I start qdrant using docker, it loads all codebase indexes into memory, including projects that I have indexed but not currently opened in vscode. Because I have indexed so many projects by Qwen3-Embedding-8B(the vector size is 4096), opening qdbrantconsumes more than 10g of memory, which is unacceptable to me. |
|
@NaccOll |
|
@daniel-lxs Maybe this is a problem only with Docker on Windows, but I don't know where to analyze it. Because my goal is to use code index under the premise of limited memory size. Even if I set memmap_threshold, on_disk and other configuration items, the next time I start the computer, qdrant in docker will recover all indexes to RAM. That's why I want a local vector solution. Its biggest advantage for me is not that it works out of the box, but that it works on demand. Even if my single project index has 3G, it only consumes 3G of memory for me (actually not, because the data is on disk, although the performance is lower), instead of loading the index of other projects that I am not currently developing into memory, which will consume more than 10G of memory and bother me. Not to mention that the local solution is smaller than qdrant. I found that even if qdrant did nothing, a collection would consume about 400MB of storage space. A project that only takes 100MB in the local solution would take 650MB of storage space if using qdrant. This may be the inherent format of qdrant. I sincerely hope you will test my PR. I have implemented dynamic download (via npm) which does not increase the size of the extension, and it is not the default solution. The default solution is still qdrant. This is not to say that you must adopt my PR, but to adopt related ideas to achieve on-demand index memory loading without increasing the size of the extension. It is importantful for me. Below is my docker-compose.yaml version: "3.8"
services:
qdrant:
image: qdrant/qdrant:v1.14.1
restart: always
ports:
- 127.0.0.1:6333:6333
# environment:
# QDRANT__STORAGE__OPTIMIZERS__MEMMAP_THRESHOLD: 20000
volumes:
- qdrant_storage:/qdrant/storage
- ./qdrant/config.yaml:/qdrant/config/config.yaml
volumes:
qdrant_storage:log_level: INFO
# Logging configuration
# Qdrant logs to stdout. You may configure to also write logs to a file on disk.
# Be aware that this file may grow indefinitely.
# logger:
# # Logging format, supports `text` and `json`
# format: text
# on_disk:
# enabled: true
# log_file: path/to/log/file.log
# log_level: INFO
# # Logging format, supports `text` and `json`
# format: text
storage:
# Where to store all the data
storage_path: ./storage
# Where to store snapshots
snapshots_path: ./snapshots
snapshots_config:
# "local" or "s3" - where to store snapshots
snapshots_storage: local
# s3_config:
# bucket: ""
# region: ""
# access_key: ""
# secret_key: ""
# Where to store temporary files
# If null, temporary snapshots are stored in: storage/snapshots_temp/
temp_path: null
# If true - point payloads will not be stored in memory.
# It will be read from the disk every time it is requested.
# This setting saves RAM by (slightly) increasing the response time.
# Note: those payload values that are involved in filtering and are indexed - remain in RAM.
#
# Default: true
on_disk_payload: true
# Maximum number of concurrent updates to shard replicas
# If `null` - maximum concurrency is used.
update_concurrency: null
# Write-ahead-log related configuration
wal:
# Size of a single WAL segment
wal_capacity_mb: 32
# Number of WAL segments to create ahead of actual data requirement
wal_segments_ahead: 0
# Normal node - receives all updates and answers all queries
node_type: "Normal"
# Listener node - receives all updates, but does not answer search/read queries
# Useful for setting up a dedicated backup node
# node_type: "Listener"
performance:
# Number of parallel threads used for search operations. If 0 - auto selection.
max_search_threads: 0
# Max number of threads (jobs) for running optimizations across all collections, each thread runs one job.
# If 0 - have no limit and choose dynamically to saturate CPU.
# Note: each optimization job will also use `max_indexing_threads` threads by itself for index building.
max_optimization_threads: 0
# CPU budget, how many CPUs (threads) to allocate for an optimization job.
# If 0 - auto selection, keep 1 or more CPUs unallocated depending on CPU size
# If negative - subtract this number of CPUs from the available CPUs.
# If positive - use this exact number of CPUs.
optimizer_cpu_budget: 0
# Prevent DDoS of too many concurrent updates in distributed mode.
# One external update usually triggers multiple internal updates, which breaks internal
# timings. For example, the health check timing and consensus timing.
# If null - auto selection.
update_rate_limit: null
# Limit for number of incoming automatic shard transfers per collection on this node, does not affect user-requested transfers.
# The same value should be used on all nodes in a cluster.
# Default is to allow 1 transfer.
# If null - allow unlimited transfers.
#incoming_shard_transfers_limit: 1
# Limit for number of outgoing automatic shard transfers per collection on this node, does not affect user-requested transfers.
# The same value should be used on all nodes in a cluster.
# Default is to allow 1 transfer.
# If null - allow unlimited transfers.
#outgoing_shard_transfers_limit: 1
# Enable async scorer which uses io_uring when rescoring.
# Only supported on Linux, must be enabled in your kernel.
# See: <https://qdrant.tech/articles/io_uring/#and-what-about-qdrant>
#async_scorer: false
optimizers:
# The minimal fraction of deleted vectors in a segment, required to perform segment optimization
deleted_threshold: 0.2
# The minimal number of vectors in a segment, required to perform segment optimization
vacuum_min_vector_number: 1000
# Target amount of segments optimizer will try to keep.
# Real amount of segments may vary depending on multiple parameters:
# - Amount of stored points
# - Current write RPS
#
# It is recommended to select default number of segments as a factor of the number of search threads,
# so that each segment would be handled evenly by one of the threads.
# If `default_segment_number = 0`, will be automatically selected by the number of available CPUs
default_segment_number: 0
# Do not create segments larger this size (in KiloBytes).
# Large segments might require disproportionately long indexation times,
# therefore it makes sense to limit the size of segments.
#
# If indexation speed have more priority for your - make this parameter lower.
# If search speed is more important - make this parameter higher.
# Note: 1Kb = 1 vector of size 256
# If not set, will be automatically selected considering the number of available CPUs.
max_segment_size_kb: null
# Maximum size (in KiloBytes) of vectors to store in-memory per segment.
# Segments larger than this threshold will be stored as read-only memmapped file.
# To enable memmap storage, lower the threshold
# Note: 1Kb = 1 vector of size 256
# To explicitly disable mmap optimization, set to `0`.
# If not set, will be disabled by default.
memmap_threshold_kb: 10000
# Maximum size (in KiloBytes) of vectors allowed for plain index.
# Default value based on https://github.com/google-research/google-research/blob/master/scann/docs/algorithms.md
# Note: 1Kb = 1 vector of size 256
# To explicitly disable vector indexing, set to `0`.
# If not set, the default value will be used.
indexing_threshold_kb: 20000
# Interval between forced flushes.
flush_interval_sec: 5
# Max number of threads (jobs) for running optimizations per shard.
# Note: each optimization job will also use `max_indexing_threads` threads by itself for index building.
# If null - have no limit and choose dynamically to saturate CPU.
# If 0 - no optimization threads, optimizations will be disabled.
max_optimization_threads: null
# This section has the same options as 'optimizers' above. All values specified here will overwrite the collections
# optimizers configs regardless of the config above and the options specified at collection creation.
#optimizers_overwrite:
# deleted_threshold: 0.2
# vacuum_min_vector_number: 1000
# default_segment_number: 0
# max_segment_size_kb: null
# memmap_threshold_kb: null
# indexing_threshold_kb: 20000
# flush_interval_sec: 5
# max_optimization_threads: null
# Default parameters of HNSW Index. Could be overridden for each collection or named vector individually
hnsw_index:
# Number of edges per node in the index graph. Larger the value - more accurate the search, more space required.
m: 16
# Number of neighbours to consider during the index building. Larger the value - more accurate the search, more time required to build index.
ef_construct: 100
# Minimal size (in KiloBytes) of vectors for additional payload-based indexing.
# If payload chunk is smaller than `full_scan_threshold_kb` additional indexing won't be used -
# in this case full-scan search should be preferred by query planner and additional indexing is not required.
# Note: 1Kb = 1 vector of size 256
full_scan_threshold_kb: 10000
# Number of parallel threads used for background index building.
# If 0 - automatically select.
# Best to keep between 8 and 16 to prevent likelihood of building broken/inefficient HNSW graphs.
# On small CPUs, less threads are used.
max_indexing_threads: 0
# Store HNSW index on disk. If set to false, index will be stored in RAM. Default: false
on_disk: true
# Custom M param for hnsw graph built for payload index. If not set, default M will be used.
payload_m: null
# Default shard transfer method to use if none is defined.
# If null - don't have a shard transfer preference, choose automatically.
# If stream_records, snapshot or wal_delta - prefer this specific method.
# More info: https://qdrant.tech/documentation/guides/distributed_deployment/#shard-transfer-method
shard_transfer_method: null
# Default parameters for collections
collection:
# Number of replicas of each shard that network tries to maintain
replication_factor: 1
# How many replicas should apply the operation for us to consider it successful
write_consistency_factor: 1
# Default parameters for vectors.
vectors:
# Whether vectors should be stored in memory or on disk.
on_disk: true
# shard_number_per_node: 1
# Default quantization configuration.
# More info: https://qdrant.tech/documentation/guides/quantization
quantization: null
# Default strict mode parameters for newly created collections.
strict_mode:
# Whether strict mode is enabled for a collection or not.
enabled: false
# Max allowed `limit` parameter for all APIs that don't have their own max limit.
max_query_limit: null
# Max allowed `timeout` parameter.
max_timeout: null
# Allow usage of unindexed fields in retrieval based (eg. search) filters.
unindexed_filtering_retrieve: null
# Allow usage of unindexed fields in filtered updates (eg. delete by payload).
unindexed_filtering_update: null
# Max HNSW value allowed in search parameters.
search_max_hnsw_ef: null
# Whether exact search is allowed or not.
search_allow_exact: null
# Max oversampling value allowed in search.
search_max_oversampling: null
# Maximum number of collections allowed to be created
# If null - no limit.
max_collections: null
service:
# Maximum size of POST data in a single request in megabytes
max_request_size_mb: 32
# Number of parallel workers used for serving the api. If 0 - equal to the number of available cores.
# If missing - Same as storage.max_search_threads
max_workers: 0
# Host to bind the service on
host: 0.0.0.0
# HTTP(S) port to bind the service on
http_port: 6333
# gRPC port to bind the service on.
# If `null` - gRPC is disabled. Default: null
# Comment to disable gRPC:
grpc_port: 6334
# Enable CORS headers in REST API.
# If enabled, browsers would be allowed to query REST endpoints regardless of query origin.
# More info: https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS
# Default: true
enable_cors: true
# Enable HTTPS for the REST and gRPC API
enable_tls: false
# Check user HTTPS client certificate against CA file specified in tls config
verify_https_client_certificate: false
# Set an api-key.
# If set, all requests must include a header with the api-key.
# example header: `api-key: <API-KEY>`
#
# If you enable this you should also enable TLS.
# (Either above or via an external service like nginx.)
# Sending an api-key over an unencrypted channel is insecure.
#
# Uncomment to enable.
# api_key: your_secret_api_key_here
# Set an api-key for read-only operations.
# If set, all requests must include a header with the api-key.
# example header: `api-key: <API-KEY>`
#
# If you enable this you should also enable TLS.
# (Either above or via an external service like nginx.)
# Sending an api-key over an unencrypted channel is insecure.
#
# Uncomment to enable.
# read_only_api_key: your_secret_read_only_api_key_here
# Uncomment to enable JWT Role Based Access Control (RBAC).
# If enabled, you can generate JWT tokens with fine-grained rules for access control.
# Use generated token instead of API key.
#
# jwt_rbac: true
# Hardware reporting adds information to the API responses with a
# hint on how many resources were used to execute the request.
#
# Warning: experimental, this feature is still under development and is not supported yet.
#
# Uncomment to enable.
# hardware_reporting: true
cluster:
# Use `enabled: true` to run Qdrant in distributed deployment mode
enabled: false
# Configuration of the inter-cluster communication
p2p:
# Port for internal communication between peers
port: 6335
# Use TLS for communication between peers
enable_tls: false
# Configuration related to distributed consensus algorithm
consensus:
# How frequently peers should ping each other.
# Setting this parameter to lower value will allow consensus
# to detect disconnected nodes earlier, but too frequent
# tick period may create significant network and CPU overhead.
# We encourage you NOT to change this parameter unless you know what you are doing.
tick_period_ms: 100
# Compact consensus operations once we have this amount of applied
# operations. Allows peers to join quickly with a consensus snapshot without
# replaying a huge amount of operations.
# If 0 - disable compaction
compact_wal_entries: 128
# Set to true to prevent service from sending usage statistics to the developers.
# Read more: https://qdrant.tech/documentation/guides/telemetry
telemetry_disabled: false
# TLS configuration.
# Required if either service.enable_tls or cluster.p2p.enable_tls is true.
tls:
# Server certificate chain file
cert: ./tls/cert.pem
# Server private key file
key: ./tls/key.pem
# Certificate authority certificate file.
# This certificate will be used to validate the certificates
# presented by other nodes during inter-cluster communication.
#
# If verify_https_client_certificate is true, it will verify
# HTTPS client certificate
#
# Required if cluster.p2p.enable_tls is true.
ca_cert: ./tls/cacert.pem
# TTL in seconds to reload certificate from disk, useful for certificate rotations.
# Only works for HTTPS endpoints. Does not support gRPC (and intra-cluster communication).
# If `null` - TTL is disabled.
cert_ttl: 3600 |
Related GitHub Issue
Closes: #5682
Description
Upgrade Node.JS 22 to use the built-in
node:sqliteto implement local vector storageQdrant is still used by default, but users can switch according to their own preferences
Test Procedure
Unit Test:
src\services\code-index\vector-store_tests_\local-vector-store.spec.ts
Integration test
Pre-Submission Checklist
Screenshots / Videos
Important
Adds support for configuring a local vector store using Node.js 22's
node:sqlite, allowing users to choose between local and Qdrant storage options.node:sqlitein Node.js 22.codebaseIndexConfigSchemaincodebase-index.tsto includecodebaseIndexVectorStoreProviderandcodebaseIndexLocalVectorStoreDirectory.ClineProviderandwebviewMessageHandlerto handle new vector store settings.LocalVectorStoreclass inlocal-vector-store.tsfor handling local storage operations.getLocalVectorStoreDirectoryPathinstorage.tsfor directory management.local-vector-store.spec.tsto validate local vector store functionality.CodeIndexPopover.tsxto allow users to select vector store provider and configure local storage path..nvmrc,.tool-versions, andpackage.json.This description was created by
for 5d70de1349ed8cd93f86e8f9a847fc1711f0da62. You can customize this summary. It will automatically update as commits are pushed.