Enable exclude_source_vectors by default for new indices
#131907
This commit sets `index.mapping.exclude_source_vectors` to `true` by default for newly created indices. When enabled, vector fields (`dense_vector`, `sparse_vector`, `rank_vector`) are excluded from `_source` on disk and are not returned in API responses unless explicitly requested. The change improves indexing performance, reduces storage size, and avoids unnecessary payload bloat in responses. Vector values continue to be rehydrated transparently for partial updates, reindex, and recovery. Existing indices are not affected and continue to store vectors in `_source` by default.
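As a rough illustration of what the new default means for clients, the sketch below builds two request bodies as plain Python dicts: one that opts a new index back into storing vectors in `_source`, and one that asks for vectors in a search response. The setting name comes from this PR; the `embedding` field, its `dims`, and the per-request `_source.exclude_vectors` option are assumptions made for the example and should be checked against the release documentation.

```python
import json

# Setting name as given in this PR's description.
SETTING = "index.mapping.exclude_source_vectors"


def create_index_body(exclude_vectors: bool, dims: int = 3) -> dict:
    """Body for a create-index request with the setting set explicitly.

    The `embedding` field name and `dims` value are hypothetical.
    """
    return {
        "settings": {SETTING: exclude_vectors},
        "mappings": {
            "properties": {
                "embedding": {"type": "dense_vector", "dims": dims},
            }
        },
    }


def search_body_with_vectors() -> dict:
    """Search body that explicitly requests vectors in `_source`.

    Assumption: the per-request option is `_source.exclude_vectors`;
    this PR only says vectors are returned when "explicitly requested".
    """
    return {"query": {"match_all": {}}, "_source": {"exclude_vectors": False}}


# Opt a new index out of the new default (keep vectors in _source).
legacy_style = create_index_body(exclude_vectors=False)
print(json.dumps(legacy_style, indent=2))
```

Indices created without the setting would pick up the new default of `true`, so only callers that depend on vectors appearing in `_source` need either body.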
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Hi @jimczi, I've created a changelog YAML for you. Note that since this PR is labelled …
…ude_vector_source
…er than byte per byte if fields are synthetic
Hi @jimczi, I've updated the changelog YAML for you. Note that since this PR is labelled …
LGTM, some very minor comments
server/src/test/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldMapperTests.java
lgtm
Have we considered calling this …
That was an option we considered, but we picked the exclude route since excludes in source filtering take precedence over includes.
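The precedence rule mentioned here can be pictured with a small model. This is only a simplified sketch over flat, top-level field names with glob patterns, not Elasticsearch's actual source-filtering code; `filter_source` and the sample document are hypothetical.

```python
from fnmatch import fnmatch


def filter_source(source: dict, includes=(), excludes=()) -> dict:
    """Toy include/exclude filter where an exclude match always wins."""
    kept = {}
    for field, value in source.items():
        # An empty include list means "include everything".
        included = not includes or any(fnmatch(field, p) for p in includes)
        excluded = any(fnmatch(field, p) for p in excludes)
        if included and not excluded:
            kept[field] = value
    return kept


doc = {"title": "hello", "embedding": [0.1, 0.2, 0.3]}

# Even though "*" includes every field, the exclude on `embedding` wins.
print(filter_source(doc, includes=["*"], excludes=["embedding"]))
```

Under this model, an exclude-style switch cannot be undone by a broad include pattern, which is consistent with the reasoning given for the setting's name.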
…ude_vector_source