@jimczi
Contributor

@jimczi jimczi commented Jul 25, 2025

This commit sets `index.mapping.exclude_source_vectors` to `true` by default for newly created indices. When enabled, vector fields (`dense_vector`, `sparse_vector`, `rank_vector`) are excluded from `_source` on disk and are not returned in API responses unless explicitly requested.

The change improves indexing performance, reduces storage size, and avoids unnecessary payload bloat in responses. Vector values continue to be rehydrated transparently for partial updates, reindex, and recovery.

Existing indices are not affected and continue to store vectors in `_source` by default.
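
To make the new default concrete, here is a minimal Python sketch that only builds request bodies; nothing is sent to a cluster. The setting name comes from the description above. The helper names `create_index_body` and `search_body_with_vectors`, the field name `embedding`, and the per-request `_source.exclude_vectors` option are assumptions that do not appear in this PR description, so verify them against the documentation for your version.

```python
import json

# Setting name taken from the PR description.
EXCLUDE_SETTING = "index.mapping.exclude_source_vectors"


def create_index_body(field: str, dims: int, keep_vectors_in_source: bool = False) -> dict:
    """Build a create-index body with a single dense_vector field.

    Hypothetical helper: with keep_vectors_in_source=True the body opts out
    of the new default by setting exclude_source_vectors to false.
    """
    body = {
        "mappings": {
            "properties": {field: {"type": "dense_vector", "dims": dims}}
        }
    }
    if keep_vectors_in_source:
        body["settings"] = {EXCLUDE_SETTING: False}
    return body


def search_body_with_vectors(query: dict) -> dict:
    """Build a search body that asks for vectors to be returned in _source.

    ASSUMPTION: the `_source.exclude_vectors` request option is not named
    in the PR description; treat it as illustrative.
    """
    return {"query": query, "_source": {"exclude_vectors": False}}


if __name__ == "__main__":
    print(json.dumps(create_index_body("embedding", 3, keep_vectors_in_source=True), indent=2))
    print(json.dumps(search_body_with_vectors({"match_all": {}}), indent=2))
```

Passing `keep_vectors_in_source=True` is the opt-out path for a new index; leaving it at its default sends no setting at all and relies on the new default described above.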
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label Jul 25, 2025
@elasticsearchmachine
Collaborator

Hi @jimczi, I've created a changelog YAML for you. Note that since this PR is labelled >breaking, you need to update the changelog YAML to fill out the extended information sections.

@github-actions
Contributor

github-actions bot commented Jul 25, 2025

@jimczi jimczi removed the request for review from a team July 25, 2025 08:19
@elasticsearchmachine
Collaborator

Hi @jimczi, I've updated the changelog YAML for you. Note that since this PR is labelled >breaking, you need to update the changelog YAML to fill out the extended information sections.

Member

@kderusso kderusso left a comment

LGTM, some very minor comments

Contributor

@john-wagster john-wagster left a comment

lgtm

@pquentin
Member

Have we considered calling this `include_source_vectors` and defaulting to False? This would avoid the double negative when asking to keep them.

@jimczi
Contributor Author

jimczi commented Aug 12, 2025

> Have we considered calling this `include_source_vectors` and defaulting to False?

That option was considered, but we picked the exclude route since excludes take precedence over includes in source filtering.
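
The precedence argument can be illustrated with a small Python sketch. This is a simplified model of include/exclude filtering written for this thread, not Elasticsearch's implementation; the function name `is_returned` and the use of `fnmatch` for wildcard patterns are assumptions made for illustration.

```python
from fnmatch import fnmatch


def is_returned(field: str, includes: list[str], excludes: list[str]) -> bool:
    """Simplified model of source filtering where excludes win.

    A field is dropped as soon as any exclude pattern matches it, even if
    an include pattern matches as well. With no includes given, every
    field that is not excluded is returned.
    """
    if any(fnmatch(field, pattern) for pattern in excludes):
        return False
    return not includes or any(fnmatch(field, pattern) for pattern in includes)


if __name__ == "__main__":
    # The exclude matches, so the include on "*" does not bring the field back.
    print(is_returned("embedding", includes=["*"], excludes=["embedding"]))  # False
    print(is_returned("title", includes=["*"], excludes=["embedding"]))      # True
```

Under this model, an exclude-style flag composes predictably with user-supplied filters: once vectors are excluded, no include pattern re-adds them by accident.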

@jimczi jimczi merged commit 8036a08 into elastic:main Aug 18, 2025
34 checks passed
@jimczi jimczi deleted the exclude_vector_source branch August 18, 2025 16:18
rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Aug 18, 2025
…31907)
szybia added a commit to szybia/elasticsearch that referenced this pull request Aug 19, 2025
…improv

* upstream/main: (92 commits)
  ESQL: mark LOOKUP JOIN as ExecutesOn.Any by default (elastic#133064)
  Fix 404s in REST API landing page (elastic#133086)
  Fix release tests for OptimizerVerificationTests (elastic#133100)
  Make Glob non-recursive (elastic#132798)
  Update ES|QL function list for release versions (elastic#133096)
  Split transport version func test into abstract base (elastic#133035)
  Omit project ID from snapshot metrics (elastic#133098)
  Mute org.elasticsearch.xpack.esql.analysis.AnalyzerTests testNoDenseVectorFailsForMagnitude elastic#133013
  Mute org.elasticsearch.xpack.esql.optimizer.OptimizerVerificationTests testRemoteEnrichAfterCoordinatorOnlyPlans elastic#133015
  Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search/160_exists_query/Test exists query on _id field} elastic#133097
  Rename initial to unreferenced in transport versions (elastic#133082)
  Rename exception type header (elastic#133045)
  ESQL: Pluggable tests for Operator status (elastic#132876)
  ESQL: Mark new signatures in MIN and MAX (elastic#132980)
  Don't try to serialize half-baked cluster info (elastic#132756)
  migrate ml_rollover_legacy_indices transport version (elastic#133008)
  Enable `exclude_source_vectors` by default for new indices (elastic#131907)
  Expose APIs needed by flush during translog replay (elastic#132960)
  Change reporting_user role to leverage reserved kibana privileges (elastic#132766)
  Update TasksIT for batched execution (elastic#132762)
  ...
szybia added a commit to szybia/elasticsearch that referenced this pull request Aug 19, 2025
* upstream/main: (58 commits)
Labels

>breaking, :Search Relevance/Vectors, Team:Search Relevance, v9.2.0

5 participants