Skip to content

Conversation

@markjhoy
Copy link
Contributor

@markjhoy markjhoy commented Jun 6, 2025

Updates the SparseVectorFieldMapper type to include index options for pruning tokens and associated configuration values.

Before this update, token pruning for sparse vector types is only available via the query (see parameters for the sparse vector query ).

With this PR, by default, any new indices with a sparse_vector field type will by default have token pruning turned on (previous indices that may have had sparse_vector fields that exist before this update will still keep pruning turned off by default). Any sparse_vector queries that have explicit pruning options will still override the index defaults if they are set up.

Example:

{
  "properties": {
    "example_field": {
       "type": "sparse_vector",
        "index_options": {
          "prune": (boolean, default is `true`),
          "pruning_config": {
            "tokens_freq_ratio_threshold": (integer, range 1-100, default is 5),
            "tokens_weight_threshold": (double, range 0.0-1.0, default if 0.4)
          }
        }
     }
  }
}

@markjhoy markjhoy added >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:SearchOrg Meta label for the Search Org (Enterprise Search) :SearchOrg/Relevance Label for the Search (solution/org) Relevance team Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch Team:Search - Relevance The Search organization Search Relevance team labels Jun 10, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @markjhoy, I've created a changelog YAML for you.

@markjhoy
Copy link
Contributor Author

Note - due to the scope of the changes (and especially for the transport and index versions) - this will require a manual backport to 8.19

@markjhoy markjhoy marked this pull request as ready for review June 10, 2025 14:37
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@markjhoy
Copy link
Contributor Author

buildkite test this

@markjhoy markjhoy enabled auto-merge (squash) June 23, 2025 20:47
@markjhoy markjhoy merged commit a671505 into elastic:main Jun 23, 2025
28 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 129089

markjhoy added a commit to markjhoy/elasticsearch that referenced this pull request Jun 23, 2025
…en pruning (elastic#129089)

* Initial checkin of refactored index_options code

* [CI] Auto commit changes from spotless

* initial unit testing

* complete unit tests; add yaml tests

* [CI] Auto commit changes from spotless

* register test feature for sparse vector

* Update docs/changelog/129089.yaml

* update changelog

* add docs

* explicit set default index_options if null

* [CI] Auto commit changes from spotless

* update yaml tests; update docs

* fix yaml tests

* readd auth for teardown

* only serialize index options if not default

* [CI] Auto commit changes from spotless

* serialization refactor; pass index version around

* [CI] Auto commit changes from spotless

* fix transport versions merge

* fix up docs

* [CI] Auto commit changes from spotless

* fix docs; add include_defaults unit and yaml test

* [CI] Auto commit changes from spotless

* override getIndexReaderManager for SemanticQueryBuilderTests

* [CI] Auto commit changes from spotless

* cleanup mapper/builder/tests; index vers. in type

still need to refactor / clean YAML tests

* [CI] Auto commit changes from spotless

* cleanups to mapper tests for clarity

* [CI] Auto commit changes from spotless

* move feature into mappers; fix yaml tests

* cleanups; add comments; remove redundant test

* [CI] Auto commit changes from spotless

* escape more periods in the YAML tests

* cleanup mapper and type tests

* [CI] Auto commit changes from spotless

* rename mapping for previous index test

* set explicit number of shards for yaml test

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Co-authored-by: Kathleen DeRusso <[email protected]>
(cherry picked from commit a671505)

# Conflicts:
#	docs/reference/elasticsearch/mapping-reference/sparse-vector.md
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	server/src/main/java/org/elasticsearch/index/IndexVersions.java
#	server/src/main/java/org/elasticsearch/index/mapper/MapperFeatures.java
#	server/src/test/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldMapperTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilderTests.java
@markjhoy
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
8.19

Questions ?

Please refer to the Backport tool documentation

markjhoy added a commit that referenced this pull request Jun 24, 2025
…for token pruning (#129089) (#129890)

* Update sparse_vector field mapping to include default setting for token pruning (#129089)

* Initial checkin of refactored index_options code

* [CI] Auto commit changes from spotless

* initial unit testing

* complete unit tests; add yaml tests

* [CI] Auto commit changes from spotless

* register test feature for sparse vector

* Update docs/changelog/129089.yaml

* update changelog

* add docs

* explicit set default index_options if null

* [CI] Auto commit changes from spotless

* update yaml tests; update docs

* fix yaml tests

* readd auth for teardown

* only serialize index options if not default

* [CI] Auto commit changes from spotless

* serialization refactor; pass index version around

* [CI] Auto commit changes from spotless

* fix transport versions merge

* fix up docs

* [CI] Auto commit changes from spotless

* fix docs; add include_defaults unit and yaml test

* [CI] Auto commit changes from spotless

* override getIndexReaderManager for SemanticQueryBuilderTests

* [CI] Auto commit changes from spotless

* cleanup mapper/builder/tests; index vers. in type

still need to refactor / clean YAML tests

* [CI] Auto commit changes from spotless

* cleanups to mapper tests for clarity

* [CI] Auto commit changes from spotless

* move feature into mappers; fix yaml tests

* cleanups; add comments; remove redundant test

* [CI] Auto commit changes from spotless

* escape more periods in the YAML tests

* cleanup mapper and type tests

* [CI] Auto commit changes from spotless

* rename mapping for previous index test

* set explicit number of shards for yaml test

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Co-authored-by: Kathleen DeRusso <[email protected]>
(cherry picked from commit a671505)

# Conflicts:
#	docs/reference/elasticsearch/mapping-reference/sparse-vector.md
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	server/src/main/java/org/elasticsearch/index/IndexVersions.java
#	server/src/main/java/org/elasticsearch/index/mapper/MapperFeatures.java
#	server/src/test/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldMapperTests.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/queries/SemanticQueryBuilderTests.java

* Update changelog for version

* [CI] Auto commit changes from spotless

* Update docs to replace 9.1 with 8.19

* Rename 129089.yaml to 129890.yaml

* proper asciidocs; cleanups

* remove doc preview labels; cleanup test index ver.

* clean up docs

* add sparse vector token pruning tag

---------

Co-authored-by: elasticsearchmachine <[email protected]>
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jun 25, 2025
…en pruning (elastic#129089)

* Initial checkin of refactored index_options code

* [CI] Auto commit changes from spotless

* initial unit testing

* complete unit tests; add yaml tests

* [CI] Auto commit changes from spotless

* register test feature for sparse vector

* Update docs/changelog/129089.yaml

* update changelog

* add docs

* explicit set default index_options if null

* [CI] Auto commit changes from spotless

* update yaml tests; update docs

* fix yaml tests

* readd auth for teardown

* only serialize index options if not default

* [CI] Auto commit changes from spotless

* serialization refactor; pass index version around

* [CI] Auto commit changes from spotless

* fix transport versions merge

* fix up docs

* [CI] Auto commit changes from spotless

* fix docs; add include_defaults unit and yaml test

* [CI] Auto commit changes from spotless

* override getIndexReaderManager for SemanticQueryBuilderTests

* [CI] Auto commit changes from spotless

* cleanup mapper/builder/tests; index vers. in type

still need to refactor / clean YAML tests

* [CI] Auto commit changes from spotless

* cleanups to mapper tests for clarity

* [CI] Auto commit changes from spotless

* move feature into mappers; fix yaml tests

* cleanups; add comments; remove redundant test

* [CI] Auto commit changes from spotless

* escape more periods in the YAML tests

* cleanup mapper and type tests

* [CI] Auto commit changes from spotless

* rename mapping for previous index test

* set explicit number of shards for yaml test

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Co-authored-by: Kathleen DeRusso <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types :SearchOrg/Relevance Label for the Search (solution/org) Relevance team Team:Search - Relevance The Search organization Search Relevance team Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch Team:SearchOrg Meta label for the Search Org (Enterprise Search) test-full-bwc Trigger full BWC version matrix tests v8.19.0 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants