@mromaios
Contributor

@mromaios mromaios commented Jul 31, 2025

This PR

  • enhances SparseVectorFieldMapperTests by adding documents with tokens of varying frequency to the mock index, along with an accompanying query vector set, so that we can test the 3 pruning scenarios (NO_PRUNING, DEFAULT_PRUNING, STRICT_PRUNING) more effectively.
  • adds a case to the relevant YAML tests to cover the following pruning scenario: the token is over the frequency ratio and over the weight threshold.
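
To make the scenario in the second bullet concrete, here is a minimal, self-contained sketch of a pruning decision. It assumes (this is not stated in the PR) that a token is pruned only when it is both unusually frequent and low-weight, so a token that is over the frequency ratio and over the weight threshold is kept. The class name, method name, and the threshold values 5 and 0.4 are illustrative placeholders, not the Elasticsearch implementation.

```java
// Hypothetical sketch; names and thresholds are assumptions, not Elasticsearch code.
final class PruningSketch {
    // Assumed thresholds, chosen for illustration only.
    static final float FREQ_RATIO_THRESHOLD = 5.0f;
    static final float WEIGHT_THRESHOLD = 0.4f;

    /**
     * Returns true when the token survives pruning.
     * Assumption: a token is dropped only if it is BOTH a frequency outlier
     * and at or below the weight threshold.
     */
    static boolean shouldKeep(float tokenFreq, float averageFreq, float weight) {
        boolean overFrequencyRatio = tokenFreq > FREQ_RATIO_THRESHOLD * averageFreq;
        boolean overWeightThreshold = weight > WEIGHT_THRESHOLD;
        return overFrequencyRatio == false || overWeightThreshold;
    }

    public static void main(String[] args) {
        // Over the frequency ratio and over the weight threshold: kept.
        System.out.println(shouldKeep(100f, 10f, 0.9f));
        // Over the frequency ratio but under the weight threshold: pruned.
        System.out.println(shouldKeep(100f, 10f, 0.1f));
        // Under the frequency ratio: kept regardless of weight.
        System.out.println(shouldKeep(10f, 10f, 0.1f));
    }
}
```

Under these assumptions, the new YAML case corresponds to the first call: the token is frequent enough to be a pruning candidate, but its weight keeps it in the query.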

NO_PRUNING, // No pruning applied - all tokens preserved
DEFAULT_PRUNING, // Default pruning configuration
STRICT_PRUNING // Stricter pruning with higher thresholds
}
Contributor Author

💭 I started off with these 3 scenarios for index and query configs as well, but quickly realised that those have 2 core dimensions: prune and pruningConfig. For an index you need prune=true to set a pruningConfig, but that's not the case for queries, where the user can send any combination. This led to the creation of separate enums for query and index scenarios.
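
As a rough illustration of why the two scenario sets differ, the sketch below enumerates the (prune, pruningConfig) combinations and filters them by the rule described above. All names are hypothetical and not taken from the PR; treating an unset prune as a third value is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two dimensions; names are illustrative, not from the PR.
final class PruningScenarioSketch {
    /** One (prune, pruningConfig) combination; prune == null means "not specified" (assumed). */
    record Scenario(Boolean prune, boolean hasPruningConfig) {}

    /** Index mappings: a pruningConfig is only allowed together with prune=true. */
    static boolean validForIndex(Scenario s) {
        return s.hasPruningConfig() == false || Boolean.TRUE.equals(s.prune());
    }

    /** Queries: the user may send any combination. */
    static boolean validForQuery(Scenario s) {
        return true;
    }

    /** Every combination of the two dimensions. */
    static List<Scenario> allCombinations() {
        List<Scenario> all = new ArrayList<>();
        for (Boolean prune : new Boolean[] { null, Boolean.TRUE, Boolean.FALSE }) {
            for (boolean hasConfig : new boolean[] { false, true }) {
                all.add(new Scenario(prune, hasConfig));
            }
        }
        return all;
    }

    public static void main(String[] args) {
        long indexCount = allCombinations().stream().filter(PruningScenarioSketch::validForIndex).count();
        long queryCount = allCombinations().stream().filter(PruningScenarioSketch::validForQuery).count();
        System.out.println(indexCount + " index scenarios, " + queryCount + " query scenarios");
    }
}
```

Because the index side rejects some combinations that the query side accepts, a single shared enum would have to carry invalid cases for one of the two, which is consistent with the decision to split them.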

@mromaios mromaios marked this pull request as ready for review August 5, 2025 16:03
@elasticsearchmachine elasticsearchmachine added the needs:triage label Aug 5, 2025
@mromaios mromaios self-assigned this Aug 5, 2025
@mromaios mromaios marked this pull request as draft August 5, 2025 16:04
@mromaios mromaios added the :SearchOrg/Relevance and >test labels and removed the needs:triage label Aug 5, 2025
@mromaios mromaios marked this pull request as ready for review August 5, 2025 16:07
@elasticsearchmachine elasticsearchmachine added the Team:SearchOrg and Team:Search - Relevance labels Aug 5, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

@Mikep86 Mikep86 requested review from a team and Mikep86 August 5, 2025 16:26
@mromaios mromaios changed the title from "chore: add sparse vector pruning tests" to "refactor(test): add sparse vector pruning tests" Aug 7, 2025
Contributor

@Mikep86 Mikep86 left a comment

Nice cleanup! I left some comments about where we have opportunities to streamline even further :)

@mromaios
Contributor Author

> Nice cleanup! I left some comments about where we have opportunities to streamline even further :)

Thanks @Mikep86 🙏 , great suggestions!
Had a go at implementing them in b22eb47. Ready for another (final?🤞) look when you get the chance. Thanks!

Contributor

@Mikep86 Mikep86 left a comment

LGTM, thanks for iterating!

@Mikep86 Mikep86 added the auto-backport, v9.1.3, and v8.19.3 labels Aug 12, 2025
@Mikep86
Contributor

Mikep86 commented Aug 12, 2025

I configured the PR to backport these changes to 9.1 & 8.19. We'll need to do some adjustments to the index version checks in the 8.19 backport, but it shouldn't be too bad.

@mromaios
Contributor Author

> LGTM, thanks for iterating!

Thanks for all the support throughout!

> I configured the PR to backport these changes to 9.1 & 8.19. We'll need to do some adjustments to the index version checks in the 8.19 backport, but it shouldn't be too bad.

Sounds good!

@mromaios mromaios merged commit 6cd330c into elastic:main Aug 13, 2025
33 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
9.1 Commit could not be cherrypicked due to conflicts
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 132264

mromaios added a commit to mromaios/elasticsearch that referenced this pull request Aug 13, 2025
(cherry picked from commit 6cd330c)

# Conflicts:
#	server/src/test/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldMapperTests.java
mromaios added a commit to mromaios/elasticsearch that referenced this pull request Aug 13, 2025
(cherry picked from commit 6cd330c)

# Conflicts:
#	server/src/test/java/org/elasticsearch/index/mapper/vectors/SparseVectorFieldMapperTests.java
@mromaios
Contributor Author

💚 All backports created successfully

Status Branch Result
9.1
8.19

Questions ?

Please refer to the Backport tool documentation

@mromaios
Contributor Author

Oops we have some failures:

#132810 (comment)

Investigating

performTypeQueryFinalizationTest(mapperService, null, null, false);
}
if (shouldPrune == null) {
shouldPrune = indexVersion.onOrAfter(SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT);
Contributor

One thing I overlooked here is that we also need to handle 8.x index versions that support default token pruning. This check needs to change to something like:

shouldPrune = indexVersion.between(
    IndexVersions.SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT_BACKPORT_8_X,
    IndexVersions.UPGRADE_TO_LUCENE_10_0_0
) || indexVersion.onOrAfter(SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT);
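
For readers unfamiliar with this kind of two-window version gate, here is a small standalone sketch of the suggested check. It models index versions as plain integers with made-up ids, and it assumes the ordering "8.x backport < Lucene 10 upgrade < 9.x support" and a lower-inclusive, upper-exclusive `between`. None of the constants or helper names below are real Elasticsearch identifiers.

```java
// Hypothetical stand-in for index versions; ids are invented for illustration.
final class VersionGateSketch {
    // Assumed ordering: 8.x backport < Lucene 10 upgrade < 9.x support.
    static final int PRUNING_SUPPORT_BACKPORT_8_X = 8_500;
    static final int UPGRADE_TO_LUCENE_10 = 9_000;
    static final int PRUNING_SUPPORT = 9_100;

    /** Lower bound inclusive, upper bound exclusive (assumed semantics). */
    static boolean between(int version, int lowerInclusive, int upperExclusive) {
        return version >= lowerInclusive && version < upperExclusive;
    }

    static boolean onOrAfter(int version, int other) {
        return version >= other;
    }

    /** True when the index version falls in either supported window. */
    static boolean prunesByDefault(int indexVersion) {
        return between(indexVersion, PRUNING_SUPPORT_BACKPORT_8_X, UPGRADE_TO_LUCENE_10)
            || onOrAfter(indexVersion, PRUNING_SUPPORT);
    }

    public static void main(String[] args) {
        System.out.println(prunesByDefault(8_400)); // 8.x before the backport
        System.out.println(prunesByDefault(8_600)); // 8.x with the backport
        System.out.println(prunesByDefault(9_050)); // early 9.x, before support landed
        System.out.println(prunesByDefault(9_200)); // 9.x with support
    }
}
```

The point of the sketch is the gap between the two windows: a single `onOrAfter` check would miss the 8.x versions that carry the backport, while a single lower bound at the backport version would wrongly include early 9.x versions.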

Contributor Author

Good catch. Will add it, thanks!

Labels

auto-backport, backport pending, :SearchOrg/Relevance, Team:Search - Relevance, Team:SearchOrg, >test, v8.19.3, v9.1.3, v9.2.0

3 participants