Contributor

@iverase iverase commented Jul 16, 2025

We currently write the list of posting list offsets to the meta file, which means that array has to be held on heap when reading a segment. These arrays can get pretty big for a high number of centroids. This commit therefore proposes removing that data from the meta file and instead writing the offsets just after the centroids, together with the raw centroids.

There is no noticeable effect on performance.
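
To illustrate the idea, here is a minimal sketch of such a layout. It is not the code from this PR: the class `CentroidOffsetLayoutSketch`, its methods, and the exact byte layout (each raw centroid's floats followed directly by one `long` offset) are assumptions made for illustration, and a plain `ByteBuffer` stands in for the segment's data file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch: each raw centroid is stored with its posting list offset
// right behind it, so no separate offsets array has to live on heap.
final class CentroidOffsetLayoutSketch {

    // Bytes used by one entry: dims floats (raw centroid) + one long (offset).
    static int entryBytes(int dims) {
        return dims * Float.BYTES + Long.BYTES;
    }

    // Writes every raw centroid followed by the offset of its posting list.
    static ByteBuffer write(float[][] rawCentroids, long[] postingListOffsets) {
        int dims = rawCentroids[0].length;
        ByteBuffer buf = ByteBuffer.allocate(rawCentroids.length * entryBytes(dims))
            .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < rawCentroids.length; i++) {
            for (float v : rawCentroids[i]) {
                buf.putFloat(v);
            }
            buf.putLong(postingListOffsets[i]);
        }
        buf.flip();
        return buf;
    }

    // Reads the raw centroid for one ordinal using absolute positions.
    static float[] readRawCentroid(ByteBuffer data, int dims, int centroidOrd) {
        int base = centroidOrd * entryBytes(dims);
        float[] centroid = new float[dims];
        for (int d = 0; d < dims; d++) {
            centroid[d] = data.getFloat(base + d * Float.BYTES);
        }
        return centroid;
    }

    // Reads only the offset stored directly after that raw centroid.
    static long readPostingListOffset(ByteBuffer data, int dims, int centroidOrd) {
        return data.getLong(centroidOrd * entryBytes(dims) + dims * Float.BYTES);
    }

    public static void main(String[] args) {
        float[][] centroids = { { 1f, 2f }, { 3f, 4f }, { 5f, 6f } };
        long[] offsets = { 0L, 128L, 4096L };
        ByteBuffer data = write(centroids, offsets);
        System.out.println(Arrays.toString(readRawCentroid(data, 2, 1))); // [3.0, 4.0]
        System.out.println(readPostingListOffset(data, 2, 1));            // 128
    }
}
```

Because every entry has a fixed size, the offset for a given centroid can be located by arithmetic on its ordinal and read on demand, at the same point where the raw centroid is read.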

@iverase iverase added >non-issue :Search Relevance/Search Catch all for Search Relevance v9.2.0 labels Jul 16, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 16, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Member

@benwtrent benwtrent left a comment

Putting the offset next to the raw centroid is good. We only read the raw centroid when we quantize for scoring the posting list, and that is also when we need the offset.

+1

Contributor

@john-wagster john-wagster left a comment

lgtm

@iverase iverase merged commit b57ee3b into elastic:main Jul 16, 2025
33 checks passed
@iverase iverase deleted the postingLsitoffsets branch July 16, 2025 19:03
4 participants