Skip to content

Conversation

@martijnvg
Copy link
Member

@martijnvg martijnvg commented Aug 14, 2025

Replace int hashmap by a counter in TrackingPostingsInMemoryBytesCodec and use int hashset to keep track of seen fields.

and use int hashset to keep track of seen fields.
@martijnvg martijnvg changed the title Replace int hashmap by a counter in TrackingPostingsInMemoryBytesCodec Slightly improve TrackingPostingsInMemoryBytesCodec Aug 14, 2025
@martijnvg martijnvg marked this pull request as ready for review August 14, 2025 06:56
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Contributor

@jordan-powers jordan-powers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Martijn!

Comment on lines 112 to 114
// As far as I know only when bloom filter for _id filter gets written this method gets invoked twice for the same field.
// So maybe we can get rid of the seenFields here? And just keep track of whether _id field has been seen?
return terms;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's difficult to guarantee that we'll only ever invoke this method twice for _id fields. I think only checking the _id field would potentially open us up to some subtle bugs in the future.

Comment on lines 73 to 75
// Alternatively, we can consider using a FixedBitSet here and size to max(fieldNumber). T
// his should be faster without worrying too much about memory usage.
this.seenFields = new IntHashSet(state.fieldInfos.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's probably better to prioritize speed over memory efficiency here.

@martijnvg martijnvg enabled auto-merge (squash) August 16, 2025 03:11
@martijnvg martijnvg merged commit 6bf0a1b into elastic:main Aug 18, 2025
34 checks passed
javanna pushed a commit to javanna/elasticsearch that referenced this pull request Aug 18, 2025
Replace int hashmap by a counter in TrackingPostingsInMemoryBytesCodec
and use int hashset to keep track of seen fields.
rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Aug 18, 2025
Replace int hashmap by a counter in TrackingPostingsInMemoryBytesCodec
and use int hashset to keep track of seen fields.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants