Skip to content

Conversation

@jordan-powers
Copy link
Contributor

Applies the merge optimizations from #126499 and #126732 to binary field types for the ES819 codec.

Relates to #126111

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

meta.writeLong(offset); // docsWithFieldOffset
final short jumpTableEntryCount;
if (disiAccumulator != null) {
jumpTableEntryCount = disiAccumulator.build(data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at the place disiAccumulator should always be not null? (if numDocsWithField is not -1 or equal to max doc and valueProducer support optimized merge)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I'll remove the check

}

/** Tracks state of one binary sub-reader that we are merging */
private static class BinaryDocValuesSub extends DocIDMerger.Sub {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied from Lucene's DocValuesConsumer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, except that it returns an anonymous subclass of TsdbDocValuesProducer instead of EmptyDocValuesProducer so that it can support merge stats.

d.add(new SortedSetDocValuesField("tags", new BytesRef(tags[(i + j) % tags.length])));
}

d.add(new BinaryDocValuesField("bytes_1", new BytesRef(tags[i % tags.length])));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename bytes_1 to tags_as_bytes?

@jordan-powers jordan-powers enabled auto-merge (squash) April 24, 2025 16:27
@jordan-powers jordan-powers merged commit 69c2eda into elastic:main Apr 24, 2025
15 of 17 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

elasticsearchmachine pushed a commit that referenced this pull request Apr 24, 2025
#127346)

Applies the merge optimizations from #126499 and #126732 to binary field
types for the ES819 codec.
@jordan-powers jordan-powers deleted the optimize-merge-binary-fields branch June 12, 2025 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants