@jordan-powers
Contributor

This PR adds a mapping parameter, doc_values.cardinality, to keyword fields. When this parameter is set to low (the default), keyword fields use sorted set doc values as normal. When it is set to high, keyword fields use binary doc values instead.

This is an optimization that removes the overhead of looking up keyword values by ordinal when the keyword field has high cardinality.
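
To make the trade-off concrete, here is a minimal sketch of the two layouts. It uses plain Java collections rather than Lucene's doc-values classes, and every name in it (DocValuesLayoutSketch, OrdinalColumn, BinaryColumn) is hypothetical and not taken from this PR. It only illustrates the idea that an ordinal-based column needs an extra dictionary lookup per read, while a binary column returns the stored bytes directly.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Illustrative only: plain collections stand in for the real doc-values formats.
final class DocValuesLayoutSketch {

    // Sorted-set style: a sorted dictionary of unique terms plus one ordinal per document.
    static final class OrdinalColumn {
        private final byte[][] dictionary; // unique values, sorted
        private final int[] ordinals;      // per-document index into the dictionary

        OrdinalColumn(List<String> valuePerDoc) {
            List<String> sorted = new ArrayList<>(new TreeSet<>(valuePerDoc));
            dictionary = new byte[sorted.size()][];
            for (int i = 0; i < sorted.size(); i++) {
                dictionary[i] = sorted.get(i).getBytes(StandardCharsets.UTF_8);
            }
            ordinals = new int[valuePerDoc.size()];
            for (int doc = 0; doc < ordinals.length; doc++) {
                ordinals[doc] = Collections.binarySearch(sorted, valuePerDoc.get(doc));
            }
        }

        // Two steps per read: fetch the ordinal, then resolve it in the dictionary.
        byte[] value(int doc) {
            return dictionary[ordinals[doc]];
        }

        int dictionarySize() {
            return dictionary.length;
        }
    }

    // Binary style: each document stores its own bytes, so a read is a single step.
    static final class BinaryColumn {
        private final byte[][] values;

        BinaryColumn(List<String> valuePerDoc) {
            values = new byte[valuePerDoc.size()][];
            for (int doc = 0; doc < values.length; doc++) {
                values[doc] = valuePerDoc.get(doc).getBytes(StandardCharsets.UTF_8);
            }
        }

        byte[] value(int doc) {
            return values[doc];
        }
    }
}
```

In this toy model, a column where almost every document has a distinct value ends up with a dictionary nearly as large as the column itself, so the dictionary deduplicates very little while each read still pays for the extra hop.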

This is still a work in progress, but I am opening the draft PR to start getting CI running.

@jordan-powers jordan-powers self-assigned this Nov 25, 2025
@jordan-powers jordan-powers added the >feature and :StorageEngine/Mapping labels Nov 25, 2025
@elasticsearchmachine
Collaborator

Hi @jordan-powers, I've created a changelog YAML for you.

    if (hasValue) {
        for (int i = 0; i < sortedBinaryDocValues.docValueCount(); i++) {
            BytesRef bytesRef = sortedBinaryDocValues.nextValue();
            emit(bytesRef.utf8ToString());
Contributor

I found that the conversion from UTF-8 to UTF-16 and then back again here adds significant overhead to wildcard/regex queries.

It's worth removing this round trip, though that probably belongs in a follow-up. Here's a hacky approach I made to fix this: parkertimmins@fa13b3b#diff-cf9d201e04fb4fd754a3981f450cda5e68c551392e781a78aaa4ef8ccc48bccd

Another idea is that maybe you could use BinaryDvConfirmedQuery. It operates on BytesRefs directly, so it probably does not have this round trip. (Though I'm not really sure whether it makes sense to use here.)
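
For readers unfamiliar with the round trip being described, here is a rough sketch in plain Java. It is not the approach from the linked commit and does not use BinaryDvConfirmedQuery; the class and method names are hypothetical, and a simple byte-prefix check stands in for a wildcard/regex matcher that consumes UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: a prefix check stands in for a byte-level wildcard/regex matcher.
final class Utf8RoundTripSketch {

    // Hypothetical matcher that works directly on UTF-8 bytes.
    static boolean matchesPrefix(byte[] value, byte[] prefix) {
        if (prefix.length > value.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (value[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Round trip: stored UTF-8 bytes -> UTF-16 String -> UTF-8 bytes -> matcher.
    // Each call decodes, allocates a String, then re-encodes into a new byte array.
    static boolean matchViaString(byte[] storedUtf8, byte[] prefixUtf8) {
        String asString = new String(storedUtf8, StandardCharsets.UTF_8);
        byte[] backToUtf8 = asString.getBytes(StandardCharsets.UTF_8);
        return matchesPrefix(backToUtf8, prefixUtf8);
    }

    // Direct: hand the stored bytes straight to the matcher, with no conversion.
    static boolean matchDirect(byte[] storedUtf8, byte[] prefixUtf8) {
        return matchesPrefix(storedUtf8, prefixUtf8);
    }
}
```

For well-formed UTF-8 input both paths return the same answer; the direct path simply skips the decode, the intermediate String, and the re-encode for every value.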

Contributor Author

I'll look into that, thanks! Although if it takes more than a couple of lines to fix, I think it's best left as a follow-up. This PR is getting long enough as-is.

Labels

>feature, :StorageEngine/Mapping, v9.3.0

3 participants