-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Push compute engine value loading for dense singleton ordinals down to tsdb codec. #132715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Push compute engine value loading for dense singleton ordinals down to tsdb codec. #132715
Conversation
| minOrd = Math.min(minOrd, ord); | ||
| maxOrd = Math.max(maxOrd, ord); | ||
| } | ||
| builder.appendOrds(convertedOrds, 0, length, minOrd, maxOrd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The builder is backed by a int[] so we need to do a conversion here, because TSDBDocValuesEncoder is long[] based. But is there a better way? For example can we have builder that accepts long[]? And then in the build() do the conversion to int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to do this conversation somewhere. Maybe here is good enough.
| valuesData.seek(indexReader.get(blockIndex)); | ||
| } | ||
| currentBlockIndex = blockIndex; | ||
| decoder.decodeOrdinals(valuesData, currentBlock, bitsPerOrd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is largely the same as the read(...) method except here (decoder.decodeOrdinals(valuesData, currentBlock, bitsPerOrd);) and at the end because the builder is different.
Ideally would like to see less code duplication here.
| } | ||
|
|
||
| @Override | ||
| public BlockLoader.Block read(BlockLoader.BlockFactory factory, BlockLoader.Docs docs, int offset) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ordinals are implemented in the codec as numeric doc values. Builder gets created here, since we need to reference a SortedDocValues instance.
c995f5e to
1a0e927
Compare
|
|
||
| @Override | ||
| public boolean supportsBlockRead() { | ||
| return ords instanceof BulkNumericDocValues; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on whether the field is empty, has a single unique value, sparse or dense we got another implementation here. Only the dense implementation support bulk loading, but that is why this supportsBlockRead() method exists.
…o tsdb codec. This change targets reading field values in bulk mode at codec level when doc values type is sorted doc values or sorted set doc values, there is only one value per document, and the field is dense (all documents have a value). Relates to elastic#128445
1a0e927 to
f7ff379
Compare
|
I've locally benchmarking with the following query: When running the following profile query: Without this change host.name value loading takes: Wit this change host.name value loading takes: |
With this change both sorted set and number doc values use the same bulk loading for values/ordinals. This PR supersedes (#132715) that also sped up loading dense singleton keyword fields, but duplicated the bulk encoding logic.
With this change both sorted set and number doc values use the same bulk loading for values/ordinals. This PR supersedes (elastic#132715) that also sped up loading dense singleton keyword fields, but duplicated the bulk encoding logic.
WIP Note: no tests yet, exploring how to implement bulk loading of dense singleton sorted (set) doc values.
This change targets reading field values in bulk mode at codec level when doc values type is sorted doc values or sorted set doc values, there is only one value per document, and the field is dense (all documents have a value).
Relates to #128445