Push compute engine value loading for dense singleton ordinals down to tsdb codec. #132715

martijnvg · 2025-08-12T09:45:08Z

WIP Note: no tests yet, exploring how to implement bulk loading of dense singleton sorted (set) doc values.

This change targets reading field values in bulk at the codec level when all of the following hold: the doc values type is sorted doc values or sorted set doc values, there is exactly one value per document, and the field is dense (every document has a value).

Relates to #128445

martijnvg · 2025-08-12T09:51:32Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java

+                            minOrd = Math.min(minOrd, ord);
+                            maxOrd = Math.max(maxOrd, ord);
+                        }
+                        builder.appendOrds(convertedOrds, 0, length, minOrd, maxOrd);


The builder is backed by an int[], so we need a conversion here because TSDBDocValuesEncoder is long[] based. But is there a better way? For example, could we have a builder that accepts long[] and does the conversion to int in build()?

I think we need to do this conversion somewhere. Maybe here is good enough.
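As a rough illustration of the conversion discussed above, the sketch below narrows a decoded long[] block to int[] while tracking the minimum and maximum ordinal in the same pass. The class, record, and method names are hypothetical and not part of the PR; the use of Math.toIntExact to fail loudly on overflow is an assumption, not something the diff shows.

```java
public class OrdinalConversionSketch {
    /** Hypothetical holder: the narrowed ordinals plus the observed ordinal range. */
    public record ConvertedOrds(int[] ords, int minOrd, int maxOrd) {}

    /**
     * Narrows the first {@code length} entries of a long[] block to int[],
     * recording the smallest and largest ordinal seen along the way.
     */
    public static ConvertedOrds convert(long[] block, int length) {
        int[] converted = new int[length];
        int minOrd = Integer.MAX_VALUE;
        int maxOrd = Integer.MIN_VALUE;
        for (int i = 0; i < length; i++) {
            // Assumption: ordinals fit in an int; toIntExact throws ArithmeticException otherwise.
            int ord = Math.toIntExact(block[i]);
            converted[i] = ord;
            minOrd = Math.min(minOrd, ord);
            maxOrd = Math.max(maxOrd, ord);
        }
        return new ConvertedOrds(converted, minOrd, maxOrd);
    }
}
```

Doing the narrowing and the min/max tracking in one loop avoids a second pass over the block, which is the same shape as the loop in the diff above.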

martijnvg · 2025-08-12T09:52:47Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java

+                                valuesData.seek(indexReader.get(blockIndex));
+                            }
+                            currentBlockIndex = blockIndex;
+                            decoder.decodeOrdinals(valuesData, currentBlock, bitsPerOrd);


This method is largely the same as the read(...) method, except for this call (decoder.decodeOrdinals(valuesData, currentBlock, bitsPerOrd);) and the end of the method, where the builder is different.

Ideally I would like to see less code duplication here.
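One possible way to reduce the duplication, sketched under heavy assumptions: keep a single block-walking loop and pass in the two parts that differ (the decode step and the sink that feeds the builder). Everything here is a toy stand-in; BlockDecoder, readDocs, copyDecoder, and BLOCK_SIZE are hypothetical names, and plain arrays replace the codec's inputs.

```java
import java.util.function.LongConsumer;

public class SharedBlockLoopSketch {
    // Hypothetical block size for the toy example; the real codec defines its own.
    static final int BLOCK_SIZE = 4;

    /** The one step that differs between the two read paths. */
    @FunctionalInterface
    public interface BlockDecoder {
        void decode(long[] encoded, int blockIndex, long[] currentBlock);
    }

    /** Shared loop: decode a block only when the doc moves into a new block, then emit the value. */
    public static void readDocs(long[] encoded, int[] docs, BlockDecoder decoder, LongConsumer sink) {
        long[] currentBlock = new long[BLOCK_SIZE];
        int currentBlockIndex = -1;
        for (int doc : docs) {
            int blockIndex = doc / BLOCK_SIZE;
            if (blockIndex != currentBlockIndex) {
                decoder.decode(encoded, blockIndex, currentBlock);
                currentBlockIndex = blockIndex;
            }
            sink.accept(currentBlock[doc % BLOCK_SIZE]);
        }
    }

    /** Toy decoder that just copies the block; a real one would unpack encoded values. */
    public static void copyDecoder(long[] encoded, int blockIndex, long[] currentBlock) {
        int start = blockIndex * BLOCK_SIZE;
        int n = Math.min(BLOCK_SIZE, encoded.length - start);
        System.arraycopy(encoded, start, currentBlock, 0, n);
    }
}
```

With this shape, the values path and the ordinals path would each supply their own decoder and sink while sharing the seek-and-iterate logic. Whether the extra indirection is acceptable on this hot path is an open question.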

martijnvg · 2025-08-12T09:54:34Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java

            }
+
+            @Override
+            public BlockLoader.Block read(BlockLoader.BlockFactory factory, BlockLoader.Docs docs, int offset) throws IOException {


Ordinals are implemented in the codec as numeric doc values. The builder gets created here, since we need to reference a SortedDocValues instance.

martijnvg · 2025-08-12T09:57:21Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java

+
+            @Override
+            public boolean supportsBlockRead() {
+                return ords instanceof BulkNumericDocValues;


Depending on whether the field is empty, has a single unique value, is sparse, or is dense, we get a different implementation here. Only the dense implementation supports bulk loading, which is why this supportsBlockRead() method exists.
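The capability check described above can be illustrated with a small stand-alone sketch: callers ask whether the underlying reader supports bulk reads and fall back to a per-document loop otherwise. OrdinalSource and BulkOrdinalSource are hypothetical stand-ins for the codec's types, not the actual interfaces in the PR.

```java
public class BulkCapabilitySketch {
    /** Hypothetical per-document reader, available for every field shape. */
    public interface OrdinalSource {
        int ordAt(int doc);
    }

    /** Hypothetical bulk-capable reader, only implemented for the dense case. */
    public interface BulkOrdinalSource extends OrdinalSource {
        void readRange(int firstDoc, int[] dest);
    }

    /** Mirrors the instanceof-based check in the diff above. */
    public static boolean supportsBlockRead(OrdinalSource ords) {
        return ords instanceof BulkOrdinalSource;
    }

    /** Uses the bulk path when available, otherwise reads one document at a time. */
    public static int[] load(OrdinalSource ords, int firstDoc, int count) {
        int[] dest = new int[count];
        if (ords instanceof BulkOrdinalSource) {
            ((BulkOrdinalSource) ords).readRange(firstDoc, dest);
        } else {
            for (int i = 0; i < count; i++) {
                dest[i] = ords.ordAt(firstDoc + i);
            }
        }
        return dest;
    }
}
```

Keeping the check in one method means the empty, single-value, and sparse implementations need no changes; they simply never implement the bulk interface.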


martijnvg · 2025-08-14T15:14:46Z

I've been benchmarking locally with the following query: FROM metrics-hostmetricsreceiver.otel-default | STATS count(*) BY host.name | LIMIT 10000. Without this change the total query time is ~260ms; with this change it is ~150ms.

When running the following profile query:

POST /_query
{
    "profile": true,
    "query": "FROM metrics-hostmetricsreceiver.otel-default | STATS count(*) BY host.name | LIMIT 10000",
    "pragma": {
        "data_partitioning": "shard"
    }
}

Without this change host.name value loading takes:

{
    "operator": "ValuesSourceReaderOperator[fields = [host.name]]",
    "status": {
        "readers_built": {
            "host.name:column_at_a_time:BlockDocValuesReader.SingletonOrdinals": 58
        },
        "values_loaded": 221184000,
        "process_nanos": 1216242904, <-- ~1216 ms
        "pages_received": 45597,
        "pages_emitted": 45597,
        "rows_received": 221184000,
        "rows_emitted": 221184000
    }
}

With this change host.name value loading takes:

{
    "operator": "ValuesSourceReaderOperator[fields = [host.name]]",
    "status": {
        "readers_built": {
            "host.name:column_at_a_time:BlockDocValuesReader.SingletonOrdinals": 58
        },
        "values_loaded": 221184000,
        "process_nanos": 387031891, <-- 387 ms
        "pages_received": 45597,
        "pages_emitted": 45597,
        "rows_received": 221184000,
        "rows_emitted": 221184000
    }
}
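To put the two profiles side by side, the small sketch below derives the operator-level speedup and the cost per loaded value from the process_nanos and values_loaded figures quoted above. The class and method names are illustrative only. With these inputs the ratio works out to roughly 3.1x, and the per-value cost drops from about 5.5 ns to about 1.75 ns.

```java
public class ProfileSpeedupSketch {
    /** Ratio of time spent before the change to time spent after it. */
    public static double speedup(long beforeNanos, long afterNanos) {
        return (double) beforeNanos / afterNanos;
    }

    /** Average nanoseconds spent per loaded value. */
    public static double nanosPerValue(long processNanos, long valuesLoaded) {
        return (double) processNanos / valuesLoaded;
    }

    public static void main(String[] args) {
        long before = 1_216_242_904L;      // process_nanos without the change
        long after = 387_031_891L;         // process_nanos with the change
        long valuesLoaded = 221_184_000L;  // values_loaded, identical in both runs

        System.out.println("speedup: " + speedup(before, after));
        System.out.println("ns/value before: " + nanosPerValue(before, valuesLoaded));
        System.out.println("ns/value after: " + nanosPerValue(after, valuesLoaded));
    }
}
```

The operator-level gain is larger than the end-to-end gain (~260ms to ~150ms) because value loading is only one part of the total query time.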

With this change both sorted set and number doc values use the same bulk loading for values/ordinals. This PR supersedes (#132715) that also sped up loading dense singleton keyword fields, but duplicated the bulk encoding logic.


martijnvg added :Analytics/Compute Engine Analytics in ES|QL :StorageEngine/Codec labels Aug 12, 2025

elasticsearchmachine added the v9.2.0 label Aug 12, 2025


martijnvg force-pushed the compute_engine_improve_loading_dense_sorted_doc_values branch from c995f5e to 1a0e927 Compare August 12, 2025 09:55


martijnvg mentioned this pull request Aug 12, 2025

Bulk doc value loading at codec level #128445

Closed

6 tasks

martijnvg added 2 commits August 14, 2025 14:52

Specialize SingletonOrdinalsBuilder to handle dense cases.

f7ff379

martijnvg force-pushed the compute_engine_improve_loading_dense_sorted_doc_values branch from 1a0e927 to f7ff379 Compare August 14, 2025 12:03

martijnvg closed this Aug 15, 2025

martijnvg mentioned this pull request Aug 16, 2025

Speed up loading dense singleton keyword fields #132994

Merged
