Skip to content

Conversation

@kkrik-es
Copy link
Contributor

@kkrik-es kkrik-es commented Aug 22, 2025

Values are first loaded as longs and then converted to doubles in bulk and copied to an array that's wrapped in a block.

@kkrik-es kkrik-es self-assigned this Aug 22, 2025
@kkrik-es kkrik-es added >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/Compute Engine Analytics in ES|QL Team:StorageEngine :StorageEngine/Codec labels Aug 22, 2025
@kkrik-es kkrik-es requested review from dnhatn and martijnvg August 22, 2025 13:46
@elasticsearchmachine
Copy link
Collaborator

Hi @kkrik-es, I've created a changelog YAML for you.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kostas! I think this approach would significantly reduce the complexity in the codec :). I hope there is no performance degradation. The profiler may lie us a bit when a small part of the value reading cost is shifted to aggregation or other computations, but not a big deal.

@Override
public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset, boolean nullsFiltered) throws IOException {
if (docValues instanceof BlockLoader.OptionalColumnAtATimeReader direct) {
BlockLoader.Block result = direct.tryReadDoubles(factory, docs, offset, toDouble);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have an OverLong vector or block, I think we don't need to introduce tryReadDoubles. Instead, we can just read longs from the codec and perform the conversion here. This would significantly reduce the complexity in the codec. We can add a method to BlockLoaderFactory that converts a block to doubles with a ToDouble function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good way to convert a BlockLoader.Block to a DoubleVector or similar? I think I tried this approach first but hit this incompatibility issue.

private final long[] values;
private final BlockUtils.ToDouble toDouble;

DoubleOverLongArrayVector(long[] values, BlockUtils.ToDouble toDouble, int positionCount, BlockFactory blockFactory) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be DoubleOverLongVector, where we delegate to a LongVector instead?

* Specialized builder for collecting dense arrays of long values.
*/
interface SingletonLongBuilder extends Builder {
void setToDouble(BlockDocValuesReader.ToDouble toDouble);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding a new method to BlockFactory: toDoubles(SingletonLongBuilder longBuilder, BlockDocValuesReader.ToDouble toDouble) would be more manageable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was thinking about this initially, but it would require creating a subclass of DoubleArrayVector wrapping the created LongArrayVector, but that's marked as final. is there a better approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that will be DoubleOverLongVector.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kostas!

I wonder whether we can simplify. Maybe we can encapsulate this change more (without changing X-Vector.java and friends), if we take a similar approach to what was done for singleton ordinals (SingletonLongToSingletonOrdinalDelegate). By adding FromSingletonLongToDoubleDelegate and adding a SingletonDoubleBuilder (to lucene.read package) that converts the longs on the fly to doubles. Additionally this would avoid adding setToDouble(...) to SingletonLongBuilder.

@kkrik-es
Copy link
Contributor Author

kkrik-es commented Aug 25, 2025

Thanks Kostas!

I wonder whether we can simplify. Maybe we can encapsulate this change more (without changing X-Vector.java and friends), if we take a similar approach to what was done for singleton ordinals (SingletonLongToSingletonOrdinalDelegate). By adding FromSingletonLongToDoubleDelegate and adding a SingletonDoubleBuilder (to lucene.read package) that converts the longs on the fly to doubles. Additionally this would avoid adding setToDouble(...) to SingletonLongBuilder.

My problem with this solution is how to wire this in BlockDocValuesReader. tryRead returns BlockLoader.Block, I'm missing how to convert this to anything resembling longs.

@martijnvg
Copy link
Member

martijnvg commented Aug 25, 2025

I think we can just pass down BlockDocValuesReader.ToDouble as a parameter to OptionalColumnAtATimeReader#tryRead(...). If toDouble is not null we use it via the single long delegate builder, in the other case it just a dense whole number field. It would look something like this:

Subject: [PATCH] patch
---
Index: server/src/main/java/org/elasticsearch/index/mapper/BlockLoader.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/server/src/main/java/org/elasticsearch/index/mapper/BlockLoader.java b/server/src/main/java/org/elasticsearch/index/mapper/BlockLoader.java
--- a/server/src/main/java/org/elasticsearch/index/mapper/BlockLoader.java	(revision 4e9af1088a0f97cb968af09d23eae1916ee94edb)
+++ b/server/src/main/java/org/elasticsearch/index/mapper/BlockLoader.java	(date 1756149980151)
@@ -67,7 +67,7 @@
          * Returns {@code null} if unable to load the values.
          */
         @Nullable
-        BlockLoader.Block tryRead(BlockFactory factory, Docs docs, int offset) throws IOException;
+        BlockLoader.Block tryRead(BlockFactory factory, Docs docs, int offset, BlockDocValuesReader.ToDouble toDouble) throws IOException;
     }
 
     interface RowStrideReader extends Reader {
@@ -456,6 +456,8 @@
          */
         SingletonOrdinalsBuilder singletonOrdinalsBuilder(SortedDocValues ordinals, int count, boolean isDense);
 
+        SingletonDoubleBuilder singletonDoubleBuilder(int expectedCount);
+
         /**
          * Build a reader for reading {@link SortedSetDocValues}
          */
@@ -559,6 +561,15 @@
         SingletonOrdinalsBuilder appendOrds(int[] values, int from, int length, int minOrd, int maxOrd);
     }
 
+    interface SingletonDoubleBuilder extends Builder {
+        /**
+         * Appends an ordinal to the builder.
+         */
+        SingletonOrdinalsBuilder appendDouble(double value);
+
+        SingletonOrdinalsBuilder appendDoubles(double[] values, int from, int length);
+    }
+
     interface SortedSetOrdinalsBuilder extends Builder {
         /**
          * Appends an ordinal to the builder.
Index: server/src/main/java/org/elasticsearch/index/mapper/BlockDocValuesReader.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/server/src/main/java/org/elasticsearch/index/mapper/BlockDocValuesReader.java b/server/src/main/java/org/elasticsearch/index/mapper/BlockDocValuesReader.java
--- a/server/src/main/java/org/elasticsearch/index/mapper/BlockDocValuesReader.java	(revision 4e9af1088a0f97cb968af09d23eae1916ee94edb)
+++ b/server/src/main/java/org/elasticsearch/index/mapper/BlockDocValuesReader.java	(date 1756149504440)
@@ -132,7 +132,7 @@
         @Override
         public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset, boolean nullsFiltered) throws IOException {
             if (numericDocValues instanceof BlockLoader.OptionalColumnAtATimeReader direct) {
-                BlockLoader.Block result = direct.tryRead(factory, docs, offset);
+                BlockLoader.Block result = direct.tryRead(factory, docs, offset, toDouble);
                 if (result != null) {
                     return result;
                 }
@@ -398,6 +398,12 @@
 
         @Override
         public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset, boolean nullsFiltered) throws IOException {
+            if (docValues instanceof BlockLoader.OptionalColumnAtATimeReader direct) {
+                BlockLoader.Block result = direct.tryRead(factory, docs, offset, toDouble);
+                if (result != null) {
+                    return result;
+                }
+            }
             try (BlockLoader.DoubleBuilder builder = factory.doublesFromDocValues(docs.count() - offset)) {
                 for (int i = offset; i < docs.count(); i++) {
                     int doc = docs.get(i);
@@ -715,7 +721,7 @@
                 return readSingleDoc(factory, docs.get(offset));
             }
             if (ordinals instanceof BlockLoader.OptionalColumnAtATimeReader direct) {
-                BlockLoader.Block block = direct.tryRead(factory, docs, offset);
+                BlockLoader.Block block = direct.tryRead(factory, docs, offset, toDouble);
                 if (block != null) {
                     return block;
                 }
Index: server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java b/server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java
--- a/server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java	(revision 4e9af1088a0f97cb968af09d23eae1916ee94edb)
+++ b/server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java	(date 1756149980146)
@@ -45,6 +45,7 @@
 import org.apache.lucene.util.packed.PackedInts;
 import org.elasticsearch.core.IOUtils;
 import org.elasticsearch.index.codec.tsdb.TSDBDocValuesEncoder;
+import org.elasticsearch.index.mapper.BlockDocValuesReader;
 import org.elasticsearch.index.mapper.BlockLoader;
 
 import java.io.IOException;
@@ -383,7 +384,12 @@
             }
 
             @Override
-            public BlockLoader.Block tryRead(BlockLoader.BlockFactory factory, BlockLoader.Docs docs, int offset) throws IOException {
+            public BlockLoader.Block tryRead(
+                BlockLoader.BlockFactory factory,
+                BlockLoader.Docs docs,
+                int offset,
+                BlockDocValuesReader.ToDouble toDouble
+            ) throws IOException {
                 if (ords instanceof BaseDenseNumericValues denseOrds) {
                     var block = tryReadAHead(factory, docs, offset);
                     if (block != null) {
@@ -457,7 +463,12 @@
         }
 
         @Override
-        public BlockLoader.Block tryRead(BlockLoader.BlockFactory factory, BlockLoader.Docs docs, int offset) throws IOException {
+        public BlockLoader.Block tryRead(
+            BlockLoader.BlockFactory factory,
+            BlockLoader.Docs docs,
+            int offset,
+            BlockDocValuesReader.ToDouble toDouble
+        ) throws IOException {
             return null;
         }
 
@@ -504,7 +515,12 @@
         }
 
         @Override
-        public BlockLoader.Block tryRead(BlockLoader.BlockFactory factory, BlockLoader.Docs docs, int offset) throws IOException {
+        public BlockLoader.Block tryRead(
+            BlockLoader.BlockFactory factory,
+            BlockLoader.Docs docs,
+            int offset,
+            BlockDocValuesReader.ToDouble toDouble
+        ) throws IOException {
             return null;
         }
 
@@ -1365,9 +1381,24 @@
                 }
 
                 @Override
-                public BlockLoader.Block tryRead(BlockLoader.BlockFactory factory, BlockLoader.Docs docs, int offset) throws IOException {
-                    try (BlockLoader.SingletonLongBuilder builder = factory.singletonLongs(docs.count() - offset)) {
-                        return tryRead(builder, docs, offset);
+                public BlockLoader.Block tryRead(
+                    BlockLoader.BlockFactory factory,
+                    BlockLoader.Docs docs,
+                    int offset,
+                    BlockDocValuesReader.ToDouble toDouble
+                ) throws IOException {
+                    if (toDouble != null) {
+                        try (var builder = factory.singletonDoubleBuilder(docs.count() - offset)) {
+                            BlockLoader.SingletonLongBuilder delegate = new SingletonLongToSingletonDoubleDelegate(toDouble, builder);
+                            var result = tryRead(delegate, docs, offset);
+                            if (result != null) {
+                                return result;
+                            }
+                        }
+                    } else {
+                        try (BlockLoader.SingletonLongBuilder builder = factory.singletonLongs(docs.count() - offset)) {
+                            return tryRead(builder, docs, offset);
+                        }
                     }
                 }
 
@@ -1747,6 +1778,57 @@
             builder.appendOrds(convertedOrds, 0, length, minOrd, maxOrd);
             return this;
         }
+
+        @Override
+        public BlockLoader.Block build() {
+            return builder.build();
+        }
+
+        @Override
+        public BlockLoader.Builder appendNull() {
+            throw new UnsupportedOperationException();
+        }
+
+        @Override
+        public BlockLoader.Builder beginPositionEntry() {
+            throw new UnsupportedOperationException();
+        }
+
+        @Override
+        public BlockLoader.Builder endPositionEntry() {
+            throw new UnsupportedOperationException();
+        }
+
+        @Override
+        public void close() {}
+    }
+
+    static final class SingletonLongToSingletonDoubleDelegate implements BlockLoader.SingletonLongBuilder {
+        private final BlockDocValuesReader.ToDouble toDouble;
+        private final BlockLoader.SingletonDoubleBuilder builder;
+
+        SingletonLongToSingletonDoubleDelegate(BlockDocValuesReader.ToDouble toDouble, BlockLoader.SingletonDoubleBuilder builder) {
+            this.toDouble = toDouble;
+            this.builder = builder;
+        }
+
+        @Override
+        public BlockLoader.SingletonLongBuilder appendLong(long value) {
+            throw new UnsupportedOperationException();
+        }
+
+        @Override
+        public BlockLoader.SingletonLongBuilder appendLongs(long[] values, int from, int length) {
+            int counter = 0;
+            double[] convertedValues = new double[length];
+            int end = from + length;
+            for (int j = from; j < end; j++) {
+                double ord = toDouble.convert(values[j]);
+                convertedValues[counter++] = ord;
+            }
+            builder.appendDoubles(convertedValues, 0, length);
+            return this;
+        }
 
         @Override
         public BlockLoader.Block build() {

@kkrik-es
Copy link
Contributor Author

kkrik-es commented Aug 26, 2025

I think we can just pass down BlockDocValuesReader.ToDouble as a parameter to OptionalColumnAtATimeReader#tryRead(...). If toDouble is not null we use it via the single long delegate builder, in the other case it just a dense whole number field. It would look something like this:

 +
 +        @Override
 +        public BlockLoader.SingletonLongBuilder appendLongs(long[] values, int from, int length) {
 +            int counter = 0;
 +            double[] convertedValues = new double[length];
 +            int end = from + length;
 +            for (int j = from; j < end; j++) {
 +                double ord = toDouble.convert(values[j]);
 +                convertedValues[counter++] = ord;
 +            }
 +            builder.appendDoubles(convertedValues, 0, length);
 +            return this;
 +        }
  
          @Override
          public BlockLoader.Block build() {

This looks pretty similar to my approach, but it requires two arrays (one long[], one double[]), whereas my approach performs the conversion on the fly. Still, lemme see if I can avoid polluting SingletonLongBuilder - which is the part that you don't like much, iiuc.

@martijnvg
Copy link
Member

This looks pretty similar to my approach, but it requires two arrays (one long[], one double[]), whereas my approach performs the conversion on the fly.

In the proposed approach it just create one large double[]. The there is just a small double[] buffer array in SingletonLongToSingletonDoubleDelegate that converts values from the block buffer to double, that in turn get array copied into the larger double[]. When benchmarking this approach for dense keywords, it wasn't slower compared to lazy converting on the fly to ints like is done in this PR.

@kkrik-es
Copy link
Contributor Author

Updated, ptal. If this looks reasonable, I'll add tests and it ready.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and I think we can remove the draft status from this PR!

Regarding testing maybe overwrite supportsBulkBlockReading() in ScaledFloatFieldMapperTests, DoubleFieldMapperTests and other floating point based field mapper tests? This way the testSingletonLongBulkBlockReading() get executed for these field types.

@kkrik-es kkrik-es marked this pull request as ready for review August 26, 2025 17:53
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Kostas!

int offset,
BlockDocValuesReader.ToDouble toDouble
) throws IOException {
if (ords instanceof BaseDenseNumericValues denseOrds) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assert toDouble is null here?

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kostas! LGTM 👍

@kkrik-es kkrik-es merged commit 80d730e into elastic:main Aug 27, 2025
33 checks passed
@kkrik-es kkrik-es deleted the tsdb/bulk-load-float branch September 29, 2025 05:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants