-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Push down loading of singleton dense double based field types to the … #133397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…es819 doc value codec
|
Hi @kkrik-es, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kostas! I think this approach would significantly reduce the complexity in the codec :). I hope there is no performance degradation. The profiler may lie us a bit when a small part of the value reading cost is shifted to aggregation or other computations, but not a big deal.
| @Override | ||
| public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset, boolean nullsFiltered) throws IOException { | ||
| if (docValues instanceof BlockLoader.OptionalColumnAtATimeReader direct) { | ||
| BlockLoader.Block result = direct.tryReadDoubles(factory, docs, offset, toDouble); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have an OverLong vector or block, I think we don't need to introduce tryReadDoubles. Instead, we can just read longs from the codec and perform the conversion here. This would significantly reduce the complexity in the codec. We can add a method to BlockLoaderFactory that converts a block to doubles with a ToDouble function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a good way to convert a BlockLoader.Block to a DoubleVector or similar? I think I tried this approach first but hit this incompatibility issue.
| private final long[] values; | ||
| private final BlockUtils.ToDouble toDouble; | ||
|
|
||
| DoubleOverLongArrayVector(long[] values, BlockUtils.ToDouble toDouble, int positionCount, BlockFactory blockFactory) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be DoubleOverLongVector, where we delegate to a LongVector instead?
| * Specialized builder for collecting dense arrays of long values. | ||
| */ | ||
| interface SingletonLongBuilder extends Builder { | ||
| void setToDouble(BlockDocValuesReader.ToDouble toDouble); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a new method to BlockFactory: toDoubles(SingletonLongBuilder longBuilder, BlockDocValuesReader.ToDouble toDouble) would be more manageable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was thinking about this initially, but it would require creating a subclass of DoubleArrayVector wrapping the created LongArrayVector, but that's marked as final. is there a better approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that will be DoubleOverLongVector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kostas!
I wonder whether we can simplify. Maybe we can encapsulate this change more (without changing X-Vector.java and friends), if we take a similar approach to what was done for singleton ordinals (SingletonLongToSingletonOrdinalDelegate). By adding FromSingletonLongToDoubleDelegate and adding a SingletonDoubleBuilder (to lucene.read package) that converts the longs on the fly to doubles. Additionally this would avoid adding setToDouble(...) to SingletonLongBuilder.
My problem with this solution is how to wire this in |
|
I think we can just pass down Subject: [PATCH] patch
---
Index: server/src/main/java/org/elasticsearch/index/mapper/BlockLoader.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/server/src/main/java/org/elasticsearch/index/mapper/BlockLoader.java b/server/src/main/java/org/elasticsearch/index/mapper/BlockLoader.java
--- a/server/src/main/java/org/elasticsearch/index/mapper/BlockLoader.java (revision 4e9af1088a0f97cb968af09d23eae1916ee94edb)
+++ b/server/src/main/java/org/elasticsearch/index/mapper/BlockLoader.java (date 1756149980151)
@@ -67,7 +67,7 @@
* Returns {@code null} if unable to load the values.
*/
@Nullable
- BlockLoader.Block tryRead(BlockFactory factory, Docs docs, int offset) throws IOException;
+ BlockLoader.Block tryRead(BlockFactory factory, Docs docs, int offset, BlockDocValuesReader.ToDouble toDouble) throws IOException;
}
interface RowStrideReader extends Reader {
@@ -456,6 +456,8 @@
*/
SingletonOrdinalsBuilder singletonOrdinalsBuilder(SortedDocValues ordinals, int count, boolean isDense);
+ SingletonDoubleBuilder singletonDoubleBuilder(int expectedCount);
+
/**
* Build a reader for reading {@link SortedSetDocValues}
*/
@@ -559,6 +561,15 @@
SingletonOrdinalsBuilder appendOrds(int[] values, int from, int length, int minOrd, int maxOrd);
}
+ interface SingletonDoubleBuilder extends Builder {
+ /**
+ * Appends an ordinal to the builder.
+ */
+ SingletonOrdinalsBuilder appendDouble(double value);
+
+ SingletonOrdinalsBuilder appendDoubles(double[] values, int from, int length);
+ }
+
interface SortedSetOrdinalsBuilder extends Builder {
/**
* Appends an ordinal to the builder.
Index: server/src/main/java/org/elasticsearch/index/mapper/BlockDocValuesReader.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/server/src/main/java/org/elasticsearch/index/mapper/BlockDocValuesReader.java b/server/src/main/java/org/elasticsearch/index/mapper/BlockDocValuesReader.java
--- a/server/src/main/java/org/elasticsearch/index/mapper/BlockDocValuesReader.java (revision 4e9af1088a0f97cb968af09d23eae1916ee94edb)
+++ b/server/src/main/java/org/elasticsearch/index/mapper/BlockDocValuesReader.java (date 1756149504440)
@@ -132,7 +132,7 @@
@Override
public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset, boolean nullsFiltered) throws IOException {
if (numericDocValues instanceof BlockLoader.OptionalColumnAtATimeReader direct) {
- BlockLoader.Block result = direct.tryRead(factory, docs, offset);
+ BlockLoader.Block result = direct.tryRead(factory, docs, offset, toDouble);
if (result != null) {
return result;
}
@@ -398,6 +398,12 @@
@Override
public BlockLoader.Block read(BlockFactory factory, Docs docs, int offset, boolean nullsFiltered) throws IOException {
+ if (docValues instanceof BlockLoader.OptionalColumnAtATimeReader direct) {
+ BlockLoader.Block result = direct.tryRead(factory, docs, offset, toDouble);
+ if (result != null) {
+ return result;
+ }
+ }
try (BlockLoader.DoubleBuilder builder = factory.doublesFromDocValues(docs.count() - offset)) {
for (int i = offset; i < docs.count(); i++) {
int doc = docs.get(i);
@@ -715,7 +721,7 @@
return readSingleDoc(factory, docs.get(offset));
}
if (ordinals instanceof BlockLoader.OptionalColumnAtATimeReader direct) {
- BlockLoader.Block block = direct.tryRead(factory, docs, offset);
+ BlockLoader.Block block = direct.tryRead(factory, docs, offset, toDouble);
if (block != null) {
return block;
}
Index: server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java b/server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java
--- a/server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java (revision 4e9af1088a0f97cb968af09d23eae1916ee94edb)
+++ b/server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java (date 1756149980146)
@@ -45,6 +45,7 @@
import org.apache.lucene.util.packed.PackedInts;
import org.elasticsearch.core.IOUtils;
import org.elasticsearch.index.codec.tsdb.TSDBDocValuesEncoder;
+import org.elasticsearch.index.mapper.BlockDocValuesReader;
import org.elasticsearch.index.mapper.BlockLoader;
import java.io.IOException;
@@ -383,7 +384,12 @@
}
@Override
- public BlockLoader.Block tryRead(BlockLoader.BlockFactory factory, BlockLoader.Docs docs, int offset) throws IOException {
+ public BlockLoader.Block tryRead(
+ BlockLoader.BlockFactory factory,
+ BlockLoader.Docs docs,
+ int offset,
+ BlockDocValuesReader.ToDouble toDouble
+ ) throws IOException {
if (ords instanceof BaseDenseNumericValues denseOrds) {
var block = tryReadAHead(factory, docs, offset);
if (block != null) {
@@ -457,7 +463,12 @@
}
@Override
- public BlockLoader.Block tryRead(BlockLoader.BlockFactory factory, BlockLoader.Docs docs, int offset) throws IOException {
+ public BlockLoader.Block tryRead(
+ BlockLoader.BlockFactory factory,
+ BlockLoader.Docs docs,
+ int offset,
+ BlockDocValuesReader.ToDouble toDouble
+ ) throws IOException {
return null;
}
@@ -504,7 +515,12 @@
}
@Override
- public BlockLoader.Block tryRead(BlockLoader.BlockFactory factory, BlockLoader.Docs docs, int offset) throws IOException {
+ public BlockLoader.Block tryRead(
+ BlockLoader.BlockFactory factory,
+ BlockLoader.Docs docs,
+ int offset,
+ BlockDocValuesReader.ToDouble toDouble
+ ) throws IOException {
return null;
}
@@ -1365,9 +1381,24 @@
}
@Override
- public BlockLoader.Block tryRead(BlockLoader.BlockFactory factory, BlockLoader.Docs docs, int offset) throws IOException {
- try (BlockLoader.SingletonLongBuilder builder = factory.singletonLongs(docs.count() - offset)) {
- return tryRead(builder, docs, offset);
+ public BlockLoader.Block tryRead(
+ BlockLoader.BlockFactory factory,
+ BlockLoader.Docs docs,
+ int offset,
+ BlockDocValuesReader.ToDouble toDouble
+ ) throws IOException {
+ if (toDouble != null) {
+ try (var builder = factory.singletonDoubleBuilder(docs.count() - offset)) {
+ BlockLoader.SingletonLongBuilder delegate = new SingletonLongToSingletonDoubleDelegate(toDouble, builder);
+ var result = tryRead(delegate, docs, offset);
+ if (result != null) {
+ return result;
+ }
+ }
+ } else {
+ try (BlockLoader.SingletonLongBuilder builder = factory.singletonLongs(docs.count() - offset)) {
+ return tryRead(builder, docs, offset);
+ }
}
}
@@ -1747,6 +1778,57 @@
builder.appendOrds(convertedOrds, 0, length, minOrd, maxOrd);
return this;
}
+
+ @Override
+ public BlockLoader.Block build() {
+ return builder.build();
+ }
+
+ @Override
+ public BlockLoader.Builder appendNull() {
+ throw new UnsupportedOperationException();
+ }
+
+ @Override
+ public BlockLoader.Builder beginPositionEntry() {
+ throw new UnsupportedOperationException();
+ }
+
+ @Override
+ public BlockLoader.Builder endPositionEntry() {
+ throw new UnsupportedOperationException();
+ }
+
+ @Override
+ public void close() {}
+ }
+
+ static final class SingletonLongToSingletonDoubleDelegate implements BlockLoader.SingletonLongBuilder {
+ private final BlockDocValuesReader.ToDouble toDouble;
+ private final BlockLoader.SingletonDoubleBuilder builder;
+
+ SingletonLongToSingletonDoubleDelegate(BlockDocValuesReader.ToDouble toDouble, BlockLoader.SingletonDoubleBuilder builder) {
+ this.toDouble = toDouble;
+ this.builder = builder;
+ }
+
+ @Override
+ public BlockLoader.SingletonLongBuilder appendLong(long value) {
+ throw new UnsupportedOperationException();
+ }
+
+ @Override
+ public BlockLoader.SingletonLongBuilder appendLongs(long[] values, int from, int length) {
+ int counter = 0;
+ double[] convertedValues = new double[length];
+ int end = from + length;
+ for (int j = from; j < end; j++) {
+ double ord = toDouble.convert(values[j]);
+ convertedValues[counter++] = ord;
+ }
+ builder.appendDoubles(convertedValues, 0, length);
+ return this;
+ }
@Override
public BlockLoader.Block build() {
|
This looks pretty similar to my approach, but it requires two arrays (one long[], one double[]), whereas my approach performs the conversion on the fly. Still, lemme see if I can avoid polluting |
In the proposed approach it just create one large double[]. The there is just a small double[] buffer array in |
|
Updated, ptal. If this looks reasonable, I'll add tests and it ready. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and I think we can remove the draft status from this PR!
Regarding testing maybe overwrite supportsBulkBlockReading() in ScaledFloatFieldMapperTests, DoubleFieldMapperTests and other floating point based field mapper tests? This way the testSingletonLongBulkBlockReading() get executed for these field types.
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/data/BlockUtils.java
Outdated
Show resolved
Hide resolved
...esql/compute/src/main/java/org/elasticsearch/compute/lucene/read/SingletonDoubleBuilder.java
Show resolved
Hide resolved
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks Kostas!
| int offset, | ||
| BlockDocValuesReader.ToDouble toDouble | ||
| ) throws IOException { | ||
| if (ords instanceof BaseDenseNumericValues denseOrds) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assert toDouble is null here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kostas! LGTM 👍
test/framework/src/main/java/org/elasticsearch/index/mapper/MapperTestCase.java
Outdated
Show resolved
Hide resolved
…pperTestCase.java Co-authored-by: Martijn van Groningen <[email protected]>
Values are first loaded as longs and then converted to doubles in bulk and copied to an array that's wrapped in a block.