docs/changelog/130705.yaml: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
pr: 130705
summary: Fix `BytesRef2BlockHash`
area: ES|QL
type: bug
issues: []
@@ -145,7 +145,7 @@ public Block[] getKeys() {
try {
try (BytesRefBlock.Builder b1 = blockFactory.newBytesRefBlockBuilder(positions)) {
for (int i = 0; i < positions; i++) {
-int k1 = (int) (finalHash.get(i) & 0xffffL);
+int k1 = (int) (finalHash.get(i) & 0xffffffffL);
if (k1 == 0) {
@bpintea (Contributor), Jul 7, 2025:

Shouldn't this (and below) be if (k <= 0) { ?

Member:

It can't be 0 because we grab the low bits. Right? I haven't had enough coffee for bit math.

Contributor:

Could finalHash.get(i) not return something like 0xe0000000L (or some other value with the 32nd bit set)?
The hash1.hash.get() method seems to expect a non-negative argument. It might all be safe; it just stood out to me since we're truncating a long.

Contributor (author):

These are always positive; they are generated by BytesRefBlockHash, see hashOrdToGroupNullReserved.

Contributor (author):

I guess one of the implicit assumptions is that a block hash will never have more than 2B positions. If it does, we'll need a larger data structure anyway.
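For readers following along, a hedged sketch of the two-ordinal packing this implies (method names here are illustrative, not the actual BytesRef2BlockHash internals): each key's 32-bit ordinal occupies one half of a long, and the masked expression in the diff above recovers the low half.

// Sketch only: packing two int ordinals into one long.
static long pack(int ord1, int ord2) {
    return ((long) ord2 << 32) | (ord1 & 0xffffffffL); // low 32 bits = first key's ordinal
}

static int unpackOrd1(long packed) {
    return (int) (packed & 0xffffffffL); // the expression this PR fixes (was & 0xffffL)
}

static int unpackOrd2(long packed) {
    return (int) (packed >>> 32);
}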

@bpintea (Contributor), Jul 7, 2025:

> These are always positive

To make sure: 0xe0000000L is a positive long. But (int) (0xe0000000L & 0xffffffffL) is a negative int.

> I guess one of the implicit assumptions is that a block hash will never have more than 2B positions.

Yeah, this is why I was thinking that this might still be safe. 👍 (At least if these are positions.)
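A minimal, runnable illustration of the sign behavior under discussion (plain Java, nothing Elasticsearch-specific):

public class MaskDemo {
    public static void main(String[] args) {
        long h = 0xe0000000L;                 // positive as a long: 3_758_096_384
        int low16 = (int) (h & 0xffffL);      // old mask: only 16 bits survive
        int low32 = (int) (h & 0xffffffffL);  // fixed mask: all 32 low bits survive
        System.out.println(low16);            // 0, indistinguishable from the null ordinal
        System.out.println(low32);            // -536870912, a negative int from a positive long
    }
}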

@alex-spies (Contributor), Jul 7, 2025:

I think @nik9000 should probably comment here as well, and whatever we end up with should probably become a Java comment here; that's too many assumptions for my head to infer, at least when skimming the file :D

Contributor (author):

@nik9000 any additional observations on this?

Member:

Y'all are right: we need positive integers from the hash. We're using the one from core, which allows long, but we do intend to use only int. It's for sure worth a comment.
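One way the suggested Java comment might read (a sketch, not part of the merged patch):

// Ordinals packed into finalHash come from BytesRefBlockHash (see
// hashOrdToGroupNullReserved) and are always non-negative, and a block
// never holds more than Integer.MAX_VALUE positions, so truncating the
// low 32 bits to an int cannot flip the sign.
int k1 = (int) (finalHash.get(i) & 0xffffffffL);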

b1.appendNull();
} else {
@@ -11,6 +11,7 @@
import com.carrotsearch.randomizedtesting.annotations.ParametersFactory;

import org.apache.lucene.util.BytesRef;
import org.elasticsearch.common.lucene.BytesRefs;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.compute.aggregation.GroupingAggregatorFunction;
import org.elasticsearch.compute.data.Block;
@@ -35,8 +36,10 @@
import org.elasticsearch.xpack.esql.core.util.Holder;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Consumer;
import java.util.stream.IntStream;
import java.util.stream.LongStream;
@@ -1232,6 +1235,190 @@ public void testLongNull() {
}, blockFactory.newLongArrayVector(values, values.length).asBlock(), blockFactory.newConstantNullBlock(values.length));
}

public void test2BytesRefsKeys() {
Contributor:

nit: maybe a more descriptive name? Like test2BytesRefsHighCardinalityKey or so?

Member:

👍 on HighCardinality.

Member:

This is Java. Lean into the memes about long method names....

final Page page;
final int positions1 = 100_000;
final int positions2 = 10;
final int totalPositions = positions1 * positions2;
try (
BytesRefBlock.Builder builder1 = blockFactory.newBytesRefBlockBuilder(totalPositions);
BytesRefBlock.Builder builder2 = blockFactory.newBytesRefBlockBuilder(totalPositions);
) {
for (int i = 0; i < positions1; i++) {
for (int p = 0; p < positions2; p++) {
builder1.appendBytesRef(new BytesRef("abcdef" + i));
builder2.appendBytesRef(new BytesRef("abcdef" + p));
}
}
page = new Page(builder1.build(), builder2.build());
}
Member:

Do we want this all in one page? I know the BlockHash needs it in memory. And we build that HashMap with it in memory below. But maybe we make Pages of a reasonable size here?

Meh. I guess it's only like 9 MB. Probably not worth changing it unless you need to go bigger.

Contributor (author):

Yeah, it's small enough, and it reproduced the problem pretty consistently, so I guess it's not worth the effort and complication.

record Output(int offset, IntBlock block, IntVector vector) implements Releasable {
@Override
public void close() {
Releasables.close(block, vector);
}
}
List<Output> output = new ArrayList<>();

try (BlockHash hash1 = new BytesRef2BlockHash(blockFactory, 0, 1, totalPositions);) {
Contributor:

Can we randomize which key has high cardinality? It looks like this test only stuffs the high cardinality key into the first position, which repros the bug, but in principle the other position might have the same problem (if we implemented things differently).

Member:

👍
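A sketch of how that randomization might look in the builder loop above (hypothetical, not part of this PR):

// Randomly pick which of the two key columns carries the high cardinality.
boolean highCardinalityFirst = randomBoolean();
for (int i = 0; i < positions1; i++) {
    for (int p = 0; p < positions2; p++) {
        builder1.appendBytesRef(new BytesRef("abcdef" + (highCardinalityFirst ? i : p)));
        builder2.appendBytesRef(new BytesRef("abcdef" + (highCardinalityFirst ? p : i)));
    }
}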

hash1.add(page, new GroupingAggregatorFunction.AddInput() {
@Override
public void add(int positionOffset, IntArrayBlock groupIds) {
groupIds.incRef();
output.add(new Output(positionOffset, groupIds, null));
}

@Override
public void add(int positionOffset, IntBigArrayBlock groupIds) {
groupIds.incRef();
output.add(new Output(positionOffset, groupIds, null));
}

@Override
public void add(int positionOffset, IntVector groupIds) {
groupIds.incRef();
output.add(new Output(positionOffset, null, groupIds));
}

@Override
public void close() {
fail("hashes should not close AddInput");
}
});

Block[] keys = hash1.getKeys();
try {
Set<String> distinctKeys = new HashSet<>();
BytesRefBlock block0 = (BytesRefBlock) keys[0];
BytesRefBlock block1 = (BytesRefBlock) keys[1];
BytesRef scratch = new BytesRef();
StringBuilder builder = new StringBuilder();
for (int i = 0; i < totalPositions; i++) {
builder.setLength(0);
builder.append(BytesRefs.toString(block0.getBytesRef(i, scratch)));
builder.append("#");
builder.append(BytesRefs.toString(block1.getBytesRef(i, scratch)));
distinctKeys.add(builder.toString());
}
assertThat(distinctKeys.size(), equalTo(totalPositions));
Contributor (author):

Revert the fix and you'll see a failure here
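Why the 100_000 distinct first keys catch it: under the old 16-bit mask, ordinals that differ only above bit 15 truncate to the same value, so distinct keys collide and the distinct-key count below comes up short. A tiny illustration in plain Java:

long ordA = 1L;
long ordB = 1L + 0x10000;  // 65_537 differs from 1 only above bit 15
// both truncate to 1 under the old mask, so two distinct keys collapse into one
assert (int) (ordA & 0xffffL) == (int) (ordB & 0xffffL);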

} finally {
Releasables.close(keys);
}
} finally {
Releasables.close(output);
page.releaseBlocks();
}
}

public void test2BytesRefs() {
Contributor (author):

This is the same test we had for BytesRef3BlockHash

final Page page;
final int positions = randomIntBetween(1, 1000);
final boolean generateVector = randomBoolean();
try (
BytesRefBlock.Builder builder1 = blockFactory.newBytesRefBlockBuilder(positions);
BytesRefBlock.Builder builder2 = blockFactory.newBytesRefBlockBuilder(positions);
) {
List<BytesRefBlock.Builder> builders = List.of(builder1, builder2);
for (int p = 0; p < positions; p++) {
for (BytesRefBlock.Builder builder : builders) {
int valueCount = generateVector ? 1 : between(0, 3);
switch (valueCount) {
case 0 -> builder.appendNull();
case 1 -> builder.appendBytesRef(new BytesRef(Integer.toString(between(1, 100))));
default -> {
builder.beginPositionEntry();
for (int v = 0; v < valueCount; v++) {
builder.appendBytesRef(new BytesRef(Integer.toString(between(1, 100))));
}
builder.endPositionEntry();
}
}
}
}
page = new Page(builder1.build(), builder2.build());
}
final int emitBatchSize = between(positions, 10 * 1024);
var groupSpecs = List.of(new BlockHash.GroupSpec(0, ElementType.BYTES_REF), new BlockHash.GroupSpec(1, ElementType.BYTES_REF));
record Output(int offset, IntBlock block, IntVector vector) implements Releasable {
@Override
public void close() {
Releasables.close(block, vector);
}
}
List<Output> output1 = new ArrayList<>();
List<Output> output2 = new ArrayList<>();
try (
BlockHash hash1 = new BytesRef2BlockHash(blockFactory, 0, 1, emitBatchSize);
BlockHash hash2 = new PackedValuesBlockHash(groupSpecs, blockFactory, emitBatchSize)
) {
hash1.add(page, new GroupingAggregatorFunction.AddInput() {
@Override
public void add(int positionOffset, IntArrayBlock groupIds) {
groupIds.incRef();
output1.add(new Output(positionOffset, groupIds, null));
}

@Override
public void add(int positionOffset, IntBigArrayBlock groupIds) {
groupIds.incRef();
output1.add(new Output(positionOffset, groupIds, null));
}

@Override
public void add(int positionOffset, IntVector groupIds) {
groupIds.incRef();
output1.add(new Output(positionOffset, null, groupIds));
}

@Override
public void close() {
fail("hashes should not close AddInput");
}
});
hash2.add(page, new GroupingAggregatorFunction.AddInput() {
@Override
public void add(int positionOffset, IntArrayBlock groupIds) {
groupIds.incRef();
output2.add(new Output(positionOffset, groupIds, null));
}

@Override
public void add(int positionOffset, IntBigArrayBlock groupIds) {
groupIds.incRef();
output2.add(new Output(positionOffset, groupIds, null));
}

@Override
public void add(int positionOffset, IntVector groupIds) {
groupIds.incRef();
output2.add(new Output(positionOffset, null, groupIds));
}

@Override
public void close() {
fail("hashes should not close AddInput");
}
});
assertThat(output1.size(), equalTo(output2.size()));
for (int i = 0; i < output1.size(); i++) {
Output o1 = output1.get(i);
Output o2 = output2.get(i);
assertThat(o1.offset, equalTo(o2.offset));
if (o1.vector != null) {
assertNull(o1.block);
assertThat(o1.vector, equalTo(o2.vector != null ? o2.vector : o2.block.asVector()));
} else {
assertNull(o2.vector);
assertThat(o1.block, equalTo(o2.block));
}
}
} finally {
Releasables.close(output1);
Releasables.close(output2);
page.releaseBlocks();
}
}

public void test3BytesRefs() {
final Page page;
final int positions = randomIntBetween(1, 1000);
@@ -1326,7 +1513,7 @@ public void close() {
fail("hashes should not close AddInput");
}
});
-assertThat(output1.size(), equalTo(output1.size()));
Contributor (author):

A typo, probably.

+assertThat(output1.size(), equalTo(output2.size()));
for (int i = 0; i < output1.size(); i++) {
Output o1 = output1.get(i);
Output o2 = output2.get(i);