Skip to content

Conversation

Kubik42
Copy link
Contributor

@Kubik42 Kubik42 commented Oct 7, 2025

This addresses #135891

TODO: write up explanation of what went wrong and his this PR fixes it

@elasticsearchmachine
Copy link
Collaborator

Hi @Kubik42, I've created a changelog YAML for you.

@Kubik42 Kubik42 force-pushed the kubik-geo-point-block-loader-fix branch from e7afa0b to 1f168f2 Compare October 8, 2025 03:10
@Kubik42
Copy link
Contributor Author

Kubik42 commented Oct 8, 2025

The reason for failing tests is here. Whenever there is no FieldExtractPreference (ie. FieldExtractPreference.NONE, just like in the SDH), geo_points are mapped to BytesRef. This means that when the block loader sanity checker runs, it expects the underlying element type to be BytesRef. However, since we've now started to rely on doc_values, the type is actually LONG.

Perhaps the bigger problem here is that NONE is being assigned when DOC_VALUES should be the one being assigned. This would normally not be an issue since field types generally use the same underlying types for storing values, regardless of whether they're using doc_values vs stored fields. However, geo_points are special and the way we store them changes. More specifically, if doc_values are available, we store them as Longs and when the stored flag is toggled, we store them as BytesRef; although technically its Strings, but more on that later.

@Kubik42
Copy link
Contributor Author

Kubik42 commented Oct 8, 2025

In my rabbit hole of understanding how BlockLoaders work, I found a somewhat of a bug thats in main. When a geo_point is specified with store = true and doc_values = false, we end up returning an imprecise result.

For example, the following command fails:

./gradlew ":x-pack:plugin:logsdb:yamlRestTest" --tests "org.elasticsearch.xpack.logsdb.LogsdbTestSuiteIT.test {yaml=/51_esql_synthetic_source/Simple from}" -Dtests.seed=76E814C77C112204 -Dtests.locale=xh -Dtests.timezone=Europe/Saratov -Druntime.java=25

with:

        Expected: "POINT (-74.00600004941225 40.712799984030426)"
             but: was "POINT (-74.006 40.7128)"
            at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
            at org.junit.Assert.assertThat(Assert.java:964)
            at org.junit.Assert.assertThat(Assert.java:930)
            at org.elasticsearch.test.rest.yaml.section.MatchAssertion.doAssert(MatchAssertion.java:100)
            at org.elasticsearch.test.rest.yaml.section.Assertion.execute(Assertion.java:68)
            at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:522)
            ... 41 more

Based on what I've read in the code, this precision loss is expected. That being said, its not documented publicly.

@Kubik42
Copy link
Contributor Author

Kubik42 commented Oct 8, 2025

There is also a problem with using BlockStoredFieldsReader.BytesFromStringsBlockLoader(name()) even when the field is stored. This is despite the fact that during indexing, the StoredField is created with a String representation of the geo_point; code.

When block loading, I get the error: Unknown geometry type: 825699888, which comes from here.

My guess is that it has something to do with the geo_point being stored as a String, rather than being converted to WKT first. That said, even when I used WellKnownText.toWKT() I still received a similar error - Unknown geometry type: 1414416719.

For now, I've just followed what we do in GeoShapeWithDocValuesFieldMapper and changed the block loader to BlockStoredFieldsReader.BytesFromBytesRefsBlockLoader, which has worked.

This needs further discussion.

@Kubik42 Kubik42 force-pushed the kubik-geo-point-block-loader-fix branch from 1f168f2 to b586f4c Compare October 9, 2025 01:08
@Kubik42
Copy link
Contributor Author

Kubik42 commented Oct 9, 2025

I did a little more investigating and found that the reason we end up with FieldExtractPreference.NONE is because we initialize FieldExtractExec here, where both docValuesAttributes and boundsAttributes are empty sets. This leads to the following check failing, which ultimately returns defaultPreference, which is FieldExtractPreference.NONE.

This got me thinking, why are we setting the preference to NONE for attributes with doc_values? Shouldn't it be DOC_VALUES? If we think about it, NONE implies we can load the values however we want, so we're not restricted to using doc_values for that. This made me notice that inside this function, the attribute itself contains a DataType, which in turns exposes hasDocValues(). So, ultimately, wouldn't it make more sense to just do:

if (docValuesAttributes.contains(attr) || attr.dataType().hasDocValues()) {
    return MappedFieldType.FieldExtractPreference.DOC_VALUES;
}

However, doing so causes some type inequality exceptions.

Nhat pointed out this code, which suggests that doc_values shouldn't be always be used. However, that conflicts with the definition of FieldExtractPreference.NONE, which states that we can use anything.

@Kubik42 Kubik42 force-pushed the kubik-geo-point-block-loader-fix branch from e9ba832 to 7176d9e Compare October 10, 2025 22:58
@Kubik42 Kubik42 force-pushed the kubik-geo-point-block-loader-fix branch from 7176d9e to 8573b2f Compare October 10, 2025 23:04
@Kubik42 Kubik42 force-pushed the kubik-geo-point-block-loader-fix branch 2 times, most recently from adc51f9 to 5cedfba Compare October 13, 2025 04:52
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments.

* This implies that we need to load the value from _source. This however is very slow, especially when synthetic source is enabled.
* We're better off reading from doc_values and converting to BytesRef to satisfy the checker. This is what this block loader is for.
*/
static class BytesRefFromLongsBlockLoader extends BlockDocValuesReader.DocValuesBlockLoader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seal classes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean final? sealed doesn't work on BytesRefFromLongsBlockLoader since it has no subclasses

}

@Override
public TestBlock build() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can this be removed now?

Copy link
Contributor Author

@Kubik42 Kubik42 Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't actually sure about this, but it was causing issues for my tests. When the block loader was reading values, it always ended up with two values: one containing the point and one containing null. I checked and this is the same behavior as the current BlockDocValuesReader.LongsBlockLoader. However, because expectedCount was hard coded, I was getting the error something along the lines of "expected 1 but got 2".

There are 12 classes in this file and none of them hard code expectedCount. And this one has no explanation as to why. After digging around, it seems as if this class was coded explicitly to work with BlockDocValuesReader.Ordinals.readSingleDoc(). So I figured it can be changed to accommodate others.

The code on lines 69-81 is deleted because it works with the assumption that there is only one value. Which is no longer the case.

That being said... it seems as though KeywordFieldBlockLoaderTests actually depends on the original behavior with expectedCount = 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I don't know what to do with this code. Removing it makes the most sense to me and its what makes the geo point block loader tests pass. However, it causes failures in keyword and version string block loader tests:

./gradlew ":x-pack:plugin:mapper-version:test" --tests "org.elasticsearch.xpack.versionfield.VersionStringFieldBlockLoaderTests.testBlockLoaderForFieldInObject {preference=Params[syntheticSource=true, preference=NONE]}" -Dtests.seed=5705FB7B03C2B578 -Dtests.locale=lg -Dtests.timezone=America/Tegucigalpa -Druntime.java=25

  2> java.lang.AssertionError:
    Expected: a collection with size <9>
         but: collection size was <1>
        at __randomizedtesting.SeedInfo.seed([5705FB7B03C2B578:F9E93A4005FCC128]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
        at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2722)
        at org.elasticsearch.index.mapper.TestBlock$Builder.build(TestBlock.java:495)
        at org.elasticsearch.index.mapper.TestBlock$Builder.build(TestBlock.java:456)
        at org.elasticsearch.index.mapper.BlockDocValuesReader$Ordinals.readSingleDoc(BlockDocValuesReader.java:874)
        at org.elasticsearch.index.mapper.BlockDocValuesReader$Ordinals.read(BlockDocValuesReader.java:826)
        at org.elasticsearch.index.mapper.BlockLoaderTestRunner.load(BlockLoaderTestRunner.java:117)
        at org.elasticsearch.index.mapper.BlockLoaderTestRunner.setupAndInvokeBlockLoader(BlockLoaderTestRunner.java:84)
        at org.elasticsearch.index.mapper.BlockLoaderTestRunner.runTest(BlockLoaderTestRunner.java:55)
        at org.elasticsearch.index.mapper.BlockLoaderTestCase.testBlockLoaderForFieldInObject(BlockLoaderTestCase.java:146)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:565)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
        at org.elasticsearch.entitlement.bootstrap.TestEntitlementsRule$1.evaluate(TestEntitlementsRule.java:77)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
        at java.base/java.lang.Thread.run(Thread.java:1474)
 ./gradlew ":server:test" --tests "org.elasticsearch.index.mapper.blockloader.KeywordFieldBlockLoaderTests.testBlockLoaderForFieldInObject {preference=Params[syntheticSource=false, preference=DOC_VALUES]}" -Dtests.seed=57E4325AA9984681 -Dtests.locale=fr-BJ -Dtests.timezone=Africa/Gaborone -Druntime.java=25

  2> java.lang.AssertionError:
    Expected: a collection with size <2>
         but: collection size was <1>
        at __randomizedtesting.SeedInfo.seed([57E4325AA9984681:F908F361AFA632D1]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
        at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2722)
        at org.elasticsearch.index.mapper.TestBlock$Builder.build(TestBlock.java:495)
        at org.elasticsearch.index.mapper.TestBlock$Builder.build(TestBlock.java:456)
        at org.elasticsearch.index.mapper.BlockDocValuesReader$Ordinals.readSingleDoc(BlockDocValuesReader.java:874)
        at org.elasticsearch.index.mapper.BlockDocValuesReader$Ordinals.read(BlockDocValuesReader.java:826)
        at org.elasticsearch.index.mapper.BlockLoaderTestRunner.load(BlockLoaderTestRunner.java:117)
        at org.elasticsearch.index.mapper.BlockLoaderTestRunner.setupAndInvokeBlockLoader(BlockLoaderTestRunner.java:84)
        at org.elasticsearch.index.mapper.BlockLoaderTestRunner.runTest(BlockLoaderTestRunner.java:55)
        at org.elasticsearch.index.mapper.BlockLoaderTestCase.testBlockLoaderForFieldInObject(BlockLoaderTestCase.java:146)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:565)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
        at org.elasticsearch.entitlement.bootstrap.TestEntitlementsRule$1.evaluate(TestEntitlementsRule.java:77)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
        at java.base/java.lang.Thread.run(Thread.java:1474)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theres a knowledge gap I need filled :)


// then
// verify that we use the correct block value reader
assertThat(loader, instanceOf(BlockSourceReader.GeometriesBlockLoader.class));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I misunderstand this, but stored field and stored preference is handled yet in the block loader method?

Copy link
Contributor Author

@Kubik42 Kubik42 Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't change how stored fields and stored preference is handled, only how doc_values + synthetic source are handled.

That being said, in the draft version of this PR, I did have some code to handle stored fields. However, I've since removed it since it required extra effort to get working. If you think its important to handle that case, then I'll add it back in.

@Kubik42 Kubik42 force-pushed the kubik-geo-point-block-loader-fix branch 2 times, most recently from c1efcb3 to a024bae Compare October 13, 2025 23:20
@Kubik42 Kubik42 force-pushed the kubik-geo-point-block-loader-fix branch from a024bae to 212fe97 Compare October 13, 2025 23:20
@Kubik42 Kubik42 force-pushed the kubik-geo-point-block-loader-fix branch from 9f1057d to c24d31e Compare October 14, 2025 00:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants