Commit b8a24b1

Spatial search functions support multi-valued fields in compute engine (#112063)
* Using enum to control mv-predicate combinations with ANY or ALL
* Update docs/changelog/112063.yaml
* Fix changelog
* Refactored to use generic MvCombiner for more flexibility

  This opens the door to combiners which work with more types than just boolean.
* Spotless, and disabled failing cartesian-point tests

  Reported the failing cases at #112102
* Fix changelog with better summary
* Fix changelog with better summary and highlight text
* More spotless checks
* Remove low-value comment edit
* Refined MvCombiner to maintain state to deal with ST_CONTAINS

  We have a special case in ST_CONTAINS in that Lucene's triangle-tree implementation causes a situation where we need to reject contains results when there are other geometries that do not contain, but do intersect. This does not make sense from a pure geospatial perspective, but is a necessary consequence of the triangle-tree.
* Code review fixes, and fix for long doc-values MV
* Cleanup and fix fold() serialization

  The fold was returning the intermediate ContainsResult, which cannot be serialized, instead of the correct final boolean result.
* Fix to multi-contains-multi case using BitArray

  Since a multi-value field should be seen as an alternative to a geometry collection, it is insufficient to consider `ANY` for multi-value contains. There are two approaches to this:
  * Pre-build a geometry collection before converting to docValuesReader
  * Maintain more state so we can assert that all components are contained within at least one of the field values

  In an effort to minimize the changes to the generated code, the second approach was taken, and in fact was achievable without any changes at all to generated code. However, this approach uses BigArrays, and does not get the correct one passed in. We need to change generated code a small bit to pass that in. We'll do that in a followup commit, but only if the alternative approach of creating a combined multi-value docValueReader is deemed more complex.
* Fix to multi-contains-multi case using GeometryCollection

  This is an alternative approach to the previous one which used a BitArray to maintain state. Now we rely entirely on the internals of the DocValuesReader, and instead pre-create the GEOMETRYCOLLECTION of all the values in the multi-value field, so the triangle tree already considers the necessary combinations. This approach moves the responsibility of iterating over the multi-value from the generated code into the non-generated code. In total the number of lines of code goes down, as fewer code paths are possible.
* Add additional fixed issue to changelog
* Added csv-spec tests for testing multi-valued geometries
* More tests for multi-value literals and one fix in fold()
* Fix bug with doc values extraction for non-indexed fields for centroid

  Initially this work was about adding more tests, but discovered the bug at #112505. This commit fixes that issue and expands the tests in a few areas:
  * PhysicalPlanOptimizerTests expanded to verify that physical planning now considers if the field has doc-values
  * SpatialPushDownPointsTestCase simple point-in-polygon tests expanded to consider ST_CENTROID as well, so that this behaviour is tested better there
  * Note that this PR also fixes the doc-values field extract bug

  This could have been fixed in a separate PR, but fixing it here was needed because the tests we wrote were failing without it.
* Multi-point test cases
* Added capability to prevent test failing on older clusters

  Also removed a test that was sensitive to multi-node cluster results ordering
* Support BlockBuilder multivalue combining for ST_WITHIN

  This is similar to, but simpler than, the ST_CONTAINS solution. In addition we added support for two fields to handle multi-values by using ST_CONTAINS surrogate with parameters swapped.
* Require capability for BWC tests
* Added multivalue fields tests for points
* Support multivalues for CONTAINS/WITHIN between two fields

  This included taking into account that CONTAINS and WITHIN are not symmetrical in the case that the indexed geometry contains multiple intersecting polygons. We need to document this behaviour.
* Small optimization to not create collections over single geometries
* Simplification of iterating over multi-value BytesRef
* Update docs/changelog/112063.yaml
* Update docs/changelog/112063.yaml
* Added back removed bug-fix link
* Merge conflict
* Support point doc-values for ST_WITHIN
* Simplify ST_CONTAINS to not consider intersecting polygons

  This turns out to already be handled by combined doc-values
* Last CONTAINS evaluators moved to BlockBuilder approach
* Revert usage of MvCombiner in spatial predicates

  Since ST_CONTAINS and ST_WITHIN could not use the ANY/ALL logic and needed to first collect all values into a single geometry before applying the predicate, we decided to move ST_INTERSECTS and ST_DISJOINT to this same approach so all spatial predicates have the same level of complexity and are easier to maintain.
* Revert ability to perform ANY/ALL predicate evaluations

  This was only being used by the spatial predicates, and since they have reverted to doing this logic internally, we remove this capability from the code-base. If we wish to implement ANY/ALL logic in any other predicates, this could be brought back by reverting this commit.
* Simplify code paths for evaluators

  Now that all evaluators use the Block.Builder approach we can move all the common code down to the SpatialRelations class. This means that all static evaluator methods now contain only a single line of code, and all of them are identical between all four spatial functions, making comparison and maintenance much easier.
* Cleanup code for easier review
* Fixed bug with empty multivalue params and doc-values

  This was failing a test in ENRICH
* After renaming the evaluator parameters we need to update the unit tests
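
The "small optimization to not create collections over single geometries" mentioned in the message can be pictured with a minimal sketch. This is not the code from the commit, which works on geometry objects and doc-values readers; it assumes, purely for illustration, that the values of a multi-value field are available as WKT strings. The class name `CombineMultiValueSketch`, the method `combine`, and the handling of an empty list are all hypothetical.

```java
import java.util.List;

public class CombineMultiValueSketch {

    /**
     * Combines the WKT values of one multi-value field into a single WKT geometry.
     * A single value is returned unchanged, so no collection is built around it.
     */
    static String combine(List<String> wktValues) {
        if (wktValues.isEmpty()) {
            // Sketch choice, not taken from the commit: represent "no values" as an empty collection.
            return "GEOMETRYCOLLECTION EMPTY";
        }
        if (wktValues.size() == 1) {
            // Nothing to combine: skip the wrapping collection.
            return wktValues.get(0);
        }
        // Several values: wrap them so a predicate sees all of them as one geometry.
        return "GEOMETRYCOLLECTION (" + String.join(", ", wktValues) + ")";
    }

    public static void main(String[] args) {
        System.out.println(combine(List.of("POINT (1 2)")));
        System.out.println(combine(List.of("POINT (1 2)", "POINT (3 4)")));
    }
}
```

Once the values are combined like this, a predicate can be evaluated once against the whole field value instead of once per value.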
1 parent ac27e73 commit b8a24b1

60 files changed, +2799 -2163 lines changed

docs/changelog/112063.yaml

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+pr: 112063
+summary: Spatial search functions support multi-valued fields in compute engine
+area: ES|QL
+type: bug
+issues:
+ - 112102
+ - 112505
+ - 110830
+highlight:
+  title: "ESQL: Multi-value fields supported in Geospatial predicates"
+  body: |-
+    Supporting multi-value fields in `WHERE` predicates is a challenge due to not knowing whether `ALL` or `ANY`
+    of the values in the field should pass the predicate.
+    For example, should the field `age:[10,30]` pass the predicate `WHERE age>20` or not?
+    This ambiguity does not exist with the spatial predicates
+    `ST_INTERSECTS` and `ST_DISJOINT`, because the choice between `ANY` or `ALL`
+    is implied by the predicate itself.
+    Consider a predicate checking a field named `location` against a test geometry named `shape`:
+
+    * `ST_INTERSECTS(field, shape)` - true if `ANY` value can intersect the shape
+    * `ST_DISJOINT(field, shape)` - true only if `ALL` values are disjoint from the shape
+
+    This works even if the shape argument is itself a complex or compound geometry.
+
+    Similar logic exists for `ST_CONTAINS` and `ST_WITHIN` predicates, but these are not as easily solved
+    with `ANY` or `ALL`, because a collection of geometries contains another collection if each of the contained
+    geometries is within at least one of the containing geometries. Evaluating this requires that the multi-value
+    field is first combined into a single geometry before performing the predicate check.
+
+    * `ST_CONTAINS(field, shape)` - true if the combined geometry contains the shape
+    * `ST_WITHIN(field, shape)` - true if the combined geometry is within the shape
+  notable: false
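
The rules in the highlight text above can be made concrete with a small sketch. It is not how the compute engine evaluates these predicates (the commit combines the field values into a single geometry and relies on Lucene's triangle tree). It uses axis-aligned rectangles as a stand-in for real geometries, and `MultiValueSpatialSketch`, `Rect`, and the four static methods are hypothetical names introduced only for illustration.

```java
import java.util.List;

public class MultiValueSpatialSketch {

    /** Axis-aligned rectangle, a simplified stand-in for a real geometry. */
    record Rect(double minX, double minY, double maxX, double maxY) {
        boolean intersects(Rect o) {
            return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
        }

        boolean contains(Rect o) {
            return minX <= o.minX && minY <= o.minY && maxX >= o.maxX && maxY >= o.maxY;
        }
    }

    /** ST_INTERSECTS: true if ANY field value intersects any part of the shape. */
    static boolean intersects(List<Rect> field, List<Rect> shape) {
        return field.stream().anyMatch(f -> shape.stream().anyMatch(f::intersects));
    }

    /** ST_DISJOINT: true only if ALL field values are disjoint from the shape. */
    static boolean disjoint(List<Rect> field, List<Rect> shape) {
        return field.stream().allMatch(f -> shape.stream().noneMatch(f::intersects));
    }

    /** ST_CONTAINS: each part of the shape must be within at least one field value. */
    static boolean contains(List<Rect> field, List<Rect> shape) {
        return shape.stream().allMatch(s -> field.stream().anyMatch(f -> f.contains(s)));
    }

    /** ST_WITHIN, modelled here as ST_CONTAINS with the arguments swapped (a simplification). */
    static boolean within(List<Rect> field, List<Rect> shape) {
        return contains(shape, field);
    }

    public static void main(String[] args) {
        List<Rect> field = List.of(new Rect(-10, -10, 0, 0), new Rect(0, 0, 10, 10));
        List<Rect> inside = List.of(new Rect(-9, -9, -1, -1), new Rect(1, 1, 9, 9));
        List<Rect> straddling = List.of(new Rect(-5, -5, 5, 5));

        System.out.println(intersects(field, inside));     // true
        System.out.println(contains(field, inside));       // true
        System.out.println(intersects(field, straddling)); // true
        System.out.println(contains(field, straddling));   // false
    }
}
```

The `straddling` case shows why neither `ANY` nor `ALL` over single values is enough for `ST_CONTAINS`: the rectangle intersects both field values, yet no single value contains it. Note that this stand-in does not model the asymmetry between CONTAINS and WITHIN for overlapping polygons that the commit message mentions.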

x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvTestUtils.java

Lines changed: 27 additions & 3 deletions
@@ -122,8 +122,19 @@ public static Tuple<Page, List<String>> loadPageFromCsv(URL source, Map<String,
 
     record CsvColumn(String name, Type type, BuilderWrapper builderWrapper) implements Releasable {
         void append(String stringValue) {
-            if (stringValue.startsWith("\"") && stringValue.endsWith("\"")) { // string value
-                stringValue = stringValue.substring(1, stringValue.length() - 1).replace(ESCAPED_COMMA_SEQUENCE, ",");
+            if (stringValue.startsWith("\"") && stringValue.endsWith("\"")) {
+                // string value
+                String[] mvStrings = stringValue.substring(1, stringValue.length() - 1).split("\",\\s*\"");
+                if (mvStrings.length > 1) {
+                    builderWrapper().builder().beginPositionEntry();
+                    for (String mvString : mvStrings) {
+                        mvString = mvString.replace(ESCAPED_COMMA_SEQUENCE, ",");
+                        builderWrapper().append().accept(mvString.length() == 0 ? null : type.convert(mvString));
+                    }
+                    builderWrapper().builder().endPositionEntry();
+                    return;
+                }
+                stringValue = mvStrings[0].replace(ESCAPED_COMMA_SEQUENCE, ",");
             } else if (stringValue.contains(",")) {// multi-value field
                 builderWrapper().builder().beginPositionEntry();
 
@@ -376,7 +387,20 @@ public static ExpectedResults loadCsvSpecValues(String csv) {
                 }
                 List<Object> listOfMvValues = new ArrayList<>();
                 for (String mvValue : multiValues) {
-                    listOfMvValues.add(columnTypes.get(i).convert(mvValue.trim().replace(ESCAPED_COMMA_SEQUENCE, ",")));
+                    try {
+                        listOfMvValues.add(columnTypes.get(i).convert(mvValue.trim().replace(ESCAPED_COMMA_SEQUENCE, ",")));
+                    } catch (IllegalArgumentException e) {
+                        throw new IllegalArgumentException(
+                            "Error parsing multi-value field ["
+                                + columnNames.get(i)
+                                + "] with value ["
+                                + mvValue
+                                + "] on row "
+                                + values.size(),
+                            e
+                        );
+
+                    }
                 }
                 rowValues.add(listOfMvValues);
             } else {
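
To see what the new `split("\",\\s*\"")` call in `append` does to a quoted multi-value cell, here is a standalone sketch of just the string handling. It leaves out the block builders and type conversion from the real method. The class `QuotedMultiValueSketch` and method `parseQuoted` are hypothetical, and the value of `ESCAPED_COMMA_SEQUENCE` is an assumption based on the `\,` sequences visible in the test data.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedMultiValueSketch {

    // Assumption: commas inside a value are written as "\," in the CSV test data.
    static final String ESCAPED_COMMA_SEQUENCE = "\\,";

    /** Splits a quoted cell such as "a", "b" into its values and unescapes commas in each value. */
    static List<String> parseQuoted(String cell) {
        if (cell.length() < 2 || cell.startsWith("\"") == false || cell.endsWith("\"") == false) {
            throw new IllegalArgumentException("Expected a quoted cell but got [" + cell + "]");
        }
        // Drop the outer quotes, then split on the quote-comma-quote boundary between values.
        String[] parts = cell.substring(1, cell.length() - 1).split("\",\\s*\"");
        List<String> values = new ArrayList<>();
        for (String part : parts) {
            values.add(part.replace(ESCAPED_COMMA_SEQUENCE, ","));
        }
        return values;
    }

    public static void main(String[] args) {
        // One value containing an escaped comma.
        System.out.println(parseQuoted("\"LINESTRING (0 0\\, 1 1)\""));
        // Two quoted values in one cell.
        System.out.println(parseQuoted("\"POINT (1 2)\", \"POINT (3 4)\""));
    }
}
```

Because the split pattern requires a closing quote, a comma, and an opening quote, escaped commas inside a single WKT value never act as value separators.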

x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvTestsDataLoader.java

Lines changed: 4 additions & 0 deletions
@@ -75,6 +75,8 @@ public class CsvTestsDataLoader {
     private static final TestsDataset COUNTRIES_BBOX_WEB = new TestsDataset("countries_bbox_web");
     private static final TestsDataset AIRPORT_CITY_BOUNDARIES = new TestsDataset("airport_city_boundaries");
     private static final TestsDataset CARTESIAN_MULTIPOLYGONS = new TestsDataset("cartesian_multipolygons");
+    private static final TestsDataset MULTIVALUE_GEOMETRIES = new TestsDataset("multivalue_geometries");
+    private static final TestsDataset MULTIVALUE_POINTS = new TestsDataset("multivalue_points");
     private static final TestsDataset DISTANCES = new TestsDataset("distances");
     private static final TestsDataset K8S = new TestsDataset("k8s", "k8s-mappings.json", "k8s.csv").withSetting("k8s-settings.json");
     private static final TestsDataset ADDRESSES = new TestsDataset("addresses");
@@ -104,6 +106,8 @@ public class CsvTestsDataLoader {
         Map.entry(COUNTRIES_BBOX_WEB.indexName, COUNTRIES_BBOX_WEB),
         Map.entry(AIRPORT_CITY_BOUNDARIES.indexName, AIRPORT_CITY_BOUNDARIES),
         Map.entry(CARTESIAN_MULTIPOLYGONS.indexName, CARTESIAN_MULTIPOLYGONS),
+        Map.entry(MULTIVALUE_GEOMETRIES.indexName, MULTIVALUE_GEOMETRIES),
+        Map.entry(MULTIVALUE_POINTS.indexName, MULTIVALUE_POINTS),
         Map.entry(DATE_NANOS.indexName, DATE_NANOS),
         Map.entry(K8S.indexName, K8S),
         Map.entry(DISTANCES.indexName, DISTANCES),
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+{
+  "properties": {
+    "abbrev": {
+      "type": "keyword"
+    },
+    "name": {
+      "type": "text"
+    },
+    "scalerank": {
+      "type": "integer"
+    },
+    "type": {
+      "type": "keyword"
+    },
+    "location": {
+      "type": "geo_point",
+      "index": false,
+      "doc_values": false
+    },
+    "country": {
+      "type": "keyword"
+    },
+    "city": {
+      "type": "keyword"
+    },
+    "city_location": {
+      "type": "geo_point"
+    }
+  }
+}
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+{
+  "properties": {
+    "id": {
+      "type": "long"
+    },
+    "intersects": {
+      "type": "boolean"
+    },
+    "contains": {
+      "type": "boolean"
+    },
+    "shape": {
+      "type": "geo_shape"
+    },
+    "smaller": {
+      "type": "geo_shape"
+    }
+  }
+}
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+{
+  "properties": {
+    "id": {
+      "type": "long"
+    },
+    "intersects": {
+      "type": "boolean"
+    },
+    "within": {
+      "type": "boolean"
+    },
+    "centroid": {
+      "type": "geo_point"
+    },
+    "location": {
+      "type": "geo_point"
+    },
+    "subset": {
+      "type": "geo_point"
+    }
+  }
+}
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+id:l, intersects:boolean, contains:boolean, shape:geo_shape, smaller:geo_shape
+0, true, true, ["GEOMETRYCOLLECTION(POLYGON ((-10 -10\, 0 -10\, 0 0\, -10 0\, -10 -10))\, POLYGON ((0 0\, 10 0\, 10 10\, 0 10\, 0 0)))"], ["GEOMETRYCOLLECTION(POLYGON ((-9 -9\, -1 -9\, -1 -1\, -9 -1\, -9 -9))\, POLYGON ((1 1\, 9 1\, 9 9\, 1 9\, 1 1)))"]
+1, true, true, ["MULTIPOLYGON( ((-10 -10\, 0 -10\, 0 0\, -10 0\, -10 -10))\, ((0 0\, 10 0\, 10 10\, 0 10\, 0 0)))"], ["MULTIPOLYGON( ((-9 -9\, -1 -9\, -1 -1\, -9 -1\, -9 -9))\, ((1 1\, 9 1\, 9 9\, 1 9\, 1 1)))"]
+2, true, true, ["POLYGON ((-15 -15\, 15 -15\, 15 15\, -15 15\, -15 -15))"], ["POLYGON ((-14 -14\, 14 -14\, 14 14\, -14 14\, -14 -14))"]
+3, true, true, ["POLYGON ((-15 -15\, 15 -15\, 15 15\, -15 15\, -15 -15))", "POLYGON ((15 15\, 25 15\, 25 25\, 15 25\, 15 15))"], ["POLYGON ((-14 -14\, 14 -14\, 14 14\, -14 14\, -14 -14))", "POLYGON ((16 16\, 24 16\, 24 24\, 16 24\, 16 16))"]
+4, true, true, ["POLYGON ((-10 -10\, 0 -10\, 0 0\, -10 0\, -10 -10))", "POLYGON ((0 0\, 10 0\, 10 10\, 0 10\, 0 0))"], ["POLYGON ((-9 -9\, -1 -9\, -1 -1\, -9 -1\, -9 -9))", "POLYGON ((1 1\, 9 1\, 9 9\, 1 9\, 1 1))"]
+5, true, false, ["POLYGON ((-5 -5\, 5 -5\, 5 5\, -5 5\, -5 -5))"], ["POLYGON ((-4 -4\, 4 -4\, 4 4\, -4 4\, -4 -4))"]
+6, true, false, ["POLYGON ((-5 -5\, 5 -5\, 5 5\, -5 5\, -5 -5))", "POLYGON ((15 15\, 25 15\, 25 25\, 15 25\, 15 15))"], ["POLYGON ((-4 -4\, 4 -4\, 4 4\, -4 4\, -4 -4))", "POLYGON ((16 16\, 24 16\, 24 24\, 16 24\, 16 16))"]
+7, true, false, ["POLYGON ((-9 -9\, -1 -9\, -1 -1\, -9 -1\, -9 -9))", "POLYGON ((1 1\, 9 1\, 9 9\, 1 9\, 1 1))"], ["POLYGON ((-8 -8\, -2 -8\, -2 -2\, -8 -2\, -8 -8))", "POLYGON ((2 2\, 8 2\, 8 8\, 2 8\, 2 2))"]
+8, false, false, ["POLYGON ((15 15\, 25 15\, 25 25\, 15 25\, 15 15))"], ["POLYGON ((16 16\, 24 16\, 24 24\, 16 24\, 16 16))"]
+9, false, false, ["POLYGON ((-25 -25\, -15 -25\, -15 -15\, -25 -15\, -25 -25))", "POLYGON ((15 15\, 25 15\, 25 25\, 15 25\, 15 15))"], ["POLYGON ((-24 -24\, -16 -24\, -16 -16\, -24 -16\, -24 -24))", "POLYGON ((16 16\, 24 16\, 24 24\, 16 24\, 16 16))"]
+10, true, false, ["POLYGON ((-15 -15\, 15 -15\, 15 15\, -15 15\, -15 -15))", "POLYGON ((5 5\, 15 5\, 15 15\, 5 15\, 5 5))"], ["POLYGON ((-14 -14\, 14 -14\, 14 14\, -14 14\, -14 -14))", "POLYGON ((6 6\, 14 6\, 14 14\, 6 14\, 6 6))"]
+11, true, false, ["POLYGON ((-11 -11\, 1 -11\, 1 1\, -11 1\, -11 -11))", "POLYGON ((-1 -1\, 11 -1\, 11 11\, -1 11\, -1 -1))"], ["POLYGON ((-10 -10\, 0 -10\, 0 0\, -10 0\, -10 -10))", "POLYGON ((0 0\, 10 0\, 10 10\, 0 10\, 0 0))"]
