Skip to content

Commit d03917d

Browse files
[8.x] Added stricter range type checks and runtime warnings for ENRICH (elastic#115091) (elastic#117130)
* Added stricter range type checks and runtime warnings for ENRICH (elastic#115091) It has been noted that strange or incorrect error messages are returned if the ENRICH command uses incompatible data types, for example a KEYWORD with value 'foo' using in an int_range match: elastic#107357 This error is thrown at runtime and contradicts the ES|QL policy of only throwing errors at planning time, while at runtime we should instead set results to null and add a warning. However, we could make the planner stricter and block potentially mismatching types earlier. However runtime parsing of KEYWORD fields has been a feature of ES|QL ENRICH since it's inception, in particular we even have tests asserting that KEYWORD fields containing parsable IP data can be joined to an ip_range ENRICH index. In order to not create a backwards compatibility problem, we have compromised with the following: * Strict range type checking at the planner time for incompatible range types, unless the incoming index field is KEYWORD * For KEYWORD fields, allow runtime parsing of the fields, but when parsing fails, set the result to null and add a warning Added extra tests to verify behaviour of match policies on non-keyword fields. They all behave as keywords (the enrich field is converted to keyword at policy execution time, and the input data is converted to keyword at lookup time). * Fix compile error likely due to mismatched ordering of backports
1 parent cf880b6 commit d03917d

File tree

19 files changed

+659
-43
lines changed

19 files changed

+659
-43
lines changed

docs/changelog/115091.yaml

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
pr: 115091
2+
summary: Added stricter range type checks and runtime warnings for ENRICH
3+
area: ES|QL
4+
type: bug
5+
issues:
6+
- 107357
7+
- 116799

docs/reference/esql/esql-enrich-data.asciidoc

Lines changed: 28 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -138,8 +138,33 @@ include::{es-ref-dir}/ingest/apis/enrich/execute-enrich-policy.asciidoc[tag=upda
138138

139139
include::../ingest/enrich.asciidoc[tag=update-enrich-policy]
140140

141-
==== Limitations
141+
==== Enrich Policy Types and Limitations
142+
The {esql} `ENRICH` command supports all three enrich policy types:
143+
144+
`geo_match`::
145+
Matches enrich data to incoming documents based on a <<query-dsl-geo-shape-query,`geo_shape` query>>.
146+
For an example, see <<geo-match-enrich-policy-type>>.
147+
148+
`match`::
149+
Matches enrich data to incoming documents based on a <<query-dsl-term-query,`term` query>>.
150+
For an example, see <<match-enrich-policy-type>>.
151+
152+
`range`::
153+
Matches a number, date, or IP address in incoming documents to a range in the
154+
enrich index based on a <<query-dsl-term-query,`term` query>>. For an example,
155+
see <<range-enrich-policy-type>>.
156+
142157
// tag::limitations[]
143-
The {esql} `ENRICH` command only supports enrich policies of type `match`.
144-
Furthermore, `ENRICH` only supports enriching on a column of type `keyword`.
158+
While all three enrich policy types are supported, there are some limitations to be aware of:
159+
160+
* The `geo_match` enrich policy type only supports the `intersects` spatial relation.
161+
* It is required that the `match_field` in the `ENRICH` command is of the correct type.
162+
For example, if the enrich policy is of type `geo_match`, the `match_field` in the `ENRICH`
163+
command must be of type `geo_point` or `geo_shape`.
164+
Likewise, a `range` enrich policy requires a `match_field` of type `integer`, `long`, `date`, or `ip`,
165+
depending on the type of the range field in the original enrich index.
166+
* However, this constraint is relaxed for `range` policies when the `match_field` is of type `KEYWORD`.
167+
In this case the field values will be parsed during query execution, row by row.
168+
If any value fails to parse, the output values for that row will be set to `null`,
169+
an appropriate warning will be produced and the query will continue to execute.
145170
// end::limitations[]

server/src/main/java/org/elasticsearch/TransportVersions.java

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -202,6 +202,7 @@ static TransportVersion def(int id) {
202202
public static final TransportVersion INDEX_STATS_ADDITIONAL_FIELDS = def(8_793_00_0);
203203
public static final TransportVersion INDEX_STATS_ADDITIONAL_FIELDS_REVERT = def(8_794_00_0);
204204
public static final TransportVersion FAST_REFRESH_RCO_2 = def(8_795_00_0);
205+
public static final TransportVersion ESQL_ENRICH_RUNTIME_WARNINGS = def(8_796_00_0);
205206

206207
/*
207208
* STOP! READ THIS FIRST! No, really,

x-pack/plugin/build.gradle

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -208,5 +208,6 @@ tasks.named("yamlRestTestV7CompatTransform").configure({ task ->
208208
task.skipTest("esql/80_text/reverse text", "The output type changed from TEXT to KEYWORD.")
209209
task.skipTest("esql/80_text/values function", "The output type changed from TEXT to KEYWORD.")
210210
task.skipTest("privileges/11_builtin/Test get builtin privileges" ,"unnecessary to test compatibility")
211+
task.skipTest("esql/61_enrich_ip/Invalid IP strings", "We switched from exceptions to null+warnings for ENRICH runtime errors")
211212
})
212213

x-pack/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/type/DataType.java

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -591,6 +591,13 @@ public DataType noText() {
591591
return isString(this) ? KEYWORD : this;
592592
}
593593

594+
public boolean isDate() {
595+
return switch (this) {
596+
case DATETIME, DATE_NANOS -> true;
597+
default -> false;
598+
};
599+
}
600+
594601
/**
595602
* Named parameters with default values. It's just easier to do this with
596603
* a builder in java....

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/operator/lookup/EnrichQuerySourceOperator.java

Lines changed: 21 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@
2121
import org.elasticsearch.compute.data.IntVector;
2222
import org.elasticsearch.compute.data.Page;
2323
import org.elasticsearch.compute.operator.SourceOperator;
24+
import org.elasticsearch.compute.operator.Warnings;
2425
import org.elasticsearch.core.Releasables;
2526

2627
import java.io.IOException;
@@ -37,17 +38,25 @@ public final class EnrichQuerySourceOperator extends SourceOperator {
3738
private int queryPosition = -1;
3839
private final IndexReader indexReader;
3940
private final IndexSearcher searcher;
41+
private final Warnings warnings;
4042
private final int maxPageSize;
4143

4244
// using smaller pages enables quick cancellation and reduces sorting costs
4345
public static final int DEFAULT_MAX_PAGE_SIZE = 256;
4446

45-
public EnrichQuerySourceOperator(BlockFactory blockFactory, int maxPageSize, QueryList queryList, IndexReader indexReader) {
47+
public EnrichQuerySourceOperator(
48+
BlockFactory blockFactory,
49+
int maxPageSize,
50+
QueryList queryList,
51+
IndexReader indexReader,
52+
Warnings warnings
53+
) {
4654
this.blockFactory = blockFactory;
4755
this.maxPageSize = maxPageSize;
4856
this.queryList = queryList;
4957
this.indexReader = indexReader;
5058
this.searcher = new IndexSearcher(indexReader);
59+
this.warnings = warnings;
5160
}
5261

5362
@Override
@@ -72,12 +81,18 @@ public Page getOutput() {
7281
}
7382
int totalMatches = 0;
7483
do {
75-
Query query = nextQuery();
76-
if (query == null) {
77-
assert isFinished();
78-
break;
84+
Query query;
85+
try {
86+
query = nextQuery();
87+
if (query == null) {
88+
assert isFinished();
89+
break;
90+
}
91+
query = searcher.rewrite(new ConstantScoreQuery(query));
92+
} catch (Exception e) {
93+
warnings.registerException(e);
94+
continue;
7995
}
80-
query = searcher.rewrite(new ConstantScoreQuery(query));
8196
final var weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1.0f);
8297
if (weight == null) {
8398
continue;

x-pack/plugin/esql/compute/src/test/java/org/elasticsearch/compute/operator/lookup/EnrichQuerySourceOperatorTests.java

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,8 @@
3232
import org.elasticsearch.compute.data.IntBlock;
3333
import org.elasticsearch.compute.data.IntVector;
3434
import org.elasticsearch.compute.data.Page;
35+
import org.elasticsearch.compute.operator.DriverContext;
36+
import org.elasticsearch.compute.operator.Warnings;
3537
import org.elasticsearch.core.IOUtils;
3638
import org.elasticsearch.index.mapper.KeywordFieldMapper;
3739
import org.elasticsearch.index.mapper.MappedFieldType;
@@ -120,7 +122,8 @@ public void testQueries() throws Exception {
120122
// 3 -> [] -> []
121123
// 4 -> [a1] -> [3]
122124
// 5 -> [] -> []
123-
EnrichQuerySourceOperator queryOperator = new EnrichQuerySourceOperator(blockFactory, 128, queryList, reader);
125+
var warnings = Warnings.createWarnings(DriverContext.WarningsMode.IGNORE, 0, 0, "test enrich");
126+
EnrichQuerySourceOperator queryOperator = new EnrichQuerySourceOperator(blockFactory, 128, queryList, reader, warnings);
124127
Page p0 = queryOperator.getOutput();
125128
assertNotNull(p0);
126129
assertThat(p0.getPositionCount(), equalTo(6));
@@ -187,7 +190,8 @@ public void testRandomMatchQueries() throws Exception {
187190
MappedFieldType uidField = new KeywordFieldMapper.KeywordFieldType("uid");
188191
var queryList = QueryList.rawTermQueryList(uidField, mock(SearchExecutionContext.class), inputTerms);
189192
int maxPageSize = between(1, 256);
190-
EnrichQuerySourceOperator queryOperator = new EnrichQuerySourceOperator(blockFactory, maxPageSize, queryList, reader);
193+
var warnings = Warnings.createWarnings(DriverContext.WarningsMode.IGNORE, 0, 0, "test enrich");
194+
EnrichQuerySourceOperator queryOperator = new EnrichQuerySourceOperator(blockFactory, maxPageSize, queryList, reader, warnings);
191195
Map<Integer, Set<Integer>> actualPositions = new HashMap<>();
192196
while (queryOperator.isFinished() == false) {
193197
Page page = queryOperator.getOutput();

x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupFromIndexIT.java

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -183,7 +183,8 @@ public void testLookupIndex() throws IOException {
183183
DataType.KEYWORD,
184184
"lookup",
185185
"data",
186-
List.of(new Alias(Source.EMPTY, "l", new ReferenceAttribute(Source.EMPTY, "l", DataType.LONG)))
186+
List.of(new Alias(Source.EMPTY, "l", new ReferenceAttribute(Source.EMPTY, "l", DataType.LONG))),
187+
Source.EMPTY
187188
);
188189
DriverContext driverContext = driverContext();
189190
try (

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -277,6 +277,11 @@ public enum Cap {
277277
*/
278278
RANGEQUERY_FOR_DATETIME,
279279

280+
/**
281+
* Enforce strict type checking on ENRICH range types, and warnings for KEYWORD parsing at runtime. Done in #115091.
282+
*/
283+
ENRICH_STRICT_RANGE_TYPES,
284+
280285
/**
281286
* Fix for non-unique attribute names in ROW and logical plans.
282287
* https://github.com/elastic/elasticsearch/issues/110541

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/AbstractLookupService.java

Lines changed: 28 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,7 @@
4141
import org.elasticsearch.compute.operator.DriverContext;
4242
import org.elasticsearch.compute.operator.Operator;
4343
import org.elasticsearch.compute.operator.OutputOperator;
44+
import org.elasticsearch.compute.operator.Warnings;
4445
import org.elasticsearch.compute.operator.lookup.EnrichQuerySourceOperator;
4546
import org.elasticsearch.compute.operator.lookup.MergePositionsOperator;
4647
import org.elasticsearch.compute.operator.lookup.QueryList;
@@ -78,6 +79,7 @@
7879
import org.elasticsearch.xpack.esql.EsqlIllegalArgumentException;
7980
import org.elasticsearch.xpack.esql.core.expression.Alias;
8081
import org.elasticsearch.xpack.esql.core.expression.NamedExpression;
82+
import org.elasticsearch.xpack.esql.core.tree.Source;
8183
import org.elasticsearch.xpack.esql.core.type.DataType;
8284
import org.elasticsearch.xpack.esql.planner.EsPhysicalOperationProviders;
8385
import org.elasticsearch.xpack.esql.planner.PlannerUtils;
@@ -166,6 +168,10 @@ abstract class AbstractLookupService<R extends AbstractLookupService.Request, T
166168
);
167169
}
168170

171+
public ThreadContext getThreadContext() {
172+
return transportService.getThreadPool().getThreadContext();
173+
}
174+
169175
/**
170176
* Convert a request as sent to {@link #lookupAsync} into a transport request after
171177
* preflight checks have been performed.
@@ -330,11 +336,18 @@ private void doLookup(T request, CancellableTask task, ActionListener<Page> list
330336
releasables.add(mergePositionsOperator);
331337
SearchExecutionContext searchExecutionContext = searchContext.getSearchExecutionContext();
332338
QueryList queryList = queryList(request, searchExecutionContext, inputBlock, request.inputDataType);
339+
var warnings = Warnings.createWarnings(
340+
DriverContext.WarningsMode.COLLECT,
341+
request.source.source().getLineNumber(),
342+
request.source.source().getColumnNumber(),
343+
request.source.text()
344+
);
333345
var queryOperator = new EnrichQuerySourceOperator(
334346
driverContext.blockFactory(),
335347
EnrichQuerySourceOperator.DEFAULT_MAX_PAGE_SIZE,
336348
queryList,
337-
searchExecutionContext.getIndexReader()
349+
searchExecutionContext.getIndexReader(),
350+
warnings
338351
);
339352
releasables.add(queryOperator);
340353
var extractFieldsOperator = extractFieldsOperator(searchContext, driverContext, request.extractFields);
@@ -450,13 +463,22 @@ abstract static class Request {
450463
final DataType inputDataType;
451464
final Page inputPage;
452465
final List<NamedExpression> extractFields;
466+
final Source source;
453467

454-
Request(String sessionId, String index, DataType inputDataType, Page inputPage, List<NamedExpression> extractFields) {
468+
Request(
469+
String sessionId,
470+
String index,
471+
DataType inputDataType,
472+
Page inputPage,
473+
List<NamedExpression> extractFields,
474+
Source source
475+
) {
455476
this.sessionId = sessionId;
456477
this.index = index;
457478
this.inputDataType = inputDataType;
458479
this.inputPage = inputPage;
459480
this.extractFields = extractFields;
481+
this.source = source;
460482
}
461483
}
462484

@@ -470,6 +492,7 @@ abstract static class TransportRequest extends org.elasticsearch.transport.Trans
470492
final DataType inputDataType;
471493
final Page inputPage;
472494
final List<NamedExpression> extractFields;
495+
final Source source;
473496
// TODO: Remove this workaround once we have Block RefCount
474497
final Page toRelease;
475498
final RefCounted refs = AbstractRefCounted.of(this::releasePage);
@@ -480,14 +503,16 @@ abstract static class TransportRequest extends org.elasticsearch.transport.Trans
480503
DataType inputDataType,
481504
Page inputPage,
482505
Page toRelease,
483-
List<NamedExpression> extractFields
506+
List<NamedExpression> extractFields,
507+
Source source
484508
) {
485509
this.sessionId = sessionId;
486510
this.shardId = shardId;
487511
this.inputDataType = inputDataType;
488512
this.inputPage = inputPage;
489513
this.toRelease = toRelease;
490514
this.extractFields = extractFields;
515+
this.source = source;
491516
}
492517

493518
@Override

0 commit comments

Comments
 (0)