Skip to content

Commit b8ee9ea

Browse files
craigtaverneralexey-ivanov-es
authored andcommitted
Added stricter range type checks and runtime warnings for ENRICH (elastic#115091)
It has been noted that strange or incorrect error messages are returned if the ENRICH command uses incompatible data types, for example a KEYWORD with value 'foo' using in an int_range match: elastic#107357 This error is thrown at runtime and contradicts the ES|QL policy of only throwing errors at planning time, while at runtime we should instead set results to null and add a warning. However, we could make the planner stricter and block potentially mismatching types earlier. However runtime parsing of KEYWORD fields has been a feature of ES|QL ENRICH since it's inception, in particular we even have tests asserting that KEYWORD fields containing parsable IP data can be joined to an ip_range ENRICH index. In order to not create a backwards compatibility problem, we have compromised with the following: * Strict range type checking at the planner time for incompatible range types, unless the incoming index field is KEYWORD * For KEYWORD fields, allow runtime parsing of the fields, but when parsing fails, set the result to null and add a warning Added extra tests to verify behaviour of match policies on non-keyword fields. They all behave as keywords (the enrich field is converted to keyword at policy execution time, and the input data is converted to keyword at lookup time).
1 parent b79c8da commit b8ee9ea

File tree

19 files changed

+657
-42
lines changed

19 files changed

+657
-42
lines changed

docs/changelog/115091.yaml

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
pr: 115091
2+
summary: Added stricter range type checks and runtime warnings for ENRICH
3+
area: ES|QL
4+
type: bug
5+
issues:
6+
- 107357
7+
- 116799

docs/reference/esql/esql-enrich-data.asciidoc

Lines changed: 28 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -138,8 +138,33 @@ include::{es-ref-dir}/ingest/apis/enrich/execute-enrich-policy.asciidoc[tag=upda
138138

139139
include::../ingest/enrich.asciidoc[tag=update-enrich-policy]
140140

141-
==== Limitations
141+
==== Enrich Policy Types and Limitations
142+
The {esql} `ENRICH` command supports all three enrich policy types:
143+
144+
`geo_match`::
145+
Matches enrich data to incoming documents based on a <<query-dsl-geo-shape-query,`geo_shape` query>>.
146+
For an example, see <<geo-match-enrich-policy-type>>.
147+
148+
`match`::
149+
Matches enrich data to incoming documents based on a <<query-dsl-term-query,`term` query>>.
150+
For an example, see <<match-enrich-policy-type>>.
151+
152+
`range`::
153+
Matches a number, date, or IP address in incoming documents to a range in the
154+
enrich index based on a <<query-dsl-term-query,`term` query>>. For an example,
155+
see <<range-enrich-policy-type>>.
156+
142157
// tag::limitations[]
143-
The {esql} `ENRICH` command only supports enrich policies of type `match`.
144-
Furthermore, `ENRICH` only supports enriching on a column of type `keyword`.
158+
While all three enrich policy types are supported, there are some limitations to be aware of:
159+
160+
* The `geo_match` enrich policy type only supports the `intersects` spatial relation.
161+
* It is required that the `match_field` in the `ENRICH` command is of the correct type.
162+
For example, if the enrich policy is of type `geo_match`, the `match_field` in the `ENRICH`
163+
command must be of type `geo_point` or `geo_shape`.
164+
Likewise, a `range` enrich policy requires a `match_field` of type `integer`, `long`, `date`, or `ip`,
165+
depending on the type of the range field in the original enrich index.
166+
* However, this constraint is relaxed for `range` policies when the `match_field` is of type `KEYWORD`.
167+
In this case the field values will be parsed during query execution, row by row.
168+
If any value fails to parse, the output values for that row will be set to `null`,
169+
an appropriate warning will be produced and the query will continue to execute.
145170
// end::limitations[]

server/src/main/java/org/elasticsearch/TransportVersions.java

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -202,6 +202,7 @@ static TransportVersion def(int id) {
202202
public static final TransportVersion INDEX_STATS_ADDITIONAL_FIELDS = def(8_793_00_0);
203203
public static final TransportVersion INDEX_STATS_ADDITIONAL_FIELDS_REVERT = def(8_794_00_0);
204204
public static final TransportVersion FAST_REFRESH_RCO_2 = def(8_795_00_0);
205+
public static final TransportVersion ESQL_ENRICH_RUNTIME_WARNINGS = def(8_796_00_0);
205206

206207
/*
207208
* STOP! READ THIS FIRST! No, really,

x-pack/plugin/build.gradle

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -89,5 +89,6 @@ tasks.named("yamlRestCompatTestTransform").configure({ task ->
8989
task.skipTest("esql/80_text/reverse text", "The output type changed from TEXT to KEYWORD.")
9090
task.skipTest("esql/80_text/values function", "The output type changed from TEXT to KEYWORD.")
9191
task.skipTest("privileges/11_builtin/Test get builtin privileges" ,"unnecessary to test compatibility")
92+
task.skipTest("esql/61_enrich_ip/Invalid IP strings", "We switched from exceptions to null+warnings for ENRICH runtime errors")
9293
})
9394

x-pack/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/type/DataType.java

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -591,6 +591,13 @@ public DataType noText() {
591591
return isString(this) ? KEYWORD : this;
592592
}
593593

594+
public boolean isDate() {
595+
return switch (this) {
596+
case DATETIME, DATE_NANOS -> true;
597+
default -> false;
598+
};
599+
}
600+
594601
/**
595602
* Named parameters with default values. It's just easier to do this with
596603
* a builder in java....

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/operator/lookup/EnrichQuerySourceOperator.java

Lines changed: 21 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,7 @@
2222
import org.elasticsearch.compute.data.IntVector;
2323
import org.elasticsearch.compute.data.Page;
2424
import org.elasticsearch.compute.operator.SourceOperator;
25+
import org.elasticsearch.compute.operator.Warnings;
2526
import org.elasticsearch.core.Releasables;
2627

2728
import java.io.IOException;
@@ -38,17 +39,25 @@ public final class EnrichQuerySourceOperator extends SourceOperator {
3839
private int queryPosition = -1;
3940
private final IndexReader indexReader;
4041
private final IndexSearcher searcher;
42+
private final Warnings warnings;
4143
private final int maxPageSize;
4244

4345
// using smaller pages enables quick cancellation and reduces sorting costs
4446
public static final int DEFAULT_MAX_PAGE_SIZE = 256;
4547

46-
public EnrichQuerySourceOperator(BlockFactory blockFactory, int maxPageSize, QueryList queryList, IndexReader indexReader) {
48+
public EnrichQuerySourceOperator(
49+
BlockFactory blockFactory,
50+
int maxPageSize,
51+
QueryList queryList,
52+
IndexReader indexReader,
53+
Warnings warnings
54+
) {
4755
this.blockFactory = blockFactory;
4856
this.maxPageSize = maxPageSize;
4957
this.queryList = queryList;
5058
this.indexReader = indexReader;
5159
this.searcher = new IndexSearcher(indexReader);
60+
this.warnings = warnings;
5261
}
5362

5463
@Override
@@ -73,12 +82,18 @@ public Page getOutput() {
7382
}
7483
int totalMatches = 0;
7584
do {
76-
Query query = nextQuery();
77-
if (query == null) {
78-
assert isFinished();
79-
break;
85+
Query query;
86+
try {
87+
query = nextQuery();
88+
if (query == null) {
89+
assert isFinished();
90+
break;
91+
}
92+
query = searcher.rewrite(new ConstantScoreQuery(query));
93+
} catch (Exception e) {
94+
warnings.registerException(e);
95+
continue;
8096
}
81-
query = searcher.rewrite(new ConstantScoreQuery(query));
8297
final var weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1.0f);
8398
if (weight == null) {
8499
continue;

x-pack/plugin/esql/compute/src/test/java/org/elasticsearch/compute/operator/lookup/EnrichQuerySourceOperatorTests.java

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,8 @@
3232
import org.elasticsearch.compute.data.IntBlock;
3333
import org.elasticsearch.compute.data.IntVector;
3434
import org.elasticsearch.compute.data.Page;
35+
import org.elasticsearch.compute.operator.DriverContext;
36+
import org.elasticsearch.compute.operator.Warnings;
3537
import org.elasticsearch.core.IOUtils;
3638
import org.elasticsearch.index.mapper.KeywordFieldMapper;
3739
import org.elasticsearch.index.mapper.MappedFieldType;
@@ -120,7 +122,8 @@ public void testQueries() throws Exception {
120122
// 3 -> [] -> []
121123
// 4 -> [a1] -> [3]
122124
// 5 -> [] -> []
123-
EnrichQuerySourceOperator queryOperator = new EnrichQuerySourceOperator(blockFactory, 128, queryList, reader);
125+
var warnings = Warnings.createWarnings(DriverContext.WarningsMode.IGNORE, 0, 0, "test enrich");
126+
EnrichQuerySourceOperator queryOperator = new EnrichQuerySourceOperator(blockFactory, 128, queryList, reader, warnings);
124127
Page p0 = queryOperator.getOutput();
125128
assertNotNull(p0);
126129
assertThat(p0.getPositionCount(), equalTo(6));
@@ -187,7 +190,8 @@ public void testRandomMatchQueries() throws Exception {
187190
MappedFieldType uidField = new KeywordFieldMapper.KeywordFieldType("uid");
188191
var queryList = QueryList.rawTermQueryList(uidField, mock(SearchExecutionContext.class), inputTerms);
189192
int maxPageSize = between(1, 256);
190-
EnrichQuerySourceOperator queryOperator = new EnrichQuerySourceOperator(blockFactory, maxPageSize, queryList, reader);
193+
var warnings = Warnings.createWarnings(DriverContext.WarningsMode.IGNORE, 0, 0, "test enrich");
194+
EnrichQuerySourceOperator queryOperator = new EnrichQuerySourceOperator(blockFactory, maxPageSize, queryList, reader, warnings);
191195
Map<Integer, Set<Integer>> actualPositions = new HashMap<>();
192196
while (queryOperator.isFinished() == false) {
193197
Page page = queryOperator.getOutput();

x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/LookupFromIndexIT.java

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -183,7 +183,8 @@ public void testLookupIndex() throws IOException {
183183
DataType.KEYWORD,
184184
"lookup",
185185
"data",
186-
List.of(new Alias(Source.EMPTY, "l", new ReferenceAttribute(Source.EMPTY, "l", DataType.LONG)))
186+
List.of(new Alias(Source.EMPTY, "l", new ReferenceAttribute(Source.EMPTY, "l", DataType.LONG))),
187+
Source.EMPTY
187188
);
188189
DriverContext driverContext = driverContext();
189190
try (

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -278,6 +278,11 @@ public enum Cap {
278278
*/
279279
RANGEQUERY_FOR_DATETIME,
280280

281+
/**
282+
* Enforce strict type checking on ENRICH range types, and warnings for KEYWORD parsing at runtime. Done in #115091.
283+
*/
284+
ENRICH_STRICT_RANGE_TYPES,
285+
281286
/**
282287
* Fix for non-unique attribute names in ROW and logical plans.
283288
* https://github.com/elastic/elasticsearch/issues/110541

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/AbstractLookupService.java

Lines changed: 28 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,7 @@
4141
import org.elasticsearch.compute.operator.DriverContext;
4242
import org.elasticsearch.compute.operator.Operator;
4343
import org.elasticsearch.compute.operator.OutputOperator;
44+
import org.elasticsearch.compute.operator.Warnings;
4445
import org.elasticsearch.compute.operator.lookup.EnrichQuerySourceOperator;
4546
import org.elasticsearch.compute.operator.lookup.MergePositionsOperator;
4647
import org.elasticsearch.compute.operator.lookup.QueryList;
@@ -78,6 +79,7 @@
7879
import org.elasticsearch.xpack.esql.EsqlIllegalArgumentException;
7980
import org.elasticsearch.xpack.esql.core.expression.Alias;
8081
import org.elasticsearch.xpack.esql.core.expression.NamedExpression;
82+
import org.elasticsearch.xpack.esql.core.tree.Source;
8183
import org.elasticsearch.xpack.esql.core.type.DataType;
8284
import org.elasticsearch.xpack.esql.planner.EsPhysicalOperationProviders;
8385
import org.elasticsearch.xpack.esql.planner.PlannerUtils;
@@ -166,6 +168,10 @@ abstract class AbstractLookupService<R extends AbstractLookupService.Request, T
166168
);
167169
}
168170

171+
public ThreadContext getThreadContext() {
172+
return transportService.getThreadPool().getThreadContext();
173+
}
174+
169175
/**
170176
* Convert a request as sent to {@link #lookupAsync} into a transport request after
171177
* preflight checks have been performed.
@@ -327,11 +333,18 @@ private void doLookup(T request, CancellableTask task, ActionListener<Page> list
327333
releasables.add(mergePositionsOperator);
328334
SearchExecutionContext searchExecutionContext = searchContext.getSearchExecutionContext();
329335
QueryList queryList = queryList(request, searchExecutionContext, inputBlock, request.inputDataType);
336+
var warnings = Warnings.createWarnings(
337+
DriverContext.WarningsMode.COLLECT,
338+
request.source.source().getLineNumber(),
339+
request.source.source().getColumnNumber(),
340+
request.source.text()
341+
);
330342
var queryOperator = new EnrichQuerySourceOperator(
331343
driverContext.blockFactory(),
332344
EnrichQuerySourceOperator.DEFAULT_MAX_PAGE_SIZE,
333345
queryList,
334-
searchExecutionContext.getIndexReader()
346+
searchExecutionContext.getIndexReader(),
347+
warnings
335348
);
336349
releasables.add(queryOperator);
337350
var extractFieldsOperator = extractFieldsOperator(searchContext, driverContext, request.extractFields);
@@ -447,13 +460,22 @@ abstract static class Request {
447460
final DataType inputDataType;
448461
final Page inputPage;
449462
final List<NamedExpression> extractFields;
463+
final Source source;
450464

451-
Request(String sessionId, String index, DataType inputDataType, Page inputPage, List<NamedExpression> extractFields) {
465+
Request(
466+
String sessionId,
467+
String index,
468+
DataType inputDataType,
469+
Page inputPage,
470+
List<NamedExpression> extractFields,
471+
Source source
472+
) {
452473
this.sessionId = sessionId;
453474
this.index = index;
454475
this.inputDataType = inputDataType;
455476
this.inputPage = inputPage;
456477
this.extractFields = extractFields;
478+
this.source = source;
457479
}
458480
}
459481

@@ -467,6 +489,7 @@ abstract static class TransportRequest extends org.elasticsearch.transport.Trans
467489
final DataType inputDataType;
468490
final Page inputPage;
469491
final List<NamedExpression> extractFields;
492+
final Source source;
470493
// TODO: Remove this workaround once we have Block RefCount
471494
final Page toRelease;
472495
final RefCounted refs = AbstractRefCounted.of(this::releasePage);
@@ -477,14 +500,16 @@ abstract static class TransportRequest extends org.elasticsearch.transport.Trans
477500
DataType inputDataType,
478501
Page inputPage,
479502
Page toRelease,
480-
List<NamedExpression> extractFields
503+
List<NamedExpression> extractFields,
504+
Source source
481505
) {
482506
this.sessionId = sessionId;
483507
this.shardId = shardId;
484508
this.inputDataType = inputDataType;
485509
this.inputPage = inputPage;
486510
this.toRelease = toRelease;
487511
this.extractFields = extractFields;
512+
this.source = source;
488513
}
489514

490515
@Override

0 commit comments

Comments
 (0)