Disallow unmapped_fields=load with partial non-KEYWORD#144109
Disallow unmapped_fields=load with partial non-KEYWORD#144109shmuelhanoch wants to merge 27 commits intoelastic:mainfrom
Conversation
8b4069d to
fee6a30
Compare
Fail the query when SET unmapped_fields="load" is used and any index has a field that is partially mapped (present in some indices, unmapped in others) and whose type where mapped is not KEYWORD. Add analyzer rule DisallowLoadWithPartiallyMappedNonKeyword and unit tests. Closes elastic#143218
5f76a74 to
90769e3
Compare
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
alex-spies
left a comment
There was a problem hiding this comment.
Thanks @shmuelhanoch ! I think the solution is generally correct, but I have some remarks around where to put it and how to test it.
In particular, I think this requires some yaml tests to confirm the behavior is correct in end-to-end tests as well, because our unit tests are a little "academic" and require us to mimic how we create mappings very well.
...a/org/elasticsearch/xpack/esql/analysis/rules/DisallowLoadWithPartiallyMappedNonKeyword.java
Outdated
Show resolved
Hide resolved
...k/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
Outdated
Show resolved
Hide resolved
...k/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
I think we should add a bunch more test cases. In particular, tests where a partially mapped field is referenced not (just) in KEEP, but also, say, WHERE.
There was a problem hiding this comment.
We have WHERE covered now. Great! Let's do more different commands and expressions.
Like, if someone accidentally slipped in a regression that bypasses this verification for, say, CHANGE_POINT, we would want a test to fail. That's true for all commands.
...a/org/elasticsearch/xpack/esql/analysis/rules/DisallowLoadWithPartiallyMappedNonKeyword.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/esql/analysis/rules/DisallowLoadWithPartiallyMappedNonKeyword.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/esql/analysis/rules/DisallowLoadWithPartiallyMappedNonKeyword.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/esql/analysis/rules/DisallowLoadWithPartiallyMappedNonKeyword.java
Outdated
Show resolved
Hide resolved
| * Resolve a dotted field path (e.g. "a.b.c") in the given root mapping. | ||
| * Returns the leaf EsField or null if any segment is missing. | ||
| */ | ||
| private static EsField resolveFieldByPath(Map<String, EsField> root, String path) { |
There was a problem hiding this comment.
Given that we add a bunch of code to deal with subfields, we should also add a bunch of tests with subfields.
| resolutionsInPlan.add(r); | ||
| } | ||
| } else { | ||
| IndexResolution r = context.indexResolution().get(new IndexPattern(relation.source(), relation.indexPattern())); |
There was a problem hiding this comment.
- When do we encounter a case with more than 1 index resolution?
- How do we test this?
…fields-load # Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
|
From discussing with @GalLalouche it occurred to me that we should particularly test date/nanos union types: Here, The peculiarity is that, without The expected behavior should still be that we fail the query because |
Fail the query when SET unmapped_fields="load" is used and any index has a field that is partially mapped (present in some indices, unmapped in others) and whose type where mapped is not KEYWORD. Add analyzer rule DisallowLoadWithPartiallyMappedNonKeyword and unit tests. Closes elastic#143218
- Move partially-unmapped non-KEYWORD load validation from the analyzer Resolution batch into Verifier (after existing load-mode checks). - Fail only when those fields appear in FieldAttributes outside EsRelation (actually used by the query). - Resolve indices from non-LOOKUP EsRelations only; pass AnalyzerContext for indexResolution() lookups. - Remove DisallowLoadWithPartiallyMappedNonKeyword; add LoadPartialMappingChecks and a Verifier.verify overload that accepts AnalyzerContext. - Extend AnalyzerUnmappedTests: FROM-only, WHERE, SORT, comma-separated indices, dotted paths, nullify. - Add esql YAML REST tests for load vs partial non-KEYWORD (used vs unused) behind optional_fields_v2.
921e12b to
a83ca15
Compare
|
|
||
| public LogicalPlan verify(LogicalPlan plan, BitSet partialMetrics) { | ||
| Collection<Failure> failures = verifier.verify(plan, partialMetrics, context().unmappedResolution()); | ||
| Collection<Failure> failures = verifier.verify(plan, partialMetrics, context().unmappedResolution(), context()); |
There was a problem hiding this comment.
Here you don't need to pass context().unmappedResolution() and context().
context() is sufficient, you can extract the unmappedResolution() in the method itself.
There was a problem hiding this comment.
nit: Do we need to pass on the whole context? It's not wrong, but it makes it may make it look like we need all kinds of things in the verifier, but we only really need the unmapped resolution and indexResolution, no?
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
Outdated
Show resolved
Hide resolved
| checkLoadModeDisallowedCommands(plan, failures); | ||
| checkLoadModeDisallowedFunctions(plan, failures); | ||
| checkFlattenedSubFieldLoad(plan, failures); | ||
| if (failures.hasFailures() == false) { |
There was a problem hiding this comment.
Can you explain why is this check?
In ESQL, in general, we try to present to the user the full list of verification failures in one batch. For example, AnalyzerTests.testLookupJoinUnknownField and AnalyzerUnmappedTests.testTbucketWithUnmappedTimestampWithFork.
...lugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/LoadPartialMappingChecks.java
Outdated
Show resolved
Hide resolved
|
|
||
| - match: { status: 400 } | ||
| - match: { error.type: verification_exception } | ||
| - match: { error.reason: "/partial_x/" } |
There was a problem hiding this comment.
Let's assert the full error message, please.
There was a problem hiding this comment.
I am not sure what the intention with this test is. It's throwing
Cannot use field [partial_x] due to ambiguities being mapped as [2] incompatible types: [keyword] enforced by INSIST command, and [long] in index mappings
I was expecting the error message from LoadPartialMappingChecks.checkPartialNonKeywordWithLoad... I guess.
There was a problem hiding this comment.
There is code in ResolveRefs which turns partial_x into a type with a mapping conflict (just like in union types) and attaches this error message. It's code that I remove in #144228 (but #143693 may need it after all, will find out when resolving merge conflicts).
However, I'd also expect to see the new error message. I think we may not see it because the new validation is in a branch with this condition:
if (failures.hasFailures() == false) {
| public void testDisallowLoadWithPartiallyMappedNonKeyword() { | ||
| assumeTrue("Requires OPTIONAL_FIELDS_V2", EsqlCapabilities.Cap.OPTIONAL_FIELDS_V2.isEnabled()); | ||
|
|
||
| var esIndex = new EsIndex( |
There was a problem hiding this comment.
I think tests that live in VerifierTests are better suited for these tests. The advantage with VerifierTests is that it's not bypassing some steps... here you are manually building EsIndex instances and it's kind of fragile imho.
...k/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
Outdated
Show resolved
Hide resolved
astefan
left a comment
There was a problem hiding this comment.
Actually, I may be wrong, but the code in LoadPartialMappingChecks can be simpler than what is currently proposed. Notice, also, that I shortened the error message. Imho, it's too long and the second part of it can be omitted. Up to you and @alex-spies if that's the case or not.
static void checkPartialNonKeywordWithLoad(LogicalPlan plan, @Nullable AnalyzerContext context, Failures failures) {
if (context == null || context.unmappedResolution() != UnmappedResolution.LOAD) {
return;
}
Set<String> nonKeywordFieldNames = new HashSet<>();
List<IndexResolution> resolutions = new ArrayList<>();
plan.forEachDown(LogicalPlan.class, p -> {
if (p instanceof EsRelation rel) {
if (rel.indexMode() != IndexMode.LOOKUP) {
IndexResolution r = context.indexResolution().get(new IndexPattern(rel.source(), rel.indexPattern()));
if (r != null) {
resolutions.add(r);
}
}
} else {
p.forEachExpression(FieldAttribute.class, fa -> {
if (fa.dataType() != KEYWORD) {
nonKeywordFieldNames.add(fa.name());
}
});
}
});
for (String fieldName : new TreeSet<>(nonKeywordFieldNames)) {
for (IndexResolution resolution : resolutions) {
if (resolution.isValid() && resolution.get().isPartiallyUnmappedField(fieldName)) {
failures.add(
fail(
plan,
"Loading partially mapped non-KEYWORD fields is not yet supported. "
+ "unmapped_fields=\"load\" is not supported with partially mapped field [{}].",
fieldName
)
);
break;
}
}
}
}
- Refactor repeated partial-index test setup in AnalyzerUnmappedTests using shared helpers for index and field construction. - Assert the full load partial-non-keyword verification message in relevant AnalyzerUnmappedTests scenarios. - Update YAML REST coverage to exercise the intended load verification path and match the full error.reason payload.
alex-spies
left a comment
There was a problem hiding this comment.
Heya, had another look. The simplification suggestion from @astefan is valid.
We still should add more tests, they help a lot esp. when we have many in-flight PRs that may affect each other. I added some suggestions.
Finally, for the questions of whether KEEP/DROP should be permitted: I commented that I formed an opinion and that is that KEEP partial_long/KEEP partial_l* (and resp. for DROP) should be okay, because it's also okay for regular mapped fields with mapping conflicts. I'm actually adding tests for this in #144228, so this PR can try to add tests for this but will likely run into problems solved in #144228, as the current state of the code very eagerly turns partially unmapped non-KEYWORD fields into field attributes with mapping conflicts when they're mentioned anywhere.
|
|
||
| public LogicalPlan verify(LogicalPlan plan, BitSet partialMetrics) { | ||
| Collection<Failure> failures = verifier.verify(plan, partialMetrics, context().unmappedResolution()); | ||
| Collection<Failure> failures = verifier.verify(plan, partialMetrics, context().unmappedResolution(), context()); |
There was a problem hiding this comment.
nit: Do we need to pass on the whole context? It's not wrong, but it makes it may make it look like we need all kinds of things in the verifier, but we only really need the unmapped resolution and indexResolution, no?
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
Outdated
Show resolved
Hide resolved
| - method: POST | ||
| path: /_query | ||
| parameters: [ ] | ||
| capabilities: [ optional_fields_v2 ] |
There was a problem hiding this comment.
Since we're testing a behavior that wasn't in place on older nodes, I'm almost certain this PR will need a new capability that's used here. It's also fine to bump optional_fields_v2 to optional_fields_v3 (or actually - likely v4 or v5, depending on which of #144112, #143693 and #144228 get merged, first) as long as the feature is SNAPSHOT-only.
|
|
||
| - match: { status: 400 } | ||
| - match: { error.type: verification_exception } | ||
| - match: { error.reason: "/partial_x/" } |
There was a problem hiding this comment.
There is code in ResolveRefs which turns partial_x into a type with a mapping conflict (just like in union types) and attaches this error message. It's code that I remove in #144228 (but #143693 may need it after all, will find out when resolving merge conflicts).
However, I'd also expect to see the new error message. I think we may not see it because the new validation is in a branch with this condition:
if (failures.hasFailures() == false) {
| var esIndex = new EsIndex( | ||
| "partial_idx", | ||
| Map.of("partial_long", new EsField("partial_long", DataType.LONG, emptyMap(), true, EsField.TimeSeriesFieldType.NONE)), | ||
| Map.of("partial_idx", IndexMode.STANDARD), |
There was a problem hiding this comment.
nit: only 1 index is not realistic if you want a partial mapping in production situations (except for index aliases, but you still have to resolve that alias to 2 separate shards).
I'd instead make this Map.of("partial_idx1", IndexMode.STANDARD, "partial_idx2", IndexMode.STANDARD) and use FROM partial_idx* in the query below.
| ); | ||
| assertUnmappedLoadError( | ||
| analyzer().addIndex(esIndex), | ||
| "FROM partial_idx | KEEP partial_long", |
There was a problem hiding this comment.
Sorry, we discussed this offline and I formed an opinion on the KEEP/DROP question. IMO, KEEP/DROP should be allowed, but no usage in other commands, or in expressions except for inside convert functions.
Tests with just KEEP/DROP partial_long don't have to be covered here; let's instead use e.g. WHERE partial_long > 2 or so. I'm adding some positive tests for KEEP partial_... and DROP partial_... in #144228. We can merge this first and leave this edge case up to #144228 to include in another iteration.
| Set.of("partial_long") | ||
| ); | ||
| var plan = analyzer().addIndex(esIndex).statement(setUnmappedLoad("FROM partial_idx | KEEP common")); | ||
| assertThat(plan, not(nullValue())); |
There was a problem hiding this comment.
nit: It's better to assert the plan structure, or use a golden test.
There was a problem hiding this comment.
We have WHERE covered now. Great! Let's do more different commands and expressions.
Like, if someone accidentally slipped in a regression that bypasses this verification for, say, CHANGE_POINT, we would want a test to fail. That's true for all commands.
| esql.query: | ||
| body: | ||
| query: 'SET unmapped_fields="load"; FROM partial_load_yaml_* | WHERE partial_x > 0 | KEEP tag' | ||
| query: 'SET unmapped_fields="load"; FROM partial_load_yaml_* | KEEP partial_x' |
There was a problem hiding this comment.
Sorry, I think the original query was fine! The KEEP partial_x one will be considered valid with #144228.
…-load' into 143218-esql-unmapped-fields-load # Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
- Move the detection and rejection of partially unmapped non-keywords to Verifier
There was a problem hiding this comment.
Heya, the productive changes look good to me with minor comments. Mostly, I'm unsure if subfields are handled correctly and would suggest this gets thoroughly tested.
Please, consider this unblocked from my end; let's address the remaining comments (esp. around additional testing) and
. If he's available, please seek final review from @astefan as I'll be only back on Monday.
Thank you @shmuelhanoch and @mouhc1ne , as well as @astefan for the thorough first review round!
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
Outdated
Show resolved
Hide resolved
- Minor change in how we collect PUNKs: it now avoids string comparisons and relies on es relation output and attribute sets.
…fields-load # Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
- AnalyzerUnmappedTests: Bring back several unit tests, originally added by Shmuel, but accidentally removed by me during merge conflicts that occurred as I was updating the branch. These tests are failing now, pushing nonetheless so we're all on the same baseline. - Verifier: EsIndex changed the way it tells PUNKs, so we adjust.
…-load' into 143218-esql-unmapped-fields-load # Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
- Minor adjustments to unit test: assert number of problems, and line/col number. - Revert back the Verifier's PUNK detection to the earlier string based approach with field names, instead of the field attributes approach which changed with Gal's PR.
Fail the query when SET unmapped_fields="load" is used and any index has a field that is partially mapped (present in some indices, unmapped in others) and whose type where mapped is not KEYWORD. Add analyzer rule DisallowLoadWithPartiallyMappedNonKeyword and unit tests.
Closes #143218