Skip to content

Disallow unmapped_fields=load with partial non-KEYWORD#144109

Open
shmuelhanoch wants to merge 27 commits intoelastic:mainfrom
shmuelhanoch:143218-esql-unmapped-fields-load
Open

Disallow unmapped_fields=load with partial non-KEYWORD#144109
shmuelhanoch wants to merge 27 commits intoelastic:mainfrom
shmuelhanoch:143218-esql-unmapped-fields-load

Conversation

@shmuelhanoch
Copy link
Copy Markdown
Contributor

Fail the query when SET unmapped_fields="load" is used and any index has a field that is partially mapped (present in some indices, unmapped in others) and whose type where mapped is not KEYWORD. Add analyzer rule DisallowLoadWithPartiallyMappedNonKeyword and unit tests.

Closes #143218

Fail the query when SET unmapped_fields="load" is used and any index
has a field that is partially mapped (present in some indices, unmapped
in others) and whose type where mapped is not KEYWORD. Add analyzer
rule DisallowLoadWithPartiallyMappedNonKeyword and unit tests.

Closes elastic#143218
@shmuelhanoch shmuelhanoch force-pushed the 143218-esql-unmapped-fields-load branch from 5f76a74 to 90769e3 Compare March 16, 2026 14:22
@shmuelhanoch shmuelhanoch marked this pull request as ready for review March 16, 2026 14:26
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 16, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@alex-spies alex-spies self-requested a review March 16, 2026 14:35
@shmuelhanoch shmuelhanoch changed the title ESQL: disallow unmapped_fields=load with partial non-KEYWORD Disallow unmapped_fields=load with partial non-KEYWORD Mar 18, 2026
@alex-spies alex-spies added v9.4.0 and removed v7.9.4 labels Mar 18, 2026
Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @shmuelhanoch ! I think the solution is generally correct, but I have some remarks around where to put it and how to test it.

In particular, I think this requires some yaml tests to confirm the behavior is correct in end-to-end tests as well, because our unit tests are a little "academic" and require us to mimic how we create mappings very well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a bunch more test cases. In particular, tests where a partially mapped field is referenced not (just) in KEEP, but also, say, WHERE.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have WHERE covered now. Great! Let's do more different commands and expressions.

Like, if someone accidentally slipped in a regression that bypasses this verification for, say, CHANGE_POINT, we would want a test to fail. That's true for all commands.

Comment on lines +101 to +104
* Resolve a dotted field path (e.g. "a.b.c") in the given root mapping.
* Returns the leaf EsField or null if any segment is missing.
*/
private static EsField resolveFieldByPath(Map<String, EsField> root, String path) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we add a bunch of code to deal with subfields, we should also add a bunch of tests with subfields.

resolutionsInPlan.add(r);
}
} else {
IndexResolution r = context.indexResolution().get(new IndexPattern(relation.source(), relation.indexPattern()));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • When do we encounter a case with more than 1 index resolution?
  • How do we test this?

…fields-load

# Conflicts:
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
@alex-spies
Copy link
Copy Markdown
Contributor

From discussing with @GalLalouche it occurred to me that we should particularly test date/nanos union types:

SET unmapped_fields="load";
FROM sample_data, sample_data_ts_nanos, no_mapping_sample_data METADATA _index
| WHERE @timestamp == "2021-01-01"::date_nanos

Here, @timestamp is date in sample_data, date_nanos in sample_data_ts_nanos, and outright unmapped in no_mapping_sample_data.

The peculiarity is that, without load, our analyzer will implicitly cast @timestamp to date_nanos. That could mess with the behavior we are implementing.

The expected behavior should still be that we fail the query because @timestamp is partially unmapped and used without a cast. (The case with an explicit cast should work but is out of scope here, and will be either part of #143693 or a follow-up.)

Fail the query when SET unmapped_fields="load" is used and any index
has a field that is partially mapped (present in some indices, unmapped
in others) and whose type where mapped is not KEYWORD. Add analyzer
rule DisallowLoadWithPartiallyMappedNonKeyword and unit tests.

Closes elastic#143218
- Move partially-unmapped non-KEYWORD load validation from the analyzer
  Resolution batch into Verifier (after existing load-mode checks).
- Fail only when those fields appear in FieldAttributes outside
  EsRelation (actually used by the query).
- Resolve indices from non-LOOKUP EsRelations only; pass AnalyzerContext
  for indexResolution() lookups.
- Remove DisallowLoadWithPartiallyMappedNonKeyword; add
  LoadPartialMappingChecks and a Verifier.verify overload that accepts
  AnalyzerContext.
- Extend AnalyzerUnmappedTests: FROM-only, WHERE, SORT, comma-separated
  indices, dotted paths, nullify.
- Add esql YAML REST tests for load vs partial non-KEYWORD (used vs
  unused) behind optional_fields_v2.
@shmuelhanoch shmuelhanoch force-pushed the 143218-esql-unmapped-fields-load branch from 921e12b to a83ca15 Compare March 25, 2026 16:36

public LogicalPlan verify(LogicalPlan plan, BitSet partialMetrics) {
Collection<Failure> failures = verifier.verify(plan, partialMetrics, context().unmappedResolution());
Collection<Failure> failures = verifier.verify(plan, partialMetrics, context().unmappedResolution(), context());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you don't need to pass context().unmappedResolution() and context().
context() is sufficient, you can extract the unmappedResolution() in the method itself.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Do we need to pass on the whole context? It's not wrong, but it makes it may make it look like we need all kinds of things in the verifier, but we only really need the unmapped resolution and indexResolution, no?

checkLoadModeDisallowedCommands(plan, failures);
checkLoadModeDisallowedFunctions(plan, failures);
checkFlattenedSubFieldLoad(plan, failures);
if (failures.hasFailures() == false) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why is this check?
In ESQL, in general, we try to present to the user the full list of verification failures in one batch. For example, AnalyzerTests.testLookupJoinUnknownField and AnalyzerUnmappedTests.testTbucketWithUnmappedTimestampWithFork.


- match: { status: 400 }
- match: { error.type: verification_exception }
- match: { error.reason: "/partial_x/" }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's assert the full error message, please.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what the intention with this test is. It's throwing

Cannot use field [partial_x] due to ambiguities being mapped as [2] incompatible types: [keyword] enforced by INSIST command, and [long] in index mappings

I was expecting the error message from LoadPartialMappingChecks.checkPartialNonKeywordWithLoad... I guess.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is code in ResolveRefs which turns partial_x into a type with a mapping conflict (just like in union types) and attaches this error message. It's code that I remove in #144228 (but #143693 may need it after all, will find out when resolving merge conflicts).

However, I'd also expect to see the new error message. I think we may not see it because the new validation is in a branch with this condition:

            if (failures.hasFailures() == false) {

public void testDisallowLoadWithPartiallyMappedNonKeyword() {
assumeTrue("Requires OPTIONAL_FIELDS_V2", EsqlCapabilities.Cap.OPTIONAL_FIELDS_V2.isEnabled());

var esIndex = new EsIndex(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think tests that live in VerifierTests are better suited for these tests. The advantage with VerifierTests is that it's not bypassing some steps... here you are manually building EsIndex instances and it's kind of fragile imho.

Copy link
Copy Markdown
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I may be wrong, but the code in LoadPartialMappingChecks can be simpler than what is currently proposed. Notice, also, that I shortened the error message. Imho, it's too long and the second part of it can be omitted. Up to you and @alex-spies if that's the case or not.

static void checkPartialNonKeywordWithLoad(LogicalPlan plan, @Nullable AnalyzerContext context, Failures failures) {
        if (context == null || context.unmappedResolution() != UnmappedResolution.LOAD) {
            return;
        }

        Set<String> nonKeywordFieldNames = new HashSet<>();
        List<IndexResolution> resolutions = new ArrayList<>();

        plan.forEachDown(LogicalPlan.class, p -> {
            if (p instanceof EsRelation rel) {
                if (rel.indexMode() != IndexMode.LOOKUP) {
                    IndexResolution r = context.indexResolution().get(new IndexPattern(rel.source(), rel.indexPattern()));
                    if (r != null) {
                        resolutions.add(r);
                    }
                }
            } else {
                p.forEachExpression(FieldAttribute.class, fa -> {
                    if (fa.dataType() != KEYWORD) {
                        nonKeywordFieldNames.add(fa.name());
                    }
                });
            }
        });

        for (String fieldName : new TreeSet<>(nonKeywordFieldNames)) {
            for (IndexResolution resolution : resolutions) {
                if (resolution.isValid() && resolution.get().isPartiallyUnmappedField(fieldName)) {
                    failures.add(
                        fail(
                            plan,
                            "Loading partially mapped non-KEYWORD fields is not yet supported. "
                                + "unmapped_fields=\"load\" is not supported with partially mapped field [{}].",
                            fieldName
                        )
                    );
                    break;
                }
            }
        }
    }

mouhc1ne and others added 2 commits March 26, 2026 10:46
- Refactor repeated partial-index test setup in AnalyzerUnmappedTests
  using shared helpers for index and field construction.
- Assert the full load partial-non-keyword verification message in
  relevant AnalyzerUnmappedTests scenarios.
- Update YAML REST coverage to exercise the intended load verification
  path and match the full error.reason payload.
Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heya, had another look. The simplification suggestion from @astefan is valid.

We still should add more tests, they help a lot esp. when we have many in-flight PRs that may affect each other. I added some suggestions.

Finally, for the questions of whether KEEP/DROP should be permitted: I commented that I formed an opinion and that is that KEEP partial_long/KEEP partial_l* (and resp. for DROP) should be okay, because it's also okay for regular mapped fields with mapping conflicts. I'm actually adding tests for this in #144228, so this PR can try to add tests for this but will likely run into problems solved in #144228, as the current state of the code very eagerly turns partially unmapped non-KEYWORD fields into field attributes with mapping conflicts when they're mentioned anywhere.


public LogicalPlan verify(LogicalPlan plan, BitSet partialMetrics) {
Collection<Failure> failures = verifier.verify(plan, partialMetrics, context().unmappedResolution());
Collection<Failure> failures = verifier.verify(plan, partialMetrics, context().unmappedResolution(), context());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Do we need to pass on the whole context? It's not wrong, but it makes it may make it look like we need all kinds of things in the verifier, but we only really need the unmapped resolution and indexResolution, no?

- method: POST
path: /_query
parameters: [ ]
capabilities: [ optional_fields_v2 ]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're testing a behavior that wasn't in place on older nodes, I'm almost certain this PR will need a new capability that's used here. It's also fine to bump optional_fields_v2 to optional_fields_v3 (or actually - likely v4 or v5, depending on which of #144112, #143693 and #144228 get merged, first) as long as the feature is SNAPSHOT-only.


- match: { status: 400 }
- match: { error.type: verification_exception }
- match: { error.reason: "/partial_x/" }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is code in ResolveRefs which turns partial_x into a type with a mapping conflict (just like in union types) and attaches this error message. It's code that I remove in #144228 (but #143693 may need it after all, will find out when resolving merge conflicts).

However, I'd also expect to see the new error message. I think we may not see it because the new validation is in a branch with this condition:

            if (failures.hasFailures() == false) {

var esIndex = new EsIndex(
"partial_idx",
Map.of("partial_long", new EsField("partial_long", DataType.LONG, emptyMap(), true, EsField.TimeSeriesFieldType.NONE)),
Map.of("partial_idx", IndexMode.STANDARD),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: only 1 index is not realistic if you want a partial mapping in production situations (except for index aliases, but you still have to resolve that alias to 2 separate shards).

I'd instead make this Map.of("partial_idx1", IndexMode.STANDARD, "partial_idx2", IndexMode.STANDARD) and use FROM partial_idx* in the query below.

);
assertUnmappedLoadError(
analyzer().addIndex(esIndex),
"FROM partial_idx | KEEP partial_long",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, we discussed this offline and I formed an opinion on the KEEP/DROP question. IMO, KEEP/DROP should be allowed, but no usage in other commands, or in expressions except for inside convert functions.

Tests with just KEEP/DROP partial_long don't have to be covered here; let's instead use e.g. WHERE partial_long > 2 or so. I'm adding some positive tests for KEEP partial_... and DROP partial_... in #144228. We can merge this first and leave this edge case up to #144228 to include in another iteration.

Set.of("partial_long")
);
var plan = analyzer().addIndex(esIndex).statement(setUnmappedLoad("FROM partial_idx | KEEP common"));
assertThat(plan, not(nullValue()));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It's better to assert the plan structure, or use a golden test.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have WHERE covered now. Great! Let's do more different commands and expressions.

Like, if someone accidentally slipped in a regression that bypasses this verification for, say, CHANGE_POINT, we would want a test to fail. That's true for all commands.

esql.query:
body:
query: 'SET unmapped_fields="load"; FROM partial_load_yaml_* | WHERE partial_x > 0 | KEEP tag'
query: 'SET unmapped_fields="load"; FROM partial_load_yaml_* | KEEP partial_x'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think the original query was fine! The KEEP partial_x one will be considered valid with #144228.

…-load' into 143218-esql-unmapped-fields-load

# Conflicts:
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
- Move the detection and rejection of partially unmapped non-keywords to Verifier
Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heya, the productive changes look good to me with minor comments. Mostly, I'm unsure if subfields are handled correctly and would suggest this gets thoroughly tested.

Please, consider this unblocked from my end; let's address the remaining comments (esp. around additional testing) and :shipit: . If he's available, please seek final review from @astefan as I'll be only back on Monday.

Thank you @shmuelhanoch and @mouhc1ne , as well as @astefan for the thorough first review round!

- Minor change in how we collect PUNKs: it now avoids string comparisons and relies on es relation output and attribute sets.
@mouhc1ne mouhc1ne requested a review from alex-spies March 26, 2026 19:36
mouhc1ne and others added 12 commits March 27, 2026 10:37
…fields-load

# Conflicts:
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
- AnalyzerUnmappedTests: Bring back several unit tests, originally added by Shmuel, but accidentally removed by me during merge conflicts that occurred as I was updating the branch. These tests are failing now, pushing nonetheless so we're all on the same baseline.
- Verifier: EsIndex changed the way it tells PUNKs, so we adjust.
…-load' into 143218-esql-unmapped-fields-load

# Conflicts:
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
- Minor adjustments to unit test: assert number of problems, and line/col number.
- Revert back the Verifier's PUNK detection to the earlier string based approach with field names, instead of the field attributes approach which changed with Gal's PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ESQL: disallow unmapped_fields="load" in case a field is partially mapped but not as KEYWORD and used without cast

5 participants