-
Notifications
You must be signed in to change notification settings - Fork 25.6k
resolve remotes with field caps call #133947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
resolve remotes with field caps call #133947
Conversation
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @idegtiarenko, it looks good in general.
I left just a couple of comments regarding some corner cases (maybe a bit paranoid)
| private void doResolveRemotes(List<IndexPattern> indexPatterns, QueryBuilder requestFilter, ActionListener<Set<String>> listener) { | ||
| switch (indexPatterns.size()) { | ||
| case 0 -> listener.onResponse(Set.of()); | ||
| case 1 -> indexResolver.resolveConcreteIndices( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the index pattern also contains a cluster/project prefix, will this still be correct?
eg.
FROM remote1:idx1,remote2:idx2
| STATS max(a) by b
| ENRICH policy on a, b
The enrich can only be calculated on the coordinator (after the STATS reduction), but I guess the enrich index on the coordinator won't be resolved. I'm not sure it works at all today tbh, maybe worth checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the index pattern also contains a cluster/project prefix, will this still be correct?
Yes, this is a valid situation, where all supplied indices are correct specific indices (not aliases nor patterns) and exist.
In such cases we would resolve to exact same 2 identifiers and would derive 2 remotes from them: remote1 and remote2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before merging, I'd just prefer to have a test for this query and make sure we are not introducing a regression.
Apart from that, I think we can proceed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For LOOKUP JOINs we have a specific check: we just don't allow it.
I'm checking ENRICH now, but I couldn't find similar validation for it in the codebase, so I guess it's being treated differently...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it is behaving the same way. It looks like we have a similar test for enrich:
Lines 398 to 421 in 37b28e4
| public void testAggThenEnrichRemote() { | |
| String query = String.format(Locale.ROOT, """ | |
| FROM *:events,events | |
| | eval ip= TO_STR(host) | |
| | %s | |
| | stats c = COUNT(*) by os | |
| | %s | |
| | sort vendor | |
| """, enrichHosts(Enrich.Mode.ANY), enrichVendors(Enrich.Mode.REMOTE)); | |
| var error = expectThrows(VerificationException.class, () -> runQuery(query, randomBoolean()).close()); | |
| assertThat(error.getMessage(), containsString("ENRICH with remote policy can't be executed after [stats c = COUNT(*) by os]@4:3")); | |
| } | |
| public void testEnrichCoordinatorThenEnrichRemote() { | |
| String query = String.format(Locale.ROOT, """ | |
| FROM *:events,events | |
| | eval ip= TO_STR(host) | |
| | %s | |
| | %s | |
| | sort vendor | |
| """, enrichHosts(Enrich.Mode.COORDINATOR), enrichVendors(Enrich.Mode.REMOTE)); | |
| var error = expectThrows(VerificationException.class, () -> runQuery(query, randomBoolean()).close()); | |
| assertThat(error.getMessage(), containsString("ENRICH with remote policy can't be executed after [ENRICH _COORDINATOR")); | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@idegtiarenko I added a test in main for this case, see #134199, and it seems to work fine.
I think it makes sense to run it on this PR and see if the results are the same.
| requestFilter, | ||
| listener.map(EsqlCCSUtils::getRemotesOf) | ||
| ); | ||
| default -> listener.onFailure(new MappingException("Queries with multiple indices are not supported")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is OK for now, but it will change when we start supporting subqueries and proper JOINs.
I'm not sure how it will actually work, maybe the enriches in a subquery will have to consider the remotes from the main indices in the subquery itself.
Anyway, I'd add a // TODO here, so that we don't forget.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: the error message is a bit misleading, it took me a couple of seconds to realize that we were referring to multiple FROM, not to multiple indices. I know it comes from a similar check in EsqlSession, probably we should change both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied it from
elasticsearch/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
Line 627 in 938aa13
| listener.onFailure(new MappingException("Queries with multiple indices are not supported")); |
but that is a good call.
This is OK for now, but it will change when we start supporting subqueries and proper JOINs.
I suspect we would still need a distinction between main/top-level from and all the nested ones.
I will think about a way to improve it (across all usages)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand it correctly, latest CI run (that is green) includes my tests.
All good then 🎉
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, the only request is maybe add some comments on what's going on there especially in enrich resolver. I worked on that code and it still took me some time to realize what's going on, it may be tough on somebody who has not as familiar with the code.
| client.execute( | ||
| EsqlResolveFieldsAction.TYPE, | ||
| createFieldCapsRequest(indexPattern, Set.of("_id"), requestFilter, false), | ||
| listener.delegateFailureAndWrap((l, response) -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder though what happens here if one of the clusters fails. Will it fail this listener and thus the whole request? In other places, we have to do some handling to figure out what failed and skip failed clusters and so on, but we don't do it here. I'm not sure what happens here if one of the clusters fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I am afraid of is if EsqlResolveFieldsAction returns partial response with say cluster A failing and thus having no indices but other clusters having indices, then A would be excluded from the list but not marked as skipped, and then there could be problems with it downstream.
|
This is going to be replaced with #134290 |
This change updates enriches to resolve remotes using dedicated field caps rather than
executionInfo.getClusters().keySet().This is required in order to be able to resolve flat index expressions.
Related to: ES-12612