-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Multi-relation support, a pre-requisite for views and sub-queries #136780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
aa533ad to
1fc334e
Compare
This is extracted from the views prototype, which also includes necessary support for sub-queries.
1fc334e to
9352610
Compare
fang-xing-esql
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to Analyzer, AnalyzerContext and PreAnalyzer make sense to me. I applied the changes to the subquery PR, and subqueries can be resolved properly, thank you @craigtaverner !
The changes to EsqlSession and EsqlCCSUtils also look fine to me.
| ; | ||
|
|
||
| country:text |language_name:keyword |MAX(language_code):integer |language_code:integer | ||
| country:keyword |language_name:keyword |MAX(language_code):integer |language_code:integer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious - why this change is happening in this pull?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could indeed be part of a separate PR, but the basic changes to CsvTestsDataLoader were made to facilitate views testing, and it was small enough that I thought it convenient to include in this PR. I already expect to have 3 or 4 PRs to complete the first functional Views, and hoped not to have even more. But if the changes to CsvTestsDataLoader and these two csv-spec files are cluttering this PR too much, I can move them out to another PR.
| } | ||
| } | ||
|
|
||
| static void handleFieldCapsFailures( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I get this part - why do we need two functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the technique Ievgen introduced for handling multiple lookup join resolutions. We now use the same approach for the main indexes, and the field-caps failures. Basically it uses recursion to iterate over the multiple FROM commands. The two methods are because the first is passed the iterator, and it calls the second for each element.
| } | ||
|
|
||
| private LogicalPlan resolveInsist(Insist insist, List<Attribute> childrenOutput, IndexResolution indexResolution) { | ||
| private LogicalPlan resolveInsist(Insist insist, List<Attribute> childrenOutput, Collection<IndexResolution> indexResolution) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should use plural here if we're passing a collection now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched the AnalyzerContext now to give more control to the choice of which indexresolution to use for the INSISTS command. I'll review with Gal.
|
|
||
| // Field is partially unmapped. | ||
| if (resolvedCol instanceof FieldAttribute fa && indexResolution.get().isPartiallyUnmappedField(fa.name())) { | ||
| // TODO: Should the check for partially unmapped fields be done specific to each sub-query in a fork? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand this part correctly - if multiple resolutions come with subqueries, aren't they scoped within the subqueries? How does it work - if I have a mapping in index in one subquery, I can't use it in another subquery, can I? What about propagating mappings up/down the subquery tree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I will discuss this code with Gal when he gets back. In the meantime, I've refined the code to be a bit more clear, and at least only look at partially unmapped fields for index resolutions in the same branch of the plan tree that includes the INSISTS command.
| ignoreOrder:true | ||
| language_code:integer | country:text | language_name:keyword | ||
| 2 | [Austria, Germany] | German | ||
| 2 | [Germany, Austria] | German |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the order change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is related to the mapping changes in CsvTestsDataLoader for the lookup index called languages_lookup_non_unique_key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But as mentioned in my other comment, these changes could also be moved to a separate PR. I'd only included them because I will already have 3 or 4 PRs for Views, and hoped to not have even more than that.
| public record PreAnalysis( | ||
| IndexMode indexMode, | ||
| IndexPattern indexPattern, | ||
| Map<IndexPattern, IndexMode> indexes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Elastic code uses "indices" not "indexes"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just did a search and found a near 50:50 split between the two terms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap, I would recommend indices since that appears in public api: https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-indices
| EsqlFunctionRegistry functionRegistry, | ||
| IndexResolution indexResolution, | ||
| Map<IndexPattern, IndexResolution> indexResolution, | ||
| Map<String, IndexResolution> lookupResolution, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here a comment why we index one by String and another by IndexPattern would help. I think I understand why (because lookup only uses single index while general case can have arbitrary expressions) but it took me some work to figure it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the FROM command has much more complex rules for index resolution on the index pattern, so I wanted to keep the full IndexPattern through the entire stack.
| indexMode.set(p.indexMode()); | ||
| indexPattern.set(p.indexPattern()); | ||
| } else if (indexes.containsKey(p.indexPattern()) == false || indexes.get(p.indexPattern()) == p.indexMode()) { | ||
| indexes.put(p.indexPattern(), p.indexMode()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I get why we do the put if we already have exactly the same element already. Can't we simplify this as:
} else {
IndexMode previous = indexes.put(p.indexPattern(), p.indexMode());
if(previous != null && previous != p.indexMode()) {
throw new IllegalStateException(
"index pattern '" + p.indexPattern() + "' found with with different index mode: " + previous + " != " + p.indexMode()
);
}
}
Also, when exactly can such a situation happen anyway? And why lookup mode is excluded from the check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code was sort-of inherited from the previous version. I tried to keep the logic as close as possible. However, Ievgen also questioned the need for such defensive coding, and suggested I consider an assert instead. I'll investigate that option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, I realize where this can occur. This is not the index mode of the actual index, but the mode of the command, so we have FROM, LOOKUP JOIN and TS commands setting the mode. At least the same index can be used in both FROM and LOOKUP JOIN, and will have different modes. We cannot make an assertion on that, since it is actually a feasible thing for a user to write. I think since LOOKUP JOIN is handled separately above, this should only affect FROM vs TS commands. By throwing an exception here we're basically saying we disallow the same index from being used in both FROM and TS subqueries. I don't know if that is the best behaviour, but it is a starting point.
| Holder<Boolean> supportsDenseVector = new Holder<>(false); | ||
| indexes.forEach((ip, mode) -> { | ||
| if (mode == IndexMode.TIME_SERIES) { | ||
| supportsAggregateMetricDouble.set(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I am not sure how it's supposed to work with subqueries. Should this be enabled for the whole query, or only for the subquery that this particular pattern belongs to? Maybe this is out of scope for this patch, just trying to figure out the concept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a flag for customers to turn on the data type. They can do this in two ways, reference a time-series index, or use a time-series function. The use of this flag is temporary until the data type is more officially released. Until then we can basically decide for ourselves, and it is certainly much easier to turn it on for the entire query. Basically the data type is supported or it is not. Adding cases for partial support would be much more complex, for no obvious purpose.
| @@ -0,0 +1,127 @@ | |||
| /* | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit confused about how UnionAll relates to the rest of the patch. It's here but nothing else is mentioning or using it. Am I missing something in the picture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this can be removed. I had originally removed it, and then put it back, thinking I could use it to write a few more unit tests. Those tests never got written, and have questionable value, so I'll remove this class again.
| loadedDatasets.add(dataset.indexName); | ||
| } | ||
| forceMerge(client, loadedDatasets, logger); | ||
| logger.info("Loading enrich policies"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: I think those 2 logs are redundant since we have per data set/enrich log entry.
Also I hope they are not loaded per test case as it would flood CI logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are logged only on the first test that needs data, then the EsqlSpecTestCase records that data is loaded and does not load it again. I found these very useful in debugging some odd cases where assertions killed cluster members. As mentioned in my responses to Stas before, these changes to the csv tests could actually be moved into a separate PR, but since I will already have 3 or 4 PRs for the Views MVP, I was hoping not to split the PRs up even further.
| plan.telemetryLabel() | ||
| ); | ||
| } | ||
| // assert indexResolution.matches(plan.indexPattern().indexPattern()) : "Expected index resolution to match the index pattern"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intentionally commented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I did replace the next ten lines of code with this single assertion, but it caused some odd failures, so I brought back the old code, and commented out the assertion. I think we should investigate more, and there is a pre-existing TODO on line 274 for that. I left the assert commented out as a way to hint the direction we could take when we tackle the TODO (so I see it as more information relevant to that todo).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly, I think the assertion failed because of some normalization of the IndexPattern. Why the existing code still works is a mystery to me (it leaves in place an UnresolvedRelation). But a mystery I thought we could solve later.
| EsqlExecutionInfo executionInfo | ||
| ) throws ElasticsearchStatusException { | ||
| if (indexPattern == null) { | ||
| if (indexPatterns == null || indexPatterns.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think indexPatterns can ever be null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PreAnalysis.EMPTY was initialized with a null map, but I agree this cannot get to this code. I'll remove this defensive check, and also initialize EMPTY with an empty map, just to be more consistent.
|
|
||
| PreAnalysisResult withIndices(IndexResolution indices) { | ||
| PreAnalysisResult withIndices(IndexPattern indexPattern, IndexResolution indices) { | ||
| Map<IndexPattern, IndexResolution> newIndexResolution = new HashMap<>(this.indexResolution); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: this structure is mutable already (see lookupIndices). I would not copy the hash map here and just modify existing one.
Related to #136686
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I changed it to follow the pattern used by LookupIndices, and leave the rest of the changes to #136686.
| return combined; | ||
| }) | ||
| .get(); | ||
| var groupedIndices = indicesGrouper.groupIndices(IndicesOptions.DEFAULT, indexExpressions, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add a TODO:
// TODO it is not safe to concat multiple index patterns in case any of them contains exclusion. This is going to be resolved in 136804
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
idegtiarenko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left several minor comments.
Overall 👍 from index resolution point of view assuming some followups in the separate PRs.
|
Heya, late to the party but I took a look at the changes to the min transport version propagation as that's brand new. That looks correct to me! |
…astic#136780) This is extracted from the views prototype at elastic#134995. It provides the underlying support for multiple `FROM ...` commands within the same ES|QL query, something that both non-materialized views and subqueries require. A key feature of this work is that all `UnresolvedRelation` instances are treated equivalently, which means they all support CPS and CPS just as before. This implies that sub-queries using this will support CCS/CPS directly, and for views it means: * The underlying indexes within the view definitions will support CCS/CPS as if they were in the original FROM command * Views are strictly resolved in the coordinator of the local cluster, so that the indexes defined within the views can then be resolved using CCS/CPS rules as normal.
This is extracted from the views prototype at #134995. It provides the underlying support for multiple
FROM ...commands within the same ES|QL query, something that both non-materialized views and subqueries require.A key feature of this work is that all
UnresolvedRelationinstances are treated equivalently, which means they all support CPS and CPS just as before. This implies that sub-queries using this will support CCS/CPS directly, and for views it means:For reviewing this PR, it should be noted that most of the changes are in tests that had mistakes that went unnoticed because they only ever expected a single IndexPattern, and so could get away with mocking a single IndexResolution even if it was for a different index pattern than expressed in the test. The mismatch in index name between FROM and IndexResolution had no consequences. This is no longer possible since we need the IndexPattern to differentiate between different FROM commands. If you want to focus on the functional changes in this PR, look mostly at
EsqlSession,AnalyzerContext,AnalyzerandEsqlCCSUtils.