[ES|QL] Non-Correlated Subquery in FROM command #135744
Hi @fang-xing-esql, I've created a changelog YAML for you.
8a72832
to
0c5b79d
Compare
Pull Request Overview
This PR introduces support for non-correlated subqueries within the FROM command in ES|QL, allowing queries to reference multiple data sources including both index patterns and subqueries. The implementation enables subqueries to be processed similarly to Fork operations, with key distinctions in index resolution and predicate pushdown capabilities.
- Adds grammar and parser support for subquery syntax in FROM commands
- Implements UnionAll logical plan to handle mixed index patterns and subqueries
- Enables predicate pushdown optimization specifically for UnionAll operations
Reviewed Changes
Copilot reviewed 36 out of 39 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
EsqlBaseParser.g4 | Updates grammar to support subquery syntax in FROM_MODE |
LogicalPlanBuilder.java | Creates UnionAll plans and handles subquery/index pattern combinations |
UnionAll.java | New logical plan extending Fork with union-typed field support |
Subquery.java | New logical plan node representing subquery placeholders |
Analyzer.java | Resolves subquery indices and handles union-typed fields |
PushDownAndCombineFilters.java | Adds predicate pushdown optimization for UnionAll |
EsqlSession.java | Implements subquery index resolution during pre-analysis |
Various test files | Adds comprehensive test coverage for subquery functionality |
Comments suppressed due to low confidence (1)
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/SubqueryTests.java:1
- There's a typo in "nested fork/subquery is not supported, it passes Analyzer" - should be "nested fork/subquery is not supported; it passes Analyzer" (semicolon instead of comma for better grammar).
    }
    return parent;
} else { // We should not reach here as the grammar does not allow it
    throw new ParsingException("FROM is required in a subquery");
Copilot AI, Oct 2, 2025
The error message "FROM is required in a subquery" is misleading since the grammar already enforces this requirement. Consider a more descriptive message like "Invalid subquery structure" or remove the comment and exception if this code path is truly unreachable.
Suggested change:
- throw new ParsingException("FROM is required in a subquery");
+ throw new ParsingException("Invalid subquery structure");
LogicalPlan newChild = switch (child) {
    case Project project -> maybePushDownFilterPastProjectForUnionAllChild(pushable, project);
    case Limit limit -> maybePushDownFilterPastLimitForUnionAllChild(pushable, limit);
    default -> null; // TODO add a general push down for unexpected pattern
Copilot AI, Oct 2, 2025
The TODO comment indicates incomplete functionality. Consider implementing the general push down logic or at least provide a more specific plan for when this will be addressed, as returning null could lead to silent failures in optimization.
Suggested change:
- default -> null; // TODO add a general push down for unexpected pattern
+ default -> {
+     // Fallback: unknown child type, do not push down filter for this child.
+     // Consider implementing general push down logic here in the future.
+     yield child;
+ }
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
  boolean supportsAggregateMetricDouble,
- boolean supportsDenseVector
+ boolean supportsDenseVector,
+ Set<IndexPattern> subqueryIndices
Merging subqueryIndices into mainIndices is another option, but it would require changes to EsqlCCSUtils.initCrossClusterState and EsqlCCSUtils.createIndexExpressionFromAvailableClusters, as they associate the ExecutionInfo with only one index pattern today.
        hasCapabilities(adminClient(), List.of(ENABLE_FORK_FOR_REMOTE_INDICES.capabilityName()))
    );
}
// Subqueries in FROM are not fully tested in CCS yet
When a subquery exists in the query, convertToRemoteIndices doesn't generate a correct remote index pattern yet, so the query becomes invalid. Subqueries are not fully tested in CCS yet; I am working on it as a follow-up.
x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/parser/SubqueryTests.java
Pinging @elastic/es-analytical-engine (Team:Analytics)
Pinging @elastic/kibana-esql (ES|QL-ui)
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
// then the real child, if there is unknown pattern, keep the filter and UnionAll plan unchanged
List<LogicalPlan> newChildren = new ArrayList<>();
boolean changed = false;
for (LogicalPlan child : unionAll.children()) {
Why do you need special handling based on the child type? Why not just put a filter on top, given that we already have rules for handling Filter pushdown?
The main reason the child types are checked here is that I'd like to push the predicate closer to an EsRelation, so that the predicate has a better chance of being pushed down to Lucene. In the PushDownAndCombineFilters rule here, if the child is a limit, filters are not pushed further. However, AddImplicitForkLimit adds a limit to each fork/unionall child, and this limit might prevent us from pushing the predicate down to Lucene.
The patterns checked here are the ones I have seen so far that are added by fork; sometimes the other logical planner rules may eliminate a project, or swap project and limit.
I went through part of the PR and would like to provide some input. I still have to go over the tests and understand some parts of the Analyzer.
Thank you for providing the detailed description of the PR. It helps a lot with the review.
FROM sample_data, (FROM employees metadata _id | sort _id) metadata _index | SORT emp_no desc | KEEP _index, emp_no, languages, _id
results in
Found 1 problem\nline 1:50: Unbounded SORT not supported yet [sort _id] please add a LIMIT
This seems to imply that the "default" limit that we usually add to queries is not added to subqueries.
If this is an acceptable and agreed-upon limitation, I think it would help to have it documented in the PR/docs.
FROM (FROM *) metadata _id, _index | SORT emp_no desc | KEEP _index, emp_no, languages, _id
results in
Cannot use field [emp_no] due to ambiguities being mapped as [2] incompatible types: [integer] in [employees], [long] in [employees_incompatible]",
but
FROM *, (FROM *) metadata _id, _index | SORT emp_no desc | KEEP _index, emp_no, languages, _id
doesn't complain. Is the first error valid?
Even FROM * metadata _id, _index | SORT emp_no desc | KEEP _index, emp_no, languages, _id
complains.
- Apologies if this is already covered, but I wanted to mention it so I don't forget. Since this is also about field_caps calls, using a filter in the request should be something we test for this functionality. As a regular user I would expect that filter to also apply to subqueries, and I think it does.
"query":"FROM *, (FROM * metadata _index) metadata _id, _index | SORT emp_no desc | KEEP _index, emp_no, _id | stats count=count(*) by _index",
"filter": {
"bool": {
"filter": [
{
"exists": {
"field": "emp_no"
}
}
]
}
}
- I am wondering if this behavior is the expected one, because I couldn't tell tbh:
FROM employees, (FROM employees | eval x = emp_no::long), (FROM employees | eval x = emp_no::string) metadata _index | keep x, emp_no, _index
results in column "x" having all values as "null" while if I run
from employees | fork (eval x = emp_no::string) (eval x = emp_no::long) | keep x, emp_no
I get an error message
"Column [x] has conflicting data types in FORK branches: [LONG] and [KEYWORD]"
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/UnionAll.java
…and limit pushdown for subqueries
Thank you for reviewing @astefan! I replied below.
We should be able to do better here! Thanks for pointing this out, I realized that the
That's a good point. I'll double-check filters in the request and add some tests around it; thanks for reminding me. Added a test here.
 * to the subquery indices set, if Analyzer doesn't find the subquery' indexResolution,
 * it falls back to the main query's indexResolution
 */
if (isLookup || isMainIndexPattern) {
Does it mean that lookups are not supported in subqueries for now?
We should support lookup joins inside subqueries. The lookup indices are loaded into PreAnalyzer.lookupIndices, which contains the lookup indices from the main query and the subqueries.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/PreAnalyzer.java
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/PreAnalyzer.java
    mainExecutionInfo.skipOnFailurePredicate(),
    mainExecutionInfo.includeCCSMetadata()
);
EsqlCCSUtils.initCrossClusterState(indicesExpressionGrouper, verifier.licenseState(), subqueryIndexPattern, subqueryExecutionInfo);
Is it correct that EsqlExecutionInfo needs to be copied because of how it initializes the state for CC?
It has to change for CPS (we add remotes based on the field caps response, as opposed to pre-initializing based on indicesExpressionGrouper). Possibly that will allow us to use the same execution info here, as well as report metadata from remotes used in views.
My understanding is that this makes an implicit assumption: what is in the main query will follow the main index pattern.
I'm thinking about LOOKUP JOIN in particular.
A query like
FROM idx,(from remote1:idx2) | LOOKUP JOIN lujo
will always execute the join on the coordinator cluster (i.e. the JOIN won't be pushed to the subquery), and the absence of the lujo index on remote1 won't result in a validation exception.
I'm not sure this is correct, but I don't think it introduces significant problems.
A more problematic query could be:
FROM remote1:idx1,remote2:idx2,(from remote3:idx3) | LOOKUP JOIN lujo
In this case, where will the JOIN be executed? Will the coordinator need to have lujo?
I know CCS is still an open point, but it would be good to have some design decisions on this.
> Is it correct that EsqlExecutionInfo needs to be copied because of how it initializes the state for CC? It has to change for CPS (we add remotes based on the field caps response, as opposed to pre-initializing based on indicesExpressionGrouper). Possibly that will allow us to use the same execution info here, as well as report metadata from remotes used in views.

Yes, this is correct. The main reason for copying EsqlExecutionInfo is to avoid reusing it after it has already been initialized with the main index pattern; reusing it at that point errors out. If we can make EsqlExecutionInfo reusable, copying is not needed.
I'd like a bit more clarity on the execution model for remote subqueries (is there a design doc I could read?). If we have a remote subquery, does it work like INLINE STATS, i.e. execute remotely, bring the results back to the coordinator, and treat them as a constant table from then on, or does it continue to execute the query remotely after exiting the subquery?
);
EsqlCCSUtils.initCrossClusterState(indicesExpressionGrouper, verifier.licenseState(), subqueryIndexPattern, subqueryExecutionInfo);

return EsqlCCSUtils.createIndexExpressionFromAvailableClusters(subqueryExecutionInfo);
The way we resolve expressions is going to change soon: we are going to resolve IndexPattern directly.
Let's coordinate how to integrate that change.
If we have multiple FROM patterns, it will probably make sense to create a common field-caps call? But having separate ones may be OK for starters too.
We should be careful with subqueryExecutionInfo though. For example, if some clusters are marked as skipped there, should they also be skipped in the main query? I am concerned we could get inconsistent results where one of the subqueries skips one set of clusters, another skips other clusters, and the main query skips a third set of clusters, and it would be impossible to understand what happened in the result (and also to report what happened, since we only have one _clusters in the output). It would be nice to document what should happen in such a case.
Thanks @fang-xing-esql
I had a first quick look and I left a couple of comments.
I also tested some queries and I got some unexpected results. see below:
- count seems to be inconsistent as soon as we add conditions to the subquery
from employees,(from employees) | stats count(*)
returns 100
while
from employees,(from employees | where emp_no > 0) | stats count(*)
returns 200.
- sometimes the engine complains about nested subqueries being unsupported
This works just fine
from (from idx,( from idx | sort foo) )
This doesn't work:
from idx,(from idx,( from idx | sort foo) )
{
"error": {
"root_cause": [
{
"type": "verification_exception",
"reason": "Found 1 problem\nline 1:12: Nested subqueries are not supported"
}
- unsupported types...?
Using CSV dataset
from employees,(from * | where true ) | stats count(*)
{
"error": {
"root_cause": [
{
"type": "verification_exception",
"reason": "Found 3 problems\nline 1:1: EVAL does not support type [counter_long] as the return data type of expression [from *]\nline 1:1: EVAL does not support type [counter_long] as the return data type of expression [from *]\nline 1:1: EVAL does not support type [counter_double] as the return data type of expression [from *]"
}
Thanks for reviewing @luigidellaquila ! I replied below.
This makes me re-think about whether merging subquery index pattern into the main index pattern in parser is a good choice as an optimization, it seems like we have a good reason not to do it. @astefan mentioned a similar query pattern. I'll modify parser and not do this subquery merging. Changed here. Currently, when there are duplicate index patterns in the main
The first query with subqueries can be flattened at parsing time, as the main
Thank you for catching this! It is related to how the time-series data types are supported out of the |
Not strictly related to this PR, and a brain dump; take it more as an FYI and food for thought. CC @quackaplop.
A "meta" aspect: the more I look at the code, the more I wonder about the high-level differences between fork and union:
(from the PR description)
Pushes down eligible filters/predicates from the main query into subqueries. This is another key distinction between UnionAll and Fork, as predicate pushdown applies only to UnionAll, while Fork remains unchanged.
Why is that? I mean, why wouldn't filters be pushed down for fork as well, like for union?
IMO, pushing down filters and properly resolving union types (fork does some work around properly merging attributes having the same name) could have been done as a separate PR, and the current PR could have offered functionality on par with fork.
I keep coming back to fork because I feel there could be many similarities between the two and yet union has a lot more than fork (in terms of functionality) in this PR.
As a user, the main difference I see between fork and union is that union can go to different index patterns, while fork does everything on one index pattern. Apart from that, in my head the two should be identical in behavior. This raises some expectations with users, who could move from fork (because of its limited index pattern usage) to union; if the UX differs after that switch, there will be questions, frustrations and limitations.
Another big difference I see is the presence of _fork: I kept adding metadata _index to my tests to learn where some data is coming from... I don't know if other users will feel the same urge as me, but I kind of liked the _fork column that fork adds.
Also, what incentive would a user have to move from union to fork? Performance?
// build a map of UnionAll output to a list of LogicalPlan that reference this output
Map<Attribute, List<LogicalPlan>> outputToPlans = outputToPlans(unionAll, plan);

List<List<Attribute>> outputs = unionAll.children().stream().map(LogicalPlan::output).toList();
Wouldn't Fork.outputUnion() do the same thing as this line?
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
Yeah, the main motivation of predicate pushdown into |
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
I am done with the review.
LGTM.
I think the amount of work and analysis that went into this PR is amazing. It took me a lot of time to go through almost everything. Any aspects that I am mentioning below or I mentioned previously can be addressed in follow up issues/PRs.
FROM books,(FROM books | SORT score DESC, author | LIMIT 5 | KEEP author, score),(FROM books | STATS total = COUNT(*)) METADATA _score
| WHERE author:"Faulkner"
| EVAL score = round(_score, 2)
| FORK (SORT score DESC, author | LIMIT 5 | KEEP author, score)
(STATS total = COUNT(*))
| SORT _fork, score DESC, author
AssertionError: Nested FORKs are not yet supported
at org.elasticsearch.xpack.esql.session.FieldNameUtils.lambda$resolveFieldNames$8(FieldNameUtils.java:118)
at org.elasticsearch.xpack.esql.core.tree.Node.forEachDownMayReturnEarly(Node.java:90)
at org.elasticsearch.xpack.esql.core.tree.Node.forEachDownMayReturnEarly(Node.java:98)
at org.elasticsearch.xpack.esql.core.tree.Node.forEachDownMayReturnEarly(Node.java:84)
at org.elasticsearch.xpack.esql.session.FieldNameUtils.resolveFieldNames(FieldNameUtils.java:222)
at org.elasticsearch.xpack.esql.session.EsqlSession.analyzedPlan(EsqlSession.java:441)
at org.elasticsearch.xpack.esql.session.EsqlSession.execute(EsqlSession.java:192)
FROM books,(FROM books METADATA _score | SORT score DESC, author | LIMIT 5 | KEEP author, score),(FROM books METADATA _score | STATS total = COUNT(*)) METADATA _score
| WHERE author:"Faulkner"
| EVAL score = round(_score, 2)
| SORT _fork, score DESC, author
Gives this error only: line 2:51: Unknown column [score], did you mean [_score]?
even though there are other issues with the query.
Fixing this error by using sort _score (instead of sort score) leads me to another error: line 5:20: Unknown column [_fork], did you mean [_score]?
ES|QL does report verification errors all in one message.
FROM books,(FROM books METADATA _score | SORT _score DESC, author | LIMIT 5 | KEEP author, _score),(FROM books METADATA _score | STATS total = COUNT(*)) METADATA _score
| WHERE author:"Faulkner"
| EVAL score = round(_score, 2)
| SORT score DESC, author
Found 2 problems\nline 3:15: [:] operator cannot be used after FROM\nline 3:21: [:] operator cannot operate on [author], which is not a field from an index mapping
The second error ^ is incorrect: author is in fact a field from an index mapping (from all three (sub)queries).
FROM employees, (from employees | where gender == "F" | keep emp_no, gender), (from employees | where languages > 3 | keep emp_no, languages)
| where emp_no > 10050
| keep emp_no, gender, languages
This query asks for all fields from field_caps (*), even though the query looks "simple" :-)). Wondering if we could do better here.
I discovered this during the logical plan optimization step where we create something like this:
\_Eval[[null[LONG] AS avg_worked_seconds#1636, null[DATETIME] AS birth_date#1637, null[KEYWORD] AS first_name#1638, null[DOUBLE] AS height#1639, null[DOUBLE] AS height.float#1640, null[DOUBLE] AS height.half_float#1641, null[DOUBLE] AS height.scaled_float#1642, null[DATETIME] AS hire_date#1643, null[BOOLEAN] AS is_rehired#1644, null[KEYWORD] AS job_positions#1645, null[INTEGER] AS languages#1646, null[INTEGER] AS languages.byte#1647, null[LONG] AS languages.long#1648, null[INTEGER] AS languages.short#1649, null[KEYWORD] AS last_name#1650, null[INTEGER] AS salary#1651, null[DOUBLE] AS salary_change#1652, null[INTEGER] AS salary_change.int#1653, null[KEYWORD] AS salary_change.keyword#1654, null[LONG] AS salary_change.long#1655, null[BOOLEAN] AS still_hired#1656]]
 * |   \_Limit[1000[INTEGER],false]
 * |     \_Filter[languages{f}#19 > 0[INTEGER] AND emp_no{f}#16 > 10000[INTEGER]]
 * |       \_EsRelation[test1][_meta_field{f}#22, emp_no{f}#16, first_name{f}#17, ..]
 * \_LocalRelation[[_meta_field{r}#33, emp_no{r}#34, first_name{r}#35, gender{r}#36, hire_date{r}#37, job{r}#38, job.raw{r}#39, l
This LocalRelation here is a placeholder for the branch from languages...., because the filter acts on fields that belong to the other branches and basically eliminates languages completely?
It is a bit confusing that this LocalRelation doesn't hold only the fields that belong to the languages index.
Thanks @fang-xing-esql, that looks pretty good.
I left another round of comments. Most of them are very minor changes, but one needs to be fixed before moving forward.
}

public void testNestedSubqueries() {
    VerificationException e = expectThrows(VerificationException.class, () -> planSubquery("""
This needs a capability check (otherwise it will fail in non-snapshot)
}

public void testForkInSubquery() {
    VerificationException e = expectThrows(VerificationException.class, () -> planSubquery("""
Same here
  }

- return changed ? new Fork(fork.source(), newSubPlans, newOutput) : fork;
+ return fork instanceof UnionAll unionAll
nit: to avoid the instanceof, this could be a new polymorphic method withSubplansAndOutput(newSubPlans, newOutput)
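To illustrate the nit above, here is a minimal sketch of the polymorphic-copy idea. SketchFork and SketchUnionAll are hypothetical, simplified stand-ins (sub-plans and outputs are plain strings), not the real Fork and UnionAll classes, and the method name only mirrors the one suggested in the comment.

```java
import java.util.List;

// Hypothetical stand-in for Fork; the real class carries a Source, LogicalPlan children and Attributes.
class SketchFork {
    final List<String> subPlans;
    final List<String> output;

    SketchFork(List<String> subPlans, List<String> output) {
        this.subPlans = subPlans;
        this.output = output;
    }

    // Each class returns a copy of its own type, so the caller needs no instanceof check.
    SketchFork withSubPlansAndOutput(List<String> newSubPlans, List<String> newOutput) {
        return new SketchFork(newSubPlans, newOutput);
    }
}

// Hypothetical stand-in for UnionAll, which extends Fork in the PR.
class SketchUnionAll extends SketchFork {
    SketchUnionAll(List<String> subPlans, List<String> output) {
        super(subPlans, output);
    }

    @Override
    SketchUnionAll withSubPlansAndOutput(List<String> newSubPlans, List<String> newOutput) {
        return new SketchUnionAll(newSubPlans, newOutput);
    }
}
```

With something along these lines, the call site could reduce to `changed ? fork.withSubPlansAndOutput(newSubPlans, newOutput) : fork`, and the concrete type would be preserved by dynamic dispatch.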
 */
private static class ResolveUnionTypesInUnionAll extends Rule<LogicalPlan, LogicalPlan> {
    // The mapping between explicit conversion functions and the corresponding attributes in the UnionAll output
    private Map<AbstractConvertFunction, Attribute> convertFunctionsToAttributes;
This needs some refactoring: the rules are treated as singletons (see private static final List<Batch<LogicalPlan>> RULES, line 202), so they have to be stateless.
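One way to read the request above is to move the per-query map out of the rule's fields and into a local that is passed down explicitly. The sketch below is a hypothetical, heavily simplified illustration (plan nodes are plain strings, and the "$$conv" alias naming is invented for the example); it is not the actual ResolveUnionTypesInUnionAll rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the real Analyzer rule: plan nodes are plain strings here.
final class StatelessRuleSketch {
    // Deliberately no instance fields, so one shared instance can serve every query.
    List<String> apply(List<String> plan) {
        // Per-invocation state lives in a local and is handed to helpers as a parameter.
        Map<String, String> conversionToAlias = new HashMap<>();
        for (String node : plan) {
            if (node.startsWith("to_") && !conversionToAlias.containsKey(node)) {
                conversionToAlias.put(node, "$$conv" + conversionToAlias.size());
            }
        }
        return rewrite(plan, conversionToAlias);
    }

    // Replaces every conversion node with its alias; other nodes are kept as they are.
    private static List<String> rewrite(List<String> plan, Map<String, String> conversionToAlias) {
        List<String> rewritten = new ArrayList<>();
        for (String node : plan) {
            rewritten.add(conversionToAlias.getOrDefault(node, node));
        }
        return rewritten;
    }
}
```

Because nothing survives between calls, applying the same instance to two different plans yields results that depend only on each plan, which is the property a singleton rule needs.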
indexPattern.set(p.indexPattern());
if (mainAndSubqueryIndices.isEmpty()) { // the index pattern from main query is always the first to be seen
    mainAndSubqueryIndices.add(p.indexPattern());
} else if (EsqlCapabilities.Cap.SUBQUERY_IN_FROM_COMMAND.isEnabled()) { // collect subquery index patterns
nit: Do you need a capability check here? Isn't it already guarded by the grammar?
  COMPLETION(Completion.class::isInstance),
- SAMPLE(Sample.class::isInstance);
+ SAMPLE(Sample.class::isInstance),
+ SUBQUERY(Subquery.class::isInstance);
👍
This is not properly a command, but ++ to tracking it
This PR enables support for non-correlated subqueries within the FROM command. Related to https://github.com/elastic/esql-planning/issues/89

A non-correlated subquery in this context is one that is fully self-contained and does not reference attributes from the outer query. Enabling support for these subqueries in the FROM command provides an additional way to define a data source, beyond directly specifying index patterns in an ES|QL query.

Example

This feature is built on top of Fork. Subqueries are processed in a manner similar to how Fork operates today, with modifications made to the following components to support this functionality:
- FROM_MODE is updated to support subquery syntax.
- LogicalPlanBuilder creates a UnionAll logical plan on top of multiple data sources. Each data source can be either index patterns or subqueries. UnionAll extends Fork, but unlike Fork, each UnionAll leg may fetch data from different indices, which is one of the key differences between UnionAll and Fork.
- … fieldcaps calls to build an IndexResolution for each subquery.
- … UnionAll leg, InvalidMappedField are not created across them. If conversion functions are required for common fields between the main index and subquery indices, those conversion functions must be pushed down into each UnionAll leg.
- Pushes down eligible filters/predicates from the main query into subqueries. This is another key distinction between UnionAll and Fork, as predicate pushdown applies only to UnionAll, while Fork remains unchanged.

Restrictions and follow ups to be addressed in the next PRs:
- LogicalPlanOptimizer will error out if the subquery has commands besides the FROM command. This is tracked in [ES|QL] Allow nested non-correlated subqueries in from command #136034.
- … FieldNameUtils.resolveFieldNames to identify subquery field names for the field caps call, instead of using all fields (*).