ES|QL: Improve field resolution for FORK #128501
Conversation
```java
| WHERE x > y
| KEEP a, b, c, d, x
""", Set.of("a", "a.*", "c", "c.*", "d", "d.*", "e", "e.*", "x", "x.*", "y", "y.*"));
""", ALL_FIELDS);
```
Technically, what we had before was better. What we do now: if we detect that a FORK branch requires all fields, we don't look further and we end up requiring all fields from field caps.
I think this is fine for now - we can look into improving this later - but at least the current implementation is less error-prone than before.
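The short-circuit described above can be sketched like this. This is a minimal, hypothetical sketch with made-up names (`unionBranchFields`, `Optional.empty()` standing for "all fields"), not the actual `EsqlSession` code:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Sketch: union the field names required by each FORK branch,
// short-circuiting to "all fields" (Optional.empty()) as soon as
// one branch requires everything. Illustrative names only.
class ForkFieldNames {
    static Optional<Set<String>> unionBranchFields(List<Optional<Set<String>>> branches) {
        Set<String> union = new HashSet<>();
        for (Optional<Set<String>> branch : branches) {
            if (branch.isEmpty()) {
                // one branch needs all fields: no point inspecting the rest
                return Optional.empty();
            }
            union.addAll(branch.get());
        }
        return Optional.of(union);
    }
}
```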
```java
(STATS x = count(*), y=min(z))
| LOOKUP JOIN my_lookup_index ON xyz
| WHERE x > y OR _fork == "fork1"
""", Set.of("a", "c", "abc", "b", "def", "z", "xyz", "def.*", "xyz.*", "z.*", "abc.*", "a.*", "c.*", "b.*"));
```
This actually got better, because we stopped asking field caps for `_fork` and `b` (which was the result of an eval).
```java
(STATS x = count(*), y=min(z))
| WHERE x > y
""", Set.of("a", "a.*", "b", "b.*", "c", "c.*", "z", "z.*"));
""", Set.of("a", "a.*", "c", "c.*", "z", "z.*"));
```
We now correctly detect that we don't need to ask for `b` because it's the result of an eval.
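The improvement above can be illustrated in isolation. A hypothetical sketch, not the real `EsqlSession` logic: fields defined by EVAL are produced by the query itself, so they should be dropped from the set requested from field caps.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: a field introduced by EVAL (an alias) does not exist in the
// index, so it is removed from the field caps request. Names are
// illustrative, not the actual implementation.
class FieldCapsRequest {
    static Set<String> requested(Set<String> referenced, Set<String> definedByEval) {
        Set<String> result = new HashSet<>(referenced);
        result.removeAll(definedByEval); // e.g. `b` from `EVAL b = ...` is not an index field
        return result;
    }
}
```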
Pinging @elastic/es-analytical-engine (Team:Analytics)
carlosdelest left a comment
LGTM, though I have some questions about legibility on the plan traversal
```java
PreAnalysisResult initialResult = result;
projectAll.set(false);
parsed.forEachDown(p -> { // go over each plan top-down
    if (hasFork && seenFork.get() == false && p instanceof Fork == false) {
```
I'm a bit confused about this forEachDown loop:
- Shouldn't we check first if we have `seenFork`? Otherwise it makes no sense to step down on the individual plans?
- Why do we have to check for `seenFork` in the inner loop?

Maybe adding some comments would help me understand the plan traversal here.
> Shouldn't we check first if we have `seenFork`? Otherwise it makes no sense to step down on the individual plans?
If there is no FORK command, this loop will execute exactly as before. That's why I am checking `hasFork` first: the other conditions are relevant only in the context where FORK is being used.
> Why do we have to check for `seenFork` in the inner loop?
This has to do with how FORK is being modelled.
A query like:
```
FROM test
| WHERE id > 1 // common pre-filter
| FORK
    ( WHERE content:"fox" )
    ( WHERE content:"dog" )
| SORT _fork
| KEEP _fork, id, content
```
If FORK is used we want to analyze the FORK branches separately - which is why we skip the plans in this forEachDown loop until we reach FORK. When we reach FORK we call fieldNames on each child and we try to union the field names.
This forEachDown loop assumes the plan is linear (except for some LookupJoin check).
If someone is making a change to fieldNames, I wanted to make sure they don't have to necessarily worry about FORK and whether the plan is linear or if it contains plans that have more than one child.
But I also didn't want us to introduce bugs with FORK because we did not consider it when we made changes to fieldNames.
Which is why I put the handling of FORK right at the beginning and decided to call fieldNames on each child.
I will add a comment in the code with some explanation.
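The traversal described above can be sketched with stand-in classes. This is a hypothetical illustration of the guard's effect, not the real `LogicalPlan`/`EsqlSession` code: when the query contains a FORK, forEachDown skips every node above the Fork, then visits the Fork and everything below it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Stand-in plan node with a top-down traversal, mimicking forEachDown.
abstract class PlanNode {
    final List<PlanNode> children = new ArrayList<>();

    void forEachDown(Consumer<PlanNode> action) {
        action.accept(this); // visit parent before children: top-down
        for (PlanNode child : children) {
            child.forEachDown(action);
        }
    }
}

class ForkNode extends PlanNode {}

class FilterNode extends PlanNode {}

class ForkAwareTraversal {
    static List<String> visited(PlanNode root, boolean hasFork) {
        List<String> names = new ArrayList<>();
        boolean[] seenFork = { false };
        root.forEachDown(p -> {
            // ignore plans above the Fork; they are handled per branch later
            if (hasFork && seenFork[0] == false && (p instanceof ForkNode) == false) {
                return;
            }
            if (p instanceof ForkNode) {
                seenFork[0] = true;
                // the real code would call fieldNames on each Fork child
                // and union the results
            }
            names.add(p.getClass().getSimpleName());
        });
        return names;
    }
}
```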
... and I figured out the source of confusion here - we do traverse this with forEachDown, not forEachUp 🤦♀️ 🤦♀️ 🤦♀️
I am just reverting to requesting all fields when FORK is used and not trying to optimize this further, which seems too brittle atm.
When we are out of tech preview and we have amazing test coverage for FORK, we can go back and modify this more confidently.
```java
if (hasFork && seenFork.get() == false && p instanceof Fork == false) {
    return;
}
```
This condition seems to bypass all plan types that are not Fork until a Fork is found, which seems brittle. I have the feeling this is because of the current limitations of FORK, but I am not sure that's true. Nevertheless, if FORK evolves in the future and has no limitations anymore, does this condition still stand?
The condition makes everything else in the query dependent on the existence of FORK. Maybe some comments in the code would explain why this logic makes sense here.
Yes, it is very brittle.
I will just revert back to requesting all fields - we can improve this later, and I'd argue it's not a must for tech preview.
carlosdelest left a comment
LGTM - I think we can simplify some code
```java
boolean[] canRemoveAliases = new boolean[] { true };

PreAnalysisResult initialResult = result;
projectAll.set(false);
```
Nit - I think this assignment is unnecessary
```java
    }
});

if (projectAll.get()) {
```
This can't happen, right? I don't see projectAll being updated in the previous forEachDown() block 🤔
astefan left a comment
LGTM

fixes #128271
fixes #128272
related #121950
In #121950 we made some changes to how we do field resolution for FORK.
Previously we would ask for all fields when FORK was used in the query, which was suboptimal.
After I merged #121950, we got some failing tests in `IndexResolverFieldNamesTests`. That's because, prior to my merging #121950, some other fixes to `EsqlSession.fieldNames` were merged, and I did not update my branch with the latest changes. I am just reverting back to requesting all fields for FORK atm.
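The fallback this PR reverts to can be sketched as follows. A hypothetical illustration with stand-in classes, not the real Elasticsearch code: if a Fork appears anywhere in the parsed plan, skip the minimal-field-set computation and request all fields from field caps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Stand-in plan node with a top-down traversal.
abstract class Node {
    final List<Node> children = new ArrayList<>();

    void forEachDown(Consumer<Node> action) {
        action.accept(this);
        for (Node child : children) {
            child.forEachDown(action);
        }
    }
}

class ForkOp extends Node {}

class FilterOp extends Node {}

class FieldResolution {
    // FORK present anywhere in the plan: give up on minimal field sets for now
    static boolean requestAllFields(Node parsed) {
        boolean[] hasFork = { false };
        parsed.forEachDown(p -> {
            if (p instanceof ForkOp) {
                hasFork[0] = true;
            }
        });
        return hasFork[0];
    }
}
```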