Attempt to make some operations cheaper #125228
    // to avoid allocating iterator that performs concurrent modification checks
    for (int c = 0; c < children.size(); c++) {
        children.get(c).forEachUp(action);
    }
Could we add this as a comment, please? The explanation in the comment helps, but I think the important point is that this reduces stack size rather than increasing performance, which the current comment kinda implies by mentioning the concurrent modification checks - which I understand didn't seem to be a problem?
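For illustration, the traversal under discussion could look roughly like the sketch below. `PlanNode` and `visitOrder` are hypothetical stand-ins, not the real `Node` API, and the sketch assumes `forEachUp` visits children before the node itself. The indexed loop recurses directly, without an iterator or a lambda frame between recursive calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the real Node class; names are illustrative only.
final class PlanNode {
    final String name;
    final List<PlanNode> children;

    PlanNode(String name, PlanNode... children) {
        this.name = name;
        this.children = List.of(children);
    }

    // Assumed semantics: visit all children first, then this node.
    void forEachUp(Consumer<PlanNode> action) {
        // Indexed loop instead of children.forEach(c -> c.forEachUp(action)):
        // no Iterator is allocated and no lambda frame sits between recursive calls.
        for (int c = 0, size = children.size(); c < size; c++) {
            children.get(c).forEachUp(action);
        }
        action.accept(this);
    }

    // Helper for the example: records the order in which nodes are visited.
    static List<String> visitOrder(PlanNode root) {
        List<String> order = new ArrayList<>();
        root.forEachUp(n -> order.add(n.name));
        return order;
    }
}
```

With a deep plan, each level of recursion then costs one frame of `forEachUp` rather than several, which is the stack-size effect described above.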
    // please do not refactor it to a for-each loop
    // to avoid allocating iterator that performs concurrent modification checks
    for (int i = 0; i < c.size(); i++) {
        var e = c.get(i);
We're making more and more improvements to how we perform query plan transformations/traversals.
Could we please add at least a micro benchmark as one of the next steps? Without one, it's both hard to understand the impact of optimizations, and it's also unclear if we maybe introduce accidental regressions (in whichever PRs).
Good point. I will try to add one in a separate change.
      // has no type info so it's difficult to have automatic checking without having base classes).
    - if (arg instanceof Collection<?> c) {
    + if (arg instanceof List<?> c) {
Please confirm we do not expect sets in here.
We don't; however, it's not enforced. Please add an else branch that throws an error, for safety.
++ this could easily become a bug in the future, as it makes total sense for a node property to be e.g. a set of expressions. We might also just not have any test that exercises this code path with a non-list collection. To really know, we'd have to go and check all the info() method implementations on Node subclasses manually, and enforce that the only collections node properties can use are lists.
My suggestion: let's add a code path for List<?> in addition to the more general code path for Collection<?> - and let's add an assertion in the latter because that code path shouldn't be used, so that this only trips in tests but doesn't damage production code.
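A rough sketch of that suggestion, under stated assumptions: `PropertyTransform.transform` and its `rule` parameter are hypothetical names, not the actual `QueryPlan.doTransformExpression` signature. Lists take the allocation-free indexed path, while any other `Collection` still works in production but trips an assertion when assertions are enabled (as they typically are in tests).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical helper; not the real QueryPlan code.
final class PropertyTransform {

    // Applies rule to arg, or to each element if arg is a collection.
    // Returns the original instance when nothing changed.
    static Object transform(Object arg, UnaryOperator<Object> rule) {
        if (arg instanceof List<?> list) {
            // Fast path: indexed access, result list allocated only on first change.
            List<Object> transformed = null;
            for (int i = 0, size = list.size(); i < size; i++) {
                Object e = list.get(i);
                Object next = rule.apply(e);
                if (transformed == null && next != e) {
                    transformed = new ArrayList<>(size);
                    for (int j = 0; j < i; j++) {
                        transformed.add(list.get(j)); // copy the unchanged prefix
                    }
                }
                if (transformed != null) {
                    transformed.add(next);
                }
            }
            return transformed != null ? transformed : arg;
        } else if (arg instanceof Collection<?> c) {
            // Fallback kept for safety; expected to be dead code.
            assert false : "non-list collections are not expected as node properties";
            List<Object> transformed = new ArrayList<>(c.size());
            boolean hasChanged = false;
            for (Object e : c) {
                Object next = rule.apply(e);
                hasChanged |= next != e;
                transformed.add(next);
            }
            return hasChanged ? transformed : arg;
        }
        return rule.apply(arg);
    }
}
```

Because the fallback still returns a correct result when assertions are disabled, an unexpected set-valued property would surface as a test failure rather than a production error.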
    - children().forEach(c -> c.forEachDown(action));
    + // please do not refactor it to a for-each loop
    + // to avoid allocating iterator that performs concurrent modification checks
    + for (int c = 0; c < children.size(); c++) {
Let's squeeze a bit more perf:
for (int c = 0, s = children.size(); c < s; c++)
LGTM.
Please keep an eye out for stream usage. I've specifically avoided streams in the core; however, they are used in some rules, and they started showing up the last time I checked the profiler.
Being a small change, it makes sense to backport this all the way to 8.x, especially after seeing the list iterator bubbling up.
Pinging @elastic/es-analytical-engine (Team:Analytics)
Thanks Ievgen, this generally LGTM. However, I think we should preserve the Collection code path in QueryPlan.doTransformExpression in case some node property is not a list. Alternatively, we could double-check and enforce that node properties that are collections can only be lists, and place that enforcement close to the actual node properties so it's possible to validate this invariant without relying on (potentially insufficient) testing.
A thought: the improvement to Node.java works because the children of a query plan are not mutable. I wonder if we could make this clearer/more idiomatic by turning the children into an unmodifiable list - which would also throw exceptions in case we accidentally don't respect this invariant somewhere.
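As a sketch of the unmodifiable-children idea (the class `ImmutableChildren` is hypothetical and uses `String` children for brevity), `List.copyOf` produces an unmodifiable list, so any accidental mutation fails with `UnsupportedOperationException` instead of silently changing the plan:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type used only to illustrate the invariant.
final class ImmutableChildren {
    private final List<String> children;

    ImmutableChildren(List<String> children) {
        // List.copyOf returns an unmodifiable copy and rejects null elements.
        this.children = List.copyOf(children);
    }

    List<String> children() {
        return children;
    }

    // Returns true if an attempt to mutate the children is rejected.
    static boolean rejectsMutation() {
        ImmutableChildren node = new ImmutableChildren(new ArrayList<>(List.of("a", "b")));
        try {
            node.children().add("c");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

Note that `List.copyOf` copies the input unless it is already an unmodifiable list, so this trades one allocation at construction time for the guarantee.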
        return hasChanged ? transformed : arg;
    }
    assert arg instanceof Set<?> == false : "Set arguments are not supported";
This is insufficient in theory - someone in the future could use a Queue (for whatever reason) or a custom implementation of Collection. A safer way is to just assert false here in case of Collection and saying that non-list collections are not allowed.
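To make the gap concrete, here is a small sketch comparing the two checks. `CollectionGuards` and its method names are hypothetical. A `Queue` implementation such as `ArrayDeque` passes the Set-only check but is caught by a check that rejects every non-list `Collection`.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical predicates mirroring the two assertion variants discussed above.
final class CollectionGuards {

    // Narrow check from the diff: only rules out Set.
    static boolean passesSetOnlyCheck(Object arg) {
        return !(arg instanceof Set<?>);
    }

    // Broader check: any Collection that is not a List is rejected.
    static boolean passesListOnlyCheck(Object arg) {
        return !(arg instanceof Collection<?>) || arg instanceof List<?>;
    }
}
```

For example, `passesSetOnlyCheck(new ArrayDeque<>())` is true while `passesListOnlyCheck(new ArrayDeque<>())` is false.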
Good point. Let me change this to Collection<?>.
I would like to avoid another branch with custom logic for collections for now, as somebody might accidentally use it without knowing its performance limitations.
We can add it when we see that a set or any other custom implementation is absolutely required.
As discussed offline: in practical terms, yeah that'd likely be fine. But I'm also paranoid as this is called on general node properties and for whatever reason there might just be one node subclass that never hit this code path in tests, yet.
We discussed this in person and decided to keep the other branch for now (in case this behaviour is actually used but not covered by tests).
Thanks Ievgen, LGTM!
      action.accept((T) this);
    - children().forEach(c -> c.forEachDown(action));
    + // please do not refactor it to a for-each loop to avoid
    + // allocating iterator that performs concurrent modification checks and extra stack traces
nit:
    - // allocating iterator that performs concurrent modification checks and extra stack traces
    + // allocating iterator that performs concurrent modification checks and extra stack frames
        }
        return hasChanged ? transformed : arg;
    } else if (arg instanceof Collection<?> c) {
        List<Object> transformed = null;
I think it's fine to mark this branch as deprecated/likely dead + we could still add an assert false here - before maybe later getting rid of this branch altogether and enforcing that the only collections used as properties are all lists (but in a more visible place, e.g. in the c'tor of QueryPlan/Node).


This attempts to make optimizing a tiny bit cheaper.
Please see inline comments for details.
Related to: #124395