ESQL: Account for the inlinestats LocalRelations #134455
Conversation
This adds memory tracking for the blocks used in the `LocalRelation`s generated at the intermediary phase of executing an INLINESTATS.
Pinging @elastic/es-analytical-engine (Team:Analytics)
completionInfoAccumulator.accumulate(result.completionInfo());
LocalRelation resultWrapper = resultToPlan(subPlans.stubReplacedSubPlan(), result);
localRelationBlocks.set(resultWrapper.supplier().get());
var releasingNext = ActionListener.runAfter(next, () -> releaseLocalRelationBlocks(localRelationBlocks));
Why is this needed? So that LocalRelation generated blocks are not kept in memory unnecessarily? For example, in case there are multiple inline stats commands: the first one creates Blocks in memory, we release them, then the second inline stats creates some more Blocks, we release them, and so on.
So that LocalRelation generated blocks are not kept in memory unnecessarily?
Yes. It's not so much that they're kept in memory (we're not referencing them from the circuit breaker), it's that they're unaccounted for within the circuit breaker.
first one creates Blocks in memory, we release them then the second inline stats creates some more Blocks, we release them and so forth and so on
Right. Ideally, we'd just pass the blocks from the produced result along into the LocalRelation, but they (can) come fragmented over many pages and SessionUtils#fromPages sticks them into contiguous blocks (now - i.e. with this PR - also with memory accounting).
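For illustration, here is a minimal, self-contained sketch of that idea (toy names like Budget and fromPages, not the actual SessionUtils#fromPages or BlockFactory API): fragmented pages are copied into one contiguous array, the bytes are reserved against a breaker-like budget up front, and they are given back if the copy fails or once the caller is done with the blocks:

import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class FromPagesSketch {
    // Toy stand-in for a circuit breaker: tracks used bytes against a fixed limit.
    static final class Budget {
        private final long limit;
        private final AtomicLong used = new AtomicLong();
        Budget(long limit) { this.limit = limit; }
        void add(long bytes) {
            if (used.addAndGet(bytes) > limit) {
                used.addAndGet(-bytes);
                throw new IllegalStateException("over budget: limit is " + limit + " bytes");
            }
        }
        void release(long bytes) { used.addAndGet(-bytes); }
    }

    // Consolidates fragmented int "pages" into one contiguous array, accounting the bytes first.
    static int[] fromPages(List<int[]> pages, Budget budget) {
        long totalBytes = pages.stream().mapToLong(p -> (long) p.length * Integer.BYTES).sum();
        budget.add(totalBytes);                 // account the contiguous copy before allocating it
        try {
            int total = pages.stream().mapToInt(p -> p.length).sum();
            int[] contiguous = new int[total];
            int offset = 0;
            for (int[] page : pages) {
                System.arraycopy(page, 0, contiguous, offset, page.length);
                offset += page.length;
            }
            return contiguous;
        } catch (RuntimeException e) {
            budget.release(totalBytes);         // give the reserved bytes back if consolidation fails
            throw e;
        }
    }

    public static void main(String[] args) {
        Budget budget = new Budget(1024);
        int[] block = fromPages(List.of(new int[] { 2, 3 }, new int[] { 4 }), budget);
        System.out.println(block.length + " contiguous values accounted");
        budget.release((long) Integer.BYTES * block.length);  // caller releases once the relation is done with
    }
}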
// Translate the subquery into a separate, coordinator based plan and the results 'broadcasted' as a local relation
completionInfoAccumulator.accumulate(result.completionInfo());
LocalRelation resultWrapper = resultToPlan(subPlans.stubReplacedSubPlan(), result);
localRelationBlocks.set(resultWrapper.supplier().get());
There is the row command, which creates a special CopyingLocalSupplier (see https://github.com/elastic/elasticsearch/pull/128917/files#diff-23897d0bd181d50370709de01c3ab3acc2f91be91d15db03f5dcdf5f27cf7866R32). I don't think it has anything to do with this PR, but I am mentioning it in case you notice something that might be related. Before I added that supplier, the blocks created as part of row were being double released with inlinestats queries, resulting in exceptions.
Before I added that supplier, the blocks created as part of row were being double released with inlinestats queries resulting in exceptions.
I think that's to do with how inlinestats is executed: the "source" (ES or a row supplier) is accessed twice when executing INLINE STATS: once for doing the agg, then for doing the join. The operator for the agg will get the blocks from the local relation and when done release them. And then the join operator will try to do the same, but those blocks are already released. These blocks are concerned with the left-hand side of the join.
The blocks tracked with this PR are concerned with the right-hand side. The plan looks something like:
LimitExec[1000[INTEGER],12]
\_HashJoinExec[[b{r}#6],[b{r}#6],[b{r}#6],[c{r}#9]]
|_LocalSourceExec[[x{r}#2, a{r}#4, b{r}#6],org.elasticsearch.xpack.esql.plan.logical.local.CopyingLocalSupplier@10d9bd]
\_LocalSourceExec[[c{r}#9, b{r}#6],[IntArrayBlock[positions=3, mvOrdering=UNORDERED, vector=IntArrayVector[positions=3, values=[0, 0, 0]]], IntVectorBlock[vector=IntArrayVector[positions=3, values=[2, 3, 4]]]]]
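To make the double-release scenario concrete, here is a toy, self-contained sketch (illustrative Block and Supplier types, not the actual ESQL classes): if the agg and the join both read the same supplier-backed block and each releases it, the second release trips, whereas a copying supplier in the spirit of CopyingLocalSupplier hands each consumer its own copy:

import java.util.concurrent.atomic.AtomicBoolean;

class CopyingSupplierSketch {
    // Toy releasable "block": releasing it twice is an error, as with real ref-counted blocks.
    static final class Block {
        private final int[] values;
        private final AtomicBoolean released = new AtomicBoolean();
        Block(int... values) { this.values = values; }
        Block copy() { return new Block(values.clone()); }
        void release() {
            if (released.getAndSet(true)) {
                throw new IllegalStateException("block released twice");
            }
        }
    }

    interface Supplier { Block get(); }

    public static void main(String[] args) {
        Block shared = new Block(1, 2, 3);
        Supplier sharing = () -> shared;          // what a plain shared supplier amounts to
        Supplier copying = () -> shared.copy();   // what a copying supplier does instead

        // Copying supplier: the agg and the join each release their own copy, no conflict.
        Block forAgg = copying.get();
        Block forJoin = copying.get();
        forAgg.release();
        forJoin.release();

        // Sharing supplier: the second consumer releases a block the first one already released.
        sharing.get().release();
        try {
            sharing.get().release();
        } catch (IllegalStateException e) {
            System.out.println("double release detected: " + e.getMessage());
        }
    }
}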
LGTM. I think we don't properly account for the block when it's received on the data node as part of the second phase plan. Though we might do.
Do we plan to drop the 1MB limit or just relax it?
Yes, you're right, following through I see the blocks in the supplier in a
... this question thus becomes more relevant, since a fragment could contain more than a
@nik9000 I've added back a limit, though a more dynamic one, proportional to the CB's limit, but also kept within its own bounds. Not sure if we have a precedent for this that I could take inspiration from?
I made The precedent for these "some percentage of memory" with
This makes configurable the limit to which the intermediate LocalRelation used in INLINE STATS execution can grow. By default, this can grow up to 0.1% of the heap. Related: #134455
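As a rough, self-contained sketch of such a limit (illustrative names and bounds, not the actual Elasticsearch setting): compute the limit as a fraction of the max heap, then clamp it within its own floor and ceiling:

class HeapProportionalLimitSketch {
    // Example bounds, assumed for illustration: never below 1 MiB, never above 100 MiB.
    static final long MIN_LIMIT_BYTES = 1024L * 1024;
    static final long MAX_LIMIT_BYTES = 100L * 1024 * 1024;

    // Computes the LocalRelation size limit as a fraction of the heap, clamped to its own bounds.
    static long localRelationLimit(double heapFraction) {
        long heapBytes = Runtime.getRuntime().maxMemory();    // configured max heap
        long proportional = (long) (heapBytes * heapFraction);
        return Math.max(MIN_LIMIT_BYTES, Math.min(MAX_LIMIT_BYTES, proportional));
    }

    public static void main(String[] args) {
        // 0.1% of the heap, as in the default mentioned above, kept within the illustrative bounds.
        System.out.println(localRelationLimit(0.001) + " bytes");
    }
}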
This adds memory tracking for the blocks used in the `LocalRelation`s generated at the intermediary phase of executing an INLINESTATS. Closes #124744