
Conversation

@bpintea
Contributor

@bpintea bpintea commented Sep 10, 2025

This adds memory tracking for the blocks used in the LocalRelations generated in the intermediate phase of executing an INLINESTATS.

Closes #124744

@elasticsearchmachine elasticsearchmachine added the Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)) label Sep 10, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@bpintea bpintea marked this pull request as draft September 11, 2025 08:46
@astefan astefan requested review from nik9000 and removed request for alex-spies September 11, 2025 13:51
@bpintea bpintea marked this pull request as ready for review September 11, 2025 13:54
completionInfoAccumulator.accumulate(result.completionInfo());
LocalRelation resultWrapper = resultToPlan(subPlans.stubReplacedSubPlan(), result);
localRelationBlocks.set(resultWrapper.supplier().get());
var releasingNext = ActionListener.runAfter(next, () -> releaseLocalRelationBlocks(localRelationBlocks));
Contributor

Why is this needed? So that LocalRelation generated blocks are not kept in memory unnecessarily? For example, in case there are multiple inline stats commands: the first one creates Blocks in memory, we release them, then the second inline stats creates some more Blocks, we release them, and so forth.

Contributor Author

@bpintea bpintea Sep 11, 2025

So that LocalRelation generated blocks are not kept in memory unnecessarily?

Yes. Not so much kept in memory (we're not referencing them from the circuit breaker), but left unaccounted for within the circuit breaker.

the first one creates Blocks in memory, we release them, then the second inline stats creates some more Blocks, we release them, and so forth

Right. Ideally we'd just pass the blocks from the produced result along into the LocalRelation, but they can come fragmented over many pages, and SessionUtils#fromPages consolidates them into contiguous blocks (now, i.e. with this PR, also with memory accounting).
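
(For illustration, a minimal, self-contained sketch of that accounting pattern; SimpleBreaker and all names here are stand-ins, not Elasticsearch's actual CircuitBreaker or SessionUtils API. The consolidated block's bytes are registered with the breaker before the copy and released once the consumer is done, mirroring the ActionListener.runAfter hook in the diff above.)

import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for a circuit breaker: tracks estimated bytes against a fixed limit.
class SimpleBreaker {
    private final long limit;
    private final AtomicLong used = new AtomicLong();

    SimpleBreaker(long limitBytes) { this.limit = limitBytes; }

    void addEstimateBytesAndMaybeBreak(long bytes) {
        if (used.addAndGet(bytes) > limit) {
            used.addAndGet(-bytes); // roll back before tripping
            throw new IllegalStateException("circuit breaker tripped");
        }
    }

    void release(long bytes) { used.addAndGet(-bytes); }
}

class FromPagesSketch {
    // Copies per-page fragments into one contiguous "block", accounting for it up front.
    static long[] toContiguousBlock(List<long[]> pages, SimpleBreaker breaker) {
        int positions = pages.stream().mapToInt(p -> p.length).sum();
        breaker.addEstimateBytesAndMaybeBreak((long) positions * Long.BYTES);
        long[] block = new long[positions];
        int offset = 0;
        for (long[] page : pages) {
            System.arraycopy(page, 0, block, offset, page.length);
            offset += page.length;
        }
        return block;
    }

    // Called once the LocalRelation's consumer is done with the block.
    static void releaseBlock(long[] block, SimpleBreaker breaker) {
        breaker.release((long) block.length * Long.BYTES);
    }
}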

// Translate the subquery into a separate, coordinator based plan and the results 'broadcasted' as a local relation
completionInfoAccumulator.accumulate(result.completionInfo());
LocalRelation resultWrapper = resultToPlan(subPlans.stubReplacedSubPlan(), result);
localRelationBlocks.set(resultWrapper.supplier().get());
Contributor

There is the row command, which creates a special CopyingLocalSupplier (see https://github.com/elastic/elasticsearch/pull/128917/files#diff-23897d0bd181d50370709de01c3ab3acc2f91be91d15db03f5dcdf5f27cf7866R32). I don't think it has anything to do with this PR, but I'm mentioning it in case you notice something related. Before I added that supplier, the blocks created as part of row were being double-released with inlinestats queries, resulting in exceptions.

Contributor Author

Before I added that supplier, the blocks created as part of row were being double-released with inlinestats queries, resulting in exceptions.

I think that's to do with how inlinestats is executed: the "source" (ES or a row supplier) is accessed twice when executing INLINE STATS, once for doing the agg, then for doing the join. The operator for the agg will get the blocks from the local relation and, when done, release them. The join operator will then try to do the same, but those blocks are already released. These blocks concern the left-hand side of the join.

The blocks tracked by this PR concern the right-hand side. The plan looks something like:

LimitExec[1000[INTEGER],12]
\_HashJoinExec[[b{r}#6],[b{r}#6],[b{r}#6],[c{r}#9]]
  |_LocalSourceExec[[x{r}#2, a{r}#4, b{r}#6],org.elasticsearch.xpack.esql.plan.logical.local.CopyingLocalSupplier@10d9bd]
  \_LocalSourceExec[[c{r}#9, b{r}#6],[IntArrayBlock[positions=3, mvOrdering=UNORDERED, vector=IntArrayVector[positions=3, values=[0, 0, 0]]], IntVectorBlock[vector=IntArrayVector[positions=3, values=[2, 3, 4]]]]]
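
(For context, a tiny sketch of the copy-on-get idea; this is an assumed shape, not the actual CopyingLocalSupplier. Because the agg and the join each receive an independent copy of the blocks, each side can release what it was handed without invalidating the other's copy.)

import java.util.Arrays;

class CopyingSupplierSketch {
    private final int[][] blocks; // stand-in for the relation's blocks

    CopyingSupplierSketch(int[][] blocks) { this.blocks = blocks; }

    // Each call hands out an independent copy; releasing one copy leaves the rest valid.
    int[][] get() {
        return Arrays.stream(blocks).map(int[]::clone).toArray(int[][]::new);
    }
}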

Contributor

@astefan astefan left a comment

LGTM with two hypothetical/exploratory questions. No biggie, just clearing some vague concerns.

But I think either @nik9000 or @dnhatn should look at these changes before the PR goes in.
Thank you @bpintea

Member

@nik9000 nik9000 left a comment

LGTM. I think we don't properly account for the block when it's received on the data node as part of the second phase plan. Though we might do.

Do we plan to drop the 1MB limit or just relax it?

@bpintea
Contributor Author

bpintea commented Sep 18, 2025

I think we don't properly account for the block when it's received on the data node as part of the second phase plan.

Yes, you're right: following through, I see the blocks in a LocalRelation's supplier aren't tracked on the remote node when serialised. I guess that's a somewhat separate issue. But then...

Do we plan to drop 1mb limit or just relax it?

... this question thus becomes more relevant, since a fragment could contain more than one LocalRelation (with multiple INLINE STATS). I first thought we'd just drop the limit, but now I think we should still have one.

@bpintea
Contributor Author

bpintea commented Sep 22, 2025

I first thought we'd just drop the limit, but now I think we should still have one.

@nik9000 I've added back a limit, though a more dynamic one: proportional to the circuit breaker's limit, but itself clamped within fixed bounds. Not sure if we have a precedent for this that I could take inspiration from?
Also, is this worth being configurable, maybe in terms of % or upper limit?
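
(For illustration, a minimal sketch of the kind of proportional-but-clamped limit described above; the fraction and bounds are assumptions, not the PR's actual values.)

// Limit grows with the circuit breaker's limit, but stays within fixed bounds.
static long localRelationLimitBytes(long breakerLimitBytes) {
    long proportional = breakerLimitBytes / 100;   // e.g. 1% of the breaker's limit
    long floor = 1024L * 1024;                     // never below the old fixed 1MB limit
    long ceiling = 256L * 1024 * 1024;             // and never above a fixed cap
    return Math.max(floor, Math.min(proportional, ceiling));
}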

@nik9000
Member

nik9000 commented Sep 23, 2025

Also, is this worth being configurable, maybe in terms of % or upper limit?

I made PhysicalSettings a while ago for cluster settings for things like this. Though this is sort of LogicalSettings, but yeah, something like that. The advantage of making it configurable is that users can tweak it. And we can do that in serverless. Which feels pretty important actually.

The precedent for these "some percentage of memory" settings is Setting#memorySizeSetting. I believe these naturally support values like 1%, meaning "one percent of heap". I know I've seen some of these with quite complex defaults. That's what I'd suggest for you here: go with your logic for the default and then, if we override it, just let it be whatever the override sets it to.
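
(A sketch of that suggestion; the setting key and default here are assumptions for illustration, the actual setting landed later via #135339.)

import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Setting.Property;
import org.elasticsearch.common.unit.ByteSizeValue;

public class InlineStatsSettingsSketch {
    // Assumed key and default; memorySizeSetting resolves percentage values against the heap.
    public static final Setting<ByteSizeValue> LOCAL_RELATION_SIZE_LIMIT =
        Setting.memorySizeSetting(
            "esql.inline_stats.local_relation_size_limit",
            "0.1%",
            Property.NodeScope,
            Property.Dynamic
        );
}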

@bpintea bpintea merged commit cbd9f05 into elastic:main Sep 24, 2025
34 checks passed
@bpintea bpintea deleted the enh/track_mem_inlinejoin_local_relation branch September 24, 2025 09:46
@bpintea
Contributor Author

bpintea commented Sep 24, 2025

Thanks for the guidance, @nik9000.
I've merged this as it was and opened #135339 since the proposed changes grew a bit large.

bpintea added a commit that referenced this pull request Sep 26, 2025
This makes configurable the limit to which the intermediate LocalRelation used in INLINE STATS execution can grow. By default, it can grow up to 0.1% of the heap.

Related #134455

Labels

:Analytics/ES|QL (AKA ESQL), >non-issue, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), v9.2.0

Development

Successfully merging this pull request may close these issues.

ESQL: Make INLINESTATS properly track memory of data passed between phases

4 participants