Skip to content

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Sep 5, 2025

Tracks the more memory that's involved in topn.

Lucene TopN

Lucene doesn't track memory usage for TopN and can use a fair bit of it.
Try this query:

FROM big_table
| SORT a, b, c, d, e
| LIMIT 1000000
| STATS MAX(a)

We attempt to return all million documents from lucene. Is we did this
with the compute engine we're track all of the memory usage. With lucene
we have to reserve it.

In the case of the query above the sort keys weight 8 bytes each. 40
bytes total. Plus another 72 for Lucene's FieldDoc. And another 40 at
least for copying to the values to FieldDoc. That totals something
like 152 bytes a piece. That's 145mb. Worth tracking!

Esql Engine TopN

Esql does track memory for topn, but it doesn't track the memory used by the min heap itself. It's just a big array of pointers. But it can get very big!

Lucene doesn't track memory usage for TopN and can use a fair bit of it.
Try this query:
```
FROM big_table
| SORT a, b, c, d, e
| LIMIT 1000000
| STATS MAX(a)
```

We attempt to return all million documents from lucene. Is we did this
with the compute engine we're track all of the memory usage. With lucene
we have to reserve it.

In the case of the query above the sort keys weight 8 bytes each. 40
bytes total. Plus another 72 for Lucene's `FieldDoc`. And another 40 at
least for copying to the values to `FieldDoc`. That totals something
like 152 bytes a piece. That's 145mb. Worth tracking!
@nik9000 nik9000 requested review from benwtrent and dnhatn September 5, 2025 17:14
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 5, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

@nik9000 nik9000 changed the title ESQL: Reserve memory for Lucene's TopN ESQL: Reserve memory TopN Sep 5, 2025
Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Nik!

this.encoders = encoders;
this.sortOrders = sortOrders;
this.inputQueue = new Queue(topCount);
breaker.addEstimateBytesAndMaybeBreak(Queue.sizeOf(topCount), "esql engine topn");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe move this memory acquire to Queue ctor for consistency with Queue#close.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave a comment about why I'm not doing that! The trouble is that we allocate the memory in the super. I guess I can make the ctor private and call a static. that's good. I'll do that.

@nik9000 nik9000 added the v8.18.7 label Sep 5, 2025
@nik9000 nik9000 enabled auto-merge (squash) September 5, 2025 22:52
@nik9000 nik9000 merged commit e9c145b into elastic:main Sep 8, 2025
33 checks passed
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 8, 2025
Tracks the more memory that's involved in topn.

Lucene doesn't track memory usage for TopN and can use a fair bit of it.
Try this query:
```
FROM big_table
| SORT a, b, c, d, e
| LIMIT 1000000
| STATS MAX(a)
```

We attempt to return all million documents from lucene. Is we did this
with the compute engine we're track all of the memory usage. With lucene
we have to reserve it.

In the case of the query above the sort keys weight 8 bytes each. 40
bytes total. Plus another 72 for Lucene's `FieldDoc`. And another 40 at
least for copying to the values to `FieldDoc`. That totals something
like 152 bytes a piece. That's 145mb. Worth tracking!

 ## Esql Engine TopN

Esql *does* track memory for topn, but it doesn't track the memory used by the min heap itself. It's just a big array of pointers. But it can get very big!
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 8, 2025
Tracks the more memory that's involved in topn.

Lucene doesn't track memory usage for TopN and can use a fair bit of it.
Try this query:
```
FROM big_table
| SORT a, b, c, d, e
| LIMIT 1000000
| STATS MAX(a)
```

We attempt to return all million documents from lucene. Is we did this
with the compute engine we're track all of the memory usage. With lucene
we have to reserve it.

In the case of the query above the sort keys weight 8 bytes each. 40
bytes total. Plus another 72 for Lucene's `FieldDoc`. And another 40 at
least for copying to the values to `FieldDoc`. That totals something
like 152 bytes a piece. That's 145mb. Worth tracking!

 ## Esql Engine TopN

Esql *does* track memory for topn, but it doesn't track the memory used by the min heap itself. It's just a big array of pointers. But it can get very big!
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 8, 2025
Tracks the more memory that's involved in topn.

Lucene doesn't track memory usage for TopN and can use a fair bit of it.
Try this query:
```
FROM big_table
| SORT a, b, c, d, e
| LIMIT 1000000
| STATS MAX(a)
```

We attempt to return all million documents from lucene. Is we did this
with the compute engine we're track all of the memory usage. With lucene
we have to reserve it.

In the case of the query above the sort keys weight 8 bytes each. 40
bytes total. Plus another 72 for Lucene's `FieldDoc`. And another 40 at
least for copying to the values to `FieldDoc`. That totals something
like 152 bytes a piece. That's 145mb. Worth tracking!

 ## Esql Engine TopN

Esql *does* track memory for topn, but it doesn't track the memory used by the min heap itself. It's just a big array of pointers. But it can get very big!
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 8, 2025
Tracks the more memory that's involved in topn.

Lucene doesn't track memory usage for TopN and can use a fair bit of it.
Try this query:
```
FROM big_table
| SORT a, b, c, d, e
| LIMIT 1000000
| STATS MAX(a)
```

We attempt to return all million documents from lucene. Is we did this
with the compute engine we're track all of the memory usage. With lucene
we have to reserve it.

In the case of the query above the sort keys weight 8 bytes each. 40
bytes total. Plus another 72 for Lucene's `FieldDoc`. And another 40 at
least for copying to the values to `FieldDoc`. That totals something
like 152 bytes a piece. That's 145mb. Worth tracking!

 ## Esql Engine TopN

Esql *does* track memory for topn, but it doesn't track the memory used by the min heap itself. It's just a big array of pointers. But it can get very big!
@nik9000
Copy link
Member Author

nik9000 commented Sep 8, 2025

9.1: #134321
8.19: #134331
8.18: #134335

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 8, 2025
I committed it in elastic#134235 by accident. We were going to use it as part
of that but decided against it.
elasticsearchmachine pushed a commit that referenced this pull request Sep 8, 2025
I committed it in #134235 by accident. We were going to use it as part
of that but decided against it.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 8, 2025
Tracks the more memory that's involved in topn.

Lucene doesn't track memory usage for TopN and can use a fair bit of it.
Try this query:
```
FROM big_table
| SORT a, b, c, d, e
| LIMIT 1000000
| STATS MAX(a)
```

We attempt to return all million documents from lucene. Is we did this
with the compute engine we're track all of the memory usage. With lucene
we have to reserve it.

In the case of the query above the sort keys weight 8 bytes each. 40
bytes total. Plus another 72 for Lucene's `FieldDoc`. And another 40 at
least for copying to the values to `FieldDoc`. That totals something
like 152 bytes a piece. That's 145mb. Worth tracking!

 ## Esql Engine TopN

Esql *does* track memory for topn, but it doesn't track the memory used by the min heap itself. It's just a big array of pointers. But it can get very big!
nik9000 added a commit that referenced this pull request Sep 8, 2025
Tracks the more memory that's involved in topn.

Lucene doesn't track memory usage for TopN and can use a fair bit of it.
Try this query:
```
FROM big_table
| SORT a, b, c, d, e
| LIMIT 1000000
| STATS MAX(a)
```

We attempt to return all million documents from lucene. Is we did this
with the compute engine we're track all of the memory usage. With lucene
we have to reserve it.

In the case of the query above the sort keys weight 8 bytes each. 40
bytes total. Plus another 72 for Lucene's `FieldDoc`. And another 40 at
least for copying to the values to `FieldDoc`. That totals something
like 152 bytes a piece. That's 145mb. Worth tracking!

 ## Esql Engine TopN

Esql *does* track memory for topn, but it doesn't track the memory used by the min heap itself. It's just a big array of pointers. But it can get very big!
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Sep 8, 2025
Tracks the more memory that's involved in topn.

Lucene doesn't track memory usage for TopN and can use a fair bit of it.
Try this query:
```
FROM big_table
| SORT a, b, c, d, e
| LIMIT 1000000
| STATS MAX(a)
```

We attempt to return all million documents from lucene. Is we did this
with the compute engine we're track all of the memory usage. With lucene
we have to reserve it.

In the case of the query above the sort keys weight 8 bytes each. 40
bytes total. Plus another 72 for Lucene's `FieldDoc`. And another 40 at
least for copying to the values to `FieldDoc`. That totals something
like 152 bytes a piece. That's 145mb. Worth tracking!

 ## Esql Engine TopN

Esql *does* track memory for topn, but it doesn't track the memory used by the min heap itself. It's just a big array of pointers. But it can get very big!
nik9000 added a commit that referenced this pull request Sep 9, 2025
Tracks the more memory that's involved in topn.

Lucene doesn't track memory usage for TopN and can use a fair bit of it.
Try this query:
```
FROM big_table
| SORT a, b, c, d, e
| LIMIT 1000000
| STATS MAX(a)
```

We attempt to return all million documents from lucene. Is we did this
with the compute engine we're track all of the memory usage. With lucene
we have to reserve it.

In the case of the query above the sort keys weight 8 bytes each. 40
bytes total. Plus another 72 for Lucene's `FieldDoc`. And another 40 at
least for copying to the values to `FieldDoc`. That totals something
like 152 bytes a piece. That's 145mb. Worth tracking!

 ## Esql Engine TopN

Esql *does* track memory for topn, but it doesn't track the memory used by the min heap itself. It's just a big array of pointers. But it can get very big!
rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Sep 9, 2025
I committed it in elastic#134235 by accident. We were going to use it as part
of that but decided against it.
elasticsearchmachine pushed a commit that referenced this pull request Sep 9, 2025
* ESQL: Reserve memory TopN (#134235)

Tracks the more memory that's involved in topn.

Lucene doesn't track memory usage for TopN and can use a fair bit of it.
Try this query:
```
FROM big_table
| SORT a, b, c, d, e
| LIMIT 1000000
| STATS MAX(a)
```

We attempt to return all million documents from lucene. Is we did this
with the compute engine we're track all of the memory usage. With lucene
we have to reserve it.

In the case of the query above the sort keys weight 8 bytes each. 40
bytes total. Plus another 72 for Lucene's `FieldDoc`. And another 40 at
least for copying to the values to `FieldDoc`. That totals something
like 152 bytes a piece. That's 145mb. Worth tracking!

 ## Esql Engine TopN

Esql *does* track memory for topn, but it doesn't track the memory used by the min heap itself. It's just a big array of pointers. But it can get very big!

* fix backport
sarog pushed a commit to portsbuild/elasticsearch that referenced this pull request Sep 11, 2025
Tracks the more memory that's involved in topn.

Lucene doesn't track memory usage for TopN and can use a fair bit of it.
Try this query:
```
FROM big_table
| SORT a, b, c, d, e
| LIMIT 1000000
| STATS MAX(a)
```

We attempt to return all million documents from lucene. Is we did this
with the compute engine we're track all of the memory usage. With lucene
we have to reserve it.

In the case of the query above the sort keys weight 8 bytes each. 40
bytes total. Plus another 72 for Lucene's `FieldDoc`. And another 40 at
least for copying to the values to `FieldDoc`. That totals something
like 152 bytes a piece. That's 145mb. Worth tracking!

 ## Esql Engine TopN

Esql *does* track memory for topn, but it doesn't track the memory used by the min heap itself. It's just a big array of pointers. But it can get very big!
sarog pushed a commit to portsbuild/elasticsearch that referenced this pull request Sep 19, 2025
Tracks the more memory that's involved in topn.

Lucene doesn't track memory usage for TopN and can use a fair bit of it.
Try this query:
```
FROM big_table
| SORT a, b, c, d, e
| LIMIT 1000000
| STATS MAX(a)
```

We attempt to return all million documents from lucene. Is we did this
with the compute engine we're track all of the memory usage. With lucene
we have to reserve it.

In the case of the query above the sort keys weight 8 bytes each. 40
bytes total. Plus another 72 for Lucene's `FieldDoc`. And another 40 at
least for copying to the values to `FieldDoc`. That totals something
like 152 bytes a piece. That's 145mb. Worth tracking!

 ## Esql Engine TopN

Esql *does* track memory for topn, but it doesn't track the memory used by the min heap itself. It's just a big array of pointers. But it can get very big!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.7 v8.19.4 v9.1.4 v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants