Skip to content

Conversation

@nielsbauman
Copy link
Contributor

By keeping a list of all the rollover results in a rollover request batch, we were keeping references to all the intermediate cluster states that we built. We've seen this list take up ~1.4GB with 600 rollover requests in one batch.

We only kept the list of results to compute the "reason" for the allocation reroute, so we can easily drop the cluster state reference from the list and only keep what we need.

Fixes #123893

By keeping a list of all the rollover results in a rollover request
batch, we were keeping references to all the intermediate cluster states
that we built. We've seen this list take up ~1.4GB with 600 rollover
requests in one batch.

We only kept the list of results to compute the "reason" for the
allocation reroute, so we can easily drop the cluster state reference
from the list and only keep what we need.

Fixes elastic#123893
@nielsbauman nielsbauman added >bug :Data Management/Indices APIs APIs to create and manage indices and templates Team:Data Management Meta label for data/management team auto-backport Automatically create backport pull requests when merged v8.18.1 v8.19.0 v9.0.1 v9.1.0 labels Mar 5, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine
Copy link
Collaborator

Hi @nielsbauman, I've created a changelog YAML for you.

Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for fixing this Niels!

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could come up with a variant of Strings.collectionToDelimitedStringWithLimit that builds the string incrementally. There's no need to keep all the result strings here when we're only keeping the first 1000 characters.

@nielsbauman
Copy link
Contributor Author

I wonder if we could come up with a variant of Strings.collectionToDelimitedStringWithLimit that builds the string incrementally. There's no need to keep all the result strings here when we're only keeping the first 1000 characters.

I'm sure there is something we can come up with. I don't think it'd be necessary for this PR/bug as the cost is pretty limited now. But ideally, it indeed wouldn't hurt to have something like that.

@nielsbauman nielsbauman merged commit ff6465b into elastic:main Mar 6, 2025
17 checks passed
@nielsbauman nielsbauman deleted the fix-rollover-oome branch March 6, 2025 17:34
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.18 Commit could not be cherrypicked due to conflicts
8.x Commit could not be cherrypicked due to conflicts
9.0

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 124107

nielsbauman added a commit to nielsbauman/elasticsearch that referenced this pull request Mar 6, 2025
By keeping a list of all the rollover results in a rollover request
batch, we were keeping references to all the intermediate cluster states
that we built. We've seen this list take up ~1.4GB with 600 rollover
requests in one batch.

We only kept the list of results to compute the "reason" for the
allocation reroute, so we can easily drop the cluster state reference
from the list and only keep what we need.

Fixes elastic#123893
elasticsearchmachine pushed a commit that referenced this pull request Mar 6, 2025
…24257)

By keeping a list of all the rollover results in a rollover request
batch, we were keeping references to all the intermediate cluster states
that we built. We've seen this list take up ~1.4GB with 600 rollover
requests in one batch.

We only kept the list of results to compute the "reason" for the
allocation reroute, so we can easily drop the cluster state reference
from the list and only keep what we need.

Fixes #123893
@nielsbauman
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
8.x
8.18
8.17

Questions ?

Please refer to the Backport tool documentation

nielsbauman added a commit to nielsbauman/elasticsearch that referenced this pull request Mar 6, 2025
By keeping a list of all the rollover results in a rollover request
batch, we were keeping references to all the intermediate cluster states
that we built. We've seen this list take up ~1.4GB with 600 rollover
requests in one batch.

We only kept the list of results to compute the "reason" for the
allocation reroute, so we can easily drop the cluster state reference
from the list and only keep what we need.

Fixes elastic#123893

(cherry picked from commit ff6465b)

# Conflicts:
#	server/src/main/java/org/elasticsearch/action/admin/indices/rollover/LazyRolloverAction.java
nielsbauman added a commit to nielsbauman/elasticsearch that referenced this pull request Mar 6, 2025
By keeping a list of all the rollover results in a rollover request
batch, we were keeping references to all the intermediate cluster states
that we built. We've seen this list take up ~1.4GB with 600 rollover
requests in one batch.

We only kept the list of results to compute the "reason" for the
allocation reroute, so we can easily drop the cluster state reference
from the list and only keep what we need.

Fixes elastic#123893

(cherry picked from commit ff6465b)

# Conflicts:
#	server/src/main/java/org/elasticsearch/action/admin/indices/rollover/LazyRolloverAction.java
elasticsearchmachine pushed a commit that referenced this pull request Mar 6, 2025
… (#124265)

# Backport

This will backport the following commits from `main` to `8.x`:  - [Avoid
hoarding cluster state references during rollover
(#124107)](#124107)

<!--- Backport version: 9.6.4 -->

### Questions ? Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)
elasticsearchmachine pushed a commit that referenced this pull request Mar 6, 2025
…) (#124266)

# Backport

This will backport the following commits from `main` to `8.18`:  -
[Avoid hoarding cluster state references during rollover
(#124107)](#124107)

<!--- Backport version: 9.6.4 -->

### Questions ? Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)
elasticsearchmachine pushed a commit that referenced this pull request Mar 6, 2025
…) (#124267)

# Backport

This will backport the following commits from `main` to `8.17`:  -
[Avoid hoarding cluster state references during rollover
(#124107)](#124107)

<!--- Backport version: 9.6.4 -->

### Questions ? Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Mar 11, 2025
By keeping a list of all the rollover results in a rollover request
batch, we were keeping references to all the intermediate cluster states
that we built. We've seen this list take up ~1.4GB with 600 rollover
requests in one batch.

We only kept the list of results to compute the "reason" for the
allocation reroute, so we can easily drop the cluster state reference
from the list and only keep what we need.

Fixes elastic#123893
costin pushed a commit to costin/elasticsearch that referenced this pull request Mar 15, 2025
By keeping a list of all the rollover results in a rollover request
batch, we were keeping references to all the intermediate cluster states
that we built. We've seen this list take up ~1.4GB with 600 rollover
requests in one batch.

We only kept the list of results to compute the "reason" for the
allocation reroute, so we can easily drop the cluster state reference
from the list and only keep what we need.

Fixes elastic#123893
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >bug :Data Management/Indices APIs APIs to create and manage indices and templates Team:Data Management Meta label for data/management team v8.17.4 v8.18.1 v8.19.0 v9.0.1 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Batched rollover requests hoard heap space

4 participants