Restricts snapshot concurrency based on available heap memory #136952
base: main
Conversation
Limits the concurrency of smaller nodes when loading IndexMetadata objects into heap, to prevent nodes with small heaps from going OOME.
Closes elastic#131822
Closes: ES-12538
Hi @joshua-adams-1, I've created a changelog YAML for you.
I am extending this test to ensure that, depending on the available heap memory, we throttle the number of concurrent snapshot threads accordingly
 * Tests whether we adjust the maximum concurrency when deleting snapshots
 * according to the size of the heap memory
 */
public void testMaxIndexDeletionConcurrency() {
This test is different from the previous one. It is just a basic unit test ensuring that the function from heap memory to snapshot threads behaves as expected, but it doesn't test whether all the threads are properly utilised.
);
private volatile int maxHeapSizeForSnapshotDeletion;

public static final Setting<ByteSizeValue> HEAP_SIZE_SETTING = Setting.memorySizeSetting(
If there is an easier way to get the total heap memory then please let me know. I was following a similar approach to the MAX_HEAP_SIZE_FOR_SNAPSHOT_DELETION_SETTING setting above.
My concern is that, to modify HEAP_SIZE_SETTING inside the tests, I had to use the Setting.Property.Dynamic property, but I don't want other users/code changing this percentage to something stupidly small, at which point snapshotting takes too long and times out.
You can get the heap memory with Runtime.getRuntime().maxMemory() or JvmInfo.jvmInfo().getMem().getHeapMax(). Is that what you need?
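For illustration, both lookups in a minimal fragment (the variable names are mine):

// Plain JDK: maximum heap size in bytes
long maxHeapBytes = Runtime.getRuntime().maxMemory();
// Elasticsearch wrapper: the same value as a ByteSizeValue
ByteSizeValue maxHeap = JvmInfo.jvmInfo().getMem().getHeapMax();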
I did consider this, but I thought it too static, and it also conflicted with the approach I took in #133558.
My thought process is that by using the existing memorySizeSetting here to return 10% of the available heap memory for use when loading IndexMetadata objects, the actual percentage can be updated dynamically without requiring a code change, which is nice, and the ByteSizeValue handles all the maths in the background. I still have a concern about this setting being abused by someone setting the value really low (say 1%), which would force us to use only a single snapshot thread, but that concern isn't specific to the MAX_HEAP_SIZE_FOR_INDEX_METADATA_SETTING setting; it applies to any dynamically updatable setting.
My concern in:
"If there is an easier way to get the total heap memory then please let me know"
was because I was trying to get 100% of the heap via a setting, which seemed long-winded, but after this comment by David it seems the best approach. Do you agree?
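For reference, the kind of setting being described would look roughly like this; a sketch only, where the setting key and properties are illustrative rather than the final names used in the PR:

public static final Setting<ByteSizeValue> MAX_HEAP_SIZE_FOR_INDEX_METADATA_SETTING = Setting.memorySizeSetting(
    "repositories.blobstore.max_heap_for_index_metadata", // hypothetical setting key
    "10%",                                                 // default: 10% of the configured heap, returned as a ByteSizeValue
    Setting.Property.Dynamic,
    Setting.Property.NodeScope
);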
I think it's ok. I mean, there are two made-up fiddle factors in play here: the "max 10% of heap" is a guess, as is the "max 50MiB of heap per IMD instance", but then the concurrency number is just one divided by the other. I'm not sure that's the most user-friendly interface really, but my only other idea is to control the actual concurrency with a setting whose default is a function of JvmInfo.jvmInfo().getMem().getHeapMax(). One super-opaque setting or two slightly-less-opaque settings...
Prior to loading the IndexMetadata object into heap memory, can we check 1) how big the object we will be loading is, and 2) whether we have enough space to load it? Then, if we don't have space, we block until we do. It's not ideal, but better than OOMing?
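A rough sketch of that idea, assuming the blob length is known up front and used as a crude proxy for the deserialized size (all names here are hypothetical, not part of this PR):

import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical: reserve heap budget (in MiB) before parsing an IndexMetadata blob,
// blocking until enough of the budget is free rather than risking an OOME.
class IndexMetadataHeapBudget {
    private final Semaphore permitsInMiB;

    IndexMetadataHeapBudget(long budgetInBytes) {
        this.permitsInMiB = new Semaphore(Math.toIntExact(budgetInBytes >> 20));
    }

    <T> T withReservedHeap(long blobLengthInBytes, Supplier<T> parse) throws InterruptedException {
        int permits = Math.toIntExact((blobLengthInBytes >> 20) + 1); // round up to whole MiB
        permitsInMiB.acquire(permits); // block until enough budget is free
        try {
            return parse.get();
        } finally {
            permitsInMiB.release(permits);
        }
    }
}

The main caveat is that the serialized blob length can be much smaller than the object's heap footprint once deserialized.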
Yet another possibility would be to introduce a variant of org.elasticsearch.cluster.metadata.IndexMetadata.Builder#fromXContent() which skips over any mappings. It's the mappings that take up most of the heap space in practice, but we simply don't need them here.
XContentParser can be configured to filter out keys. For snapshots, we control the XContentParser creation in ChecksumBlobStoreFormat, where we can configure it to skip the mappings key. I tested it briefly and it seems to be possible with the following change.
--- a/server/src/main/java/org/elasticsearch/repositories/blobstore/ChecksumBlobStoreFormat.java
+++ b/server/src/main/java/org/elasticsearch/repositories/blobstore/ChecksumBlobStoreFormat.java
@@ -150,7 +151,8 @@ public final class ChecksumBlobStoreFormat<T> {
try (
XContentParser parser = XContentHelper.createParserNotCompressed(
XContentParserConfiguration.EMPTY.withRegistry(namedXContentRegistry)
- .withDeprecationHandler(LoggingDeprecationHandler.INSTANCE),
+ .withDeprecationHandler(LoggingDeprecationHandler.INSTANCE)
+ .withFiltering(null, Set.of("*.mappings"), false),
bytesReference,
XContentType.SMILE
)
@@ -161,7 +163,8 @@ public final class ChecksumBlobStoreFormat<T> {
try (
XContentParser parser = XContentHelper.createParserNotCompressed(
XContentParserConfiguration.EMPTY.withRegistry(namedXContentRegistry)
- .withDeprecationHandler(LoggingDeprecationHandler.INSTANCE),
+ .withDeprecationHandler(LoggingDeprecationHandler.INSTANCE)
+ .withFiltering(null, Set.of("*.mappings"), false),
bytesReference,
XContentType.SMILE
)
If something like this is feasible, I'd think we don't need the memory limit, which feels mostly like a guess.
Interesting, TIL, I didn't know there was a filter-on-parse option.
Can we just use the includeStrings parameter? Really we only need settings.index.number_of_shards, though I could imagine we might have to pull in a few more things to satisfy various invariants on IndexMetadata.
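If the include form mirrors the exclude form used in the diff above, it might look roughly like this; the exact include paths depend on how the blob is laid out and would need verifying, and whether IndexMetadata still parses with only these fields retained is exactly the open question here:

XContentParser parser = XContentHelper.createParserNotCompressed(
    XContentParserConfiguration.EMPTY.withRegistry(namedXContentRegistry)
        .withDeprecationHandler(LoggingDeprecationHandler.INSTANCE)
        // keep only the settings object, dropping mappings and everything else
        .withFiltering(Set.of("*.settings"), null, false),
    bytesReference,
    XContentType.SMILE
);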
On reflection, maybe we don't even need to load this as a full IndexMetadata. We could define another MetadataStateFormat<...> which defines a different fromXContent that only reads settings.index.number_of_shards and skips everything else.
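As a rough illustration of that idea, not tied to MetadataStateFormat's actual API, a hand-rolled reader could pull out just the shard count and skip every other child; this sketch assumes the parser is positioned just before the index-metadata object, and the real blob nesting may differ:

// Sketch: read only settings.index.number_of_shards, skipping mappings, aliases, etc.
static int readNumberOfShards(XContentParser parser) throws IOException {
    int numberOfShards = -1;
    parser.nextToken(); // move onto the START_OBJECT of the index-metadata body
    while (parser.nextToken() == XContentParser.Token.FIELD_NAME) {
        String fieldName = parser.currentName();
        parser.nextToken(); // advance to the field's value
        if ("settings".equals(fieldName)) {
            numberOfShards = Settings.fromXContent(parser).getAsInt("index.number_of_shards", -1);
        } else {
            parser.skipChildren(); // never materialise mappings or other large children on heap
        }
    }
    return numberOfShards;
}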
I have pushed a WIP PR here if you guys wouldn't mind taking a look. If you agree with the approach, I would close this PR in favour of the above.
this.maxIndexDeletionConcurrency = Math.min(
    // Prevent smaller nodes from loading too many IndexMetadata objects in parallel
    // and going OOMe (ES-12538)
    (int) Math.pow(2, heapSizeInGb),
This was a heuristic, since I didn't know what we wanted to throttle the snapshot threads to.
For context for any reviewer: we currently have up to 10 snapshot threads on any size node, and in this instance a node with a 1GiB heap OOMed because of this. I used a simple Math.pow(2, heapSizeInGb) heuristic, but this does mean that a node must have >4GB heap to use all 10 snapshot threads. I suspect this is far too conservative an estimate, so suggestions are appreciated. Happy to use a simple step function if people think that would suffice, i.e. if heap is less than 2GB, use 4 threads, else use 10.
"we currently have up to 10 snapshot threads on any size node"
We do have fewer threads for smaller nodes, see here. The threshold is currently 750MB. Are we considering this to be too low and wanting to raise it to something like 1GB or more? Or is this change specific to snapshot deletion? For the latter, I'd probably just go with a simple heuristic, like using half of the snapshot threads (minimum 1) if the heap is smaller than 2GB?
This change is specific to index-metadata-loading during snapshot deletion, which is something we do today concurrently on all the available snapshot threads and which we've seen to need unusually large amounts of heap. Some IndexMetadata instances can be 50MiB or more when deserialized. Most of the work that the snapshot threads do has much lower heap footprint than this.
I don't think it makes sense to have the concurrency limit be an exponentially-increasing function of the heap size though, since the resource usage is a linear function of the concurrency limit. I'd suggest we take the observed 50MiB as a reasonable estimate of a big IndexMetadata blob and make sure we don't use more than, say, 10% of heap for this. So if the node has 1GiB of heap then allow 2 threads, 2GiB means 4 threads, etc.
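Concretely, that sizing works out to something like the following, where the 10% budget and the 50MiB-per-blob figure are the two guesses discussed above:

// Derive the deletion concurrency from the two guesses: ~10% of heap, ~50MiB per in-flight IndexMetadata blob.
long heapBytes = JvmInfo.jvmInfo().getMem().getHeapMax().getBytes();
long budgetBytes = heapBytes / 10;                       // "max 10% of heap"
long perTaskBytes = ByteSizeValue.ofMb(50).getBytes();   // "max 50MiB per IndexMetadata instance"
int maxIndexDeletionConcurrency = Math.max(1, Math.toIntExact(budgetBytes / perTaskBytes));
// 1GiB heap -> 2 threads, 2GiB -> 4 threads, 4GiB -> 8 threads (still capped by the SNAPSHOT pool size)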
Thanks both. Based on the suggestion above I have extended BlobStoreRepository with a setting that reads 10% of the available heap memory, in the same way I did for MAX_HEAP_SIZE_FOR_SNAPSHOT_DELETION_SETTING.
…a-adams-1/elasticsearch into blobstore-repo-concurrency
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
I left some clarification questions/comments.