
Conversation

@joshua-adams-1 (Contributor) commented Aug 26, 2025

Modifies `BlobStoreRepository.ShardBlobsToDelete.shardDeleteResults` to have a variable size depending on the remaining heap space, rather than a hard-coded 2GiB size which caused smaller nodes with less heap space to hit an OutOfMemoryError.

Relates to #131822
Closes #116379

Closes ES-12540

@joshua-adams-1 marked this pull request as ready for review September 4, 2025 15:32
@joshua-adams-1 requested a review from a team as a code owner September 4, 2025 15:32
@elasticsearchmachine added the needs:triage label Sep 4, 2025
@joshua-adams-1 self-assigned this Sep 4, 2025
@joshua-adams-1 added the :Distributed Coordination/Distributed label Sep 4, 2025
@elasticsearchmachine added the Team:Distributed Coordination label Sep 4, 2025
```java
@TestLogging(reason = "test includes assertions about logging", value = "org.elasticsearch.repositories.blobstore:WARN")
public void testShardBlobsToDeleteWithLimitedHeapSpace() {
    // Limit the heap size so we force it to truncate the stream
    int totalBytesRequired = randomIntBetween(1000, 10000);
```
@joshua-adams-1 (Author):

Arbitrarily chosen values, but tested with both bounds, and we are always guaranteed to be writing more data than we can hold

Contributor:

Hmm, could we instead check `org.elasticsearch.repositories.blobstore.BlobStoreRepository.ShardBlobsToDelete#sizeInBytes` and keep going until at least this has reached the limit we chose (but maybe not going much further than that)? I think in most cases this test is going to do enormously more work than needed to verify what we're trying to verify.
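A minimal sketch of that loop, with illustrative names (`shardBlobsToDelete`, `sizeLimitInBytes`, and the `addRandomResult` helper are assumptions for illustration, not the PR's actual code):

```java
// Keep adding shard delete results only until the tracked size reaches the
// chosen limit, instead of iterating a fixed (and potentially huge) count.
while (shardBlobsToDelete.sizeInBytes() < sizeLimitInBytes) {
    addRandomResult(shardBlobsToDelete); // hypothetical helper: adds one random shard delete result
}
```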

@DaveCTurner (Contributor) left a comment:

Thanks Josh, looking good, just a few more comments.

Comment on lines 1773 to 1774:

```java
if (compressed == null || shardDeleteResults == null) {
    // No output stream: nothing to return
```
Contributor:

Likewise here I don't think we should change anything with respect to these values being null.

Comment on lines 1819 to 1821:

```java
if (resources.isEmpty()) {
    return;
}
```
Contributor:

Likewise here, let's always track these resources even if the limit is zero.

Comment on lines 779 to 785:

```java
for (int index = between(0, 1000); index > 0; index--) {
    final var indexId = new IndexId(randomIdentifier(), randomUUID());
    for (int shard = between(1, 30); shard > 0; shard--) {
        final var shardId = shard;
        final var shardGeneration = new ShardGeneration(randomUUID());
        expectedShardGenerations.put(indexId, shard, shardGeneration);
        final var blobsToDelete = generateRandomBlobsToDelete(0, 100);
```
Contributor:

Max 1000 indices × max 30 shards × max 100 blobs is a max of 3M items. That seems like a lot. Does this really give us much more coverage than a smaller test?

A much more interesting test here would be to see what happens if we stop writing just shy of the limit, such that the final flush pushes us over. Could we instead pick a lower limit, write until we get very close to the limit (according to `shardBlobsToDelete.sizeInBytes()`) and then verify that we didn't lose anything?
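A hedged sketch of that boundary case, again with illustrative names: stop writing just shy of the limit, then check that the final flush, which may push the stream over the limit, loses nothing that was already written:

```java
// Write until we are close to, but still under, the limit...
while (shardBlobsToDelete.sizeInBytes() < sizeLimitInBytes - flushMargin) {
    addRandomResult(shardBlobsToDelete); // hypothetical helper, as above
}
// ...then complete the stream and assert that every result written so far can
// still be read back, even though the final flush may exceed the limit.
assertAllWrittenResultsReadable(shardBlobsToDelete); // hypothetical assertion
```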


```java
final var expectedShardGenerations = ShardGenerations.builder().put(indexId, shardId, shardGeneration).build();

Settings.Builder settings = Settings.builder().put("repositories.blobstore.max_shard_delete_results_size", "0b");
```
Contributor:

Likewise here I think this can just be folded into `testShardBlobsToDeleteWithLimitedHeapSpace`.

```java
}

return Iterators.flatMap(Iterators.forRange(0, resultCount, i -> {
    List<String> blobPaths = new ArrayList<>();
```
Contributor:

Materializing this entire collection of paths as an `ArrayList<String>` is exactly what we're trying to avoid doing in the first place!
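For contrast, a self-contained JDK sketch of producing paths lazily rather than materializing them first (the path contents are placeholders, not the PR's data):

```java
import java.util.Iterator;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class LazyBlobPaths {
    // Returns an iterator that generates each result's blob paths on demand,
    // instead of collecting them all into an ArrayList up front.
    static Iterator<String> blobPaths(int resultCount) {
        return IntStream.range(0, resultCount)
            .boxed()
            .flatMap(i -> Stream.of("shard-" + i + "/blob-a", "shard-" + i + "/blob-b"))
            .iterator();
    }
}
```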

Comment on lines 1732 to 1735:

```java
if (maxHeapSizeInBytes > maxShardDeleteResultsSize) {
    return maxShardDeleteResultsSize;
}
return (int) maxHeapSizeInBytes;
```
Contributor:

This is a rather elaborate way to write `Math.min()` :) I'd suggest keeping it as a `long` throughout even though we know it will always be less than `Integer.MAX_VALUE`, but you can try a `Math.toIntExact` if you'd prefer.
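A minimal self-contained illustration of that suggestion: clamp with `Math.min`, stay in `long`, and convert only at the boundary where an `int` is genuinely required:

```java
public class ClampExample {
    // Clamp the delete-results buffer size to the configured maximum, staying in long.
    static long clampedSize(long maxHeapSizeInBytes, long maxShardDeleteResultsSize) {
        return Math.min(maxHeapSizeInBytes, maxShardDeleteResultsSize);
    }

    public static void main(String[] args) {
        long limit = clampedSize(512L << 20, 2L << 30); // 512MiB heap budget vs 2GiB cap
        int asInt = Math.toIntExact(limit); // throws ArithmeticException if it ever exceeds Integer.MAX_VALUE
        System.out.println(asInt); // 536870912
    }
}
```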

```java
ShardBlobsToDelete() {
    // Gets 25% of the heap size to be allocated to the shard_delete_results stream
    public final Setting<ByteSizeValue> MAX_SHARD_DELETE_RESULTS_SIZE_SETTING = Setting.memorySizeSetting(
        "repositories.blobstore.max_shard_delete_results_size",
```
Contributor:

You have to register this setting in `ClusterSettings`.
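For context, a hedged sketch of what that registration typically looks like, assuming the setting is made `static` so it can be referenced from `ClusterSettings` (the exact collection name may differ by version):

```java
// In org.elasticsearch.common.settings.ClusterSettings, add the new setting to
// the built-in set so nodes recognize it at startup:
BlobStoreRepository.MAX_SHARD_DELETE_RESULTS_SIZE_SETTING,
```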

```java
public final Setting<ByteSizeValue> MAX_SHARD_DELETE_RESULTS_SIZE_SETTING = Setting.memorySizeSetting(
    "repositories.blobstore.max_shard_delete_results_size",
    "25%",
    Setting.Property.NodeScope
```
Contributor:

Could we make this a dynamic setting too?

```java
new ShardSnapshotMetaDeleteResult(Objects.requireNonNull(indexId.getId()), shardId, blobsToDelete).writeTo(compressed);
resultCount += 1;
// Only write if we have capacity
if (shardDeleteResults.size() < this.shardDeleteResultsMaxSize) {
```
Contributor:

Could we make this call `TruncatedOutputStream#hasCapacity`? No need for a comment that way, and also it's important that we use the same has-capacity computation as the underlying stream here.
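For illustration, a self-contained sketch of a truncating stream whose writers share one `hasCapacity()` check; the name and behavior here are assumptions based on the discussion, not the PR's actual `TruncatedOutputStream`:

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class TruncatingOutputStream extends FilterOutputStream {
    private final long maxSize;
    private long bytesWritten;

    TruncatingOutputStream(OutputStream out, long maxSize) {
        super(out);
        this.maxSize = maxSize;
    }

    // Single source of truth for the has-capacity check, so callers and the
    // stream itself can never disagree about whether writes are still accepted.
    boolean hasCapacity() {
        return bytesWritten < maxSize;
    }

    @Override
    public void write(int b) throws IOException {
        if (hasCapacity()) {
            out.write(b);
            bytesWritten++;
        } // otherwise silently drop: the stream is truncated at the limit
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (hasCapacity()) {
            out.write(b, off, len); // may overshoot slightly; callers use hasCapacity() to detect partial writes
            bytesWritten += len;
        }
    }
}
```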

```java
new ShardSnapshotMetaDeleteResult(Objects.requireNonNull(indexId.getId()), shardId, blobsToDelete).writeTo(compressed);
// We only want to read this shard delete result if we were able to write the entire object.
// Otherwise, for partial writes, an EOFException will be thrown upon reading
if (shardDeleteResults.size() < this.shardDeleteResultsMaxSize) {
```
Contributor:

Likewise here, this should call `TruncatedOutputStream#hasCapacity`.

Comment on lines 1755 to 1760:

```java
logger.warn(
    "Failure to clean up the following dangling blobs, {}, for index {} and shard {}",
    blobsToDelete,
    indexId,
    shardId
);
```
Contributor:

We can't reasonably log every skipped blob at WARN like this - we've already captured several (compressed) GiB of blob names before getting to this point, so it wouldn't be surprising if there were several GiB more. We wouldn't expect users to go through these logs and delete the blobs manually - indeed we would strongly discourage that kind of behaviour.

Instead, let's log this at DEBUG and keep count of the number of blobs we skipped. Then at the end we can log at WARN how many blobs we've leaked.

Also, nit: it's not really a "failure"; we're deliberately skipping this work because of resource constraints. We should mention in the user-facing WARN message that these dangling blobs will be cleaned up by subsequent deletes, and perhaps suggest that the master node needs a larger heap size to perform such large snapshot deletes in future.
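A hedged sketch of that shape, with illustrative names (Elasticsearch logs via Log4j, so the sketch uses it too):

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import java.util.List;

class LeakedBlobAccounting {
    private static final Logger logger = LogManager.getLogger(LeakedBlobAccounting.class);
    private long leakedBlobsCount;

    // Per-shard skip: DEBUG only, since there may be a huge number of these.
    void recordSkippedShard(String indexId, int shardId, List<String> blobsToDelete) {
        logger.debug("skipping cleanup of {} dangling blobs for index {} shard {}", blobsToDelete.size(), indexId, shardId);
        leakedBlobsCount += blobsToDelete.size();
    }

    // Single summary at the end: WARN, with guidance rather than a blob listing.
    void logSummary() {
        if (leakedBlobsCount > 0) {
            logger.warn(
                "skipped cleanup of {} dangling blobs due to memory constraints; subsequent snapshot deletes will "
                    + "clean these up, but consider a larger master-node heap for deletes of this size",
                leakedBlobsCount
            );
        }
    }
}
```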

Comment on lines 1751 to 1753:

```java
if (shardDeleteResults.size() < this.shardDeleteResultsMaxSize) {
    resultCount += 1;
}
```
Contributor:

This `if` also needs an `else` to keep track of the blobs that leaked because we ran out of capacity during the write.

```java
// We only want to read this shard delete result if we were able to write the entire object.
// Otherwise, for partial writes, an EOFException will be thrown upon reading
if (this.truncatedShardDeleteResultsOutputStream.hasCapacity()) {
    successfullyWrittenBlobsCount += 1;
```
Contributor:

This replaces `resultCount`, but it's a count of the number of successfully recorded shards, not blobs.

```java
if (this.truncatedShardDeleteResultsOutputStream.hasCapacity()) {
    successfullyWrittenBlobsCount += 1;
} else {
    leakedBlobsCount += 1;
```
Contributor:

Likewise, this is recording the number of shards with leaked blobs rather than the number of leaked blobs. However, rather than just renaming the variable, I think we should actually count the number of leaked blobs (i.e. `+= blobsToDelete.size()` here).

Comment on lines +1753 to +1754:

```java
logger.debug(
    "Unable to clean up the following dangling blobs, {}, for index {} and shard {} "
```
Contributor:

This also applies to the other branch that increases `leakedBlobsCount`.

```java
public static final Setting<ByteSizeValue> MAX_HEAP_SIZE_FOR_SNAPSHOT_DELETION_SETTING = Setting.memorySizeSetting(
    "repositories.blobstore.max_shard_delete_results_size",
    "25%",
    Setting.Property.Dynamic,
```
Contributor:

Yet more trappiness: you now need to register a listener for updates to this setting (e.g. call `clusterSettings.initializeAndWatch(...)`). You can get a `clusterSettings` from `clusterService.getClusterSettings()`. I think I'd be inclined to do that in the `BlobStoreRepository` constructor rather than doing it each time we create a `SnapshotsDeletion`.
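A brief sketch of that wiring, per the suggestion (the placement and the field being updated are assumptions):

```java
// In the BlobStoreRepository constructor: initialize the field from the current
// setting value and update it whenever the dynamic setting changes.
clusterService.getClusterSettings()
    .initializeAndWatch(MAX_HEAP_SIZE_FOR_SNAPSHOT_DELETION_SETTING, value -> this.maxShardDeleteResultsSize = value);
```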

Labels
:Distributed Coordination/Distributed, >non-issue, Team:Distributed Coordination, v9.3.0
Development

Successfully merging this pull request may close these issues:

Snapshot delete tasks do not complete if blobs-to-delete list exceeds 2GiB
3 participants