Limit size of shardDeleteResults #133558
Conversation
Modifies `BlobStoreRepository.ShardBlobsToDelete.shardDeleteResults` to have a variable size depending on the remaining heap space, rather than the hard-coded 2GB size that caused smaller nodes with less heap space to hit an `OutOfMemoryError`.

Relates to #131822
Closes #116379
Closes ES-12540
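A minimal, self-contained sketch of the sizing idea (the 25% heap fraction matches the setting default discussed later in this thread, the 2GB constant is the previous hard-coded ceiling, and all names here are illustrative, not the PR's actual code):

```java
// Illustrative sketch only: cap the shardDeleteResults buffer at 25% of the
// available heap, never exceeding the previous hard-coded 2GB limit.
public class ShardDeleteResultsCap {
    private static final long PREVIOUS_HARD_LIMIT_BYTES = 2L * 1024 * 1024 * 1024; // 2GB

    static long computeCap() {
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        return Math.min(maxHeapBytes / 4, PREVIOUS_HARD_LIMIT_BYTES);
    }

    public static void main(String[] args) {
        System.out.println("shardDeleteResults cap: " + computeCap() + " bytes");
    }
}
```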
Modifies `addShardDeleteResult` to only write to `shardDeleteResults` when there is capacity for the write
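A standalone sketch of that behaviour, assuming a drop-once-full policy (the PR's real classes are `BoundedOutputStream`/`TruncatedOutputStream`; this hypothetical version only illustrates the idea):

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical capacity-checked stream: once the byte budget is used up,
// further writes are silently dropped instead of growing the buffer.
class CapacityCheckedOutputStream extends FilterOutputStream {
    private final long maxBytes;
    private long written;

    CapacityCheckedOutputStream(OutputStream out, long maxBytes) {
        super(out);
        this.maxBytes = maxBytes;
    }

    boolean hasCapacity() {
        return written < maxBytes;
    }

    @Override
    public void write(int b) throws IOException {
        if (hasCapacity()) { // drop the write rather than throwing when full
            out.write(b);
            written++;
        }
    }

    public static void main(String[] args) throws IOException {
        var sink = new ByteArrayOutputStream();
        try (var bounded = new CapacityCheckedOutputStream(sink, 4)) {
            bounded.write("hello world".getBytes());
        }
        System.out.println(sink); // prints "hell"; everything past 4 bytes was dropped
    }
}
```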
```java
@TestLogging(reason = "test includes assertions about logging", value = "org.elasticsearch.repositories.blobstore:WARN")
public void testShardBlobsToDeleteWithLimitedHeapSpace() {
    // Limit the heap size so we force it to truncate the stream
    int totalBytesRequired = randomIntBetween(1000, 10000);
```
Arbitrarily chosen values, but tested with both bounds; we are always guaranteed to be writing more data than we can hold.
Hmm could we instead check `org.elasticsearch.repositories.blobstore.BlobStoreRepository.ShardBlobsToDelete#sizeInBytes` and keep going until at least this has reached the limit we chose (but maybe not going much further than that)? I think in most cases this test is going to do enormously more work than needed to verify what we're trying to verify.
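Something like this shape, as a self-contained simulation (`sizeInBytes` stands in for `ShardBlobsToDelete#sizeInBytes`; all names and sizes are illustrative):

```java
import java.util.concurrent.ThreadLocalRandom;

// Simulation of the suggested loop: add random results only until the
// tracked size passes the chosen limit, rather than doing a fixed and
// potentially enormous amount of work up front.
public class WriteUntilLimit {
    public static void main(String[] args) {
        final long limitBytes = 4096;
        long sizeInBytes = 0; // stand-in for ShardBlobsToDelete#sizeInBytes
        int additions = 0;
        while (sizeInBytes <= limitBytes) {
            // each iteration stands in for one addShardDeleteResult call
            sizeInBytes += ThreadLocalRandom.current().nextInt(16, 256);
            additions++;
        }
        System.out.println("reached " + sizeInBytes + " bytes after " + additions + " additions");
    }
}
```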
Thanks Josh, looking good, just a few more comments.
```java
if (compressed == null || shardDeleteResults == null) {
    // No output stream: nothing to return
```
Likewise here I don't think we should change anything with respect to these values being null.
```java
if (resources.isEmpty()) {
    return;
}
```
Likewise here, let's always track these resources even if the limit is zero.
```java
for (int index = between(0, 1000); index > 0; index--) {
    final var indexId = new IndexId(randomIdentifier(), randomUUID());
    for (int shard = between(1, 30); shard > 0; shard--) {
        final var shardId = shard;
        final var shardGeneration = new ShardGeneration(randomUUID());
        expectedShardGenerations.put(indexId, shard, shardGeneration);
        final var blobsToDelete = generateRandomBlobsToDelete(0, 100);
```
Max 1000 indices × max 30 shards × max 100 blobs is a max of 3M items. That seems like a lot. Does this really give us much more coverage than a smaller test?

A much more interesting test here would be to see what happens if we stop writing just shy of the limit, such that the final flush pushes us over. Could we instead pick a lower limit, write until we get very close to the limit (according to `shardBlobsToDelete.sizeInBytes()`) and then verify that we didn't lose anything?
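A self-contained sketch of that boundary case (the inline capacity accounting merely mimics the PR's truncating stream; all names and sizes are illustrative):

```java
import java.io.ByteArrayOutputStream;

// Write until just shy of a byte limit, let one final write straddle it, and
// verify that everything counted as recorded actually survived.
public class FlushBoundaryCheck {
    public static void main(String[] args) {
        final int limitBytes = 100;
        final byte[] entry = new byte[8]; // one fake serialized shard result
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int recorded = 0;
        while (sink.size() + entry.length <= limitBytes - 4) { // stay just shy of the limit
            sink.write(entry, 0, entry.length);
            recorded++;
        }
        sink.write(entry, 0, entry.length); // the final write straddles the limit
        if (sink.size() < limitBytes) { // mimic of hasCapacity(): only count complete writes
            recorded++;
        }
        if ((long) recorded * entry.length > sink.size()) {
            throw new AssertionError("lost data near the limit");
        }
        System.out.println(recorded + " entries recorded in " + sink.size() + " bytes");
    }
}
```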
```java
final var expectedShardGenerations = ShardGenerations.builder().put(indexId, shardId, shardGeneration).build();
```

```java
Settings.Builder settings = Settings.builder().put("repositories.blobstore.max_shard_delete_results_size", "0b");
```
Likewise here I think this can just be folded into `testShardBlobsToDeleteWithLimitedHeapSpace`.
```java
}

return Iterators.flatMap(Iterators.forRange(0, resultCount, i -> {
    List<String> blobPaths = new ArrayList<>();
```
Materializing this entire collection of paths as an `ArrayList<String>` is exactly what we're trying to avoid doing in the first place!
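A standalone illustration of the lazy alternative, with plain `java.util` iterators standing in for the `Iterators.flatMap`/`Iterators.forRange` utilities used in the real code (all names are illustrative):

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Decode one element at a time on demand instead of materializing the whole
// collection of paths into an ArrayList up front.
public class LazyPaths {
    static Iterator<String> lazyPaths(List<byte[]> encodedPaths) {
        return new Iterator<>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < encodedPaths.size();
            }

            @Override
            public String next() {
                if (hasNext() == false) {
                    throw new NoSuchElementException();
                }
                return new String(encodedPaths.get(next++)); // decode lazily
            }
        };
    }

    public static void main(String[] args) {
        Iterator<String> paths = lazyPaths(List.of("index/0/blob-1".getBytes(), "index/0/blob-2".getBytes()));
        while (paths.hasNext()) {
            System.out.println(paths.next());
        }
    }
}
```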
```java
if (maxHeapSizeInBytes > maxShardDeleteResultsSize) {
    return maxShardDeleteResultsSize;
}
return (int) maxHeapSizeInBytes;
```
This is a rather elaborate way to write `Math.min()` :) I'd suggest keeping it as a `long` throughout even though we know it will always be less than `Integer.MAX_VALUE`, but you can try a `Math.toIntExact` if you'd prefer.
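i.e. something along these lines, reusing the names from the snippet above (a sketch, not the final code):

```java
// plain Math.min, kept as a long throughout
static long maxResultsSize(long maxHeapSizeInBytes, long maxShardDeleteResultsSize) {
    return Math.min(maxHeapSizeInBytes, maxShardDeleteResultsSize);
    // or Math.toIntExact(Math.min(...)) if an int is genuinely required
}
```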
```java
ShardBlobsToDelete() {
    // Gets 25% of the heap size to be allocated to the shard_delete_results stream
    public final Setting<ByteSizeValue> MAX_SHARD_DELETE_RESULTS_SIZE_SETTING = Setting.memorySizeSetting(
        "repositories.blobstore.max_shard_delete_results_size",
```
You have to register this setting in `ClusterSettings`.
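For reference, registration means adding the setting to the built-in set in `org.elasticsearch.common.settings.ClusterSettings`, roughly like this (a sketch; the surrounding list in the real class is much longer):

```java
// in ClusterSettings' built-in settings list, alongside the other
// BlobStoreRepository settings:
BlobStoreRepository.MAX_SHARD_DELETE_RESULTS_SIZE_SETTING,
```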
```java
public final Setting<ByteSizeValue> MAX_SHARD_DELETE_RESULTS_SIZE_SETTING = Setting.memorySizeSetting(
    "repositories.blobstore.max_shard_delete_results_size",
    "25%",
    Setting.Property.NodeScope
```
Could we make this a dynamic setting too?
```java
new ShardSnapshotMetaDeleteResult(Objects.requireNonNull(indexId.getId()), shardId, blobsToDelete).writeTo(compressed);
resultCount += 1;
// Only write if we have capacity
if (shardDeleteResults.size() < this.shardDeleteResultsMaxSize) {
```
Could we make this call `TruncatedOutputStream#hasCapacity`? No need for a comment that way, and also it's important that we use the same has-capacity computation as the underlying stream here.
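i.e. the condition above would become something like this (`truncatedShardDeleteResultsOutputStream` is the field that later snippets in this thread show; treat this as a sketch):

```java
// reuse the truncating stream's own capacity computation instead of
// re-deriving it from shardDeleteResults.size()
if (truncatedShardDeleteResultsOutputStream.hasCapacity()) {
    resultCount += 1;
}
```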
```java
new ShardSnapshotMetaDeleteResult(Objects.requireNonNull(indexId.getId()), shardId, blobsToDelete).writeTo(compressed);
// We only want to read this shard delete result if we were able to write the entire object.
// Otherwise, for partial writes, an EOFException will be thrown upon reading
if (shardDeleteResults.size() < this.shardDeleteResultsMaxSize) {
```
Likewise here, this should call `TruncatedOutputStream#hasCapacity`.
```java
logger.warn(
    "Failure to clean up the following dangling blobs, {}, for index {} and shard {}",
    blobsToDelete,
    indexId,
    shardId
);
```
We can't reasonably log every skipped blob at `WARN` like this - we've already captured several (compressed) GiB of blob names before getting to this point, so it wouldn't be surprising if there were several GiB more. We wouldn't expect users to go through these logs and delete the blobs manually - indeed we would strongly discourage that kind of behaviour.

Instead, let's log this at `DEBUG` and keep count of the number of blobs we skipped. Then at the end we can log at `WARN` how many blobs we've leaked.

Also, nit: it's not really a "failure", we're deliberately skipping this work because of resource constraints. We should mention in the user-facing `WARN` message that these dangling blobs will be cleaned up by subsequent deletes, and perhaps suggest that the master node needs a larger heap size to perform such large snapshot deletes in future.
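A sketch of the suggested shape, combining the per-shard `DEBUG` message, the leak counter, and a single end-of-deletion `WARN` summary (field names and message wording are illustrative):

```java
if (truncatedShardDeleteResultsOutputStream.hasCapacity()) {
    successfullyRecordedShards += 1;
} else {
    // deliberate skip, not a failure: record the details at DEBUG only
    leakedBlobsCount += blobsToDelete.size();
    logger.debug("could not record dangling blobs {} for index {} and shard {}", blobsToDelete, indexId, shardId);
}

// ... and once the whole deletion has been processed:
if (leakedBlobsCount > 0) {
    logger.warn(
        "could not record {} dangling blobs for deletion due to the size limit on shard delete results; "
            + "a subsequent snapshot delete will clean them up, but the master node may need a larger heap "
            + "to perform deletes of this size",
        leakedBlobsCount
    );
}
```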
```java
if (shardDeleteResults.size() < this.shardDeleteResultsMaxSize) {
    resultCount += 1;
}
```
This `if` also needs an `else` to keep track of the blobs that leaked because we ran out of capacity during the write.
```java
// We only want to read this shard delete result if we were able to write the entire object.
// Otherwise, for partial writes, an EOFException will be thrown upon reading
if (this.truncatedShardDeleteResultsOutputStream.hasCapacity()) {
    successfullyWrittenBlobsCount += 1;
```
This replaces `resultCount`, but it's the count of the successfully recorded shards, not blobs.
```java
if (this.truncatedShardDeleteResultsOutputStream.hasCapacity()) {
    successfullyWrittenBlobsCount += 1;
} else {
    leakedBlobsCount += 1;
```
Likewise this is recording the number of shards with leaked blobs rather than the number of leaked blobs. However, rather than just renaming the variable I think we should actually count the number of leaked blobs (i.e. `+= blobsToDelete.size()` here).
```java
logger.debug(
    "Unable to clean up the following dangling blobs, {}, for index {} and shard {} "
```
This also applies to the other branch that increases `leakedBlobsCount`.
```java
public static final Setting<ByteSizeValue> MAX_HEAP_SIZE_FOR_SNAPSHOT_DELETION_SETTING = Setting.memorySizeSetting(
    "repositories.blobstore.max_shard_delete_results_size",
    "25%",
    Setting.Property.Dynamic,
```
Yet more trappiness: you now need to register a listener for updates to this setting (e.g. call `clusterSettings.initializeAndWatch(...)`). You can get a `clusterSettings` from `clusterService.getClusterSettings()`. I think I'd be inclined to do that in the `BlobStoreRepository` constructor rather than doing it each time we create a `SnapshotsDeletion`.
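Putting those pointers together, the wiring might look roughly like this in the `BlobStoreRepository` constructor (a sketch; the field name and lambda body are assumptions):

```java
// initializeAndWatch sets the initial value and re-runs the consumer on
// every dynamic update of the setting
clusterService.getClusterSettings()
    .initializeAndWatch(
        MAX_HEAP_SIZE_FOR_SNAPSHOT_DELETION_SETTING,
        value -> this.maxShardDeleteResultsSize = value // hypothetical volatile field
    );
```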