Optionally make S3 writes unconditional #137185
Conversation
As of elastic#133030 Elasticsearch uses S3's support for conditional writes to protect against repository corruption. It is possible that some other storage system may claim to be fully S3-compatible despite rejecting such requests. This commit adds a repository setting that allows making all writes unconditional, unsafely disabling the corruption protection but allowing users some time to work with their storage supplier to address the incompatibility.
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
```java
// REPOSITORY_ANALYSIS is a strict check for 100% S3 compatibility, including conditional write support
return supportsConditionalWrites || purpose == OperationPurpose.REPOSITORY_ANALYSIS;
```
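Read on its own, the guard in this snippet can be illustrated with a small self-contained sketch. The enum and class below are simplified stand-ins for illustration only, not Elasticsearch's actual types:

```java
// Illustrative sketch of the guard discussed above: writes are made
// conditional either when the store declares support for conditional writes,
// or when running repository analysis (which strictly checks full S3
// compatibility). Names are simplified stand-ins, not Elasticsearch's API.
enum OperationPurpose { SNAPSHOT_DATA, REPOSITORY_ANALYSIS }

class ConditionalWritePolicy {
    private final boolean supportsConditionalWrites;

    ConditionalWritePolicy(boolean supportsConditionalWrites) {
        this.supportsConditionalWrites = supportsConditionalWrites;
    }

    boolean shouldUseConditionalWrite(OperationPurpose purpose) {
        // Repository analysis always exercises conditional writes, even when
        // the repository is flagged as not supporting them.
        return supportsConditionalWrites || purpose == OperationPurpose.REPOSITORY_ANALYSIS;
    }
}
```

This makes the behavior questioned below concrete: with `supportsConditionalWrites == false`, ordinary writes skip the condition but `REPOSITORY_ANALYSIS` still sends it, so analysis fails on non-compliant stores.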
NB no tests for the REPOSITORY_ANALYSIS condition here, they'll be added when we resolve #134632.
Does this mean repo analysis will always fail for storage services that do not support conditional writes? I assume that's intentional, even if it means existing compatible ones could start failing? IIUC, this work has something to do with AWS's MPU guarantee. Do we consider a storage to pass the analysis if it supports the MPU guarantees but not conditional writes?
Yes that's right, this is covered in the docs:
> NOTE: Different versions of Elasticsearch may perform different checks for repository compatibility, with newer versions typically being stricter than older ones. A storage system that passes repository analysis with one version of Elasticsearch may fail with a different version. This indicates it behaves incorrectly in ways that the former version did not detect.
This doesn't yet matter for the non-linearizable MPUs, but when we move to using conditional writes for CAS operations we will only do so if the storage supports conditional writes. If the user indicates that their storage doesn't support conditional writes yet, then we'll fall back to today's MPU-based CAS implementation.
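The fallback described here could be sketched roughly as follows. The selector and strategy classes are hypothetical illustrations of the decision, not Elasticsearch's actual implementation:

```java
// Hypothetical sketch of the fallback logic described above: when the user
// flags the repository as incompatible with conditional writes, keep using
// the MPU-based CAS implementation. All names here are illustrative only.
interface CasStrategy {
    String name();
}

class ConditionalWriteCas implements CasStrategy {
    public String name() { return "conditional-write"; }
}

class MultipartUploadCas implements CasStrategy {
    public String name() { return "mpu-based"; }
}

class CasSelector {
    static CasStrategy select(boolean unsafelyIncompatibleWithConditionalWrites) {
        return unsafelyIncompatibleWithConditionalWrites
            ? new MultipartUploadCas()    // today's MPU-based CAS implementation
            : new ConditionalWriteCas();  // preferred once storage support is confirmed
    }
}
```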
```java
static final Setting<Boolean> UNSAFELY_INCOMPATIBLE_WITH_S3_WRITES = Setting.boolSetting(
    "unsafely_incompatible_with_s3_conditional_writes",
```
Do you have other names for the setting in mind? I assume that with conditional ETag writes there would be another parameter, since it's a different feature from the existence check, and an S3-like storage might have only one of those. Maybe `supports_s3_write_existence_check`, and later `supports_s3_write_etag_check`?
These could be prefixed with `unsafe_`.
I wasn't planning on distinguishing conditional-on-object-exists from conditional-on-etag writes. Do you see a need to do that? My thinking is that "conditional writes" covers both cases, and although these things were announced by AWS as separate features they're now documented together.
The only argument I have is that these features came at different times and use different implementations, so S3-like repos might have only one of them.
But the feature parity is blurry, and it's unclear how much effort we should put into these knobs. I think it's the S3-like repo's responsibility to provide its own S3-like client rather than using a not-fully-compatible AWS client.
I think it might be worth naming this setting more precisely. If a storage supports the ETag check but not the other, it is not obvious whether it should set this setting to true or false.
I also wonder whether it is better named more positively, e.g. `supports_s3_conditional_writes`, which seems more straightforward. But maybe you prefer the more negative form to also serve as a warning?
> If a storage supports the ETag check but not the other, it is not obvious
I think it's clear in that case that the storage does not properly support conditional writes so the user needs to disable them until they can upgrade their storage. I don't think it means much to the user to distinguish the different kinds of conditional writes.
> I also wonder whether it is better named more positively
Yeah I'd prefer that but it's also important to get the unsafe bit into the name so that users know they need to work on their storage upgrade ASAP.
ywangd left a comment:
The code changes look fine to me. My question is about how we want to roll this out.
Once it is released, a 3rd-party storage that previously passed the repo analysis check can start failing. I assume customers do not really run repo analysis as a regular check on upgrade? So they may run with the code assuming full compatibility (the default) for a while before it is caught by running repo analysis again. What would be our recommendation at this point? Since a storage server upgrade might not be feasible, do we suggest configuring the setting to `true` and ignoring the rather severe warning? It also brings up the question of whether their existing data is already corrupted because the system has been running with a false assumption.
Generally that's what we'd recommend. See e.g. these docs which say to verify snapshot repositories with repo analysis as part of the test-cluster upgrade.
If after an upgrade they don't notice that their storage doesn't support conditional writes then it must be ignoring the `If-None-Match` header we added in #133030, in which case it doesn't really matter if we continue to send it or not, no action is needed.
ywangd left a comment:
Overall I am quite satisfied since running repository analysis is in fact a suggested step for upgrade.
> If after an upgrade they don't notice that their storage doesn't support conditional writes then it must be ignoring the `If-None-Match` header we added in #133030, in which case it doesn't really matter if we continue to send it or not, no action is needed.
My question is: if they run repo analysis after upgrade and encounter the failure, what would be our suggestion? I see 3 options:
- Ignore the repo analysis failure
- Set `unsafely_incompatible_with_s3_conditional_writes` to `true`, which sounds rather uncomfortable on paper due to the name and associated warning
- Ask their storage provider for feature parity

I guess in practice we should take option 2. In that case, I wonder whether the wording should be a bit softer, since "may lead to repository corruption" sounds quite concerning. Or maybe my understanding is off and repo analysis does not fail if the storage server ignores the header?
We'd generally expect users to do both option 2 (short-term workaround) and option 3 (long-term fix).
It's true tho: e.g. if you accidentally set up two clusters to write to the repository at the same time, without overwrite protection they might both write the same
Actually this is currently true, we do not verify that
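The overwrite hazard discussed in this thread can be illustrated with a toy in-memory model of `If-None-Match: *` semantics. This is a deliberately simplified sketch, not real S3 behavior; the class and method names are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory blob store modelling the If-None-Match: * semantics discussed
// above. A compliant store rejects an overwrite of an existing key (as S3 does
// with HTTP 412 Precondition Failed); a non-compliant store that ignores the
// header silently overwrites, losing the corruption protection.
class ToyBlobStore {
    private final Map<String, String> blobs = new HashMap<>();
    private final boolean honorsConditionalWrites;

    ToyBlobStore(boolean honorsConditionalWrites) {
        this.honorsConditionalWrites = honorsConditionalWrites;
    }

    /** PUT with If-None-Match: *; returns false when the condition fails. */
    boolean putIfNoneMatch(String key, String value) {
        if (honorsConditionalWrites && blobs.containsKey(key)) {
            return false; // overwrite prevented, caller sees the conflict
        }
        blobs.put(key, value); // a non-compliant store silently overwrites
        return true;
    }

    String get(String key) {
        return blobs.get(key);
    }
}
```

With two "clusters" writing the same key, the compliant store surfaces the conflict to the second writer, while the non-compliant one lets the second write clobber the first without any error.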
ywangd left a comment:
LGTM
Thanks for explaining.
> if you accidentally set up two clusters to write to the repository at the same time

We don't support such a setup, right? The conditional write will help us better protect against it, but having such a setup at all is already on the disaster path.
That's correct, this isn't expected to work, but it's an easy mistake to make, it seems, and we'd much rather have it fail with a benign error message before it harms the repository contents :)
In elastic#137185 we introduced a new S3 repository setting called `unsafely_incompatible_with_s3_conditional_writes` but named the code symbol `UNSAFELY_INCOMPATIBLE_WITH_S3_WRITES`, omitting the "conditional" bit. That makes the dependent code confusing to read, so this commit fixes the symbol name.
As of elastic#133030 Elasticsearch uses S3's support for conditional writes to protect against repository corruption. It is possible that some other storage system may claim to be fully S3-compatible despite rejecting such requests. This commit adds a repository setting that allows making all writes unconditional, unsafely disabling the corruption protection but allowing users some time to work with their storage supplier to address the incompatibility. Backport of elastic#137185 and elastic#138406 to 9.2