
Conversation

@DaveCTurner
Contributor

As of #133030 Elasticsearch uses S3's support for conditional writes to
protect against repository corruption. It is possible that some other
storage system may claim to be fully S3-compatible despite rejecting
such requests.

This commit adds a repository setting that allows making all writes
unconditional, unsafely disabling the corruption protection but
allowing users some time to work with their storage supplier to address
the incompatibility.
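For illustration, a sketch of how the escape hatch might be applied when registering an S3 repository. The repository and bucket names are placeholders, and supplying the setting in the `settings` block of the registration request is an assumption based on how other S3 repository settings work:

```
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-bucket",
    "unsafely_incompatible_with_s3_conditional_writes": true
  }
}
```

With the flag set to `true`, Elasticsearch would stop sending conditional-write preconditions to the storage system, trading away the corruption protection until the storage supplier fixes the incompatibility.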

@DaveCTurner DaveCTurner requested a review from ywangd October 27, 2025 11:20
@DaveCTurner DaveCTurner added >non-issue :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v9.3.0 labels Oct 27, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Oct 27, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Comment on lines +636 to +637
// REPOSITORY_ANALYSIS is a strict check for 100% S3 compatibility, including conditional write support
return supportsConditionalWrites || purpose == OperationPurpose.REPOSITORY_ANALYSIS;
Contributor Author

NB no tests for the REPOSITORY_ANALYSIS condition here; they'll be added when we resolve #134632.

Member

Does this mean repo analysis will always fail for storage services that do not support conditional writes? I assume that's intentional even if this means existing compatible ones could start failing? IIUC, this work has something to do with AWS's MPU guarantee. Do we consider a storage system to pass the analysis if it supports the MPU guarantees but not conditional writes?

Contributor Author

Yes that's right, this is covered in the docs:

NOTE: Different versions of Elasticsearch may perform different checks for repository compatibility, with newer versions typically being stricter than older ones. A storage system that passes repository analysis with one version of Elasticsearch may fail with a different version. This indicates it behaves incorrectly in ways that the former version did not detect.

This doesn't yet matter for the non-linearizable MPUs but when we move to using conditional writes for CAS operations we will only do so if the storage supports conditional writes. If the user indicates that their storage doesn't support conditional writes yet then we'll fall back to today's MPU-based CAS implementation.

);

static final Setting<Boolean> UNSAFELY_INCOMPATIBLE_WITH_S3_WRITES = Setting.boolSetting(
"unsafely_incompatible_with_s3_conditional_writes",
@mhl-b (Contributor) Oct 27, 2025

Do you have other names for the setting in mind? I assume with conditional ETag writes there would be another parameter, since it's a different feature from the existence check, and an S3-like storage might support only one of them. Maybe supports_s3_write_existence_check, and later supports_s3_write_etag_check?

Contributor

They could be prefixed with unsafe_.

Contributor Author

I wasn't planning on distinguishing conditional-on-object-exists from conditional-on-etag writes. Do you see a need to do that? My thinking is that "conditional writes" covers both cases, and although these things were announced by AWS as separate features they're now documented together.
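For reference (not code from this PR), the two kinds of conditional write being discussed look roughly like this at the HTTP level; S3 rejects a plain PutObject that fails either precondition with 412 Precondition Failed:

```http
# Conditional on non-existence: rejected if an object already exists at this key
PUT /bucket/index-5 HTTP/1.1
If-None-Match: *

# Conditional on ETag: rejected unless the current object's ETag matches
# ("<current-etag>" is a placeholder)
PUT /bucket/index-5 HTTP/1.1
If-Match: "<current-etag>"
```

Both forms were released by AWS at different times but are now documented together as S3 conditional writes, which is the basis for treating them as a single compatibility switch.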

Contributor

The only argument I have is that these features came at different times and use different implementations, so S3-like repos might have only one of them.

But the feature parity is blurry, and it's unclear how much effort we should put into featuring these knobs. I think it's an S3-like repo's responsibility to provide its own S3-like client rather than using a not-fully-compatible AWS client.

Member

I think it might be worth naming this setting more precisely. If a storage does support etag but not the other, it is not obvious whether it should set this setting to true or false.

I also wonder whether it is better named more positively, e.g. supports_s3_conditional_writes, which seems more straightforward. But maybe you prefer the more negative form to also serve as a warning?

Contributor Author

If a storage does support etag but not the other, it is not obvious

I think it's clear in that case that the storage does not properly support conditional writes so the user needs to disable them until they can upgrade their storage. I don't think it means much to the user to distinguish the different kinds of conditional writes.

I also wonder whether it is better named more positively

Yeah I'd prefer that but it's also important to get the unsafe bit into the name so that users know they need to work on their storage upgrade ASAP.

@ywangd (Member) left a comment

The code changes look fine to me. My question is about how we want to roll this out.

Once it is released, a 3rd party storage that previously passed the repo analysis check can start failing. I assume customers do not really run repo analysis as a regular check on upgrade? So they may run with the code assuming full compatibility (the default) for a while before it is caught by running repo analysis again. What would be our recommendation at this point? Since a storage server upgrade might not be feasible, do we suggest configuring the setting to true and ignoring the rather severe warning? It also brings up the question of whether their existing data is already corrupted because the system has been running with a false assumption?


@DaveCTurner
Contributor Author

Once it is released, a 3rd party storage that previously passed the repo analysis check can start failing. I assume customers do not really run repo analysis as a regular check on upgrade?

Generally that's what we'd recommend. See e.g. these docs which say to verify snapshot repositories with repo analysis as part of the test-cluster upgrade.

So they may run with the code assuming full compatibility (the default) for a while before it is caught by running repo analysis again. What would be our recommendation at this point? Since a storage server upgrade might not be feasible, do we suggest configuring the setting to true and ignoring the rather severe warning? It also brings up the question of whether their existing data is already corrupted because the system has been running with a false assumption?

If after an upgrade they don't notice that their storage doesn't support conditional writes then it must be ignoring the If-None-Match header we added in #133030, in which case it doesn't really matter if we continue to send it or not, no action is needed. My immediate concern is more that some storage might actively reject these requests because of the new If-None-Match header, and we need a way to turn that header off temporarily.

@DaveCTurner DaveCTurner requested review from mhl-b and ywangd October 28, 2025 14:37
@ywangd (Member) left a comment

Overall I am quite satisfied since running repository analysis is in fact a suggested step for upgrade.

If after an upgrade they don't notice that their storage doesn't support conditional writes then it must be ignoring the If-None-Match header we added in #133030, in which case it doesn't really matter if we continue to send it or not, no action is needed.

My question is about the case where they run repo analysis after the upgrade and encounter the failure. What would be our suggestion? I see 3 options:

  1. Ignore the repo analysis failure
  2. Set unsafely_incompatible_with_s3_conditional_writes to true, which sounds rather uncomfortable on paper due to the name and associated warning.
  3. Ask their storage provider for feature parity

I guess in practice we should take option 2. In that case, I wonder whether the wording should be a bit softer since "may lead to repository corruption" sounds quite concerning. Or maybe my understanding is off and repo analysis does not fail if the storage server ignores the header?

@DaveCTurner
Contributor Author

I see 3 options:

We'd generally expect users to do both option 2 (short-term workaround) and option 3 (long-term fix).

I wonder whether the wording should be a bit softer since "may lead to repository corruption" sounds quite concerning.

It's true though: e.g. if you accidentally set up two clusters to write to the repository at the same time then, without overwrite protection, they might both write the same index-${N} root blob and then each start deleting blobs they believe to be unreferenced, and some of those deletes will be incorrect.
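The race described above can be sketched with a toy in-memory "bucket". The names here are invented for illustration and are not Elasticsearch code: an unconditional PUT lets the second cluster silently overwrite the first cluster's root blob, whereas a conditional PUT (modelling the `If-None-Match: *` behaviour) makes the second writer fail, much as S3 would answer 412 Precondition Failed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy model of two clusters racing to write the same index-N root blob.
public class RootBlobRace {
    private final ConcurrentMap<String, String> bucket = new ConcurrentHashMap<>();

    // Unconditional PUT: last writer wins, the first cluster's root blob is silently lost.
    public void putUnconditional(String key, String body) {
        bucket.put(key, body);
    }

    // Conditional PUT, modelling "If-None-Match: *": returns false
    // (S3 would answer 412 Precondition Failed) if the blob already exists.
    public boolean putIfNoneMatch(String key, String body) {
        return bucket.putIfAbsent(key, body) == null;
    }

    public String get(String key) {
        return bucket.get(key);
    }

    public static void main(String[] args) {
        RootBlobRace repo = new RootBlobRace();

        // With conditional writes, only one cluster succeeds in creating index-5,
        // so the race is caught before any incorrect deletes can happen.
        boolean clusterA = repo.putIfNoneMatch("index-5", "cluster-A state");
        boolean clusterB = repo.putIfNoneMatch("index-5", "cluster-B state");
        System.out.println(clusterA + " " + clusterB + " " + repo.get("index-5"));
        // prints: true false cluster-A state

        // Without overwrite protection, cluster B clobbers cluster A's root blob
        // and neither cluster notices.
        repo.putUnconditional("index-5", "cluster-B state");
        System.out.println(repo.get("index-5"));
        // prints: cluster-B state
    }
}
```

The second writer's failed conditional PUT is exactly the "benign error message" mentioned later in the thread: it surfaces the misconfiguration before any blobs are deleted.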

@DaveCTurner
Contributor Author

Or maybe my understanding is off and repo analysis does not fail if the storage server ignores the header?

Actually this is currently true: we do not verify that failIfAlreadyExists actually fails on objects that already exist. But we'd like to address that too; see #134632.

@DaveCTurner DaveCTurner requested a review from ywangd October 29, 2025 07:20
@ywangd (Member) left a comment

LGTM

Thanks for explaining.

if you accidentally set up two clusters to write to the repository at the same time

We don't support such a setup, right? Conditional writes will help us better protect against it, but having such a setup at all is already on the disaster path.

@DaveCTurner
Contributor Author

We don't support such a setup, right? Conditional writes will help us better protect against it, but having such a setup at all is already on the disaster path.

That's correct, this isn't expected to work, but it seems to be an easy mistake to make, and we'd much rather have it fail with a benign error message before it harms the repository contents :)

@DaveCTurner DaveCTurner merged commit dc8afd2 into elastic:main Oct 29, 2025
34 checks passed
@DaveCTurner DaveCTurner deleted the 2025/10/27/s3-conditional-writes-option branch October 29, 2025 07:59
DaveCTurner added a commit to elastic/docs-content that referenced this pull request Oct 29, 2025
chrisparrinello pushed a commit to chrisparrinello/elasticsearch that referenced this pull request Nov 3, 2025
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Nov 21, 2025
In elastic#137185 we introduced a new S3 repository setting called
`unsafely_incompatible_with_s3_conditional_writes` but named the code
symbol `UNSAFELY_INCOMPATIBLE_WITH_S3_WRITES` omitting the "conditional"
bit. That makes the dependent code confusing to read, so this commit
fixes the symbol name.
DaveCTurner added a commit that referenced this pull request Nov 21, 2025
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Nov 26, 2025
…ic#138406)

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Nov 27, 2025

Backport of elastic#137185 and elastic#138406 to 9.2
DaveCTurner added a commit that referenced this pull request Nov 28, 2025

Backport of #137185 and #138406 to 9.2