Reduce Data Loss in System Indices Migration 8x #120566

JVerwolf · 2025-01-21T23:23:18Z

Jira: ES-9724

This PR removes a potential cause of data loss when migrating system indices. It changes how the write block is set on the system index being migrated: we now use a dedicated transport request rather than a settings update. We also no longer remove the write block before deleting the index, since that was another potential source of data loss. Finally, we now remove the block if the migration fails.
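
The ordering described above can be illustrated with a toy, in-memory model. This is only a sketch under stated assumptions: it is not Elasticsearch code, it does not use any Elasticsearch API, and every name in it (`WriteBlockMigrationSketch`, `ToyIndex`, `migrate`, `failReindex`) is hypothetical. It shows three things: the write block is set before copying, the old index is deleted while still blocked, and the block is removed if the migration fails.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of the migration ordering; all names are hypothetical. */
final class WriteBlockMigrationSketch {

    /** Stand-in for an index: a list of documents plus a write-block flag. */
    static final class ToyIndex {
        final List<String> docs = new ArrayList<>();
        boolean writeBlocked = false;

        void write(String doc) {
            if (writeBlocked) {
                throw new IllegalStateException("index is write-blocked");
            }
            docs.add(doc);
        }
    }

    final Map<String, ToyIndex> indices = new HashMap<>();

    /**
     * Copies oldName into newName. Returns true on success.
     * failReindex simulates an error during the copy step.
     */
    boolean migrate(String oldName, String newName, boolean failReindex) {
        ToyIndex old = indices.get(oldName);
        // Step 1: block writes so no document can arrive after the copy starts.
        old.writeBlocked = true;
        try {
            if (failReindex) {
                throw new RuntimeException("simulated reindex failure");
            }
            // Step 2: copy the documents into the new index.
            ToyIndex fresh = new ToyIndex();
            fresh.docs.addAll(old.docs);
            indices.put(newName, fresh);
            // Step 3: delete the old index while it is still blocked.
            // The block is never lifted first, so no write can slip in and be lost.
            indices.remove(oldName);
            return true;
        } catch (RuntimeException e) {
            // Failure path: lift the block so the old index stays usable.
            old.writeBlocked = false;
            return false;
        }
    }

    public static void main(String[] args) {
        WriteBlockMigrationSketch sketch = new WriteBlockMigrationSketch();
        ToyIndex old = new ToyIndex();
        old.write("doc-1");
        sketch.indices.put(".old", old);
        System.out.println("migrated: " + sketch.migrate(".old", ".new", false));
        System.out.println("indices: " + sketch.indices.keySet());
    }
}
```

In the toy model, a failed migration leaves the old index in place and writable again, while a successful one leaves only the new index.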

main branch PR: #120168

elasticsearchmachine · 2025-01-21T23:35:48Z

Hi @JVerwolf, I've created a changelog YAML for you.

JVerwolf · 2025-01-23T00:09:53Z

server/src/main/java/org/elasticsearch/upgrades/SystemIndexMigrator.java

+                                        ),
+                                        e
+                                    );
+                                    removeReadOnlyBlockOnReindexFailure(oldIndex, delegate2, e);


This error-handling function (the second parameter to ActionListener.wrap) will be called if the aliases request in setAliasAndRemoveOldIndex returns an error, or if the happy-path function (the first parameter to ActionListener.wrap) throws an error.

This function will log the error and then remove the WRITE block on the original index, in an attempt to not leave things in a broken state.

I'm not sure how to test this path, or even if this path is needed. I'd be happy to get feedback here from reviewers - thanks!

Just for local testing - would introducing code that always throws an exception to TransportIndicesAliasesAction.masterOperation help to test this path?

Hmm, good idea. I'll look into how to do that - I think I remember seeing examples of that type of thing somewhere.
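
The two failure routes discussed in this thread can be sketched in plain Java. This is a minimal stand-in, not the Elasticsearch `ActionListener` class: `SketchListener`, `ThrowingConsumer`, `wrap`, and `run` are hypothetical names, and the only behavior modeled is the one described in the comment above (the failure handler runs both when the request itself fails and when the happy-path function throws).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Plain-Java stand-in for a response/failure listener; all names are hypothetical. */
final class ListenerWrapSketch {

    interface SketchListener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    /** A consumer that is allowed to throw. */
    interface ThrowingConsumer<T> {
        void accept(T value) throws Exception;
    }

    /** Builds a listener from a happy-path function and an error handler. */
    static <T> SketchListener<T> wrap(ThrowingConsumer<T> happyPath, Consumer<Exception> errorHandler) {
        return new SketchListener<T>() {
            @Override
            public void onResponse(T response) {
                try {
                    happyPath.accept(response);
                } catch (Exception e) {
                    // Route 2: an exception thrown by the happy path reaches the error handler.
                    onFailure(e);
                }
            }

            @Override
            public void onFailure(Exception e) {
                // Route 1: the request itself reported an error.
                errorHandler.accept(e);
            }
        };
    }

    /** Simulates the alias step and records what the handlers did. */
    static List<String> run(boolean requestFails, boolean happyPathThrows) {
        List<String> events = new ArrayList<>();
        SketchListener<String> listener = wrap(
            response -> {
                if (happyPathThrows) {
                    throw new IllegalStateException("happy path threw");
                }
                events.add("success: " + response);
            },
            e -> {
                // Mirrors the described behavior: log the error, then lift the write block.
                events.add("logged: " + e.getMessage());
                events.add("write block removed");
            }
        );
        if (requestFails) {
            listener.onFailure(new RuntimeException("alias request failed"));
        } else {
            listener.onResponse("aliases updated");
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(false, false));
        System.out.println(run(true, false));
        System.out.println(run(false, true));
    }
}
```

For local experiments, forcing either route (a failing request or a throwing happy path) exercises the same cleanup, which is the idea behind the suggestion to make the aliases action always throw.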

elasticsearchmachine · 2025-01-23T00:11:15Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

ldematte · 2025-01-23T11:14:14Z

Side note: if this is a backport (looks like a backport?) could you add the tag please?

alexey-ivanov-es · 2025-01-23T13:45:43Z

server/src/main/java/org/elasticsearch/upgrades/SystemIndexMigrator.java

+                                        ),
+                                        e
+                                    );
+                                    removeReadOnlyBlockOnReindexFailure(oldIndex, delegate2, e);


Just for local testing - would introducing code that always throws an exception to TransportIndicesAliasesAction.masterOperation help to test this path?

JVerwolf · 2025-01-24T22:42:06Z

...les/reindex/src/internalClusterTest/java/org/elasticsearch/migration/FeatureMigrationIT.java

+        // Retry the migration
+        client().execute(PostFeatureUpgradeAction.INSTANCE, new PostFeatureUpgradeRequest(TEST_REQUEST_TIMEOUT)).get();
+
+        // Ensure that the migration is successful after the alias request is unblocked
+        assertBusy(() -> {
+            GetFeatureUpgradeStatusResponse statusResp = client().execute(
+                GetFeatureUpgradeStatusAction.INSTANCE,
+                new GetFeatureUpgradeStatusRequest(TEST_REQUEST_TIMEOUT)
+            ).get();
+            logger.info(Strings.toString(statusResp));
+            assertThat(statusResp.getUpgradeStatus(), equalTo(GetFeatureUpgradeStatusResponse.UpgradeStatus.NO_MIGRATION_NEEDED));


This is failing. When I set breakpoints, the taskState in org.elasticsearch.upgrades.SystemIndexMigrator#cleanUpPreviousMigration is null, which prevents the previous "new" index from being cleaned up. I then get an exception upon trying to create an index that already exists.

@gwbrown Do you know why this might be happening? Thanks!

JVerwolf · 2025-01-28T00:45:14Z

The new test I added fails against the original code (without my changes) as well as against this PR. It seems the task state is not being restored in subsequent migration runs, which prevents the new index from being cleaned up. @rjernst and I spent a while debugging this but haven't been able to locate the cause yet. I'll disable the test for now and revisit it in a future PR.

alexey-ivanov-es

LGTM

This reverts commit a3919b0.

Reverts #120566

The original PR is causing the following exception to be thrown when security is enabled:

```
system-indices-testing-es01-1 | org.elasticsearch.ElasticsearchSecurityException: action [indices:admin/block/add] is unauthorized for user [_system] with effective roles [_system], this action is granted by the index privileges [manage,all]
```

JVerwolf added 2 commits January 21, 2025 15:11

Reduce data loss on system index migration

015e372

spotless

83dcafd

elasticsearchmachine added the v9.0.0 label Jan 21, 2025

JVerwolf changed the base branch from main to 8.x January 21, 2025 23:26

elastic deleted a comment from github-actions bot Jan 21, 2025

Merge branch '8.x' of github.com:elastic/elasticsearch into enhanceme…

409f24a

JVerwolf added v8.18.0 :Core/Infra/Core Core issues without another label and removed v9.0.0 labels Jan 21, 2025

JVerwolf mentioned this pull request Jan 21, 2025

Reduce Data Loss in System Indices Migration #120168

Merged

JVerwolf added >bug Team:Core/Infra Meta label for core/infra team labels Jan 21, 2025

Update docs/changelog/120566.yaml

c341419

JVerwolf and others added 5 commits January 21, 2025 15:36

Update 120566.yaml

c79883a

Delete old index

ba5128c

spotless

81904ab

Merge branch '8.x' of github.com:elastic/elasticsearch into enhanceme…

5c2fe23

Merge branch 'enhancement/es-9724-reduce-data-loss-system-indices-8x'…

c26e51d

JVerwolf commented Jan 23, 2025

JVerwolf marked this pull request as ready for review January 23, 2025 00:10

JVerwolf requested review from a team and alexey-ivanov-es January 23, 2025 00:10

alexey-ivanov-es reviewed Jan 23, 2025

JVerwolf added 2 commits January 23, 2025 15:37

Add test for failure

8f5ca49

Add test for failure

7f71e74

JVerwolf added the backport label Jan 24, 2025

Delete docs/changelog/120566.yaml

44687d0

JVerwolf added 2 commits January 24, 2025 13:49

Merge branch 'enhancement/es-9724-reduce-data-loss-system-indices-8x'…

add0253

Merge branch '8.x' of github.com:elastic/elasticsearch into enhanceme…

29499fc

JVerwolf commented Jan 24, 2025

JVerwolf added 2 commits January 27, 2025 16:37

Add error logging, dissable test

c367f0d

Merge branch '8.x' of github.com:elastic/elasticsearch into enhanceme…

deae30e

JVerwolf requested a review from alexey-ivanov-es January 28, 2025 00:45

alexey-ivanov-es approved these changes Jan 28, 2025

JVerwolf added 3 commits January 28, 2025 10:13

Use AwaitsFix in test

34d1368

Merge branch '8.x' of github.com:elastic/elasticsearch into enhanceme…

83c121d

spotless

ab87731

JVerwolf merged commit a3919b0 into elastic:8.x Jan 28, 2025
15 checks passed

JVerwolf added a commit that referenced this pull request Jan 29, 2025

Revert "Reduce Data Loss in System Indices Migration 8x (#120566)"

53b670e

This reverts commit a3919b0.

JVerwolf mentioned this pull request Jan 29, 2025

Revert "Reduce Data Loss in System Indices Migration 8x" #121120

Merged

This was referenced Jan 29, 2025

Reenable "Reduce Data Loss in System Indices Migration 8x" with fix #121213

Closed

Fix privileges for system index migration WRITE block #121329

Merged

This was referenced Feb 10, 2025

Bugfix/fix privileges in system migration block #122217

Merged

System Index Migration Failure Results in a Non-Recoverable State #122326

Merged
