Fix Test SharedClusterSnapshotRestoreIT.testDataFileFailureDuringRestore (#80515) (#124282)

nicktindall · original-brownbear · web-flow · commit dc3d47527db5 · 2025-03-07T11:35:44.000+11:00
This is a test/assertion-only issue. We were removing the tracking of a shard restore after invoking the listener for that restore. The mechanics around `ongoingRestores`, though, exist to wait for the blobstore to go idle during node shutdown.

The problem with removing the tracking for the shard after resolving the listener is that a restore can be retried very quickly (for example, because of a reroute). That creates a race in which the retry starts before the failed restore has been removed from `ongoingRestores`, which trips the "already has an existing restore" assertion.

Fixed by removing the tracking before resolving the listener, which is more correct anyway, since we are done with the blobstore at that point.

closes #80477

(cherry picked from commit ea93bdb)

Co-authored-by: Armin Braun <me@obrown.io>
diff --git a/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java b/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
@@ -2992,7 +2992,7 @@ public void restoreShard(
             final boolean added = ongoingRestores.add(shardId);
             assert added : "add restore for [" + shardId + "] that already has an existing restore";
         }
-        executor.execute(ActionRunnable.wrap(ActionListener.runAfter(restoreListener, () -> {
+        executor.execute(ActionRunnable.wrap(ActionListener.runBefore(restoreListener, () -> {
             final List<ActionListener<Void>> onEmptyListeners;
             synchronized (ongoingRestores) {
                 if (ongoingRestores.remove(shardId) && ongoingRestores.isEmpty() && emptyListeners != null) {
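
The ordering change above can be illustrated with a minimal, single-threaded sketch. It is not the Elasticsearch implementation: the class `RestoreTrackingSketch`, its methods, and the string shard ID are hypothetical, and the listener is modelled as a callback that retries the restore immediately, which stands in for the quick retry described in the commit message. The sketch only shows why the retry is rejected when the shard is untracked after the listener runs, and accepted when it is untracked before.

```java
import java.util.HashSet;
import java.util.Set;

public class RestoreTrackingSketch {
    // Hypothetical stand-in for the repository's set of in-flight shard restores.
    private final Set<String> ongoingRestores = new HashSet<>();

    /** Starts tracking a restore; returns false if the shard is already tracked. */
    boolean startRestore(String shardId) {
        return ongoingRestores.add(shardId);
    }

    /**
     * Completes a restore whose listener immediately retries the same shard.
     * Returns whether the retry was accepted by the tracking set.
     */
    boolean completeAndRetry(String shardId, boolean untrackBeforeListener) {
        boolean[] retryAccepted = new boolean[1];
        // Models a listener that triggers an immediate retry of the restore.
        Runnable listener = () -> retryAccepted[0] = startRestore(shardId);
        // Models the cleanup that removes the shard from the tracking set.
        Runnable untrack = () -> ongoingRestores.remove(shardId);

        if (untrackBeforeListener) {
            // Ordering after the fix: cleanup first, then the listener.
            untrack.run();
            listener.run();
        } else {
            // Ordering before the fix: the listener runs while the shard is still tracked.
            listener.run();
            untrack.run();
        }
        return retryAccepted[0];
    }

    public static void main(String[] args) {
        RestoreTrackingSketch oldOrder = new RestoreTrackingSketch();
        oldOrder.startRestore("shard-0");
        // prints "untrack after listener: retry accepted = false"
        System.out.println("untrack after listener: retry accepted = "
            + oldOrder.completeAndRetry("shard-0", false));

        RestoreTrackingSketch newOrder = new RestoreTrackingSketch();
        newOrder.startRestore("shard-0");
        // prints "untrack before listener: retry accepted = true"
        System.out.println("untrack before listener: retry accepted = "
            + newOrder.completeAndRetry("shard-0", true));
    }
}
```

In the real code the retry happens on another thread, so the old ordering fails only intermittently; the sketch makes the losing interleaving deterministic to show the effect.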
