Test remote state BWC in both CM version directions #20912
andrross wants to merge 1 commit into opensearch-project:main from
❌ Gradle check result for 892518d: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
The RemotePublicationClusterStateIT test was not deterministically exercising the case where a new-version cluster-manager writes remote cluster state that old-version nodes must deserialize. This meant backwards-incompatible serialization changes could be merged without being caught, since the test would pass whenever an old-version node happened to win the cluster-manager election.

In the mixed-version (one-third upgraded) phase, the test now explicitly tests both directions:

1. Old CM writes state, new nodes read from remote store
2. New CM writes state, old nodes read from remote store

To force a specific version to be cluster-manager, the test uses the voting config exclusions API to repeatedly exclude the current CM until a node of the desired version wins the election. Exclusions are cleared immediately after each re-election so no node leaves the cluster.

Also changes the remote test cluster's dependency on the non-remote test cluster from dependsOn to mustRunAfter, so the remote tests can be run independently without first running the full non-remote suite.

Signed-off-by: Andrew Ross <andrross@amazon.com>
Force-pushed from 892518d to f1489d0.
❌ Gradle check result for f1489d0: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Closing, as forward compatibility is explicitly not supported here. See #20948.
The RemotePublicationClusterStateIT test was not deterministically exercising the case where a new-version cluster-manager writes remote cluster state that old-version nodes must deserialize. This meant backwards-incompatible serialization changes could be merged without being caught, since the test would pass whenever an old-version node happened to win the cluster-manager election.
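In the mixed-version (one-third upgraded) phase, the test now explicitly tests both directions:

1. Old CM writes state, new nodes read from remote store
2. New CM writes state, old nodes read from remote store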
To force a specific version to be cluster-manager, the test uses the voting config exclusions API to repeatedly exclude the current CM until a node of the desired version wins the election. Exclusions are cleared immediately after each re-election so no node leaves the cluster.
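The election-forcing loop described above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `ClusterControl`, `FakeCluster`, and `forceClusterManagerVersion` are hypothetical names, and the fake's round-robin re-election only stands in for a real cluster-manager election. In a real test, `excludeFromVoting` and `clearVotingExclusions` would presumably map to the `POST` and `DELETE` forms of the `_cluster/voting_config_exclusions` REST endpoint.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ForceClusterManager {

    /** Hypothetical test-side view of the cluster; not an OpenSearch API. */
    interface ClusterControl {
        String currentClusterManager();          // name of the elected cluster-manager
        String versionOf(String node);           // e.g. "old" or "new"
        void excludeFromVoting(String node);     // excluding the current CM forces a re-election
        void clearVotingExclusions();            // removes all voting config exclusions
    }

    /**
     * Repeatedly excludes the current cluster-manager until a node of the
     * desired version wins. Exclusions are cleared right after each
     * re-election, so no node stays excluded from the voting configuration.
     */
    static String forceClusterManagerVersion(ClusterControl cluster, String desiredVersion, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String current = cluster.currentClusterManager();
            if (desiredVersion.equals(cluster.versionOf(current))) {
                return current;
            }
            cluster.excludeFromVoting(current);
            // A real test would wait here for the new election to complete.
            if (cluster.currentClusterManager().equals(current)) {
                throw new IllegalStateException("no re-election after excluding " + current);
            }
            cluster.clearVotingExclusions();
        }
        throw new IllegalStateException("no " + desiredVersion + " cluster-manager after " + maxAttempts + " attempts");
    }

    /** In-memory fake: when the CM is excluded, the next non-excluded node (round robin) is elected. */
    static final class FakeCluster implements ClusterControl {
        private final List<String> nodes = new ArrayList<>();
        private final Map<String, String> versions;
        private final Set<String> excluded = new HashSet<>();
        private String clusterManager;

        FakeCluster(LinkedHashMap<String, String> versions, String initialClusterManager) {
            this.versions = versions;
            this.nodes.addAll(versions.keySet());
            this.clusterManager = initialClusterManager;
        }

        public String currentClusterManager() { return clusterManager; }
        public String versionOf(String node) { return versions.get(node); }
        public void clearVotingExclusions() { excluded.clear(); }
        int exclusionCount() { return excluded.size(); }

        public void excludeFromVoting(String node) {
            excluded.add(node);
            if (!excluded.contains(clusterManager)) {
                return; // excluding a non-CM node does not trigger an election in this fake
            }
            int start = nodes.indexOf(clusterManager);
            for (int i = 1; i <= nodes.size(); i++) {
                String candidate = nodes.get((start + i) % nodes.size());
                if (!excluded.contains(candidate)) {
                    clusterManager = candidate;
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> versions = new LinkedHashMap<>();
        versions.put("node-0", "old");
        versions.put("node-1", "old");
        versions.put("node-2", "new");
        FakeCluster cluster = new FakeCluster(versions, "node-0");

        // Direction 1: old CM writes state, new nodes read it from the remote store.
        System.out.println("old CM: " + forceClusterManagerVersion(cluster, "old", 5));
        // Direction 2: new CM writes state, old nodes read it from the remote store.
        System.out.println("new CM: " + forceClusterManagerVersion(cluster, "new", 5));
    }
}
```

Running `main` prints `old CM: node-0` and then `new CM: node-2`, after two exclude/clear rounds. The attempt cap matters because a test cannot choose the election winner directly; it can only keep removing the wrong candidate and must give up eventually rather than loop forever.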
Also changes the remote test cluster's dependency on the non-remote test cluster from dependsOn to mustRunAfter, so the remote tests can be run independently without first running the full non-remote suite.
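The practical difference between the two Gradle relationships can be modeled in a few lines of Java. In Gradle, `dependsOn` pulls the other task into the task graph, while `mustRunAfter` only orders the two tasks when both are already scheduled. The sketch below is a toy model of that rule, not Gradle's implementation; `TaskOrderingModel`, `Task`, `plan`, and the task names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TaskOrderingModel {

    static final class Task {
        final String name;
        final Set<Task> dependsOn = new LinkedHashSet<>();    // pulls the task into the graph
        final Set<Task> mustRunAfter = new LinkedHashSet<>(); // ordering only, never pulls in
        Task(String name) { this.name = name; }
    }

    /** Requested tasks plus transitive dependsOn, ordered by both kinds of constraint. */
    static List<String> plan(Task... requested) {
        Set<Task> included = new LinkedHashSet<>();
        Deque<Task> work = new ArrayDeque<>(Arrays.asList(requested));
        while (!work.isEmpty()) {
            Task t = work.pop();
            if (included.add(t)) {
                work.addAll(t.dependsOn); // only dependsOn adds tasks to the graph
            }
        }
        List<String> order = new ArrayList<>();
        Set<Task> visited = new HashSet<>();
        for (Task t : included) {
            visit(t, included, visited, order);
        }
        return order;
    }

    // Depth-first topological ordering; assumes the task graph is acyclic.
    private static void visit(Task t, Set<Task> included, Set<Task> visited, List<String> order) {
        if (!visited.add(t)) {
            return;
        }
        for (Task d : t.dependsOn) {
            visit(d, included, visited, order);
        }
        for (Task a : t.mustRunAfter) {
            if (included.contains(a)) { // mustRunAfter applies only if both tasks are scheduled
                visit(a, included, visited, order);
            }
        }
        order.add(t.name);
    }

    public static void main(String[] args) {
        Task nonRemote = new Task("nonRemoteClusterTest");
        Task remote = new Task("remoteClusterTest");
        remote.mustRunAfter.add(nonRemote);
        System.out.println(plan(remote));            // [remoteClusterTest]
        System.out.println(plan(remote, nonRemote)); // [nonRemoteClusterTest, remoteClusterTest]
    }
}
```

With `mustRunAfter`, requesting only the remote task schedules it alone, while requesting both still runs the non-remote suite first. Swapping the edge to `dependsOn` would make every request for the remote task drag the non-remote suite along.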
Related Issues
Related to #20910
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.