Skip to content

Conversation

@jordan-powers
Copy link
Contributor

@jordan-powers jordan-powers commented Sep 25, 2025

Follow-up to #135460.

This patch addresses the new failures by moving the node feature check such that it is executed before any nodes are upgraded.

Note that I'm not sure why the previous solution didn't work. But it wouldn't reproduce in an individual test run, only when running the whole test suite, so I think it was some race condition. This new approach guarantees that the node features will be checked before any attempt is made to upgrade the nodes.

Fixes #135338
Fixes #135312
Fixes #135316
Fixes #135312
Fixes #135316
Fixes #135320
Fixes #135237
Fixes #135315
Fixes #135324
Fixes #135327
Fixes #135238
Fixes #135236
Fixes #135344
Fixes #135325
Fixes #135511
Fixes #135512
Fixes #135525
Fixes #135319

@jordan-powers jordan-powers added >test Issues or PRs that are addressing/adding tests Team:StorageEngine :StorageEngine/Mapping The storage related side of mappings v9.2.0 labels Sep 25, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for looking into this!

- class: org.elasticsearch.upgrades.SearchableSnapshotsRollingUpgradeIT
method: testBlobStoreCacheWithFullCopyInMixedVersions
issue: https://github.com/elastic/elasticsearch/issues/135474
- class: org.elasticsearch.upgrades.SyntheticSourceRollingUpgradeIT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a few more tests need to be unmuted now.

// Skip remaining tests if upgrade failed
assumeFalse("Cluster upgrade failed", upgradeFailed);

beforeUpgrade();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked a few test failures and it fails with fails with indices/data streams that are managed internally. For example the ilm history or deprecation logging.

I think without this change a node upgrade could happen before the @Before method with old feature check was invoked. In the case an internal index is created while the upgrade is in progress. Older nodes don't understand the renamed index setting. This caused the test to fail before it got the old feature check. (cluster would never go into a green state after a node upgraded)

So I think this should ensure that this doesn't happen.

public void checkFeatures() {
@Override
protected void beforeUpgrade() {
if (Build.current().isSnapshot()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use getOldClusterVersion() and check whether suffix is -SNAPSHOT?

I don't think this works, because this code runs in the jvm that runs the test suite and not Elasticsearch.

@jordan-powers jordan-powers enabled auto-merge (squash) September 26, 2025 15:22
@jordan-powers jordan-powers enabled auto-merge (squash) September 26, 2025 15:57
@jordan-powers jordan-powers merged commit 77b4fd5 into elastic:main Sep 26, 2025
34 checks passed
@jordan-powers jordan-powers deleted the fix-rolling-upgrade-tests-2 branch October 1, 2025 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment