Description
Describe the bug
A backward-compatibility (BWC) test for remote cluster state was added in #20221. It is failing intermittently in gradle-check runs:
https://build.ci.opensearch.org/job/gradle-check/72744/consoleText
https://build.ci.opensearch.org/job/gradle-check/72748/consoleText
- Build failure (top-level):
Task :qa:rolling-upgrade:v2.19.6-remote#twoThirdsUpgradedTest FAILED
Execution failed for task ':qa:rolling-upgrade:v2.19.6-remote#twoThirdsUpgradedTest'.
process was found dead while waiting for cluster health yellow, cluster{:qa:rolling-upgrade:v2.19.6-remote}
- IndexMetadata XContent deserialization failure (old node reading index metadata blobs written by upgraded cluster-manager):
[2026-03-18T11:33:45,850][ERROR][o.o.g.r.RemoteClusterStateService] [v2.19.6-remote-2] Failed to read cluster state from remote
org.opensearch.gateway.remote.RemoteStateTransferException: Download failed for java_for_range
at org.opensearch.gateway.remote.RemoteIndexMetadataManager.lambda$getWrappedReadListener$3(RemoteIndexMetadataManager.java:159)
at org.opensearch.core.action.ActionListener$1.onFailure(ActionListener.java:90)
at org.opensearch.common.remote.RemoteWriteableEntityBlobStore.lambda$readAsync$0(RemoteWriteableEntityBlobStore.java:87)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:955)
...
Caused by: java.lang.IllegalStateException: Can't get text on a START_ARRAY at -1:702
at org.opensearch.common.xcontent.json.JsonXContentParser.text(JsonXContentParser.java:99)
at org.opensearch.core.xcontent.AbstractXContentParser.map(AbstractXContentParser.java:298)
at org.opensearch.core.xcontent.AbstractXContentParser.mapStrings(AbstractXContentParser.java:282)
at org.opensearch.cluster.metadata.IndexMetadata$Builder.fromXContent(IndexMetadata.java:2013)
at org.opensearch.cluster.metadata.IndexMetadata.fromXContent(IndexMetadata.java:1080)
at org.opensearch.repositories.blobstore.ChecksumBlobStoreFormat.deserialize(ChecksumBlobStoreFormat.java:144)
at org.opensearch.gateway.remote.model.RemoteIndexMetadata.deserialize(RemoteIndexMetadata.java:136)
at org.opensearch.gateway.remote.model.RemoteIndexMetadata.deserialize(RemoteIndexMetadata.java:35)
at org.opensearch.common.remote.RemoteWriteableEntityBlobStore.read(RemoteWriteableEntityBlobStore.java:77)
at org.opensearch.common.remote.RemoteWriteableEntityBlobStore.lambda$readAsync$0(RemoteWriteableEntityBlobStore.java:85)
This repeats for every index in the cluster (test_index, test_recovery, index_with_replicas, test_index_old, geo_shape_index_old, test-index-segrep, etc.).
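To illustrate the class of failure in the trace above, here is a minimal sketch, not the actual OpenSearch parsing code. It assumes the old reader expects every value in some metadata object to be a plain string, while the newer writer emits an array for one of them. The class name, the `owner` key, and the map-based stand-in for the parser are all hypothetical; the issue does not identify which field is involved.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an "old" reader that only understands string values.
final class StringMapMismatchSketch {

    // Copies the parsed object into a string map, failing on any non-string value,
    // analogous to asking a parser for text while it is positioned on an array.
    static Map<String, String> readStringMap(Map<String, Object> parsed) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : parsed.entrySet()) {
            if (!(entry.getValue() instanceof String)) {
                throw new IllegalStateException(
                    "expected a string value for [" + entry.getKey() + "]");
            }
            result.put(entry.getKey(), (String) entry.getValue());
        }
        return result;
    }

    // Shape the old version writes and reads: every value is a string.
    static Map<String, Object> oldShape() {
        Map<String, Object> map = new LinkedHashMap<>();
        map.put("owner", "team-a");
        return map;
    }

    // Assumed shape from a newer writer: the same key now holds an array.
    static Map<String, Object> newShape() {
        Map<String, Object> map = new LinkedHashMap<>();
        map.put("owner", List.of("team-a", "team-b"));
        return map;
    }

    public static void main(String[] args) {
        System.out.println(readStringMap(oldShape()));
        try {
            readStringMap(newShape());
        } catch (IllegalStateException e) {
            System.out.println("old reader failed: " + e.getMessage());
        }
    }
}
```

If this is what is happening, the fix would be either to keep the serialized shape readable by 2.19.x or to make the reader tolerate the new shape; which one applies depends on the actual field, which still needs to be identified.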
- DiscoveryNodes binary deserialization failure (old node reading discovery nodes blob written by upgraded cluster-manager):
[2026-03-18T11:33:45,859][ERROR][o.o.g.r.RemoteClusterStateService] [v2.19.6-remote-2] Failed to read cluster state from remote
org.opensearch.gateway.remote.RemoteStateTransferException: Download failed for nodes
at org.opensearch.gateway.remote.RemoteClusterStateAttributesManager.lambda$getWrappedReadListener$3(RemoteClusterStateAttributesManager.java:103)
at org.opensearch.core.action.ActionListener$1.onFailure(ActionListener.java:90)
at org.opensearch.common.remote.RemoteWriteableEntityBlobStore.lambda$readAsync$0(RemoteWriteableEntityBlobStore.java:87)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:955)
...
Caused by: java.lang.IllegalStateException: unexpected byte [0x08]
at org.opensearch.core.common.io.stream.StreamInput.readBoolean(StreamInput.java:596)
at org.opensearch.core.common.io.stream.StreamInput.readBoolean(StreamInput.java:586)
at org.opensearch.cluster.node.DiscoveryNode.<init>(DiscoveryNode.java:344)
at org.opensearch.cluster.node.DiscoveryNodes.readFrom(DiscoveryNodes.java:777)
at org.opensearch.gateway.remote.model.RemoteDiscoveryNodes.lambda$static$0(RemoteDiscoveryNodes.java:37)
at org.opensearch.repositories.blobstore.ChecksumWritableBlobStoreFormat.deserialize(ChecksumWritableBlobStoreFormat.java:105)
at org.opensearch.gateway.remote.model.RemoteDiscoveryNodes.deserialize(RemoteDiscoveryNodes.java:101)
at org.opensearch.gateway.remote.model.RemoteDiscoveryNodes.deserialize(RemoteDiscoveryNodes.java:32)
at org.opensearch.common.remote.RemoteWriteableEntityBlobStore.read(RemoteWriteableEntityBlobStore.java:77)
at org.opensearch.common.remote.RemoteWriteableEntityBlobStore.lambda$readAsync$0(RemoteWriteableEntityBlobStore.java:85)
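The `unexpected byte [0x08]` error is what a strict boolean read produces when the reader's field layout is out of step with the writer's. The sketch below reproduces that pattern with plain `java.io` streams under stated assumptions: the "new" writer adds one length-prefixed field that the "old" reader does not know about, and the strict boolean check mirrors the behaviour seen in the trace. The class, the field names, and the extra field are hypothetical, and the 8-byte length was chosen only to echo the logged byte; this is not a diagnosis of which `DiscoveryNode` field is responsible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a writer/reader layout mismatch in a binary blob.
final class StreamMismatchSketch {

    // Strict boolean read: only 0 and 1 are accepted, anything else is an error.
    static boolean readStrictBoolean(DataInputStream in) throws IOException {
        int value = in.readUnsignedByte();
        if (value == 0) {
            return false;
        }
        if (value == 1) {
            return true;
        }
        throw new IllegalStateException(String.format("unexpected byte [0x%02x]", value));
    }

    // Writes a node name, optionally an extra length-prefixed field, then a boolean flag.
    static byte[] writeBlob(boolean includeNewField) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF("node-1");
            if (includeNewField) {
                byte[] extra = "new-attr".getBytes(StandardCharsets.UTF_8); // 8 bytes
                out.writeByte(extra.length); // length prefix, 0x08
                out.write(extra);
            }
            out.writeBoolean(true);
        }
        return bytes.toByteArray();
    }

    // Old reader: knows only the name and the flag, so it reads the flag right after the name.
    static String readOldFormat(byte[] blob) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            String name = in.readUTF();
            boolean flag = readStrictBoolean(in);
            return name + ":" + flag;
        }
    }

    // Returns the parsed value, or the error message if the old reader rejects the blob.
    static String roundTrip(boolean includeNewField) {
        try {
            return readOldFormat(writeBlob(includeNewField));
        } catch (IllegalStateException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(false)); // prints "node-1:true"
        System.out.println(roundTrip(true));  // prints "unexpected byte [0x08]"
    }
}
```

Transport-layer serialization usually avoids this by gating new fields on the negotiated stream version. A blob uploaded to the remote store is written once and read by nodes of any version, so the same gating only helps if the writer knows the oldest version that may read it. Whether that is the gap here is an assumption that needs confirming against the 2.19.x reader.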
Related component
Cluster Manager
To Reproduce
Not deterministic. I think it requires a mixed-version cluster in which a new-version node is elected cluster manager and then publishes a new cluster state, which the remaining old-version nodes must download from the remote store.
Expected behavior
The tests should pass every time.