control plane RollingUpdate config silently breaks cluster creation

**What happened**:

While following the documented process for [Bare Metal RollingUpgrades](https://anywhere.eks.amazonaws.com/docs/clustermgmt/cluster-upgrades/baremetal-upgrades/#rolling-upgrades), I prepared a new `cluster.yaml` that includes the `upgradeRolloutStrategy` set to `RollingUpdate` for both the control plane and worker node groups.

Initially, I encountered a validation error:

```
Error: the cluster config file provided is invalid: validating upgrade rollout strategy configuration: WorkerNodeGroupConfiguration: upgradeRolloutStrategy.rollingUpdate field is required for upgradeRolloutStrategy.type RollingUpdate
```

After I added the required `rollingUpdate` fields for the worker node group, the configuration passed validation. For the control plane, `maxSurge` appears to default to 1, and because the preflight checks succeeded without it being set explicitly, I assumed omitting it was acceptable.

Here is the diff between the original and updated `cluster.yaml`:

```diff
--- cluster.yaml.orig   2025-07-12 21:15:44.266427279 +0000
+++ cluster.yaml        2025-07-12 21:16:23.706199561 +0000
@@ -13,6 +13,8 @@
       cidrBlocks:
       - 10.96.0.0/12
   controlPlaneConfiguration:
+    upgradeRolloutStrategy:
+      type: RollingUpdate
     count: 3
     endpoint:
       host: "10.162.10.140"
@@ -31,6 +33,11 @@
       kind: TinkerbellMachineConfig
       name: eks-a
     name: worker-group
+    upgradeRolloutStrategy:
+      rollingUpdate:
+        maxSurge: 1
+        maxUnavailable: 0
+      type: RollingUpdate
 ---
 apiVersion: anywhere.eks.amazonaws.com/v1alpha1
 kind: TinkerbellDatacenterConfig
```
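
For comparison, a control plane block that spells out `rollingUpdate` the same way the worker group does might look like the sketch below. It assumes the control plane accepts `rollingUpdate.maxSurge` and uses the value 1 that it appears to default to. Whether this variant changes the behavior reported in this issue is untested.

```yaml
controlPlaneConfiguration:
  upgradeRolloutStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1   # assumed default, written out explicitly (not verified as a workaround)
  count: 3
  endpoint:
    host: "10.162.10.140"
```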

I then attempted to create the cluster with:

```bash
eksctl anywhere create cluster \
  --hardware-csv hardware.csv \
  -f cluster.yaml \
  --no-timeouts
```

The process failed with an error that suggested a missing `TinkerbellMachineConfig`:

```
cluster has an error: Dependent cluster objects don't exist: TinkerbellMachineConfig.anywhere.eks.amazonaws.com "eks-a-cp" not found
```

However, this resource clearly exists in the bootstrap cluster:

```bash
$ kubectl get TinkerbellMachineConfig --all-namespaces
NAMESPACE   NAME       AGE
default     eks-a      54s
default     eks-a-cp   54s
```

Despite the resource being present, the cluster creation process keeps retrying endlessly. Because `--no-timeouts` is set, it never exits, so the run stays stuck in a loop with no helpful feedback.

---

**What you expected to happen**:

I expected the cluster creation to either work or fail with a clear and accurate error. If something is misconfigured, the message should explain what's actually wrong so it's possible to fix it without guesswork.

---

**How to reproduce it (as minimally and precisely as possible)**:

1. Create a `cluster.yaml` that includes `upgradeRolloutStrategy.type: RollingUpdate` for both the control plane and the worker node group. Include the `rollingUpdate` values for the worker group but leave them out for the control plane (validation passes without them).
2. Run `eksctl anywhere create cluster` using the configuration.
3. Observe the error related to a missing `TinkerbellMachineConfig`, even though the resource is present, and note the endless retry loop behavior.

---

**Anything else we need to know?**:

---

**Environment**:
- EKS Anywhere Release
```
$ eksctl anywhere version
Version: v0.22.6
Release Manifest URL: https://anywhere-assets.eks.amazonaws.com/releases/eks-a/manifest.yaml
Bundle Manifest URL: https://anywhere-assets.eks.amazonaws.com/releases/bundles/100/manifest.yaml
```
