Description
What happened:
While following the documented process for Bare Metal rolling upgrades, I prepared a new cluster.yaml that includes the upgradeRolloutStrategy set to RollingUpdate for both the control plane and worker node groups.
Initially, I encountered a validation error:
Error: the cluster config file provided is invalid: validating upgrade rollout strategy configuration: WorkerNodeGroupConfiguration: upgradeRolloutStrategy.rollingUpdate field is required for upgradeRolloutStrategy.type RollingUpdate
After I addressed this by adding the required rollingUpdate fields for the worker group, the configuration passed validation. For the control plane, it appears maxSurge defaults to 1, and since the preflight checks succeed without explicitly setting it, I assumed that was acceptable.
Here is the diff between the original and updated cluster.yaml:
--- cluster.yaml.orig 2025-07-12 21:15:44.266427279 +0000
+++ cluster.yaml 2025-07-12 21:16:23.706199561 +0000
@@ -13,6 +13,8 @@
       cidrBlocks:
       - 10.96.0.0/12
   controlPlaneConfiguration:
+    upgradeRolloutStrategy:
+      type: RollingUpdate
     count: 3
     endpoint:
       host: "10.162.10.140"
@@ -31,6 +33,11 @@
       kind: TinkerbellMachineConfig
       name: eks-a
     name: worker-group
+    upgradeRolloutStrategy:
+      rollingUpdate:
+        maxSurge: 1
+        maxUnavailable: 0
+      type: RollingUpdate
 ---
 apiVersion: anywhere.eks.amazonaws.com/v1alpha1
 kind: TinkerbellDatacenterConfig
When attempting to create the cluster using:
eksctl anywhere create cluster \
--hardware-csv hardware.csv \
-f cluster.yaml \
--no-timeouts
The process failed with an error that suggested a missing TinkerbellMachineConfig:
cluster has an error: Dependent cluster objects don't exist: TinkerbellMachineConfig.anywhere.eks.amazonaws.com "eks-a-cp" not found
However, this resource clearly exists in the bootstrap cluster:
$ kubectl get TinkerbellMachineConfig --all-namespaces
NAMESPACE   NAME       AGE
default     eks-a      54s
default     eks-a-cp   54s
Despite the resource being present, the cluster creation process keeps retrying endlessly. With --no-timeouts set, this ironically guarantees being stuck in a loop with no helpful feedback.
What you expected to happen:
I expected the cluster creation to either work or fail with a clear and accurate error. If something is misconfigured, the message should explain what's actually wrong so it's possible to fix it without guesswork.
How to reproduce it (as minimally and precisely as possible):
- Create a cluster.yaml that includes upgradeRolloutStrategy.type: RollingUpdate for both control plane and worker groups. Ensure rollingUpdate values are included for the worker group but left out for the control plane (since it passes validation).
- Run eksctl anywhere create cluster using the configuration.
- Observe the error related to a missing TinkerbellMachineConfig, even though the resource is present, and note the endless retry loop behavior.
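For quick reference, the relevant part of the configuration from the diff above can be condensed into the fragment below. It is a sketch, not a complete cluster.yaml: the nesting under spec and workerNodeGroupConfigurations is assumed from the usual EKS Anywhere Cluster layout, since the diff does not show those parent keys, and all unrelated fields are omitted.

```yaml
# Fragment only; parent keys (spec, workerNodeGroupConfigurations) are assumed,
# and every field not related to the rollout strategy is left out.
spec:
  controlPlaneConfiguration:
    upgradeRolloutStrategy:
      type: RollingUpdate      # no rollingUpdate block here; this still passes validation
  workerNodeGroupConfigurations:
  - name: worker-group
    upgradeRolloutStrategy:
      type: RollingUpdate
      rollingUpdate:           # required by validation for worker node groups
        maxSurge: 1
        maxUnavailable: 0
```

The only asymmetry between the two groups is the missing rollingUpdate block on the control plane, which is what the reproduction relies on.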
Anything else we need to know?:
Environment:
- EKS Anywhere Release
$ eksctl anywhere version
Version: v0.22.6
Release Manifest URL: https://anywhere-assets.eks.amazonaws.com/releases/eks-a/manifest.yaml
Bundle Manifest URL: https://anywhere-assets.eks.amazonaws.com/releases/bundles/100/manifest.yaml