feat(clusters): cluster shows PayloadSchedulable=False when all worker nodes are un-schedulable / NotReady #1760
abhijith-darshan wants to merge 1 commit into main from
Conversation
📝 Walkthrough
Changes introduce a PayloadSchedulable condition to track cluster node scheduling readiness, convert Node counter fields from int32 to int in the API schema, refactor status reconciliation to exclude control-plane nodes from readiness calculations, update CRD definitions and documentation, rename controller registration keys, and extend E2E tests with node cordoning/uncordoning utilities and new test scenarios.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
♻️ Duplicate comments (1)
internal/controller/cluster/cluster_status.go (1)
57-76: ⚠️ Potential issue | 🔴 Critical: Wire `PayloadSchedulable` into the `Ready` computation.
`PayloadSchedulable` is produced but not consumed by `reconcileReadyStatus`, so cluster `Ready` can stay true when all worker nodes are unschedulable.
🔧 Minimal fix:
- readyCondition := r.reconcileReadyStatus(kubeConfigValidCondition, resourcesDeployedCondition)
+ readyCondition := r.reconcileReadyStatus(kubeConfigValidCondition, resourcesDeployedCondition, payloadSchedulable)
Also applies to: 87-87
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/cluster/cluster_status.go` around lines 57 - 76, The Ready computation ignores payloadSchedulable so Ready can be true even when payloads cannot be scheduled; update reconcileReadyStatus usage to include the payloadSchedulable condition (or change reconcileReadyStatus signature to accept a third condition) and ensure reconcileReadyStatus considers greenhousev1alpha1.PayloadSchedulable alongside kubeConfigValidCondition and resourcesDeployedCondition when producing the Ready condition; modify the call site where readyCondition is set (the readyCondition := r.reconcileReadyStatus(...) line) and update the reconcileReadyStatus implementation to incorporate the PayloadSchedulable condition into its readiness logic.
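The suggested fix boils down to treating `PayloadSchedulable` as a third input to the readiness aggregation. A minimal, self-contained sketch of that logic follows; the `Condition` struct and the function body are simplified stand-ins (the real types live in `api/v1alpha1` and `cluster_status.go`), shown only to illustrate that `Ready` must be false whenever any of the three conditions is false.

```go
package main

import "fmt"

// Condition is a simplified stand-in for the Greenhouse condition type;
// only the fields needed for this sketch are included.
type Condition struct {
	Type    string
	Status  bool
	Message string
}

// reconcileReadyStatus sketches the reviewed fix: Ready is only true when
// KubeConfigValid, ResourcesDeployed, AND PayloadSchedulable all hold.
func reconcileReadyStatus(kubeConfigValid, resourcesDeployed, payloadSchedulable Condition) Condition {
	ready := Condition{Type: "Ready", Status: true}
	for _, c := range []Condition{kubeConfigValid, resourcesDeployed, payloadSchedulable} {
		if !c.Status {
			ready.Status = false
			ready.Message = fmt.Sprintf("%s is false: %s", c.Type, c.Message)
			break
		}
	}
	return ready
}

func main() {
	ready := reconcileReadyStatus(
		Condition{Type: "KubeConfigValid", Status: true},
		Condition{Type: "ResourcesDeployed", Status: true},
		Condition{Type: "PayloadSchedulable", Status: false, Message: "all worker nodes cordoned"},
	)
	fmt.Println(ready.Status, ready.Message)
	// prints: false PayloadSchedulable is false: all worker nodes cordoned
}
```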
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/greenhouse/controllers.go`:
- Around line 45-47: The controller key names were changed which can break
existing enabled/disabled config; add backward-compatible alias entries mapping
the old keys to the same setup functions so both names work for at least one
release: duplicate the map entries so the old key strings point to the same
handlers (use (&clustercontrollers.BootstrapReconciler{}).SetupWithManager,
startClusterReconciler, and
(&clustercontrollers.KubeconfigReconciler{}).SetupWithManager) or implement a
small migration resolver that translates legacy keys to the new ones before
reading configuration.
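The "migration resolver" alternative mentioned above can be sketched without any controller-runtime machinery: keep one map of current keys to setup functions plus a small alias map from legacy names, consulted before rejecting a key. The key names here ("bootstrap", "clusterBootstrap") are invented for illustration; the actual old and new key strings live in `cmd/greenhouse/controllers.go`.

```go
package main

import "fmt"

// setupFunc stands in for a controller's SetupWithManager registration hook.
type setupFunc func() error

// resolveController looks up a controller key, falling back to a legacy-alias
// table so configs written against the old names keep working for a release.
func resolveController(key string, controllers map[string]setupFunc, aliases map[string]string) (setupFunc, error) {
	if fn, ok := controllers[key]; ok {
		return fn, nil
	}
	// Translate a legacy key to its new name before giving up.
	if newKey, ok := aliases[key]; ok {
		if fn, ok := controllers[newKey]; ok {
			return fn, nil
		}
	}
	return nil, fmt.Errorf("unknown controller key %q", key)
}

func main() {
	controllers := map[string]setupFunc{
		"clusterBootstrap": func() error { return nil }, // hypothetical new key
	}
	aliases := map[string]string{
		"bootstrap": "clusterBootstrap", // hypothetical old key -> new key
	}
	_, errNew := resolveController("clusterBootstrap", controllers, aliases)
	_, errOld := resolveController("bootstrap", controllers, aliases)
	fmt.Println(errNew == nil, errOld == nil)
	// prints: true true
}
```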
In `@e2e/cluster/e2e_test.go`:
- Around line 278-301: The two asynchronous assertions using Eventually(func(g
Gomega) { ... }) (the blocks that fetch the Cluster by name
remoteClusterNodeName in namespace env.TestNamespace and check the
PayloadSchedulable condition) are missing the terminal matcher; append
.Should(Succeed()) to both Eventually(...) invocations so the inner Gomega
assertions are executed with polling and retries.
In `@e2e/cluster/expect/expect.go`:
- Around line 142-160: The cordon/uncordon helpers currently affect any
unschedulable nodes; modify CordonRemoteNodes to mark nodes it cordons with a
distinctive annotation (e.g.,
"e2e.k8s.io/cordoned-by":"expect.CordonRemoteNodes") and apply a single Patch
that sets node.Spec.Unschedulable=true and adds that annotation (skip
control-plane and already-annotated nodes), and modify the corresponding
Uncordon helper to only revert nodes that contain that exact annotation (unset
node.Spec.Unschedulable and remove the annotation with a Patch using
MergeFrom(base)); this scopes uncordon to nodes cordoned by these helpers and
prevents touching pre-existing unschedulable nodes.
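The scoping logic the review asks for can be separated from the client-go patch plumbing. A self-contained sketch of the decision rules follows; the `Node` struct is a stand-in for `corev1.Node`, and the annotation key/value are the hypothetical ones suggested in the comment.

```go
package main

import "fmt"

// Node mirrors just the corev1.Node fields these helpers need.
type Node struct {
	Name          string
	Unschedulable bool
	Annotations   map[string]string
	ControlPlane  bool
}

// Hypothetical marker annotation from the review suggestion.
const cordonedByKey = "e2e.k8s.io/cordoned-by"
const cordonedByVal = "expect.CordonRemoteNodes"

// shouldCordon skips control-plane nodes and nodes already marked by us.
func shouldCordon(n Node) bool {
	return !n.ControlPlane && n.Annotations[cordonedByKey] != cordonedByVal
}

// cordon marks the node unschedulable and records who cordoned it
// (in the real helper this would be one Patch against the cluster).
func cordon(n *Node) {
	if n.Annotations == nil {
		n.Annotations = map[string]string{}
	}
	n.Unschedulable = true
	n.Annotations[cordonedByKey] = cordonedByVal
}

// uncordon only reverts nodes carrying the exact marker annotation, so
// pre-existing unschedulable nodes are left untouched.
func uncordon(n *Node) {
	if n.Annotations[cordonedByKey] != cordonedByVal {
		return
	}
	n.Unschedulable = false
	delete(n.Annotations, cordonedByKey)
}

func main() {
	worker := Node{Name: "worker-1"}
	preexisting := Node{Name: "worker-2", Unschedulable: true} // cordoned by someone else
	if shouldCordon(worker) {
		cordon(&worker)
	}
	uncordon(&worker)
	uncordon(&preexisting)
	fmt.Println(worker.Unschedulable, preexisting.Unschedulable)
	// prints: false true
}
```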
In `@internal/controller/cluster/cluster_status.go`:
- Around line 266-268: The code currently always sets LastTransitionTime =
metav1.Now() when node.Spec.Unschedulable, causing spurious status churn; modify
the node.Spec.Unschedulable branch (the place creating a
greenhousev1alpha1.NodeStatus for unschedulable nodes) to preserve the existing
LastTransitionTime if the node was already marked unschedulable (e.g., check the
prior status entry for this node or compare the previous status message to "Node
is unschedulable") and only set LastTransitionTime = metav1.Now() when
transitioning from a different state to unschedulable; keep the rest of the
NodeStatus fields the same.
---
Duplicate comments:
In `@internal/controller/cluster/cluster_status.go`:
- Around line 57-76: The Ready computation ignores payloadSchedulable so Ready
can be true even when payloads cannot be scheduled; update reconcileReadyStatus
usage to include the payloadSchedulable condition (or change
reconcileReadyStatus signature to accept a third condition) and ensure
reconcileReadyStatus considers greenhousev1alpha1.PayloadSchedulable alongside
kubeConfigValidCondition and resourcesDeployedCondition when producing the Ready
condition; modify the call site where readyCondition is set (the readyCondition
:= r.reconcileReadyStatus(...) line) and update the reconcileReadyStatus
implementation to incorporate the PayloadSchedulable condition into its
readiness logic.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- api/v1alpha1/cluster_types.go
- charts/manager/crds/greenhouse.sap_clusters.yaml
- cmd/greenhouse/controllers.go
- docs/reference/api/index.html
- docs/reference/api/openapi.yaml
- e2e/cluster/e2e_test.go
- e2e/cluster/expect/expect.go
- e2e/shared/cluster.go
- internal/controller/cluster/cluster_status.go
- internal/controller/cluster/status_test.go
💤 Files with no reviewable changes (2)
- charts/manager/crds/greenhouse.sap_clusters.yaml
- docs/reference/api/openapi.yaml
Commits:
- (chore): set ready false when all nodes are not ready
- (chore): e2e test node not ready
- (chore): set not ready only when all nodes are not ready
- (chore): set PayloadSchedulable condition
- Automatic generation of CRD API Docs
- (chore): fix eventually
- (chore): remove control-plane node check
- (chore): fix flake test
- (chore): skip control plane nodes
Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>
Description
Set Cluster Ready status to False when all remote cluster Nodes are Unschedulable.
What type of PR is this? (check all applicable)
Related Tickets & Documents
Added tests?
e2e test to verify cluster status when nodes are Unschedulable
Added to documentation?
Checklist
Summary by CodeRabbit
New Features
Refactor
Tests