feat(clusters): cluster shows PayloadSchedulable=False when all worker nodes are un-schedulable / NotReady#1760

Open
abhijith-darshan wants to merge 1 commit into main from spike/1714
Conversation

@abhijith-darshan abhijith-darshan commented Jan 30, 2026

Description

Set Cluster Ready status to False when all remote cluster Nodes are Unschedulable

What type of PR is this? (check all applicable)

  • 🍕 Feature
  • 🐛 Bug Fix
  • 📝 Documentation Update
  • 🎨 Style
  • 🧑‍💻 Code Refactor
  • 🔥 Performance Improvements
  • ✅ Test
  • 🤖 Build
  • 🔁 CI
  • 📦 Chore (Release)
  • ⏩ Revert

Related Tickets & Documents

Added tests?

  • 👍 yes
  • 🙅 no, because they aren't needed
  • 🙋 no, because I need help
  • Separate ticket for tests # (issue/pr)

e2e test to verify cluster status when nodes are Unschedulable

Added to documentation?

  • 📜 README.md
  • 🤝 Documentation pages updated
  • 🙅 no documentation needed
  • (if applicable) generated OpenAPI docs for CRD changes

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes

Summary by CodeRabbit

  • New Features

    • Introduced PayloadSchedulable condition type for cluster status monitoring.
    • Control-plane nodes now excluded from readiness calculations.
    • Unschedulable nodes tracked separately in cluster status.
  • Refactor

    • Improved cluster readiness evaluation logic for better status accuracy.
    • Updated controller naming for consistency.
  • Tests

    • Added comprehensive end-to-end tests for cluster node readiness and state transitions.

@abhijith-darshan abhijith-darshan requested a review from a team as a code owner January 30, 2026 13:33
@abhijith-darshan abhijith-darshan changed the title feat(clusters): cluster shows Ready=False when all worker nodes are un-schedulable feat(clusters): cluster shows PayloadSchedulable=False when all worker nodes are un-schedulable / NotReady Jan 30, 2026
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 30, 2026
@uwe-mayer uwe-mayer linked an issue Feb 4, 2026 that may be closed by this pull request
1 task
@abhijith-darshan abhijith-darshan force-pushed the spike/1714 branch 2 times, most recently from 9b334da to 991503c Compare February 4, 2026 13:34
@abhijith-darshan abhijith-darshan force-pushed the spike/1714 branch 3 times, most recently from a91d627 to 77b4cc7 Compare February 23, 2026 13:38
coderabbitai bot commented Mar 3, 2026

Walkthrough

Changes introduce a PayloadSchedulable condition to track cluster node scheduling readiness, convert Node counter fields from int32 to int in the API schema, refactor status reconciliation to exclude control-plane nodes from readiness calculations, update CRD definitions and documentation, rename controller registration keys, and extend E2E tests with node cordoning/uncordoning utilities and new test scenarios.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Type Definitions & Schema: api/v1alpha1/cluster_types.go, charts/manager/crds/greenhouse.sap_clusters.yaml, docs/reference/api/openapi.yaml, docs/reference/api/index.html | Convert Node.Total and Node.Ready fields from int32 to int; add PayloadSchedulable condition type constant; remove int32 format specifications from CRD and OpenAPI schemas; update API documentation. |
| Controller Registration: cmd/greenhouse/controllers.go | Rename controller keys: "bootStrap" → "bootstrap" and "clusterReconciler" → "cluster" in the knownControllers map. |
| Status Reconciliation Logic: internal/controller/cluster/cluster_status.go | Introduce control-plane node filtering constant; add PayloadSchedulable condition initialization and reconciliation logic; exclude control-plane nodes from node counting; mark PayloadSchedulable as false when kubeconfig is invalid or no nodes are ready; add helper for control-plane node detection. |
| Status Tests: internal/controller/cluster/status_test.go | Update numeric field type expectations from int32 to int; rewrite Gomega assertions to use non-nested matchers. |
| E2E Tests & Helpers: e2e/cluster/e2e_test.go, e2e/cluster/expect/expect.go | Add new test context for cluster node readiness with cordoning/uncordoning; update ReconcileReadyNotReady to use annotations instead of labels; introduce CordonRemoteNodes and UnCordonRemoteNodes public helper functions. |
| Shared Test Utilities: e2e/shared/cluster.go | Add conditional logging for ClusterRoleBinding lookup errors in OffBoardRemoteCluster. |
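To make the status reconciliation row concrete, here is a minimal sketch of how worker nodes might be counted while control-plane nodes are skipped. It is not the PR's code: the `node` struct is a simplified stand-in for `corev1.Node`, the function names are hypothetical, and it assumes that a cordoned node does not count as ready. The label key `node-role.kubernetes.io/control-plane` is the well-known Kubernetes role label.

```go
package main

import "fmt"

// controlPlaneLabel is the well-known Kubernetes role label for control-plane nodes.
const controlPlaneLabel = "node-role.kubernetes.io/control-plane"

// node is a simplified, hypothetical stand-in for corev1.Node.
type node struct {
	Name          string
	Labels        map[string]string
	Ready         bool
	Unschedulable bool
}

// isControlPlane reports whether the node carries the control-plane role label.
func isControlPlane(n node) bool {
	_, ok := n.Labels[controlPlaneLabel]
	return ok
}

// payloadSchedulable counts worker nodes only and reports whether at least
// one of them is both Ready and schedulable (assumption: cordoned nodes are
// not counted as ready).
func payloadSchedulable(nodes []node) (total, ready int, schedulable bool) {
	for _, n := range nodes {
		if isControlPlane(n) {
			continue // control-plane nodes do not run payloads
		}
		total++
		if n.Ready && !n.Unschedulable {
			ready++
		}
	}
	return total, ready, ready > 0
}

func main() {
	nodes := []node{
		{Name: "cp-0", Labels: map[string]string{controlPlaneLabel: ""}, Ready: true},
		{Name: "worker-0", Ready: true, Unschedulable: true},
		{Name: "worker-1", Ready: false},
	}
	total, ready, ok := payloadSchedulable(nodes)
	fmt.Printf("total=%d ready=%d payloadSchedulable=%t\n", total, ready, ok)
}
```

With every worker either cordoned or NotReady, the sketch reports no schedulable capacity even though the control-plane node is healthy, which is the situation the PR title describes.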

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Nodes now count without the format chains so tight,
Control-plane friends step back, let payloads take flight,
PayloadSchedulable blooms, from false to true,
Cordons dance on remote clusters, old tests born anew,
Bootstrap spelled right, the controller keys align—
A tidy refactor, everything in line! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: introducing PayloadSchedulable condition that shows False when worker nodes are unschedulable, directly addressing the PR's primary objective. |
| Description check | ✅ Passed | The description follows the template structure with all required sections completed: clear description of changes, feature type selected, related issue referenced (#1714), tests confirmed added, documentation status marked, and checklist items completed. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (1)
internal/controller/cluster/cluster_status.go (1)

57-76: ⚠️ Potential issue | 🔴 Critical

Wire PayloadSchedulable into the Ready computation.

PayloadSchedulable is produced but not consumed by reconcileReadyStatus, so cluster Ready can stay true when all worker nodes are unschedulable.

🔧 Minimal fix
-		readyCondition := r.reconcileReadyStatus(kubeConfigValidCondition, resourcesDeployedCondition)
+		readyCondition := r.reconcileReadyStatus(kubeConfigValidCondition, resourcesDeployedCondition, payloadSchedulable)

Also applies to: 87-87
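If the maintainers decide that Ready should depend on PayloadSchedulable, the suggested change could look roughly like the sketch below. It is an illustration only: the `condition` struct is a simplified stand-in for the project's condition type, the condition type strings are illustrative, and the variadic signature is an assumption rather than the actual `reconcileReadyStatus` signature.

```go
package main

import "fmt"

// condition is a simplified, hypothetical stand-in for the project's condition type.
type condition struct {
	Type    string
	Status  bool
	Message string
}

// reconcileReadyStatus derives Ready from all supplied conditions: Ready is
// true only if every input is true, and the first failing input is reported
// in the message.
func reconcileReadyStatus(conds ...condition) condition {
	for _, c := range conds {
		if !c.Status {
			return condition{
				Type:    "Ready",
				Status:  false,
				Message: c.Type + " is false: " + c.Message,
			}
		}
	}
	return condition{Type: "Ready", Status: true}
}

func main() {
	kubeConfigValidCondition := condition{Type: "KubeConfigValid", Status: true}
	resourcesDeployedCondition := condition{Type: "ResourcesDeployed", Status: true}
	payloadSchedulable := condition{
		Type:    "PayloadSchedulable",
		Status:  false,
		Message: "all worker nodes are unschedulable",
	}

	ready := reconcileReadyStatus(kubeConfigValidCondition, resourcesDeployedCondition, payloadSchedulable)
	fmt.Printf("Ready=%t (%s)\n", ready.Status, ready.Message)
}
```

Keeping PayloadSchedulable as an independent condition, as the retitled PR suggests, is an equally valid design; the sketch only shows what coupling the two would involve.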

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/cluster/cluster_status.go` around lines 57 - 76, The
Ready computation ignores payloadSchedulable so Ready can be true even when
payloads cannot be scheduled; update reconcileReadyStatus usage to include the
payloadSchedulable condition (or change reconcileReadyStatus signature to accept
a third condition) and ensure reconcileReadyStatus considers
greenhousev1alpha1.PayloadSchedulable alongside kubeConfigValidCondition and
resourcesDeployedCondition when producing the Ready condition; modify the call
site where readyCondition is set (the readyCondition :=
r.reconcileReadyStatus(...) line) and update the reconcileReadyStatus
implementation to incorporate the PayloadSchedulable condition into its
readiness logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/greenhouse/controllers.go`:
- Around line 45-47: The controller key names were changed which can break
existing enabled/disabled config; add backward-compatible alias entries mapping
the old keys to the same setup functions so both names work for at least one
release: duplicate the map entries so the old key strings point to the same
handlers (use (&clustercontrollers.BootstrapReconciler{}).SetupWithManager,
startClusterReconciler, and
(&clustercontrollers.KubeconfigReconciler{}).SetupWithManager) or implement a
small migration resolver that translates legacy keys to the new ones before
reading configuration.
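The "small migration resolver" option mentioned above might be sketched as follows. The two key renames are taken from this PR; the map and function names are hypothetical, and how the resolved key is fed into the controller configuration is left open.

```go
package main

import "fmt"

// legacyControllerKeys maps the controller names used before this PR to the
// renamed keys. The variable name is hypothetical.
var legacyControllerKeys = map[string]string{
	"bootStrap":         "bootstrap",
	"clusterReconciler": "cluster",
}

// resolveControllerKey translates a legacy key to its new name and returns
// any other key unchanged.
func resolveControllerKey(name string) string {
	if renamed, ok := legacyControllerKeys[name]; ok {
		return renamed
	}
	return name
}

func main() {
	for _, name := range []string{"bootStrap", "clusterReconciler", "cluster"} {
		fmt.Printf("%s -> %s\n", name, resolveControllerKey(name))
	}
}
```

Resolving keys before the lookup keeps the knownControllers map free of duplicate entries, at the cost of one extra translation table that can be dropped after a deprecation period.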

In `@e2e/cluster/e2e_test.go`:
- Around line 278-301: The two asynchronous assertions using Eventually(func(g
Gomega) { ... }) (the blocks that fetch the Cluster by name
remoteClusterNodeName in namespace env.TestNamespace and check the
PayloadSchedulable condition) are missing the terminal matcher; append
.Should(Succeed()) to both Eventually(...) invocations so the inner Gomega
assertions are executed with polling and retries.

In `@e2e/cluster/expect/expect.go`:
- Around line 142-160: The cordon/uncordon helpers currently affect any
unschedulable nodes; modify CordonRemoteNodes to mark nodes it cordons with a
distinctive annotation (e.g.,
"e2e.k8s.io/cordoned-by":"expect.CordonRemoteNodes") and apply a single Patch
that sets node.Spec.Unschedulable=true and adds that annotation (skip
control-plane and already-annotated nodes), and modify the corresponding
Uncordon helper to only revert nodes that contain that exact annotation (unset
node.Spec.Unschedulable and remove the annotation with a Patch using
MergeFrom(base)); this scopes uncordon to nodes cordoned by these helpers and
prevents touching pre-existing unschedulable nodes.
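The scoping idea can be shown in memory without a Kubernetes client. In this sketch the `node` struct is a simplified stand-in for `corev1.Node`, the helper names are hypothetical, and the annotation key and value are the example ones from the comment above. Unlike the real helpers it does not issue a Patch; it only demonstrates which nodes would be touched. It also skips nodes that are already unschedulable, which is slightly stricter than skipping already-annotated nodes.

```go
package main

import "fmt"

const (
	controlPlaneLabel    = "node-role.kubernetes.io/control-plane"
	cordonedByAnnotation = "e2e.k8s.io/cordoned-by" // example key from the review comment
	cordonedByValue      = "expect.CordonRemoteNodes"
)

// node is a simplified, hypothetical stand-in for corev1.Node.
type node struct {
	Name          string
	Labels        map[string]string
	Annotations   map[string]string
	Unschedulable bool
}

// cordon marks schedulable worker nodes as unschedulable and records that
// this helper did it. Control-plane and already-cordoned nodes are skipped.
func cordon(nodes []node) {
	for i := range nodes {
		n := &nodes[i]
		if _, isControlPlane := n.Labels[controlPlaneLabel]; isControlPlane {
			continue
		}
		if n.Unschedulable {
			continue // cordoned by someone else; leave it alone
		}
		if n.Annotations == nil {
			n.Annotations = map[string]string{}
		}
		n.Annotations[cordonedByAnnotation] = cordonedByValue
		n.Unschedulable = true
	}
}

// uncordon reverts only the nodes that carry the marker annotation.
func uncordon(nodes []node) {
	for i := range nodes {
		n := &nodes[i]
		if n.Annotations[cordonedByAnnotation] != cordonedByValue {
			continue
		}
		delete(n.Annotations, cordonedByAnnotation)
		n.Unschedulable = false
	}
}

func main() {
	nodes := []node{
		{Name: "pre-cordoned", Unschedulable: true},
		{Name: "worker-0"},
	}
	cordon(nodes)
	uncordon(nodes)
	for _, n := range nodes {
		fmt.Printf("%s unschedulable=%t\n", n.Name, n.Unschedulable)
	}
}
```

After the round trip, the node that was cordoned before the test started is still cordoned, while the node cordoned by the helper is schedulable again.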

In `@internal/controller/cluster/cluster_status.go`:
- Around line 266-268: The code currently always sets LastTransitionTime =
metav1.Now() when node.Spec.Unschedulable, causing spurious status churn; modify
the node.Spec.Unschedulable branch (the place creating a
greenhousev1alpha1.NodeStatus for unschedulable nodes) to preserve the existing
LastTransitionTime if the node was already marked unschedulable (e.g., check the
prior status entry for this node or compare the previous status message to "Node
is unschedulable") and only set LastTransitionTime = metav1.Now() when
transitioning from a different state to unschedulable; keep the rest of the
NodeStatus fields the same.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b6cf86 and bba2bd2.

📒 Files selected for processing (10)
  • api/v1alpha1/cluster_types.go
  • charts/manager/crds/greenhouse.sap_clusters.yaml
  • cmd/greenhouse/controllers.go
  • docs/reference/api/index.html
  • docs/reference/api/openapi.yaml
  • e2e/cluster/e2e_test.go
  • e2e/cluster/expect/expect.go
  • e2e/shared/cluster.go
  • internal/controller/cluster/cluster_status.go
  • internal/controller/cluster/status_test.go
💤 Files with no reviewable changes (2)
  • charts/manager/crds/greenhouse.sap_clusters.yaml
  • docs/reference/api/openapi.yaml

uwe-mayer previously approved these changes Mar 18, 2026
Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

(chore): set ready false when all nodes are not ready

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

(chore): e2e test node not ready

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

(chore): set not ready only when all nodes are not ready

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

(chore): set PayloadSchedulable condition

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

Automatic generation of CRD API Docs

(chore): fix eventually

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

(chore): remove control-plane node check

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

(chore): fix flake test

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

(chore): skip control plane nodes

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>
Labels

core-apis documentation Improvements or additions to documentation helm-charts size/L

Development

Successfully merging this pull request may close these issues.

[SPIKE] - PayloadSchedulable Condition for Clusters

2 participants