feat(clusters): cluster shows PayloadSchedulable=False when all worker nodes are un-schedulable / NotReady by abhijith-darshan · Pull Request #1760 · cloudoperators/greenhouse

abhijith-darshan · 2026-01-30T13:33:41Z

Description

Set Cluster Ready status to False when all remote cluster Nodes are Unschedulable

What type of PR is this? (check all applicable)

Related Tickets & Documents

Related Issue [SPIKE] - PayloadSchedulable Condition for Clusters #1714

Added tests?

👍 yes
🙅 no, because they aren't needed
🙋 no, because I need help
Separate ticket for tests # (issue/pr)

e2e test to verify cluster status when nodes are Unschedulable

Added to documentation?

📜 README.md
🤝 Documentation pages updated
🙅 no documentation needed
(if applicable) generated OpenAPI docs for CRD changes

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
New and existing unit tests pass locally with my changes

Summary by CodeRabbit

New Features
- Introduced PayloadSchedulable condition type for cluster status monitoring.
- Control-plane nodes now excluded from readiness calculations.
- Unschedulable nodes tracked separately in cluster status.
Refactor
- Improved cluster readiness evaluation logic for better status accuracy.
- Updated controller naming for consistency.
Tests
- Added comprehensive end-to-end tests for cluster node readiness and state transitions.

internal/controller/cluster/cluster_status.go

coderabbitai · 2026-03-03T14:12:34Z

📝 Walkthrough

Changes introduce a PayloadSchedulable condition to track cluster node scheduling readiness, convert Node counter fields from int32 to int in the API schema, refactor status reconciliation to exclude control-plane nodes from readiness calculations, update CRD definitions and documentation, rename controller registration keys, and extend E2E tests with node cordoning/uncordoning utilities and new test scenarios.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Type Definitions & Schema `api/v1alpha1/cluster_types.go`, `charts/manager/crds/greenhouse.sap_clusters.yaml`, `docs/reference/api/openapi.yaml`, `docs/reference/api/index.html` | Convert Node.Total and Node.Ready fields from int32 to int; add PayloadSchedulable condition type constant; remove int32 format specifications from CRD and OpenAPI schemas; update API documentation. |
| Controller Registration `cmd/greenhouse/controllers.go` | Rename controller keys: "bootStrap" → "bootstrap" and "clusterReconciler" → "cluster" in the knownControllers map. |
| Status Reconciliation Logic `internal/controller/cluster/cluster_status.go` | Introduce control-plane node filtering constant; add PayloadSchedulable condition initialization and reconciliation logic; exclude control-plane nodes from node counting; mark PayloadSchedulable as false when kubeconfig is invalid or no nodes are ready; add helper for control-plane node detection. |
| Status Tests `internal/controller/cluster/status_test.go` | Update numeric field type expectations from int32 to int; rewrite Gomega assertions to use non-nested matchers. |
| E2E Tests & Helpers `e2e/cluster/e2e_test.go`, `e2e/cluster/expect/expect.go` | Add new test context for cluster node readiness with cordoning/uncordoning; update ReconcileReadyNotReady to use annotations instead of labels; introduce CordonRemoteNodes and UnCordonRemoteNodes public helper functions. |
| Shared Test Utilities `e2e/shared/cluster.go` | Add conditional logging for ClusterRoleBinding lookup errors in OffBoardRemoteCluster. |
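The status reconciliation row above can be illustrated with a minimal sketch. It is not the PR's code: the `node` struct is a hypothetical stand-in for `corev1.Node`, the label `node-role.kubernetes.io/control-plane` is assumed to be what the filter checks, and treating a cordoned node as not ready for payloads is likewise an assumption.

```go
package main

import "fmt"

// controlPlaneLabel is the well-known role label on control-plane nodes.
// Assumption: the PR's filtering constant refers to this label.
const controlPlaneLabel = "node-role.kubernetes.io/control-plane"

// node is a hypothetical, trimmed-down stand-in for corev1.Node.
type node struct {
	Name          string
	Labels        map[string]string
	Ready         bool // stands in for the NodeReady condition being True
	Unschedulable bool // stands in for node.Spec.Unschedulable
}

// isControlPlane reports whether the node carries the control-plane role label.
func isControlPlane(n node) bool {
	_, ok := n.Labels[controlPlaneLabel]
	return ok
}

// payloadSchedulable counts worker nodes only. It returns the number of
// workers, the number that are Ready and schedulable, and whether at least
// one worker can accept payloads.
func payloadSchedulable(nodes []node) (total, ready int, schedulable bool) {
	for _, n := range nodes {
		if isControlPlane(n) {
			continue // control-plane nodes do not count toward readiness
		}
		total++
		if n.Ready && !n.Unschedulable {
			ready++
		}
	}
	return total, ready, ready > 0
}

func main() {
	nodes := []node{
		{Name: "cp-0", Labels: map[string]string{controlPlaneLabel: ""}, Ready: true},
		{Name: "worker-0", Ready: true, Unschedulable: true}, // cordoned
		{Name: "worker-1", Ready: false},                     // NotReady
	}
	total, ready, ok := payloadSchedulable(nodes)
	fmt.Printf("total=%d ready=%d payloadSchedulable=%v\n", total, ready, ok)
}
```

With one healthy control-plane node, one cordoned worker, and one NotReady worker, the sketch reports two workers, none ready, and a false condition.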

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Nodes now count without the format chains so tight,
Control-plane friends step back, let payloads take flight,
PayloadSchedulable blooms, from false to true,
Cordons dance on remote clusters, old tests born anew,
Bootstrap spelled right, the controller keys align—
A tidy refactor, everything in line! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: introducing PayloadSchedulable condition that shows False when worker nodes are unschedulable, directly addressing the PR's primary objective. |
| Description check | ✅ Passed | The description follows the template structure with all required sections completed: clear description of changes, feature type selected, related issue referenced (`#1714`), tests confirmed added, documentation status marked, and checklist items completed. |


coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (1)

internal/controller/cluster/cluster_status.go (1)

57-76: ⚠️ Potential issue | 🔴 Critical

Wire PayloadSchedulable into the Ready computation.

PayloadSchedulable is produced but not consumed by reconcileReadyStatus, so cluster Ready can stay true when all worker nodes are unschedulable.

🔧 Minimal fix

-		readyCondition := r.reconcileReadyStatus(kubeConfigValidCondition, resourcesDeployedCondition)
+		readyCondition := r.reconcileReadyStatus(kubeConfigValidCondition, resourcesDeployedCondition, payloadSchedulable)

Also applies to: 87-87
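A minimal sketch of what the reviewer's suggestion amounts to, assuming a simplified `condition` struct instead of the project's condition type. The helper `readyFrom` and the condition type strings other than PayloadSchedulable are hypothetical, and whether Ready should depend on PayloadSchedulable at all remains a design decision for the PR.

```go
package main

import "fmt"

// condition is a hypothetical stand-in for the project's condition type.
type condition struct {
	Type    string
	Status  bool
	Message string
}

// readyFrom returns Ready=true only when every input condition is true.
// Otherwise it returns Ready=false and names the first failing condition.
func readyFrom(conds ...condition) condition {
	for _, c := range conds {
		if !c.Status {
			return condition{Type: "Ready", Status: false, Message: c.Type + ": " + c.Message}
		}
	}
	return condition{Type: "Ready", Status: true}
}

func main() {
	// The first two type names are illustrative placeholders.
	kubeConfigValid := condition{Type: "KubeConfigValid", Status: true}
	resourcesDeployed := condition{Type: "ResourcesDeployed", Status: true}
	payloadSchedulable := condition{
		Type:    "PayloadSchedulable",
		Status:  false,
		Message: "all worker nodes are unschedulable",
	}

	// Two inputs: Ready stays true even though no payload can be scheduled.
	fmt.Println(readyFrom(kubeConfigValid, resourcesDeployed).Status)

	// Three inputs: the failing PayloadSchedulable condition flips Ready to false.
	ready := readyFrom(kubeConfigValid, resourcesDeployed, payloadSchedulable)
	fmt.Println(ready.Status, ready.Message)
}
```

A variadic signature keeps existing two-argument call sites compiling, which may be convenient if the change is rolled out incrementally.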

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@internal/controller/cluster/cluster_status.go` around lines 57 - 76, The
Ready computation ignores payloadSchedulable so Ready can be true even when
payloads cannot be scheduled; update reconcileReadyStatus usage to include the
payloadSchedulable condition (or change reconcileReadyStatus signature to accept
a third condition) and ensure reconcileReadyStatus considers
greenhousev1alpha1.PayloadSchedulable alongside kubeConfigValidCondition and
resourcesDeployedCondition when producing the Ready condition; modify the call
site where readyCondition is set (the readyCondition :=
r.reconcileReadyStatus(...) line) and update the reconcileReadyStatus
implementation to incorporate the PayloadSchedulable condition into its
readiness logic.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/greenhouse/controllers.go`:
- Around line 45-47: The controller key names were changed which can break
existing enabled/disabled config; add backward-compatible alias entries mapping
the old keys to the same setup functions so both names work for at least one
release: duplicate the map entries so the old key strings point to the same
handlers (use (&clustercontrollers.BootstrapReconciler{}).SetupWithManager,
startClusterReconciler, and
(&clustercontrollers.KubeconfigReconciler{}).SetupWithManager) or implement a
small migration resolver that translates legacy keys to the new ones before
reading configuration.
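The "migration resolver" option mentioned above could look roughly like the following sketch. The old and new key names come from the walkthrough; the map `legacyControllerKeys` and the function `resolveControllerKey` are hypothetical names, and how the project actually reads its enabled/disabled controller list is not shown in the PR page.

```go
package main

import "fmt"

// legacyControllerKeys maps the old registration keys to their renamed
// counterparts so existing configuration keeps working.
var legacyControllerKeys = map[string]string{
	"bootStrap":         "bootstrap",
	"clusterReconciler": "cluster",
}

// resolveControllerKey translates a legacy key to its new name and leaves
// every other key untouched.
func resolveControllerKey(key string) string {
	if renamed, ok := legacyControllerKeys[key]; ok {
		return renamed
	}
	return key
}

func main() {
	for _, k := range []string{"bootStrap", "clusterReconciler", "bootstrap", "cluster"} {
		fmt.Printf("%s -> %s\n", k, resolveControllerKey(k))
	}
}
```

Translating keys before the lookup avoids duplicate entries in the controller map, at the cost of one extra place to clean up once the legacy names are retired.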

In `@e2e/cluster/e2e_test.go`:
- Around line 278-301: The two asynchronous assertions using Eventually(func(g
Gomega) { ... }) (the blocks that fetch the Cluster by name
remoteClusterNodeName in namespace env.TestNamespace and check the
PayloadSchedulable condition) are missing the terminal matcher; append
.Should(Succeed()) to both Eventually(...) invocations so the inner Gomega
assertions are executed with polling and retries.

In `@e2e/cluster/expect/expect.go`:
- Around line 142-160: The cordon/uncordon helpers currently affect any
unschedulable nodes; modify CordonRemoteNodes to mark nodes it cordons with a
distinctive annotation (e.g.,
"e2e.k8s.io/cordoned-by":"expect.CordonRemoteNodes") and apply a single Patch
that sets node.Spec.Unschedulable=true and adds that annotation (skip
control-plane and already-annotated nodes), and modify the corresponding
Uncordon helper to only revert nodes that contain that exact annotation (unset
node.Spec.Unschedulable and remove the annotation with a Patch using
MergeFrom(base)); this scopes uncordon to nodes cordoned by these helpers and
prevents touching pre-existing unschedulable nodes.
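The scoping idea can be sketched without a Kubernetes client. The `node` struct, `cordon`, and `uncordon` below are hypothetical in-memory stand-ins for the real helpers, which would issue a Patch against the remote cluster; the annotation key and value are the ones proposed in the comment. Skipping nodes that are already unschedulable is an assumption about the intended behavior.

```go
package main

import "fmt"

const (
	controlPlaneLabel = "node-role.kubernetes.io/control-plane"
	cordonedByKey     = "e2e.k8s.io/cordoned-by"
	cordonedByValue   = "expect.CordonRemoteNodes"
)

// node is a hypothetical, trimmed-down stand-in for corev1.Node.
type node struct {
	Name          string
	Labels        map[string]string
	Annotations   map[string]string
	Unschedulable bool
}

// cordon marks worker nodes unschedulable and records ownership in an
// annotation. Control-plane nodes and nodes that were already cordoned by
// someone else are left alone.
func cordon(nodes []*node) {
	for _, n := range nodes {
		if _, isControlPlane := n.Labels[controlPlaneLabel]; isControlPlane {
			continue
		}
		if n.Unschedulable {
			continue // pre-existing cordon: do not claim it
		}
		if n.Annotations == nil {
			n.Annotations = map[string]string{}
		}
		n.Annotations[cordonedByKey] = cordonedByValue
		n.Unschedulable = true
	}
}

// uncordon reverts only the nodes that carry the ownership annotation.
func uncordon(nodes []*node) {
	for _, n := range nodes {
		if n.Annotations[cordonedByKey] != cordonedByValue {
			continue
		}
		delete(n.Annotations, cordonedByKey)
		n.Unschedulable = false
	}
}

func main() {
	nodes := []*node{
		{Name: "worker-0"},
		{Name: "worker-1", Unschedulable: true}, // cordoned before the test ran
	}
	cordon(nodes)
	uncordon(nodes)
	for _, n := range nodes {
		fmt.Printf("%s unschedulable=%v\n", n.Name, n.Unschedulable)
	}
}
```

After a cordon/uncordon round trip, `worker-0` is schedulable again while `worker-1` keeps its pre-existing cordon.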

In `@internal/controller/cluster/cluster_status.go`:
- Around line 266-268: The code currently always sets LastTransitionTime =
metav1.Now() when node.Spec.Unschedulable, causing spurious status churn; modify
the node.Spec.Unschedulable branch (the place creating a
greenhousev1alpha1.NodeStatus for unschedulable nodes) to preserve the existing
LastTransitionTime if the node was already marked unschedulable (e.g., check the
prior status entry for this node or compare the previous status message to "Node
is unschedulable") and only set LastTransitionTime = metav1.Now() when
transitioning from a different state to unschedulable; keep the rest of the
NodeStatus fields the same.


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b6cf86 and bba2bd2.

📒 Files selected for processing (10)

api/v1alpha1/cluster_types.go
charts/manager/crds/greenhouse.sap_clusters.yaml
cmd/greenhouse/controllers.go
docs/reference/api/index.html
docs/reference/api/openapi.yaml
e2e/cluster/e2e_test.go
e2e/cluster/expect/expect.go
e2e/shared/cluster.go
internal/controller/cluster/cluster_status.go
internal/controller/cluster/status_test.go

💤 Files with no reviewable changes (2)

charts/manager/crds/greenhouse.sap_clusters.yaml
docs/reference/api/openapi.yaml

cmd/greenhouse/controllers.go

e2e/cluster/e2e_test.go

e2e/cluster/expect/expect.go

internal/controller/cluster/cluster_status.go

Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>
(chore): set ready false when all nodes are not ready
Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>
(chore): e2e test node not ready
Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>
(chore): set not ready only when all nodes are not ready
Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>
(chore): set PayloadSchedulable condition
Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>
Automatic generation of CRD API Docs
(chore): fix eventually
Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>
(chore): remove control-plane node check
Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>
(chore): fix flake test
Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>
(chore): skip control plane nodes
Signed-off-by: abhijith-darshan <abhijith.ravindra@sap.com>

abhijith-darshan requested a review from a team as a code owner January 30, 2026 13:33

github-actions bot added the size/L label Jan 30, 2026

uwe-mayer requested changes Jan 30, 2026

internal/controller/cluster/cluster_status.go (outdated, resolved)

abhijith-darshan force-pushed the spike/1714 branch from 7f657cc to c03284a Compare January 30, 2026 14:01

abhijith-darshan requested a review from uwe-mayer January 30, 2026 14:01

abhijith-darshan force-pushed the spike/1714 branch from c03284a to eafedb2 Compare January 30, 2026 15:27

github-actions bot added core-apis helm-charts labels Jan 30, 2026

abhijith-darshan changed the title ~~feat(clusters): cluster shows Ready=False when all worker nodes are un-schedulable~~ feat(clusters): cluster shows PayloadSchedulable=False when all worker nodes are un-schedulable / NotReady Jan 30, 2026

github-actions bot added the documentation Improvements or additions to documentation label Jan 30, 2026

abhijith-darshan force-pushed the spike/1714 branch from db51886 to 7112676 Compare February 2, 2026 23:03

uwe-mayer linked an issue Feb 4, 2026 that may be closed by this pull request

[SPIKE] - PayloadSchedulable Condition for Clusters #1714

Open

1 task

abhijith-darshan force-pushed the spike/1714 branch 2 times, most recently from 9b334da to 991503c Compare February 4, 2026 13:34

abhijith-darshan force-pushed the spike/1714 branch 3 times, most recently from a91d627 to 77b4cc7 Compare February 23, 2026 13:38

abhijith-darshan force-pushed the spike/1714 branch from 77b4cc7 to bba2bd2 Compare March 3, 2026 14:12

coderabbitai bot reviewed Mar 3, 2026

cmd/greenhouse/controllers.go (resolved)

e2e/cluster/e2e_test.go (outdated, resolved)

e2e/cluster/expect/expect.go (resolved)

internal/controller/cluster/cluster_status.go (resolved)

uwe-mayer previously approved these changes Mar 18, 2026

abhijith-darshan dismissed uwe-mayer’s stale review via b609df8 March 20, 2026 10:18

abhijith-darshan force-pushed the spike/1714 branch 3 times, most recently from c178db3 to 9406fa2 Compare March 20, 2026 13:34

abhijith-darshan force-pushed the spike/1714 branch from 9406fa2 to daa850e Compare March 20, 2026 15:46

abhijith-darshan wants to merge 1 commit into main from spike/1714

2 participants
