@Cali0707
Member

Fixes #4604

Proposed Changes

  • Only create schedulers if there is an installed data plane statefulset

Release Note

fix: data plane dispatcher schedulers work when not all data plane statefulsets are installed

@knative-prow knative-prow bot requested review from aliok and matzew December 11, 2025 18:58
@Cali0707
Member Author

/cc @creydr @twoGiants

@knative-prow knative-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/control-plane labels Dec 11, 2025
@knative-prow knative-prow bot requested review from creydr and twoGiants December 11, 2025 18:58
@knative-prow
knative-prow bot commented Dec 11, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Cali0707

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 11, 2025
@codecov
codecov bot commented Dec 11, 2025

Codecov Report

❌ Patch coverage is 17.07317% with 68 lines in your changes missing coverage. Please review.
✅ Project coverage is 28.78%. Comparing base (42935d0) to head (594ffed).

Files with missing lines | Patch % | Lines
...l-plane/pkg/reconciler/consumergroup/controller.go | 17.07% | 64 Missing and 4 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4612   +/-   ##
=======================================
  Coverage   28.77%   28.78%           
=======================================
  Files         294      294           
  Lines       16169    16253   +84     
=======================================
+ Hits         4653     4678   +25     
- Misses      11063    11120   +57     
- Partials      453      455    +2     

@Cali0707
Member Author

/cherry-pick release-1.20

@knative-prow-robot
Contributor

@Cali0707: once the present PR merges, I will cherry-pick it on top of release-1.20 in a new PR and assign it to you.

Details

In response to this:

/cherry-pick release-1.20

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@Cali0707
Member Author

/cherry-pick release-1.19

@knative-prow-robot
Contributor

@Cali0707: once the present PR merges, I will cherry-pick it on top of release-1.19 in a new PR and assign it to you.

Details

In response to this:

/cherry-pick release-1.19

@Cali0707
Member Author

/retest-required

@Cali0707
Member Author

/hold

This failure seems related to the changes I made - will look into it

@knative-prow knative-prow bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 12, 2025
@Cali0707 Cali0707 force-pushed the fix-scheduler-errors branch from 1b1fd6d to 594ffed Compare December 15, 2025 14:28
@Cali0707
Member Author

/retest-required

@Cali0707
Member Author

/unhold

@knative-prow knative-prow bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 16, 2025
@Cali0707
Member Author

/test reconciler-tests-keda

Failure was unrelated - cert-manager did not become ready in time

Contributor

@twoGiants twoGiants left a comment

Thank you for the fix 😸 👍

I will continue with the review next week. It's more complex than it looks, so I wasn't able to finish. There are nuances here and there, and I left a few comments.

A regression unit test that captures the bug would be great, along with tests that cover all the new logic.

See my comments below.

logger := logging.FromContext(sm.ctx)
logger.Infow("Removing scheduler for deleted StatefulSet", zap.String("statefulset", ssName), zap.String("scheduler", schedulerKey))

delete(sm.schedulers, schedulerKey)
Contributor

If the scheduler is a leader, it is not demoted before it's removed. This will probably be an issue.
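
A minimal sketch of the ordering I have in mind, assuming the scheduler exposes some way to check and give up leadership. The leaderAware interface, IsLeader, Demote, removeScheduler, and fakeScheduler below are all hypothetical names for illustration, not the actual types in this PR:

```go
package main

import "sync"

// leaderAware is a hypothetical interface. The real scheduler type and
// its demotion method in this PR may look different.
type leaderAware interface {
	IsLeader() bool
	Demote()
}

// schedulerManager is a simplified stand-in for the manager that owns
// the schedulers map.
type schedulerManager struct {
	mu         sync.Mutex
	schedulers map[string]leaderAware
}

// removeScheduler demotes the scheduler if it currently holds leadership
// and only then drops it from the map. It reports whether an entry was
// removed.
func (sm *schedulerManager) removeScheduler(key string) bool {
	sm.mu.Lock()
	defer sm.mu.Unlock()

	s, ok := sm.schedulers[key]
	if !ok {
		return false
	}
	if s.IsLeader() {
		// Give up leadership before the scheduler becomes unreachable.
		s.Demote()
	}
	delete(sm.schedulers, key)
	return true
}

// fakeScheduler records whether Demote was called, for illustration only.
type fakeScheduler struct {
	leader  bool
	demoted bool
}

func (f *fakeScheduler) IsLeader() bool { return f.leader }

func (f *fakeScheduler) Demote() {
	f.leader = false
	f.demoted = true
}
```

The point is only the order of operations: demote first, delete second, both under the same lock so that nobody observes a removed scheduler that still believes it is the leader.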

for ssName := range statefulSetToSchedulerKey {
    if _, err := statefulSetLister.Get(ssName); err == nil {
        schedulerMgr.createSchedulerForStatefulSet(ssName)
    } else if !apierrors.IsNotFound(err) {
Contributor

This is a bit difficult to read. I would invert the error check here (else if apierrors.IsNotFound(err) {) and log the Infow message in that branch. On any other error I would not proceed but return an error; otherwise we continue with a partial setup.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

StatefulSet scheduler fails when not all dispatchers are installed

3 participants