Conversation
Not sure I quite understand the goals of this; it's already possible to support different schedulers via the pod specs (though KAI is the only gang scheduler currently working). I'd suggest kicking off work like this with a GitHub issue with plenty of detail, and a Discord discussion as well.
Sure, let me create a new issue to introduce this. Some background: this is a real request from one of our customers. We have some schedulers that want to integrate with Grove, and it would be great to have a unified scheduler backend so that we can support other schedulers easily. We need to support multiple schedulers as backends, and in particular the Kubernetes 1.34 Workload API. Once we have this backend framework, we can easily add support for new schedulers such as the default kube-scheduler or Koordinator. In this PR I will only introduce the scheduler backend framework. For the KAI scheduler backend, I won't change the current workflow, meaning KAI will still handle the PodGang and create PodGroups/Pods.
Ronkahn21
left a comment
Overall looks great! A few architectural points to consider:
- Controller Responsibility: I don't think the pcs-controller should be updating the PodGang status. Ideally, it should only handle the creation, leaving the podGang-controller to manage its own status.
- Scaling & Performance: We should discuss the PodGang pod reference fields. Adding this to the pcs-controller increases its complexity. For better scalability, it might be better to let the PodGroup own the pod status before we move toward creating the backend API.
Since the API changes are currently out of scope, we can sync on this later. Amazing job overall, thanks!
operator/internal/controller/podcliqueset/components/podgang/syncflow.go
```go
sort.Slice(podReferences, func(i, j int) bool {
	return podReferences[i].Name < podReferences[j].Name
})
```
We might need to change the API field to ignore order, to avoid sorting a big pod list on every sync.
That's a good idea, since we have a customer with over 500 pods. But in that case we won't have the pods in order; I'm not sure whether that is acceptable?
And if you do not sort it then there will be unnecessary updates to the PodGangs.
I will add a TODO here; we can do it in the future.
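As a sketch of the order-insensitive comparison suggested above: instead of sorting the (possibly 500+) pod references on every sync, the desired and observed references could be compared as multisets, so a mere reordering does not trigger an unnecessary PodGang update. All names below (`PodReference`, `equalIgnoringOrder`) are illustrative, not Grove's actual API.

```go
package main

import "fmt"

// PodReference stands in for the pod reference entry in the PodGang API
// (illustrative type, not the real one).
type PodReference struct {
	Name string
}

// equalIgnoringOrder reports whether two pod reference slices contain the
// same names regardless of order, in O(n) time using a counting map.
func equalIgnoringOrder(a, b []PodReference) bool {
	if len(a) != len(b) {
		return false
	}
	counts := make(map[string]int, len(a))
	for _, r := range a {
		counts[r.Name]++
	}
	for _, r := range b {
		counts[r.Name]--
		if counts[r.Name] < 0 {
			return false
		}
	}
	return true
}

func main() {
	desired := []PodReference{{Name: "pod-a"}, {Name: "pod-b"}}
	observed := []PodReference{{Name: "pod-b"}, {Name: "pod-a"}}
	// Same set, different order: no update would be issued.
	fmt.Println(equalIgnoringOrder(desired, observed))
}
```

On the API side this would correspond to declaring the list as set-like (order not significant), so reordering is never treated as a spec change in the first place.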
@kangclzjc please rebase your PR so that it becomes easier to review.
Signed-off-by: kangclzjc <kangz@nvidia.com>
Co-authored-by: Madhav Bhargava <madhav.bhargava@sap.com> Signed-off-by: Kang Zhang <100667394+kangclzjc@users.noreply.github.com>
What type of PR is this?
In order to support different schedulers as backends, we modify Grove and introduce a scheduler backend interface.
What this PR does / why we need it:
In the current PodGang component's sync flow, the Pods are created first and only then the `PodGang` resource. So you can see the `PodGang` will be created after the Pods. However, there is a problem with the upcoming `Workload` API support and the `kube-scheduler` backend.

We don't want to break the current `PodGang` workflow. We introduce this scheduler backend framework to leave the `Workload` management work to the scheduler backend in Grove. For other schedulers, the scheduler backend in Grove may manage a different CR derived from the `PodGang` (just like KAI, which creates `PodGroups`; in the future, we will move this management from the KAI scheduler to the Grove scheduler backend).

To create a `Workload` object, you will need to create a `PodGang` resource. The `PodGang` resource cannot be created before the Pods have been created and have a back reference to the PodGang. The issue is that only after the `Workload` object is created will the `kube-scheduler` run scoring/filtering plugins to reserve node capacity to schedule this workload's PodGroups. The Pods need to have a reference to the `Workload` object in their spec.

So to accommodate the `Workload` API, the flow needs to be changed as below in the PodGang component:
- Create the `PodGang` with PodGroups (having empty `PodReferences`, as none will exist at this point) and the `Initialized` condition set to `False`.
- The `PodGang` will trigger the creation of the `Workload` object in the scheduler backend reconciler, which will use the `kube` scheduler backend.
- Create the Pods once the `PodGang` has the `Initialized` condition set to `True` - done in the PCLQ reconciler.

Which issue(s) this PR fixes:
Fixes #275
Fixes #445
Special notes for your reviewer:
Does this PR introduce an API change?
Yes. We will introduce a new API: `SchedulerBackend`.
Additional documentation e.g., enhancement proposals, usage docs, etc.: