22 commits
f706786
add scheduler backend framework
kangclzjc Feb 22, 2026
75210ed
remove podgang label
kangclzjc Feb 22, 2026
f38e2c5
prevent update schduelerName
kangclzjc Feb 23, 2026
fc3065d
Apply suggestion from @unmarshall
kangclzjc Feb 27, 2026
b6fff72
rename schedulerBackend to backend
kangclzjc Feb 27, 2026
ffaff4a
rename to schedBackend
kangclzjc Feb 27, 2026
456068d
move interface to common
kangclzjc Feb 27, 2026
6d0a1a9
fix tas
kangclzjc Feb 27, 2026
2c34b1e
reuse check pclq health
kangclzjc Feb 28, 2026
9aefb90
remove useless test and manually call config
kangclzjc Feb 28, 2026
b9e4dde
move podgang methods from syncflow to podgang in pcs podgang component
kangclzjc Feb 28, 2026
78237d7
combine config in handler
kangclzjc Feb 28, 2026
7252a3d
if empty call Getdefault to get default backend name
kangclzjc Feb 28, 2026
92969d8
remove initOnce
kangclzjc Feb 28, 2026
4010d3c
add register test for predict podgang
kangclzjc Feb 28, 2026
d2f8bd6
move back to schedulerbackend dir and use switch to replace factory
kangclzjc Mar 1, 2026
31ce24f
reduce duplicated error logs when update podgang pod reference or status
kangclzjc Mar 2, 2026
e89f93c
add WithStatusSubresource in fake client otherwise it won't patch sta…
kangclzjc Mar 2, 2026
62097f7
Backend Name reture pod-facing scheduler name which is real scheduler…
kangclzjc Mar 2, 2026
667ce0f
add config nil check and if admin set kube default false, keep it and…
kangclzjc Mar 4, 2026
5c7842e
print all scheduler names if not uniq
kangclzjc Mar 5, 2026
e6a8015
change api to defaultProfileName and add test in config and webhook v…
kangclzjc Mar 6, 2026
55 changes: 55 additions & 0 deletions docs/api-reference/operator-api.md
@@ -742,6 +742,10 @@ _Appears in:_
| `enableProfiling` _boolean_ | EnableProfiling enables profiling via host:port/debug/pprof/ endpoints. | | |






#### LeaderElectionConfiguration


@@ -865,6 +869,57 @@ _Appears in:_
| `concurrentSyncs` _integer_ | ConcurrentSyncs is the number of workers used for the controller to concurrently work on events. | | |


#### SchedulerConfiguration



SchedulerConfiguration configures the scheduler profiles and designates which profile is the default.



_Appears in:_
- [OperatorConfiguration](#operatorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `profiles` _[SchedulerProfile](#schedulerprofile) array_ | Profiles is the list of scheduler profiles. Each profile has a backend name and optional config.<br />The kube-scheduler backend is always enabled; use profile name "kube-scheduler" to configure or set it as default.<br />Valid profile names: "kube-scheduler", "kai-scheduler". Use defaultProfileName to designate the default backend. If not set, defaulting sets it to "kube-scheduler". | | |
| `defaultProfileName` _string_ | DefaultProfileName is the name of the default scheduler profile. If unset, defaulting sets it to "kube-scheduler". | | |


#### SchedulerName

_Underlying type:_ _string_

SchedulerName defines the name of the scheduler backend (used in OperatorConfiguration scheduler.profiles[].name).



_Appears in:_
- [SchedulerProfile](#schedulerprofile)

| Field | Description |
| --- | --- |
| `kai-scheduler` | SchedulerNameKai is the KAI scheduler backend.<br /> |
| `kube-scheduler` | SchedulerNameKube is the profile name for the Kubernetes default scheduler in OperatorConfiguration.<br /> |
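
The relationship between a profile name and the name written to `Pod.Spec.SchedulerName` can be sketched as follows. This is a minimal illustration, not the operator's code: `podSchedulerName` is a hypothetical helper, and the identity mapping for `kai-scheduler` is an assumption (the docs only state the `kube-scheduler` to `default-scheduler` mapping).

```go
package main

import "fmt"

// SchedulerName mirrors the string type described in the API reference.
type SchedulerName string

const (
	SchedulerNameKai  SchedulerName = "kai-scheduler"
	SchedulerNameKube SchedulerName = "kube-scheduler"
)

// podSchedulerName is a hypothetical helper that maps a profile name to the
// value a pod would carry in Pod.Spec.SchedulerName.
func podSchedulerName(name SchedulerName) (string, error) {
	switch name {
	case SchedulerNameKube:
		// Stated in the docs: the kube-scheduler profile maps to "default-scheduler".
		return "default-scheduler", nil
	case SchedulerNameKai:
		// Assumption: the KAI profile name is used unchanged on the pod.
		return string(name), nil
	default:
		return "", fmt.Errorf("unsupported scheduler profile %q", name)
	}
}

func main() {
	for _, n := range []SchedulerName{SchedulerNameKube, SchedulerNameKai, "other"} {
		s, err := podSchedulerName(n)
		fmt.Printf("profile=%q pod=%q err=%v\n", n, s, err)
	}
}
```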


#### SchedulerProfile



SchedulerProfile defines a scheduler backend profile with optional backend-specific config.



_Appears in:_
- [SchedulerConfiguration](#schedulerconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _[SchedulerName](#schedulername)_ | Name is the scheduler profile name. Valid values: "kube-scheduler", "kai-scheduler".<br />For the Kubernetes default scheduler use "kube-scheduler"; Pod.Spec.SchedulerName will be set to "default-scheduler". | | Enum: [kai-scheduler kube-scheduler] <br />Required: \{\} <br /> |
| `config` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.33/#rawextension-runtime-pkg)_ | Config holds backend-specific options. The operator unmarshals it into the config type for this backend (see backend config types). | | |


#### Server


52 changes: 25 additions & 27 deletions docs/proposals/375-scheduler-backend-framework/README.md
@@ -215,7 +215,7 @@ func Initialize(client client.Client, scheme *runtime.Scheme, eventRecorder reco
// Get returns the backend for the given name. kube-scheduler is always available; other backends return nil if not enabled via a profile.
func Get(name string) SchedulerBackend

-// GetDefault returns the backend designated as default in OperatorConfiguration (the profile with default: true; if none, kube-scheduler). The manager does not define the default; it exposes the one from config.
+// GetDefault returns the backend designated as default in OperatorConfiguration (scheduler.defaultProfileName).
func GetDefault() SchedulerBackend

```
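
The lookup semantics described by `Get` and `GetDefault` can be illustrated with a small in-memory registry. This is a sketch under stated assumptions: the `registry` type, `newRegistry`, `staticBackend`, and the one-method `SchedulerBackend` interface with `ProfileName()` are all hypothetical stand-ins, not the manager from the proposal.

```go
package main

import "fmt"

// SchedulerBackend is a reduced stand-in for the proposal's interface.
type SchedulerBackend interface {
	ProfileName() string
}

type staticBackend struct{ name string }

func (b staticBackend) ProfileName() string { return b.name }

// registry is a hypothetical holder for the enabled backends.
type registry struct {
	backends       map[string]SchedulerBackend
	defaultProfile string
}

// newRegistry always registers kube-scheduler, then every enabled profile.
func newRegistry(enabled []string, defaultProfile string) *registry {
	r := &registry{
		backends:       map[string]SchedulerBackend{"kube-scheduler": staticBackend{name: "kube-scheduler"}},
		defaultProfile: defaultProfile,
	}
	for _, name := range enabled {
		r.backends[name] = staticBackend{name: name}
	}
	return r
}

// Get returns nil for a backend that was not enabled via a profile.
func (r *registry) Get(name string) SchedulerBackend {
	b, ok := r.backends[name]
	if !ok {
		return nil
	}
	return b
}

// GetDefault returns the backend named by the configured default profile.
func (r *registry) GetDefault() SchedulerBackend {
	return r.Get(r.defaultProfile)
}

func main() {
	r := newRegistry([]string{"kai-scheduler"}, "kai-scheduler")
	fmt.Println(r.GetDefault().ProfileName(), r.Get("kube-scheduler") != nil)
}
```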
@@ -241,12 +241,15 @@ type OperatorConfiguration struct {

// SchedulerConfiguration configures scheduler profiles and which is the default.
type SchedulerConfiguration struct {
-	// Profiles is the list of scheduler profiles. Each profile has a backend name, optional config, and whether it is the default.
+	// Profiles is the list of scheduler profiles. Each profile has a backend name and optional config.
 	// The kube-scheduler backend is always enabled and active even if not listed here. Listing "kube-scheduler" in profiles
-	// only adds a profile (e.g. with config like GangScheduling: false) and allows marking it as default.
-	// Valid backend names: "kube-scheduler", "kai-scheduler". Exactly one profile should have default: true; if none, kube-scheduler is the default.
+	// only adds a profile (e.g. with config like GangScheduling: false). Use defaultProfileName to designate the default backend.
+	// Valid backend names: "kube-scheduler", "kai-scheduler". If defaultProfileName is unset, defaulting sets it to "kube-scheduler".
 	// +optional
 	Profiles []SchedulerProfile `json:"profiles,omitempty"`
+	// DefaultProfileName is the name of the default scheduler profile.
+	// +optional
+	DefaultProfileName string `json:"defaultProfileName,omitempty"`
}

// SchedulerName is the name for a supported scheduler backend.
@@ -270,7 +273,7 @@ var SupportedSchedulerNames = []SchedulerName {
// <add any other supported backend scheduler constant here>
}

-// SchedulerProfile defines a scheduler backend profile with optional backend-specific config and default flag.
+// SchedulerProfile defines a scheduler backend profile with optional backend-specific config.
type SchedulerProfile struct {
// Name is the scheduler backend name. Valid values: "kube-scheduler", "kai-scheduler".
// +kubebuilder:validation:Enum=kai-scheduler;kube-scheduler
@@ -279,18 +282,15 @@ type SchedulerProfile struct {
// Config holds backend-specific options. The operator unmarshals it into the config type for this backend (see backend config types below).
// +optional
Config *runtime.RawExtension `json:"config,omitempty"`
-
-	// Default indicates this profile is the default backend when a workload does not specify one. Exactly one profile should have default: true.
-	// +optional
-	Default bool `json:"default,omitempty"`
}
```

The `OperatorConfiguration` provides a way to enable and configure one or more scheduler backends. `SchedulerProfile` allows you to configure the following:

- **Name:** This is the name of the scheduler backend. This must be one of the supported schedulers.
- **Config:** Optional scheduler-specific configuration as `runtime.RawExtension`. It is the responsibility of the scheduler backend implementation to interpret it and, where needed, deserialize it into its own config type.
-- **Default:** Indicates if this scheduler should be the default. In case no scheduler name is set in any `PodSpec` across all `PodCliqueTemplateSpec` then the default scheduler as indicated via this field will be set.
+
+`SchedulerConfiguration.defaultProfileName` designates which profile is the default. When no scheduler name is set in any `PodSpec` across all `PodCliqueTemplateSpec`, the default scheduler indicated by `defaultProfileName` will be used.
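
The fallback to the default profile can be sketched as a small resolution function. This is illustrative only: `resolveSchedulerName` is a hypothetical helper, and treating several distinct names within one workload as an error is an assumption, not something the proposal text above states.

```go
package main

import (
	"fmt"
	"sort"
)

// resolveSchedulerName is a hypothetical helper. It takes the scheduler names
// found in each clique's PodSpec (empty string means "not set") and the
// configured default profile name.
func resolveSchedulerName(cliqueSchedulerNames []string, defaultName string) (string, error) {
	seen := map[string]struct{}{}
	for _, n := range cliqueSchedulerNames {
		if n != "" {
			seen[n] = struct{}{}
		}
	}
	names := make([]string, 0, len(seen))
	for n := range seen {
		names = append(names, n)
	}
	sort.Strings(names)

	switch len(names) {
	case 0:
		// Nothing set anywhere: fall back to the default profile.
		return defaultName, nil
	case 1:
		return names[0], nil
	default:
		// Assumption: mixed scheduler names within one workload are rejected.
		return "", fmt.Errorf("conflicting scheduler names: %v", names)
	}
}

func main() {
	name, err := resolveSchedulerName([]string{"", ""}, "kube-scheduler")
	fmt.Println(name, err)
}
```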

**Backend Enabling Behavior:**

@@ -300,22 +300,20 @@ The kube-scheduler backend has special behavior compared to other scheduler back

2. **Explicit Configuration Optional**: You only need to add kube-scheduler to `profiles` if you want to:
- Configure it with specific options (e.g., `gangScheduling: true`)
-   - Explicitly mark it as the default (though it's already the default if no other profile sets `default: true`)
+   - Set it as the default via `defaultProfileName` (defaulting sets kube-scheduler as default when `defaultProfileName` is unset)

3. **Other Schedulers Require Explicit Enablement**: All non-kube-scheduler backends (kai-scheduler, third-party schedulers) must be explicitly listed in `profiles` to be enabled. If a workload references a scheduler that is not in the profiles list, the validating webhook will reject the PodCliqueSet.

4. **Default Selection Logic**:
-   - If `profiles` is empty → kube-scheduler is the default
-   - If exactly one profile has `default: true` → that backend is the default
-   - If multiple profiles have `default: true` → operator startup fails with validation error
-   - If no profile has `default: true` → kube-scheduler is the default (even if not in the list)
+   - If `profiles` is empty → defaulting adds kube-scheduler and sets `defaultProfileName: "kube-scheduler"`
+   - `defaultProfileName` must be one of the configured profile names; validation rejects invalid or missing default profile name
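
The validation rule for `defaultProfileName` can be sketched as below. The types are reduced copies of the ones in this proposal, `validateSchedulerConfiguration` is a hypothetical name, and the duplicate-profile check is an assumption added for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Reduced copies of the configuration types, for illustration only.
type SchedulerProfile struct {
	Name string
}

type SchedulerConfiguration struct {
	Profiles           []SchedulerProfile
	DefaultProfileName string
}

// validateSchedulerConfiguration is a hypothetical validator that is assumed
// to run after defaulting.
func validateSchedulerConfiguration(cfg SchedulerConfiguration) error {
	seen := make(map[string]bool, len(cfg.Profiles))
	for _, p := range cfg.Profiles {
		if seen[p.Name] {
			// Assumption: each profile name may appear only once.
			return fmt.Errorf("duplicate scheduler profile %q", p.Name)
		}
		seen[p.Name] = true
	}
	if cfg.DefaultProfileName == "" {
		return errors.New("defaultProfileName must be set")
	}
	if !seen[cfg.DefaultProfileName] {
		return fmt.Errorf("defaultProfileName %q is not one of the configured profiles", cfg.DefaultProfileName)
	}
	return nil
}

func main() {
	cfg := SchedulerConfiguration{
		Profiles:           []SchedulerProfile{{Name: "kube-scheduler"}},
		DefaultProfileName: "kai-scheduler",
	}
	fmt.Println(validateSchedulerConfiguration(cfg))
}
```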

If no `SchedulerProfile` has been set, then Grove operator behaves as if you specified:
```yaml
 scheduler:
+  defaultProfileName: kube-scheduler
   profiles:
-    - name: "kube-scheduler"
-      default: true
+    - name: kube-scheduler
```

> NOTE: If you, as a workload operator, wish to use a specific scheduler, please ensure that it has been enabled and properly configured as part of `OperatorConfiguration`. If a PodCliqueSet uses a scheduler that has not been enabled, the validating webhook will reject any creation request for that PodCliqueSet.
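
The admission check described in the note can be sketched as follows. This is not the webhook's code: `validateRequestedScheduler` is a hypothetical helper, and it assumes the workload's scheduler name is compared against profile names rather than pod-facing scheduler names.

```go
package main

import "fmt"

// validateRequestedScheduler is a hypothetical admission check. An empty
// request is allowed because the default profile applies in that case.
func validateRequestedScheduler(requested string, enabledProfiles []string) error {
	if requested == "" {
		return nil
	}
	for _, name := range enabledProfiles {
		if name == requested {
			return nil
		}
	}
	return fmt.Errorf("scheduler %q is not enabled in OperatorConfiguration; enabled profiles: %v",
		requested, enabledProfiles)
}

func main() {
	enabled := []string{"kube-scheduler"}
	fmt.Println(validateRequestedScheduler("kai-scheduler", enabled))
	fmt.Println(validateRequestedScheduler("", enabled))
}
```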
@@ -336,46 +334,46 @@ type KubeSchedulerConfig struct {

```yaml
 # --- Omit scheduler profiles completely ---
-# Same as profiles: [{ name: "kube-scheduler", default: true }]
+# Same as defaultProfileName: kube-scheduler, profiles: [{ name: "kube-scheduler" }]
```

```yaml
 # --- Single scheduler profile, no specific configuration ---
 scheduler:
+  defaultProfileName: kube-scheduler
   profiles:
-    - name: "kube-scheduler"
-      default: true
+    - name: kube-scheduler
 # In this configuration Gang Scheduling will not be enabled
```

```yaml
 # --- Single scheduler profile with configuration ---
 scheduler:
+  defaultProfileName: kube-scheduler
   profiles:
-    - name: "kube-scheduler"
+    - name: kube-scheduler
       config:
         gangScheduling: true
-      default: true
```

```yaml
 # --- Multiple scheduler profiles; default is kube-scheduler ---
 scheduler:
+  defaultProfileName: kube-scheduler
   profiles:
-    - name: "kube-scheduler"
+    - name: kube-scheduler
       config:
         gangScheduling: true
-      default: true
-    - name: "kai-scheduler" # no scheduler-specific configuration is defined
+    - name: kai-scheduler # no scheduler-specific configuration is defined
```

```yaml
 # --- Only kai-scheduler profile; kube-scheduler is still implicitly available but kai-scheduler is the default ---
 scheduler:
+  defaultProfileName: kai-scheduler
   profiles:
-    - name: "kai-scheduler"
+    - name: kai-scheduler
       config: {}
-      default: true
```


2 changes: 2 additions & 0 deletions operator/api/common/labels.go
@@ -43,6 +43,8 @@ const (
LabelPodCliqueScalingGroupReplicaIndex = "grove.io/podcliquescalinggroup-replica-index"
// LabelPodTemplateHash is a key for a label that sets the hash of the PodSpec. This label will be set on a PodClique and will be shared by all pods in the PodClique.
LabelPodTemplateHash = "grove.io/pod-template-hash"
// LabelSchedulerName is a label on PodGang that indicates which scheduler backend should sync this PodGang.
LabelSchedulerName = "grove.io/scheduler-name"
)
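
A backend could use this label to decide whether a PodGang is its responsibility. The following is a minimal sketch of that idea; `shouldSync` is a hypothetical helper and the plain label map stands in for the PodGang's object metadata.

```go
package main

import "fmt"

// LabelSchedulerName repeats the constant added in this change.
const LabelSchedulerName = "grove.io/scheduler-name"

// shouldSync is a hypothetical predicate: a backend handles a PodGang only
// when the label value matches the backend's own name.
func shouldSync(podGangLabels map[string]string, backendName string) bool {
	return podGangLabels[LabelSchedulerName] == backendName
}

func main() {
	labels := map[string]string{LabelSchedulerName: "kai-scheduler"}
	fmt.Println(shouldSync(labels, "kai-scheduler"), shouldSync(labels, "kube-scheduler"))
}
```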

// Labels for setting component names for all managed resources whose lifecycle
31 changes: 31 additions & 0 deletions operator/api/config/v1alpha1/defaults.go
@@ -69,6 +69,37 @@ func SetDefaults_OperatorConfiguration(operatorConfig *OperatorConfiguration) {
}
}

// SetDefaults_SchedulerConfiguration sets defaults for scheduler configuration.
// Principle: respect all user-explicit values first.
//
// 1. If user did not include kube in profiles, add kube.
// 2. If defaultProfileName is unset, set it to "kube-scheduler". Validation will reject invalid cases.
func SetDefaults_SchedulerConfiguration(cfg *SchedulerConfiguration) {
	if len(cfg.Profiles) == 0 {
		cfg.Profiles = []SchedulerProfile{
			{Name: SchedulerNameKube},
		}
		cfg.DefaultProfileName = string(SchedulerNameKube)
		return
	}
	// 1. If user didn't add kube, add it.
	hasKube := false
	for i := range cfg.Profiles {
		if cfg.Profiles[i].Name == SchedulerNameKube {
			hasKube = true
			break
		}
	}
	if !hasKube {
		cfg.Profiles = append(cfg.Profiles, SchedulerProfile{Name: SchedulerNameKube})
	}

	// 2. No default profile name → set kube as default.
	if cfg.DefaultProfileName == "" {
		cfg.DefaultProfileName = string(SchedulerNameKube)
	}
}
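
The effect of these defaulting rules on a kai-only configuration can be seen in a standalone sketch. The types below are reduced copies of the API types, and `setDefaults` is a condensed restatement written for illustration (the explicit empty-profiles branch is folded into the general path); it is not the function from this change.

```go
package main

import "fmt"

type SchedulerName string

const (
	SchedulerNameKai  SchedulerName = "kai-scheduler"
	SchedulerNameKube SchedulerName = "kube-scheduler"
)

// Reduced copies of the configuration types, for illustration only.
type SchedulerProfile struct {
	Name SchedulerName
}

type SchedulerConfiguration struct {
	Profiles           []SchedulerProfile
	DefaultProfileName string
}

// setDefaults appends the kube-scheduler profile when it is missing and
// fills in defaultProfileName only when the user left it empty.
func setDefaults(cfg *SchedulerConfiguration) {
	hasKube := false
	for _, p := range cfg.Profiles {
		if p.Name == SchedulerNameKube {
			hasKube = true
			break
		}
	}
	if !hasKube {
		cfg.Profiles = append(cfg.Profiles, SchedulerProfile{Name: SchedulerNameKube})
	}
	if cfg.DefaultProfileName == "" {
		cfg.DefaultProfileName = string(SchedulerNameKube)
	}
}

func main() {
	cfg := &SchedulerConfiguration{Profiles: []SchedulerProfile{{Name: SchedulerNameKai}}}
	setDefaults(cfg)
	fmt.Printf("profiles=%v default=%s\n", cfg.Profiles, cfg.DefaultProfileName)
}
```

Note that a kai-only profile list with no explicit default still ends up with kube-scheduler as the default; a user who wants kai-scheduler as the default has to set `defaultProfileName` explicitly.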

// SetDefaults_ServerConfiguration sets defaults for the server configuration.
func SetDefaults_ServerConfiguration(serverConfig *ServerConfiguration) {
if serverConfig.Webhooks.Port == 0 {
128 changes: 128 additions & 0 deletions operator/api/config/v1alpha1/defaults_test.go
@@ -0,0 +1,128 @@
// /*
// Copyright 2026 The Grove Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// */

package v1alpha1

import (
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestSetDefaults_SchedulerConfiguration(t *testing.T) {
	tests := []struct {
		name               string
		cfg                *SchedulerConfiguration
		wantProfiles       []SchedulerProfile
		wantDefaultProfile string
	}{
		{
			name:               "empty profiles: add kube and set defaultProfileName",
			cfg:                &SchedulerConfiguration{},
			wantProfiles:       []SchedulerProfile{{Name: SchedulerNameKube}},
			wantDefaultProfile: string(SchedulerNameKube),
		},
		{
			name: "nil profiles (len 0): add kube and set defaultProfileName",
			cfg: &SchedulerConfiguration{
				Profiles:           nil,
				DefaultProfileName: "",
			},
			wantProfiles:       []SchedulerProfile{{Name: SchedulerNameKube}},
			wantDefaultProfile: string(SchedulerNameKube),
		},
		{
			name: "only kai in profiles: append kube and set defaultProfileName",
			cfg: &SchedulerConfiguration{
				Profiles:           []SchedulerProfile{{Name: SchedulerNameKai}},
				DefaultProfileName: "",
			},
			wantProfiles:       []SchedulerProfile{{Name: SchedulerNameKai}, {Name: SchedulerNameKube}},
			wantDefaultProfile: string(SchedulerNameKube),
		},
		{
			name: "only kube in profiles, defaultProfileName unset: set defaultProfileName",
			cfg: &SchedulerConfiguration{
				Profiles:           []SchedulerProfile{{Name: SchedulerNameKube}},
				DefaultProfileName: "",
			},
			wantProfiles:       []SchedulerProfile{{Name: SchedulerNameKube}},
			wantDefaultProfile: string(SchedulerNameKube),
		},
		{
			name: "kube and kai in profiles, defaultProfileName unset: set defaultProfileName to kube",
			cfg: &SchedulerConfiguration{
				Profiles: []SchedulerProfile{
					{Name: SchedulerNameKube},
					{Name: SchedulerNameKai},
				},
				DefaultProfileName: "",
			},
			wantProfiles: []SchedulerProfile{
				{Name: SchedulerNameKube},
				{Name: SchedulerNameKai},
			},
			wantDefaultProfile: string(SchedulerNameKube),
		},
		{
			name: "kube and kai in profiles, defaultProfileName already set to kube: no change",
			cfg: &SchedulerConfiguration{
				Profiles: []SchedulerProfile{
					{Name: SchedulerNameKube},
					{Name: SchedulerNameKai},
				},
				DefaultProfileName: string(SchedulerNameKube),
			},
			wantProfiles: []SchedulerProfile{
				{Name: SchedulerNameKube},
				{Name: SchedulerNameKai},
			},
			wantDefaultProfile: string(SchedulerNameKube),
		},
		{
			name: "kube and kai in profiles, defaultProfileName already set to kai: no change",
			cfg: &SchedulerConfiguration{
				Profiles: []SchedulerProfile{
					{Name: SchedulerNameKube},
					{Name: SchedulerNameKai},
				},
				DefaultProfileName: string(SchedulerNameKai),
			},
			wantProfiles: []SchedulerProfile{
				{Name: SchedulerNameKube},
				{Name: SchedulerNameKai},
			},
			wantDefaultProfile: string(SchedulerNameKai),
		},
		{
			name: "only kai in profiles, defaultProfileName already kai: append kube only",
			cfg: &SchedulerConfiguration{
				Profiles:           []SchedulerProfile{{Name: SchedulerNameKai}},
				DefaultProfileName: string(SchedulerNameKai),
			},
			wantProfiles:       []SchedulerProfile{{Name: SchedulerNameKai}, {Name: SchedulerNameKube}},
			wantDefaultProfile: string(SchedulerNameKai),
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			SetDefaults_SchedulerConfiguration(tt.cfg)
			assert.Equal(t, tt.wantProfiles, tt.cfg.Profiles, "Profiles after defaulting")
			assert.Equal(t, tt.wantDefaultProfile, tt.cfg.DefaultProfileName, "DefaultProfileName after defaulting")
		})
	}
}