Commit 7bd2053

shmuelk, nirrozenbaum, danehans, SinaChavoshi, caozhuozi authored
feat: Load the SchedulerConfig from a configuration file/text and make it easier to add plugins (#881)
* configuration implementation (after rebase...)
  Signed-off-by: Shmuel Kallner <[email protected]>
* Moved plugin registry back to pkg/epp/plugins
  Signed-off-by: Shmuel Kallner <[email protected]>
* Removed unneeded 'forced imports' of scorers
  Signed-off-by: Shmuel Kallner <[email protected]>
* Changed 'profilepicker' to 'profilehandler' in new and old code
  Signed-off-by: Shmuel Kallner <[email protected]>
* Pass the configured SchedulingProfiles to LoadSchedulerConfig
  Signed-off-by: Shmuel Kallner <[email protected]>
* Ensure that both the configText and configFile flags are not specified
  Signed-off-by: Shmuel Kallner <[email protected]>
* Load RequestControl plugins from the configuration
  Signed-off-by: Shmuel Kallner <[email protected]>
* Register all plugin factories
  Signed-off-by: Shmuel Kallner <[email protected]>
* Review fixes
  Signed-off-by: Shmuel Kallner <[email protected]>
* Reverted unneeded change
  Signed-off-by: Shmuel Kallner <[email protected]>
* Updates from review comments
  Signed-off-by: Shmuel Kallner <[email protected]>
* Added a stub interface for plugins to get data from the EPP
  Signed-off-by: Shmuel Kallner <[email protected]>
* Added a temporary implementation of plugins.Handle
  Signed-off-by: Shmuel Kallner <[email protected]>
* Added pluginName and plugins.Handle to plugin factory interface
  Signed-off-by: Shmuel Kallner <[email protected]>
* Updated plugin factory signatures to reflect new API
  Signed-off-by: Shmuel Kallner <[email protected]>
* Updated plugin instantiation to reflect new API
  Signed-off-by: Shmuel Kallner <[email protected]>
* Updated plugin instantiation to reflect new API
  Signed-off-by: Shmuel Kallner <[email protected]>
* Updated tests to reflect new API
  Signed-off-by: Shmuel Kallner <[email protected]>
* Do not rename the imported package
  Signed-off-by: Shmuel Kallner <[email protected]>
* Only upper layer of code should log errors
  Signed-off-by: Shmuel Kallner <[email protected]>
* Only pass what is needed to instantiate the plugins
  Signed-off-by: Shmuel Kallner <[email protected]>
* Review updates
  Signed-off-by: Shmuel Kallner <[email protected]>
* Review update
  Signed-off-by: Shmuel Kallner <[email protected]>
* Review update. Make more clear that the code only checks for already defined names
  Signed-off-by: Shmuel Kallner <[email protected]>
* fixed e2e doc in makefile (does not require GPUs) (#976)
  Signed-off-by: Nir Rozenbaum <[email protected]>
* API: Adds 5xx Status Code for Invalid ExtRef (#991)
  Signed-off-by: Daneyon Hansen <[email protected]>
* feat(conformance): Add test for invalid EPP service reference (#959)
* fix boilerplate header
* add tests for InferencePoolInvalidEPPService
* change to expect error on httproute refcond
* moved the creation of the context to main.go. (#995) this is useful when writing a different main like llm-d, allowing to propagate the same context to the whole system.
  Signed-off-by: Nir Rozenbaum <[email protected]>
* fix dead links (#989)
* feat: add health check for epp cluster (#966)
* feat: add health check for epp cluster
  Signed-off-by: zhengkezhou1 <[email protected]>
* remove tls
  Signed-off-by: zhengkezhou1 <[email protected]>
* don't use tls
  Signed-off-by: zhengkezhou1 <[email protected]>
* health checking flag
  Signed-off-by: zhengkezhou1 <[email protected]>
* fix import
  Signed-off-by: zhengkezhou1 <[email protected]>
* add tls options
  Signed-off-by: zhengkezhou1 <[email protected]>
---------
  Signed-off-by: zhengkezhou1 <[email protected]>
* Server unit test and utility to help with such tests (#820)
  Signed-off-by: Ira <[email protected]>
* Update dynamic-lora-sidecar to expose metrics to track loaded adapters (#980)
* Add a metrics to track loaded adapters
* Update the sample manifests
* Add explanation of metrics from dynamic LoRA adapter sidecar
* Add explanation of metrics from dynamic LoRA adapter sidecar (take 2)
* Update metrics.md based on feedback
* refactor: Replace prefix cache structure with golang-lru (#928)
* refactor: Replace prefix cache structure with golang-lru
  Signed-off-by: Kfir Toledo <[email protected]>
  Co-authored-by: Maroon Ayoub <[email protected]>
* fix: rename prefix scorer parameters and convert test to benchmark test
  Signed-off-by: Kfir Toledo <[email protected]>
* feat: Add per server LRU capacity
  Signed-off-by: Kfir Toledo <[email protected]>
* fix: Fix typos and error handle
  Signed-off-by: Kfir Toledo <[email protected]>
* fix: add safety check for LRUCapacityPerServer
  Signed-off-by: Kfir Toledo <[email protected]>
---------
  Signed-off-by: Kfir Toledo <[email protected]>
  Co-authored-by: Maroon Ayoub <[email protected]>
* feat(conformance): Add HTTPRouteMultipleRulesDifferentPools test (#834)
* copy of accepted inference pool test to start from.
* add yaml file for the test
* update time out
* update the yaml file to add port 9002
* read timeout config from local repo
* remove excess comments
* correct spelling for scenarios
* check route condition on RouteConditionResolvedRefs
* remove empty lines in yaml
* set optional/defaulted fields as unspecified
* fix timeout
* fix boilerplate header
* change variable names to use primary secondary consistently.
* remove extra comments
* factor out common code
* Add actual http traffic validation using echo-basic
* remove extra comments from manifest
* remove modifiedTimeoutConfig.HTTPRouteMustHaveCondition per review comment.
* intermediate update
* fix the test run
* factor out common code
* move epp def to shared manifest
* remove extra comments
* revert back to two epps
* add to do for epp image
* switch to GeneralMustHaveConditionTimeout
* undo gateway version changes
* remove unused HTTPRouteMustHaveConditions
* update doc string for GetPod
* update docstring
* Remove resource type from names in manifests.
* remove type from name
* remove health check
* add todo for combining getpod methods
* configuration implementation (after rebase...)
  Signed-off-by: Shmuel Kallner <[email protected]>
* After review, made code more obvious
  Signed-off-by: Shmuel Kallner <[email protected]>
* Fixed merge issues
  Signed-off-by: Shmuel Kallner <[email protected]>
---------
Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Nir Rozenbaum <[email protected]>
Signed-off-by: Daneyon Hansen <[email protected]>
Signed-off-by: zhengkezhou1 <[email protected]>
Signed-off-by: Ira <[email protected]>
Signed-off-by: Kfir Toledo <[email protected]>
Co-authored-by: Nir Rozenbaum <[email protected]>
Co-authored-by: Daneyon Hansen <[email protected]>
Co-authored-by: sina chavoshi <[email protected]>
Co-authored-by: Xudong Wang <[email protected]>
Co-authored-by: Zhengke Zhou <[email protected]>
Co-authored-by: Ira Rosen <[email protected]>
Co-authored-by: Shotaro Kohama <[email protected]>
Co-authored-by: Kfir Toledo <[email protected]>
Co-authored-by: Maroon Ayoub <[email protected]>
1 parent 68c73c0 commit 7bd2053

31 files changed, +1728 -28 lines changed

PROJECT

Lines changed: 8 additions & 0 deletions
@@ -24,4 +24,12 @@ resources:
   kind: InferenceModel
   path: sigs.k8s.io/gateway-api-inference-extension/api/v1alpha1
   version: v1alpha1
+- api:
+    crdVersion: v1
+    namespaced: true
+  domain: x-k8s.io
+  group: inference
+  kind: EndpointPickerConfig
+  path: sigs.k8s.io/gateway-api-inference-extension/api/config/v1alpha1
+  version: v1alpha1
 version: "3"

api/config/v1alpha1/defaults.go

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+/*
+Copyright 2025 The Kubernetes Authors.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+package v1alpha1
+
+// SetDefaults_EndpointPickerConfig sets default values in a
+// EndpointPickerConfig struct.
+//
+// This naming convention is required by the defaulter-gen code.
+func SetDefaults_EndpointPickerConfig(cfg *EndpointPickerConfig) {
+	for idx, pluginConfig := range cfg.Plugins {
+		if pluginConfig.Name == "" {
+			cfg.Plugins[idx].Name = pluginConfig.PluginName
+		}
+	}
+}
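
The defaulting rule in defaults.go can be tried in isolation. The following is a minimal sketch, assuming trimmed-down stand-ins for the v1alpha1 types (only the two fields the defaulter touches) and hypothetical plugin names such as "example-scorer"; it is not the generated defaulting code from this commit.

```go
package main

import "fmt"

// PluginSpec is a trimmed-down stand-in for the v1alpha1 type; only the
// fields used by the defaulter are kept.
type PluginSpec struct {
	Name       string
	PluginName string
}

// EndpointPickerConfig is a trimmed-down stand-in for the v1alpha1 type.
type EndpointPickerConfig struct {
	Plugins []PluginSpec
}

// SetDefaults_EndpointPickerConfig mirrors the logic in the diff: an empty
// Name falls back to PluginName. The write goes through cfg.Plugins[idx]
// because pluginConfig is a copy of the slice element.
func SetDefaults_EndpointPickerConfig(cfg *EndpointPickerConfig) {
	for idx, pluginConfig := range cfg.Plugins {
		if pluginConfig.Name == "" {
			cfg.Plugins[idx].Name = pluginConfig.PluginName
		}
	}
}

func main() {
	// Plugin names here are hypothetical and chosen only for illustration.
	cfg := &EndpointPickerConfig{
		Plugins: []PluginSpec{
			{PluginName: "example-scorer"},
			{Name: "custom", PluginName: "example-filter"},
		},
	}
	SetDefaults_EndpointPickerConfig(cfg)
	for _, p := range cfg.Plugins {
		fmt.Printf("name=%s pluginName=%s\n", p.Name, p.PluginName)
	}
}
```

An explicitly set Name is left alone, so the same plugin type can be instantiated more than once under different names.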

api/config/v1alpha1/doc.go

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+/*
+Copyright 2025 The Kubernetes Authors.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+// Package v1alpha1 contains API Schema definitions for the
+// inference.networking.x-k8s.io API group.
+//
+// +kubebuilder:object:generate=true
+// +groupName=inference.networking.x-k8s.io
+package v1alpha1
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+/*
+Copyright 2025 The Kubernetes Authors.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+package v1alpha1
+
+import (
+	"encoding/json"
+
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+)
+
+// +k8s:defaulter-gen=true
+// +kubebuilder:object:root=true
+
+// EndpointPickerConfig is the Schema for the endpointpickerconfigs API
+type EndpointPickerConfig struct {
+	metav1.TypeMeta `json:",inline"`
+
+	// +required
+	// +kubebuilder:validation:Required
+	// Plugins is the list of plugins that will be instantiated.
+	Plugins []PluginSpec `json:"plugins"`
+
+	// +required
+	// +kubebuilder:validation:Required
+	// SchedulingProfiles is the list of named SchedulingProfiles
+	// that will be created.
+	SchedulingProfiles []SchedulingProfile `json:"schedulingProfiles"`
+}
+
+// PluginSpec contains the information that describes a plugin that
+// will be instantiated.
+type PluginSpec struct {
+	// +optional
+	// Name provides a name for plugin entries to reference. If
+	// omitted, the value of the PluginName field will be used.
+	Name string `json:"name"`
+
+	// +required
+	// +kubebuilder:validation:Required
+	// PluginName specifies the plugin to be instantiated.
+	PluginName string `json:"pluginName"`
+
+	// +optional
+	// Parameters are the set of parameters to be passed to the plugin's
+	// factory function. The factory function is responsible
+	// for parsing the parameters.
+	Parameters json.RawMessage `json:"parameters"`
+}
+
+// SchedulingProfile contains the information to create a SchedulingProfile
+// entry to be used by the scheduler.
+type SchedulingProfile struct {
+	// +kubebuilder:validation:Required
+	// Name specifies the name of this SchedulingProfile
+	Name string `json:"name"`
+
+	// +required
+	// +kubebuilder:validation:Required
+	// Plugins is the list of plugins for this SchedulingProfile. They are assigned
+	// to the appropriate "slots" based on their type.
+	Plugins []SchedulingPlugin `json:"plugins"`
+}
+
+// SchedulingPlugin describes a plugin that will be associated with a
+// SchedulingProfile entry.
+type SchedulingPlugin struct {
+	// +required
+	// +kubebuilder:validation:Required
+	// PluginRef specifies a particular Plugin instance to be associated with
+	// this SchedulingProfile. The reference is to the name of an
+	// entry of the Plugins defined in the configuration's Plugins
+	// section
+	PluginRef string `json:"pluginRef"`
+
+	// +optional
+	// Weight is the weight to be used if this plugin is a Scorer.
+	Weight *int `json:"weight"`
+}
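
To make the relationship between these types concrete, here is a minimal sketch of decoding a configuration and resolving each pluginRef against the names in the plugins section. It assumes plain encoding/json and local copies of the structs without metav1.TypeMeta. The helpers loadConfig and parseScorerParams, the exampleScorerParams type, the "threshold" parameter, and all plugin names are hypothetical. The loader added by this commit is not shown in this section and may work differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the v1alpha1 types, without metav1.TypeMeta.
type PluginSpec struct {
	Name       string          `json:"name"`
	PluginName string          `json:"pluginName"`
	Parameters json.RawMessage `json:"parameters"`
}

type SchedulingPlugin struct {
	PluginRef string `json:"pluginRef"`
	Weight    *int   `json:"weight"`
}

type SchedulingProfile struct {
	Name    string             `json:"name"`
	Plugins []SchedulingPlugin `json:"plugins"`
}

type EndpointPickerConfig struct {
	Plugins            []PluginSpec        `json:"plugins"`
	SchedulingProfiles []SchedulingProfile `json:"schedulingProfiles"`
}

// loadConfig is a hypothetical helper: it decodes the text, defaults empty
// plugin names to PluginName, and rejects unknown pluginRef values.
func loadConfig(text []byte) (*EndpointPickerConfig, error) {
	cfg := &EndpointPickerConfig{}
	if err := json.Unmarshal(text, cfg); err != nil {
		return nil, fmt.Errorf("decoding config: %w", err)
	}
	known := make(map[string]bool, len(cfg.Plugins))
	for i := range cfg.Plugins {
		if cfg.Plugins[i].Name == "" {
			cfg.Plugins[i].Name = cfg.Plugins[i].PluginName
		}
		known[cfg.Plugins[i].Name] = true
	}
	for _, profile := range cfg.SchedulingProfiles {
		for _, p := range profile.Plugins {
			if !known[p.PluginRef] {
				return nil, fmt.Errorf("profile %q references unknown plugin %q",
					profile.Name, p.PluginRef)
			}
		}
	}
	return cfg, nil
}

// exampleScorerParams is a hypothetical parameter struct that a plugin
// factory might decode from PluginSpec.Parameters.
type exampleScorerParams struct {
	Threshold int `json:"threshold"`
}

// parseScorerParams shows the factory-side parsing of the raw parameters.
// Absent parameters yield the zero value.
func parseScorerParams(raw json.RawMessage) (exampleScorerParams, error) {
	params := exampleScorerParams{}
	if len(raw) == 0 {
		return params, nil
	}
	err := json.Unmarshal(raw, &params)
	return params, err
}

// sampleConfig uses hypothetical plugin names.
const sampleConfig = `{
  "plugins": [
    {"pluginName": "example-scorer", "parameters": {"threshold": 5}},
    {"name": "picker", "pluginName": "example-picker"}
  ],
  "schedulingProfiles": [
    {"name": "default", "plugins": [
      {"pluginRef": "example-scorer", "weight": 2},
      {"pluginRef": "picker"}
    ]}
  ]
}`

func main() {
	cfg, err := loadConfig([]byte(sampleConfig))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range cfg.Plugins {
		fmt.Printf("plugin %q of type %q\n", p.Name, p.PluginName)
	}
	params, err := parseScorerParams(cfg.Plugins[0].Parameters)
	fmt.Println("scorer params:", params, "err:", err)
}
```

Keeping Parameters as json.RawMessage defers decoding to each plugin's factory, so the configuration schema does not need to know any plugin's parameter shape. Weight is a pointer so that an omitted weight can be told apart from an explicit zero.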

api/config/v1alpha1/zz_generated.deepcopy.go

Lines changed: 126 additions & 0 deletions

api/config/v1alpha1/zz_generated.defaults.go

Lines changed: 38 additions & 0 deletions

api/config/v1alpha1/zz_generated.register.go

Lines changed: 69 additions & 0 deletions
