Closed
Labels
kind/bug: Categorizes issue or PR as related to a bug.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
Description
What happened:
When multiple schedulingProfiles are configured together with the single-profile-handler, the EPP pod starts normally but fails to serve requests:
root@demo-master:~# curl ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
"model": "Qwen/Qwen2-0.5B-Instruct",
"prompt": "Write as if you were a critic: San Francisco",
"max_tokens": 100,
"temperature": 0
}'
inference gateway: InferencePoolResourceExhausted - failed to find target pod: single profile handler is intended to be used with a single profile, failed to process multiple profiles
What you expected to happen:
The EPP pod should fail to start and report an error indicating that the schedulingProfiles configuration is invalid.
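A minimal Go sketch of the startup check described above, for illustration only. It assumes hypothetical, simplified config types (PluginSpec, ProfilePlugin, SchedulingProfile, EndpointPickerConfig) and a hypothetical validateProfileHandler function. None of these are the project's real types or config loader; they only model the fields used in this report.

```go
package main

import (
	"fmt"
)

// Hypothetical, simplified mirror of the EndpointPickerConfig fields used in
// this report. These are NOT the project's actual Go types.
type PluginSpec struct {
	Type string
}

type ProfilePlugin struct {
	PluginRef string
}

type SchedulingProfile struct {
	Name    string
	Plugins []ProfilePlugin
}

type EndpointPickerConfig struct {
	Plugins            []PluginSpec
	SchedulingProfiles []SchedulingProfile
}

// validateProfileHandler (hypothetical) rejects a config that declares the
// single-profile-handler while defining more than one scheduling profile, so
// the error surfaces at startup instead of on every request.
func validateProfileHandler(cfg EndpointPickerConfig) error {
	usesSingleHandler := false
	for _, p := range cfg.Plugins {
		if p.Type == "single-profile-handler" {
			usesSingleHandler = true
			break
		}
	}
	if usesSingleHandler && len(cfg.SchedulingProfiles) > 1 {
		return fmt.Errorf("single-profile-handler supports only one scheduling profile, but %d are configured",
			len(cfg.SchedulingProfiles))
	}
	return nil
}

func main() {
	// Mirrors the shape of the reproduction config: the single-profile-handler
	// plus two profiles ("default" and "custom-profile").
	cfg := EndpointPickerConfig{
		Plugins: []PluginSpec{{Type: "queue-scorer"}, {Type: "single-profile-handler"}},
		SchedulingProfiles: []SchedulingProfile{
			{Name: "default"},
			{Name: "custom-profile"},
		},
	}
	if err := validateProfileHandler(cfg); err != nil {
		// Prints: invalid config: single-profile-handler supports only one scheduling profile, but 2 are configured
		fmt.Println("invalid config:", err)
	}
}
```

In a real loader a check like this would run once while parsing the config and cause the process to exit with a non-zero status, which is the behavior the report asks for.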
How to reproduce it (as minimally and precisely as possible):
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:
- type: queue-scorer
- type: kv-cache-utilization-scorer
- type: prefix-cache-scorer
- type: single-profile-handler
schedulingProfiles:
- name: default
  plugins:
  - pluginRef: queue-scorer
  - pluginRef: kv-cache-utilization-scorer
  - pluginRef: prefix-cache-scorer
- name: custom-profile
  plugins:
  - pluginRef: queue-scorer
  - pluginRef: kv-cache-utilization-scorer
  - pluginRef: prefix-cache-scorer
  - pluginRef: single-profile-handler
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
- Inference extension version (use git describe --tags --dirty --always):
- Cloud provider or hardware configuration:
- Install tools:
- Others: