When using multiple schedulingProfiles, EPP pods can start normally but fail to function properly #1697

@learner0810

Description

What happened:

When multiple schedulingProfiles are configured together with single-profile-handler, the EPP pod starts normally, but requests fail at scheduling time:

root@demo-master:~# curl ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
      "model": "Qwen/Qwen2-0.5B-Instruct",
      "prompt": "Write as if you were a critic: San Francisco",
      "max_tokens": 100,
      "temperature": 0
    }'
inference gateway: InferencePoolResourceExhausted - failed to find target pod: single profile handler is intended to be used with a single profile, failed to process multiple profiles

What you expected to happen:

The EPP pod should fail to start and report an error indicating that the schedulingProfiles configuration is invalid.
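
As a rough illustration of the expected behavior, a startup-time check could reject the configuration before the pod begins serving. The following is only a minimal sketch: the types `Config`, `PluginSpec`, and `SchedulingProfile` and the function `validateProfiles` are hypothetical stand-ins, not the project's actual code. It assumes the rule is simply "single-profile-handler allows at most one scheduling profile", which is inferred from the error message above.

```go
package main

import "fmt"

// PluginSpec, SchedulingProfile and Config are hypothetical, simplified
// stand-ins for the fields of an EndpointPickerConfig. They are NOT the
// project's real types.
type PluginSpec struct {
	Type string
}

type SchedulingProfile struct {
	Name       string
	PluginRefs []string
}

type Config struct {
	Plugins            []PluginSpec
	SchedulingProfiles []SchedulingProfile
}

// singleProfileHandlerType is the plugin type name used in the config from this issue.
const singleProfileHandlerType = "single-profile-handler"

// validateProfiles returns an error when single-profile-handler is configured
// together with more than one scheduling profile (assumed rule, see lead-in).
func validateProfiles(cfg Config) error {
	usesSingle := false
	for _, p := range cfg.Plugins {
		if p.Type == singleProfileHandlerType {
			usesSingle = true
			break
		}
	}
	if usesSingle && len(cfg.SchedulingProfiles) > 1 {
		return fmt.Errorf("%s supports exactly one scheduling profile, but %d are configured",
			singleProfileHandlerType, len(cfg.SchedulingProfiles))
	}
	return nil
}

func main() {
	// Mirrors the shape of the reproduction config: one handler, two profiles.
	cfg := Config{
		Plugins: []PluginSpec{
			{Type: "queue-scorer"},
			{Type: singleProfileHandlerType},
		},
		SchedulingProfiles: []SchedulingProfile{
			{Name: "default"},
			{Name: "custom-profile"},
		},
	}
	if err := validateProfiles(cfg); err != nil {
		// With a check like this, the failure surfaces at startup instead of per request.
		fmt.Println("config rejected at startup:", err)
	}
}
```

Running such a check while loading the configuration would turn the per-request InferencePoolResourceExhausted error into a clear startup failure.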

How to reproduce it (as minimally and precisely as possible):

apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:
  - type: queue-scorer
  - type: kv-cache-utilization-scorer
  - type: prefix-cache-scorer
  - type: single-profile-handler
schedulingProfiles:
  - name: default
    plugins:
      - pluginRef: queue-scorer
      - pluginRef: kv-cache-utilization-scorer
      - pluginRef: prefix-cache-scorer
  - name: custom-profile
    plugins:
      - pluginRef: queue-scorer
      - pluginRef: kv-cache-utilization-scorer
      - pluginRef: prefix-cache-scorer
      - pluginRef: single-profile-handler

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Inference extension version (use git describe --tags --dirty --always):
  • Cloud provider or hardware configuration:
  • Install tools:
  • Others:

Labels

kind/bug: Categorizes issue or PR as related to a bug.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
