learner0810
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

Fixes the EPP pod starting successfully but not working when multiple schedulingProfiles are configured.

Which issue(s) this PR fixes:

Fixes #1697

Does this PR introduce a user-facing change?:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 10, 2025

netlify bot commented Oct 10, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit f441bfd
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68e8af45fe08ef0007c6bd63
😎 Deploy Preview https://deploy-preview-1698--gateway-api-inference-extension.netlify.app

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: learner0810
Once this PR has been reviewed and has the lgtm label, please assign nirrozenbaum for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 10, 2025
@k8s-ci-robot
Contributor

Hi @learner0810. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Oct 10, 2025
@learner0810 learner0810 force-pushed the fix-multiple-scheduling-profiles branch from f445ba5 to f441bfd Compare October 10, 2025 07:01
@learner0810
Contributor Author

ping @ahg-g

@nirrozenbaum
Contributor

@learner0810 I'm not sure we want to test that on startup. We had that discussion in #1169, where I was actually suggesting to check it but was convinced by others that we shouldn’t.

There might be plugin combinations that work only if they are set together (e.g., in llm-d, the pd profile handler will work correctly only if the p filter is defined in the prefill scheduling profile and the d filter in the decode profile). I think we don’t want to validate these kinds of things because there is no end to it.
It should probably be in the scope of a sanity test to catch those issues.

@learner0810
Contributor Author

> @learner0810 I'm not sure we want to test that on startup. We had that discussion in #1169, where I was actually suggesting to check it but was convinced by others that we shouldn’t.
>
> There might be plugin combinations that work only if they are set together (e.g., in llm-d, the pd profile handler will work correctly only if the p filter is defined in the prefill scheduling profile and the d filter in the decode profile). I think we don’t want to validate these kinds of things because there is no end to it. It should probably be in the scope of a sanity test to catch those issues.

If misconfigured plugins can still route requests successfully (even if they are routed to the wrong Pod), that may be tolerable. However, if a misconfiguration causes requests to fail, we should check for it during startup.

That is my understanding. Please correct me if I'm wrong.

@kfswain
Collaborator

kfswain commented Oct 14, 2025

I think I'm okay with flagging this edge case, especially because it references the default handler. Making our framework easier to use is a big plus with respect to adoption.

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 14, 2025
@kfswain
Collaborator

kfswain commented Oct 14, 2025

> I think we don’t want to validate these kind of things cause it has no end.

This is a fair point, but the profile handler is fairly nuanced and one of the more complex parts of our system. It is probably a real gotcha to try to do something sophisticated (multiple profiles) and then cut yourself on the no handler implementation.

@nirrozenbaum
Contributor

> > I think we don’t want to validate these kind of things cause it has no end.
>
> This is a fair point, but the profile handler is fairly nuanced and one of the more complex parts of our system. It is probably a real gotcha to try to do something sophisticated (multiple profiles) and then cut yourself on the no handler implementation.

I’m fine with that.
We just need to cut the validation off somewhere.
Specifically, if one configures the handler incorrectly, the logs show a very clear message about that.
So on the one hand it’s easy to identify; on the other hand, it’s correct that we can fail on bootstrapping.

To me it feels like testing the internals of plugins, but if you feel we want it on startup, let’s go for it.

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

4 participants