
Conversation

@capri-xiyue
Contributor

@capri-xiyue capri-xiyue commented Nov 7, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:
See #1779
Which issue(s) this PR fixes:

Part of #1779

Does this PR introduce a user-facing change?:

None for existing InferencePool-based features.
Users can now also run the EPP without an InferencePool by passing these args:
```
        - --endpoint-selector
        - "app=vllm-llama3-8b-instruct"
        - --endpoint-target-ports
        - "8000"
```
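To illustrate what the two args carry, here is a minimal sketch of parsing them with the Go standard library only. The helper names `parseSelector` and `parsePorts` are hypothetical and are not taken from this PR; the real EPP may well rely on the Kubernetes label-selector packages instead, and treating the port value as a comma-separated list is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSelector turns an equality-based selector such as "app=foo,tier=bar"
// into a label map. Hypothetical helper: it only understands "key=value" terms.
func parseSelector(s string) (map[string]string, error) {
	labels := make(map[string]string)
	for _, term := range strings.Split(s, ",") {
		term = strings.TrimSpace(term)
		if term == "" {
			continue
		}
		key, value, ok := strings.Cut(term, "=")
		key = strings.TrimSpace(key)
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid selector term %q", term)
		}
		labels[key] = strings.TrimSpace(value)
	}
	if len(labels) == 0 {
		return nil, fmt.Errorf("selector %q has no terms", s)
	}
	return labels, nil
}

// parsePorts parses a comma-separated port list (an assumed format) and
// rejects values outside the valid TCP port range.
func parsePorts(s string) ([]int, error) {
	var ports []int
	for _, field := range strings.Split(s, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue
		}
		port, err := strconv.Atoi(field)
		if err != nil || port < 1 || port > 65535 {
			return nil, fmt.Errorf("invalid port %q", field)
		}
		ports = append(ports, port)
	}
	if len(ports) == 0 {
		return nil, fmt.Errorf("port list %q is empty", s)
	}
	return ports, nil
}

func main() {
	labels, err := parseSelector("app=vllm-llama3-8b-instruct")
	if err != nil {
		panic(err)
	}
	ports, err := parsePorts("8000")
	if err != nil {
		panic(err)
	}
	fmt.Println(labels, ports)
}
```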

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 7, 2025
@netlify

netlify bot commented Nov 7, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 11a8a68
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/691f6d4f816c030008a428e0
😎 Deploy Preview https://deploy-preview-1833--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot requested a review from elevran November 7, 2025 19:47
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 7, 2025
@capri-xiyue capri-xiyue changed the title Enable EPP to support endpoint discovery using pod selector [WIP] Enable EPP to support endpoint discovery using pod selector Nov 7, 2025
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 7, 2025
@capri-xiyue
Contributor Author

No need to review this yet. So far I have only made the standalone EPP critical user journey (CUJ) work without an InferencePool. I still need to fix the e2e and unit tests.

@elevran elevran mentioned this pull request Nov 11, 2025
Contributor

@elevran elevran left a comment

Cursory review to provide initial feedback (I realize this is a work in progress).
The main question I have (and it might be worth addressing in the PR description) is whether a new abstraction/type (EndPointsPool) is needed. A naive solution (which perhaps does not work...) would be to copy the selector and port array into a Go InferencePool object, call datastore.PoolSet(), and disable the Pool notification/reconciliation so that it does not overwrite the pool with nil.
Hopefully the rest of the code does not care about or depend on the pool's origin (command line or API server).
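The naive alternative described above could look roughly like the following sketch. Everything here is a stand-in: `Pool` is a simplified placeholder for the InferencePool Go type, `Datastore` and its `PoolSet`/`PoolGet` methods only mimic the shape of the datastore mentioned in the comment (the real signatures are not shown in this conversation), and `configure` is a hypothetical helper.

```go
package main

import "fmt"

// Pool is a simplified stand-in for the InferencePool Go type. It carries only
// the two attributes the review comment mentions.
type Pool struct {
	Selector    map[string]string
	TargetPorts []int
}

// Datastore is a hypothetical stand-in for the EPP datastore.
type Datastore struct {
	pool *Pool
}

// PoolSet stores the pool; the real method's signature is an assumption here.
func (d *Datastore) PoolSet(p *Pool) { d.pool = p }

// PoolGet returns the stored pool and whether one has been set.
func (d *Datastore) PoolGet() (*Pool, bool) { return d.pool, d.pool != nil }

// configure copies command-line values into a pool and stores it. It reports
// whether the pool reconciler should still be started: when the pool comes
// from flags, the reconciler stays off so it cannot overwrite the pool with nil.
func configure(ds *Datastore, selector map[string]string, ports []int) (runPoolReconciler bool) {
	if len(selector) == 0 {
		// No static configuration: the pool is expected from the API server.
		return true
	}
	// Copy the inputs so later changes by the caller do not leak into the pool.
	sel := make(map[string]string, len(selector))
	for k, v := range selector {
		sel[k] = v
	}
	ds.PoolSet(&Pool{Selector: sel, TargetPorts: append([]int(nil), ports...)})
	return false
}

func main() {
	ds := &Datastore{}
	run := configure(ds, map[string]string{"app": "vllm-llama3-8b-instruct"}, []int{8000})
	pool, _ := ds.PoolGet()
	fmt.Println(run, pool.Selector, pool.TargetPorts)
}
```

With this shape, code that reads the pool from the datastore would not need to know whether it came from flags or from the API server.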

@capri-xiyue
Contributor Author

assign @ahg-g for early review.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 14, 2025
@capri-xiyue
Contributor Author

assign @kfswain for early review

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 14, 2025
@capri-xiyue capri-xiyue requested a review from kfswain November 18, 2025 23:43
@elevran
Contributor

elevran commented Nov 19, 2025

My general point is that having a canonical internal type that represents the pool is useful, and abstracts how the EPP was configured with the pool.

Do we actually need pool as a canonical type or concept?
If so, given the current code base and the future changes in #1838, is it warranted to replace the current InferencePool Go type with a new type?

Given this from #1838:

  1. Orchestrator agnostic deployment mode: static configuration of endpoints with zero dependency on k8s as a control plane, which means the reconcilers are not started as well.
  2. k8s-native deployment modes
    2.1) No dependency on Gateway API or extension CRDs, endpoints are discovered using a pod selector and port flags
    2.2) No dependency Gateway APIs, extension CRDs are used to enable a richer feature set for EPP, but the proxy is statically
    ...

The InferencePool is not needed at all in (1), and all reconcilers should be disabled. Option (2.1) works well with a "fake" pool that has only the minimal attributes set and no pool reconciler, while (2.2) and later use the full CRD.
I don't think the internal type improves or changes any of that.
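The mapping from deployment mode to what gets started can be summarized in a small sketch. The `Mode` constants, the `Wiring` struct, and `wiringFor` are hypothetical names used only for illustration. That non-pool reconcilers still run in mode (2.1) is an assumption inferred from "endpoints are discovered using a pod selector".

```go
package main

import "fmt"

// Mode enumerates the deployment modes quoted from #1838 (hypothetical names).
type Mode int

const (
	ModeStaticEndpoints  Mode = iota // (1) orchestrator agnostic, static endpoints
	ModePodSelector                  // (2.1) pod selector and port flags, no CRDs
	ModeInferencePoolCRD             // (2.2) extension CRDs in use
)

// Wiring describes which pieces would be started for a mode.
type Wiring struct {
	RunReconcilers    bool // any reconcilers at all
	RunPoolReconciler bool // the pool reconciler specifically
	UseFakePool       bool // a minimal pool built from flags
}

// wiringFor encodes the reasoning in the comment above.
func wiringFor(m Mode) (Wiring, error) {
	switch m {
	case ModeStaticEndpoints:
		// No pool and no reconcilers.
		return Wiring{}, nil
	case ModePodSelector:
		// Fake pool with minimal attributes; the pool reconciler stays off.
		// Running the remaining reconcilers is an assumption.
		return Wiring{RunReconcilers: true, UseFakePool: true}, nil
	case ModeInferencePoolCRD:
		// The full CRD is used, so the pool reconciler runs.
		return Wiring{RunReconcilers: true, RunPoolReconciler: true}, nil
	default:
		return Wiring{}, fmt.Errorf("unknown mode %d", m)
	}
}

func main() {
	for _, m := range []Mode{ModeStaticEndpoints, ModePodSelector, ModeInferencePoolCRD} {
		w, err := wiringFor(m)
		if err != nil {
			panic(err)
		}
		fmt.Printf("mode %d: %+v\n", m, w)
	}
}
```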

Ultimately we deal with inference endpoints (BTW: I agree with @kfswain's comment that the term is not used consistently across the code base and that we should make it more explicit). The InferencePool is merely one way to define the service discovery attributes for the relevant endpoints.

@ahg-g
Contributor

ahg-g commented Nov 19, 2025

I think we are going back and forth on a non-material issue. Both paths are reasonable, and one can't argue that the one in this PR is wrong or harmful. It is also an implementation detail, the kind of decision I recommend we empower contributors to make.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 19, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 19, 2025
@ahg-g
Contributor

ahg-g commented Nov 20, 2025

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, capri-xiyue

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 20, 2025
@ahg-g
Contributor

ahg-g commented Nov 20, 2025

/retest

@capri-xiyue capri-xiyue requested a review from ahg-g November 20, 2025 22:28
@ahg-g
Contributor

ahg-g commented Nov 20, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 20, 2025
@k8s-ci-robot k8s-ci-robot merged commit 3c8aba1 into kubernetes-sigs:main Nov 20, 2025
11 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.


5 participants