fix: update RBAC and mirror InferencePool with v1alpha2 API version #1274

Open
wants to merge 1 commit into
base: main

Conversation

chewong
Member

@chewong chewong commented Jul 31, 2025

Follow-up for #1071. The dev release of the reference EPP implementation has transitioned to using InferencePools with API version inference.networking.k8s.io/v1, but most Gateway providers lag behind and still use inference.networking.x-k8s.io/v1alpha2. For example, when testing with Istio Gateway + inference.networking.k8s.io/v1:

image

This PR attempts to temporarily fix the issue by mirroring the exact InferencePool resource from v1 to v1alpha2 in the Helm chart, so that the reference EPP and Gateways that haven't yet adopted inference.networking.k8s.io/v1 can work with each other.

Testing:

# install inferencepool + epp deployment via helm
helm install workspace-llama-3-1-8b-instruct config/charts/inferencepool -f values.yaml

# modify HTTPRoute to use InferencePool from inference.networking.x-k8s.io/v1alpha2

kubectl get svc
NAME                                          TYPE           CLUSTER-IP
inference-gateway-istio                       LoadBalancer   10.0.134.160

kubectl run -it --rm --restart=Never curl --image=curlimages/curl -- curl -X POST http://10.0.134.160/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "What is kubernetes?"}]
  }'| jq
If you don't see a command prompt, try pressing enter.
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "**What is Kubernetes?**\n\nKubernetes (also known as K8s) is an open-source container orchestration system for automating the deployment, scaling, and management of containerized applications. It was originally designed by Google, and is now maintained by the Cloud Native Computing Foundation (CNCF).\n\n....",
        "reasoning_content": null,
        "role": "assistant",
        "tool_calls": []
      },
      "stop_reason": null
    }
  ],
  "created": 1753989500,
  "id": "chatcmpl-a49c3b08-b64a-4051-b77e-7d04d5f02304",
  "kv_transfer_params": null,
  "model": "llama-3.1-8b-instruct",
  "object": "chat.completion",
  "prompt_logprobs": null,
  "usage": {
    "completion_tokens": 671,
    "prompt_tokens": 40,
    "prompt_tokens_details": null,
    "total_tokens": 711
  }
}
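
The HTTPRoute change in the testing steps above is not spelled out. A minimal sketch of one way to do it with a JSON patch, assuming the InferencePool is the first backendRef of the route's first rule and that the backendRef already sets `group`; the function names and the route name `llm-route` are hypothetical and not part of this PR:

```shell
# Build a JSON patch that repoints the first backendRef of the first rule
# at the given API group. Assumes that backendRef already sets `group`.
build_group_patch() {
  printf '[{"op":"replace","path":"/spec/rules/0/backendRefs/0/group","value":"%s"}]' "$1"
}

# Apply the patch to an HTTPRoute in the current namespace.
# $1 = HTTPRoute name, $2 = target API group.
patch_route_group() {
  kubectl patch httproute "$1" --type=json -p "$(build_group_patch "$2")"
}

# Usage (hypothetical route name):
#   patch_route_group llm-route inference.networking.x-k8s.io
```

If the backendRef sits at a different index, the path in the patch needs adjusting; editing the route with `kubectl edit httproute` works just as well.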

This PR also adds the missing read-only RBAC for InferencePools in the inference.networking.k8s.io API group.
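
One way to sanity-check the RBAC change is `kubectl auth can-i` with impersonation. A rough sketch, assuming "read-only" means the `get`, `list`, and `watch` verbs (the description above doesn't list them) and using a hypothetical service account name; the caller needs permission to impersonate:

```shell
# Returns 0 only if the given subject can get, list, and watch InferencePools
# in the given API group. The verb set is an assumption about "read-only".
can_read_pools() {
  subject="$1"
  group="$2"
  for verb in get list watch; do
    kubectl auth can-i "$verb" "inferencepools.$group" --as="$subject" || return 1
  done
}

# Usage (hypothetical service account for the EPP):
#   can_read_pools system:serviceaccount:default:epp-sa inference.networking.k8s.io
```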

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: chewong
Once this PR has been reviewed and has the lgtm label, please assign danehans for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

netlify bot commented Jul 31, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit cc85672
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/688bc2de0018780008c3de8a
😎 Deploy Preview https://deploy-preview-1274--gateway-api-inference-extension.netlify.app
To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 31, 2025