Conversation

Frapschen (Contributor)

What type of PR is this?
/kind documentation

What this PR does / why we need it:

This PR bumps two image versions:

  • vllm-cpu-release-repo: v0.8.5 -> v0.10.2 (v0.8.5 is six months old)
  • vllm-openai: v0.8.5 -> v0.11.0 (includes many bug fixes)
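The change itself is only the tag portion of each image reference; a minimal sketch of the substitution (CPU registry path taken from the pod manifest below, the `vllm/vllm-openai` Docker Hub path is an assumption):

```shell
# Old tags as currently pinned in the quickstart manifests.
cpu_old="public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.8.5"
gpu_old="vllm/vllm-openai:v0.8.5"

# Re-tag by replacing everything after the last ':'.
cpu_new="${cpu_old%:*}:v0.10.2"
gpu_new="${gpu_old%:*}:v0.11.0"

echo "${cpu_old} -> ${cpu_new}"
echo "${gpu_old} -> ${gpu_new}"
```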

Which issue(s) this PR fixes:
Fixes #1722

Does this PR introduce a user-facing change?:


@k8s-ci-robot k8s-ci-robot added the kind/documentation Categorizes issue or PR as related to documentation. label Oct 16, 2025

netlify bot commented Oct 16, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit d69f91b
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68f0a26b4db49300084892ee
😎 Deploy Preview https://deploy-preview-1733--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 16, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Frapschen
Once this PR has been reviewed and has the lgtm label, please assign kfswain for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Oct 16, 2025
@nirrozenbaum
Contributor

@Frapschen did you run the quickstart guide with these GPU and CPU versions to make sure it works?
Last time we tested, it failed on startup, which was the original reason we pinned the version to v0.8.5.

@Frapschen
Contributor Author

@nirrozenbaum I can confirm that the CPU one works fine for me.

root@controller-01:~# kubectl get pod -owide
NAME                                           READY   STATUS    RESTARTS      AGE   IP             NODE            NOMINATED NODE   READINESS GATES
inference-gateway-f5c894468-4vxc5              1/1     Running   0             17d   10.233.98.85   controller-01   <none>           <none>
kgateway-7f4455889-zfrtz                       1/1     Running   0             17d   10.233.98.84   controller-01   <none>           <none>
vllm-llama3-8b-instruct-6c9757687-cvzll        1/1     Running   1 (27d ago)   42d   10.233.98.36   controller-01   <none>           <none>
vllm-llama3-8b-instruct-6c9757687-gdpqf        1/1     Running   1 (27d ago)   42d   10.233.98.34   controller-01   <none>           <none>
vllm-llama3-8b-instruct-6c9757687-rjqq2        1/1     Running   1 (27d ago)   42d   10.233.98.35   controller-01   <none>           <none>
vllm-llama3-8b-instruct-cpu-7555494db4-bvpd4   2/2     Running   1 (15h ago)   15h   10.233.98.89   controller-01   <none>           <none>
vllm-llama3-8b-instruct-epp-86c8cdcf64-8dtrp   1/1     Running   0             21d   10.233.98.71   controller-01   <none>           <none>

vllm-llama3-8b-instruct-cpu-7555494db4-bvpd4 pod manifest:

root@controller-01:~# kubectl get pod vllm-llama3-8b-instruct-cpu-7555494db4-bvpd4 -oyaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: 5a70f09e83d15c3e709291de67baafa16b5bb33869b5c7ad5e7ce35d514e1dd7
    cni.projectcalico.org/podIP: 10.233.98.89/32
    cni.projectcalico.org/podIPs: 10.233.98.89/32
  creationTimestamp: "2025-10-16T10:16:35Z"
  generateName: vllm-llama3-8b-instruct-cpu-7555494db4-
  labels:
    app: vllm-llama3-8b-instruct-cpu
    pod-template-hash: 7555494db4
  name: vllm-llama3-8b-instruct-cpu-7555494db4-bvpd4
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: vllm-llama3-8b-instruct-cpu-7555494db4
    uid: 2fac45a0-a030-4a23-bdc5-40ece7717e3a
  resourceVersion: "5878803"
  uid: 7c697912-88aa-42d8-84bd-45fbc30f21b4
spec:
  containers:
  - args:
    - --model
    - Qwen/Qwen2.5-1.5B-Instruct
    - --port
    - "8000"
    - --enable-lora
    - --max-loras
    - "4"
    - --lora-modules
    - '{"name": "food-review-0", "path": "SriSanth2345/Qwen-1.5B-Tweet-Generations",
      "base_model_name": "Qwen/Qwen2.5-1.5B"}'
    - '{"name": "food-review-1", "path": "SriSanth2345/Qwen-1.5B-Tweet-Generations",
      "base_model_name": "Qwen/Qwen2.5-1.5B"}'
    command:
    - python3
    - -m
    - vllm.entrypoints.openai.api_server
    env:
    - name: PORT
      value: "8000"
    - name: VLLM_ALLOW_RUNTIME_LORA_UPDATING
      value: "true"
    - name: VLLM_CPU_KVCACHE_SPACE
      value: "4"
    - name: HF_ENDPOINT
      value: https://hf-mirror.com
    image: public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.10.2
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 240
      httpGet:
        path: /health
        port: http
        scheme: HTTP
      initialDelaySeconds: 30
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 1
    name: lora
    ports:
    - containerPort: 8000
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 600
      httpGet:
        path: /health
        port: http
        scheme: HTTP
      initialDelaySeconds: 30
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: "12"
        memory: 9000Mi
      requests:
        cpu: "12"
        memory: 9000Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /data
      name: data
    - mountPath: /dev/shm
      name: shm
    - mountPath: /adapters
      name: adapters
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-2w9kw
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - env:
    - name: DYNAMIC_LORA_ROLLOUT_CONFIG
      value: /config/configmap.yaml
    image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/lora-syncer:main
    imagePullPolicy: Always
    name: lora-adapter-syncer
    resources: {}
    restartPolicy: Always
    stdin: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    tty: true
    volumeMounts:
    - mountPath: /config
      name: config-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-2w9kw
      readOnly: true
  nodeName: controller-01
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: data
  - emptyDir:
      medium: Memory
    name: shm
  - emptyDir: {}
    name: adapters
  - configMap:
      defaultMode: 420
      name: vllm-qwen-adapters
    name: config-volume
  - name: kube-api-access-2w9kw
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2025-10-16T10:16:38Z"
    status: "True"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2025-10-16T10:16:38Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2025-10-16T10:36:06Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2025-10-16T10:36:06Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2025-10-16T10:16:35Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://756cf1020ba929555afea36c1c85ab16b1cddc72967689cbc749e2304f941e18
    image: public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.10.2
    imageID: public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo@sha256:22d44a924b90e309423ae19718a8cf4a6f377fdc8bb699ecfdcbb52f9625523c
    lastState:
      terminated:
        containerID: containerd://db8fd241e66c65277ff12eed5879380a7e063a2b02094394b16dd30d6c9d7da4
        exitCode: 1
        finishedAt: "2025-10-16T10:26:44Z"
        reason: Error
        startedAt: "2025-10-16T10:16:40Z"
    name: lora
    ready: true
    restartCount: 1
    started: true
    state:
      running:
        startedAt: "2025-10-16T10:26:46Z"
    volumeMounts:
    - mountPath: /data
      name: data
    - mountPath: /dev/shm
      name: shm
    - mountPath: /adapters
      name: adapters
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-2w9kw
      readOnly: true
      recursiveReadOnly: Disabled
  hostIP: 172.16.112.10
  hostIPs:
  - ip: 172.16.112.10
  initContainerStatuses:
  - containerID: containerd://d7c0b168752d0b8a4a3aa4d2c3ff784973c44d443c4439b2594102f50d00cd7e
    image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/lora-syncer:main
    imageID: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/lora-syncer@sha256:a2927ee562c1d1e9cfc076fb54defda356575fbf2bb515bba6e61bdd99fbab7c
    lastState: {}
    name: lora-adapter-syncer
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2025-10-16T10:16:37Z"
    volumeMounts:
    - mountPath: /config
      name: config-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-2w9kw
      readOnly: true
      recursiveReadOnly: Disabled
  phase: Running
  podIP: 10.233.98.89
  podIPs:
  - ip: 10.233.98.89
  qosClass: Burstable
  startTime: "2025-10-16T10:16:35Z"

curl test:

root@controller-01:~# IP=10.233.98.89
root@controller-01:~# PORT=8000
root@controller-01:~# curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
"model": "food-review-1",
"prompt": "Write as if you were a critic: San Francisco",
"max_tokens": 100,
"temperature": 0
}'
HTTP/1.1 200 OK
date: Fri, 17 Oct 2025 02:15:28 GMT
server: uvicorn
content-length: 918
content-type: application/json

{"id":"cmpl-8123f689bd7f40c4965cee037470fad0","object":"text_completion","created":1760667328,"model":"food-review-1","choices":[{"index":0,"text":" Giants - 2019\n\nThe San Francisco Giants have been one of the most successful teams in Major League Baseball over the past few years, and they continue to be a force to be reckoned with. The team has won three World Series championships in the last five years, and they are currently sitting at the top of their division.\n\nIn this year's season, the Giants have shown that they can still compete with any team in the league. They have had some tough games, but they","logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null,"prompt_logprobs":null,"prompt_token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":10,"total_tokens":110,"completion_tokens":100,"prompt_tokens_details":null},"kv_transfer_params":null}
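For scripted verification, the same checks the curl test does by eye can be asserted programmatically; a minimal sketch in Python, run against an abridged copy of the response body above (fields trimmed, endpoint call omitted):

```python
import json

# Abridged completion response from the curl test above.
body = """{
  "object": "text_completion",
  "model": "food-review-1",
  "choices": [{"index": 0, "text": " Giants - 2019...", "finish_reason": "length"}],
  "usage": {"prompt_tokens": 10, "total_tokens": 110, "completion_tokens": 100}
}"""

resp = json.loads(body)
# Sanity checks mirroring the manual test: the LoRA adapter served the
# request, and the completion ran to the max_tokens=100 limit.
assert resp["model"] == "food-review-1"
assert resp["choices"][0]["finish_reason"] == "length"
assert resp["usage"]["completion_tokens"] == 100
print("completion OK:", resp["usage"]["total_tokens"], "tokens total")
```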
