
Sync 260313 kserve upstream odh#1195

Open
spolti wants to merge 35 commits into opendatahub-io:master from spolti:sync-260313_kserve-upstream-odh
Conversation

@spolti
Member

@spolti spolti commented Mar 13, 2026

Commits in this batch:
c1d24e5 (community/master, community/HEAD, community-master) test(llmisvc): add envtest integration tests for scaling reconciliation (kserve#5246)
3fab6f7 fix: to install specific kserve version with kustomize script (kserve#5247)
55f7afd fix: change wrong crd kustomization update (kserve#5234)
dbaff63 fix(llmisvc): add missing os.Exit(1) after controller setup failure (kserve#5242)
5270d1f feat: add storage migration for LLMinferenceService APIs (kserve#5149)
6a9f524 feat(llmisvc): implement autoscaling for WVA and KEDA/HPA integration (kserve#5194)
8f0a110 feat: merge target branch in each GH action job (kserve#5232)
ba4a28b feat(llmisvc): add --disable-uvicorn-access-log to vllm configs (kserve#5154)
0c3c178 fix(llmisvc): add default for MinReplicas & make MaxReplicas required int32 (kserve#5237)
ba94e0b feat: upgrade llm-d to 0.6 (kserve#5121)
7b43cc0 feat: allow configuring GatewayClass for llmisvc e2e tests (kserve#5238)
90447b7 fix(build): fall back to existing version when URL verification fails (kserve#5180)
2983b3f fix: resolve go test -cover failure (kserve#5178)
df95fcd test(llmisvc): unit tests for versioned config resolution (kserve#5182)
3f169a7 feat(llmisvc): add build hooks for distribution-specific logic (kserve#5217)
d8589e5 chore: bump kustomize from v5.5.0 to v5.8.0 (kserve#5218)
db9e355 fix: improve cherry-pick workflow with validation and PR fetch (kserve#5233)
fb09d22 fix(make): increase test timeout from 20m to 30m (kserve#5165)

plus:
5087c02
263bb0d

Summary by CodeRabbit

  • New Features

    • Autoscaling support for LLM workloads (HPA & KEDA) with VariantAutoscaling and tokenizer sidecar (UDS); Gateway API inference extension enabled.
  • Enhancements

    • Updated runtime images and routing/scheduler components; tokenizer integration for model serving and improved scheduler behavior.
    • Schema tightening: minReplicas defaults to 1 and maxReplicas must be specified for autoscaling.
  • Documentation / Tests

    • Expanded unit, integration and e2e tests plus new KV-cache routing samples.
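
To make the schema tightening concrete, here is a minimal scaling stanza; the exact field layout is an assumption based on the ScalingSpec changes described in this PR, not a verbatim excerpt of the CRD:

```yaml
# Hypothetical LLMInferenceService scaling fields illustrating the new rules:
# minReplicas may be omitted (it now defaults to 1); maxReplicas is a required
# int32 and must be >= 1.
spec:
  replicas:
    minReplicas: 1   # optional, defaults to 1
    maxReplicas: 4   # required; MaxReplicas is no longer a pointer
```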

bartoszmajsak and others added 19 commits March 11, 2026 18:09
@coderabbitai

coderabbitai bot commented Mar 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Large, heterogeneous changes across the repository:

  • Adds full autoscaling support (VariantAutoscaling CRD, WVA integration, HPA and KEDA ScaledObject reconciliation, Prometheus trigger/auth plumbing and config parsing).
  • Converts ScalingSpec.MaxReplicas from pointer to non-pointer and updates CRD/validation defaults (minReplicas default=1, maxReplicas required in many schemas).
  • Introduces a tokenizer-UDS sidecar and scheduler config mutation.
  • Refactors model artifact attachment APIs to accept mount name/path and updates storage initializer wiring.
  • Expands RBAC for the new APIs.
  • Adds many new unit/integration/e2e tests and fixtures.
  • Inserts numerous GitHub Actions workflow steps to merge PR base branches.
  • Bumps multiple image and dependency versions and adds new test CRDs.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~150 minutes

@spolti spolti force-pushed the sync-260313_kserve-upstream-odh branch from 6184806 to d002db7 on March 13, 2026 at 15:42

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 15

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
config/llmisvcconfig/config-llm-decode-worker-data-parallel.yaml (2)

15-23: ⚠️ Potential issue | 🟠 Major

Major, CWE-250: don't run these containers as UID 0.

Severity: major. A compromise in the routing sidecar or vLLM process lands as root in-container here, which materially increases breakout and lateral-movement impact—especially on the main/worker containers that also retain extra capabilities. Flip runAsNonRoot to true and use a non-root UID/GID compatible with the mounted volumes.

Remediation sketch
         securityContext:
           allowPrivilegeEscalation: false
-          runAsNonRoot: false
+          runAsNonRoot: true
+          runAsUser: 1000
+          runAsGroup: 1000

Apply the same change to the sidecar, decode container, and worker container; add a pod-level fsGroup if the shared writable volumes need it.

As per coding guidelines, **/config/**/*.yaml: SecurityContext configured (runAsNonRoot, readOnlyRootFilesystem).

Also applies to: 234-246, 467-479

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/llmisvcconfig/config-llm-decode-worker-data-parallel.yaml` around
lines 15 - 23, The securityContext currently has runAsNonRoot:false which leaves
containers running as UID 0; change runAsNonRoot to true and set an explicit
non-root UID/GID that is compatible with the mounted volumes (e.g.,
runAsUser/runAsGroup in the container-level securityContext or the pod spec)
and, if the shared writable volumes require it, add a pod-level fsGroup to grant
volume access; apply the same fixes to the sidecar, decode container, and worker
container securityContext blocks (and the other occurrences noted) so all
relevant manifests use securityContext.runAsNonRoot=true with appropriate
runAsUser/runAsGroup/fsGroup values.
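
A consolidated sketch of the hardened securityContext the comment asks for; the UID/GID/fsGroup values of 1000 are placeholders and must be compatible with the container image and the shared writable volumes:

```yaml
# Sketch only: 1000 is a placeholder UID/GID, not a value verified against
# the llm-d images used in these presets.
spec:
  securityContext:            # pod-level: grants group access to shared volumes
    fsGroup: 1000
  containers:
  - name: main
    securityContext:          # container-level: forbid root and escalation
      allowPrivilegeEscalation: false
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 1000
```

The same block would be repeated on the sidecar, decode, and worker containers.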

7-64: ⚠️ Potential issue | 🟠 Major

Major, CWE-770: add CPU/memory requests and limits to all updated containers.

Severity: major. A large model load or traffic spike can exhaust node CPU/memory and evict neighboring workloads because the sidecar, decode container, and worker container are all unbounded in this template. Define tuned requests/limits before shipping the new image versions.

Remediation sketch
+        resources:
+          requests:
+            cpu: "<set>"
+            memory: "<set>"
+          limits:
+            cpu: "<set>"
+            memory: "<set>"

Apply tuned values to the init sidecar, decode container, and worker container.

As per coding guidelines, **/config/**/*.yaml: Resource limits defined (memory, CPU) - prevents DoS.

Also applies to: 65-282, 295-504

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/llmisvcconfig/config-llm-decode-worker-data-parallel.yaml` around
lines 7 - 64, The manifest is missing resource requests/limits which can lead to
node OOM/CPU exhaustion; add a resources block with both cpu and memory requests
and limits to the init container "llm-d-routing-sidecar" and to the decode and
worker containers referenced elsewhere (e.g., the decode container and worker
container entries in this file) so each has tuned values (requests <= limits,
use proper units like "500m" CPU and "1Gi" memory as a starting point), and
ensure the same pattern is applied to all affected container specs in this YAML
so no container is unbounded.
pkg/apis/serving/v1alpha1/llm_inference_service_types.go (1)

341-356: ⚠️ Potential issue | 🟡 Minor

Add explicit Go-side validation for MaxReplicas < 1 with regression test.

The schema-level +kubebuilder:validation:Minimum=1 annotation on MaxReplicas will reject zero values at the API boundary. However, ValidateWorkloadScaling in v1alpha2 (which v1alpha1 delegates to) lacks an explicit Go-side check for MaxReplicas < 1. Direct Go instantiation paths that bypass schema validation could slip through. Add an explicit validation check (e.g., if scaling.MaxReplicas < 1) and a test case constructing ScalingSpec{MaxReplicas: 0} to catch regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/apis/serving/v1alpha1/llm_inference_service_types.go` around lines 341 -
356, Add an explicit Go-side validation to ensure MaxReplicas >= 1 by adding a
check like "if scaling.MaxReplicas < 1 { return fmt.Errorf(... ) }" inside the
ValidateWorkloadScaling (or the v1alpha2 validation function that v1alpha1
delegates to) so code paths that bypass CRD schema validation are rejected; also
add a regression unit test that constructs a ScalingSpec/Scaling with
MaxReplicas set to 0 (e.g., ScalingSpec{MaxReplicas: 0}) and asserts validation
returns an error. Ensure you reference the ScalingSpec type and the
ValidateWorkloadScaling (or corresponding v1alpha2 validator) when locating
where to add the check and the test.
config/llmisvcconfig/config-llm-decode-template.yaml (1)

36-44: ⚠️ Potential issue | 🟠 Major

Security posture: runAsNonRoot: false and no resource limits.

Per Kubernetes security best practices:

  • runAsNonRoot: false (line 39) permits container to run as root.
  • No resources.limits defined - susceptible to resource exhaustion (CWE-400).

Consider setting runAsNonRoot: true and defining CPU/memory limits.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/llmisvcconfig/config-llm-decode-template.yaml` around lines 36 - 44,
Update the Pod/Container securityContext to avoid running as root by setting
securityContext.runAsNonRoot: true and (if needed) specify
securityContext.runAsUser to a non-zero UID to ensure the container actually
runs unprivileged; also add a resources block on the container with both limits
and requests for cpu and memory (resources.limits and resources.requests) to
prevent resource exhaustion and align with Kubernetes best practices (use
appropriate values for your service). Locate the securityContext stanza and the
container spec (e.g., the same block containing securityContext and the
container definition) and add/adjust these fields accordingly.
config/llmisvcconfig/config-llm-template.yaml (1)

36-44: ⚠️ Potential issue | 🟠 Major

Security posture: runAsNonRoot: false and no resource limits.

Same gaps as decode template:

  • runAsNonRoot: false permits root execution.
  • No resources.limits - DoS risk (CWE-400).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/llmisvcconfig/config-llm-template.yaml` around lines 36 - 44, Update
the Pod/Container securityContext to avoid running as root and add resource
limits: set securityContext.runAsNonRoot: true and add securityContext.runAsUser
and runAsGroup to a non-zero UID/GID, enable
securityContext.readOnlyRootFilesystem: true where appropriate, and
keep allowPrivilegeEscalation: false; additionally add a resources.limits block
(cpu and memory) and recommended resources.requests under the same container
spec to prevent DoS (CWE-400). Reference the securityContext, runAsNonRoot,
runAsUser/runAsGroup, readOnlyRootFilesystem, and
resources.limits/resources.requests in your edits.
charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml (1)

9-17: ⚠️ Potential issue | 🟠 Major

Missing TLS flags in vllm serve for two configurations (CWE-319).

Verified: vllm serve defaults to plain HTTP in v0.5.x and requires explicit --ssl-certfile + --ssl-keyfile to enable HTTPS. Two configs have a critical mismatch:

  1. kserve-config-llm-decode-template (lines 9–17): Main command lacks TLS flags, but probes expect HTTPS (scheme: HTTPS) and sidecar enforces TLS (--decoder-use-tls=true). Handshake will fail, prompts/responses exposed to interception within cluster paths (CWE-319).

  2. kserve-config-llm-prefill-template (lines 657–665): Same mismatch—main command no TLS, probes expect HTTPS.

The other two configs (kserve-config-llm-decode-worker-data-parallel and kserve-config-llm-prefill-worker-data-parallel) already include the TLS flags; apply the same flags to both affected configs.

Remediation

Add to both configs after --disable-uvicorn-access-log:

      - --enable-ssl-refresh
      - --ssl-certfile
      - /var/run/kserve/tls/tls.crt
      - --ssl-keyfile
      - /var/run/kserve/tls/tls.key
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml` around
lines 9 - 17, The vllm serve command in templates
kserve-config-llm-decode-template and kserve-config-llm-prefill-template is
missing TLS flags while readiness/liveness probes and the decoder sidecar expect
HTTPS; update the command list in those templates (the vllm serve invocation
that includes --disable-uvicorn-access-log) to append the SSL flags
--enable-ssl-refresh, --ssl-certfile /var/run/kserve/tls/tls.crt and
--ssl-keyfile /var/run/kserve/tls/tls.key so the server runs with TLS like the
other worker-data-parallel configs.
charts/kserve-llmisvc-resources/files/llmisvc/resources.yaml (1)

1610-1617: ⚠️ Potential issue | 🔴 Critical

Critical: RBAC rule is malformed due to a duplicate resources key and mixed apiGroups.

Line 1616 introduces a second resources key in the same mapping, and Line 1615 places llmd.ai under the resources list instead of apiGroups. This makes manifest parsing ambiguous and can drop intended permissions.

Suggested fix
 - apiGroups:
   - monitoring.coreos.com
   resources:
   - podmonitors
   - servicemonitors
-  - llmd.ai
-  resources:
-  - variantautoscalings
   verbs:
   - create
   - delete
   - get
   - list
   - patch
   - update
   - watch
+- apiGroups:
+  - llmd.ai
+  resources:
+  - variantautoscalings
+  verbs:
+  - create
+  - delete
+  - get
+  - list
+  - patch
+  - update
+  - watch
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/kserve-llmisvc-resources/files/llmisvc/resources.yaml` around lines
1610 - 1617, The RBAC rule is malformed: there is a duplicate `resources` key
and `llmd.ai` was placed in the resources list instead of in `apiGroups`; fix by
splitting the mapping into two rules: keep `monitoring.coreos.com` with its
`resources` (`podmonitors`, `servicemonitors`), and add a separate rule with
`apiGroups` containing `llmd.ai` and `resources` containing
`variantautoscalings`, each carrying the full `verbs` list intended for these
permissions; look for the mapping containing `monitoring.coreos.com`,
`llmd.ai`, `podmonitors`, `servicemonitors`, and `variantautoscalings` to make
this change.
🟠 Major comments (28)
pkg/controller/v1alpha2/llmisvc/config_presets_test.go-429-429 (1)

429-429: ⚠️ Potential issue | 🟠 Major

Fix model path mismatch in startup command.

Line 429 serves from /mnt/models, but the volume mount is /models (Line 460). This can cause model load failures for this preset.

Proposed fix
- Command: []string{"vllm", "serve", "/mnt/models", "--served-model-name", "llama", "--port", "8000"},
+ Command: []string{"vllm", "serve", "/models", "--served-model-name", "llama", "--port", "8000"},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/v1alpha2/llmisvc/config_presets_test.go` at line 429, Startup
command in the preset uses "/mnt/models" while the volume mount is "/models",
causing a path mismatch; update the Command entry (the slice containing "vllm",
"serve", "/mnt/models", ...) to use "/models" so the served-model path matches
the volume mount and the model can be loaded correctly.
pkg/controller/v1alpha2/llmisvc/config_presets_test.go-97-97 (1)

97-97: ⚠️ Potential issue | 🟠 Major

Pin container images by digest, not tag-only references.

Tag-only images (e.g., v0.6.0, v0.5.1) are mutable and vulnerable to CWE-494 and CWE-353. If a registry is compromised or a tag is republished, workloads can pull untrusted code without manifest changes. This affects the preset configurations at lines 97, 182, 315, 428 in the test file and all corresponding preset YAML files in config/llmisvcconfig/.

Update each image reference to include a digest:

- Image: "ghcr.io/llm-d/llm-d-routing-sidecar:v0.6.0"
+ Image: "ghcr.io/llm-d/llm-d-routing-sidecar:v0.6.0@sha256:<trusted_digest>"

- Image: "ghcr.io/llm-d/llm-d-cuda:v0.5.1"
+ Image: "ghcr.io/llm-d/llm-d-cuda:v0.5.1@sha256:<trusted_digest>"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/v1alpha2/llmisvc/config_presets_test.go` at line 97, The Image
fields using tag-only references (e.g., the string
"ghcr.io/llm-d/llm-d-routing-sidecar:v0.6.0" found in
pkg/controller/v1alpha2/llmisvc/config_presets_test.go and corresponding preset
YAMLs) must be replaced with digest-pinned references (sha256 digests). Update
each occurrence (including the other tag-only images at the other preset
locations referenced in the comment) to the exact image@sha256:<digest> form so
tests and config/llmisvcconfig/* presets pull immutable images; keep the same
repository/name and tag context when resolving the correct digest and apply the
change where Image: is set in the test and in each preset YAML.
.github/workflows/kserve-llmisvc-controller-docker-publish.yml-41-47 (1)

41-47: ⚠️ Potential issue | 🟠 Major

Avoid direct event interpolation and merging mutable branch refs

The merge step injects ${{ github.event.pull_request.base.ref }} directly into a shell command (CWE-94/CWE-77) and merges a mutable ref instead of an immutable commit (CWE-367). Switch to github.event.pull_request.base.sha via env, quote arguments, and merge FETCH_HEAD.

As per coding guidelines Never interpolate event data directly in run: blocks (script injection CWE-94).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/kserve-llmisvc-controller-docker-publish.yml around lines
41 - 47, Replace direct event interpolation of
github.event.pull_request.base.ref in the "Merge target branch" step: export an
env variable (e.g., BASE_SHA) from github.event.pull_request.base.sha, use that
env var inside the run block (not the raw event expression), quote all shell
arguments, fetch the target by SHA (git fetch origin $BASE_SHA) or fetch and set
FETCH_HEAD, and perform the merge against FETCH_HEAD (git merge --no-edit
FETCH_HEAD) instead of merging a mutable branch name to avoid script-injection
and mutable ref issues.
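
Under the assumptions in the comment above, a hardened merge step could look like this sketch (step name and fetch flags illustrative only):

```yaml
- name: Merge target branch
  if: github.event_name == 'pull_request'
  env:
    # Event data enters the shell only through env, never via direct
    # ${{ ... }} interpolation in the run body.
    BASE_SHA: ${{ github.event.pull_request.base.sha }}
  run: |
    set -euo pipefail
    git fetch --no-tags origin "$BASE_SHA"
    git merge --no-edit FETCH_HEAD
```

Fetching the immutable SHA and merging FETCH_HEAD avoids both the injection vector and drift if the base branch moves mid-run.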
.github/workflows/comment-cherry-pick.yml-10-10 (1)

10-10: ⚠️ Potential issue | 🟠 Major

Do not allow CONTRIBUTOR to trigger privileged cherry-pick automation

Including CONTRIBUTOR allows users outside the maintainer set to invoke a write-capable flow, which weakens authorization boundaries (CWE-269). Restrict this gate to trusted repo roles only (e.g., OWNER/MEMBER/COLLABORATOR) or enforce an explicit allowlist.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/comment-cherry-pick.yml at line 10, The workflow
conditional currently allows 'CONTRIBUTOR' to trigger the cherry-pick automation
which grants untrusted users write-capable actions; update the if condition (the
expression containing github.event.comment.author_association) to remove
'CONTRIBUTOR' and instead use only trusted roles (e.g., 'OWNER', 'COLLABORATOR',
and optionally 'MEMBER'), or replace that check with an explicit allowlist check
for authorized usernames/teams; change the author_association clause in the if
expression accordingly so only trusted roles can invoke /cherry-pick.
.github/workflows/e2e-test.yml-38-43 (1)

38-43: ⚠️ Potential issue | 🟠 Major

Repeated unsafe PR-base merge pattern across jobs

All added “Merge target branch” blocks directly interpolate ${{ github.event.pull_request.base.ref }} inside shell commands and merge a mutable branch ref. This combination is vulnerable to shell/script injection (CWE-94/CWE-77) and merge drift/TOCTOU behavior (CWE-367). Replace all of these with fetch+merge by immutable github.event.pull_request.base.sha passed via env and quoted shell args.

As per coding guidelines Never interpolate event data directly in run: blocks (script injection CWE-94).

Also applies to: 72-77, 152-157, 264-269, 301-307, 354-360, 432-438, 539-545, 635-641, 752-758, 839-845, 907-913, 1049-1055, 1150-1156, 1256-1262, 1331-1337, 1406-1412

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e-test.yml around lines 38 - 43, The "Merge target
branch" step is unsafe because it interpolates
github.event.pull_request.base.ref directly into shell commands; replace that
pattern by passing the immutable github.event.pull_request.base.sha into the job
via env (e.g., BASE_SHA) and use quoted shell arguments for fetch/merge commands
instead of the mutable ref; update each "Merge target branch" step (the run
block that fetches/unshallowes and merges origin/${{
github.event.pull_request.base.ref }}) to fetch the target branch by name but
perform the merge using the immutable BASE_SHA env variable and quoted arguments
to avoid shell injection and TOCTOU issues.
.github/workflows/kserve-localmodel-controller-docker-publish.yml-41-47 (1)

41-47: ⚠️ Potential issue | 🟠 Major

Replace branch-name merge with SHA-pinned merge

The step currently interpolates ${{ github.event.pull_request.base.ref }} directly in shell (CWE-94/CWE-77) and merges a moving branch ref (CWE-367). Fetch and merge by github.event.pull_request.base.sha via env, with quoted args.

As per coding guidelines Never interpolate event data directly in run: blocks (script injection CWE-94).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/kserve-localmodel-controller-docker-publish.yml around
lines 41 - 47, Update the "Merge target branch" step to avoid interpolating
github.event values directly in the run block: set an env variable (e.g.,
BASE_SHA=${{ github.event.pull_request.base.sha }}) on that step and use that
env var in the shell commands, fetch the SHA (git fetch origin "$BASE_SHA") and
merge by the pinned SHA (git merge --no-edit "$BASE_SHA"), ensuring all shell
arguments are quoted to prevent script injection and to avoid merging a moving
branch ref; reference the existing step name "Merge target branch" and the
current uses of github.event.pull_request.base.ref in the run block when
applying the change.
.github/workflows/python-test.yml-33-39 (1)

33-39: ⚠️ Potential issue | 🟠 Major

Use immutable base SHA and remove direct shell interpolation

Interpolating ${{ github.event.pull_request.base.ref }} directly inside run is a script-injection vector (CWE-94/CWE-77). Fetching by branch name also introduces TOCTOU drift (CWE-367). Fetch/merge base.sha through env and quote command arguments.

As per coding guidelines Never interpolate event data directly in run: blocks (script injection CWE-94).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/python-test.yml around lines 33 - 39, The run block under
the "Merge target branch" step is vulnerable because it interpolates
github.event.pull_request.base.ref directly and fetches by branch name; instead
set an env var (e.g., BASE_SHA) from github.event.pull_request.base.sha and use
that immutable SHA in the commands, quote all git arguments and avoid direct
event interpolation in the run body; update the step that currently uses git
fetch origin ${{ github.event.pull_request.base.ref }} and git merge --no-edit
origin/${{ github.event.pull_request.base.ref }} to fetch and merge using the
quoted $BASE_SHA environment variable (referencing the step name "Merge target
branch") so the run block only uses the safe $BASE_SHA value.
.github/workflows/comment-cherry-pick.yml-18-20 (1)

18-20: ⚠️ Potential issue | 🟠 Major

Move event fields out of run interpolation

PR/event values are interpolated directly into shell in these steps. Even with numeric-looking values, this violates the no-direct-interpolation rule and keeps a CWE-94-style injection footgun in place. Pass values through env and reference quoted shell variables.

As per coding guidelines Never interpolate event data directly in run: blocks (script injection CWE-94).

Also applies to: 45-47

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/comment-cherry-pick.yml around lines 18 - 20, The workflow
currently interpolates event data directly into the run script (e.g.,
PR_NUMBER="${{ github.event.issue.number }}" and commands using gh pr view to
set PR_STATE and PR_MERGED), which risks script injection; change these to pass
the event values via the job/step env map (define an env entry like PR_NUMBER:
${{ github.event.issue.number }} and similarly for any other event fields) and
then reference them inside the run block as quoted shell variables (e.g.,
"$PR_NUMBER") when calling gh pr view or assigning PR_STATE/PR_MERGED; apply the
same change for the other occurrences mentioned (the steps that set
PR_STATE/PR_MERGED and the other referenced lines).
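
A sketch of the env-based pattern the comment describes; the step name and the exact `gh` JSON fields are assumptions:

```yaml
- name: Look up PR state
  env:
    # Pass the event field through env so the run body never interpolates
    # ${{ ... }} expressions directly into the shell.
    PR_NUMBER: ${{ github.event.issue.number }}
  run: |
    set -euo pipefail
    gh pr view "$PR_NUMBER" --json state,mergedAt
```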
.github/workflows/comment-cherry-pick.yml-8-12 (1)

8-12: ⚠️ Potential issue | 🟠 Major

Add explicit least-privilege permissions per job

validate and cherry-pick are missing explicit permissions. On issue_comment workflows, this can overexpose token capabilities (CWE-250). Set minimal per-job scopes (read-only for validate; only required writes for cherry-pick).

Suggested patch
  validate:
    name: Validate Cherry Pick Request
+   permissions:
+     contents: read
+     pull-requests: read
    if: github.event.issue.pull_request != '' && contains(github.event.comment.body, '/cherry-pick') && (github.event.comment.author_association == 'OWNER' || github.event.comment.author_association == 'COLLABORATOR' || github.event.comment.author_association == 'CONTRIBUTOR')
    runs-on: ubuntu-latest

  cherry-pick:
    name: Cherry Pick
    needs: validate
+   permissions:
+     contents: write
+     pull-requests: write
    runs-on: ubuntu-latest

As per coding guidelines Set least-privilege permissions per job, not workflow level.

Also applies to: 31-34

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/comment-cherry-pick.yml around lines 8 - 12, The validate
and cherry-pick jobs lack explicit least-privilege permissions; add per-job
permissions rather than relying on workflow defaults: for the validate job (job
name "validate") set read-only scopes such as contents: read, pull-requests:
read and issues: read; for the cherry-pick job (job name "cherry-pick") grant
only the write scopes actually needed (e.g., contents: write and pull-requests:
write and issues: write if the job creates/updates PRs or comments) to minimize
token exposure.
.github/workflows/e2e-test-llmisvc.yaml-38-44 (1)

38-44: ⚠️ Potential issue | 🟠 Major

Secure all new PR merge blocks in this workflow

These blocks directly interpolate ${{ github.event.pull_request.base.ref }} in shell (CWE-94/CWE-77) and merge mutable refs (CWE-367). Use immutable github.event.pull_request.base.sha from env, quote the variable, and merge FETCH_HEAD with --ff-only.

As per coding guidelines Never interpolate event data directly in run: blocks (script injection CWE-94).

Also applies to: 72-78, 125-131

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e-test-llmisvc.yaml around lines 38 - 44, The workflow
is insecure because it interpolates github.event.pull_request.base.ref directly
into shell; instead set an env var like BASE_SHA=${{
github.event.pull_request.base.sha }} and use the immutable SHA (quoted) in
fetch/merge commands, e.g. git fetch origin "$BASE_SHA" then git merge --ff-only
FETCH_HEAD (or git merge --ff-only "FETCH_HEAD") to avoid merging mutable refs
and shell injection; apply the same change to the other merge blocks that use
github.event.pull_request.base.ref (the other "Merge target branch" / merge
blocks in this file).
.github/workflows/go.yml-35-41 (1)

35-41: ⚠️ Potential issue | 🟠 Major

Harden both merge steps in test and check-coverage jobs

Both new merge blocks directly interpolate PR event data into run and merge by mutable ref. This creates script injection exposure (CWE-94/CWE-77) and non-reproducible merges due to ref movement (CWE-367). Fetch and merge by immutable base.sha, passed through env, with quoted shell parameters.

As per coding guidelines Never interpolate event data directly in run: blocks (script injection CWE-94).

Also applies to: 82-88

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/go.yml around lines 35 - 41, The "Merge target branch"
steps in the 'test' and 'check-coverage' jobs are interpolating github.event
data and merging by mutable ref; change these steps to fetch and merge by the
immutable PR base SHA (github.event.pull_request.base.sha) passed into the step
via env (e.g., BASE_SHA), avoid direct run interpolation of github.event.*, and
use quoted shell parameters when calling git fetch/merge to prevent script
injection and non-reproducible merges; update the run block for the "Merge
target branch" step(s) to reference the env variable (not ${{ github.event... }})
and merge using the SHA instead of the branch name.
.github/workflows/predictiveserver-docker-publish.yml-40-46 (1)

40-46: ⚠️ Potential issue | 🟠 Major

Harden PR-base merge step against command injection and ref-race

This step interpolates PR event data directly in shell and merges a moving branch ref. That creates script-injection risk (CWE-94/CWE-77) and non-deterministic merge behavior if the base branch moves during execution (CWE-367). Use immutable base.sha via env and quote all shell args.

Suggested patch
       - name: Merge target branch
         if: github.event_name == 'pull_request'
+        env:
+          BASE_SHA: ${{ github.event.pull_request.base.sha }}
         run: |
-          git fetch --unshallow origin
-          git fetch origin ${{ github.event.pull_request.base.ref }}
-          git merge --no-edit origin/${{ github.event.pull_request.base.ref }}
+          set -euo pipefail
+          git fetch --no-tags --depth=1 origin "$BASE_SHA"
+          git merge --no-edit --ff-only FETCH_HEAD

As per coding guidelines Never interpolate event data directly in run: blocks (script injection CWE-94).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/predictiveserver-docker-publish.yml around lines 40 - 46,
The "Merge target branch" step currently interpolates
github.event.pull_request.base.ref directly into the run shell (risking command
injection and ref races); fix it by exposing immutable
github.event.pull_request.base.sha as an env (e.g., BASE_SHA) and use quoted
shell variables in the run block, e.g., git fetch --unshallow origin; git
fetch origin "$BASE_SHA"; git merge --no-edit "$BASE_SHA" (and quote any other
variables), so the merge uses the immutable SHA and no unquoted event data is
injected.
.github/workflows/comment-cherry-pick.yml-38-38 (1)

38-38: ⚠️ Potential issue | 🟠 Major

Pin all actions by full commit SHA

Line 38 uses actions/checkout@v4 (mutable tag). Line 50 uses hianhianhian/gha-cherry-pick@master (mutable branch—worse). Both weaken supply-chain integrity (CWE-829). Pin each to full 40-character commit SHA.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/comment-cherry-pick.yml at line 38, The workflow uses
mutable refs (actions/checkout@v4 and hianhianhian/gha-cherry-pick@master);
replace each mutable tag/branch with the corresponding full 40-character commit
SHA for the exact action commits to pin supply-chain provenance. Locate the
specific action commit SHAs from the action repositories (or GitHub Marketplace)
and update the uses lines for actions/checkout and hianhianhian/gha-cherry-pick
to use those full SHAs, then verify the workflow still runs and update any
changelog or comment noting the pinned versions.
Makefile-213-213 (2)

213-213: ⚠️ Potential issue | 🟠 Major

High: unquoted version variables enable command injection (CWE-78).

Lines 213 and 378 interpolate version variables directly into shell commands. If WVA_VERSION / GATEWAY_API_VERSION are overridden with shell metacharacters in CI/local env, arbitrary commands can execute.

Exploit scenario: attacker-controlled env value like v1.2.3; curl ... gets executed by the shell during make manifests / make deploy-dev.

Proposed fix
+	@case "$(WVA_VERSION)" in (*[!A-Za-z0-9._-]*|'') echo "Invalid WVA_VERSION"; exit 1;; esac
-	kubectl kustomize https://github.com/llm-d/llm-d-workload-variant-autoscaler.git/config/crd?ref=$(WVA_VERSION) > test/crds/wva_variantautoscalings.yaml
+	kubectl kustomize "https://github.com/llm-d/llm-d-workload-variant-autoscaler.git/config/crd?ref=$(WVA_VERSION)" > test/crds/wva_variantautoscalings.yaml

-	kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/${GATEWAY_API_VERSION}/standard-install.yaml
+	@case "${GATEWAY_API_VERSION}" in (*[!A-Za-z0-9._-]*|'') echo "Invalid GATEWAY_API_VERSION"; exit 1;; esac
+	kubectl apply -f "https://github.com/kubernetes-sigs/gateway-api/releases/download/${GATEWAY_API_VERSION}/standard-install.yaml"

As per coding guidelines, "MAKEFILE SECURITY: ... Quote shell variables in targets".

Also applies to: 378-378
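
The same case-pattern guard can be exercised on its own as a plain shell function (a sketch; the allowed charset mirrors the patch above):

```shell
set -euo pipefail

# Accept only version strings built from letters, digits, dot, underscore,
# and hyphen; reject empty strings and anything with shell metacharacters.
valid_version() {
  case "$1" in
    *[!A-Za-z0-9._-]*|'') return 1 ;;
    *) return 0 ;;
  esac
}
```

Running the guard before interpolating `WVA_VERSION` or `GATEWAY_API_VERSION` turns an injection attempt into a hard failure instead of an executed command.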

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` at line 213, The Makefile is vulnerable to command injection
because WVA_VERSION and GATEWAY_API_VERSION are interpolated raw into shell
commands (the kubectl kustomize invocations); fix by sanitizing and quoting
those variables before use: create sanitized variables (e.g.,
WVA_VERSION_SANITIZED and GATEWAY_API_VERSION_SANITIZED) that pass the original
through a shell-safe filter (allow only a safe charset like [A-Za-z0-9._-] via a
shell command) and then use those sanitized variables in the kubectl kustomize
URLs and any other places referencing WVA_VERSION or GATEWAY_API_VERSION so the
commands (the kubectl kustomize lines) never execute attacker-controlled
metacharacters.

213-213: ⚠️ Potential issue | 🟠 Major

CWE-494: Remote manifests consumed without integrity verification (lines 126, 209, 374).

Three locations fetch remote Kubernetes manifests using mutable git tags: GIE_VERSION=v1.3.0, WVA_VERSION=v0.5.1, GATEWAY_API_VERSION=v1.4.1. Tags are mutable attack surfaces; upstream compromise or tag rewrite can inject malicious CRDs into cluster bootstrap flows.

Exploit scenario: Attacker rewrites upstream git tag or compromises repo; CI/dev applies altered manifest automatically, granting attacker-controlled cluster resources.

Hardening required
  1. Line 126: Replace ref=$(GIE_VERSION) with immutable commit SHA (40-hex)
  2. Line 209: Replace ref=$(WVA_VERSION) with immutable commit SHA
  3. Line 374: Download to temp file, verify SHA256 checksum before applying:
-	kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/${GATEWAY_API_VERSION}/standard-install.yaml
+	curl -fsSL -o /tmp/gateway-standard-install.yaml "https://github.com/kubernetes-sigs/gateway-api/releases/download/${GATEWAY_API_VERSION}/standard-install.yaml"
+	echo "${GATEWAY_API_SHA256}  /tmp/gateway-standard-install.yaml" | sha256sum -c -
+	kubectl apply -f /tmp/gateway-standard-install.yaml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` at line 213, Replace mutable git tag refs with immutable 40-hex
commit SHAs for the remote kustomize calls (replace usages of GIE_VERSION,
WVA_VERSION and GATEWAY_API_VERSION with their corresponding commit SHA values)
and for the third location change the workflow to download the manifest to a
temp file, compute and verify its SHA256 checksum against a stored expected
value before piping to kubectl/kustomize; update variable names (e.g.,
GIE_COMMIT, WVA_COMMIT, GATEWAY_COMMIT) and add a checksum variable (e.g.,
GATEWAY_SHA256) and a small verify step that fails the build if the checksum
does not match.
Makefile-327-327 (1)

327-327: ⚠️ Potential issue | 🟠 Major

Major: test target ignores TEST_PKGS and TEST_TIMEOUT, breaking documented behavior.

Line 327 hardcodes package selection and timeout, so the override contract documented in Lines 322-325 does not work.

Proposed fix
-	KUBEBUILDER_ASSETS="$$($(ENVTEST) use $(ENVTEST_K8S_VERSION) -p path)" go test --timeout 30m $$(go list ./pkg/...) ./cmd/... -coverprofile coverage.out -coverpkg ./pkg/... ./cmd...
+	KUBEBUILDER_ASSETS="$$($(ENVTEST) use $(ENVTEST_K8S_VERSION) -p path)" go test --timeout $(TEST_TIMEOUT) $(TEST_PKGS) -coverprofile coverage.out -coverpkg ./pkg/... ./cmd/...

As per coding guidelines, "REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` at line 327, The test target currently hardcodes package list and
timeout; update the Makefile test command (the KUBEBUILDER_ASSETS ... go test
invocation) to honor TEST_TIMEOUT and TEST_PKGS by substituting those variables
with sensible defaults (e.g., default timeout 30m and default packages
"./pkg/... ./cmd/...") instead of the hardcoded values; ensure the invocation
uses the TEST_TIMEOUT and TEST_PKGS variables (and preserves KUBEBUILDER_ASSETS
and cover flags like -coverprofile and -coverpkg) so users can override
TEST_TIMEOUT and TEST_PKGS from the environment or make command line.
pkg/utils/storage.go-341-347 (1)

341-347: ⚠️ Potential issue | 🟠 Major

High: modelPath now reaches sh -c without validation or quoting (CWE-78/CWE-73).

Line 341 builds a shell fragment from raw modelPath, and Line 471 accepts that value without cleaning it first. If a caller ever threads a CR-supplied mount path here, a value like /mnt/models'; touch /tmp/pwned; #' will execute when the modelcar starts; malformed paths can also produce invalid mounts. Validate and normalize the path at the API boundary, then shell-quote it before formatting the command.

Suggested fix
+func validateModelPath(modelPath string) (string, error) {
+	clean := path.Clean(modelPath)
+	switch {
+	case modelPath == "", !path.IsAbs(clean), clean == "/", strings.ContainsAny(clean, "\x00\r\n"):
+		return "", fmt.Errorf("invalid modelPath %q", modelPath)
+	}
+	return clean, nil
+}
+
+func shellQuote(s string) string {
+	return "'" + strings.ReplaceAll(s, "'", `'"'"'`) + "'"
+}
+
 func modelcarCommand(modelPath string) string {
+	qModelPath := shellQuote(modelPath)
 	// $$$$ gets escaped by YAML to $$, which is the current PID
 	if modelPath != constants.DefaultModelLocalMountPath {
-		return fmt.Sprintf("mkdir -p %s && ln -sf /proc/$$$$/root/models %s && sleep infinity",
-			path.Dir(modelPath), modelPath)
+		return fmt.Sprintf("mkdir -p -- %s && ln -sf -- /proc/$$$$/root/models %s && sleep infinity",
+			shellQuote(path.Dir(modelPath)), qModelPath)
 	}
-	return fmt.Sprintf("ln -sf /proc/$$$$/root/models %s && sleep infinity", modelPath)
+	return fmt.Sprintf("ln -sf -- /proc/$$$$/root/models %s && sleep infinity", qModelPath)
 }
 
 func ConfigureModelcarToContainer(modelUri string, podSpec *corev1.PodSpec, targetContainerName string, modelPath string, storageConfig *types.StorageInitializerConfig) error {
+	cleanModelPath, err := validateModelPath(modelPath)
+	if err != nil {
+		return err
+	}
+	modelPath = cleanModelPath
+
 	targetContainer := GetContainerWithName(podSpec, targetContainerName)

As per coding guidelines, "Validate inputs before exec.Command (prevent command injection)".

Also applies to: 471-485, 503-504

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/utils/storage.go` around lines 341 - 347, The modelPath is being injected
directly into a shell command in modelcarCommand (and its callers) allowing
command injection; validate and normalize the path at the API boundary (use
filepath.Clean, convert to an absolute path, ensure it resides under an allowed
base directory or matches allowed patterns, and reject paths with suspicious
characters or sequences like ; & | ` $()<>\"' ) and then, when constructing the
shell fragment in modelcarCommand, shell-quote or escape the normalized path
using a proper shell-escaping routine (do not rely on manual concatenation) so
the path cannot break out of the command; update all call sites that forward
modelPath to use the validated/normalized value.
pkg/apis/serving/v1alpha2/llm_inference_service_validation.go-508-513 (1)

508-513: ⚠️ Potential issue | 🟠 Major

Reject invalid maxReplicas explicitly in webhook validation.

Lines [508-513] only enforce minReplicas <= maxReplicas when minReplicas is set; maxReplicas=0 is not directly rejected here and can flow into reconcile-time failures.

Suggested fix
+	if scaling.MaxReplicas < 1 {
+		allErrs = append(allErrs, field.Invalid(
+			scalingPath.Child("maxReplicas"),
+			scaling.MaxReplicas,
+			"maxReplicas must be greater than 0",
+		))
+	}
+
 	if scaling.MinReplicas != nil && *scaling.MinReplicas > scaling.MaxReplicas {
 		allErrs = append(allErrs, field.Invalid(
 			scalingPath.Child("minReplicas"),
 			*scaling.MinReplicas,
 			fmt.Sprintf("minReplicas (%d) cannot exceed maxReplicas (%d)", *scaling.MinReplicas, scaling.MaxReplicas),
 		))
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/apis/serving/v1alpha2/llm_inference_service_validation.go` around lines
508 - 513, Add an explicit validation for scaling.MaxReplicas in the webhook so
invalid values (like 0) are rejected early: in the same validation function that
inspects scaling, append a field.Invalid on scalingPath.Child("maxReplicas")
when scaling.MaxReplicas <= 0, and also ensure if scaling.MinReplicas
!= nil that scaling.MaxReplicas >= *scaling.MinReplicas (append a field.Invalid
on "maxReplicas" if it’s smaller); reference the existing variables scaling,
scaling.MinReplicas, scaling.MaxReplicas and the scalingPath/field.Invalid usage
to locate where to insert these checks.
pkg/controller/v1alpha2/llmisvc/config_merge_export_test.go-21-27 (1)

21-27: ⚠️ Potential issue | 🟠 Major

Guard test-global config mutation against race/leakage across tests.

Lines [21-27] write a package-global flag without synchronization and rely on callers to remember cleanup; this can leak state or race under parallel tests.

Suggested fix
+import (
+	"sync"
+	"testing"
+)
+
+var useVersionedConfigMu sync.Mutex
+
-func SetUseVersionedConfigForTest(enabled bool) func() {
+func SetUseVersionedConfigForTest(t *testing.T, enabled bool) {
+	t.Helper()
+	useVersionedConfigMu.Lock()
 	original := useVersionedConfig
 	useVersionedConfig = enabled
-	return func() {
+	t.Cleanup(func() {
 		useVersionedConfig = original
-	}
+		useVersionedConfigMu.Unlock()
+	})
 }
#!/bin/bash
set -euo pipefail

# Verify helper usage and parallel test presence in the same package.
rg -nP --type=go -C2 '\bSetUseVersionedConfigForTest\s*\(' pkg/controller/v1alpha2/llmisvc
rg -nP --type=go -C2 '\bt\.Parallel\s*\(' pkg/controller/v1alpha2/llmisvc

# Expected: either no parallel usage, or helper usage is serialized/guarded.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/v1alpha2/llmisvc/config_merge_export_test.go` around lines 21
- 27, The helper mutates package-global useVersionedConfig unsafely; change
SetUseVersionedConfigForTest to accept *testing.T (add t.Helper()) and use a
package-level mutex (e.g., useVersionedConfigMu sync.Mutex) to serialize access:
lock, save original, set useVersionedConfig, and register restoration with
t.Cleanup (which restores under the same mutex) instead of returning a raw
cleanup func; update all callers of SetUseVersionedConfigForTest to pass the
test object so cleanup is automatic and safe in parallel tests.
hack/setup/scripts/generate-versions-from-gomod.py-257-262 (1)

257-262: ⚠️ Potential issue | 🟠 Major

Major: don't silently keep a stale env version when URL verification fails.

If go.mod moves to a new version and GitHub returns a transient 404/rate-limit here, this branch writes the previous kserve-deps.env value instead. That can ship/test a different dependency version than the one declared in go.mod. Only reuse the existing value when it matches base_version; otherwise fail fast or gate the fallback behind an explicit opt-in flag.

Suggested fix
                 else:
                     fallback = existing_versions.get(var_name)
-                    if fallback:
+                    if fallback and fallback == base_version:
                         final_version = fallback
                         print(f"⚠️  {var_name}: No available URL found for {base_version}, keeping existing {fallback}")
+                    elif fallback:
+                        raise RuntimeError(
+                            f"{var_name}: URL verification failed for requested {base_version}; "
+                            f"refusing to keep stale existing value {fallback}"
+                        )
                     else:
                         print(f"⚠️  {var_name}: No available URL found, using {base_version}")

As per coding guidelines, "REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/scripts/generate-versions-from-gomod.py` around lines 257 - 262,
The current branch in generate-versions-from-gomod.py that sets final_version =
fallback when URL verification fails (using variables fallback,
existing_versions, var_name, base_version and writing to kserve-deps.env) can
silently ship a stale version; change this logic so you only reuse the
existing_versions[var_name] when it exactly equals base_version, otherwise fail
fast (raise an error/exit non‑zero) or gate reuse behind an explicit opt‑in flag
(e.g., --allow-stale) checked before assigning final_version; also update the
warning log to include the mismatch and the chosen failure behavior so the
caller sees why the script aborted or why the stale value was allowed.
test/crds/serving.kserve.io_all_crds.yaml-33912-33913 (1)

33912-33913: ⚠️ Potential issue | 🟠 Major

Making maxReplicas required in served CRD schemas breaks updates to existing resources.

The CRD marks maxReplicas as required (8 locations: lines 33912-33913, 47136-47137, 55050-55051, 68226-68227, 76198-76199, 89422-89423, 97411-97412, 110669-110670) with no default value, while the Go struct defines it as optional (json:"maxReplicas,omitempty"). Existing resources without this field will fail validation on any update attempt. Confirm that either:

  1. A default is provided in the CRD schema, or
  2. Conversion/defaulting webhooks backfill maxReplicas for legacy objects before enforcing the required constraint.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/crds/serving.kserve.io_all_crds.yaml` around lines 33912 - 33913, The
CRD schema marks the field maxReplicas as required but the Go type uses
json:"maxReplicas,omitempty", which will break updates for existing objects
missing that field; either add a default for maxReplicas in the CRD OpenAPI
schema (so existing objects validate) or implement/backfill it in your
conversion/defaulting webhook (ensure the webhook populates maxReplicas for
legacy objects before admission/validation). Locate references to maxReplicas in
the CRD YAML and the struct tag json:"maxReplicas,omitempty" in the Go types,
then apply one of the two fixes consistently across all CRD locations listed
(33912-33913, 47136-47137, 55050-55051, 68226-68227, 76198-76199, 89422-89423,
97411-97412, 110669-110670).
pkg/controller/v1alpha2/llmisvc/config_loader.go-121-141 (1)

121-141: ⚠️ Potential issue | 🟠 Major

Validate autoscaling JSON before assigning it.

Unknown keys and bad values are accepted here as zero values, so a typo or malformed entry turns into a late reconciliation failure instead of a clear config error. Use strict decoding, apply the documented default for TriggerAuthKind, and reject empty/unsupported values before setting config.WVAAutoscalingConfig.

As per coding guidelines, "Validate all data from json.Unmarshal before storing in ConfigMaps/Secrets".

Also applies to: 211-217

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/v1alpha2/llmisvc/config_loader.go` around lines 121 - 141, The
PrometheusConfig JSON is being accepted without validation and unknown
keys/defaults become silent zero-values; update the decoder that populates
PrometheusConfig (the struct type PrometheusConfig and the code that assigns
config.WVAAutoscalingConfig) to use strict JSON decoding (reject unknown
fields), set TriggerAuthKind to the documented default "TriggerAuthentication"
when empty, and validate/return an error for empty or unsupported values (e.g.
empty URL, unsupported TriggerAuthKind not in
{"TriggerAuthentication","ClusterTriggerAuthentication"}, or other required
fields) instead of assigning the struct; apply the same strict-decoding +
validation logic to the other decoding site that handles autoscaling JSON (the
second decode block referenced in the review) and only set
config.WVAAutoscalingConfig after validation succeeds.
pkg/controller/v1alpha2/llmisvc/controller_int_scheduler_config_test.go-1435-1446 (1)

1435-1446: ⚠️ Potential issue | 🟠 Major

Do not force a parent update in a test that claims SA changes trigger sync.

The updatedSvc write guarantees a second reconcile here, so this test passes even if corev1.ServiceAccount updates are never observed. That can leave stale IAM annotations and pull secrets on the generated scheduler SA until someone edits the v1alpha2.LLMInferenceService. Update only the source SA and wait on the derived SA, or rename the test to the weaker behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/v1alpha2/llmisvc/controller_int_scheduler_config_test.go`
around lines 1435 - 1446, The test currently forces a parent LLMInferenceService
update (via updatedSvc and envTest.Update inside retry.RetryOnConflict),
guaranteeing a reconcile even if ServiceAccount (SA) changes aren't observed;
instead, change the test to update only the source corev1.ServiceAccount (use
envTest.Get/Update on the source SA object referenced by llmSvc) to trigger the
controller naturally and then wait/assert on the derived scheduler SA for
updated IAM annotations/pull secrets; remove the injected "reconcile-trigger"
annotation update to the LLMInferenceService (updatedSvc) or, if you
intentionally want the weaker behavior, rename the test to indicate it relies on
LLMInferenceService edits rather than SA watch behavior.
pkg/controller/v1alpha2/llmisvc/config_loader.go-124-125 (1)

124-125: ⚠️ Potential issue | 🟠 Major

Reject tlsInsecureSkipVerify in cluster config.

High severity: this adds a cluster-wide switch to disable certificate validation for Prometheus. A network-positioned attacker can spoof autoscaling metrics and force bad scale decisions if that path is honored downstream (CWE-295). Fail closed here and require a CA-backed endpoint instead.

Suggested guard
 if autoscalingData, ok := isvcConfigMap.Data[autoscalingConfigName]; ok {
 	asCfg := &WVAAutoscalingConfig{}
 	if err := json.Unmarshal([]byte(autoscalingData), asCfg); err != nil {
 		return nil, fmt.Errorf("failed to parse %s config json: %w", autoscalingConfigName, err)
 	}
+	if asCfg.Prometheus.TLSInsecureSkipVerify {
+		return nil, fmt.Errorf("%s.prometheus.tlsInsecureSkipVerify is not supported", autoscalingConfigName)
+	}
 	config.WVAAutoscalingConfig = asCfg
 }

As per coding guidelines, "No InsecureSkipVerify in TLS configs (enables MITM attacks)".

Also applies to: 211-217

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/v1alpha2/llmisvc/config_loader.go` around lines 124 - 125,
Reject and remove acceptance of the tlsInsecureSkipVerify flag: in
config_loader.go detect if TLSInsecureSkipVerify is set to true (the
TLSInsecureSkipVerify field) during config validation/LoadConfig/Validate step
and return a non-nil error instead of honoring it; update relevant validation
code paths (including the code handling the same setting around lines 211-217)
to require a CA-backed endpoint (e.g., require a CA file/CA data/TLS root cert
field to be present) and document that disabling certificate verification is
forbidden. Ensure no code path uses TLSInsecureSkipVerify to build a tls.Config
with InsecureSkipVerify=true.
charts/kserve-llmisvc-crd-minimal/templates/serving.kserve.io_llminferenceservices.yaml-422-423 (1)

422-423: ⚠️ Potential issue | 🟠 Major

Do not make maxReplicas newly required on a served version without an upgrade path.

Adding a field to the CRD's required list breaks backward compatibility. Existing LLMInferenceService objects missing spec.scaling.maxReplicas will fail subsequent updates and /status writes. Kubernetes CRD validation ratcheting explicitly does not exempt newly added required fields. Implement either a defaulting webhook to populate the field for legacy objects before enforcing it, or gate the validation behind an upgrade-safe transition rule.

Also applies to: 13680-13681

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@charts/kserve-llmisvc-crd-minimal/templates/serving.kserve.io_llminferenceservices.yaml`
around lines 422 - 423, The CRD change makes spec.scaling.maxReplicas newly
required which will break existing LLMInferenceService objects; revert the
change or provide an upgrade-safe path: either remove
"maxReplicas" from the required array in the
serving.kserve.io_llminferenceservices.yaml template (so existing objects
continue to validate) or implement a defaulting webhook that populates
spec.scaling.maxReplicas for legacy objects before validation, or gate the new
required validation behind an upgrade transition rule; locate the "required: -
maxReplicas" entry in the CRD template and apply one of these solutions to
preserve backward compatibility.
hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh-1119-1123 (1)

1119-1123: ⚠️ Potential issue | 🟠 Major

Make overlay cleanup per-run and failure-safe.

This only runs on the success path and it still targets a repo-global config/overlays/temp. Any earlier kubectl/kustomize failure leaves stale manifests behind, and concurrent installs in the same checkout can overwrite or delete each other’s overlay. Create a unique directory with mktemp -d when the overlay is initialized and register a trap to remove that exact path.

As per coding guidelines, "REVIEW PRIORITIES: 2. Architectural issues and anti-patterns" and "3. Bug-prone patterns and error handling gaps".
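
A minimal sketch of the per-run overlay lifecycle described above; the generated file is a placeholder for the script's real manifests:

```shell
set -euo pipefail

# Unique per run, so concurrent installs in the same checkout cannot collide.
KSERVE_OVERLAY_DIR="$(mktemp -d)"
# Fires on success *and* on failure, removing exactly this path.
trap 'rm -rf "$KSERVE_OVERLAY_DIR"' EXIT

# All later steps reference the per-run path instead of config/overlays/temp.
printf 'resources: []\n' > "$KSERVE_OVERLAY_DIR/kustomization.yaml"
```

If any kubectl/kustomize step fails, the EXIT trap still runs, so no stale overlay survives the run.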

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh`
around lines 1119 - 1123, KSERVE_OVERLAY_DIR is currently set to a repo-global
"temp" and cleaned up only on the success path, which leaves stale files on
failures and risks races between concurrent runs; change initialization to
create a unique per-run overlay directory using mktemp -d (assign into
KSERVE_OVERLAY_DIR), use that variable everywhere the overlay path is referenced
(e.g., where overlays are written and passed to kustomize/kubectl), and register
a trap (trap 'rm -rf "$KSERVE_OVERLAY_DIR"' EXIT) immediately after creation to
ensure the exact temp directory is removed on both success and failure; ensure
no other code assumes a fixed path like "${REPO_ROOT}/config/overlays/temp" and
update cleanup logic to remove the mktemp-created path instead.
hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh-2067-2067 (1)

2067-2067: ⚠️ Potential issue | 🟠 Major

Pin all mutable image tags to immutable digests to prevent supply chain attacks.

Major security: This manifest deploys 13 mutable image tags. A re-pushed tag or registry tampering could inject different code than reviewed. CWE-494. Pin each to an immutable @sha256: digest:

ghcr.io/llm-d/llm-d-cuda:v0.5.1@sha256:021347cf3e93ce4d7558325786c9ba94aa37f40e2dc8e8d16ea77d45765b665f
ghcr.io/llm-d/llm-d-routing-sidecar:v0.6.0@sha256:b4dfab4d60feb36a5fbd94cf704268e6703ff83ea4eb07998deb69aff3a5ee80
ghcr.io/llm-d/llm-d-inference-scheduler:v0.6.0@sha256:7f949ca8f1c366829339f16a7ba91d15bee32a2abca95d55b351784b9e387dae
ghcr.io/llm-d/llm-d-uds-tokenizer:v0.6.0@sha256:438052dda5eb96fd3ceb0d6012b21b1a9ac806b0c71e16d9dbd8887bfd34bd16

Mutable images occur at lines 2067, 2135, 2356, 2428, 2645, 2715, 2948, 3177, 3341, 3399, 3482, 3714, 3943.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh`
at line 2067, The manifest uses mutable image tags (e.g.,
ghcr.io/llm-d/llm-d-cuda:v0.5.1 and other ghcr.io/llm-d/* images) which must be
pinned to immutable digests; for each image occurrence (for example the image
entry with ghcr.io/llm-d/llm-d-cuda:v0.5.1 and the images named
llm-d-routing-sidecar, llm-d-inference-scheduler, llm-d-uds-tokenizer, etc.)
replace the tag form (name:tag) with the digest form (name@sha256:...) using the
provided sha256 values so every image line is immutable; ensure you update all
13 occurrences of mutable tags in the script to their corresponding `@sha256`
digests and verify there are no remaining :tag usages for ghcr.io/llm-d images.
charts/kserve-llmisvc-crd-minimal/templates/serving.kserve.io_llminferenceserviceconfigs.yaml-380-381 (1)

380-381: ⚠️ Potential issue | 🟠 Major

Making maxReplicas required in an existing served version breaks backward compatibility.

Objects with spec.scaling set but no maxReplicas will fail UPDATE and PATCH operations after the CRD upgrade, since Kubernetes does not grandfather existing objects when a field becomes required. maxReplicas lacks a default value, so existing resources cannot be auto-remediated. Either add a reasonable default (e.g., default: 1 or default: 10) or defer the required constraint until a compatible schema version is introduced.

Also applies to: 13556-13557

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@charts/kserve-llmisvc-crd-minimal/templates/serving.kserve.io_llminferenceserviceconfigs.yaml`
around lines 380 - 381, The CRD change making the spec.scaling field maxReplicas
required will break existing resources on UPDATE/PATCH; revert the breaking
change by removing maxReplicas from the required list or add a safe default
(e.g., default: 1) in the CRD schema for the served version referenced in
serving.kserve.io_llminferenceserviceconfigs.yaml; update the schema entries
that mention "maxReplicas" (the spec.scaling section and the required array
around lines ~380 and also the occurrences at the other reported locations) so
either the required constraint is deferred or a default is supplied to preserve
backward compatibility.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml (2)

285-305: ⚠️ Potential issue | 🔴 Critical

Remove eval from all 6 vllm serve command blocks to prevent shell injection (CWE-78).

At lines 285–305, 572–592, 877–897, 1106–1126, 1643–1663, and 1872–1892, eval "vllm serve ... ${VLLM_ADDITIONAL_ARGS} $@ ..." is unsafe. Although VLLM_ADDITIONAL_ARGS is currently undefined, the pattern itself is vulnerable to command injection if the variable is set through environment mutations. Additionally, $@ is unquoted within eval, allowing shell metacharacters to execute arbitrary commands.

Replace eval with safe argument array construction: assign arguments to an array, append VLLM_ADDITIONAL_ARGS and positional args separately, then use exec vllm serve "${args[@]}" to avoid shell re-interpretation.
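
A hedged sketch of the array-based pattern this comment proposes. The flags and model name below are illustrative, not the chart's exact configuration, and the function prints its argv instead of exec'ing vllm so the quoting behavior is visible:

```shell
#!/usr/bin/env bash
# Build the vllm argv as an array so user-supplied values are never
# re-parsed by the shell (no eval anywhere).
build_vllm_args() {
  local -a args=(serve /mnt/models --served-model-name "model" --port 8001)
  # Tokenize VLLM_ADDITIONAL_ARGS once, deliberately, only if it is set.
  if [[ -n "${VLLM_ADDITIONAL_ARGS:-}" ]]; then
    local -a extra
    read -r -a extra <<< "${VLLM_ADDITIONAL_ARGS}"
    args+=("${extra[@]}")
  fi
  # Positional parameters are appended as separate words; metacharacters
  # like ';' or '$( )' stay literal instead of being executed.
  args+=("$@")
  printf '%s\n' "${args[@]}"
}

# A hostile positional argument remains a single literal word:
build_vllm_args '--foo; echo pwned'
```

In the real command blocks the final step would be `exec vllm "${args[@]}"`, which preserves each array element as one argv entry with no re-interpretation.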

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml` around
lines 285 - 305, The vllm serve command blocks use eval with unquoted
${VLLM_ADDITIONAL_ARGS} and $@ which permits shell injection; replace each eval
"vllm serve ... ${VLLM_ADDITIONAL_ARGS} $@ ..." with a safe array-based
invocation: build an args array containing the fixed literal arguments
(including flags like --trust-remote-code, --ssl-certfile, --ssl-keyfile, etc.),
conditionally append any tokenized VLLM_ADDITIONAL_ARGS in a safe, parsed way or
avoid expanding it if not set, then append the positional parameters using "$@"
as separate entries, and finally run exec vllm serve "${args[@]}" so the shell
does not re-interpret user-controlled content (update all six vllm serve blocks
referencing VLLM_ADDITIONAL_ARGS and $@).

9-17: ⚠️ Potential issue | 🔴 Critical

Add --ssl-certfile and --ssl-keyfile to direct vllm serve commands (CWE-319, Severity: Critical).

At lines 9-17, 657-665, and 1424-1432, direct vllm serve invocations lack TLS flags despite probes expecting HTTPS (scheme: HTTPS). In vLLM v0.5.1, HTTPS is disabled by default; both --ssl-certfile and --ssl-keyfile must be explicitly passed. Consequence: pods will fail liveness/readiness checks (HTTPS probes against HTTP server), and traffic is transmitted cleartext.

Remediation (apply to all three command blocks)
       - --port
       - "8001"
       - --disable-uvicorn-access-log
+      - --ssl-certfile
+      - /var/run/kserve/tls/tls.crt
+      - --ssl-keyfile
+      - /var/run/kserve/tls/tls.key

The TLS certificate volume is already mounted (/var/run/kserve/tls); only the command-line flags are missing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml` around
lines 9 - 17, The vllm serve command blocks (the command array invoking "vllm
serve /mnt/models --served-model-name '{{ .Spec.Model.Name }}' --port '8001'
--disable-uvicorn-access-log'") do not pass TLS flags while liveness/readiness
probes expect HTTPS; update each vllm serve command to include --ssl-certfile
and --ssl-keyfile pointing to the mounted certificate files (e.g.,
/var/run/kserve/tls/tls.crt and /var/run/kserve/tls/tls.key) so the server
listens with TLS and the HTTPS probes succeed.
♻️ Duplicate comments (1)
hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh (1)

3397-3399: ⚠️ Potential issue | 🔴 Critical

Tokenizer sidecar still lacks the /mnt/models mount and will not see tokenizer artifacts.

Line 3398 and Line 3446 point tokenizer runtime to /mnt/models, but tokenizer volumeMounts only include /tmp/tokenizer (Line 3444). With readOnlyRootFilesystem: true, this sidecar cannot materialize model files itself.

Targeted fix
           volumeMounts:
           - mountPath: /tmp/tokenizer
             name: tokenizer-uds
+          - mountPath: /mnt/models
+            name: <same-volume-name-used-by-main-container-for-/mnt/models>
+            readOnly: true

As per coding guidelines, "REVIEW PRIORITIES: 2. Architectural issues and anti-patterns" and "3. Bug-prone patterns and error handling gaps".

Also applies to: 3443-3447

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh`
around lines 3397 - 3399, The tokenizer sidecar sets TOKENIZERS_DIR to
/mnt/models but its container spec (volumeMounts) only mounts /tmp/tokenizer and
has readOnlyRootFilesystem: true, so it cannot access or materialize tokenizer
artifacts; update the tokenizer container spec referenced by TOKENIZERS_DIR and
the image ghcr.io/llm-d/llm-d-uds-tokenizer:v0.6.0 to include a volumeMount for
/mnt/models (matching the corresponding PersistentVolumeClaim or emptyDir
volume) and ensure the volume is writable or adjust readOnlyRootFilesystem
accordingly so the sidecar can read/write the model files; check the same fix
for the duplicate occurrences around the volumeMounts block and lines
referencing TOKENIZERS_DIR.
🧹 Nitpick comments (4)
Makefile (1)

33-46: GOTOOLCHAIN extraction assumes toolchain directive exists in go.mod.

If go.mod ever lacks the toolchain go line, grep returns empty, resulting in export GOTOOLCHAIN := (empty). This would fall through to Go's default behavior rather than failing explicitly.

Given go.mod currently has toolchain go1.25.4, this works. Consider adding a guard for robustness:

Optional defensive guard
 ifeq (auto,$(GOTOOLCHAIN))
 ifeq (,$(FORCE_HOST_GO))
-export GOTOOLCHAIN := $(shell grep '^toolchain go' go.mod | cut -d' ' -f2)
+_EXTRACTED_TC := $(shell grep '^toolchain go' go.mod | cut -d' ' -f2)
+ifeq (,$(_EXTRACTED_TC))
+$(warning No toolchain directive found in go.mod; falling back to GOTOOLCHAIN=auto)
+export GOTOOLCHAIN := auto
+else
+export GOTOOLCHAIN := $(_EXTRACTED_TC)
+endif
 else
 export GOTOOLCHAIN := local
 endif
 endif
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 33 - 46, The current GOTOOLCHAIN extraction (export
GOTOOLCHAIN := $(shell grep '^toolchain go' go.mod | cut -d' ' -f2)) can produce
an empty value if go.mod lacks a "toolchain go" directive; capture the shell
output into a Make variable (e.g., TMP_GOTOOLCHAIN) and add a guard: if
TMP_GOTOOLCHAIN is empty then print a clear error and exit non‑zero (or
explicitly set a safe fallback like "local"), otherwise assign TMP_GOTOOLCHAIN
to GOTOOLCHAIN; update the conditional that references FORCE_HOST_GO and
GOTOOLCHAIN accordingly so the Makefile fails fast instead of silently leaving
GOTOOLCHAIN empty.
hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh (2)

3869-3870: Consider setting sizeLimit on emptyDir.

While the volume is intended for a UDS socket file, setting a sizeLimit (e.g., 10Mi) provides defense-in-depth against accidental disk exhaustion:

        - emptyDir:
            sizeLimit: 10Mi
          name: tokenizer-uds
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh`
around lines 3869 - 3870, The emptyDir volume for the UDS socket (name:
tokenizer-uds) currently has no sizeLimit; add a sizeLimit (e.g., "10Mi") under
the emptyDir spec to prevent accidental node disk exhaustion—update the volume
definition for tokenizer-uds to include emptyDir.sizeLimit: "10Mi" (or an
appropriate value) so the UDS socket folder is constrained.

654-654: Remove unused variable WVA_VERSION (shellcheck SC2034).

The variable is defined at line 654 but never referenced in the script. Remove it unless it will be consumed by child processes or future code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh`
at line 654, The variable WVA_VERSION is defined but never used; remove its
declaration (the line "WVA_VERSION=v0.5.1") from the script to satisfy
shellcheck SC2034, or if it is intended to be consumed by child processes,
export it (export WVA_VERSION=...) and add a comment explaining its purpose;
update any dependent code to reference WVA_VERSION if you choose to keep it.
hack/setup/quick-install/llmisvc-full-install-with-manifests.sh (1)

1441-1445: Register temp-overlay cleanup with a trap, not only on the success path.

This block only runs if execution reaches the end of the function. Any earlier return, exit, or command failure leaves config/overlays/temp behind and can contaminate the next install. Move this cleanup into an EXIT trap when the temporary overlay is created, and keep the handler idempotent. As per coding guidelines, review priorities include "Bug-prone patterns and error handling gaps".
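
An illustrative, self-contained demo of the EXIT-trap pattern this comment recommends. REPO_ROOT and the overlay name mirror the script's variables, but everything here runs in a throwaway temp directory:

```shell
#!/usr/bin/env bash
set -u

demo_root="$(mktemp -d)"

# A failing "install" in a subshell: the trap is registered right after the
# temp overlay is created, so the overlay is removed even though `false`
# aborts the run early.
(
  set -e
  REPO_ROOT="$demo_root"
  KSERVE_OVERLAY_DIR="temp"
  mkdir -p "${REPO_ROOT}/config/overlays/temp"
  # Idempotent handler: rm -rf is a no-op if the directory is already gone,
  # and the guard keeps it from touching anything but the temp overlay.
  trap '[[ "${KSERVE_OVERLAY_DIR:-}" == temp ]] && rm -rf "${REPO_ROOT}/config/overlays/temp"' EXIT
  false  # simulated mid-install failure
)

if [[ -d "${demo_root}/config/overlays/temp" ]]; then
  echo "overlay leaked"
else
  echo "overlay cleaned"
fi
rm -rf "$demo_root"
```

Registering the trap at creation time means any later `return`, `exit`, or `set -e` failure still runs the cleanup, while the guard on KSERVE_OVERLAY_DIR keeps real overlays untouched.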

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/llmisvc-full-install-with-manifests.sh` around lines
1441 - 1445, Currently the temp overlay cleanup runs only at the end; instead,
when you create the temporary overlay (the code that sets
KSERVE_OVERLAY_DIR="temp"), immediately register an EXIT trap that idempotently
removes "${REPO_ROOT}/config/overlays/temp" (using rm -rf) so the directory is
removed on any exit path; ensure the trap handler is safe to call multiple times
(no errors if directory missing) and is registered only when KSERVE_OVERLAY_DIR
is set to "temp" so you don't remove overlays unintentionally.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml`:
- Around line 1354-1404: The tokenizer container (name: tokenizer) sets
TOKENIZERS_DIR=/mnt/models and workingDir=/mnt/models but no volume mounts
provide /mnt/models (only tls-certs and tokenizer-uds exist); fix by either
adding a volumeMount that maps an existing volume (e.g., mount the volume named
tokenizer-uds at /mnt/models) or change TOKENIZERS_DIR and workingDir to the
existing mount path (/tmp/tokenizer) so they align with the volumeMount name
tokenizer-uds; update the container’s volumeMounts and/or environment variable
accordingly and ensure the corresponding pod spec volumes include the backing
volume for tokenizer-uds.

In `@config/rbac/llmisvc/role.yaml`:
- Around line 42-52: The RBAC role grants unnecessary list/watch verbs on the
customresourcedefinitions resource; update the role to remove the "list" and
"watch" verbs for apiGroups: apiextensions.k8s.io and resources:
customresourcedefinitions (the entries that include resourceNames
llminferenceserviceconfigs.serving.kserve.io and
llminferenceservices.serving.kserve.io) so only "get" remains; this aligns with
the controller's use of utils.IsCrdAvailable() (which uses discovery
ServerResourcesForGroupVersion()) and limits excess permissions.

In `@hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh`:
- Around line 3834-3837: The tokenizer sidecar's pod spec currently only defines
resources.requests (cpu: 256m, memory: 500Mi) and lacks resource.limits, so add
a limits block for the tokenizer container (the tokenizer sidecar) to cap CPU
and memory (for example cpu: 500m and memory: 1Gi) alongside the existing
requests; update the resources section in the tokenizer container spec to
include both requests and limits to prevent unbounded consumption.
- Line 654: WVA_VERSION is declared but unused; either remove the WVA_VERSION
variable declaration or explicitly use it to pin components and document
compatibility. If you choose removal, delete the standalone WVA_VERSION=v0.5.1
declaration; if you choose reconciliation, set WVA_VERSION to the intended
version (e.g., v0.6.0), replace hardcoded versions in llm-d-cuda,
llm-d-routing-sidecar, llm-d-inference-scheduler, and llm-d-uds-tokenizer to
reference WVA_VERSION, and add a brief comment explaining version compatibility
constraints between these components.

In
`@hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh`:
- Line 654: Remove the unused variable declaration WVA_VERSION=v0.5.1 from the
script (or, if intended, wire it into the actual usage site); locate the
WVA_VERSION symbol in the script and either delete that line or replace it with
the correct used variable reference where the version is needed so no unused
constant remains.

In `@hack/setup/quick-install/llmisvc-full-install-with-manifests.sh`:
- Around line 3729-3779: The tokenizer sidecar is missing the shared
model-artifact mount so TOKENIZERS_DIR=/mnt/models and workingDir=/mnt/models
will be empty; add a volumeMount to the tokenizer container that mounts the same
volume the main inference container uses (use the exact same volume name and
mountPath /mnt/models as the main container) so the tokenizer can access model
files at /mnt/models alongside the existing tokenizer-uds mount.
- Around line 90055-90067: The RBAC role for the KEDA resource is missing the
scaledobjects subresource permission; update the inline role that lists
apiGroups: - keda.sh and resources: - scaledobjects to also include the
subresource entry "scaledobjects/status" and grant the verbs get, patch, and
update (matching the permissions granted in config/rbac/llmisvc/role.yaml for
the scaledobjects/status subresource) so reconcile paths that update KEDA status
do not receive 403s.

In `@Makefile`:
- Around line 321-323: TEST_TIMEOUT is declared but not used; update the test
target in the Makefile so the go test --timeout uses $(TEST_TIMEOUT) instead of
the hardcoded 30m, and replace the inline package expansion $$(go list
./pkg/...) ./cmd... with the $(TEST_PKGS) variable for consistency; modify the
test recipe referenced by the 'test' target to substitute these variables
(TEST_TIMEOUT and TEST_PKGS) in the KUBEBUILDER_ASSETS/go test invocation and
ensure TEST_PKGS is defined elsewhere or add a sensible default if missing.

---

Outside diff comments:
In `@charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml`:
- Around line 285-305: The vllm serve command blocks use eval with unquoted
${VLLM_ADDITIONAL_ARGS} and $@ which permits shell injection; replace each eval
"vllm serve ... ${VLLM_ADDITIONAL_ARGS} $@ ..." with a safe array-based
invocation: build an args array containing the fixed literal arguments
(including flags like --trust-remote-code, --ssl-certfile, --ssl-keyfile, etc.),
conditionally append any tokenized VLLM_ADDITIONAL_ARGS in a safe, parsed way or
avoid expanding it if not set, then append the positional parameters using "$@"
as separate entries, and finally run exec vllm serve "${args[@]}" so the shell
does not re-interpret user-controlled content (update all six vllm serve blocks
referencing VLLM_ADDITIONAL_ARGS and $@).
- Around line 9-17: The vllm serve command blocks (the command array invoking
"vllm serve /mnt/models --served-model-name '{{ .Spec.Model.Name }}' --port
'8001' --disable-uvicorn-access-log'") do not pass TLS flags while
liveness/readiness probes expect HTTPS; update each vllm serve command to
include --ssl-certfile and --ssl-keyfile pointing to the mounted certificate
files (e.g., /var/run/kserve/tls/tls.crt and /var/run/kserve/tls/tls.key) so the
server listens with TLS and the HTTPS probes succeed.

---

Duplicate comments:
In
`@hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh`:
- Around line 3397-3399: The tokenizer sidecar sets TOKENIZERS_DIR to
/mnt/models but its container spec (volumeMounts) only mounts /tmp/tokenizer and
has readOnlyRootFilesystem: true, so it cannot access or materialize tokenizer
artifacts; update the tokenizer container spec referenced by TOKENIZERS_DIR and
the image ghcr.io/llm-d/llm-d-uds-tokenizer:v0.6.0 to include a volumeMount for
/mnt/models (matching the corresponding PersistentVolumeClaim or emptyDir
volume) and ensure the volume is writable or adjust readOnlyRootFilesystem
accordingly so the sidecar can read/write the model files; check the same fix
for the duplicate occurrences around the volumeMounts block and lines
referencing TOKENIZERS_DIR.

---

Nitpick comments:
In `@hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh`:
- Around line 3869-3870: The emptyDir volume for the UDS socket (name:
tokenizer-uds) currently has no sizeLimit; add a sizeLimit (e.g., "10Mi") under
the emptyDir spec to prevent accidental node disk exhaustion—update the volume
definition for tokenizer-uds to include emptyDir.sizeLimit: "10Mi" (or an
appropriate value) so the UDS socket folder is constrained.
- Line 654: The variable WVA_VERSION is defined but never used; remove its
declaration (the line "WVA_VERSION=v0.5.1") from the script to satisfy
shellcheck SC2034, or if it is intended to be consumed by child processes,
export it (export WVA_VERSION=...) and add a comment explaining its purpose;
update any dependent code to reference WVA_VERSION if you choose to keep it.

In `@hack/setup/quick-install/llmisvc-full-install-with-manifests.sh`:
- Around line 1441-1445: Currently the temp overlay cleanup runs only at the
end; instead, when you create the temporary overlay (the code that sets
KSERVE_OVERLAY_DIR="temp"), immediately register an EXIT trap that idempotently
removes "${REPO_ROOT}/config/overlays/temp" (using rm -rf) so the directory is
removed on any exit path; ensure the trap handler is safe to call multiple times
(no errors if directory missing) and is registered only when KSERVE_OVERLAY_DIR
is set to "temp" so you don't remove overlays unintentionally.

In `@Makefile`:
- Around line 33-46: The current GOTOOLCHAIN extraction (export GOTOOLCHAIN :=
$(shell grep '^toolchain go' go.mod | cut -d' ' -f2)) can produce an empty value
if go.mod lacks a "toolchain go" directive; capture the shell output into a Make
variable (e.g., TMP_GOTOOLCHAIN) and add a guard: if TMP_GOTOOLCHAIN is empty
then print a clear error and exit non‑zero (or explicitly set a safe fallback
like "local"), otherwise assign TMP_GOTOOLCHAIN to GOTOOLCHAIN; update the
conditional that references FORCE_HOST_GO and GOTOOLCHAIN accordingly so the
Makefile fails fast instead of silently leaving GOTOOLCHAIN empty.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: a4c1469d-4be9-430a-9efe-d1dfdecf9743

📥 Commits

Reviewing files that changed from the base of the PR and between 52e7611 and d002db7.

📒 Files selected for processing (7)
  • Makefile
  • charts/kserve-llmisvc-resources/files/llmisvc/resources.yaml
  • charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml
  • config/rbac/llmisvc/role.yaml
  • hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh
  • hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh
  • hack/setup/quick-install/llmisvc-full-install-with-manifests.sh

fix: add git user config to github bot

Signed-off-by: Spolti <fspolti@redhat.com>
@spolti spolti force-pushed the sync-260313_kserve-upstream-odh branch from d002db7 to 785d73b Compare March 13, 2026 16:13

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

♻️ Duplicate comments (7)
charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml (1)

1354-1414: ⚠️ Potential issue | 🟠 Major

Tokenizer container references /mnt/models but no volume provides this path.

The tokenizer container sets TOKENIZERS_DIR=/mnt/models (line 1356) and workingDir: /mnt/models (line 1404), but the only volume mounts are:

  • /tmp/tokenizer → tokenizer-uds (emptyDir)
  • /var/run/kserve/tls → tls-certs

The container will fail to start or operate correctly since /mnt/models doesn't exist. Either add a volume mount for model data at /mnt/models, or align TOKENIZERS_DIR and workingDir to use an existing mount path.

Proposed fix: Add model-cache volume mount to tokenizer container
          volumeMounts:
          - mountPath: /tmp/tokenizer
            name: tokenizer-uds
+         - mountPath: /mnt/models
+           name: model-cache
+           readOnly: true
          workingDir: /mnt/models
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        terminationGracePeriodSeconds: 30
        volumes:
        - name: tls-certs
          secret:
            secretName: '{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs`
              }}'
        - emptyDir: {}
          name: tokenizer-uds
+       - emptyDir: {}
+         name: model-cache


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml` around
lines 1354 - 1414, The tokenizer container sets TOKENIZERS_DIR=/mnt/models and
workingDir: /mnt/models but no volume is mounted there (only tokenizer-uds at
/tmp/tokenizer and tls-certs), so the container will fail; fix by either adding
a volume and mount for /mnt/models (e.g., add a volume entry such as model-cache
and mount it into the tokenizer container at mountPath: /mnt/models) or change
TOKENIZERS_DIR and workingDir to an existing mount (e.g., point them to
/tmp/tokenizer); update the resources.yaml tokenizer container block (env
TOKENIZERS_DIR, workingDir, and volumeMounts for the tokenizer container and the
top-level volumes list) to ensure the /mnt/models path is actually backed by a
volume.
config/rbac/llmisvc/role.yaml (1)

42-52: ⚠️ Potential issue | 🟠 Major

Reduce CRD permissions to least privilege (Severity: Major, CWE-250/CWE-269).

Line [51]-Line [52] grant list/watch on customresourcedefinitions. If the controller ServiceAccount is compromised, these extra verbs expand cluster-level recon capability without clear runtime need. Keep only get for named CRDs and apply the same change to the mirrored block in charts/kserve-llmisvc-resources/files/llmisvc/resources.yaml (Line [1478]-Line [1488]).

Remediation diff
 - apiGroups:
   - apiextensions.k8s.io
   resourceNames:
   - llminferenceserviceconfigs.serving.kserve.io
   - llminferenceservices.serving.kserve.io
   resources:
   - customresourcedefinitions
   verbs:
   - get
-  - list
-  - watch

As per coding guidelines, "Verify minimum required permissions only" for **/rbac/**/*.yaml.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/rbac/llmisvc/role.yaml` around lines 42 - 52, The RBAC rule granting
verbs [get, list, watch] on customresourcedefinitions for apiGroup
apiextensions.k8s.io with resourceNames
llminferenceserviceconfigs.serving.kserve.io and
llminferenceservices.serving.kserve.io should be reduced to only the "get" verb
to follow least-privilege; update the rule that lists resources:
customresourcedefinitions and apiGroups: apiextensions.k8s.io so verbs: only
contains get, and apply the identical change to the mirrored CRD permission
block for those same resourceNames in the chart resources file.
hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh (2)

3396-3446: ⚠️ Potential issue | 🔴 Critical

Tokenizer sidecar cannot read /mnt/models without a model volume mount.

Lines 3397 and 3446 set TOKENIZERS_DIR/workingDir to /mnt/models, but Lines 3443-3445 mount only tokenizer-uds. Add the same /mnt/models volume mount used by the main container; otherwise tokenizer startup/serving can fail.

Proposed patch (use the same model volume name as the main container)
           volumeMounts:
+          - mountPath: /mnt/models
+            name: <main-container-model-volume-name>
+            readOnly: true
           - mountPath: /tmp/tokenizer
             name: tokenizer-uds
           workingDir: /mnt/models

As per coding guidelines, "REVIEW PRIORITIES: 2. Architectural issues and anti-patterns" and "3. Bug-prone patterns and error handling gaps".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh`
around lines 3396 - 3446, The tokenizer sidecar is configured with
TOKENIZERS_DIR=/mnt/models and workingDir=/mnt/models but only mounts
tokenizer-uds; add a volumeMount to the tokenizer container that mounts
/mnt/models using the same model volume name used by the main model container
(i.e., the main container's model volume name, not tokenizer-uds) so the sidecar
can read the model files; update the tokenizer container's volumeMounts to
include mountPath: /mnt/models with the main model volume's name and ensure the
corresponding volume is present in the pod spec.

654-654: ⚠️ Potential issue | 🟡 Minor

Remove or wire WVA_VERSION to an actual use site.

Line 654 defines WVA_VERSION, but this constant is not consumed in the changed scope and risks drift during future bumps.

As per coding guidelines, "REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh`
at line 654, WVA_VERSION is defined but never used; either remove the unused
WVA_VERSION=v0.5.1 assignment or wire it into the install flow by replacing any
hard-coded WVA version strings with the variable (e.g., use "${WVA_VERSION}"
when building image tags, helm chart values, or manifest sed substitutions) and
export it if subprocesses need it; update any related manifest/template
substitution calls that currently hardcode the version so they reference
WVA_VERSION instead.
hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh (1)

3834-3837: ⚠️ Potential issue | 🟠 Major

Add tokenizer container limits to reduce resource-exhaustion risk (CWE-400).

Severity: Medium.
Exploit scenario: a runaway tokenizer process can monopolize CPU/memory and starve colocated workloads or trigger node OOM.

Proposed resource cap
           resources:
             requests:
               cpu: 256m
               memory: 500Mi
+            limits:
+              cpu: 500m
+              memory: 1Gi

As per coding guidelines, "REVIEW PRIORITIES: 1. Security vulnerabilities ... 4. Performance problems."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh`
around lines 3834 - 3837, The YAML for the tokenizer container currently only
sets resources.requests (cpu: 256m, memory: 500Mi) and lacks resource limits;
update the tokenizer container spec (look for the container named "tokenizer" in
this manifest) to add a resources.limits block alongside requests, e.g.
limits.cpu and limits.memory (suggested starting caps: cpu: 500m, memory: 1Gi,
tune as needed), keeping proper YAML indentation under the container spec and
preserving the existing resources.requests values to prevent
resource-exhaustion.
hack/setup/quick-install/llmisvc-full-install-with-manifests.sh (1)

3729-3779: ⚠️ Potential issue | 🟠 Major

Tokenizer sidecar references /mnt/models but does not mount the shared model volume.

Line 3730 sets TOKENIZERS_DIR=/mnt/models and Line 3779 sets workingDir: /mnt/models, but the sidecar volumeMounts shown at Lines 3776-3778 only include tokenizer-uds. This will fail model access at runtime.

Fix
           volumeMounts:
+          - mountPath: /mnt/models
+            name: <same-model-volume-name-used-by-main-container>
           - mountPath: /tmp/tokenizer
             name: tokenizer-uds

As per coding guidelines, "REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/llmisvc-full-install-with-manifests.sh` around lines
3729 - 3779, The tokenizer sidecar sets TOKENIZERS_DIR=/mnt/models and
workingDir: /mnt/models but only mounts the tokenizer-uds volume; update the
tokenizer container's volumeMounts to include the shared models volume used by
the main model container (mount /mnt/models with the same volume name as the
main model pod spec) so the sidecar can access the models at /mnt/models (ensure
the shared volume name matches the pod's volumes entry).
Makefile (1)

323-323: ⚠️ Potential issue | 🟡 Minor

test target ignores TEST_TIMEOUT and TEST_PKGS overrides.

Line 323 hardcodes timeout and package expansion, so make test TEST_TIMEOUT=... TEST_PKGS=... does not work as defined by Lines 320-321.

Proposed fix
-	KUBEBUILDER_ASSETS="$$($(ENVTEST) use $(ENVTEST_K8S_VERSION) -p path)" go test --timeout 30m $$(go list ./pkg/...) ./cmd/... -coverprofile coverage.out -coverpkg ./pkg/... ./cmd...
+	KUBEBUILDER_ASSETS="$$($(ENVTEST) use $(ENVTEST_K8S_VERSION) -p path)" go test --timeout $(TEST_TIMEOUT) $(TEST_PKGS) -coverprofile coverage.out -coverpkg ./pkg/... ./cmd/...

As per coding guidelines, "REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` at line 323, The test Makefile target currently hardcodes the
timeout and package list in the KUBEBUILDER_ASSETS command, ignoring the
TEST_TIMEOUT and TEST_PKGS overrides; update the test target (the rule that
invokes KUBEBUILDER_ASSETS and go test) to use the TEST_TIMEOUT variable
(fallback to the previous 30m if empty) and TEST_PKGS variable (fallback to
"$(go list ./pkg/...) ./cmd/..." if empty) instead of the hardcoded values so
make test TEST_TIMEOUT=... TEST_PKGS="..." works as intended.
🧹 Nitpick comments (3)
charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml (1)

154-306: RoCE inference script duplicated verbatim across 6 templates.

The ~130-line bash script for RoCE/InfiniBand HCA discovery and GID index inference is copy-pasted in:

  • kserve-config-llm-decode-worker-data-parallel (lines 166-305)
  • Same template's worker section (lines 453-592)
  • kserve-config-llm-prefill-worker-data-parallel (lines 758-897, 987-1126)
  • kserve-config-llm-worker-data-parallel (lines 1524-1663, 1753-1892)

Any bug fix or enhancement requires updating 6 locations. Consider extracting to a shared ConfigMap-mounted script or init container.
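
A minimal sketch of the ConfigMap extraction the comment proposes. The resource name, script name, and mount path are illustrative, not the chart's actual identifiers:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: roce-infer-script   # hypothetical name
data:
  infer_roce.sh: |
    #!/bin/sh
    # Single copy of the HCA discovery / GID index logic; exports
    # NCCL_IB_HCA, NVSHMEM_HCA_LIST, UCX_NET_DEVICES, NCCL_IB_GID_INDEX, ...
```

Each worker template would then mount the ConfigMap (e.g., at /opt/scripts with defaultMode: 0555) and replace its inline copy with `. /opt/scripts/infer_roce.sh` before `exec vllm serve ...`, so a fix lands in one place instead of six.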

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml` around
lines 154 - 306, The RoCE inference bash block (starting with DP_ADDRESS
resolution and guarded by KSERVE_INFER_ROCE) is duplicated across multiple
templates (e.g., the snippet that populates DP_ADDRESS, active_hcas/ucx_hcas
logic, gid_index_count/hca_gid_index handling and NCCL_IB_GID_INDEX export) —
extract that entire script into a single shared executable file delivered via a
ConfigMap (or built into an initContainer) and mount it into the pod (e.g.,
/opt/scripts/infer_roce.sh), then replace the duplicated inline blocks in each
template (those that currently reference DP_ADDRESS and check KSERVE_INFER_ROCE)
with a simple call to source/execute the shared script before running the vllm
serve command; ensure the shared script exports DP_ADDRESS, NCCL_IB_HCA,
NVSHMEM_HCA_LIST, UCX_NET_DEVICES, and *_GID_INDEX variables so downstream code
(vllm serve invocation) continues to work.
hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh (1)

1119-1123: Cleanup is success-path only; temp overlay can leak on early failures.

Lines 1119-1123 run cleanup only after the install path reaches this point. If earlier steps fail, config/overlays/temp is left behind and can taint subsequent runs. Move this to a guarded trap-based cleanup.

As per coding guidelines, "REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh`
around lines 1119 - 1123, The cleanup of the temporary overlay (rm -rf
"${REPO_ROOT}/config/overlays/temp") is currently only executed on the success
path guarded by KSERVE_OVERLAY_DIR check; move this removal into an exit trap so
it always runs on script exit (success or failure). Add a trap handler near
script initialization that checks if KSERVE_OVERLAY_DIR equals "temp" and if so
removes "${REPO_ROOT}/config/overlays/temp" and logs via log_info/log_error,
then register it with trap 'handler' EXIT (and optionally INT TERM) so the temp
overlay cannot leak after early failures; ensure the existing in-place
success-path removal is removed to avoid double-run.
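
A minimal sketch of that trap-based cleanup, assuming the script's variable names (`log_info` is replaced by `echo` so the snippet is self-contained):

```shell
#!/usr/bin/env bash
# Sketch: register cleanup with trap so it runs on any exit, not just success.
set -u

REPO_ROOT="${REPO_ROOT:-$(mktemp -d)}"
KSERVE_OVERLAY_DIR="temp"

cleanup_temp_overlay() {
    # Fail closed: only delete the expected path, never "/" or an empty root.
    if [ "${KSERVE_OVERLAY_DIR}" = "temp" ] && [ -n "${REPO_ROOT}" ] && [ "${REPO_ROOT}" != "/" ]; then
        rm -rf -- "${REPO_ROOT}/config/overlays/temp"
        echo "Temporary overlay directory cleaned up"
    fi
}

# Simulate an install run that fails early; the EXIT trap still fires.
(
    trap cleanup_temp_overlay EXIT INT TERM
    mkdir -p "${REPO_ROOT}/config/overlays/temp"
    exit 1  # early failure
) || true

[ -d "${REPO_ROOT}/config/overlays/temp" ] || echo "overlay removed despite the failure"
```

In the real script the trap would be registered near initialization and the in-place success-path `rm -rf` dropped.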
hack/setup/quick-install/llmisvc-full-install-with-manifests.sh (1)

651-654: Avoid duplicating WVA_VERSION; source it from the repo’s dependency env file.

Hard-coding the version at Line 654 risks drift from kserve-deps.env and inconsistent installs across the quick-install scripts.

As per coding guidelines, REVIEW PRIORITIES: 2. Architectural issues and anti-patterns.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/llmisvc-full-install-with-manifests.sh` around lines
651 - 654: the WVA_VERSION variable is hard-coded (WVA_VERSION=v0.5.1); instead
remove this duplication and load it from the repo dependency env
(kserve-deps.env). Update the script to source or parse kserve-deps.env (the
file that defines WVA_VERSION) before the variables are used so the script uses
the canonical WVA_VERSION value; specifically remove or replace the WVA_VERSION
assignment in the block that sets LWS_VERSION/GATEWAY_API_VERSION/GIE_VERSION
and ensure the script references the sourced WVA_VERSION variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh`:
- Around line 3869-3870: The tokenizer UDS emptyDir currently has no size limit
which can allow uncontrolled writes to exhaust ephemeral storage; update the
Kubernetes manifest to add an emptyDir.sizeLimit (for example "100Mi" or another
appropriate bound) for the volume named tokenizer-uds so ephemeral storage is
capped; locate the volume definition referencing name: tokenizer-uds and replace
the existing emptyDir: {} entry with an emptyDir that includes sizeLimit to
enforce the storage bound.
- Around line 1529-1531: The rm -rf invocation can remove an unintended path if
REPO_ROOT is empty; update the cleanup block that checks KSERVE_OVERLAY_DIR to
validate and guard the target path before removal: ensure REPO_ROOT is non-empty
and the resolved absolute path equals or is a subpath of the expected
"${REPO_ROOT}/config/overlays/temp" (or explicitly equals that value), and only
then call rm -rf; also fail-fast with a logged error if the checks fail.
Reference the KSERVE_OVERLAY_DIR variable, the REPO_ROOT variable, and the
cleanup branch where the temporary overlay directory is removed.

In `@hack/setup/quick-install/llmisvc-full-install-with-manifests.sh`:
- Around line 2669-2675: The command injection risk comes from unquoted shell
expansions—specifically ${VLLM_ADDITIONAL_ARGS} and $@ in the launch
invocation—so quote these expansions or pass them as an array to avoid
word-splitting and globbing; replace ${VLLM_ADDITIONAL_ARGS} with
"$VLLM_ADDITIONAL_ARGS" (or better, build VLLM_ADDITIONAL_ARGS as a bash array
and expand it as "${VLLM_ADDITIONAL_ARGS[@]}") and ensure positional parameters
are preserved as "$@" in every repeated command block (including the other
occurrences noted) to eliminate CWE-78 vectors.
- Line 2400: Replace mutable image tags with immutable digests for every
container image occurrence (e.g., change "ghcr.io/llm-d/llm-d-cuda:v0.5.1" to
"ghcr.io/llm-d/llm-d-cuda@sha256:<digest>"). For each listed image line
(including the example string ghcr.io/llm-d/llm-d-cuda:v0.5.1 and the other
occurrences called out in the comment), obtain the exact sha256 digest from the
registry (docker pull + docker inspect or registry API) and update the manifest
entries to use the `@sha256:...` form; ensure all image fields in the script are
updated consistently to avoid any remaining mutable tag references.
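
For the quoting finding above, the array pattern can be sketched as follows; the flag names and values are illustrative, and the real launch would end with `exec vllm serve "${VLLM_ADDITIONAL_ARGS[@]}" "$@"`:

```shell
#!/usr/bin/env bash
# Sketch: build extra vLLM args as a bash array so expansion is immune to
# word-splitting and globbing (values below are illustrative only).
set -euo pipefail

VLLM_ADDITIONAL_ARGS=(--max-model-len 4096 --served-model-name "my model")

launch() {
    # "$@" preserves each argument exactly, including embedded spaces.
    printf '<%s>' "$@"
}

out="$(launch serve "${VLLM_ADDITIONAL_ARGS[@]}")"
echo "$out"
```

Note how `"my model"` stays a single argument, which the unquoted `${VLLM_ADDITIONAL_ARGS}` expansion would split.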

In `@Makefile`:
- Line 374: Replace the direct remote kubectl apply invocation that uses
GATEWAY_API_VERSION with a safe download-and-verify flow: change the Makefile
rule that runs "kubectl apply -f https://.../standard-install.yaml" so it first
downloads the release asset to a local path, validates its integrity (verify
SHA256 checksum or GPG signature from the release or a pinned commit/tag), fail
the build if verification fails, and only then run "kubectl apply -f
<local-file>"; also pin GATEWAY_API_VERSION to a specific immutable tag/commit
and document where the checksum/signature is obtained.
- Around line 208-210: The Makefile target uses an unquoted, mutable tag URL in
the kubectl kustomize command (the `kubectl kustomize
https://...?ref=$(WVA_VERSION)` invocation producing
`test/crds/wva_variantautoscalings.yaml`), which is insecure; update it to pin
the kustomize fetch to an immutable commit SHA instead of a tag (replace
`$(WVA_VERSION)` with the commit SHA variable or literal), quote the URL to
avoid shell parsing issues, and add a sha256 checksum verification step that
validates the downloaded CRD YAML before writing
`test/crds/wva_variantautoscalings.yaml` (fail the Make recipe if the checksum
does not match).
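
Both Makefile findings reduce to the same download-verify-apply flow, sketched here with a local stand-in for the release asset (the URL fetch, pinned checksum source, and file names are placeholders):

```shell
#!/usr/bin/env bash
# Sketch: download the asset, verify its sha256 against a pinned value,
# and only then hand it to kubectl apply.
set -euo pipefail

ASSET="standard-install.yaml"
printf 'kind: GatewayClass\n' > "${ASSET}"   # stands in for: curl -fsSLo "${ASSET}" "${URL}"

# In practice EXPECTED_SHA256 is pinned in the repo (e.g. from the release notes);
# it is computed here only to keep the sketch self-contained.
EXPECTED_SHA256="$(sha256sum "${ASSET}" | awk '{print $1}')"

ACTUAL_SHA256="$(sha256sum "${ASSET}" | awk '{print $1}')"
if [ "${ACTUAL_SHA256}" != "${EXPECTED_SHA256}" ]; then
    echo "checksum mismatch for ${ASSET}" >&2
    exit 1
fi
echo "verified ${ASSET}; safe to run: kubectl apply -f ${ASSET}"
```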

---

Duplicate comments:
In `@charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml`:
- Around line 1354-1414: The tokenizer container sets TOKENIZERS_DIR=/mnt/models
and workingDir: /mnt/models but no volume is mounted there (only tokenizer-uds
at /tmp/tokenizer and tls-certs), so the container will fail; fix by either
adding a volume and mount for /mnt/models (e.g., add a volume entry such as
model-cache and mount it into the tokenizer container at mountPath: /mnt/models)
or change TOKENIZERS_DIR and workingDir to an existing mount (e.g., point them
to /tmp/tokenizer); update the resources.yaml tokenizer container block (env
TOKENIZERS_DIR, workingDir, and volumeMounts for the tokenizer container and the
top-level volumes list) to ensure the /mnt/models path is actually backed by a
volume.
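
One way to satisfy this, sketched with a placeholder `model-cache` volume (the real fix should reuse whatever volume backs the main container's models):

```yaml
containers:
  - name: tokenizer
    workingDir: /mnt/models
    env:
      - name: TOKENIZERS_DIR
        value: /mnt/models
    volumeMounts:
      - name: tokenizer-uds
        mountPath: /tmp/tokenizer
      - name: model-cache          # placeholder; reuse the main model volume
        mountPath: /mnt/models
volumes:
  - name: model-cache
    emptyDir:
      sizeLimit: 1Gi               # illustrative bound
```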

In `@config/rbac/llmisvc/role.yaml`:
- Around line 42-52: The RBAC rule granting verbs [get, list, watch] on
customresourcedefinitions for apiGroup apiextensions.k8s.io with resourceNames
llminferenceserviceconfigs.serving.kserve.io and
llminferenceservices.serving.kserve.io should be reduced to only the "get" verb
to follow least-privilege; update the rule that lists resources:
customresourcedefinitions and apiGroups: apiextensions.k8s.io so verbs: only
contains get, and apply the identical change to the mirrored CRD permission
block for those same resourceNames in the chart resources file.
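
Sketched against the rule described above, the least-privilege version would read:

```yaml
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  resourceNames:
    - llminferenceserviceconfigs.serving.kserve.io
    - llminferenceservices.serving.kserve.io
  verbs: ["get"]
```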

In `@hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh`:
- Around line 3834-3837: The YAML for the tokenizer container currently only
sets resources.requests (cpu: 256m, memory: 500Mi) and lacks resource limits;
update the tokenizer container spec (look for the container named "tokenizer" in
this manifest) to add a resources.limits block alongside requests, e.g.
limits.cpu and limits.memory (suggested starting caps: cpu: 500m, memory: 1Gi,
tune as needed), keeping proper YAML indentation under the container spec and
preserving the existing resources.requests values to prevent
resource-exhaustion.

In
`@hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh`:
- Around line 3396-3446: The tokenizer sidecar is configured with
TOKENIZERS_DIR=/mnt/models and workingDir=/mnt/models but only mounts
tokenizer-uds; add a volumeMount to the tokenizer container that mounts
/mnt/models using the same model volume name used by the main model container
(i.e., the main container's model volume name, not tokenizer-uds) so the sidecar
can read the model files; update the tokenizer container's volumeMounts to
include mountPath: /mnt/models with the main model volume's name and ensure the
corresponding volume is present in the pod spec.
- Line 654: WVA_VERSION is defined but never used; either remove the unused
WVA_VERSION=v0.5.1 assignment or wire it into the install flow by replacing any
hard-coded WVA version strings with the variable (e.g., use "${WVA_VERSION}"
when building image tags, helm chart values, or manifest sed substitutions) and
export it if subprocesses need it; update any related manifest/template
substitution calls that currently hardcode the version so they reference
WVA_VERSION instead.

In `@hack/setup/quick-install/llmisvc-full-install-with-manifests.sh`:
- Around line 3729-3779: The tokenizer sidecar sets TOKENIZERS_DIR=/mnt/models
and workingDir: /mnt/models but only mounts the tokenizer-uds volume; update the
tokenizer container's volumeMounts to include the shared models volume used by
the main model container (mount /mnt/models with the same volume name as the
main model pod spec) so the sidecar can access the models at /mnt/models (ensure
the shared volume name matches the pod's volumes entry).

In `@Makefile`:
- Line 323: The test Makefile target currently hardcodes the timeout and package
list in the KUBEBUILDER_ASSETS command, ignoring the TEST_TIMEOUT and TEST_PKGS
overrides; update the test target (the rule that invokes KUBEBUILDER_ASSETS and
go test) to use the TEST_TIMEOUT variable (fallback to the previous 30m if
empty) and TEST_PKGS variable (fallback to "$(go list ./pkg/...) ./cmd/..." if
empty) instead of the hardcoded values so make test TEST_TIMEOUT=...
TEST_PKGS="..." works as intended.

---

Nitpick comments:
In `@charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml`:
- Around line 154-306: The RoCE inference bash block (starting with DP_ADDRESS
resolution and guarded by KSERVE_INFER_ROCE) is duplicated across multiple
templates (e.g., the snippet that populates DP_ADDRESS, active_hcas/ucx_hcas
logic, gid_index_count/hca_gid_index handling and NCCL_IB_GID_INDEX export) —
extract that entire script into a single shared executable file delivered via a
ConfigMap (or built into an initContainer) and mount it into the pod (e.g.,
/opt/scripts/infer_roce.sh), then replace the duplicated inline blocks in each
template (those that currently reference DP_ADDRESS and check KSERVE_INFER_ROCE)
with a simple call to source/execute the shared script before running the vllm
serve command; ensure the shared script exports DP_ADDRESS, NCCL_IB_HCA,
NVSHMEM_HCA_LIST, UCX_NET_DEVICES, and *_GID_INDEX variables so downstream code
(vllm serve invocation) continues to work.

In
`@hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh`:
- Around line 1119-1123: The cleanup of the temporary overlay (rm -rf
"${REPO_ROOT}/config/overlays/temp") is currently only executed on the success
path guarded by KSERVE_OVERLAY_DIR check; move this removal into an exit trap so
it always runs on script exit (success or failure). Add a trap handler near
script initialization that checks if KSERVE_OVERLAY_DIR equals "temp" and if so
removes "${REPO_ROOT}/config/overlays/temp" and logs via log_info/log_error,
then register it with trap 'handler' EXIT (and optionally INT TERM) so the temp
overlay cannot leak after early failures; ensure the existing in-place
success-path removal is removed to avoid double-run.

In `@hack/setup/quick-install/llmisvc-full-install-with-manifests.sh`:
- Around line 651-654: The WVA_VERSION variable is hard-coded
(WVA_VERSION=v0.5.1); instead remove this duplication and load it from the repo
dependency env (kserve-deps.env). Update the script to source or parse
kserve-deps.env (the file that defines WVA_VERSION) before the variables are
used so the script uses the canonical WVA_VERSION value; specifically remove or
replace the WVA_VERSION assignment in the block that sets
LWS_VERSION/GATEWAY_API_VERSION/GIE_VERSION and ensure the script references the
sourced WVA_VERSION variable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 8e8151de-4974-445e-bf5d-f9ea75b8624a

📥 Commits

Reviewing files that changed from the base of the PR and between d002db7 and 785d73b.

📒 Files selected for processing (8)
  • .github/workflows/precommit-check.yml
  • Makefile
  • charts/kserve-llmisvc-resources/files/llmisvc/resources.yaml
  • charts/kserve-runtime-configs/files/llmisvcconfigs/resources.yaml
  • config/rbac/llmisvc/role.yaml
  • hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh
  • hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh
  • hack/setup/quick-install/llmisvc-full-install-with-manifests.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/precommit-check.yml

Signed-off-by: Jooho Lee <jlee@redhat.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

♻️ Duplicate comments (2)
hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh (2)

3834-3837: ⚠️ Potential issue | 🟠 Major

Cap the tokenizer sidecar's compute and socket storage (CWE-400/CWE-770).

Severity: Medium. Exploit scenario: a noisy or malformed workload can push the tokenizer sidecar into pod eviction or node pressure because Lines 3834-3837 set requests only and Lines 3869-3870 create an unlimited emptyDir. Add container limits and an emptyDir.sizeLimit.

Hardening patch
           resources:
             requests:
               cpu: 256m
               memory: 500Mi
+            limits:
+              cpu: 500m
+              memory: 1Gi

-        - emptyDir: {}
+        - emptyDir:
+            sizeLimit: 64Mi
          name: tokenizer-uds

As per coding guidelines, REVIEW PRIORITIES: 1. Security vulnerabilities (provide severity, exploit scenario, and remediation code) and 4. Performance problems.

Also applies to: 3869-3870

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh`
around lines 3834 - 3837: the tokenizer sidecar currently only defines resource
requests and an unlimited emptyDir causing possible eviction or node-pressure;
add resource limits (e.g., cpu and memory) alongside the existing requests in
the tokenizer container spec (look for the container named or defined as the
tokenizer sidecar where resources.requests: cpu: 256m, memory: 500Mi is set) and
set appropriate limits (e.g., cpu: 500m, memory: 1Gi). Also locate the emptyDir
volume used by that sidecar (the volume declared where emptyDir: is present
around the referenced lines) and add an emptyDir.sizeLimit (e.g., "1Gi") to cap
socket storage. Ensure both changes are applied to the same pod spec so the
container has both requests and limits and its emptyDir is size-limited.

1528-1531: ⚠️ Potential issue | 🟠 Major

Guard the recursive cleanup target before rm -rf (CWE-73).

Severity: Medium. Exploit scenario: if REPO_ROOT is empty or overridden, Line 1530 can recurse into an unintended path tree during cleanup. Validate the target and fail closed before deleting it.

Hardening patch
     # Cleanup temporary overlay after all resources are installed
     if [ "${KSERVE_OVERLAY_DIR}" = "temp" ]; then
-        rm -rf "${REPO_ROOT}/config/overlays/temp"
-        log_info "Temporary overlay directory cleaned up"
+        repo_root="${REPO_ROOT:-}"
+        cleanup_dir="${repo_root}/config/overlays/temp"
+        if [ -n "${repo_root}" ] && [ "${repo_root}" != "/" ] && [ -d "${cleanup_dir}" ]; then
+            rm -rf -- "${cleanup_dir}"
+            log_info "Temporary overlay directory cleaned up"
+        else
+            log_error "Refusing to delete unexpected cleanup path: ${cleanup_dir}"
+            return 1
+        fi
     fi

As per coding guidelines, REVIEW PRIORITIES: 1. Security vulnerabilities (provide severity, exploit scenario, and remediation code) and 3. Bug-prone patterns and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh`
around lines 1528 - 1531: the cleanup currently calls rm -rf
"${REPO_ROOT}/config/overlays/temp" without validating REPO_ROOT, which can
delete unintended paths; before removal in the block guarded by
KSERVE_OVERLAY_DIR, resolve and validate the target path (e.g., compute the
absolute path for "${REPO_ROOT}/config/overlays/temp" using realpath or
equivalent), ensure REPO_ROOT is non-empty and that the resolved path strictly
equals or is a subpath of the expected repository config overlay directory, and
abort (log error and do not run rm -rf) if validation fails; reference
KSERVE_OVERLAY_DIR and REPO_ROOT in your checks and only perform rm -rf after
these guards.
🧹 Nitpick comments (6)
charts/kserve-runtime-configs/README.md (1)

14-14: Fix MD014 in console snippet by dropping the $ prompt.

Line 14 includes $ in a command block without output; this trips markdownlint MD014. Remove $ or include command output.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/kserve-runtime-configs/README.md` at line 14: the Markdown snippet
containing the shell command "helm install kserve-runtime-configs
oci://ghcr.io/kserve/charts/kserve-runtime-configs --set
kserve.servingruntime.enabled=true --set kserve.llmisvcConfigs.enabled=true
--version v0.17.0" triggers MD014 because it includes a "$" prompt without
output; remove the leading "$" from that code block line (or alternatively add
expected command output) so the block contains only the command text and no
prompt.
charts/kserve-crd/README.md (1)

12-12: Remove prompt prefix in fenced console command to satisfy MD014.

Line 12 uses a leading $ without output, which triggers markdownlint MD014. Remove the prompt prefix or include output lines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/kserve-crd/README.md` at line 12: the fenced code block contains a
prompt prefix ("$ ") before the helm command which triggers markdownlint MD014;
remove the leading "$ " from the command string (the line starting with "$ helm
install kserve-crd oci://ghcr.io/kserve/charts/kserve-crd --version v0.17.0") so
it becomes a plain code line ("helm install kserve-crd ...") or alternatively
add command output lines if you intentionally want to keep the prompt.
pkg/controller/v1alpha2/llmisvc/router_discovery_migration_test.go (1)

130-183: Cover the controller-status ordering case.

These cases only exercise the Gateway entry when it appears before the Kuadrant entry. In pkg/controller/v1alpha2/llmisvc/router_discovery.go, Lines 720-730 still inspect route.Status.Parents[0], so the answer changes if the gateway controller writes second. Add the reversed-order variant here to lock that down.

Proposed test addition
+		{
+			name: "route with Gateway status after Kuadrant still uses Gateway result",
+			route: HTTPRoute("test-route",
+				InNamespace[*gwapiv1.HTTPRoute]("test-ns"),
+				WithParentRef(GatewayRef("test-gateway", RefInNamespace("test-ns"))),
+				WithHTTPRule(
+					WithBackendRefs(BackendRefInferencePool("my-pool")),
+				),
+				WithHTTPRouteMultipleControllerStatus(
+					GatewayRef("test-gateway", RefInNamespace("test-ns")),
+					KuadrantControllerStatus,
+					GatewayAPIControllerStatusInvalidKind,
+				),
+			),
+			expected: metav1.ConditionFalse,
+		},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/v1alpha2/llmisvc/router_discovery_migration_test.go` around
lines 130 - 183: add mirrored test cases to
TestIsInferencePoolSupportedWithMultipleParents that cover when the Kuadrant
controller entry appears before the Gateway entry to ensure ordering doesn't
affect outcome; for each existing case (ResolvedRefs=True, ResolvedRefs=False
InvalidKind, and only Kuadrant/Unknown) create a variant that calls
WithHTTPRouteMultipleControllerStatus with the Kuadrant controller status first
and the Gateway controller status second (keeping the same GatewayRef and status
values), then assert the same expected metav1.ConditionStatus so
router_discovery.go's behavior against route.Status.Parents ordering is locked
down.
hack/setup/quick-install/kserve-standard-mode-full-install-helm.sh (1)

654-654: WVA_VERSION is introduced but not consumed in this install path.

Line 654 adds another unreferenced pin, which increases version bookkeeping without enforcement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-standard-mode-full-install-helm.sh` at line
654: the new WVA_VERSION variable is declared but unused; either remove the
WVA_VERSION=v0.5.1 line or wire it into the Helm install flow so the value is
enforced (for example use WVA_VERSION when constructing chart image tags, chart
version flags, or --set values passed to helm install/upgrade). Update
references in the script parts that deploy the WVA component (search for helm
install/upgrade invocations or image tag variables like IMAGE_TAG,
CHART_VERSION, or similar) to consume WVA_VERSION, or delete the unused variable
to avoid stale version bookkeeping.
hack/setup/quick-install/kserve-knative-mode-full-install-helm.sh (1)

654-654: WVA_VERSION remains unconsumed here.

Line 654 adds version state that this script does not apply anywhere.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/kserve-knative-mode-full-install-helm.sh` at line
654: the variable WVA_VERSION is declared but never used; either remove the
unused WVA_VERSION declaration or apply it where a WVA
version is currently hardcoded—search for occurrences of the intended WVA
install commands or chart installs (look for any helm install/upgrade, kubectl
apply, or image tags related to WVA) and replace hardcoded version strings with
the WVA_VERSION variable (or export it) so the value is actually consumed, or
delete the WVA_VERSION line if no use is required.
hack/setup/quick-install/keda-dependency-install.sh (1)

647-654: Remove dead version pins from this generated dependency-only script.

Line 647 and Line 654 are not consumed by this installer path. Keeping unreferenced pins here increases drift risk and implies dependency control that this script does not enforce.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/setup/quick-install/keda-dependency-install.sh` around lines 647 - 654:
Remove the dead/unreferenced version pins KSERVE_VERSION and WVA_VERSION from
this generated dependency-only script: delete the KSERVE_VERSION and WVA_VERSION
variable assignments (they are not consumed by this installer path) and, if
these values are still needed elsewhere, move their definitions into the actual
installer or the real consumer script rather than keeping them in this
dependency-only file; ensure no remaining references to KSERVE_VERSION or
WVA_VERSION exist in this file after removal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@.github/workflows/e2e-test-llmisvc.yaml`:
- Around line 38-46: The workflow currently injects untrusted event data
directly into shell commands using `${{ github.event.pull_request.base.ref }}`
(seen in the Merge target branch step used by detect-changes,
llmisvc-image-build and test-llmisvc), which allows script injection; fix by
moving the PR base ref into a workflow env (e.g. set env: BASE_REF: ${{
github.event.pull_request.base.ref }} in each job) and then reference that env
inside the run block as a quoted single argument (e.g. git fetch origin
"$BASE_REF" && git merge --no-edit -- "origin/$BASE_REF") so the ref is not
subject to shell tokenization or command injection; apply the same change to all
three occurrences.

In `@.github/workflows/e2e-test.yml`:
- Around line 38-45: The "Merge target branch" steps directly interpolate `${{
github.event.pull_request.base.ref }}` into run shells (script injection risk);
update each occurrence (e.g., the "Merge target branch" step) to stop direct
interpolation by either moving the fetch/merge logic into a reusable composite
action that accepts the base ref as an input or by passing the ref through an
environment variable (e.g., env: BASE_REF: ${{
github.event.pull_request.base.ref }}) and then referencing the safe shell
variable `$BASE_REF` inside the run block with proper quoting; apply this change
to all 17 affected jobs that use `${{ github.event.pull_request.base.ref }}` so
no event data is injected directly into run commands.

In `@.github/workflows/kserve-llmisvc-controller-docker-publish.yml`:
- Around line 41-49: The step "Merge target branch" is injecting untrusted event
data directly into a run: shell command via the token ${{
github.event.pull_request.base.ref }}, causing a script injection risk; replace
this by removing direct interpolation in the run block and either (a) use
actions/checkout@v3 with the ref input to fetch/checkout the target branch (use
the ref input rather than embedding into a shell command), or (b) assign the
base ref to an env variable (e.g. BASE_REF: ${{
github.event.pull_request.base.ref }}) and in the run block reference that
variable only after validating/sanitizing it (allowlist chars like
[A-Za-z0-9._/-]) and then call git merge --no-edit "origin/$BASE_REF" to ensure
the branch name is not interpreted as shell code; update the "Merge target
branch" step and the git merge invocation accordingly.

In `@.github/workflows/python-test.yml`:
- Around line 33-41: The "Merge target branch" step currently interpolates
github.event.pull_request.base.ref directly into the run block (git merge
--no-edit origin/${{ github.event.pull_request.base.ref }}) which is a
script-injection risk; fix it by removing direct interpolation from the shell
and instead use a safer action to check out/merge the base ref (e.g., replace
the run block with an actions/checkout step that sets ref: ${{
github.event.pull_request.base.ref }} or perform the merge via a dedicated
action), or if you must run shell, validate and canonicalize the ref against an
allowlist and pass it as an action input/output (not inline) before using it in
git merge; update the "Merge target branch" step and remove the direct use of
github.event.pull_request.base.ref inside the run commands (git fetch
--unshallow origin, git merge --no-edit origin/..., etc.).
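
The env-variable pattern these workflow comments call for can be sketched as a single step; the step placement and `if` guard are illustrative:

```yaml
- name: Merge target branch
  if: github.event_name == 'pull_request'
  env:
    BASE_REF: ${{ github.event.pull_request.base.ref }}
  run: |
    git fetch origin "$BASE_REF"
    git merge --no-edit "origin/$BASE_REF"
```

Because the ref reaches the shell only through a quoted environment variable, a crafted branch name can no longer be interpreted as shell code.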

In `@charts/kserve-llmisvc-crd-minimal/README.md`:
- Line 12: Remove the shell prompt prefix from the command example so the
markdown code block contains only the executable command (replace "`$ helm
install kserve-llmisvc-crd-minimal
oci://ghcr.io/kserve/charts/kserve-llmisvc-crd-minimal --version v0.17.0`" with
the same text but without the leading `$`), ensuring the README's command block
complies with MD014.

In `@charts/kserve-localmodel-crd/README.md`:
- Line 12: The README's console code block contains a shell prompt prefix ("$")
before the helm command (`$ helm install kserve-localmodel-crd
oci://ghcr.io/kserve/charts/kserve-localmodel-crd --version v0.17.0`) which
triggers markdownlint MD014; remove the leading "$ " so the block contains only
the raw command (`helm install kserve-localmodel-crd
oci://ghcr.io/kserve/charts/kserve-localmodel-crd --version v0.17.0`) or switch
the fence to a shell-labeled block with prompt handling, ensuring the code block
shows the command without the prompt prefix.

In `@hack/setup/quick-install/kserve-knative-mode-dependency-install.sh`:
- Line 1152: The script currently forces pilot to enable
ENABLE_GATEWAY_API_INFERENCE_EXTENSION by always passing --set
pilot.env.ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true; change this to be
configurable with a default of false by introducing a shell variable (e.g.,
ENABLE_GATEWAY_API_INFERENCE_EXTENSION="${ENABLE_GATEWAY_API_INFERENCE_EXTENSION:-false}")
and only append --set pilot.env.ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true to
the helm/installation args when that variable is explicitly set to "true", so
the default behavior matches Istio's default off state and operators can opt-in.

In `@hack/setup/quick-install/kserve-knative-mode-full-install-helm.sh`:
- Line 1167: Remove the hard-coded Istio feature-gate flag (--set
pilot.env.ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true) and make inclusion
conditional on an environment variable so the extension remains opt-in; i.e.,
check the env var ENABLE_GATEWAY_API_INFERENCE_EXTENSION in the install script
and only append the "--set
pilot.env.ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true" argument to the helm
command when that env var is explicitly set to "true" (otherwise omit it),
ensuring the default behavior remains unchanged.
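
The opt-in pattern both comments describe can be sketched as follows; the base helm arguments are illustrative and the command is echoed rather than executed:

```shell
#!/usr/bin/env bash
# Sketch: make the Istio inference-extension flag opt-in, defaulting to off.
set -euo pipefail

ENABLE_GATEWAY_API_INFERENCE_EXTENSION="${ENABLE_GATEWAY_API_INFERENCE_EXTENSION:-false}"

HELM_ARGS=(upgrade --install istiod istio/istiod)   # base invocation is illustrative
if [ "${ENABLE_GATEWAY_API_INFERENCE_EXTENSION}" = "true" ]; then
    HELM_ARGS+=(--set pilot.env.ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true)
fi

echo "helm ${HELM_ARGS[*]}"   # echoed instead of executed for the sketch
```

Operators opt in with `ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true ./install.sh`; by default the flag is omitted, matching Istio's off state.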

In `@hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh`:
- Around line 3811-3813: The tokenizer sidecar (container named tokenizer-uds)
sets TOKENIZERS_DIR=/mnt/models and workingDir=/mnt/models but does not mount
the same model volume, so it cannot access model artifacts; update the pod spec
to add the same volumeMount (the /mnt/models mount used by the main model
container) to the tokenizer-uds container so TOKENIZERS_DIR and workingDir point
to an actually mounted volume, ensuring the sidecar has the identical
volumeMount path as the main container.

In `@pkg/controller/v1alpha2/llmisvc/router.go`:
- Around line 552-563: The current referenced-pool lookup hard-codes
igwapi.InferencePool and returns an error if that single-version fetch fails;
change the block that handles llmSvc.Spec.Router.Scheduler.Pool.Ref to use the
same dual-version resolution used elsewhere (try apis/v1 then apis/v1alpha2 or
the helper that resolves both) instead of directly instantiating
igwapi.InferencePool; if the dual-version lookup finds the resource, call
r.evaluateSingleInferencePoolCondition(ctx, llmSvc, <resolvedObject>) as before,
otherwise mark the error via
llmSvc.MarkInferencePoolNotReady("InferencePoolFetchError", err.Error()) and
return the error so the reconcile loop remains safe and consistent with
managed-pool handling.
- Around line 590-595: The readiness check currently marks the inference pool
ready if either v1Ready or v1alpha2Ready is true, which is incorrect once the
route is migrated to v1; change the logic so that if the route is migrated
(check the serving.kserve.io/inference-pool-migrated annotation or use an
existing helper like llmSvc.IsMigratedToV1()/llmSvc.ActiveBackendVersion),
require v1Ready only, otherwise keep the current OR behavior; update the block
around v1Ready/v1alpha2Ready and the call to llmSvc.MarkInferencePoolReady() to
gate on that migration check.

---

Duplicate comments:
In `@hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh`:
- Around line 3834-3837: The tokenizer sidecar currently only defines resource
requests and an unlimited emptyDir causing possible eviction or node-pressure;
add resource limits (e.g., cpu and memory) alongside the existing requests in
the tokenizer container spec (look for the container named or defined as the
tokenizer sidecar where resources.requests: cpu: 256m, memory: 500Mi is set) and
set appropriate limits (e.g., cpu: 500m, memory: 1Gi). Also locate the emptyDir
volume used by that sidecar (the volume declared where emptyDir: is present
around the referenced lines) and add an emptyDir.sizeLimit (e.g., "1Gi") to cap
socket storage. Ensure both changes are applied to the same pod spec so the
container has both requests and limits and its emptyDir is size-limited.
- Around line 1528-1531: The cleanup currently calls rm -rf
"${REPO_ROOT}/config/overlays/temp" without validating REPO_ROOT, which can
delete unintended paths; before removal in the block guarded by
KSERVE_OVERLAY_DIR, resolve and validate the target path (e.g., compute the
absolute path for "${REPO_ROOT}/config/overlays/temp" using realpath or
equivalent), ensure REPO_ROOT is non-empty and that the resolved path strictly
equals or is a subpath of the expected repository config overlay directory, and
abort (log error and do not run rm -rf) if validation fails; reference
KSERVE_OVERLAY_DIR and REPO_ROOT in your checks and only perform rm -rf after
these guards.
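A sketch of the guarded cleanup (the function name is hypothetical; REPO_ROOT and KSERVE_OVERLAY_DIR come from the surrounding script):

```shell
# Validate REPO_ROOT and the resolved target before any rm -rf.
safe_remove_overlay_temp() {
  local repo_root="$1"
  if [ -z "$repo_root" ]; then
    echo "REPO_ROOT is empty; refusing to clean up" >&2
    return 1
  fi
  local expected target
  expected=$(realpath -m "${repo_root}/config/overlays") || return 1
  target=$(realpath -m "${repo_root}/config/overlays/temp") || return 1
  case "$target" in
    "$expected"/*) rm -rf "$target" ;;
    *)
      echo "refusing to remove unexpected path: $target" >&2
      return 1
      ;;
  esac
}
```

`realpath -m` resolves the path without requiring it to exist, so the guard also holds when the temp overlay was never created.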

---

Nitpick comments:
In `@charts/kserve-crd/README.md`:
- Line 12: The fenced code block contains a prompt prefix ("$ ") before the helm
command which triggers markdownlint MD014; remove the leading "$ " from the
command string (the line starting with "$ helm install kserve-crd
oci://ghcr.io/kserve/charts/kserve-crd --version v0.17.0") so it becomes a plain
code line ("helm install kserve-crd ...") or alternatively add command output
lines if you intentionally want to keep the prompt.

In `@charts/kserve-runtime-configs/README.md`:
- Line 14: The Markdown snippet containing the shell command "helm install
kserve-runtime-configs oci://ghcr.io/kserve/charts/kserve-runtime-configs --set
kserve.servingruntime.enabled=true --set kserve.llmisvcConfigs.enabled=true
--version v0.17.0" triggers MD014 because it includes a "$" prompt without
output; remove the leading "$" from that code block line (or alternatively add
expected command output) so the block contains only the command text and no
prompt.

In `@hack/setup/quick-install/keda-dependency-install.sh`:
- Around line 647-654: Remove the dead/unreferenced version pins KSERVE_VERSION
and WVA_VERSION from this generated dependency-only script: delete the
KSERVE_VERSION and WVA_VERSION variable assignments (they are not consumed by
this installer path) and, if these values are still needed elsewhere, move their
definitions into the actual installer or the real consumer script rather than
keeping them in this dependency-only file; ensure no remaining references to
KSERVE_VERSION or WVA_VERSION exist in this file after removal.

In `@hack/setup/quick-install/kserve-knative-mode-full-install-helm.sh`:
- Line 654: The variable WVA_VERSION is declared but never used; either remove
the unused WVA_VERSION declaration or apply it where a workload-variant-autoscaler
(WVA) version is currently hardcoded—search for occurrences of the intended WVA
install commands or chart installs (look for any helm install/upgrade, kubectl
apply, or image tags related to WVA) and replace hardcoded version strings with
the WVA_VERSION variable (or export it) so the value is actually consumed, or
delete the WVA_VERSION line if no use is required.

In `@hack/setup/quick-install/kserve-standard-mode-full-install-helm.sh`:
- Line 654: The new WVA_VERSION variable is declared but unused; either remove
the WVA_VERSION=v0.5.1 line or wire it into the Helm install flow so the value
is enforced (for example use WVA_VERSION when constructing chart image tags,
chart version flags, or --set values passed to helm install/upgrade). Update
references in the script parts that deploy the WVA component (search for helm
install/upgrade invocations or image tag variables like IMAGE_TAG,
CHART_VERSION, or similar) to consume WVA_VERSION, or delete the unused variable
to avoid stale version bookkeeping.
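If the variable is kept, wiring it in is a one-line substitution; a sketch with a hypothetical image repository standing in for wherever the script currently hardcodes the WVA version:

```shell
WVA_VERSION=v0.5.1

# Build the WVA image reference from the single pinned version instead of a
# hardcoded tag ("ghcr.io/llm-d/workload-variant-autoscaler" is illustrative).
wva_image="ghcr.io/llm-d/workload-variant-autoscaler:${WVA_VERSION}"
echo "deploying WVA image: ${wva_image}"
```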

In `@pkg/controller/v1alpha2/llmisvc/router_discovery_migration_test.go`:
- Around line 130-183: Add mirrored test cases to
TestIsInferencePoolSupportedWithMultipleParents that cover when the Kuadrant
controller entry appears before the Gateway entry to ensure ordering doesn't
affect outcome; for each existing case (ResolvedRefs=True, ResolvedRefs=False
InvalidKind, and only Kuadrant/Unknown) create a variant that calls
WithHTTPRouteMultipleControllerStatus with the Kuadrant controller status first
and the Gateway controller status second (keeping the same GatewayRef and status
values), then assert the same expected metav1.ConditionStatus so
router_discovery.go's behavior against route.Status.Parents ordering is locked
down.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: d282827d-6766-4b72-8c3f-c39a228b587f

📥 Commits

Reviewing files that changed from the base of the PR and between 785d73b and a58a44e.

⛔ Files ignored due to path filters (16)
  • python/aiffairness/uv.lock is excluded by !**/*.lock
  • python/artexplainer/uv.lock is excluded by !**/*.lock
  • python/custom_model/uv.lock is excluded by !**/*.lock
  • python/custom_tokenizer/uv.lock is excluded by !**/*.lock
  • python/custom_transformer/uv.lock is excluded by !**/*.lock
  • python/huggingfaceserver/uv.lock is excluded by !**/*.lock
  • python/kserve/uv.lock is excluded by !**/*.lock
  • python/lgbserver/uv.lock is excluded by !**/*.lock
  • python/paddleserver/uv.lock is excluded by !**/*.lock
  • python/pmmlserver/uv.lock is excluded by !**/*.lock
  • python/predictiveserver/uv.lock is excluded by !**/*.lock
  • python/sklearnserver/uv.lock is excluded by !**/*.lock
  • python/storage/uv.lock is excluded by !**/*.lock
  • python/test_resources/graph/error_404_isvc/uv.lock is excluded by !**/*.lock
  • python/test_resources/graph/success_200_isvc/uv.lock is excluded by !**/*.lock
  • python/xgbserver/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (75)
  • .github/workflows/agent-docker-publish.yml
  • .github/workflows/e2e-test-llmisvc.yaml
  • .github/workflows/e2e-test.yml
  • .github/workflows/go.yml
  • .github/workflows/kserve-controller-docker-publish.yml
  • .github/workflows/kserve-llmisvc-controller-docker-publish.yml
  • .github/workflows/kserve-localmodel-agent-docker-publish.yml
  • .github/workflows/kserve-localmodel-controller-docker-publish.yml
  • .github/workflows/precommit-check.yml
  • .github/workflows/predictiveserver-docker-publish.yml
  • .github/workflows/python-test.yml
  • .github/workflows/router-docker-publish.yml
  • .github/workflows/scheduled-go-security-scan.yml
  • .github/workflows/storage-initializer-docker-publisher.yml
  • charts/_common/README.md
  • charts/_common/common-sections.yaml
  • charts/kserve-crd-minimal/Chart.yaml
  • charts/kserve-crd-minimal/README.md
  • charts/kserve-crd/Chart.yaml
  • charts/kserve-crd/README.md
  • charts/kserve-llmisvc-crd-minimal/Chart.yaml
  • charts/kserve-llmisvc-crd-minimal/README.md
  • charts/kserve-llmisvc-crd/Chart.yaml
  • charts/kserve-llmisvc-crd/README.md
  • charts/kserve-llmisvc-resources/Chart.yaml
  • charts/kserve-llmisvc-resources/README.md
  • charts/kserve-llmisvc-resources/values.yaml
  • charts/kserve-localmodel-crd-minimal/Chart.yaml
  • charts/kserve-localmodel-crd-minimal/README.md
  • charts/kserve-localmodel-crd/Chart.yaml
  • charts/kserve-localmodel-crd/README.md
  • charts/kserve-localmodel-resources/Chart.yaml
  • charts/kserve-localmodel-resources/README.md
  • charts/kserve-localmodel-resources/values.yaml
  • charts/kserve-resources/Chart.yaml
  • charts/kserve-resources/README.md
  • charts/kserve-resources/values.yaml
  • charts/kserve-runtime-configs/Chart.yaml
  • charts/kserve-runtime-configs/README.md
  • charts/kserve-runtime-configs/values.yaml
  • docs/samples/explanation/alibi/alibiexplainer/pyproject.toml
  • hack/release/generate-install.sh
  • hack/setup/quick-install/keda-dependency-install.sh
  • hack/setup/quick-install/kserve-knative-mode-dependency-install.sh
  • hack/setup/quick-install/kserve-knative-mode-full-install-helm.sh
  • hack/setup/quick-install/kserve-knative-mode-full-install-with-manifests.sh
  • hack/setup/quick-install/kserve-standard-mode-dependency-install.sh
  • hack/setup/quick-install/kserve-standard-mode-full-install-helm.sh
  • hack/setup/quick-install/kserve-standard-mode-full-install-with-manifests.sh
  • hack/setup/quick-install/llmisvc-dependency-install.sh
  • hack/setup/quick-install/llmisvc-full-install-helm.sh
  • hack/setup/quick-install/llmisvc-full-install-with-manifests.sh
  • kserve-deps.env
  • pkg/controller/v1alpha2/llmisvc/fixture/gwapi_builders.go
  • pkg/controller/v1alpha2/llmisvc/router.go
  • pkg/controller/v1alpha2/llmisvc/router_discovery.go
  • pkg/controller/v1alpha2/llmisvc/router_discovery_migration_test.go
  • pkg/controller/v1alpha2/llmisvc/router_pool_readiness_test.go
  • python/VERSION
  • python/aiffairness/pyproject.toml
  • python/artexplainer/pyproject.toml
  • python/custom_model/pyproject.toml
  • python/custom_tokenizer/pyproject.toml
  • python/custom_transformer/pyproject.toml
  • python/huggingfaceserver/pyproject.toml
  • python/kserve/pyproject.toml
  • python/lgbserver/pyproject.toml
  • python/paddleserver/pyproject.toml
  • python/pmmlserver/pyproject.toml
  • python/predictiveserver/pyproject.toml
  • python/sklearnserver/pyproject.toml
  • python/storage/pyproject.toml
  • python/test_resources/graph/error_404_isvc/pyproject.toml
  • python/test_resources/graph/success_200_isvc/pyproject.toml
  • python/xgbserver/pyproject.toml
✅ Files skipped from review due to trivial changes (7)
  • charts/_common/README.md
  • charts/kserve-runtime-configs/values.yaml
  • charts/kserve-localmodel-resources/values.yaml
  • python/custom_model/pyproject.toml
  • charts/kserve-llmisvc-crd-minimal/Chart.yaml
  • charts/kserve-resources/values.yaml
  • python/custom_tokenizer/pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (8)
  • .github/workflows/go.yml
  • .github/workflows/router-docker-publish.yml
  • .github/workflows/agent-docker-publish.yml
  • kserve-deps.env
  • .github/workflows/kserve-localmodel-controller-docker-publish.yml
  • .github/workflows/storage-initializer-docker-publisher.yml
  • .github/workflows/kserve-localmodel-agent-docker-publish.yml
  • .github/workflows/predictiveserver-docker-publish.yml


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@config/overlays/odh/params.env`:
- Around line 7-8: Replace the mutable-tag image references with digest-pinned
image URIs to prevent supply-chain drift: for the
kserve-llm-d-inference-scheduler and kserve-llm-d-routing-sidecar keys, lookup
the exact image digest for
quay.io/opendatahub/llm-d-inference-scheduler:release-v0.6 and
quay.io/opendatahub/llm-d-routing-sidecar:release-v0.6 (via quay API, skopeo
inspect, or docker pull + docker inspect) and update the values to the canonical
form quay.io/...@sha256:<digest>; while editing, consider doing the same for the
other keys that use :latest to ensure all images are digest-pinned for
reproducible deployments.
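The rewrite is mechanical once the digest is known; a sketch (the digest argument below is a placeholder — obtain the real one with, e.g., `skopeo inspect --format '{{.Digest}}' docker://quay.io/opendatahub/llm-d-routing-sidecar:release-v0.6`):

```shell
# pin_image turns "repo:tag" plus a digest into the canonical "repo@sha256:..."
# form used in params.env for reproducible deployments.
pin_image() {
  local image="$1" digest="$2"
  printf '%s@%s\n' "${image%:*}" "$digest"  # strip the ":tag" suffix, append the digest
}

pin_image "quay.io/opendatahub/llm-d-routing-sidecar:release-v0.6" "sha256:<digest>"
```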

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: e5a0704e-afff-414c-89f3-44b70fb5b8fd

📥 Commits

Reviewing files that changed from the base of the PR and between a58a44e and 9b87773.

📒 Files selected for processing (1)
  • config/overlays/odh/params.env

Upstream commit ba94e0b (PR kserve#5121) upgraded llm-d to v0.6 and introduced
the 'always-disagg-pd-decider' plugin in the scheduler config. The ODH
params.env was still pinned to release-v0.4 which does not have this
plugin, causing the scheduler to fail with:
  "plugin type 'always-disagg-pd-decider' is not found in registry"
@spolti spolti force-pushed the sync-260313_kserve-upstream-odh branch from 9b87773 to f9b3a42 Compare March 14, 2026 03:33
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
@pierDipi pierDipi force-pushed the sync-260313_kserve-upstream-odh branch from 4d5bff0 to f60f842 Compare March 16, 2026 19:52
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
@spolti spolti added do-not-merge/hold tide/merge-method-merge Change lgtm to use a merge commit labels Mar 16, 2026
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
@pierDipi
Member

/lgtm


@Jooho Jooho left a comment


/lgtm

@openshift-ci

openshift-ci bot commented Mar 17, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jooho, spolti

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Projects

Status: New/Backlog

Development

Successfully merging this pull request may close these issues.

6 participants