
Sync upstream to odh march 18 #143

Open
Gregory-Pereira wants to merge 42 commits into opendatahub-io:main from
llm-d:sync-upstream-to-odh-march-18

Conversation

@Gregory-Pereira (Member) commented Mar 18, 2026

Duplicate of #129, but opened from a non-main branch so I can force-push to it to resolve any merge conflicts.

Summary by CodeRabbit

  • New Features

    • Added external tokenizer plugin support for end-to-end request tokenization
    • Enabled Encoder-Prefiller-Decoder (EPD) workflow with multimodal input handling
    • Integrated OpenTelemetry distributed tracing for enhanced observability
    • Added SSRF protection with allowlist validation for prefill and encoder targets
  • Improvements

    • Refactored configuration handling with enhanced CLI options and validation
    • Updated health check endpoints and improved deployment configurations
    • Upgraded GitHub Actions workflows and CI/CD pipelines for better automation
    • Enhanced connector protocol support with improved logging and error handling

yangligt2 and others added 30 commits February 19, 2026 19:28
* Fix panic in SGLang proxy handling of concurrent requests

Signed-off-by: YANG LI <yangligt@google.com>

* Add concurrency unit test for SGLang context logic

Signed-off-by: YANG LI <yangligt@google.com>

---------

Signed-off-by: YANG LI <yangligt@google.com>
* Add opentelemetry tracing

    Add centralized telemetry package and custom spans
    following the llm-d distributed tracing proposal.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>

* update Dockerfile.sidecar

Signed-off-by: sallyom <somalley@redhat.com>

* tracing: remove extra success results & startup spans and cleanup

Signed-off-by: sallyom <somalley@redhat.com>

* fix: avoid os.Exit bypassing defer in main

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>

* fix: address review nits for tracing PR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>

* test: add edge case tests for StripScheme

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>

* remove extra comments from sidecar spans

Signed-off-by: sallyom <somalley@redhat.com>

* fix lint error

Signed-off-by: sallyom <somalley@redhat.com>

* protect against segfault on tests

Signed-off-by: greg pereira <grpereir@redhat.com>

---------

Signed-off-by: sallyom <somalley@redhat.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: greg pereira <grpereir@redhat.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.39.0 to 1.40.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.40.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.40.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…dates (#662)

Bumps the go-dependencies group with 2 updates in the / directory: [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) and [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go).


Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.64.0 to 0.65.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.64.0...zpages/v0.65.0)

Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.39.0 to 1.40.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.40.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
  dependency-version: 0.65.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
  dependency-version: 1.40.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
…build (#664)

Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Bumps the kubernetes group with 5 updates:

| Package | From | To |
| --- | --- | --- |
| [k8s.io/api](https://github.com/kubernetes/api) | `0.34.4` | `0.34.5` |
| [k8s.io/apiextensions-apiserver](https://github.com/kubernetes/apiextensions-apiserver) | `0.34.4` | `0.34.5` |
| [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) | `0.34.4` | `0.34.5` |
| [k8s.io/client-go](https://github.com/kubernetes/client-go) | `0.34.4` | `0.34.5` |
| [k8s.io/component-base](https://github.com/kubernetes/component-base) | `0.34.4` | `0.34.5` |


Updates `k8s.io/api` from 0.34.4 to 0.34.5
- [Commits](kubernetes/api@v0.34.4...v0.34.5)

Updates `k8s.io/apiextensions-apiserver` from 0.34.4 to 0.34.5
- [Release notes](https://github.com/kubernetes/apiextensions-apiserver/releases)
- [Commits](kubernetes/apiextensions-apiserver@v0.34.4...v0.34.5)

Updates `k8s.io/apimachinery` from 0.34.4 to 0.34.5
- [Commits](kubernetes/apimachinery@v0.34.4...v0.34.5)

Updates `k8s.io/client-go` from 0.34.4 to 0.34.5
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](kubernetes/client-go@v0.34.4...v0.34.5)

Updates `k8s.io/component-base` from 0.34.4 to 0.34.5
- [Commits](kubernetes/component-base@v0.34.4...v0.34.5)

---
updated-dependencies:
- dependency-name: k8s.io/api
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apiextensions-apiserver
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apimachinery
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/client-go
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/component-base
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…dates (#674)

Bumps the go-dependencies group with 2 updates in the / directory: [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) and [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go).


Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.65.0 to 0.66.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.65.0...zpages/v0.66.0)

Updates `go.opentelemetry.io/otel` from 1.40.0 to 1.41.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0)

Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.40.0 to 1.41.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0)

Updates `go.opentelemetry.io/otel/sdk` from 1.40.0 to 1.41.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0)

Updates `go.opentelemetry.io/otel/trace` from 1.40.0 to 1.41.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
  dependency-version: 0.66.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/trace
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [lycheeverse/lychee-action](https://github.com/lycheeverse/lychee-action) from 2.7.0 to 2.8.0.
- [Release notes](https://github.com/lycheeverse/lychee-action/releases)
- [Commits](lycheeverse/lychee-action@v2.7.0...v2.8.0)

---
updated-dependencies:
- dependency-name: lycheeverse/lychee-action
  dependency-version: 2.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* ci: add dev image workflow for main and release branches

Build and push -dev variants of EPP and sidecar container images
on pushes to main and release-* branches, tagged with commit SHA.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* ci: extract reusable build workflow and tag dev images by branch

Refactor ci-release and ci-dev to call a shared ci-build-images
reusable workflow, reducing duplication. Tag dev images with the
branch name instead of commit SHA so each branch has exactly one
image that gets overwritten on push, avoiding image accumulation.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Newlines at EOF

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

---------

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.43.5 to 1.44.0.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.43.5...v1.44.0)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-version: 1.44.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The v0.44.1 release was removed from GitHub, causing 404 errors
in the trivy-scan action. Update to the latest available version.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
* Allow sidecar server to reload TLS certificates

Enables TLS certificates to be rotated without restarting
sidecar and vLLM deployments.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Pass certPath to reloader

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Improvements

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

---------

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
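The certificate rotation described in the commits above can be sketched with Go's `tls.Config.GetCertificate` hook; this is an illustrative pattern, not the sidecar's actual code (the `certReloader` type and function names are assumptions):

```go
package main

import (
	"crypto/tls"
	"sync/atomic"
)

// certReloader serves the most recently loaded certificate without
// restarting the server.
type certReloader struct {
	cert atomic.Pointer[tls.Certificate]
}

// Reload parses the key pair on disk and swaps it in atomically; in a real
// sidecar this would be driven by a file watcher or periodic poll.
func (r *certReloader) Reload(certPath, keyPath string) error {
	c, err := tls.LoadX509KeyPair(certPath, keyPath)
	if err != nil {
		return err // keep serving the previous certificate on failure
	}
	r.cert.Store(&c)
	return nil
}

// GetCertificate plugs into tls.Config so every new handshake picks up the
// latest certificate.
func (r *certReloader) GetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	return r.cert.Load(), nil
}

func tlsConfigFor(r *certReloader) *tls.Config {
	return &tls.Config{GetCertificate: r.GetCertificate}
}
```

Because `GetCertificate` is consulted on every handshake, in-flight connections keep their negotiated certificate while new connections see the rotated one.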
Bumps the go-dependencies group with 7 updates:

| Package | From | To |
| --- | --- | --- |
| [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) | `0.66.0` | `0.67.0` |
| [go.opentelemetry.io/otel](https://github.com/open-telemetry/opentelemetry-go) | `1.41.0` | `1.42.0` |
| [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go) | `1.41.0` | `1.42.0` |
| [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) | `1.41.0` | `1.42.0` |
| [go.opentelemetry.io/otel/trace](https://github.com/open-telemetry/opentelemetry-go) | `1.41.0` | `1.42.0` |
| [golang.org/x/sync](https://github.com/golang/sync) | `0.19.0` | `0.20.0` |
| [google.golang.org/grpc](https://github.com/grpc/grpc-go) | `1.79.1` | `1.79.2` |


Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.66.0 to 0.67.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.66.0...zpages/v0.67.0)

Updates `go.opentelemetry.io/otel` from 1.41.0 to 1.42.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.41.0...v1.42.0)

Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.41.0 to 1.42.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.41.0...v1.42.0)

Updates `go.opentelemetry.io/otel/sdk` from 1.41.0 to 1.42.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.41.0...v1.42.0)

Updates `go.opentelemetry.io/otel/trace` from 1.41.0 to 1.42.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.41.0...v1.42.0)

Updates `golang.org/x/sync` from 0.19.0 to 0.20.0
- [Commits](golang/sync@v0.19.0...v0.20.0)

Updates `google.golang.org/grpc` from 1.79.1 to 1.79.2
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.79.1...v1.79.2)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
  dependency-version: 0.67.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/trace
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: golang.org/x/sync
  dependency-version: 0.20.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: google.golang.org/grpc
  dependency-version: 1.79.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…gs (#682)

* feat(sidecar): simplify TLS command line options with StringSlice flags

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

* address comments from Etai

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

* keep deprecated flags

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

---------

Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: roytman <roytman@il.ibm.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* fix: simplify InferencePool flag to namespace/name format

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

* Address comments from Etai

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

---------

Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
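The `namespace/name` flag format mentioned in the commits above can be parsed with a small helper; this is a hypothetical sketch, not the actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// parseInferencePoolFlag splits a flag value of the form "namespace/name",
// rejecting empty components and extra slashes.
func parseInferencePoolFlag(v string) (namespace, name string, err error) {
	namespace, name, ok := strings.Cut(v, "/")
	if !ok || namespace == "" || name == "" || strings.Contains(name, "/") {
		return "", "", fmt.Errorf("invalid inference pool %q: want namespace/name", v)
	}
	return namespace, name, nil
}
```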
* initial E/PD extension of the sidecar

Signed-off-by: roytman <roytman@il.ibm.com>

* add Encoding span.SetAttributes, fix bug in connector_sglang_test.go

Signed-off-by: roytman <roytman@il.ibm.com>

* fix e2e tests

Signed-off-by: roytman <roytman@il.ibm.com>

* Update pkg/sidecar/proxy/connector_epd_shared_storage.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Alexey Roytman <roytman@il.ibm.com>

* flags order

Signed-off-by: roytman <roytman@il.ibm.com>

* fix comments

Signed-off-by: roytman <roytman@il.ibm.com>

* remove redundant comment

Signed-off-by: roytman <roytman@il.ibm.com>

---------

Signed-off-by: roytman <roytman@il.ibm.com>
Signed-off-by: Alexey Roytman <roytman@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
roytman and others added 11 commits March 15, 2026 15:52
* Check for uniqueness of media URLs

Signed-off-by: roytman <roytman@il.ibm.com>

* fix comments

Signed-off-by: roytman <roytman@il.ibm.com>

---------

Signed-off-by: roytman <roytman@il.ibm.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: roytman <roytman@il.ibm.com>
* Implement Options pattern for sidecar proxy

Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>

* options.go and options_test.go review fixes and code enhancements

Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>

* enhance tests, enhance options pattern implementation

Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>

* simplify InferencePool flag to namespace/name format - PR #685

Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>

* code enhancements, resolve review comments

Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>

* set default connector and KVConnector and migrate connector flag to KVConnector

Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>

* add complete before validate in tests, align KVConnector and Connector initialization as before

Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>

---------

Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>
Signed-off-by: roytman <roytman@il.ibm.com>
Bumps [dorny/paths-filter](https://github.com/dorny/paths-filter) from 3 to 4.
- [Release notes](https://github.com/dorny/paths-filter/releases)
- [Changelog](https://github.com/dorny/paths-filter/blob/master/CHANGELOG.md)
- [Commits](dorny/paths-filter@v3...v4)

---
updated-dependencies:
- dependency-name: dorny/paths-filter
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
…red (#691)

* NonCachedTokens defines the minimum number of non-cached tokens required to trigger disaggregated PD. A value of 0 disables disaggregation.

Signed-off-by: Modassar-Rana <modassar.rana@ibm.com>

* updated the signature

Signed-off-by: Modassar-Rana <modassar.rana@ibm.com>

* updated logging to Constructor

Signed-off-by: Modassar-Rana <modassar.rana@ibm.com>

---------

Signed-off-by: Modassar-Rana <modassar.rana@ibm.com>
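The threshold semantics described above (trigger disaggregated PD only when enough prompt tokens miss the prefix cache; a value of 0 disables it) can be sketched as follows; the function and parameter names are illustrative, not the plugin's real API:

```go
package main

// shouldDisaggregate returns true when the request has at least
// nonCachedTokens tokens that are not covered by the prefix cache.
// A threshold of 0 disables disaggregation entirely.
func shouldDisaggregate(promptTokens, cachedTokens, nonCachedTokens int) bool {
	if nonCachedTokens == 0 {
		return false // 0 disables disaggregated prefill/decode
	}
	return promptTokens-cachedTokens >= nonCachedTokens
}
```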
…support (#694)

* Bump kv-cache and GAIE

This is needed to consume the new changes related to consuming tokens
directly.

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Update import paths for GAIE and kv-cache API changes

Adapt import paths for packages renamed in GAIE:
- pkg/common/util/logging -> pkg/common/observability/logging
- pkg/epp/util/metrics -> pkg/common/observability/metrics
- pkg/epp/datalayer/http -> pkg/epp/framework/plugins/datalayer/source/http
- pkg/epp/datalayer/plugins/approximateprefix -> pkg/epp/framework/plugins/datalayer/attribute/prefix

Also adapt to kv-cache API change where NewHTTPDataSource now
returns (ds, error) instead of just ds.

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Adapt precise-prefix-cache-scorer to kv-cache API changes

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Add external tokenizer PrepareData plugin and TokenizedPrompt scorer path

Add a PrepareData plugin that calls a tokenizer sidecar over UDS gRPC
to tokenize prompts before scheduling. The plugin populates
request.TokenizedPrompt with token IDs from the sidecar. It supports
both completions (Render) and chat completions (RenderChat) requests.
The plugin is fail-open: tokenization errors are logged and the request
proceeds without TokenizedPrompt.

Update precise-prefix-cache-scorer to use pre-tokenized input when
TokenizedPrompt is present on the request, calling
GetPodScoresFromTokens to skip internal tokenization. Falls back to
existing internal tokenization path when TokenizedPrompt is not set.

Register the new tokenizer plugin in RegisterAllPlugins.

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Add deployment config and kind environment for external tokenizer

Add EPP config with prepareDataPlugins feature gate and tokenizer
plugin pointing at /tmp/tokenizer/tokenizer-uds.socket.

Update kind-dev-env.sh to support EXTERNAL_TOKENIZER_ENABLED flag
for selecting the tokenizer config.

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* tokenizer: use interface to be able to mock in unit tests

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Add tokenizer plugin unit tests

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* pprefix-cache-scorer: use interface to be able to mock in unit tests

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Add precise-prefix-cache tokenized unit tests

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Add e2e test for external tokenizer PrepareData plugin

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Move Consumes() from PrefixBasedPDDecider to PdProfileHandler

The GAIE dependency bump introduced plugin layer execution order
validation. PrefixBasedPDDecider has no scheduling interface so it
falls into DefaultLayer (-1), while its producer prefix-cache-scorer
is SchedulingLayer (2). The check (2 > -1) fails.

Move Consumes() to PdProfileHandler which is a ProfileHandler
(SchedulingLayer = 2), matching the producer's layer. This is
semantically correct as PdProfileHandler is the actual scheduling
plugin that consumes the data through the decider.

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Add end-to-end benchmark for external tokenizer + scorer flow

Benchmarks the request latency through the EPP when using the external
tokenizer PrepareData plugin with the precise-prefix-cache-scorer.

Includes completion, chat completion, shared-prefix, and multi-message
scenarios with prompt token count reporting.

Usage:
  EXTERNAL_TOKENIZER_ENABLED=true KV_CACHE_ENABLED=true make env-dev-kind
  make bench-external-tokenizer

Signed-off-by: Antonio Cardace <acardace@redhat.com>

---------

Signed-off-by: Antonio Cardace <acardace@redhat.com>
…1.28 (#727)

* Corrected configuration file

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* No longer set primaryPort plugin parameter

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Deprecate use of primaryPort plugin parameter

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Deprecate use of the x-data-parallel-host-port header

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Add a real plugin Handle to the DP tests

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Make sure user understands that Istio >= 1.28.1 is needed

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
* build: remove CGO dependency by migrating to pure-Go ZMQ

Update llm-d-kv-cache to v0.6.1-0.20260317063900-80aba2cb5a99 (main snapshot),
pending merge of llm-d/llm-d-kv-cache#431 which switches from pebbe/zmq4 (CGO)
to a pure-Go ZMQ implementation.

- go.mod/go.sum: bump kv-cache to current main pseudo-version (placeholder;
  will be updated to a real tag once llm-d/llm-d-kv-cache#431 is merged)
- Makefile: set CGO_ENABLED=0; drop check-dependencies prereq from test targets
- Makefile.tools.mk: remove ##@ Dependencies section (check/install-dependencies)
- Dockerfile.epp: remove EPEL + zeromq install steps; set CGO_ENABLED=0 in build
- DEVELOPMENT.md: remove ZeroMQ from prerequisites list
- .github/workflows/ci-pr-checks.yaml: remove CGO configuration and
  install-dependencies steps; remove CGO env vars from lint step

NOTE: This commit is intentionally draft — go.mod must be updated to the
tagged kv-cache release that includes the pure-Go ZMQ changes before merging.

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* update kv-cache (pure Go zmq, tip of main)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* revert to micro image once CGO is disabled

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

---------

Signed-off-by: Etai Lev Ran <elevran@gmail.com>
@github-actions

Cannot approve the pull request: Error: openshift-ci[bot] is not included in the approvers role in the OWNERS file

@coderabbitai

coderabbitai bot commented Mar 18, 2026

📝 Walkthrough

Walkthrough

This PR refactors the inference scheduler's core architecture and adds multimodal support. Key changes:

  • Renamed pod-based headers to endpoint-based headers throughout
  • Introduced Options-based configuration with deprecation migration in the sidecar proxy
  • Added a TokenizerPlugin for the PrepareData phase with UDS tokenizer integration
  • Wired OpenTelemetry tracing across core proxy and plugin paths
  • Refactored proxy connectors (NIXLV2, SGLang, shared-storage) with new EPD protocol support for multimodal preprocessing
  • Removed the TLS certificate parameter from Server.Start() and moved TLS setup into startHTTP with dynamic certificate handling
  • Removed ZMQ and CGO dependencies
  • Reorganized observability-related packages
  • Introduced new CI workflows for multi-image building/pushing
  • Renamed Connector fields to KVConnector/ECConnector to distinguish connector types

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes


Key Issues & Findings

TLS Certificate Handling Risk (High Priority)

File: pkg/sidecar/proxy/proxy_helpers.go, pkg/sidecar/proxy/proxy.go

The removal of the cert parameter from Start() and relocation of TLS setup to startHTTP() introduces a critical initialization sequence concern:

  • Certificate is now loaded conditionally based on SecureServing flag during startHTTP()
  • GetCertificate hook is wired dynamically, but there is no pre-flight validation that CertPath references valid, readable files before the TLS listener binds
  • If certificate loading fails mid-handler (e.g., file permissions change), the connection handler will fail silently via log.Error() without propagating the error to the caller
  • CWE-295: Improper Certificate Validation - no explicit validation that the loaded certificate matches expected domain/CN; self-signed fallback is generated without hostname constraints

Recommendation: Add synchronous certificate pre-validation in startHTTP() before listener bind. Test self-signed generation with hostname pinning. Consider error propagation for cert reloader failures.
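The recommended pre-flight check can be done with the standard library alone: stat the configured files and attempt to parse them as a key pair before the listener binds, so misconfiguration fails at startup instead of mid-handshake. A minimal sketch; the function name and error wording are illustrative:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"os"
)

// validateCertFiles verifies, before the TLS listener binds, that the
// configured certificate and key exist, are readable, and parse as a valid
// key pair.
func validateCertFiles(certPath, keyPath string) error {
	for _, p := range []string{certPath, keyPath} {
		if _, err := os.Stat(p); err != nil {
			return fmt.Errorf("certificate file %q not readable: %w", p, err)
		}
	}
	if _, err := tls.LoadX509KeyPair(certPath, keyPath); err != nil {
		return fmt.Errorf("invalid key pair %q/%q: %w", certPath, keyPath, err)
	}
	return nil
}
```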


Encoder Endpoint Validation Missing

File: pkg/sidecar/proxy/chat_completions.go

The new encoder target selection and protocol routing logic does not validate encoder hostnames against the SSRF allowlist:

// Inspects EncoderEndpointsHeader, filters to allowed hosts
// But filtering logic is incomplete; missing explicit allowlist check
  • EncoderEndpointsHeader is parsed and used to route requests to encoder services, but no call to allowlistValidator.IsAllowed() is shown
  • Contrast with prefill path, which explicitly validates prefillHostPort against allowlist
  • CWE-918: Server-Side Request Forgery (SSRF) - encoder endpoints bypass the allowlist validator

Recommendation: Apply the same SSRF validation to encoder endpoints as used for prefill. Add test coverage for disallowed encoder IPs.
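An allowlist check of the kind recommended here might look like the following; the function name and allowlist shape are assumptions, not the proxy's actual validator:

```go
package main

import (
	"net"
	"strings"
)

// hostAllowed reports whether the host portion of hostPort appears in the
// configured allowlist (case-insensitive exact match). Applying this to
// every endpoint parsed from EncoderEndpointsHeader, as is already done for
// prefill targets, closes the SSRF gap.
func hostAllowed(allowlist []string, hostPort string) bool {
	host, _, err := net.SplitHostPort(hostPort)
	if err != nil {
		host = hostPort // no port present
	}
	for _, a := range allowlist {
		if strings.EqualFold(a, host) {
			return true
		}
	}
	return false
}
```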


OpenTelemetry Span Context Leakage

File: pkg/sidecar/proxy/connector_sglang.go

The prefill span context uses context.WithoutCancel(), which strips deadline and cancel signals but preserves trace context:

ctx := context.WithoutCancel(r.Context())  // Span lives; cancel/deadline removed

If the prefill goroutine blocks indefinitely (slow/hung encoder), the trace context prevents proper garbage collection of the request object. Combined with the span's parent reference, this can cause unbounded memory retention.

  • CWE-400: Uncontrolled Resource Consumption - span references prevent context cleanup
  • No timeout or deadline applied to the prefill request within the goroutine

Recommendation: Wrap prefill context with a timeout or deadline. Add explicit span End() even on panic. Monitor span count in production.


Deprecated Flag Migration Incomplete

File: pkg/sidecar/proxy/options.go, cmd/pd-sidecar/main.go

The Complete() method migrates deprecated boolean flags to string slices (e.g., UseTLSForPrefiller → TLSForStages), but does not validate mutual exclusivity:

  • Old code: --tls-prefiller and --tls-decoder as separate booleans
  • New code: --tls-stages=prefiller,decoder as a string slice
  • Migration logic does not prevent both deprecated and new flags being set simultaneously, creating ambiguity in which takes precedence

Recommendation: Add explicit guard in Complete() that errors if both deprecated and new TLS flags are supplied. Document precedence in deprecation message.


Protocol Runner Interface Assumptions

File: pkg/sidecar/proxy/proxy.go

The new protocolRunner and epdProtocolRunner function types are assigned without nil checks:

server.runPDConnectorProtocol = ...  // Assigned once; never validated
server.runEPDConnectorProtocol = ...

If initialization order is disrupted or conditional logic prevents assignment, calling runPDConnectorProtocol() will panic with a nil dereference. There are no defensive checks at the call sites.

Recommendation: Add assertions or guards in protocol dispatch logic. Consider returning error instead of panicking on uninitialized protocol handlers.


TestMain Race Condition

File: test/profiling/tokenizerbench/benchmark_test.go

TestMain waits for readiness with a fixed 30-second timeout but does not account for test parallelization (go test -parallel):

waitForReady(timeout time.Duration) error  // Blocks all tests

If multiple test workers are spawned, they all block on the same readiness check, potentially timing out despite the service being healthy.

Recommendation: Use a sync mechanism (atomic or channel) to ensure readiness check runs once per process, not per test invocation.


Multimodal Item Deduplication Logic

File: pkg/sidecar/proxy/connector_epd_shared_storage.go

The deduplication in fanoutEncoderPrimer() only deduplicates URL-based multimodal items, not inline audio:

// Deduplicates URL-based MM items
// Inline audio (input_audio) is never deduplicated

If a large request contains duplicate inline audio blobs, each will be forwarded to the encoder independently, wasting bandwidth and compute.

Recommendation: Extend deduplication to inline audio by hashing content or adding explicit duplicate detection. Document this limitation in the EPD protocol comments.


PrefixCacheMatchInfo Consumption Change Breaking

File: pkg/plugins/profile/prefix_based_pd_decider.go

The removal of Consumes() method from PrefixBasedPDDecider changes the plugin's declared data dependencies:

  • Old: Plugin declared consumption of PrefixCacheMatchInfo via Consumes()
  • New: No such declaration; plugin still accesses the data but does not declare it

This breaks the plugin framework's contract for verifying that required data is produced earlier in the pipeline. A missing or mislabeled producer will no longer be caught.

Recommendation: Restore Consumes() method or add an integration test that validates the data flow through the entire plugin chain without relying on declarations.


Header Constant Rename Incomplete

File: pkg/common/common.go, various test files

The rename from PrefillPodHeader to PrefillEndpointHeader was applied, but only in Go code. Configuration files and deployment manifests may reference the old header name:

  • Check if any external configuration or ingress rules reference the old header name
  • Kubernetes network policies or service meshes may be filtering on the old header

Recommendation: Audit all configuration files (YAML, ConfigMaps, ServiceEntry definitions) for references to the old header name. Add migration docs.

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Sync upstream to odh march 18' is vague and fails to convey meaningful information about the changeset. It indicates a sync operation without describing what is being synced or why. Replace with a specific title describing the primary change, e.g., 'Add OpenTelemetry tracing, external tokenizer support, and sidecar options pattern' or similar descriptive phrasing that captures core functionality.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.


Signed-off-by: Greg Pereira <grpereir@redhat.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 18

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
scripts/kind-dev-env.sh (1)

257-261: ⚠️ Potential issue | 🟡 Minor

Quote ${KUSTOMIZE_DIR} argument.

CWE-78: Unquoted variable is subject to word splitting.

Proposed fix
-kubectl kustomize --enable-helm  ${KUSTOMIZE_DIR} \
+kubectl kustomize --enable-helm "${KUSTOMIZE_DIR}" \
 	| envsubst '${POOL_NAME} ${MODEL_NAME} ${MODEL_NAME_SAFE} ${EPP_NAME} ${EPP_IMAGE} ${VLLM_SIMULATOR_IMAGE} \
   ${PD_ENABLED} ${KV_CACHE_ENABLED} ${SIDECAR_IMAGE} ${UDS_TOKENIZER_IMAGE} ${TARGET_PORTS} \
   ${VLLM_REPLICA_COUNT} ${VLLM_REPLICA_COUNT_P} ${VLLM_REPLICA_COUNT_D} ${VLLM_DATA_PARALLEL_SIZE}' \
-  | kubectl --context ${KUBE_CONTEXT} apply -f -
+  | kubectl --context "${KUBE_CONTEXT}" apply -f -

As per coding guidelines: "Quote all variables to prevent injection".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/kind-dev-env.sh` around lines 257 - 261, The kubectl kustomize
invocation uses an unquoted ${KUSTOMIZE_DIR} (and other expansions) which allows
word splitting/injection; update the command to quote the variable references
(e.g., use "${KUSTOMIZE_DIR}" and quote "${KUBE_CONTEXT}") and ensure the
envsubst argument string is passed as a single quoted argument so all expansions
are protected (reference the kubectl kustomize call, the KUSTOMIZE_DIR variable,
envsubst invocation, and the kubectl --context usage).
Makefile.tools.mk (1)

57-61: ⚠️ Potential issue | 🟡 Minor

Add --fail flag to curl to detect download failures.

The curl command will silently succeed with HTTP errors (404, 500) and pass garbage to tar. Adding --fail or -f ensures non-zero exit on HTTP errors.

Proposed fix
 $(TYPOS): | $(LOCALBIN)
 	@echo "Downloading typos $(TYPOS_VERSION)..."
-	curl -L https://github.com/crate-ci/typos/releases/download/$(TYPOS_VERSION)/typos-$(TYPOS_VERSION)-$(TYPOS_ARCH).tar.gz | tar -xz -C $(LOCALBIN) $(TAR_OPTS)
+	curl -fL https://github.com/crate-ci/typos/releases/download/$(TYPOS_VERSION)/typos-$(TYPOS_VERSION)-$(TYPOS_ARCH).tar.gz | tar -xz -C $(LOCALBIN) $(TAR_OPTS)
 	chmod +x $(TYPOS)
 	@echo "typos installed successfully."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile.tools.mk` around lines 57 - 61, The curl invocation in the Makefile
rule for target "$(TYPOS)" can silently succeed on HTTP errors; update the curl
command in the "$(TYPOS): | $(LOCALBIN)" recipe to add the --fail (or -f) flag
so it exits non-zero on HTTP errors before piping to tar; keep the existing
variables (TYPOS_VERSION, TYPOS_ARCH, TAR_OPTS, LOCALBIN) and rest of the
pipeline unchanged so failures are detected and the chmod/echo steps don't run
on bad downloads.
Makefile (1)

220-229: ⚠️ Potential issue | 🟡 Minor

Quote paths in container build command to handle paths with spaces.

If KV_CACHE_PATH_CHECK resolves to a path containing spaces (e.g., /home/user/my projects/kv-cache), the build command will fail or behave unexpectedly.

Proposed fix
 	@KV_CACHE_PATH_CHECK=$$(go list -m -f '{{.Dir}}' github.com/llm-d/llm-d-kv-cache 2>/dev/null); \
 	if [ -z "$$KV_CACHE_PATH_CHECK" ]; then \
 		echo "Error: Could not find kv-cache module even after download."; \
 		exit 1; \
 	fi; \
 	$(CONTAINER_RUNTIME) build \
 		--platform linux/$(TARGETARCH) \
 		-t $(UDS_TOKENIZER_IMAGE) \
-		-f $$KV_CACHE_PATH_CHECK/services/uds_tokenizer/Dockerfile \
-		$$KV_CACHE_PATH_CHECK/services/uds_tokenizer
+		-f "$$KV_CACHE_PATH_CHECK/services/uds_tokenizer/Dockerfile" \
+		"$$KV_CACHE_PATH_CHECK/services/uds_tokenizer"

As per coding guidelines: "Quote shell variables in targets".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 220 - 229, The shell variable expansion for
KV_CACHE_PATH_CHECK is not quoted in the container build invocation which breaks
when the path contains spaces; update the Docker build invocation that uses
$(CONTAINER_RUNTIME) and the -f and context arguments to quote the
KV_CACHE_PATH_CHECK expansions (e.g., use
"$$KV_CACHE_PATH_CHECK/services/uds_tokenizer/Dockerfile" and
"$$KV_CACHE_PATH_CHECK/services/uds_tokenizer") and ensure any other variable
expansions used in that command (like UDS_TOKENIZER_IMAGE and TARGETARCH if
interpolated into paths) are properly quoted to prevent word-splitting.
Dockerfile.epp (1)

10-12: ⚠️ Potential issue | 🟡 Minor

Add defaults to TARGETOS/TARGETARCH ARGs to prevent empty build parameters.

Lines 10–11 declare build args without defaults. Line 27 references them in the RUN command; if the caller doesn't provide --build-arg, GOOS and GOARCH will be empty, causing build failures or inconsistent behavior.

Remediation
-ARG TARGETOS
-ARG TARGETARCH
+ARG TARGETOS=linux
+ARG TARGETARCH=amd64
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.epp` around lines 10 - 12, The Dockerfile declares build args ARG
TARGETOS and ARG TARGETARCH without defaults causing empty GOOS/GOARCH at build
time; update the ARG declarations (ARG TARGETOS and ARG TARGETARCH) to include
sensible defaults (e.g., linux and amd64) or provide fallback values so the RUN
step that uses GOOS/GOARCH always has non-empty values; ensure the
Dockerfile.epp ARG lines are changed accordingly so downstream references in the
RUN command receive the defaults.
pkg/plugins/scorer/precise_prefix_cache_test.go (1)

134-135: ⚠️ Potential issue | 🔴 Critical

Fix NewChunkedTokenDatabase call arity in precise_prefix_cache_test.go.

kvblock.NewChunkedTokenDatabase returns (processor, error), as confirmed by production code in precise_prefix_cache.go. The test file uses single-value assignment at five call sites, which will fail to compile when building with //go:build embedded_tokenizers. This is a critical bug-prone pattern with missing error handling.

Affected lines: 134, 246, 319, 389, 500.

Add error handling to each call:

tokenProcessor, err := kvblock.NewChunkedTokenDatabase(kvblock.DefaultTokenProcessorConfig())
require.NoError(t, err)

Reference: precise_prefix_cache_uds_test.go already implements this pattern correctly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/scorer/precise_prefix_cache_test.go` around lines 134 - 135, The
test calls to kvblock.NewChunkedTokenDatabase incorrectly use single-value
assignment but the function returns (processor, error); update each call (where
tokenProcessor is created) to capture the error (e.g., tokenProcessor, err :=
kvblock.NewChunkedTokenDatabase(kvblock.DefaultTokenProcessorConfig())) and
assert no error (require.NoError(t, err)) before using tokenProcessor in
TokensToKVBlockKeys and other methods; ensure this pattern is applied at all
five call sites to match the implementation used in
precise_prefix_cache_uds_test.go.
.github/workflows/check-typos.yaml (1)

8-16: ⚠️ Potential issue | 🟠 Major

Pin actions by full SHA and set explicit job permissions.

Line 13 and Line 16 use mutable tags (@v6, @v1.44.0) instead of full commit SHAs. Malicious tag retargeting can inject code into CI execution (CWE-829, CWE-494). The job lacks an explicit permissions: block, allowing GITHUB_TOKEN to default to overprivileged permissions (CWE-269).

Replace actions/checkout@v6 with its full commit SHA, replace crate-ci/typos@v1.44.0 with its full commit SHA, and add job-level permissions restricted to contents: read.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/check-typos.yaml around lines 8 - 16, The workflow uses
mutable action tags (actions/checkout@v6 and crate-ci/typos@v1.44.0) and lacks
explicit job permissions; replace those tags with their corresponding full
commit SHAs for actions/checkout and crate-ci/typos to pin versions securely,
and add a job-level permissions block restricting GITHUB_TOKEN to contents: read
so the typos job runs with least privilege; update the uses lines for the action
identifiers (actions/checkout and crate-ci/typos) and add a top-level
permissions: { contents: read } entry for the typos job.
.github/workflows/md-link-check.yml (1)

11-20: ⚠️ Potential issue | 🟠 Major

Pin action refs to immutable SHAs and set job-scoped permissions (High).

Lines 17 and 20 use mutable tags (@v6, @v2.8.0). If a tag is moved or compromised, the workflow executes attacker-controlled code with repository token access (CWE-829, CWE-494). The job lacks explicit permissions, increasing blast radius if compromised (CWE-269).

Remediation diff
 jobs:
   lychee:
     name: Check Markdown Links
     runs-on: ubuntu-latest
+    permissions:
+      contents: read

     steps:
       - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@<full_sha_for_v6>

       - name: Run lychee link checker
-        uses: lycheeverse/lychee-action@v2.8.0
+        uses: lycheeverse/lychee-action@<full_sha_for_v2.8.0>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/md-link-check.yml around lines 11 - 20, Replace the
mutable action refs in the workflow (uses: actions/checkout@v6 and uses:
lycheeverse/lychee-action@v2.8.0) with their corresponding immutable commit SHAs
and add an explicit job-level permissions block to restrict token scope; update
the lychee step and checkout step to use pinned SHAs and add a top-level or
job-scoped permissions section (e.g., read-only for contents and no write where
not needed) so the workflow no longer relies on mutable tags and limits
repository token privileges.
pkg/sidecar/proxy/connector_sglang.go (1)

209-221: ⚠️ Potential issue | 🟠 Major

Unbounded request body read - potential memory exhaustion (CWE-400).

io.ReadAll(r.Body) reads the entire request body without size limits. A malicious client could send an arbitrarily large payload to exhaust server memory.

Recommended fix
 func (s *Server) parseSGLangRequest(r *http.Request) (map[string]interface{}, error) {
-	body, err := io.ReadAll(r.Body)
+	const maxBodySize = 10 << 20 // 10 MB limit
+	body, err := io.ReadAll(io.LimitReader(r.Body, maxBodySize+1))
 	if err != nil {
 		return nil, fmt.Errorf("failed to read request body: %w", err)
 	}
+	if len(body) > maxBodySize {
+		return nil, fmt.Errorf("request body exceeds maximum size of %d bytes", maxBodySize)
+	}

As per coding guidelines: "Use io.LimitReader for HTTP response bodies (prevent memory exhaustion)"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/connector_sglang.go` around lines 209 - 221,
parseSGLangRequest currently calls io.ReadAll(r.Body) with no size limit;
replace that with a bounded read using io.LimitReader (or http.MaxBytesReader
upstream) to prevent memory exhaustion. In parseSGLangRequest, wrap r.Body with
io.LimitReader(r.Body, <maxBytes>) (define a sensible constant like
maxRequestBodySize), use io.ReadAll on that limited reader, and return a clear
error if the body exceeds the limit or the read fails; ensure you reference the
function name parseSGLangRequest and the new maxRequestBodySize constant so
reviewers can find the change.
🟡 Minor comments (7)
docs/architecture.md-41-41 (1)

41-41: ⚠️ Potential issue | 🟡 Minor

Fix compound modifier hyphenation in architecture description.

Line 41 should use “state-of-the-art” (hyphenated) to avoid ambiguous phrasing in technical docs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/architecture.md` at line 41, Update the phrase in the architecture text
where it reads "state of the art" (in the sentence describing the llm-d
inference scheduler extending the EPP in [GIE]) to the hyphenated form
"state-of-the-art" to correct the compound modifier hyphenation and avoid
ambiguous phrasing.
scripts/kind-dev-env.sh-229-236 (1)

229-236: ⚠️ Potential issue | 🟡 Minor

Quote variables in pipeline commands.

CWE-78: Unquoted variables in command arguments risk word splitting if paths contain spaces or special characters.

Proposed fix
-kubectl kustomize deploy/components/crds-gateway-api |
-	kubectl --context ${KUBE_CONTEXT} apply --server-side --force-conflicts -f -
+kubectl kustomize deploy/components/crds-gateway-api |
+	kubectl --context "${KUBE_CONTEXT}" apply --server-side --force-conflicts -f -

-kubectl kustomize deploy/components/crds-gie |
-	kubectl --context ${KUBE_CONTEXT} apply --server-side --force-conflicts -f -
+kubectl kustomize deploy/components/crds-gie |
+	kubectl --context "${KUBE_CONTEXT}" apply --server-side --force-conflicts -f -

-kubectl kustomize --enable-helm deploy/components/crds-istio |
-	kubectl --context ${KUBE_CONTEXT} apply --server-side --force-conflicts -f -
+kubectl kustomize --enable-helm deploy/components/crds-istio |
+	kubectl --context "${KUBE_CONTEXT}" apply --server-side --force-conflicts -f -

As per coding guidelines: "Quote all variables to prevent injection".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/kind-dev-env.sh` around lines 229 - 236, The pipeline kubectl
commands use unquoted variable expansions which can cause word splitting or
injection (e.g., the three lines that pipe `kubectl kustomize ... | kubectl
--context ${KUBE_CONTEXT} apply ... -f -`); update those commands to quote the
variable expansions (e.g., use `--context "${KUBE_CONTEXT}"`) and quote any
other shell variables in the pipeline to ensure paths or contexts with
spaces/special characters are handled safely while preserving the existing pipe
to `kubectl -f -`.
scripts/kind-dev-env.sh-254-254 (1)

254-254: ⚠️ Potential issue | 🟡 Minor

Quote ${EPP_CONFIG} to prevent word splitting.

CWE-78: If EPP_CONFIG contains spaces or glob characters, the redirection will fail or behave unexpectedly.

Proposed fix
-envsubst '$MODEL_NAME' < ${EPP_CONFIG} > ${TEMP_FILE}
+envsubst '$MODEL_NAME' < "${EPP_CONFIG}" > "${TEMP_FILE}"

As per coding guidelines: "Quote all variables to prevent injection".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/kind-dev-env.sh` at line 254, The redirection uses unquoted variables
which can lead to word-splitting or globbing when EPP_CONFIG or TEMP_FILE
contain spaces or special characters; update the envsubst invocation to quote
the file variables so that EPP_CONFIG and TEMP_FILE are expanded safely (use
"${EPP_CONFIG}" for the input redirection and "${TEMP_FILE}" for the output
redirection) while keeping the envsubst variable list ('$MODEL_NAME') as-is;
locate the line with envsubst and replace the unquoted references to EPP_CONFIG
and TEMP_FILE accordingly.
pkg/sidecar/proxy/data_parallel.go-18-22 (1)

18-22: ⚠️ Potential issue | 🟡 Minor

Move this deprecation warning off the request path.

Line 21 emits an Info log for every routed data-parallel request, and this header is still the active routing contract in this codepath. That will flood logs and make valid traffic look like misuse. Log once during startup/plugin init, or demote it to a debug/once-only path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/data_parallel.go` around lines 18 - 22, The deprecation
Info log inside dataParallelHandler floods logs on every request; remove or
demote that per-request s.logger.Info call and instead emit the deprecation
warning only once during startup/initialization (or at plugin init) where the
dataParallel routing is configured. Locate the s.dataParallelProxies setup and
add a one-time log (e.g., using sync.Once or during the function that populates
dataParallelProxies) to warn about DataParallelEndpointHeader deprecation; then
delete or change the s.logger.Info call in dataParallelHandler that references
dataParallelPodHostPort so per-request handling uses only handler :=
s.dataParallelProxies[dataParallelPodHostPort].
pkg/sidecar/proxy/connector_nixlv2.go-240-261 (1)

240-261: ⚠️ Potential issue | 🟡 Minor

true_ttft_ms is not TTFT here.

Line 250 uses decodeStart, which is captured before the decode request is even sent downstream. That measures dispatch latency, not time-to-first-token, so this will under-report user-visible latency for both buffered and streaming responses. Either rename the attribute or capture the timestamp of the first downstream write/flush and use that instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/connector_nixlv2.go` around lines 240 - 261, The metric
currently computed as trueTTFT uses decodeStart (in function handling spans that
sets trueTTFT and calls currentSpan.SetAttributes) which measures dispatch
latency, not user-visible time-to-first-token; fix by capturing the actual
downstream first-write/flush timestamp (e.g., record firstWriteTime when the
proxy writes/flushed the first token downstream and make it available via
context or a shared field identified by requestStartTimeKey/firstWriteTimeKey)
and compute trueTTFT = firstWriteTime.Sub(requestStart); then replace the
attribute "llm_d.pd_proxy.true_ttft_ms" to use that value (or if you choose not
to change behavior, rename the attribute to reflect dispatch latency instead of
TTFT to avoid mislabeling).
pkg/sidecar/proxy/connector_sglang_test.go-123-170 (1)

123-170: ⚠️ Potential issue | 🟡 Minor

Data race on prefillFinished variable.

prefillFinished is a plain bool written at line 128 inside a goroutine (via the HTTP handler) and read at line 170 from the main test goroutine. This is a data race detectable by -race.

Fix using atomic
-	var prefillFinished bool
+	var prefillFinished atomic.Bool

 	slowPrefill := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		testInfo.prefillHandler.ServeHTTP(w, r)
 		time.Sleep(300 * time.Millisecond) // Simulated load delay on KV Cache
-		prefillFinished = true
+		prefillFinished.Store(true)
 	})
 ...
-	Expect(prefillFinished).To(BeTrue())
+	Expect(prefillFinished.Load()).To(BeTrue())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/connector_sglang_test.go` around lines 123 - 170, The test
has a data race on the plain bool prefillFinished (written inside slowPrefill
HTTP handler and read later in the test). Change prefillFinished to a uint32
(e.g., var prefillFinished uint32), set it in slowPrefill using
atomic.StoreUint32(&prefillFinished, 1), and check it with
atomic.LoadUint32(&prefillFinished) == 1 in the Expect assertion; also add the
"sync/atomic" import. Update references in slowPrefill and the
Expect(prefillFinished) check accordingly.
pkg/sidecar/proxy/connector_epd_shared_storage.go-79-102 (1)

79-102: ⚠️ Potential issue | 🟡 Minor

Shallow copy may cause data corruption.

buildEncoderRequest performs a shallow copy at lines 82-84. If originalRequest contains nested maps/slices (like messages), they share references. Concurrent modifications to originalRequest while encoder requests are in-flight could corrupt data.

The fix at line 96 replaces messages entirely which is safe, but other nested fields copied from originalRequest could still be affected.

Consider deep copy for safety
 func buildEncoderRequest(originalRequest map[string]any, mmItem map[string]any) map[string]any {
-	// Create a deep copy of the original request
-	encoderRequest := make(map[string]any)
-	for k, v := range originalRequest {
-		encoderRequest[k] = v
-	}
+	// Create a proper deep copy via JSON round-trip
+	data, _ := json.Marshal(originalRequest)
+	var encoderRequest map[string]any
+	_ = json.Unmarshal(data, &encoderRequest)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/connector_epd_shared_storage.go` around lines 79 - 102,
buildEncoderRequest currently makes a shallow copy of originalRequest (lines
creating encoderRequest) which can leave nested maps/slices shared and lead to
concurrent data corruption; change it to perform a deep copy of originalRequest
before mutating it (for example by JSON marshal/unmarshal into map[string]any or
by a recursive copy helper) so all nested structures are duplicated, then set
"messages", "max_tokens", "stream" and delete "stream_options" on that
independent copy; keep the function name buildEncoderRequest and the
encoderRequest variable but ensure the deep-copy approach preserves the original
nested types.
🧹 Nitpick comments (19)
Makefile (2)

189-192: post-deploy-test has redundant and inconsistent output.

Line 191 uses echo without @ prefix, causing make to print both the command and its output. The two echo statements also appear redundant.

Proposed fix
 .PHONY: post-deploy-test
 post-deploy-test: ## Run post deployment tests
-	echo Success!
 	@echo "Post-deployment tests passed."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 189 - 192, The Makefile target post-deploy-test prints
both the command and output and duplicates messages; remove the redundant plain
echo "Success!" and keep a single, quieted message by using an `@-prefixed` echo
(e.g., replace the two echo lines with one `@echo` "Post-deployment tests
passed."). Update the post-deploy-test target to only output the final success
message and suppress printing the command invocation.

112-114: Hardcoded upstream/main may fail if remote is named differently.

The signed-commits-check target hardcodes upstream/main, but some contributors may have a differently named remote (e.g., origin). Consider making this configurable or documenting the expected remote setup.

Proposed fix
+UPSTREAM_BRANCH ?= upstream/main
+
 .PHONY: signed-commits-check
 signed-commits-check:
-	@./scripts/check-commits.sh upstream/main
+	@./scripts/check-commits.sh $(UPSTREAM_BRANCH)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 112 - 114, The signed-commits-check Makefile target
hardcodes "upstream/main", which breaks for repos using a different remote;
change the target to call ./scripts/check-commits.sh with a configurable Make
variable (e.g., $(GIT_REMOTE) or $(REMOTE)) that defaults to "upstream" or
"origin" if not set, and update any README/docs to mention how to override the
variable (invoke make signed-commits-check GIT_REMOTE=origin). Ensure the unique
symbols referenced are the signed-commits-check target and the
./scripts/check-commits.sh invocation so reviewers can locate and update the
Makefile accordingly.
scripts/check-commits.sh (1)

18-31: Use ${variable} bracing consistently and handle potential failures in pipelines.

The loop iterates correctly, but the pipeline error masking issue compounds here. If git cat-file or git log fails for a commit, the script will incorrectly pass that commit instead of failing.

Proposed fix with explicit error handling
 for commit in $(git rev-list "$RANGE"); do
-    HASH=$(git rev-parse --short "$commit")
+    HASH=$(git rev-parse --short "$commit") || { echo "Failed to parse commit"; exit 1; }
     
     # 1. Check for cryptographic signature header (gpgsig)
     # Using cat-file bypasses the SSH/GPG verification engine entirely
-    if ! git cat-file commit "$commit" | grep -q "^gpgsig"; then
+    if ! git cat-file commit "$commit" 2>/dev/null | grep -q "^gpgsig"; then
         ERR_SIG="$ERR_SIG $HASH"
     fi
     
     # 2. Check for DCO Sign-off text in the commit message
-    if ! git log -1 --format="%b" "$commit" | grep -q "Signed-off-by:"; then
+    if ! git log -1 --format="%b" "$commit" 2>/dev/null | grep -q "Signed-off-by:"; then
         ERR_DCO="$ERR_DCO $HASH"
     fi
 done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/check-commits.sh` around lines 18 - 31, Enable pipefail and use
braced variable names, then treat git command failures as errors: add set -o
pipefail before the loop and replace unbraced variables with ${RANGE},
${commit}, ${HASH}, ${ERR_SIG}, ${ERR_DCO}; for each check run git cat-file
commit "${commit}" and git log -1 --format="%b" "${commit}" and if either
command fails (non-zero exit) or the grep test fails, append ${HASH} to the
appropriate ${ERR_SIG} or ${ERR_DCO} variable so a failing git command does not
silently pass the check.
pkg/metrics/metrics.go (1)

7-7: Add an alias to the imported metrics package to improve clarity.

Line 7 imports a package as metrics inside package metrics. While this compiles correctly, it's non-idiomatic Go and reduces readability during code reviews and refactors. Use an explicit alias:

Suggested diff
-	"sigs.k8s.io/gateway-api-inference-extension/pkg/common/observability/metrics"
+	obsmetrics "sigs.k8s.io/gateway-api-inference-extension/pkg/common/observability/metrics"

Update line 26:

-			Help:      metrics.HelpMsgWithStability("Total number of P/D disaggregation decisions made", compbasemetrics.ALPHA),
+			Help:      obsmetrics.HelpMsgWithStability("Total number of P/D disaggregation decisions made", compbasemetrics.ALPHA),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/metrics/metrics.go` at line 7, The import shadows the current package
name (package metrics) by importing the same identifier; change the import to an
explicit alias (for example: observability or observabilityMetrics) for
"sigs.k8s.io/gateway-api-inference-extension/pkg/common/observability/metrics"
and update every reference in this file that currently uses metrics.<...> (e.g.,
metrics.NewCounter, metrics.Register, etc.) to use the new alias
(observability.NewCounter, observability.Register) so the package no longer
conflicts with the file's package name.
pkg/plugins/datalayer/models/datasource_test.go (1)

48-48: Bound Poll with a timeout context to avoid hanging tests on network stalls.

Line 48 uses context.Background() from setup; a blocked dial/read can hold this test indefinitely depending on HTTP client behavior.

Remediation diff
-	err = source.Poll(ctx, endpoint)
+	pollCtx, cancel := context.WithTimeout(ctx, 2*time.Second)
+	defer cancel()
+	err = source.Poll(pollCtx, endpoint)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/datalayer/models/datasource_test.go` at line 48, The test calls
source.Poll(ctx, endpoint) with the setup's background context and can hang on
network stalls; change it to pass a cancellable timeout context (e.g., ctx,
cancel := context.WithTimeout(context.Background(), <reasonable duration>) with
defer cancel()) so Poll is bounded, and ensure time is imported in the test
file; reference the Poll call in datasource_test.go and replace the existing ctx
(from context.Background()) with the new timeout context.
pkg/plugins/scorer/precise_prefix_cache_tokenized_test.go (1)

107-137: Assert fallback execution when TokenIDs is empty.

This test only checks that ScoreTokens is not called. It should also assert that the non-tokenized scoring path (GetPodScores) is invoked, otherwise a silent no-op path can still pass.

Proposed test hardening
 func TestPrecisePrefixCacheScorer_SkipsTokenizedPromptWhenEmpty(t *testing.T) {
 	ctx := utils.NewTestContext(t)
 	fromTokensCalled := false
+	getPodScoresCalled := false

 	scorer := &PrecisePrefixCacheScorer{
 		typedName:      plugin.TypedName{Type: PrecisePrefixCachePluginType, Name: "test"},
 		kvEventsConfig: &kvevents.Config{},
 		kvCacheIndexer: &mockKVCacheIndexer{
 			scoreTokensFunc: func(_ context.Context, _ []uint32, _ string, _ []string) (map[string]float64, error) {
 				fromTokensCalled = true
 				return map[string]float64{}, nil
 			},
 			getPodScoresFunc: func(_ context.Context, _ *types.RenderChatRequest, _ string, _ string, _ []string) (map[string]float64, error) {
+				getPodScoresCalled = true
 				return map[string]float64{}, nil
 			},
 		},
 	}
@@
 	scorer.Score(ctx, scheduling.NewCycleState(), request, testEndpoints)
 	assert.False(t, fromTokensCalled, "ScoreTokens should not be called with empty TokenIDs")
+	assert.True(t, getPodScoresCalled, "GetPodScores should be called as fallback when TokenIDs is empty")
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/scorer/precise_prefix_cache_tokenized_test.go` around lines 107 -
137, The test only verifies ScoreTokens wasn't called but should also assert the
fallback non-tokenized path is used: modify the mockKVCacheIndexer in
TestPrecisePrefixCacheScorer_SkipsTokenizedPromptWhenEmpty to set a flag (e.g.,
fromPodScoresCalled) inside getPodScoresFunc and then after calling
scorer.Score(...) assert that fromPodScoresCalled is true; keep the existing
fromTokensCalled check and ensure you reference the same scorer.Score invocation
and the TokenizedPrompt.TokenIDs empty input so the assertion proves
GetPodScores (getPodScoresFunc) executes when TokenIDs is empty.
.github/workflows/ci-pr-checks.yaml (1)

56-56: Make dependency drift fail fast after go mod tidy.

Line 56 runs go mod tidy but does not assert go.mod/go.sum stayed unchanged, so CI can pass with uncommitted module graph rewrites.

Remediation
-      - name: Install dependencies
-        run: go mod tidy
+      - name: Verify module files are tidy
+        run: |
+          go mod tidy
+          git diff --exit-code -- go.mod go.sum

As per coding guidelines (REVIEW PRIORITIES, item 3): bug-prone patterns and error handling gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci-pr-checks.yaml at line 56, The CI step currently runs
"run: go mod tidy" but doesn't fail when go.mod/go.sum are modified; add a
follow-up step after the "run: go mod tidy" command that verifies no changes
were produced (for example invoking git diff --exit-code go.mod go.sum or git
diff --quiet && git diff --exit-code) and fail the job if the files changed so
dependency drift fails fast; reference the existing "run: go mod tidy" step to
place the new verification step immediately after it.
pkg/sidecar/proxy/options_test.go (2)

142-154: Duplicate test cases with misleading name.

Test cases "empty string does not set values" (lines 143-147) and "deprecated flags take precedence when InferencePool is empty" (lines 149-153) have identical setup and expected values. The second case name implies deprecated flags are set, but they aren't. Either remove the duplicate or set deprecated flags to actually test precedence.

Suggested fix
 		{
-			name:              "deprecated flags take precedence when InferencePool is empty",
-			inferencePool:     "",
-			expectedNamespace: "",
-			expectedName:      "",
+			name:              "deprecated flags take precedence when InferencePool is empty",
+			inferencePool:     "",
+			expectedNamespace: "deprecated-ns",  // Set in test setup
+			expectedName:      "deprecated-pool", // Set in test setup
 		},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/options_test.go` around lines 142 - 154, The two
table-driven test entries named "empty string does not set values" and
"deprecated flags take precedence when InferencePool is empty" are identical;
update the test cases in pkg/sidecar/proxy/options_test.go so the second case
actually exercises deprecated-flag precedence: either remove the duplicate entry
or modify the second entry to set the deprecated flags (e.g., set the deprecated
namespace/name inputs) and adjust expectedNamespace/expectedName accordingly;
look for the test table in the Test... function (the test that iterates over
entries with fields inferencePool/expectedNamespace/expectedName) and fix the
row with name "deprecated flags take precedence when InferencePool is empty" to
reflect the intended scenario.

156-173: Test setup doesn't configure deprecated flags for precedence test.

The test loop doesn't set opts.InferencePoolNamespace or opts.InferencePoolName for the "deprecated flags take precedence" case, so that test case doesn't actually verify precedence behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/options_test.go` around lines 156 - 173, The test isn't
setting the deprecated flag values before calling Complete(), so the "deprecated
flags take precedence" case never exercises precedence; update the test loop to
assign the deprecated values to the options before Complete() (e.g. set
opts.InferencePoolNamespace = tt.deprecatedNamespace and opts.InferencePoolName
= tt.deprecatedName for that case or for all cases where tt includes those
fields) so Complete() can resolve and the assertions on InferencePoolNamespace
and InferencePoolName validate precedence; use the NewOptions(),
opts.InferencePool, and opts.InferencePoolNamespace/InferencePoolName symbols
referenced in the test.
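To make the intended precedence concrete, here is a simplified, hypothetical sketch; the `Options` type, field names, and `Complete()` logic mirror the review comments above but are not the real implementation in pkg/sidecar/proxy:

```go
package main

import "strings"

// Options is a simplified stand-in for the sidecar proxy options;
// field and method names are illustrative, not the real ones.
type Options struct {
	InferencePool          string // combined "namespace/name" flag
	InferencePoolNamespace string // deprecated flag
	InferencePoolName      string // deprecated flag
}

// Complete resolves the combined flag; when InferencePool is empty,
// the deprecated per-field values are left in place -- the precedence
// behavior the test case is meant to exercise.
func (o *Options) Complete() {
	if o.InferencePool == "" {
		return
	}
	if ns, name, ok := strings.Cut(o.InferencePool, "/"); ok {
		o.InferencePoolNamespace = ns
		o.InferencePoolName = name
	}
}
```

A test row for the "deprecated flags take precedence" case would then set the deprecated fields before calling `Complete()` and assert they survive.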
test/profiling/tokenizerbench/benchmark_test.go (1)

17-33: Package comment doesn't match package name.

Doc comment declares Package external_tokenizer_scorer but the actual package is tokenizerbench. This will cause godoc inconsistency.

Fix
-// Package external_tokenizer_scorer benchmarks the end-to-end request flow
+// Package tokenizerbench benchmarks the end-to-end request flow
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/profiling/tokenizerbench/benchmark_test.go` around lines 17 - 33, The
package-level comment at the top of the file declares "Package
external_tokenizer_scorer" while the actual package is declared as "package
tokenizerbench"; update the package comment to match the declared package name
(change the comment to "Package tokenizerbench" and adjust any descriptive text
accordingly) so godoc and the package declaration are consistent, referencing
the existing package declaration symbol "package tokenizerbench" and the
erroneous comment text "Package external_tokenizer_scorer".
test/e2e/e2e_test.go (1)

676-680: Inconsistent model name handling in runStreamingChatCompletion.

runStreamingChatCompletion hardcodes simModelName (line 680) while runStreamingCompletion accepts theModel as a parameter. This asymmetry breaks the pattern established by other helper pairs (runCompletion/runChatCompletion).

Proposed fix
-func runStreamingChatCompletion(prompt string) (string, string) {
+func runStreamingChatCompletion(prompt string, theModel string) (string, string) {
 	ginkgo.By(fmt.Sprintf("Sending Streaming Chat Completion Request: (port %s)", port))

 	// Use raw HTTP for streaming to capture headers
-	body := fmt.Sprintf(`{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":true}`, simModelName, prompt)
+	body := fmt.Sprintf(`{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":true}`, theModel, prompt)

Update the call site at line 195:

-			nsHdr, podHdr = runStreamingChatCompletion(simplePrompt)
+			nsHdr, podHdr = runStreamingChatCompletion(simplePrompt, simModelName)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/e2e_test.go` around lines 676 - 680, runStreamingChatCompletion
currently hardcodes simModelName; change its signature to accept a model
parameter (e.g., func runStreamingChatCompletion(theModel string, prompt string)
(string, string)) and replace simModelName in the body with theModel so it
mirrors runStreamingCompletion/runCompletion/runChatCompletion; then update
every call site that invokes runStreamingChatCompletion to pass the desired
model (use simModelName where tests expect the simulator) so the helper pair is
consistent.
pkg/sidecar/proxy/connector_test.go (1)

85-85: Avoid hard-coded http:// slicing for prefill endpoint headers.

Line 85 and Line 146 assume the backend URL always starts with http://. This is brittle and can mis-handle HTTPS endpoints. Use common.StripScheme for host:port extraction.

♻️ Proposed change
- req.Header.Add(common.PrefillEndpointHeader, testInfo.prefillBackend.URL[len("http://"):])
+ req.Header.Add(common.PrefillEndpointHeader, common.StripScheme(testInfo.prefillBackend.URL))

Also applies to: 146-146

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/connector_test.go` at line 85, Replace the brittle manual
slicing of testInfo.prefillBackend.URL when setting the PrefillEndpointHeader
with a call to common.StripScheme to reliably extract host:port (instead of
req.Header.Add(common.PrefillEndpointHeader,
testInfo.prefillBackend.URL[len("http://"):])). Update both occurrences that set
common.PrefillEndpointHeader (the one using testInfo.prefillBackend.URL and the
second occurrence around the same test) to call
common.StripScheme(testInfo.prefillBackend.URL) so HTTPS and other schemes are
handled correctly.
cmd/epp/main.go (1)

65-69: Log runner failures at the process boundary before returning exit code 1.

Line 65-Line 69 returns non-zero on startup/runtime failure but does not emit a top-level error from this boundary, which weakens incident triage in aggregated logs.

🛠️ Proposed change
-	if err := runner.NewRunner().
+	if err := runner.NewRunner().
 		WithCustomCollectors(metrics.GetCollectors()...).
 		Run(ctx); err != nil {
+		ctrl.Log.Error(err, "EPP runner exited with error")
 		return 1
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/epp/main.go` around lines 65 - 69, The runner startup currently returns 1
on error without logging; capture the error returned by
NewRunner().WithCustomCollectors(metrics.GetCollectors()...).Run(ctx), log a
clear top-level error message including the err value (using the project’s
top-level logger, e.g., log/processLogger) and any relevant context before
returning 1 so failures at the process boundary are recorded for triage.
pkg/sidecar/proxy/proxy.go (2)

147-148: Error from LRU cache creation ignored.

Lines 147-148 use // nolint:all to silence the ignored error. While lru.New only fails for size ≤ 0 (and 16 is safe), explicitly handling or documenting this is cleaner than blanket nolint.

Alternative: explicit handling
-	prefillerCache, _ := lru.New[string, http.Handler](16) // nolint:all
-	encoderCache, _ := lru.New[string, http.Handler](16)   // nolint:all
+	prefillerCache, err := lru.New[string, http.Handler](16)
+	if err != nil {
+		panic("BUG: invalid cache size") // size is compile-time constant
+	}
+	encoderCache, err := lru.New[string, http.Handler](16)
+	if err != nil {
+		panic("BUG: invalid cache size")
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/proxy.go` around lines 147 - 148, The LRU constructor
errors for prefillerCache and encoderCache are being ignored with a blanket
nolint; replace that with explicit error handling: call
lru.New[string,http.Handler](16) for prefillerCache and encoderCache, check the
returned error (from lru.New) and handle it—either propagate it, log and exit,
or panic with a clear message—so failures are not silently suppressed; update
references in proxy.go where prefillerCache and encoderCache are created to use
this explicit check instead of "// nolint:all".

235-250: Unknown EC connector silently ignored.

setECConnector silently returns for unknown connector values (line 247-248). Consider logging a warning so operators know their configuration was not applied.

Add warning log
 	switch ecConnector {
 	case ECExampleConnector:
 		s.runEPDConnectorProtocol = s.runEPDProtocol
 	default:
-		// Unknown EC connector value, skip encoder stage
+		// Unknown EC connector value, skip encoder stage
+		// Consider: log.Log.Info("Unknown EC connector, encoder stage disabled", "connector", ecConnector)
 		return
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/proxy.go` around lines 235 - 250, The switch in
setECConnector silently returns on unknown EC connector values; update the
default case in Server.setECConnector to emit a warning that includes the
invalid ecConnector value so operators are notified (use the Server's logger
field, e.g., s.logger.Warnf or s.log.Warnf; if the Server has no logger, use the
package logger) and keep the current behavior of skipping the encoder stage.
pkg/sidecar/proxy/connector_sglang_test.go (1)

53-54: Flaky synchronization using time.Sleep.

time.Sleep(1 * time.Second) at lines 54 and 152 for waiting on proxy startup is fragile. Consider polling for readiness or using a ready channel.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/connector_sglang_test.go` around lines 53 - 54, The test
uses a brittle time.Sleep(1 * time.Second) to wait for the proxy to start;
replace that sleep with a deterministic readiness check: poll the proxy's
readiness/health endpoint or use a ready channel returned by the proxy start
routine (replace/remove the time.Sleep calls in connector_sglang_test.go),
implementing a small waitForProxyReady helper that retries with a
timeout/context until the proxy accepts connections or returns healthy, and use
that helper in place of time.Sleep at both locations.
pkg/sidecar/proxy/chat_completions.go (1)

82-90: Redundant condition check.

Line 86 else if numHosts > 0 is redundant since the outer if numHosts > 0 at line 82 already guarantees this.

Simplify
 	if numHosts > 0 {
 		if s.config.EnablePrefillerSampling {
 			// Sample a host value from the list
 			prefillHostPort = strings.TrimSpace(prefillHostPorts[s.prefillSamplerFn(numHosts)])
-		} else if numHosts > 0 {
+		} else {
 			// Select only the first header value, consistent with previous behavior
 			prefillHostPort = strings.TrimSpace(prefillHostPorts[0])
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/chat_completions.go` around lines 82 - 90, The else-if
check "else if numHosts > 0" is redundant because the outer "if numHosts > 0"
already ensures that condition; replace that branch with a plain "else" to
simplify logic while keeping behavior the same: inside the block that sets
prefillHostPort, use "if s.config.EnablePrefillerSampling { prefillHostPort =
strings.TrimSpace(prefillHostPorts[s.prefillSamplerFn(numHosts)]) } else {
prefillHostPort = strings.TrimSpace(prefillHostPorts[0]) }", referencing
s.config.EnablePrefillerSampling, s.prefillSamplerFn, prefillHostPorts and
prefillHostPort.
pkg/sidecar/proxy/connector_epd_shared_storage_test.go (1)

262-309: Test lacks cleanup for potential goroutine leaks.

fanoutEncoderPrimer spawns goroutines internally. If a test times out or panics before wg.Wait() completes in the implementation, the mock server closure at line 270 could cause issues. Consider adding t.Cleanup or explicit timeout handling.

Additionally, line 275 assigns log.Log directly without initializing a test logger, which may cause nil pointer issues in some test environments.

Suggested improvement
 	encoderURL, err := url.Parse(encoderBackend.URL)
 	assert.NoError(t, err)
 	srv := NewProxy("0", encoderURL, Config{})
-	srv.logger = log.Log
+	srv.logger = logr.Discard() // or use a test logger

 	encoderHostPort := encoderURL.Host
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/connector_epd_shared_storage_test.go` around lines 262 -
309, The test TestFanoutEncoderPrimerDeduplication must ensure the httptest
server is always closed and uses a test-safe logger: register
encoderBackend.Close with t.Cleanup so the mock server is closed even on
panic/timeouts, and replace the direct assignment srv.logger = log.Log with a
test logger (e.g., srv.logger = log.NewTestLogger(t) or the project's test
logger helper) so srv.logger is initialized for test environments; keep the call
to fanoutEncoderPrimer unchanged but consider running it with a short context
timeout if needed to avoid goroutine hangs.
pkg/sidecar/proxy/connector_epd_shared_storage.go (1)

197-208: Only first error returned; subsequent errors silently dropped.

When multiple encoder requests fail, only the first error read from errChan is returned (line 203-204). Other errors are silently discarded. Consider aggregating errors or logging all failures.

Alternative: collect all errors
 	// Check for errors
+	var errs []error
 	for err := range errChan {
 		if err != nil {
-			return err
+			errs = append(errs, err)
 		}
 	}
+	if len(errs) > 0 {
+		return fmt.Errorf("encoder failures: %v", errs)
+	}

 	return nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/connector_epd_shared_storage.go` around lines 197 - 208,
The current drain of errChan after wg.Wait() only returns the first non-nil
error and drops subsequent errors; change the logic in the function that uses wg
and errChan to collect all errors (e.g., append non-nil errs to a slice or use a
multi-error aggregator) while still iterating until errChan is closed, then
either return a combined error (or nil if none) and/or log each individual
error; update the block around wg.Wait(), close(errChan) and the for err :=
range errChan loop to build and return an aggregated error instead of returning
immediately on the first non-nil error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 804553bc-ea3a-47ba-882b-f42c3b31f98e

📥 Commits

Reviewing files that changed from the base of the PR and between 740c775 and a238100.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (75)
  • .github/actions/docker-build-and-push/action.yml
  • .github/actions/trivy-scan/action.yml
  • .github/dependabot.yml
  • .github/workflows/check-typos.yaml
  • .github/workflows/ci-build-images.yaml
  • .github/workflows/ci-dev.yaml
  • .github/workflows/ci-pr-checks.yaml
  • .github/workflows/ci-release.yaml
  • .github/workflows/md-link-check.yml
  • DEVELOPMENT.md
  • Dockerfile.epp
  • Dockerfile.sidecar
  • Makefile
  • Makefile.tools.mk
  • cmd/epp/main.go
  • cmd/pd-sidecar/main.go
  • deploy/components/inference-gateway/deployments.yaml
  • deploy/components/vllm-sim-pd/deployments.yaml
  • deploy/components/vllm-sim/deployments.yaml
  • deploy/config/dp-epp-config.yaml
  • deploy/config/pd-epp-config.yaml
  • deploy/config/sim-epp-external-tokenizer-config.yaml
  • deploy/config/sim-pd-epp-config.yaml
  • docs/architecture.md
  • go.mod
  • pkg/common/common.go
  • pkg/common/common_test.go
  • pkg/metrics/metrics.go
  • pkg/plugins/datalayer/models/datasource_test.go
  • pkg/plugins/datalayer/models/factories.go
  • pkg/plugins/pre-request/pd_prerequest.go
  • pkg/plugins/pre-request/pd_prerequest_test.go
  • pkg/plugins/preparedata/tokenizer.go
  • pkg/plugins/preparedata/tokenizer_test.go
  • pkg/plugins/profile/dp_profile_handler.go
  • pkg/plugins/profile/dp_profile_handler_test.go
  • pkg/plugins/profile/pd_profile_handler.go
  • pkg/plugins/profile/pd_profile_handler_test.go
  • pkg/plugins/profile/prefix_based_pd_decider.go
  • pkg/plugins/register.go
  • pkg/plugins/scorer/active_request.go
  • pkg/plugins/scorer/load_aware.go
  • pkg/plugins/scorer/no_hit_lru.go
  • pkg/plugins/scorer/precise_prefix_cache.go
  • pkg/plugins/scorer/precise_prefix_cache_test.go
  • pkg/plugins/scorer/precise_prefix_cache_tokenized_test.go
  • pkg/plugins/scorer/precise_prefix_cache_uds_test.go
  • pkg/plugins/scorer/session_affinity.go
  • pkg/scheduling/pd/scheduler_test.go
  • pkg/sidecar/proxy/chat_completions.go
  • pkg/sidecar/proxy/chat_completions_test.go
  • pkg/sidecar/proxy/connector_epd_shared_storage.go
  • pkg/sidecar/proxy/connector_epd_shared_storage_test.go
  • pkg/sidecar/proxy/connector_nixlv2.go
  • pkg/sidecar/proxy/connector_nixlv2_test.go
  • pkg/sidecar/proxy/connector_sglang.go
  • pkg/sidecar/proxy/connector_sglang_test.go
  • pkg/sidecar/proxy/connector_test.go
  • pkg/sidecar/proxy/data_parallel.go
  • pkg/sidecar/proxy/data_parallel_test.go
  • pkg/sidecar/proxy/options.go
  • pkg/sidecar/proxy/options_test.go
  • pkg/sidecar/proxy/proxy.go
  • pkg/sidecar/proxy/proxy_helpers.go
  • pkg/sidecar/proxy/proxy_test.go
  • pkg/telemetry/tracing.go
  • scripts/check-commits.sh
  • scripts/kind-dev-env.sh
  • test/e2e/e2e_suite_test.go
  • test/e2e/e2e_test.go
  • test/e2e/yaml/deployments.yaml
  • test/e2e/yaml/rbac.yaml
  • test/e2e/yaml/vllm-sim-dp.yaml
  • test/e2e/yaml/vllm-sim-pd.yaml
  • test/profiling/tokenizerbench/benchmark_test.go
💤 Files with no reviewable changes (2)
  • deploy/config/pd-epp-config.yaml
  • deploy/config/sim-pd-epp-config.yaml

steps:
- name: Set up Docker Buildx
-  uses: docker/setup-buildx-action@v3
+  uses: docker/setup-buildx-action@v4

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n .github/actions/docker-build-and-push/action.yml | head -50

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 1833


🏁 Script executed:

wc -l .github/actions/docker-build-and-push/action.yml

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 135


🏁 Script executed:

cat -n .github/actions/docker-build-and-push/action.yml | tail -25

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 1171


Pin docker/setup-buildx-action to a full commit SHA.

Line 30 uses mutable-tag pinning. Supply chain risk: a retagged or compromised action revision executes attacker-controlled code in the build pipeline and can exfiltrate registry credentials. (CWE-829, CWE-494)

Remediation
-      uses: docker/setup-buildx-action@v4
+      uses: docker/setup-buildx-action@<full_commit_sha>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/actions/docker-build-and-push/action.yml at line 30, Replace the
mutable tag usage of docker/setup-buildx-action in the action.yml where the line
reads uses: docker/setup-buildx-action@v4 by pinning to the action’s full commit
SHA; update the uses entry to reference the specific commit SHA for
docker/setup-buildx-action (obtain the canonical commit SHA from the action’s
GitHub releases/commit history) so the workflow uses an immutable revision and
prevents supply-chain tampering.

Comment on lines +22 to +24
jobs:
  docker-build-and-push:
    runs-on: ubuntu-latest

🛠️ Refactor suggestion | 🟠 Major

Add explicit permissions block with least-privilege scope.

No permissions defined. Default permissions may be overly permissive. Define minimal required permissions for this job (e.g., contents: read, packages: write).

Suggested fix
 jobs:
   docker-build-and-push:
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
     steps:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 jobs:
   docker-build-and-push:
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
     steps:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci-build-images.yaml around lines 22 - 24, Add a minimal
explicit permissions block to the GitHub Actions job "docker-build-and-push" to
enforce least privilege: update the job definition for docker-build-and-push to
include a permissions mapping (for example permissions: contents: read,
packages: write, id-token: write if using OIDC) instead of relying on defaults
so the job only has the scopes it needs.

- name: Checkout source
  uses: actions/checkout@v6
-  - uses: dorny/paths-filter@v3
+  - uses: dorny/paths-filter@v4

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -path ".github/workflows/ci-pr-checks.yaml" -type f

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 66


🏁 Script executed:

cat -n .github/workflows/ci-pr-checks.yaml | head -30

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 870


🏁 Script executed:

grep -n "uses:" .github/workflows/ci-pr-checks.yaml

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 331


Pin dorny/paths-filter by immutable commit SHA.

Line 19 uses tag pinning (@v4). A compromised or retagged upstream action can execute arbitrary code in CI with access to repository-scoped credentials (CWE-829, CWE-494).

Remediation
-      - uses: dorny/paths-filter@v4
+      - uses: dorny/paths-filter@<full_commit_sha>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci-pr-checks.yaml at line 19, Replace the tag pinning for
the GitHub Action "dorny/paths-filter" (the uses: dorny/paths-filter@v4 entry)
with an immutable commit SHA to prevent supply-chain risks; locate the
dorny/paths-filter repo, pick the commit SHA corresponding to the v4 release you
want, and update the uses line to use that SHA (e.g., uses:
dorny/paths-filter@<commit-sha>), then run the workflow locally or in a safe
branch to verify it behaves the same and document the chosen SHA for future
updates.

Comment on lines +53 to +55
if err := shutdownTracing(ctx); err != nil {
	ctrl.Log.Error(err, "Failed to shutdown tracing")
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate tracing implementation
fd tracing.go pkg

# Inspect InitTracing and shutdown wiring
rg -n -C4 'func InitTracing|Shutdown\(|TracerProvider|otlptrace|return .*shutdown' pkg/telemetry

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 3517


🏁 Script executed:

cat -n pkg/telemetry/tracing.go

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 5311


🏁 Script executed:

# Verify main.go context flow to confirm the shutdown timing issue
cd cmd/epp && head -60 main.go | tail -15

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 501


🏁 Script executed:

# Check the full main.go to see how ctx is initialized and where SetupSignalHandler is called
head -60 cmd/epp/main.go

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 1849


Use a fresh timeout context for tracing shutdown instead of the signal-canceled context.

Line 55 calls shutdownTracing(ctx) with the context from ctrl.SetupSignalHandler(). When the process receives a termination signal, that context is already canceled, causing tp.Shutdown() to return immediately without flushing pending spans. The batch processor and OTLP exporter will not complete their operations, resulting in lost trace data.

🛠️ Proposed change
 import (
+	"context"
 	"os"
+	"time"
@@
 	if shutdownTracing != nil {
 		defer func() {
-			if err := shutdownTracing(ctx); err != nil {
+			shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+			defer cancel()
+			if err := shutdownTracing(shutdownCtx); err != nil {
 				ctrl.Log.Error(err, "Failed to shutdown tracing")
 			}
 		}()
 	}
📝 Committable suggestion


Suggested change
-	if err := shutdownTracing(ctx); err != nil {
-		ctrl.Log.Error(err, "Failed to shutdown tracing")
-	}
+	shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+	defer cancel()
+	if err := shutdownTracing(shutdownCtx); err != nil {
+		ctrl.Log.Error(err, "Failed to shutdown tracing")
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/epp/main.go` around lines 53 - 55, The shutdown is using the
signal-canceled ctx from ctrl.SetupSignalHandler(), so tracing shutdown
(shutdownTracing(ctx)) runs with an already-canceled context and returns
immediately; fix by creating and passing a fresh timeout context (e.g.,
context.WithTimeout(context.Background(), <short-duration>)) to shutdownTracing
instead of the signal context, ensure you defer the cancel, and keep
shutdownTracing's call to tp.Shutdown using that fresh timeout context so the
batch processor/OTLP exporter have time to flush.

- [Kubernetes in Docker (KIND)]
- [Kustomize]
- [ZeroMQ]
- [Kubectl] `v1.14`+

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

What is the official kubectl version skew policy relative to the Kubernetes API server, and what minimum kubectl version is appropriate for clusters using Gateway API v1.3-era tooling?

💡 Result:

The official kubectl version skew policy relative to the Kubernetes API server is that kubectl is supported within one minor version older or newer than the kube-apiserver. For example, if the kube-apiserver is at v1.35, kubectl versions v1.34, v1.35, and v1.36 are supported. In highly-available clusters with version skew between kube-apiserver instances (e.g., v1.34 and v1.35), the supported kubectl versions are narrowed to those within one minor version of all apiservers (e.g., only v1.34 and v1.35). For clusters using Gateway API v1.3-era tooling, there is no specific minimum kubectl version beyond the general skew policy, as Gateway API is provided via CRDs (not built-in Kubernetes APIs) and works on Kubernetes v1.26+. Implementations like kgateway (conformant to v1.3) support Kubernetes v1.27+, so kubectl within one minor version of the cluster's API server (e.g., minimum matching the cluster version, such as v1.27 for a v1.27+ cluster) is appropriate. No sources indicate additional kubectl requirements for Gateway API v1.3.

Citations:


🏁 Script executed:

# Get file size first
wc -l DEVELOPMENT.md

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 102


🏁 Script executed:

# Read the relevant sections around lines 11 and 81
cat -n DEVELOPMENT.md | head -30

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 1342


🏁 Script executed:

# Also check around line 81
cat -n DEVELOPMENT.md | sed -n '75,95p'

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 1017


🏁 Script executed:

# Check for go.mod to see Kubernetes dependencies
head -50 go.mod

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 2031


🏁 Script executed:

# Check Makefile or other version indicators
grep -i "kubernetes\|kind\|kube" Makefile 2>/dev/null | head -20

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 649


🏁 Script executed:

# Look for any version constants or documentation about cluster versions
grep -r "kubernetes.*version\|kind.*version" . --include="*.md" --include="*.go" | head -10

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 66


Update kubectl minimum to v1.34+; v1.14 is 21 minor versions behind the project target.

Line 11 specifies v1.14+, but the project depends on Kubernetes v1.35 (k8s.io/* v0.35.2). Per the official version-skew policy, kubectl must be within one minor version of kube-apiserver. For v1.35 clusters, only kubectl v1.34–v1.36 are supported. Setting v1.14+ will cause immediate setup failures.

Additionally, line 81 uses @latest in go install sigs.k8s.io/cloud-provider-kind@latest, which is non-reproducible and introduces supply-chain risk. Pin to a specific version (e.g., v0.5.0 or the version matching the deployed cloud-provider-kind release).
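The skew rule above is mechanical enough to check in code. As a hedged illustration (not part of this PR — function names and the sample versions are made up for the sketch), a stdlib-only Go helper can verify that a kubectl minor version is within one of the API server's:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minorOf extracts the minor number from a version string like "v1.35.2".
func minorOf(version string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("malformed version %q", version)
	}
	return strconv.Atoi(parts[1])
}

// withinSkew reports whether a kubectl client is within one minor version of
// the kube-apiserver, per the official Kubernetes version-skew policy.
func withinSkew(client, server string) (bool, error) {
	cm, err := minorOf(client)
	if err != nil {
		return false, err
	}
	sm, err := minorOf(server)
	if err != nil {
		return false, err
	}
	diff := cm - sm
	return diff >= -1 && diff <= 1, nil
}

func main() {
	// Against a v1.35 server, only v1.34-v1.36 clients are in policy.
	for _, c := range []string{"v1.14.0", "v1.34.1", "v1.36.0"} {
		ok, _ := withinSkew(c, "v1.35.2")
		fmt.Printf("%s against v1.35.2: supported=%v\n", c, ok)
	}
}
```

Running this makes the review's point concrete: v1.14 fails the check against a v1.35 server, while v1.34 and v1.36 pass.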

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DEVELOPMENT.md` at line 11: update the DEVELOPMENT.md entries to require a
modern, compatible kubectl by changing the kubectl minimum from "v1.14+" to
"v1.34+" (so it’s within one minor of the project’s k8s v1.35 dependency) and
replace the non-reproducible "go install sigs.k8s.io/cloud-provider-kind@latest"
reference with a pinned release (e.g., "@v0.5.0" or the exact
cloud-provider-kind version that matches your deployed release) so the document
and bootstrap commands reflect supported kubectl versions and a reproducible
cloud-provider-kind install.

Comment on lines +65 to 80
tracer := telemetry.Tracer()
ctx := r.Context()

ctx, prefillSpan := tracer.Start(ctx, "llm_d.pd_proxy.prefill",
trace.WithSpanKind(trace.SpanKindInternal),
)
prefillSpan.SetAttributes(
attribute.String("llm_d.pd_proxy.request_id", uuidStr),
attribute.String("llm_d.pd_proxy.prefill_target", prefillPodHostPort),
attribute.String("llm_d.pd_proxy.connector", "nixlv2"),
)
prefillStart := time.Now()

// 1. Prepare prefill request
ctx := r.Context()
preq := r.Clone(ctx)


⚠️ Potential issue | 🟠 Major

Fix the span lifecycle before this ships.

ctx is overwritten with the prefill span context at Line 68, then reused at Line 177 to start decodeSpan after prefillSpan has already ended. That makes decodeSpan a child of an ended span, and the early returns between span creation and the first explicit End() still leave prefillSpan open. Keep the original request context as the parent for both stages and defer prefillSpan.End() immediately after Start().

Suggested fix
-	ctx := r.Context()
-
-	ctx, prefillSpan := tracer.Start(ctx, "llm_d.pd_proxy.prefill",
+	requestCtx := r.Context()
+
+	prefillCtx, prefillSpan := tracer.Start(requestCtx, "llm_d.pd_proxy.prefill",
 		trace.WithSpanKind(trace.SpanKindInternal),
 	)
+	defer prefillSpan.End()
 	prefillSpan.SetAttributes(
 		attribute.String("llm_d.pd_proxy.request_id", uuidStr),
 		attribute.String("llm_d.pd_proxy.prefill_target", prefillPodHostPort),
 		attribute.String("llm_d.pd_proxy.connector", "nixlv2"),
 	)
@@
-	preq := r.Clone(ctx)
+	preq := r.Clone(prefillCtx)
@@
-	ctx, decodeSpan := tracer.Start(ctx, "llm_d.pd_proxy.decode",
+	decodeCtx, decodeSpan := tracer.Start(requestCtx, "llm_d.pd_proxy.decode",
 		trace.WithSpanKind(trace.SpanKindInternal),
 	)
 	defer decodeSpan.End()
@@
-	dreq := r.Clone(ctx)
+	dreq := r.Clone(decodeCtx)

Also applies to: 177-190

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/connector_nixlv2.go` around lines 65-80: the prefill span
lifecycle is wrong: the code overwrites the original request context (ctx :=
r.Context()) with the span context from tracer.Start, causing decodeSpan to
become a child of an ended span and leaving prefillSpan open on early returns;
fix by preserving the original request context (save origCtx := r.Context()),
call tracer.Start using origCtx but assign the returned span context to a new
variable (e.g., spanCtx, prefillSpan := tracer.Start(origCtx, ...)), immediately
defer prefillSpan.End() after Start(), use spanCtx for operations tied to the
prefill span (like cloning the request with r.Clone(spanCtx)), and when starting
decodeSpan later use origCtx (or another appropriate parent) so decodeSpan is
not a child of an already-ended prefillSpan; also ensure any early returns
before the explicit End() still allow the deferred End() to run.

Comment on lines 132 to +150
if isHTTPError(pw.statusCode) {
s.logger.Error(err, "request failed", "code", pw.statusCode)
w.WriteHeader(pw.statusCode)
s.logger.Error(err, "request failed", "code", pw.statusCode, "body", pw.buffer.String())
prefillSpan.SetStatus(codes.Error, "prefill request failed")
prefillSpan.End()

if shouldFallbackToDecode(pw) {
s.logger.Info("fallback to decode", "request_id", uuidStr)
r.Body = io.NopCloser(strings.NewReader(string(original)))
s.decoderProxy.ServeHTTP(w, r)
} else {
for key, values := range pw.Header() {
for _, v := range values {
w.Header().Add(key, v)
}
}
w.WriteHeader(pw.statusCode)
_, err := w.Write([]byte(pw.buffer.String()))
if err != nil {
s.logger.Error(err, "failed to send error response to client")

⚠️ Potential issue | 🟠 Major

Stop logging raw prefiller error bodies.

Line 133 writes the full upstream error payload into pod logs. In this path the backend can echo prompts, token IDs, or connector details, so this is a CWE-532 sensitive-data leak. err is also stale here, so the log entry is not even accurate. Log request_id and status code only, or a bounded/redacted reason.

Suggested fix
-		s.logger.Error(err, "request failed", "code", pw.statusCode, "body", pw.buffer.String())
+		s.logger.Info("prefill request failed", "request_id", uuidStr, "code", pw.statusCode)
As per coding guidelines, "Never log sensitive fields (Password, Token, APIKey, Secret.Data)".
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if isHTTPError(pw.statusCode) {
s.logger.Error(err, "request failed", "code", pw.statusCode)
w.WriteHeader(pw.statusCode)
s.logger.Error(err, "request failed", "code", pw.statusCode, "body", pw.buffer.String())
prefillSpan.SetStatus(codes.Error, "prefill request failed")
prefillSpan.End()
if shouldFallbackToDecode(pw) {
s.logger.Info("fallback to decode", "request_id", uuidStr)
r.Body = io.NopCloser(strings.NewReader(string(original)))
s.decoderProxy.ServeHTTP(w, r)
} else {
for key, values := range pw.Header() {
for _, v := range values {
w.Header().Add(key, v)
}
}
w.WriteHeader(pw.statusCode)
_, err := w.Write([]byte(pw.buffer.String()))
if err != nil {
s.logger.Error(err, "failed to send error response to client")
if isHTTPError(pw.statusCode) {
s.logger.Info("prefill request failed", "request_id", uuidStr, "code", pw.statusCode)
prefillSpan.SetStatus(codes.Error, "prefill request failed")
prefillSpan.End()
if shouldFallbackToDecode(pw) {
s.logger.Info("fallback to decode", "request_id", uuidStr)
r.Body = io.NopCloser(strings.NewReader(string(original)))
s.decoderProxy.ServeHTTP(w, r)
} else {
for key, values := range pw.Header() {
for _, v := range values {
w.Header().Add(key, v)
}
}
w.WriteHeader(pw.statusCode)
_, err := w.Write([]byte(pw.buffer.String()))
if err != nil {
s.logger.Error(err, "failed to send error response to client")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sidecar/proxy/connector_nixlv2.go` around lines 132-150: the current
error log in the HTTP error branch uses a stale err and prints the full upstream
response body (pw.buffer.String()), risking sensitive-data leaks; update the
s.logger.Error call in the isHTTPError branch (the block handling pw,
prefillSpan, shouldFallbackToDecode and uuidStr) to avoid logging the raw body
and to use a correct error variable (or nil) — only emit non-sensitive fields
such as "request_id" (uuidStr) and "code" (pw.statusCode) or a bounded/redacted
reason instead of pw.buffer.String().

Comment on lines +72 to +79
// Create OTLP trace exporter
exporter, err := otlptracegrpc.New(ctx,
otlptracegrpc.WithEndpoint(endpoint),
otlptracegrpc.WithInsecure(), // Use WithTLSCredentials() in production
)
if err != nil {
return nil, fmt.Errorf("failed to create OTLP trace exporter: %w", err)
}

⚠️ Potential issue | 🟠 Major

Hardcoded insecure gRPC connection for tracing (CWE-319: Cleartext Transmission of Sensitive Information).

WithInsecure() forces unencrypted transmission of trace data. While the comment acknowledges this, production deployments require TLS. Add an environment-based toggle (e.g., OTEL_EXPORTER_OTLP_INSECURE).

Suggested fix
+	// Check if insecure mode is explicitly enabled
+	insecure := os.Getenv("OTEL_EXPORTER_OTLP_INSECURE") == "true"
+
+	opts := []otlptracegrpc.Option{
+		otlptracegrpc.WithEndpoint(endpoint),
+	}
+	if insecure {
+		opts = append(opts, otlptracegrpc.WithInsecure())
+	}
+
 	// Create OTLP trace exporter
-	exporter, err := otlptracegrpc.New(ctx,
-		otlptracegrpc.WithEndpoint(endpoint),
-		otlptracegrpc.WithInsecure(), // Use WithTLSCredentials() in production
-	)
+	exporter, err := otlptracegrpc.New(ctx, opts...)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/telemetry/tracing.go` around lines 72-79: the current call to
otlptracegrpc.New unconditionally uses otlptracegrpc.WithInsecure(), sending
traces unencrypted; change this to honor an environment toggle
(OTEL_EXPORTER_OTLP_INSECURE) so TLS is used by default in production. Read
OTEL_EXPORTER_OTLP_INSECURE as a boolean before calling otlptracegrpc.New; if
true, pass otlptracegrpc.WithInsecure(), otherwise build proper TLS transport
credentials (e.g., credentials.NewClientTLSFromCert or grpc/transport
credentials) and pass them via otlptracegrpc.WithTLSCredentials(...). Update the
exporter creation code around the otlptracegrpc.New call that currently uses
endpoint and WithInsecure() so the choice of WithInsecure vs WithTLSCredentials
is driven by the env var.

Comment on lines +3 to +4
# Exit on error
set -e

⚠️ Potential issue | 🟠 Major

Use set -euo pipefail for robust error handling.

CWE-391 (Unchecked Error Condition): Using only set -e misses undefined variable errors (-u) and pipeline failures (-o pipefail). The git cat-file | grep pipelines on lines 23 and 28 will silently succeed even if git cat-file fails.

Proposed fix
-# Exit on error
-set -e
+# Exit on error, undefined variables, and pipeline failures
+set -euo pipefail

As per coding guidelines: "Use 'set -euo pipefail' at script start".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Exit on error
set -e
# Exit on error, undefined variables, and pipeline failures
set -euo pipefail
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/check-commits.sh` around lines 3-4: the script currently uses only
"set -e", which misses undefined-variable errors and pipeline failures; replace
that with "set -euo pipefail" at the top where set -e is declared and ensure any
uses of the pipelines (e.g., the "git cat-file | grep" pipelines) will now fail
fast on errors; update the header accordingly so undefined variables (-u) and
pipeline failures (-o pipefail) are detected.

- "inference.networking.x-k8s.io"
resources:
- "inferenceobjectives"
- "inferencemodelrewrites"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether e2e code really needs list/watch on inferencemodelrewrites.
# Expected: if only direct reads by name exist, keep only `get` + resourceNames.

rg -n -C3 'inferencemodelrewrites|InferenceModelRewrite|modelrewrite' .
rg -n -C3 'watch|list' . | rg -n 'inferencemodelrewrites|InferenceModelRewrite'
rg -n -C3 'serviceAccountName:\s*e2e-epp|name:\s*e2e-epp' .

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 4279


Restrict inferencemodelrewrites to least privilege; verbs list/watch unused in codebase (CWE-269, CWE-200).

Line 10 grants e2e-epp namespace-wide get/list/watch on inferencemodelrewrites. Severity: Major. Codebase contains no evidence of list or watch operations on this resource type; reduce to get only, scoped to explicit resourceNames where applicable. This tightens the blast radius if the ServiceAccount is compromised.

Remediation diff
 - apiGroups:
   - "inference.networking.x-k8s.io"
   resources:
   - "inferenceobjectives"
-  - "inferencemodelrewrites"
   verbs:
   - "get"
   - "watch"
   - "list"
+
+- apiGroups:
+  - "inference.networking.x-k8s.io"
+  resources:
+  - "inferencemodelrewrites"
+  resourceNames:
+  - "<specific-rewrite-name>"
+  verbs:
+  - "get"

Note: Same pattern appears in production config at deploy/components/inference-gateway/rbac.yaml:11; apply same fix there.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/yaml/rbac.yaml` at line 10: the ClusterRole/Role entry granting
verbs ["get","list","watch"] on the inferencemodelrewrites resource should be
reduced to least privilege: remove "list" and "watch" so only "get" is allowed,
and where the role is intended to target specific objects, add a resourceNames
entry listing the explicit inferencemodelrewrites resource names to scope
access; update the entry referencing inferencemodelrewrites in this YAML and
make the identical change in the production file that also contains
inferencemodelrewrites (deploy/components/inference-gateway/rbac.yaml) to ensure
both dev and prod use only "get" and optional resourceNames scoping.

@openshift-ci

openshift-ci bot commented Mar 19, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Gregory-Pereira, zdtsw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [Gregory-Pereira,zdtsw]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-actions

Cannot approve the pull request: Error: openshift-ci[bot] is not included in the approvers role in the OWNERS file

