Sync upstream to odh march 18#143
Gregory-Pereira wants to merge 42 commits into opendatahub-io:main from
Conversation
* Fix panic in SGLang proxy handling of concurrent requests Signed-off-by: YANG LI <yangligt@google.com> * Add concurrency unit test for SGLang context logic Signed-off-by: YANG LI <yangligt@google.com> --------- Signed-off-by: YANG LI <yangligt@google.com>
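Concurrency panics like the one fixed above usually come from unsynchronized access to shared per-request state. A typical shape for the fix is to serialize access with a mutex; the sketch below is illustrative and not the SGLang proxy's actual code (type and method names are assumptions):

```go
package main

import (
	"fmt"
	"sync"
)

// contextStore guards a shared per-request context map so concurrent
// requests cannot trigger a concurrent map read/write panic.
type contextStore struct {
	mu   sync.Mutex
	data map[string]string
}

func newContextStore() *contextStore {
	return &contextStore{data: make(map[string]string)}
}

func (c *contextStore) Put(id, v string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[id] = v
}

func (c *contextStore) Get(id string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.data[id]
	return v, ok
}

func main() {
	s := newContextStore()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.Put(fmt.Sprintf("req-%d", i), "ctx")
		}(i)
	}
	wg.Wait()
	v, ok := s.Get("req-42")
	fmt.Println(v, ok) // ctx true
}
```

Running such code under `go test -race` is what a concurrency unit test like the one added here typically verifies.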
* Add opentelemetry tracing
Add centralized telemetry package and custom spans
following the llm-d distributed tracing proposal.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
* update Dockerfile.sidecar
Signed-off-by: sallyom <somalley@redhat.com>
* tracing: remove extra success results & startup spans and cleanup
Signed-off-by: sallyom <somalley@redhat.com>
* fix: avoid os.Exit bypassing defer in main
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
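The os.Exit fix above follows a well-known Go pattern: os.Exit terminates immediately and skips deferred calls (such as flushing a tracer provider), so the work is moved into a run() function whose defers execute normally, and main only exits after run returns. A minimal sketch of the pattern (not the PR's exact code):

```go
package main

import (
	"fmt"
	"os"
)

// run owns all deferred cleanup; because it returns normally, its
// defers (e.g. shutting down a tracer) always execute.
func run() error {
	defer fmt.Println("cleanup ran")
	fmt.Println("working")
	return nil
}

func main() {
	// os.Exit is only reached after run() and its defers complete.
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```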
* fix: address review nits for tracing PR
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
* test: add edge case tests for StripScheme
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
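A helper like the StripScheme tested above typically removes a URL scheme prefix from an endpoint string; the edge cases worth testing are a scheme-less input and an empty string. This is a sketch of the assumed shape, not the repository's actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// stripScheme drops a leading "scheme://" prefix if present and
// returns the input unchanged otherwise (including the empty string).
func stripScheme(endpoint string) string {
	if i := strings.Index(endpoint, "://"); i >= 0 {
		return endpoint[i+len("://"):]
	}
	return endpoint
}

func main() {
	fmt.Println(stripScheme("http://collector:4317")) // collector:4317
	fmt.Println(stripScheme("collector:4317"))        // collector:4317
}
```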
* remove extra comments from sidecar spans
Signed-off-by: sallyom <somalley@redhat.com>
* fix lint error
Signed-off-by: sallyom <somalley@redhat.com>
* protect against segfault on tests
Signed-off-by: greg pereira <grpereir@redhat.com>
---------
Signed-off-by: sallyom <somalley@redhat.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: greg pereira <grpereir@redhat.com>
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.39.0 to 1.40.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.40.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/otel/sdk dependency-version: 1.40.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…dates (#662) Bumps the go-dependencies group with 2 updates in the / directory: [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) and [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go). Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.64.0 to 0.65.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.64.0...zpages/v0.65.0) Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.39.0 to 1.40.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.40.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp dependency-version: 0.65.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc dependency-version: 1.40.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
…build (#664) Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Bumps the kubernetes group with 5 updates:

| Package | From | To |
| --- | --- | --- |
| [k8s.io/api](https://github.com/kubernetes/api) | `0.34.4` | `0.34.5` |
| [k8s.io/apiextensions-apiserver](https://github.com/kubernetes/apiextensions-apiserver) | `0.34.4` | `0.34.5` |
| [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) | `0.34.4` | `0.34.5` |
| [k8s.io/client-go](https://github.com/kubernetes/client-go) | `0.34.4` | `0.34.5` |
| [k8s.io/component-base](https://github.com/kubernetes/component-base) | `0.34.4` | `0.34.5` |

Updates `k8s.io/api` from 0.34.4 to 0.34.5 - [Commits](kubernetes/api@v0.34.4...v0.34.5)
Updates `k8s.io/apiextensions-apiserver` from 0.34.4 to 0.34.5 - [Release notes](https://github.com/kubernetes/apiextensions-apiserver/releases) - [Commits](kubernetes/apiextensions-apiserver@v0.34.4...v0.34.5)
Updates `k8s.io/apimachinery` from 0.34.4 to 0.34.5 - [Commits](kubernetes/apimachinery@v0.34.4...v0.34.5)
Updates `k8s.io/client-go` from 0.34.4 to 0.34.5 - [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md) - [Commits](kubernetes/client-go@v0.34.4...v0.34.5)
Updates `k8s.io/component-base` from 0.34.4 to 0.34.5 - [Commits](kubernetes/component-base@v0.34.4...v0.34.5)

---
updated-dependencies:
- dependency-name: k8s.io/api
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apiextensions-apiserver
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apimachinery
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/client-go
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/component-base
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…dates (#674) Bumps the go-dependencies group with 2 updates in the / directory: [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) and [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go). Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.65.0 to 0.66.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.65.0...zpages/v0.66.0) Updates `go.opentelemetry.io/otel` from 1.40.0 to 1.41.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0) Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.40.0 to 1.41.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0) Updates `go.opentelemetry.io/otel/sdk` from 1.40.0 to 1.41.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0) Updates `go.opentelemetry.io/otel/trace` from 1.40.0 to 1.41.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp 
dependency-version: 0.66.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel dependency-version: 1.41.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc dependency-version: 1.41.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel/sdk dependency-version: 1.41.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel/trace dependency-version: 1.41.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [lycheeverse/lychee-action](https://github.com/lycheeverse/lychee-action) from 2.7.0 to 2.8.0. - [Release notes](https://github.com/lycheeverse/lychee-action/releases) - [Commits](lycheeverse/lychee-action@v2.7.0...v2.8.0) --- updated-dependencies: - dependency-name: lycheeverse/lychee-action dependency-version: 2.8.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* ci: add dev image workflow for main and release branches Build and push -dev variants of EPP and sidecar container images on pushes to main and release-* branches, tagged with commit SHA. Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> * ci: extract reusable build workflow and tag dev images by branch Refactor ci-release and ci-dev to call a shared ci-build-images reusable workflow, reducing duplication. Tag dev images with the branch name instead of commit SHA so each branch has exactly one image that gets overwritten on push, avoiding image accumulation. Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> * Newlines at EOF Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> --------- Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.43.5 to 1.44.0. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.43.5...v1.44.0) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.44.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The v0.44.1 release was removed from GitHub, causing 404 errors in the trivy-scan action. Update to the latest available version. Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
* Allow sidecar server to reload TLS certificates Enables TLS certificates to be rotated without restarting sidecar and vLLM deployments. Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> * Pass certPath to reloader Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> * Improvements Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> --------- Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Bumps the go-dependencies group with 7 updates:

| Package | From | To |
| --- | --- | --- |
| [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) | `0.66.0` | `0.67.0` |
| [go.opentelemetry.io/otel](https://github.com/open-telemetry/opentelemetry-go) | `1.41.0` | `1.42.0` |
| [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go) | `1.41.0` | `1.42.0` |
| [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) | `1.41.0` | `1.42.0` |
| [go.opentelemetry.io/otel/trace](https://github.com/open-telemetry/opentelemetry-go) | `1.41.0` | `1.42.0` |
| [golang.org/x/sync](https://github.com/golang/sync) | `0.19.0` | `0.20.0` |
| [google.golang.org/grpc](https://github.com/grpc/grpc-go) | `1.79.1` | `1.79.2` |

Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.66.0 to 0.67.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.66.0...zpages/v0.67.0)
Updates `go.opentelemetry.io/otel` from 1.41.0 to 1.42.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.41.0...v1.42.0)
Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.41.0 to 1.42.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.41.0...v1.42.0)
Updates `go.opentelemetry.io/otel/sdk` from 1.41.0 to 1.42.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.41.0...v1.42.0)
Updates `go.opentelemetry.io/otel/trace` from 1.41.0 to 1.42.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.41.0...v1.42.0)
Updates `golang.org/x/sync` from 0.19.0 to 0.20.0 - [Commits](golang/sync@v0.19.0...v0.20.0)
Updates `google.golang.org/grpc` from 1.79.1 to 1.79.2 - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](grpc/grpc-go@v1.79.1...v1.79.2)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
  dependency-version: 0.67.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/trace
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: golang.org/x/sync
  dependency-version: 0.20.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: google.golang.org/grpc
  dependency-version: 1.79.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…gs (#682) * feat(sidecar): simplify TLS command line options with StringSlice flags Signed-off-by: Guangya Liu <gyliu513@gmail.com> * address comments from Etai Signed-off-by: Guangya Liu <gyliu513@gmail.com> * keep deprecated flags Signed-off-by: Guangya Liu <gyliu513@gmail.com> --------- Signed-off-by: Guangya Liu <gyliu513@gmail.com>
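A StringSlice-style flag, as referenced above, lets one option be passed repeatedly or as a comma-separated list instead of many separate TLS flags. The sketch below implements the same idea with the standard library's flag.Value interface; the flag name is an illustrative assumption, not the sidecar's actual option:

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// stringSlice implements flag.Value so a single flag accumulates values
// from repeated occurrences and from comma-separated lists.
type stringSlice []string

func (s *stringSlice) String() string { return strings.Join(*s, ",") }

func (s *stringSlice) Set(v string) error {
	for _, part := range strings.Split(v, ",") {
		if part = strings.TrimSpace(part); part != "" {
			*s = append(*s, part)
		}
	}
	return nil
}

func main() {
	var sans stringSlice
	fs := flag.NewFlagSet("sidecar", flag.ContinueOnError)
	fs.Var(&sans, "cert-san", "allowed SANs (repeatable or comma-separated)")
	_ = fs.Parse([]string{"--cert-san=a.example,b.example", "--cert-san=c.example"})
	fmt.Println(sans) // [a.example b.example c.example]
}
```

Keeping the deprecated single-value flags alongside such a slice flag, as this change does, preserves backward compatibility for existing deployments.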
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: roytman <roytman@il.ibm.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* fix: simplify InferencePool flag to namespace/name format Signed-off-by: Guangya Liu <gyliu513@gmail.com> * Address comments from Etai Signed-off-by: Guangya Liu <gyliu513@gmail.com> --------- Signed-off-by: Guangya Liu <gyliu513@gmail.com>
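Parsing a namespace/name flag value like the one introduced above is a small, testable operation; a bare name commonly falls back to a default namespace. This sketch shows one plausible shape (function and default names are assumptions, not the PR's code):

```go
package main

import (
	"fmt"
	"strings"
)

// parsePoolRef splits a "namespace/name" flag value; a bare "name"
// falls back to defaultNS, and anything else is rejected.
func parsePoolRef(v, defaultNS string) (ns, name string, err error) {
	switch parts := strings.Split(v, "/"); len(parts) {
	case 1:
		if parts[0] == "" {
			return "", "", fmt.Errorf("empty InferencePool reference")
		}
		return defaultNS, parts[0], nil
	case 2:
		if parts[0] == "" || parts[1] == "" {
			return "", "", fmt.Errorf("invalid InferencePool reference %q", v)
		}
		return parts[0], parts[1], nil
	default:
		return "", "", fmt.Errorf("invalid InferencePool reference %q", v)
	}
}

func main() {
	ns, name, _ := parsePoolRef("inference/my-pool", "default")
	fmt.Println(ns, name) // inference my-pool
}
```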
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* initial E/PD extension of the sidecar Signed-off-by: roytman <roytman@il.ibm.com> * add Encoding span.SetAttributes, fix bug in connector_sglang_test.go Signed-off-by: roytman <roytman@il.ibm.com> * fix e2e tests Signed-off-by: roytman <roytman@il.ibm.com> * Update pkg/sidecar/proxy/connector_epd_shared_storage.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Alexey Roytman <roytman@il.ibm.com> * flags order Signed-off-by: roytman <roytman@il.ibm.com> * fix comments Signed-off-by: roytman <roytman@il.ibm.com> * remove redundant comment Signed-off-by: roytman <roytman@il.ibm.com> --------- Signed-off-by: roytman <roytman@il.ibm.com> Signed-off-by: Alexey Roytman <roytman@il.ibm.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com>
* Check for uniqueness of media URLs Signed-off-by: roytman <roytman@il.ibm.com> * fix comments Signed-off-by: roytman <roytman@il.ibm.com> --------- Signed-off-by: roytman <roytman@il.ibm.com>
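A uniqueness check over media URLs, as added above, is typically a single pass over a set; this is an illustrative sketch rather than the PR's exact code:

```go
package main

import "fmt"

// uniqueURLs reports whether all media URLs in a request are distinct,
// using a set to detect the first duplicate in one pass.
func uniqueURLs(urls []string) bool {
	seen := make(map[string]struct{}, len(urls))
	for _, u := range urls {
		if _, dup := seen[u]; dup {
			return false
		}
		seen[u] = struct{}{}
	}
	return true
}

func main() {
	fmt.Println(uniqueURLs([]string{"https://a/img.png", "https://b/img.png"})) // true
	fmt.Println(uniqueURLs([]string{"https://a/img.png", "https://a/img.png"})) // false
}
```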
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: roytman <roytman@il.ibm.com>
* Implement Options pattern for sidecar proxy Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com> * options.go and options_test.go review fixes and code enhancements Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com> * enhance tests, enhance options pattern implementation Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com> * simplify InferencePool flag to namespace/name format - PR #685 Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com> * code enhancements, resolve review comments Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com> * set default connector and KVConnector and migrate connector flag to KVConnector Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com> * add complete before validate in tests, align KVConnector and Connector initialization as before Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com> --------- Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>
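The Options pattern with Complete-before-Validate, as described in the commits above, separates defaulting and deprecated-flag migration (Complete) from consistency checks (Validate). The field names and defaults below are illustrative assumptions, not the sidecar's actual options:

```go
package main

import (
	"errors"
	"fmt"
)

// Options follows the Complete-then-Validate pattern: Complete fills
// defaults and migrates deprecated fields; Validate rejects
// inconsistent configurations.
type Options struct {
	KVConnector         string
	DeprecatedConnector string // value of the old --connector flag
	PoolNamespace       string
	PoolName            string
}

func (o *Options) Complete() {
	if o.KVConnector == "" && o.DeprecatedConnector != "" {
		o.KVConnector = o.DeprecatedConnector // migrate deprecated flag
	}
	if o.KVConnector == "" {
		o.KVConnector = "nixlv2" // assumed default
	}
	if o.PoolNamespace == "" {
		o.PoolNamespace = "default"
	}
}

func (o *Options) Validate() error {
	if o.PoolName == "" {
		return errors.New("InferencePool name is required")
	}
	return nil
}

func main() {
	o := &Options{DeprecatedConnector: "sglang", PoolName: "pool"}
	o.Complete()
	fmt.Println(o.KVConnector, o.PoolNamespace) // sglang default
	fmt.Println(o.Validate() == nil)            // true
}
```

Calling Complete before Validate, as the tests in this change do, ensures validation sees the fully defaulted configuration.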
Signed-off-by: roytman <roytman@il.ibm.com>
Bumps [dorny/paths-filter](https://github.com/dorny/paths-filter) from 3 to 4. - [Release notes](https://github.com/dorny/paths-filter/releases) - [Changelog](https://github.com/dorny/paths-filter/blob/master/CHANGELOG.md) - [Commits](dorny/paths-filter@v3...v4) --- updated-dependencies: - dependency-name: dorny/paths-filter dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
…red (#691) * NonCachedTokens defines the minimum number of non-cached tokens required to trigger disaggregated PD. A value of 0 disables disaggregation. Signed-off-by: Modassar-Rana <modassar.rana@ibm.com> * updated the signature Signed-off-by: Modassar-Rana <modassar.rana@ibm.com> * updated logging to Construtcor Signed-off-by: Modassar-Rana <modassar.rana@ibm.com> --------- Signed-off-by: Modassar-Rana <modassar.rana@ibm.com>
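The NonCachedTokens rule described above can be sketched as a small decision function: disaggregated P/D triggers only when the number of tokens missing from the prefix cache meets the threshold, and a threshold of 0 disables it entirely. This is an illustrative shape, not the PR's actual signature:

```go
package main

import "fmt"

// shouldDisaggregate applies the NonCachedTokens rule: trigger
// disaggregated PD only when at least nonCachedTokens prompt tokens
// miss the prefix cache; 0 disables disaggregation.
func shouldDisaggregate(promptTokens, cachedTokens, nonCachedTokens int) bool {
	if nonCachedTokens == 0 {
		return false
	}
	return promptTokens-cachedTokens >= nonCachedTokens
}

func main() {
	fmt.Println(shouldDisaggregate(1000, 100, 500)) // true: 900 non-cached tokens
	fmt.Println(shouldDisaggregate(1000, 900, 500)) // false: only 100 non-cached
	fmt.Println(shouldDisaggregate(1000, 100, 0))   // false: disabled
}
```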
…support (#694) * Bump kv-cache and GAIE This is needed to consume the new changes related to consuming tokens directly. Signed-off-by: Antonio Cardace <acardace@redhat.com> * Update import paths for GAIE and kv-cache API changes Adapt import paths for packages renamed in GAIE: - pkg/common/util/logging -> pkg/common/observability/logging - pkg/epp/util/metrics -> pkg/common/observability/metrics - pkg/epp/datalayer/http -> pkg/epp/framework/plugins/datalayer/source/http - pkg/epp/datalayer/plugins/approximateprefix -> pkg/epp/framework/plugins/datalayer/attribute/prefix Also adapt to kv-cache API change where NewHTTPDataSource now returns (ds, error) instead of just ds. Signed-off-by: Antonio Cardace <acardace@redhat.com> * Adapt precise-prefix-cache-scorer to kv-cache API changes Signed-off-by: Antonio Cardace <acardace@redhat.com> * Add external tokenizer PrepareData plugin and TokenizedPrompt scorer path Add a PrepareData plugin that calls a tokenizer sidecar over UDS gRPC to tokenize prompts before scheduling. The plugin populates request.TokenizedPrompt with token IDs from the sidecar. It supports both completions (Render) and chat completions (RenderChat) requests. The plugin is fail-open: tokenization errors are logged and the request proceeds without TokenizedPrompt. Update precise-prefix-cache-scorer to use pre-tokenized input when TokenizedPrompt is present on the request, calling GetPodScoresFromTokens to skip internal tokenization. Falls back to existing internal tokenization path when TokenizedPrompt is not set. Register the new tokenizer plugin in RegisterAllPlugins. Signed-off-by: Antonio Cardace <acardace@redhat.com> * Add deployment config and kind environment for external tokenizer Add EPP config with prepareDataPlugins feature gate and tokenizer plugin pointing at /tmp/tokenizer/tokenizer-uds.socket. Update kind-dev-env.sh to support EXTERNAL_TOKENIZER_ENABLED flag for selecting the tokenizer config. 
Signed-off-by: Antonio Cardace <acardace@redhat.com> * tokenizer: use interface to be able to mock in unit tests Signed-off-by: Antonio Cardace <acardace@redhat.com> * Add tokenizer plugin unit tests Signed-off-by: Antonio Cardace <acardace@redhat.com> * pprefix-cache-scorer: use interface to be able to mock in unit tests Signed-off-by: Antonio Cardace <acardace@redhat.com> * Add precise-prefix-cache tokenized unit tests Signed-off-by: Antonio Cardace <acardace@redhat.com> * Add e2e test for external tokenizer PrepareData plugin Signed-off-by: Antonio Cardace <acardace@redhat.com> * Move Consumes() from PrefixBasedPDDecider to PdProfileHandler The GAIE dependency bump introduced plugin layer execution order validation. PrefixBasedPDDecider has no scheduling interface so it falls into DefaultLayer (-1), while its producer prefix-cache-scorer is SchedulingLayer (2). The check (2 > -1) fails. Move Consumes() to PdProfileHandler which is a ProfileHandler (SchedulingLayer = 2), matching the producer's layer. This is semantically correct as PdProfileHandler is the actual scheduling plugin that consumes the data through the decider. Signed-off-by: Antonio Cardace <acardace@redhat.com> * Add end-to-end benchmark for external tokenizer + scorer flow Benchmarks the request latency through the EPP when using the external tokenizer PrepareData plugin with the precise-prefix-cache-scorer. Includes completion, chat completion, shared-prefix, and multi-message scenarios with prompt token count reporting. Usage: EXTERNAL_TOKENIZER_ENABLED=true KV_CACHE_ENABLED=true make env-dev-kind make bench-external-tokenizer Signed-off-by: Antonio Cardace <acardace@redhat.com> --------- Signed-off-by: Antonio Cardace <acardace@redhat.com>
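Two ideas from the tokenizer work above are worth sketching together: hiding the UDS tokenizer behind an interface so unit tests can mock it, and fail-open behavior where a tokenization error is logged and the request proceeds without TokenizedPrompt. The names below are illustrative assumptions, not the plugin's actual types:

```go
package main

import (
	"errors"
	"fmt"
)

// Tokenizer abstracts the tokenizer sidecar so tests can substitute a mock.
type Tokenizer interface {
	Tokenize(prompt string) ([]int32, error)
}

type request struct {
	Prompt          string
	TokenizedPrompt []int32
}

// prepareData is fail-open: on tokenizer error the request proceeds
// without TokenizedPrompt, and the scorer falls back to internal
// tokenization downstream.
func prepareData(t Tokenizer, r *request) {
	ids, err := t.Tokenize(r.Prompt)
	if err != nil {
		fmt.Println("tokenization failed, continuing without tokens:", err)
		return
	}
	r.TokenizedPrompt = ids
}

// fixed and failing are the kind of mocks the unit tests would use.
type fixed struct{}

func (fixed) Tokenize(string) ([]int32, error) { return []int32{1, 2, 3}, nil }

type failing struct{}

func (failing) Tokenize(string) ([]int32, error) { return nil, errors.New("sidecar unavailable") }

func main() {
	ok := &request{Prompt: "hello"}
	prepareData(fixed{}, ok)
	fmt.Println(ok.TokenizedPrompt) // [1 2 3]

	bad := &request{Prompt: "hello"}
	prepareData(failing{}, bad)
	fmt.Println(bad.TokenizedPrompt == nil) // true
}
```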
…1.28 (#727) * Corrected configuration file Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * No longer set primaryPort plugin parameter Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Deprecate use of primaryPort plugin parameter Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Deprecate use of the x-data-parallel-host-port header Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Add a real plugin Handle to the DP tests Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Make sure user understands that Istio >= 1.28.1 is needed Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
* build: remove CGO dependency by migrating to pure-Go ZMQ Update llm-d-kv-cache to v0.6.1-0.20260317063900-80aba2cb5a99 (main snapshot), pending merge of llm-d/llm-d-kv-cache#431 which switches from pebbe/zmq4 (CGO) to a pure-Go ZMQ implementation. - go.mod/go.sum: bump kv-cache to current main pseudo-version (placeholder; will be updated to a real tag once llm-d/llm-d-kv-cache#431 is merged) - Makefile: set CGO_ENABLED=0; drop check-dependencies prereq from test targets - Makefile.tools.mk: remove ##@ Dependencies section (check/install-dependencies) - Dockerfile.epp: remove EPEL + zeromq install steps; set CGO_ENABLED=0 in build - DEVELOPMENT.md: remove ZeroMQ from prerequisites list - .github/workflows/ci-pr-checks.yaml: remove CGO configuration and install-dependencies steps; remove CGO env vars from lint step NOTE: This commit is intentionally draft — go.mod must be updated to the tagged kv-cache release that includes the pure-Go ZMQ changes before merging. Signed-off-by: Etai Lev Ran <elevran@gmail.com> * update kv-cache (pure Go zmq, tip of main) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * revert to micro image once CGO is disabled Signed-off-by: Etai Lev Ran <elevran@gmail.com> --------- Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Cannot approve the pull request: Error: openshift-ci[bot] is not included in the approvers role in the OWNERS file
📝 Walkthrough
This PR refactors the inference scheduler's core architecture and adds multimodal support. Key changes include: renaming pod-based headers to endpoint-based headers throughout; introducing Options-based configuration with deprecation migration in the sidecar proxy; adding a TokenizerPlugin for the PrepareData phase with UDS tokenizer integration; wiring OpenTelemetry tracing across core proxy and plugin paths; refactoring proxy connectors (NIXLV2, SGLang, shared-storage) with new EPD protocol support for multimodal preprocessing; removing the TLS certificate parameter from Server.Start() and moving TLS setup into startHTTP with dynamic certificate handling; removing ZMQ and CGO dependencies; reorganizing observability-related packages; introducing new CI workflows for multi-image building/pushing; and renaming Connector fields to KVConnector/ECConnector to distinguish connector types.

Estimated code review effort
🎯 4 (Complex) | ⏱️ ~50 minutes

Key Issues & Findings

TLS Certificate Handling Risk (High Priority)
File: The removal of the TLS certificate parameter from Server.Start() moves certificate loading into startHTTP, so certificate errors can surface only once the server starts serving.
Recommendation: Add synchronous certificate pre-validation.

Encoder Endpoint Validation Missing
File: The new encoder target selection and protocol routing logic does not validate encoder hostnames against the SSRF allowlist:
Recommendation: Apply the same SSRF validation to encoder endpoints as used for prefill. Add test coverage for disallowed encoder IPs.

OpenTelemetry Span Context Leakage
File: The prefill span context uses:

ctx := context.WithoutCancel(r.Context()) // Span lives; cancel/deadline removed

If the prefill goroutine blocks indefinitely (slow/hung encoder), the trace context prevents proper garbage collection of the request object. Combined with the span's parent reference, this can cause unbounded memory retention.
Recommendation: Wrap the prefill context with a timeout or deadline. Add an explicit span End() even on panic. Monitor span count in production.

Deprecated Flag Migration Incomplete
File: The
Recommendation: Add an explicit guard.

Protocol Runner Interface Assumptions
File: The new protocol runner assignments:

server.runPDConnectorProtocol = ... // Assigned once; never validated
server.runEPDConnectorProtocol = ...

If initialization order is disrupted or conditional logic prevents assignment, calling an unassigned handler panics.

Recommendation: Add assertions or guards in protocol dispatch logic. Consider returning an error instead of panicking on uninitialized protocol handlers.

TestMain Race Condition
File:
waitForReady(timeout time.Duration) error // Blocks all tests

If multiple test workers are spawned, they all block on the same readiness check, potentially timing out despite the service being healthy.

Recommendation: Use a sync mechanism (atomic or channel) to ensure the readiness check runs once per process, not per test invocation.

Multimodal Item Deduplication Logic
File: The deduplication logic:

// Deduplicates URL-based MM items
// Inline audio (input_audio) is never deduplicated

If a large request contains duplicate inline audio blobs, each will be forwarded to the encoder independently, wasting bandwidth and compute.

Recommendation: Extend deduplication to inline audio by hashing content or adding explicit duplicate detection. Document this limitation in the EPD protocol comments.

PrefixCacheMatchInfo Consumption Change Breaking
File: The removal of
This breaks the plugin framework's contract for verifying that required data is produced earlier in the pipeline. A missing or mislabeled producer will no longer be caught.

Recommendation: Restore the Consumes() declaration.

Header Constant Rename Incomplete
File: The rename from
Recommendation: Audit all configuration files (YAML, ConfigMaps, ServiceEntry definitions) for references to the old header name. Add migration docs.

🚥 Pre-merge checks | ✅ 1 | ❌ 1
❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
Signed-off-by: Greg Pereira <grpereir@redhat.com>
Actionable comments posted: 18
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
scripts/kind-dev-env.sh (1)
257-261: ⚠️ Potential issue | 🟡 Minor
Quote the ${KUSTOMIZE_DIR} argument. CWE-78: an unquoted variable is subject to word splitting.

Proposed fix:

-kubectl kustomize --enable-helm ${KUSTOMIZE_DIR} \
+kubectl kustomize --enable-helm "${KUSTOMIZE_DIR}" \
   | envsubst '${POOL_NAME} ${MODEL_NAME} ${MODEL_NAME_SAFE} ${EPP_NAME} ${EPP_IMAGE} ${VLLM_SIMULATOR_IMAGE} \
     ${PD_ENABLED} ${KV_CACHE_ENABLED} ${SIDECAR_IMAGE} ${UDS_TOKENIZER_IMAGE} ${TARGET_PORTS} \
     ${VLLM_REPLICA_COUNT} ${VLLM_REPLICA_COUNT_P} ${VLLM_REPLICA_COUNT_D} ${VLLM_DATA_PARALLEL_SIZE}' \
-  | kubectl --context ${KUBE_CONTEXT} apply -f -
+  | kubectl --context "${KUBE_CONTEXT}" apply -f -

As per coding guidelines: "Quote all variables to prevent injection".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/kind-dev-env.sh` around lines 257-261, the kubectl kustomize invocation uses an unquoted ${KUSTOMIZE_DIR} (and other expansions) which allows word splitting/injection; update the command to quote the variable references (e.g., use "${KUSTOMIZE_DIR}" and quote "${KUBE_CONTEXT}") and ensure the envsubst argument string is passed as a single quoted argument so all expansions are protected (reference the kubectl kustomize call, the KUSTOMIZE_DIR variable, the envsubst invocation, and the kubectl --context usage).

Makefile.tools.mk (1)
57-61: ⚠️ Potential issue | 🟡 Minor
Add the --fail flag to curl to detect download failures. The curl command will silently succeed on HTTP errors (404, 500) and pass garbage to tar. Adding --fail (or -f) ensures a non-zero exit on HTTP errors.

Proposed fix:
$(TYPOS): | $(LOCALBIN) `@echo` "Downloading typos $(TYPOS_VERSION)..." - curl -L https://github.com/crate-ci/typos/releases/download/$(TYPOS_VERSION)/typos-$(TYPOS_VERSION)-$(TYPOS_ARCH).tar.gz | tar -xz -C $(LOCALBIN) $(TAR_OPTS) + curl -fL https://github.com/crate-ci/typos/releases/download/$(TYPOS_VERSION)/typos-$(TYPOS_VERSION)-$(TYPOS_ARCH).tar.gz | tar -xz -C $(LOCALBIN) $(TAR_OPTS) chmod +x $(TYPOS) `@echo` "typos installed successfully."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile.tools.mk` around lines 57 - 61, The curl invocation in the Makefile rule for target "$(TYPOS)" can silently succeed on HTTP errors; update the curl command in the "$(TYPOS): | $(LOCALBIN)" recipe to add the --fail (or -f) flag so it exits non-zero on HTTP errors before piping to tar; keep the existing variables (TYPOS_VERSION, TYPOS_ARCH, TAR_OPTS, LOCALBIN) and rest of the pipeline unchanged so failures are detected and the chmod/echo steps don't run on bad downloads.
Makefile (1)
220-229: ⚠️ Potential issue | 🟡 Minor

Quote paths in the container build command to handle paths with spaces.

If `KV_CACHE_PATH_CHECK` resolves to a path containing spaces (e.g., `/home/user/my projects/kv-cache`), the build command will fail or behave unexpectedly.

Proposed fix

```diff
 @KV_CACHE_PATH_CHECK=$$(go list -m -f '{{.Dir}}' github.com/llm-d/llm-d-kv-cache 2>/dev/null); \
 if [ -z "$$KV_CACHE_PATH_CHECK" ]; then \
 	echo "Error: Could not find kv-cache module even after download."; \
 	exit 1; \
 fi; \
 $(CONTAINER_RUNTIME) build \
 	--platform linux/$(TARGETARCH) \
 	-t $(UDS_TOKENIZER_IMAGE) \
-	-f $$KV_CACHE_PATH_CHECK/services/uds_tokenizer/Dockerfile \
-	$$KV_CACHE_PATH_CHECK/services/uds_tokenizer
+	-f "$$KV_CACHE_PATH_CHECK/services/uds_tokenizer/Dockerfile" \
+	"$$KV_CACHE_PATH_CHECK/services/uds_tokenizer"
```

As per coding guidelines: "Quote shell variables in targets".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 220 - 229, The shell variable expansion for KV_CACHE_PATH_CHECK is not quoted in the container build invocation which breaks when the path contains spaces; update the Docker build invocation that uses $(CONTAINER_RUNTIME) and the -f and context arguments to quote the KV_CACHE_PATH_CHECK expansions (e.g., use "$$KV_CACHE_PATH_CHECK/services/uds_tokenizer/Dockerfile" and "$$KV_CACHE_PATH_CHECK/services/uds_tokenizer") and ensure any other variable expansions used in that command (like UDS_TOKENIZER_IMAGE and TARGETARCH if interpolated into paths) are properly quoted to prevent word-splitting.
Dockerfile.epp (1)
10-12: ⚠️ Potential issue | 🟡 Minor

Add defaults to the TARGETOS/TARGETARCH ARGs to prevent empty build parameters.

Lines 10–11 declare build args without defaults. Line 27 references them in the RUN command; if the caller doesn't provide `--build-arg`, GOOS and GOARCH will be empty, causing build failures or inconsistent behavior.

Remediation

```diff
-ARG TARGETOS
-ARG TARGETARCH
+ARG TARGETOS=linux
+ARG TARGETARCH=amd64
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile.epp` around lines 10 - 12, The Dockerfile declares build args ARG TARGETOS and ARG TARGETARCH without defaults causing empty GOOS/GOARCH at build time; update the ARG declarations (ARG TARGETOS and ARG TARGETARCH) to include sensible defaults (e.g., linux and amd64) or provide fallback values so the RUN step that uses GOOS/GOARCH always has non-empty values; ensure the Dockerfile.epp ARG lines are changed accordingly so downstream references in the RUN command receive the defaults.
pkg/plugins/scorer/precise_prefix_cache_test.go (1)
134-135: ⚠️ Potential issue | 🔴 Critical

Fix the `NewChunkedTokenDatabase` call arity in `precise_prefix_cache_test.go`.

`kvblock.NewChunkedTokenDatabase` returns `(processor, error)`, as confirmed by production code in `precise_prefix_cache.go`. The test file uses single-value assignment at five call sites, which will fail to compile when building with `//go:build embedded_tokenizers`. This is a critical bug-prone pattern with missing error handling.

Affected lines: 134, 246, 319, 389, 500.

Add error handling to each call:

```go
tokenProcessor, err := kvblock.NewChunkedTokenDatabase(kvblock.DefaultTokenProcessorConfig())
require.NoError(t, err)
```

Reference: `precise_prefix_cache_uds_test.go` already implements this pattern correctly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/scorer/precise_prefix_cache_test.go` around lines 134 - 135, The test calls to kvblock.NewChunkedTokenDatabase incorrectly use single-value assignment but the function returns (processor, error); update each call (where tokenProcessor is created) to capture the error (e.g., tokenProcessor, err := kvblock.NewChunkedTokenDatabase(kvblock.DefaultTokenProcessorConfig())) and assert no error (require.NoError(t, err)) before using tokenProcessor in TokensToKVBlockKeys and other methods; ensure this pattern is applied at all five call sites to match the implementation used in precise_prefix_cache_uds_test.go.
.github/workflows/check-typos.yaml (1)
8-16: ⚠️ Potential issue | 🟠 Major

Pin actions by full SHA and set explicit job permissions.

Line 13 and Line 16 use mutable tags (`@v6`, `@v1.44.0`) instead of full commit SHAs. Malicious tag retargeting can inject code into CI execution (CWE-829, CWE-494). The job lacks an explicit `permissions:` block, allowing `GITHUB_TOKEN` to default to overprivileged permissions (CWE-269).

Replace `actions/checkout@v6` with its full commit SHA, replace `crate-ci/typos@v1.44.0` with its full commit SHA, and add job-level permissions restricted to `contents: read`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/check-typos.yaml around lines 8 - 16, The workflow uses mutable action tags (actions/checkout@v6 and crate-ci/typos@v1.44.0) and lacks explicit job permissions; replace those tags with their corresponding full commit SHAs for actions/checkout and crate-ci/typos to pin versions securely, and add a job-level permissions block restricting GITHUB_TOKEN to contents: read so the typos job runs with least privilege; update the uses lines for the action identifiers (actions/checkout and crate-ci/typos) and add a top-level permissions: { contents: read } entry for the typos job.
.github/workflows/md-link-check.yml (1)
11-20: ⚠️ Potential issue | 🟠 Major

Pin action refs to immutable SHAs and set job-scoped permissions (High).

Lines 17 and 20 use mutable tags (`@v6`, `@v2.8.0`). If a tag is moved or compromised, the workflow executes attacker-controlled code with repository token access (CWE-829, CWE-494). The job lacks explicit `permissions`, increasing blast radius if compromised (CWE-269).

Remediation diff

```diff
 jobs:
   lychee:
     name: Check Markdown Links
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
     steps:
       - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@<full_sha_for_v6>
       - name: Run lychee link checker
-        uses: lycheeverse/lychee-action@v2.8.0
+        uses: lycheeverse/lychee-action@<full_sha_for_v2.8.0>
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/md-link-check.yml around lines 11 - 20, Replace the mutable action refs in the workflow (uses: actions/checkout@v6 and uses: lycheeverse/lychee-action@v2.8.0) with their corresponding immutable commit SHAs and add an explicit job-level permissions block to restrict token scope; update the lychee step and checkout step to use pinned SHAs and add a top-level or job-scoped permissions section (e.g., read-only for contents and no write where not needed) so the workflow no longer relies on mutable tags and limits repository token privileges.
pkg/sidecar/proxy/connector_sglang.go (1)
209-221: ⚠️ Potential issue | 🟠 Major

Unbounded request body read - potential memory exhaustion (CWE-400).

`io.ReadAll(r.Body)` reads the entire request body without size limits. A malicious client could send an arbitrarily large payload to exhaust server memory.

Recommended fix

```diff
 func (s *Server) parseSGLangRequest(r *http.Request) (map[string]interface{}, error) {
-	body, err := io.ReadAll(r.Body)
+	const maxBodySize = 10 << 20 // 10 MB limit
+	body, err := io.ReadAll(io.LimitReader(r.Body, maxBodySize))
 	if err != nil {
 		return nil, fmt.Errorf("failed to read request body: %w", err)
 	}
+	if len(body) >= maxBodySize {
+		return nil, fmt.Errorf("request body exceeds maximum size")
+	}
```

As per coding guidelines: "Use io.LimitReader for HTTP response bodies (prevent memory exhaustion)"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/connector_sglang.go` around lines 209 - 221, parseSGLangRequest currently calls io.ReadAll(r.Body) with no size limit; replace that with a bounded read using io.LimitReader (or http.MaxBytesReader upstream) to prevent memory exhaustion. In parseSGLangRequest, wrap r.Body with io.LimitReader(r.Body, <maxBytes>) (define a sensible constant like maxRequestBodySize), use io.ReadAll on that limited reader, and return a clear error if the body exceeds the limit or the read fails; ensure you reference the function name parseSGLangRequest and the new maxRequestBodySize constant so reviewers can find the change.
🟡 Minor comments (7)
docs/architecture.md-41-41 (1)
41-41: ⚠️ Potential issue | 🟡 Minor

Fix compound modifier hyphenation in the architecture description.
Line 41 should use “state-of-the-art” (hyphenated) to avoid ambiguous phrasing in technical docs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/architecture.md` at line 41, Update the phrase in the architecture text where it reads "state of the art" (in the sentence describing the llm-d inference scheduler extending the EPP in [GIE]) to the hyphenated form "state-of-the-art" to correct the compound modifier hyphenation and avoid ambiguous phrasing.
scripts/kind-dev-env.sh-229-236 (1)
229-236: ⚠️ Potential issue | 🟡 Minor

Quote variables in pipeline commands.

CWE-78: Unquoted variables in command arguments risk word splitting if paths contain spaces or special characters.

Proposed fix

```diff
-kubectl kustomize deploy/components/crds-gateway-api |
-	kubectl --context ${KUBE_CONTEXT} apply --server-side --force-conflicts -f -
+kubectl kustomize deploy/components/crds-gateway-api |
+	kubectl --context "${KUBE_CONTEXT}" apply --server-side --force-conflicts -f -

-kubectl kustomize deploy/components/crds-gie |
-	kubectl --context ${KUBE_CONTEXT} apply --server-side --force-conflicts -f -
+kubectl kustomize deploy/components/crds-gie |
+	kubectl --context "${KUBE_CONTEXT}" apply --server-side --force-conflicts -f -

-kubectl kustomize --enable-helm deploy/components/crds-istio |
-	kubectl --context ${KUBE_CONTEXT} apply --server-side --force-conflicts -f -
+kubectl kustomize --enable-helm deploy/components/crds-istio |
+	kubectl --context "${KUBE_CONTEXT}" apply --server-side --force-conflicts -f -
```

As per coding guidelines: "Quote all variables to prevent injection".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/kind-dev-env.sh` around lines 229 - 236, The pipeline kubectl commands use unquoted variable expansions which can cause word splitting or injection (e.g., the three lines that pipe `kubectl kustomize ... | kubectl --context ${KUBE_CONTEXT} apply ... -f -`); update those commands to quote the variable expansions (e.g., use `--context "${KUBE_CONTEXT}"`) and quote any other shell variables in the pipeline to ensure paths or contexts with spaces/special characters are handled safely while preserving the existing pipe to `kubectl -f -`.
scripts/kind-dev-env.sh-254-254 (1)
254-254: ⚠️ Potential issue | 🟡 Minor

Quote `${EPP_CONFIG}` to prevent word splitting.

CWE-78: If `EPP_CONFIG` contains spaces or glob characters, the redirection will fail or behave unexpectedly.

Proposed fix

```diff
-envsubst '$MODEL_NAME' < ${EPP_CONFIG} > ${TEMP_FILE}
+envsubst '$MODEL_NAME' < "${EPP_CONFIG}" > "${TEMP_FILE}"
```

As per coding guidelines: "Quote all variables to prevent injection".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/kind-dev-env.sh` at line 254, The redirection uses unquoted variables which can lead to word-splitting or globbing when EPP_CONFIG or TEMP_FILE contain spaces or special characters; update the envsubst invocation to quote the file variables so that EPP_CONFIG and TEMP_FILE are expanded safely (use "${EPP_CONFIG}" for the input redirection and "${TEMP_FILE}" for the output redirection) while keeping the envsubst variable list ('$MODEL_NAME') as-is; locate the line with envsubst and replace the unquoted references to EPP_CONFIG and TEMP_FILE accordingly.
pkg/sidecar/proxy/data_parallel.go-18-22 (1)
18-22: ⚠️ Potential issue | 🟡 Minor

Move this deprecation warning off the request path.

Line 21 emits an `Info` log for every routed data-parallel request, and this header is still the active routing contract in this codepath. That will flood logs and make valid traffic look like misuse. Log once during startup/plugin init, or demote it to a debug/once-only path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/data_parallel.go` around lines 18 - 22, The deprecation Info log inside dataParallelHandler floods logs on every request; remove or demote that per-request s.logger.Info call and instead emit the deprecation warning only once during startup/initialization (or at plugin init) where the dataParallel routing is configured. Locate the s.dataParallelProxies setup and add a one-time log (e.g., using sync.Once or during the function that populates dataParallelProxies) to warn about DataParallelEndpointHeader deprecation; then delete or change the s.logger.Info call in dataParallelHandler that references dataParallelPodHostPort so per-request handling uses only handler := s.dataParallelProxies[dataParallelPodHostPort].
pkg/sidecar/proxy/connector_nixlv2.go-240-261 (1)
240-261: ⚠️ Potential issue | 🟡 Minor

`true_ttft_ms` is not TTFT here.

Line 250 uses `decodeStart`, which is captured before the decode request is even sent downstream. That measures dispatch latency, not time-to-first-token, so this will under-report user-visible latency for both buffered and streaming responses. Either rename the attribute or capture the timestamp of the first downstream write/flush and use that instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/connector_nixlv2.go` around lines 240 - 261, The metric currently computed as trueTTFT uses decodeStart (in function handling spans that sets trueTTFT and calls currentSpan.SetAttributes) which measures dispatch latency, not user-visible time-to-first-token; fix by capturing the actual downstream first-write/flush timestamp (e.g., record firstWriteTime when the proxy writes/flushed the first token downstream and make it available via context or a shared field identified by requestStartTimeKey/firstWriteTimeKey) and compute trueTTFT = firstWriteTime.Sub(requestStart); then replace the attribute "llm_d.pd_proxy.true_ttft_ms" to use that value (or if you choose not to change behavior, rename the attribute to reflect dispatch latency instead of TTFT to avoid mislabeling).
pkg/sidecar/proxy/connector_sglang_test.go-123-170 (1)
123-170: ⚠️ Potential issue | 🟡 Minor

Data race on the `prefillFinished` variable.

`prefillFinished` is a plain bool written at line 128 inside a goroutine (via the HTTP handler) and read at line 170 from the main test goroutine. This is a data race detectable by `-race`.

Fix using atomic

```diff
-	var prefillFinished bool
+	var prefillFinished atomic.Bool
 	slowPrefill := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		testInfo.prefillHandler.ServeHTTP(w, r)
 		time.Sleep(300 * time.Millisecond) // Simulated load delay on KV Cache
-		prefillFinished = true
+		prefillFinished.Store(true)
 	})
 	...
-	Expect(prefillFinished).To(BeTrue())
+	Expect(prefillFinished.Load()).To(BeTrue())
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/connector_sglang_test.go` around lines 123 - 170, The test has a data race on the plain bool prefillFinished (written inside slowPrefill HTTP handler and read later in the test). Change prefillFinished to a uint32 (e.g., var prefillFinished uint32), set it in slowPrefill using atomic.StoreUint32(&prefillFinished, 1), and check it with atomic.LoadUint32(&prefillFinished) == 1 in the Expect assertion; also add the "sync/atomic" import. Update references in slowPrefill and the Expect(prefillFinished) check accordingly.
pkg/sidecar/proxy/connector_epd_shared_storage.go-79-102 (1)
79-102: ⚠️ Potential issue | 🟡 Minor

Shallow copy may cause data corruption.

`buildEncoderRequest` performs a shallow copy at lines 82-84. If `originalRequest` contains nested maps/slices (like `messages`), they share references. Concurrent modifications to `originalRequest` while encoder requests are in-flight could corrupt data.

The fix at line 96 replaces `messages` entirely, which is safe, but other nested fields copied from `originalRequest` could still be affected.

Consider deep copy for safety

```diff
 func buildEncoderRequest(originalRequest map[string]any, mmItem map[string]any) map[string]any {
-	// Create a deep copy of the original request
-	encoderRequest := make(map[string]any)
-	for k, v := range originalRequest {
-		encoderRequest[k] = v
-	}
+	// Create a proper deep copy via JSON round-trip
+	data, _ := json.Marshal(originalRequest)
+	var encoderRequest map[string]any
+	_ = json.Unmarshal(data, &encoderRequest)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/connector_epd_shared_storage.go` around lines 79 - 102, buildEncoderRequest currently makes a shallow copy of originalRequest (lines creating encoderRequest) which can leave nested maps/slices shared and lead to concurrent data corruption; change it to perform a deep copy of originalRequest before mutating it (for example by JSON marshal/unmarshal into map[string]any or by a recursive copy helper) so all nested structures are duplicated, then set "messages", "max_tokens", "stream" and delete "stream_options" on that independent copy; keep the function name buildEncoderRequest and the encoderRequest variable but ensure the deep-copy approach preserves the original nested types.
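To illustrate why the JSON round-trip removes the aliasing, here is a standalone sketch; `deepCopyMap` is a hypothetical helper mirroring the suggestion above, and it only handles JSON-representable values (non-JSON types like channels or funcs would fail to marshal):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deepCopyMap duplicates a map and everything nested inside it by
// serializing to JSON and back, so the copy shares no references
// with the input.
func deepCopyMap(in map[string]any) (map[string]any, error) {
	data, err := json.Marshal(in)
	if err != nil {
		return nil, err
	}
	var out map[string]any
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	original := map[string]any{
		"model":    "demo",
		"messages": []any{map[string]any{"role": "user"}},
	}
	clone, _ := deepCopyMap(original)

	// Mutating the clone's nested map must not leak into the original.
	clone["messages"].([]any)[0].(map[string]any)["role"] = "system"
	fmt.Println(original["messages"].([]any)[0].(map[string]any)["role"])
}
```

A shallow `for k, v := range` copy would print "system" here, because both maps would point at the same nested message map.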
🧹 Nitpick comments (19)
Makefile (2)
189-192: `post-deploy-test` has redundant and inconsistent output.

Line 191 uses `echo` without the `@` prefix, causing make to print both the command and its output. The two echo statements also appear redundant.

Proposed fix

```diff
 .PHONY: post-deploy-test
 post-deploy-test: ## Run post deployment tests
-	echo Success!
 	@echo "Post-deployment tests passed."
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 189 - 192, The Makefile target post-deploy-test prints both the command and output and duplicates messages; remove the redundant plain echo "Success!" and keep a single, quieted message by using an `@-prefixed` echo (e.g., replace the two echo lines with one `@echo` "Post-deployment tests passed."). Update the post-deploy-test target to only output the final success message and suppress printing the command invocation.
112-114: Hardcoded `upstream/main` may fail if the remote is named differently.

The `signed-commits-check` target hardcodes `upstream/main`, but some contributors may have a differently named remote (e.g., `origin`). Consider making this configurable or documenting the expected remote setup.

Proposed fix

```diff
+UPSTREAM_BRANCH ?= upstream/main
+
 .PHONY: signed-commits-check
 signed-commits-check:
-	@./scripts/check-commits.sh upstream/main
+	@./scripts/check-commits.sh $(UPSTREAM_BRANCH)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 112 - 114, The signed-commits-check Makefile target hardcodes "upstream/main", which breaks for repos using a different remote; change the target to call ./scripts/check-commits.sh with a configurable Make variable (e.g., $(GIT_REMOTE) or $(REMOTE)) that defaults to "upstream" or "origin" if not set, and update any README/docs to mention how to override the variable (invoke make signed-commits-check GIT_REMOTE=origin). Ensure the unique symbols referenced are the signed-commits-check target and the ./scripts/check-commits.sh invocation so reviewers can locate and update the Makefile accordingly.
scripts/check-commits.sh (1)
18-31: Use `${variable}` bracing consistently and handle potential failures in pipelines.

The loop iterates correctly, but the pipeline error masking issue compounds here. If `git cat-file` or `git log` fails for a commit, the script will incorrectly pass that commit instead of failing.

Proposed fix with explicit error handling

```diff
 for commit in $(git rev-list "$RANGE"); do
-  HASH=$(git rev-parse --short "$commit")
+  HASH=$(git rev-parse --short "$commit") || { echo "Failed to parse commit"; exit 1; }
   # 1. Check for cryptographic signature header (gpgsig)
   # Using cat-file bypasses the SSH/GPG verification engine entirely
-  if ! git cat-file commit "$commit" | grep -q "^gpgsig"; then
+  if ! git cat-file commit "$commit" 2>/dev/null | grep -q "^gpgsig"; then
     ERR_SIG="$ERR_SIG $HASH"
   fi
   # 2. Check for DCO Sign-off text in the commit message
-  if ! git log -1 --format="%b" "$commit" | grep -q "Signed-off-by:"; then
+  if ! git log -1 --format="%b" "$commit" 2>/dev/null | grep -q "Signed-off-by:"; then
     ERR_DCO="$ERR_DCO $HASH"
   fi
 done
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/check-commits.sh` around lines 18 - 31, Enable pipefail and use braced variable names, then treat git command failures as errors: add set -o pipefail before the loop and replace unbraced variables with ${RANGE}, ${commit}, ${HASH}, ${ERR_SIG}, ${ERR_DCO}; for each check run git cat-file commit "${commit}" and git log -1 --format="%b" "${commit}" and if either command fails (non-zero exit) or the grep test fails, append ${HASH} to the appropriate ${ERR_SIG} or ${ERR_DCO} variable so a failing git command does not silently pass the check.
pkg/metrics/metrics.go (1)
7-7: Add an alias to the imported `metrics` package to improve clarity.

Line 7 imports a package as `metrics` inside `package metrics`. While this compiles correctly, it's non-idiomatic Go and reduces readability during code reviews and refactors. Use an explicit alias:

Suggested diff

```diff
-	"sigs.k8s.io/gateway-api-inference-extension/pkg/common/observability/metrics"
+	obsmetrics "sigs.k8s.io/gateway-api-inference-extension/pkg/common/observability/metrics"
```

Update line 26:

```diff
-	Help: metrics.HelpMsgWithStability("Total number of P/D disaggregation decisions made", compbasemetrics.ALPHA),
+	Help: obsmetrics.HelpMsgWithStability("Total number of P/D disaggregation decisions made", compbasemetrics.ALPHA),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/metrics/metrics.go` at line 7, The import shadows the current package name (package metrics) by importing the same identifier; change the import to an explicit alias (for example: observability or observabilityMetrics) for "sigs.k8s.io/gateway-api-inference-extension/pkg/common/observability/metrics" and update every reference in this file that currently uses metrics.<...> (e.g., metrics.NewCounter, metrics.Register, etc.) to use the new alias (observability.NewCounter, observability.Register) so the package no longer conflicts with the file's package name.
pkg/plugins/datalayer/models/datasource_test.go (1)
48-48: Bound `Poll` with a timeout context to avoid hanging tests on network stalls.

Line 48 uses `context.Background()` from setup; a blocked dial/read can hold this test indefinitely depending on HTTP client behavior.

Remediation diff

```diff
-	err = source.Poll(ctx, endpoint)
+	pollCtx, cancel := context.WithTimeout(ctx, 2*time.Second)
+	defer cancel()
+	err = source.Poll(pollCtx, endpoint)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/datalayer/models/datasource_test.go` at line 48, The test calls source.Poll(ctx, endpoint) with the setup's background context and can hang on network stalls; change it to pass a cancellable timeout context (e.g., ctx, cancel := context.WithTimeout(context.Background(), <reasonable duration>) with defer cancel()) so Poll is bounded, and ensure time is imported in the test file; reference the Poll call in datasource_test.go and replace the existing ctx (from context.Background()) with the new timeout context.
pkg/plugins/scorer/precise_prefix_cache_tokenized_test.go (1)
107-137: Assert fallback execution when `TokenIDs` is empty.

This test only checks that `ScoreTokens` is not called. It should also assert that the non-tokenized scoring path (`GetPodScores`) is invoked, otherwise a silent no-op path can still pass.

Proposed test hardening

```diff
 func TestPrecisePrefixCacheScorer_SkipsTokenizedPromptWhenEmpty(t *testing.T) {
 	ctx := utils.NewTestContext(t)
 	fromTokensCalled := false
+	getPodScoresCalled := false
 	scorer := &PrecisePrefixCacheScorer{
 		typedName:      plugin.TypedName{Type: PrecisePrefixCachePluginType, Name: "test"},
 		kvEventsConfig: &kvevents.Config{},
 		kvCacheIndexer: &mockKVCacheIndexer{
 			scoreTokensFunc: func(_ context.Context, _ []uint32, _ string, _ []string) (map[string]float64, error) {
 				fromTokensCalled = true
 				return map[string]float64{}, nil
 			},
 			getPodScoresFunc: func(_ context.Context, _ *types.RenderChatRequest, _ string, _ string, _ []string) (map[string]float64, error) {
+				getPodScoresCalled = true
 				return map[string]float64{}, nil
 			},
 		},
 	}
@@
 	scorer.Score(ctx, scheduling.NewCycleState(), request, testEndpoints)
 	assert.False(t, fromTokensCalled, "ScoreTokens should not be called with empty TokenIDs")
+	assert.True(t, getPodScoresCalled, "GetPodScores should be called as fallback when TokenIDs is empty")
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/plugins/scorer/precise_prefix_cache_tokenized_test.go` around lines 107 - 137, The test only verifies ScoreTokens wasn't called but should also assert the fallback non-tokenized path is used: modify the mockKVCacheIndexer in TestPrecisePrefixCacheScorer_SkipsTokenizedPromptWhenEmpty to set a flag (e.g., fromPodScoresCalled) inside getPodScoresFunc and then after calling scorer.Score(...) assert that fromPodScoresCalled is true; keep the existing fromTokensCalled check and ensure you reference the same scorer.Score invocation and the TokenizedPrompt.TokenIDs empty input so the assertion proves GetPodScores (getPodScoresFunc) executes when TokenIDs is empty.
.github/workflows/ci-pr-checks.yaml (1)
56-56: Make dependency drift fail fast after `go mod tidy`.

Line 56 runs `go mod tidy` but does not assert that `go.mod`/`go.sum` stayed unchanged, so CI can pass with uncommitted module graph rewrites.

Remediation

```diff
-      - name: Install dependencies
-        run: go mod tidy
+      - name: Verify module files are tidy
+        run: |
+          go mod tidy
+          git diff --exit-code -- go.mod go.sum
```

As per coding guidelines (`**`): "REVIEW PRIORITIES: 3. Bug-prone patterns and error handling gaps."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/ci-pr-checks.yaml at line 56, The CI step currently runs "run: go mod tidy" but doesn't fail when go.mod/go.sum are modified; add a follow-up step after the "run: go mod tidy" command that verifies no changes were produced (for example invoking git diff --exit-code go.mod go.sum or git diff --quiet && git diff --exit-code) and fail the job if the files changed so dependency drift fails fast; reference the existing "run: go mod tidy" step to place the new verification step immediately after it.
pkg/sidecar/proxy/options_test.go (2)
142-154: Duplicate test cases with a misleading name.

Test cases "empty string does not set values" (lines 143-147) and "deprecated flags take precedence when InferencePool is empty" (lines 149-153) have identical setup and expected values. The second case name implies deprecated flags are set, but they aren't. Either remove the duplicate or set deprecated flags to actually test precedence.

Suggested fix

```diff
 {
-	name:              "deprecated flags take precedence when InferencePool is empty",
-	inferencePool:     "",
-	expectedNamespace: "",
-	expectedName:      "",
+	name:              "deprecated flags take precedence when InferencePool is empty",
+	inferencePool:     "",
+	expectedNamespace: "deprecated-ns",   // Set in test setup
+	expectedName:      "deprecated-pool", // Set in test setup
 },
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/options_test.go` around lines 142 - 154, The two table-driven test entries named "empty string does not set values" and "deprecated flags take precedence when InferencePool is empty" are identical; update the test cases in pkg/sidecar/proxy/options_test.go so the second case actually exercises deprecated-flag precedence: either remove the duplicate entry or modify the second entry to set the deprecated flags (e.g., set the deprecated namespace/name inputs) and adjust expectedNamespace/expectedName accordingly; look for the test table in the Test... function (the test that iterates over entries with fields inferencePool/expectedNamespace/expectedName) and fix the row with name "deprecated flags take precedence when InferencePool is empty" to reflect the intended scenario.
156-173: Test setup doesn't configure deprecated flags for the precedence test.

The test loop doesn't set `opts.InferencePoolNamespace` or `opts.InferencePoolName` for the "deprecated flags take precedence" case, so that test case doesn't actually verify precedence behavior.
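The precedence rule under test can be sketched in isolation. Everything below is hypothetical scaffolding (the `options` struct and `complete` method are invented to mirror the shape of the flags, not copied from this repo): the combined `InferencePool` flag wins when set, and the deprecated fields survive only when it is empty.

```go
package main

import (
	"fmt"
	"strings"
)

// options mirrors the flag shape under test: a combined "namespace/name"
// flag plus deprecated individual fields. Names are illustrative.
type options struct {
	InferencePool          string // "namespace/name" form
	InferencePoolNamespace string // deprecated
	InferencePoolName      string // deprecated
}

// complete resolves the final namespace/name, letting the deprecated
// fields win only when the combined flag is empty.
func (o *options) complete() {
	if o.InferencePool == "" {
		return // keep deprecated values as-is
	}
	if ns, name, ok := strings.Cut(o.InferencePool, "/"); ok {
		o.InferencePoolNamespace = ns
		o.InferencePoolName = name
	}
}

func main() {
	// Deprecated flags set, combined flag empty: deprecated values survive.
	o := &options{InferencePoolNamespace: "deprecated-ns", InferencePoolName: "deprecated-pool"}
	o.complete()
	fmt.Println(o.InferencePoolNamespace, o.InferencePoolName)

	// Combined flag set: it takes precedence over the deprecated values.
	o2 := &options{InferencePool: "ns/pool", InferencePoolNamespace: "old"}
	o2.complete()
	fmt.Println(o2.InferencePoolNamespace, o2.InferencePoolName)
}
```

A table-driven test for this behavior would need to populate the deprecated fields in its setup, which is exactly what the comment says the current test loop omits.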
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/options_test.go` around lines 156 - 173, The test isn't setting the deprecated flag values before calling Complete(), so the "deprecated flags take precedence" case never exercises precedence; update the test loop to assign the deprecated values to the options before Complete() (e.g. set opts.InferencePoolNamespace = tt.deprecatedNamespace and opts.InferencePoolName = tt.deprecatedName for that case or for all cases where tt includes those fields) so Complete() can resolve and the assertions on InferencePoolNamespace and InferencePoolName validate precedence; use the NewOptions(), opts.InferencePool, and opts.InferencePoolNamespace/InferencePoolName symbols referenced in the test.
test/profiling/tokenizerbench/benchmark_test.go (1)
17-33: Package comment doesn't match the package name.

The doc comment declares `Package external_tokenizer_scorer` but the actual package is `tokenizerbench`. This will cause godoc inconsistency.

Fix

```diff
-// Package external_tokenizer_scorer benchmarks the end-to-end request flow
+// Package tokenizerbench benchmarks the end-to-end request flow
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/profiling/tokenizerbench/benchmark_test.go` around lines 17 - 33, The package-level comment at the top of the file declares "Package external_tokenizer_scorer" while the actual package is declared as "package tokenizerbench"; update the package comment to match the declared package name (change the comment to "Package tokenizerbench" and adjust any descriptive text accordingly) so godoc and the package declaration are consistent, referencing the existing package declaration symbol "package tokenizerbench" and the erroneous comment text "Package external_tokenizer_scorer".
test/e2e/e2e_test.go (1)
676-680: Inconsistent model name handling in `runStreamingChatCompletion`.

`runStreamingChatCompletion` hardcodes `simModelName` (line 680) while `runStreamingCompletion` accepts `theModel` as a parameter. This asymmetry breaks the pattern established by other helper pairs (`runCompletion`/`runChatCompletion`).

Proposed fix

```diff
-func runStreamingChatCompletion(prompt string) (string, string) {
+func runStreamingChatCompletion(prompt string, theModel string) (string, string) {
 	ginkgo.By(fmt.Sprintf("Sending Streaming Chat Completion Request: (port %s)", port))
 	// Use raw HTTP for streaming to capture headers
-	body := fmt.Sprintf(`{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":true}`, simModelName, prompt)
+	body := fmt.Sprintf(`{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":true}`, theModel, prompt)
```

Update the call site at line 195:

```diff
-	nsHdr, podHdr = runStreamingChatCompletion(simplePrompt)
+	nsHdr, podHdr = runStreamingChatCompletion(simplePrompt, simModelName)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/e2e_test.go` around lines 676 - 680, runStreamingChatCompletion currently hardcodes simModelName; change its signature to accept a model parameter (e.g., func runStreamingChatCompletion(theModel string, prompt string) (string, string)) and replace simModelName in the body with theModel so it mirrors runStreamingCompletion/runCompletion/runChatCompletion; then update every call site that invokes runStreamingChatCompletion to pass the desired model (use simModelName where tests expect the simulator) so the helper pair is consistent.pkg/sidecar/proxy/connector_test.go (1)
85-85: Avoid hard-coded http:// slicing for prefill endpoint headers.
Line 85 and Line 146 assume the backend URL always starts with http://. This is brittle and can mis-handle HTTPS endpoints. Use common.StripScheme for host:port extraction.
♻️ Proposed change
-	req.Header.Add(common.PrefillEndpointHeader, testInfo.prefillBackend.URL[len("http://"):])
+	req.Header.Add(common.PrefillEndpointHeader, common.StripScheme(testInfo.prefillBackend.URL))
Also applies to: 146-146
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/connector_test.go` at line 85, replace the brittle manual slicing of testInfo.prefillBackend.URL when setting the PrefillEndpointHeader with a call to common.StripScheme to reliably extract host:port (instead of req.Header.Add(common.PrefillEndpointHeader, testInfo.prefillBackend.URL[len("http://"):])). Update both occurrences that set common.PrefillEndpointHeader (the one using testInfo.prefillBackend.URL and the second occurrence around the same test) to call common.StripScheme(testInfo.prefillBackend.URL) so HTTPS and other schemes are handled correctly.

cmd/epp/main.go (1)
65-69: Log runner failures at the process boundary before returning exit code 1.
Lines 65-69 return non-zero on startup/runtime failure but do not emit a top-level error from this boundary, which weakens incident triage in aggregated logs.
🛠️ Proposed change
 	if err := runner.NewRunner().
 		WithCustomCollectors(metrics.GetCollectors()...).
 		Run(ctx); err != nil {
+		ctrl.Log.Error(err, "EPP runner exited with error")
 		return 1
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/epp/main.go` around lines 65 - 69, the runner startup currently returns 1 on error without logging; capture the error returned by NewRunner().WithCustomCollectors(metrics.GetCollectors()...).Run(ctx), log a clear top-level error message including the err value (using the project's top-level logger, e.g., log/processLogger) and any relevant context before returning 1 so failures at the process boundary are recorded for triage.

pkg/sidecar/proxy/proxy.go (2)
147-148: Error from LRU cache creation ignored.
Lines 147-148 use // nolint:all to silence the ignored error. While lru.New only fails for size ≤ 0 (and 16 is safe), explicitly handling or documenting this is cleaner than a blanket nolint.
Alternative: explicit handling
-	prefillerCache, _ := lru.New[string, http.Handler](16) // nolint:all
-	encoderCache, _ := lru.New[string, http.Handler](16) // nolint:all
+	prefillerCache, err := lru.New[string, http.Handler](16)
+	if err != nil {
+		panic("BUG: invalid cache size") // size is a compile-time constant
+	}
+	encoderCache, err := lru.New[string, http.Handler](16)
+	if err != nil {
+		panic("BUG: invalid cache size")
+	}
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/proxy.go` around lines 147 - 148, The LRU constructor errors for prefillerCache and encoderCache are being ignored with a blanket nolint; replace that with explicit error handling: call lru.New[string,http.Handler](16) for prefillerCache and encoderCache, check the returned error (from lru.New) and handle it—either propagate it, log and exit, or panic with a clear message—so failures are not silently suppressed; update references in proxy.go where prefillerCache and encoderCache are created to use this explicit check instead of "// nolint:all".
235-250: Unknown EC connector silently ignored.
setECConnector silently returns for unknown connector values (lines 247-248). Consider logging a warning so operators know their configuration was not applied.
Add warning log
 	switch ecConnector {
 	case ECExampleConnector:
 		s.runEPDConnectorProtocol = s.runEPDProtocol
 	default:
-		// Unknown EC connector value, skip encoder stage
+		// Unknown EC connector value, skip encoder stage
+		// Consider: log.Log.Info("Unknown EC connector, encoder stage disabled", "connector", ecConnector)
 		return
 	}
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/proxy.go` around lines 235 - 250, the switch in setECConnector silently returns on unknown EC connector values; update the default case in Server.setECConnector to emit a warning that includes the invalid ecConnector value so operators are notified (use the Server's logger field, e.g., s.logger.Warnf or s.log.Warnf; if the Server has no logger, use the package logger) and keep the current behavior of skipping the encoder stage.

pkg/sidecar/proxy/connector_sglang_test.go (1)
53-54: Flaky synchronization using time.Sleep.
time.Sleep(1 * time.Second) at lines 54 and 152 for waiting on proxy startup is fragile. Consider polling for readiness or using a ready channel.
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/connector_sglang_test.go` around lines 53 - 54, the test uses a brittle time.Sleep(1 * time.Second) to wait for the proxy to start; replace that sleep with a deterministic readiness check: poll the proxy's readiness/health endpoint or use a ready channel returned by the proxy start routine (replace/remove the time.Sleep calls in connector_sglang_test.go), implementing a small waitForProxyReady helper that retries with a timeout/context until the proxy accepts connections or returns healthy, and use that helper in place of time.Sleep at both locations.

pkg/sidecar/proxy/chat_completions.go (1)
82-90: Redundant condition check.
Line 86's else if numHosts > 0 is redundant since the outer if numHosts > 0 at line 82 already guarantees this.
Simplify
 	if numHosts > 0 {
 		if s.config.EnablePrefillerSampling {
 			// Sample a host value from the list
 			prefillHostPort = strings.TrimSpace(prefillHostPorts[s.prefillSamplerFn(numHosts)])
-		} else if numHosts > 0 {
+		} else {
 			// Select only the first header value, consistent with previous behavior
 			prefillHostPort = strings.TrimSpace(prefillHostPorts[0])
 		}
 	}
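The simplified branch can be sketched as a standalone selection helper (pickPrefillHost and samplerFn are illustrative names, not the actual code):

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// pickPrefillHost mirrors the simplified logic: sample a host when sampling
// is enabled, otherwise keep the historical "first header value" behavior.
func pickPrefillHost(hosts []string, sample bool, samplerFn func(n int) int) string {
	if len(hosts) == 0 {
		return ""
	}
	if sample {
		return strings.TrimSpace(hosts[samplerFn(len(hosts))])
	}
	return strings.TrimSpace(hosts[0])
}

func main() {
	hosts := []string{" a:8000 ", "b:8000", "c:8000"}
	fmt.Println(pickPrefillHost(hosts, false, nil)) // a:8000
	fmt.Println(pickPrefillHost(hosts, true, rand.Intn))
}
```

With the length check hoisted to the top, neither branch needs to re-test numHosts, which is exactly why the inner `else if` was dead code.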
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/chat_completions.go` around lines 82 - 90, the else-if check "else if numHosts > 0" is redundant because the outer "if numHosts > 0" already ensures that condition; replace that branch with a plain "else" to simplify logic while keeping behavior the same: inside the block that sets prefillHostPort, use "if s.config.EnablePrefillerSampling { prefillHostPort = strings.TrimSpace(prefillHostPorts[s.prefillSamplerFn(numHosts)]) } else { prefillHostPort = strings.TrimSpace(prefillHostPorts[0]) }", referencing s.config.EnablePrefillerSampling, s.prefillSamplerFn, prefillHostPorts and prefillHostPort.

pkg/sidecar/proxy/connector_epd_shared_storage_test.go (1)
262-309: Test lacks cleanup for potential goroutine leaks.
fanoutEncoderPrimer spawns goroutines internally. If a test times out or panics before wg.Wait() completes in the implementation, the mock server closure at line 270 could cause issues. Consider adding t.Cleanup or explicit timeout handling.
Additionally, line 275 assigns log.Log directly without initializing a test logger, which may cause nil pointer issues in some test environments.
Suggested improvement
 	encoderURL, err := url.Parse(encoderBackend.URL)
 	assert.NoError(t, err)
 	srv := NewProxy("0", encoderURL, Config{})
-	srv.logger = log.Log
+	srv.logger = logr.Discard() // or use a test logger
 	encoderHostPort := encoderURL.Host
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/connector_epd_shared_storage_test.go` around lines 262 - 309, the test TestFanoutEncoderPrimerDeduplication must ensure the httptest server is always closed and uses a test-safe logger: register encoderBackend.Close with t.Cleanup so the mock server is closed even on panic/timeouts, and replace the direct assignment srv.logger = log.Log with a test logger (e.g., srv.logger = log.NewTestLogger(t) or the project's test logger helper) so srv.logger is initialized for test environments; keep the call to fanoutEncoderPrimer unchanged but consider running it with a short context timeout if needed to avoid goroutine hangs.

pkg/sidecar/proxy/connector_epd_shared_storage.go (1)
197-208: Only first error returned; subsequent errors silently dropped.
When multiple encoder requests fail, only the first error read from errChan is returned (lines 203-204). Other errors are silently discarded. Consider aggregating errors or logging all failures.
Alternative: collect all errors
 	// Check for errors
+	var errs []error
 	for err := range errChan {
 		if err != nil {
-			return err
+			errs = append(errs, err)
 		}
 	}
+	if len(errs) > 0 {
+		return fmt.Errorf("encoder failures: %v", errs)
+	}
 	return nil
Verify each finding against the current code and only fix it if needed. In `@pkg/sidecar/proxy/connector_epd_shared_storage.go` around lines 197 - 208, The current drain of errChan after wg.Wait() only returns the first non-nil error and drops subsequent errors; change the logic in the function that uses wg and errChan to collect all errors (e.g., append non-nil errs to a slice or use a multi-error aggregator) while still iterating until errChan is closed, then either return a combined error (or nil if none) and/or log each individual error; update the block around wg.Wait(), close(errChan) and the for err := range errChan loop to build and return an aggregated error instead of returning immediately on the first non-nil error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 804553bc-ea3a-47ba-882b-f42c3b31f98e
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (75)
.github/actions/docker-build-and-push/action.yml, .github/actions/trivy-scan/action.yml, .github/dependabot.yml, .github/workflows/check-typos.yaml, .github/workflows/ci-build-images.yaml, .github/workflows/ci-dev.yaml, .github/workflows/ci-pr-checks.yaml, .github/workflows/ci-release.yaml, .github/workflows/md-link-check.yml,
DEVELOPMENT.md, Dockerfile.epp, Dockerfile.sidecar, Makefile, Makefile.tools.mk, cmd/epp/main.go, cmd/pd-sidecar/main.go,
deploy/components/inference-gateway/deployments.yaml, deploy/components/vllm-sim-pd/deployments.yaml, deploy/components/vllm-sim/deployments.yaml, deploy/config/dp-epp-config.yaml, deploy/config/pd-epp-config.yaml, deploy/config/sim-epp-external-tokenizer-config.yaml, deploy/config/sim-pd-epp-config.yaml, docs/architecture.md, go.mod,
pkg/common/common.go, pkg/common/common_test.go, pkg/metrics/metrics.go, pkg/plugins/datalayer/models/datasource_test.go, pkg/plugins/datalayer/models/factories.go, pkg/plugins/pre-request/pd_prerequest.go, pkg/plugins/pre-request/pd_prerequest_test.go, pkg/plugins/preparedata/tokenizer.go, pkg/plugins/preparedata/tokenizer_test.go, pkg/plugins/profile/dp_profile_handler.go, pkg/plugins/profile/dp_profile_handler_test.go, pkg/plugins/profile/pd_profile_handler.go, pkg/plugins/profile/pd_profile_handler_test.go, pkg/plugins/profile/prefix_based_pd_decider.go, pkg/plugins/register.go, pkg/plugins/scorer/active_request.go, pkg/plugins/scorer/load_aware.go, pkg/plugins/scorer/no_hit_lru.go, pkg/plugins/scorer/precise_prefix_cache.go, pkg/plugins/scorer/precise_prefix_cache_test.go, pkg/plugins/scorer/precise_prefix_cache_tokenized_test.go, pkg/plugins/scorer/precise_prefix_cache_uds_test.go, pkg/plugins/scorer/session_affinity.go, pkg/scheduling/pd/scheduler_test.go,
pkg/sidecar/proxy/chat_completions.go, pkg/sidecar/proxy/chat_completions_test.go, pkg/sidecar/proxy/connector_epd_shared_storage.go, pkg/sidecar/proxy/connector_epd_shared_storage_test.go, pkg/sidecar/proxy/connector_nixlv2.go, pkg/sidecar/proxy/connector_nixlv2_test.go, pkg/sidecar/proxy/connector_sglang.go, pkg/sidecar/proxy/connector_sglang_test.go, pkg/sidecar/proxy/connector_test.go, pkg/sidecar/proxy/data_parallel.go, pkg/sidecar/proxy/data_parallel_test.go, pkg/sidecar/proxy/options.go, pkg/sidecar/proxy/options_test.go, pkg/sidecar/proxy/proxy.go, pkg/sidecar/proxy/proxy_helpers.go, pkg/sidecar/proxy/proxy_test.go, pkg/telemetry/tracing.go,
scripts/check-commits.sh, scripts/kind-dev-env.sh, test/e2e/e2e_suite_test.go, test/e2e/e2e_test.go, test/e2e/yaml/deployments.yaml, test/e2e/yaml/rbac.yaml, test/e2e/yaml/vllm-sim-dp.yaml, test/e2e/yaml/vllm-sim-pd.yaml, test/profiling/tokenizerbench/benchmark_test.go
💤 Files with no reviewable changes (2)
- deploy/config/pd-epp-config.yaml
- deploy/config/sim-pd-epp-config.yaml
 	steps:
 		- name: Set up Docker Buildx
-			uses: docker/setup-buildx-action@v3
+			uses: docker/setup-buildx-action@v4
🧩 Analysis chain
🏁 Script executed:
cat -n .github/actions/docker-build-and-push/action.yml | head -50
Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 1833
🏁 Script executed:
wc -l .github/actions/docker-build-and-push/action.yml
Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 135
🏁 Script executed:
cat -n .github/actions/docker-build-and-push/action.yml | tail -25
Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 1171
Pin docker/setup-buildx-action to a full commit SHA.
Line 30 uses mutable-tag pinning. Supply chain risk: a retagged or compromised action revision executes attacker-controlled code in the build pipeline and can exfiltrate registry credentials. (CWE-829, CWE-494)
Remediation
- uses: docker/setup-buildx-action@v4
+ uses: docker/setup-buildx-action@<full_commit_sha>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/actions/docker-build-and-push/action.yml at line 30, Replace the
mutable tag usage of docker/setup-buildx-action in the action.yml where the line
reads uses: docker/setup-buildx-action@v4 by pinning to the action’s full commit
SHA; update the uses entry to reference the specific commit SHA for
docker/setup-buildx-action (obtain the canonical commit SHA from the action’s
GitHub releases/commit history) so the workflow uses an immutable revision and
prevents supply-chain tampering.
 jobs:
 	docker-build-and-push:
 		runs-on: ubuntu-latest
🛠️ Refactor suggestion | 🟠 Major
Add explicit permissions block with least-privilege scope.
No permissions defined. Default permissions may be overly permissive. Define minimal required permissions for this job (e.g., contents: read, packages: write).
Suggested fix
jobs:
docker-build-and-push:
runs-on: ubuntu-latest
+ permissions:
+ contents: read
+ packages: write
 	steps:
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
jobs:
	docker-build-and-push:
		runs-on: ubuntu-latest
		permissions:
			contents: read
			packages: write
		steps:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/ci-build-images.yaml around lines 22 - 24, Add a minimal
explicit permissions block to the GitHub Actions job "docker-build-and-push" to
enforce least privilege: update the job definition for docker-build-and-push to
include a permissions mapping (for example permissions: contents: read,
packages: write, id-token: write if using OIDC) instead of relying on defaults
so the job only has the scopes it needs.
 	- name: Checkout source
 		uses: actions/checkout@v6
-	- uses: dorny/paths-filter@v3
+	- uses: dorny/paths-filter@v4
🧩 Analysis chain
🏁 Script executed:
find . -path ".github/workflows/ci-pr-checks.yaml" -type f
Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 66
🏁 Script executed:
cat -n .github/workflows/ci-pr-checks.yaml | head -30
Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 870
🏁 Script executed:
grep -n "uses:" .github/workflows/ci-pr-checks.yaml
Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 331
Pin dorny/paths-filter by immutable commit SHA.
Line 19 uses tag pinning (@v4). A compromised or retagged upstream action can execute arbitrary code in CI with access to repository-scoped credentials (CWE-829, CWE-494).
Remediation
- - uses: dorny/paths-filter@v4
+ - uses: dorny/paths-filter@<full_commit_sha>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/ci-pr-checks.yaml at line 19, Replace the tag pinning for
the GitHub Action "dorny/paths-filter" (the uses: dorny/paths-filter@v4 entry)
with an immutable commit SHA to prevent supply-chain risks; locate the
dorny/paths-filter repo, pick the commit SHA corresponding to the v4 release you
want, and update the uses line to use that SHA (e.g., uses:
dorny/paths-filter@<commit-sha>), then run the workflow locally or in a safe
branch to verify it behaves the same and document the chosen SHA for future
updates.
 	if err := shutdownTracing(ctx); err != nil {
 		ctrl.Log.Error(err, "Failed to shutdown tracing")
 	}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate tracing implementation
fd tracing.go pkg
# Inspect InitTracing and shutdown wiring
rg -n -C4 'func InitTracing|Shutdown\(|TracerProvider|otlptrace|return .*shutdown' pkg/telemetry
Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 3517
🏁 Script executed:
cat -n pkg/telemetry/tracing.go
Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 5311
🏁 Script executed:
# Verify main.go context flow to confirm the shutdown timing issue
cd cmd/epp && head -60 main.go | tail -15
Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 501
🏁 Script executed:
# Check the full main.go to see how ctx is initialized and where SetupSignalHandler is called
head -60 cmd/epp/main.go
Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 1849
Use a fresh timeout context for tracing shutdown instead of the signal-canceled context.
Line 55 calls shutdownTracing(ctx) with the context from ctrl.SetupSignalHandler(). When the process receives a termination signal, that context is already canceled, causing tp.Shutdown() to return immediately without flushing pending spans. The batch processor and OTLP exporter will not complete their operations, resulting in lost trace data.
🛠️ Proposed change
import (
+ "context"
"os"
+ "time"
@@
if shutdownTracing != nil {
defer func() {
- if err := shutdownTracing(ctx); err != nil {
+ shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ defer cancel()
+ if err := shutdownTracing(shutdownCtx); err != nil {
ctrl.Log.Error(err, "Failed to shutdown tracing")
}
}()
 	}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := shutdownTracing(shutdownCtx); err != nil {
		ctrl.Log.Error(err, "Failed to shutdown tracing")
	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/epp/main.go` around lines 53 - 55, The shutdown is using the
signal-canceled ctx from ctrl.SetupSignalHandler(), so tracing shutdown
(shutdownTracing(ctx)) runs with an already-canceled context and returns
immediately; fix by creating and passing a fresh timeout context (e.g.,
context.WithTimeout(context.Background(), <short-duration>)) to shutdownTracing
instead of the signal context, ensure you defer the cancel, and keep
shutdownTracing's call to tp.Shutdown using that fresh timeout context so the
batch processor/OTLP exporter have time to flush.
 - [Kubernetes in Docker (KIND)]
 - [Kustomize]
 - [ZeroMQ]
 - [Kubectl] `v1.14`+
🧩 Analysis chain
🌐 Web query:
What is the official kubectl version skew policy relative to the Kubernetes API server, and what minimum kubectl version is appropriate for clusters using Gateway API v1.3-era tooling?
💡 Result:
The official kubectl version skew policy relative to the Kubernetes API server is that kubectl is supported within one minor version older or newer than the kube-apiserver. For example, if the kube-apiserver is at v1.35, kubectl versions v1.34, v1.35, and v1.36 are supported. In highly-available clusters with version skew between kube-apiserver instances (e.g., v1.34 and v1.35), the supported kubectl versions are narrowed to those within one minor version of all apiservers (e.g., only v1.34 and v1.35). For clusters using Gateway API v1.3-era tooling, there is no specific minimum kubectl version beyond the general skew policy, as Gateway API is provided via CRDs (not built-in Kubernetes APIs) and works on Kubernetes v1.26+. Implementations like kgateway (conformant to v1.3) support Kubernetes v1.27+, so kubectl within one minor version of the cluster's API server (e.g., minimum matching the cluster version, such as v1.27 for a v1.27+ cluster) is appropriate. No sources indicate additional kubectl requirements for Gateway API v1.3.
Citations:
- 1: https://kubernetes.io/releases/version-skew-policy/
- 2: https://kubernetes.io/docs/setup/release/version-skew-policy/
- 3: https://kubernetes.io/docs/concepts/services-networking/gateway/
- 4: https://gateway-api.sigs.k8s.io/
- 5: https://kubernetes.io/blog/2025/06/02/gateway-api-v1-3/
- 6: https://kubernetes.io/blog/2023/10/31/gateway-api-ga/
- 7: https://v1-34.docs.kubernetes.io/blog/2025/11/06/gateway-api-v1-4/
- 8: https://gateway-api.sigs.k8s.io/concepts/versioning/
🏁 Script executed:
# Get file size first
wc -l DEVELOPMENT.md
Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 102
🏁 Script executed:
# Read the relevant sections around lines 11 and 81
cat -n DEVELOPMENT.md | head -30
Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 1342
🏁 Script executed:
# Also check around line 81
cat -n DEVELOPMENT.md | sed -n '75,95p'
Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 1017
🏁 Script executed:
# Check for go.mod to see Kubernetes dependencies
head -50 go.mod
Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 2031
🏁 Script executed:
# Check Makefile or other version indicators
grep -i "kubernetes\|kind\|kube" Makefile 2>/dev/null | head -20
Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 649
🏁 Script executed:
# Look for any version constants or documentation about cluster versions
grep -r "kubernetes.*version\|kind.*version" . --include="*.md" --include="*.go" | head -10
Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 66
Update kubectl minimum to v1.34+; v1.14 is 21 minor versions behind the project target.
Line 11 specifies v1.14+, but the project depends on Kubernetes v1.35 (k8s.io/* v0.35.2). Per the official version-skew policy, kubectl must be within one minor version of kube-apiserver. For v1.35 clusters, only kubectl v1.34–v1.36 are supported. Setting v1.14+ will cause immediate setup failures.
Additionally, line 81 uses @latest in go install sigs.k8s.io/cloud-provider-kind@latest, which is non-reproducible and introduces supply-chain risk. Pin to a specific version (e.g., v0.5.0 or the version matching the deployed cloud-provider-kind release).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@DEVELOPMENT.md` at line 11, Update the DEVELOPMENT.md entries to require a
modern, compatible kubectl by changing the kubectl minimum from "v1.14+" to
"v1.34+" (so it’s within one minor of the project’s k8s v1.35 dependency) and
replace the non-reproducible "go install sigs.k8s.io/cloud-provider-kind@latest"
reference with a pinned release (e.g., "@v0.5.0" or the exact
cloud-provider-kind version that matches your deployed release) so the document
and bootstrap commands reflect supported kubectl versions and a reproducible
cloud-provider-kind install.
+	tracer := telemetry.Tracer()
+	ctx := r.Context()
+
+	ctx, prefillSpan := tracer.Start(ctx, "llm_d.pd_proxy.prefill",
+		trace.WithSpanKind(trace.SpanKindInternal),
+	)
+	prefillSpan.SetAttributes(
+		attribute.String("llm_d.pd_proxy.request_id", uuidStr),
+		attribute.String("llm_d.pd_proxy.prefill_target", prefillPodHostPort),
+		attribute.String("llm_d.pd_proxy.connector", "nixlv2"),
+	)
+	prefillStart := time.Now()
+
 	// 1. Prepare prefill request
-	ctx := r.Context()
 	preq := r.Clone(ctx)
Fix the span lifecycle before this ships.
ctx is overwritten with the prefill span context at Line 68, then reused at Line 177 to start decodeSpan after prefillSpan has already ended. That makes decodeSpan a child of an ended span, and the early returns between span creation and the first explicit End() still leave prefillSpan open. Keep the original request context as the parent for both stages and defer prefillSpan.End() immediately after Start().
Suggested fix
- ctx := r.Context()
-
- ctx, prefillSpan := tracer.Start(ctx, "llm_d.pd_proxy.prefill",
+ requestCtx := r.Context()
+
+ prefillCtx, prefillSpan := tracer.Start(requestCtx, "llm_d.pd_proxy.prefill",
trace.WithSpanKind(trace.SpanKindInternal),
)
+ defer prefillSpan.End()
prefillSpan.SetAttributes(
attribute.String("llm_d.pd_proxy.request_id", uuidStr),
attribute.String("llm_d.pd_proxy.prefill_target", prefillPodHostPort),
attribute.String("llm_d.pd_proxy.connector", "nixlv2"),
)
@@
- preq := r.Clone(ctx)
+ preq := r.Clone(prefillCtx)
@@
- ctx, decodeSpan := tracer.Start(ctx, "llm_d.pd_proxy.decode",
+ decodeCtx, decodeSpan := tracer.Start(requestCtx, "llm_d.pd_proxy.decode",
trace.WithSpanKind(trace.SpanKindInternal),
)
defer decodeSpan.End()
@@
- dreq := r.Clone(ctx)
-	dreq := r.Clone(ctx)
+	dreq := r.Clone(decodeCtx)
Also applies to: 177-190
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/sidecar/proxy/connector_nixlv2.go` around lines 65 - 80, The prefill span
lifecycle is wrong: the code overwrites the original request context (ctx :=
r.Context()) with the span context from tracer.Start, causing decodeSpan to
become a child of an ended span and leaving prefillSpan open on early returns;
fix by preserving the original request context (save origCtx := r.Context()),
call tracer.Start using origCtx but assign the returned span context to a new
variable (e.g., spanCtx, prefillSpan := tracer.Start(origCtx, ...)), immediately
defer prefillSpan.End() after Start(), use spanCtx for operations tied to the
prefill span (like cloning the request with r.Clone(spanCtx)), and when starting
decodeSpan later use origCtx (or another appropriate parent) so decodeSpan is
not a child of an already-ended prefillSpan; also ensure any early returns
before the explicit End() still allow the deferred End() to run.
 	if isHTTPError(pw.statusCode) {
-		s.logger.Error(err, "request failed", "code", pw.statusCode)
-		w.WriteHeader(pw.statusCode)
+		s.logger.Error(err, "request failed", "code", pw.statusCode, "body", pw.buffer.String())
+		prefillSpan.SetStatus(codes.Error, "prefill request failed")
+		prefillSpan.End()
+
+		if shouldFallbackToDecode(pw) {
+			s.logger.Info("fallback to decode", "request_id", uuidStr)
+			r.Body = io.NopCloser(strings.NewReader(string(original)))
+			s.decoderProxy.ServeHTTP(w, r)
+		} else {
+			for key, values := range pw.Header() {
+				for _, v := range values {
+					w.Header().Add(key, v)
+				}
+			}
+			w.WriteHeader(pw.statusCode)
+			_, err := w.Write([]byte(pw.buffer.String()))
+			if err != nil {
+				s.logger.Error(err, "failed to send error response to client")
Stop logging raw prefiller error bodies.
Line 133 writes the full upstream error payload into pod logs. In this path the backend can echo prompts, token IDs, or connector details, so this is a CWE-532 sensitive-data leak. err is also stale here, so the log entry is not even accurate. Log request_id and status code only, or a bounded/redacted reason.
Suggested fix
- s.logger.Error(err, "request failed", "code", pw.statusCode, "body", pw.buffer.String())
+ s.logger.Info("prefill request failed", "request_id", uuidStr, "code", pw.statusCode)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
	if isHTTPError(pw.statusCode) {
		s.logger.Info("prefill request failed", "request_id", uuidStr, "code", pw.statusCode)
		prefillSpan.SetStatus(codes.Error, "prefill request failed")
		prefillSpan.End()

		if shouldFallbackToDecode(pw) {
			s.logger.Info("fallback to decode", "request_id", uuidStr)
			r.Body = io.NopCloser(strings.NewReader(string(original)))
			s.decoderProxy.ServeHTTP(w, r)
		} else {
			for key, values := range pw.Header() {
				for _, v := range values {
					w.Header().Add(key, v)
				}
			}
			w.WriteHeader(pw.statusCode)
			_, err := w.Write([]byte(pw.buffer.String()))
			if err != nil {
				s.logger.Error(err, "failed to send error response to client")
🤖 Prompt for AI Agents

```text
Verify each finding against the current code and only fix it if needed.
In `@pkg/sidecar/proxy/connector_nixlv2.go` around lines 132-150: the error
log in the HTTP-error branch uses a stale `err` and prints the full upstream
response body (pw.buffer.String()), risking sensitive-data leaks. Update the
s.logger.Error call in the isHTTPError branch (the block handling pw,
prefillSpan, shouldFallbackToDecode, and uuidStr) so it does not log the raw
body and uses a correct error variable (or nil); emit only non-sensitive
fields such as "request_id" (uuidStr) and "code" (pw.statusCode), or a
bounded/redacted reason, instead of pw.buffer.String().
```
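If a short diagnostic hint from the body is still wanted, bounding it is one option. A minimal sketch of the "bounded/redacted reason" idea (`truncateForLog` is a hypothetical helper, not part of this PR):

```go
package main

import "fmt"

// truncateForLog bounds a potentially sensitive upstream body so a log line
// can carry a short diagnostic hint without echoing full prompts, token IDs,
// or connector details. maxLen is the maximum number of bytes to keep.
func truncateForLog(body string, maxLen int) string {
	if len(body) <= maxLen {
		return body
	}
	return body[:maxLen] + "...(truncated)"
}

func main() {
	fmt.Println(truncateForLog("short", 32))
	fmt.Println(truncateForLog("a very long upstream error payload that might echo the prompt", 16))
}
```

Even with truncation, structured fields like `request_id` and `code` remain the safest default; the bounded body is only worth logging at a debug level, if at all.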
```go
// Create OTLP trace exporter
exporter, err := otlptracegrpc.New(ctx,
	otlptracegrpc.WithEndpoint(endpoint),
	otlptracegrpc.WithInsecure(), // Use WithTLSCredentials() in production
)
if err != nil {
	return nil, fmt.Errorf("failed to create OTLP trace exporter: %w", err)
}
```
Hardcoded insecure gRPC connection for tracing (CWE-319: Cleartext Transmission of Sensitive Information).
`WithInsecure()` forces unencrypted transmission of trace data. The comment acknowledges this, but production deployments require TLS. Add an environment-based toggle (e.g., `OTEL_EXPORTER_OTLP_INSECURE`) so TLS is the default.
Suggested fix

```diff
+// Check if insecure mode is explicitly enabled
+insecure := os.Getenv("OTEL_EXPORTER_OTLP_INSECURE") == "true"
+
+opts := []otlptracegrpc.Option{
+	otlptracegrpc.WithEndpoint(endpoint),
+}
+if insecure {
+	opts = append(opts, otlptracegrpc.WithInsecure())
+}
+
 // Create OTLP trace exporter
-exporter, err := otlptracegrpc.New(ctx,
-	otlptracegrpc.WithEndpoint(endpoint),
-	otlptracegrpc.WithInsecure(), // Use WithTLSCredentials() in production
-)
+exporter, err := otlptracegrpc.New(ctx, opts...)
```

🤖 Prompt for AI Agents
```text
Verify each finding against the current code and only fix it if needed.
In `@pkg/telemetry/tracing.go` around lines 72-79: the call to
otlptracegrpc.New unconditionally uses otlptracegrpc.WithInsecure(), sending
traces unencrypted. Change this to honor an environment toggle
(OTEL_EXPORTER_OTLP_INSECURE) so TLS is used by default in production. Read
OTEL_EXPORTER_OTLP_INSECURE as a boolean before calling otlptracegrpc.New;
if true, pass otlptracegrpc.WithInsecure(), otherwise build proper TLS
transport credentials (e.g., credentials.NewClientTLSFromCert or
grpc/credentials) and pass them via otlptracegrpc.WithTLSCredentials(...).
Update the exporter creation around the otlptracegrpc.New call so the choice
of WithInsecure vs WithTLSCredentials is driven by the env var.
```
```bash
# Exit on error
set -e
```
Use `set -euo pipefail` for robust error handling.
CWE-391 (Unchecked Error Condition): using only `set -e` misses undefined-variable errors (`-u`) and pipeline failures (`-o pipefail`). The `git cat-file | grep` pipelines on lines 23 and 28 will silently succeed even if `git cat-file` fails.
Proposed fix

```diff
-# Exit on error
-set -e
+# Exit on error, undefined variables, and pipeline failures
+set -euo pipefail
```

As per coding guidelines: "Use 'set -euo pipefail' at script start".
📝 Committable suggestion
```diff
-# Exit on error
-set -e
+# Exit on error, undefined variables, and pipeline failures
+set -euo pipefail
```
🤖 Prompt for AI Agents

```text
Verify each finding against the current code and only fix it if needed.
In `@scripts/check-commits.sh` around lines 3-4: the script currently uses
only "set -e", which misses undefined-variable errors and pipeline failures.
Replace it with "set -euo pipefail" at the top where set -e is declared, and
confirm that the pipelines (e.g., the "git cat-file | grep" pipelines) now
fail fast on errors; update the header comment accordingly so undefined
variables (-u) and pipeline failures (-o pipefail) are detected.
```
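To see the masking concretely, here is a small self-contained demonstration, using `false | cat` as a stand-in for a failing `git cat-file | grep` pipeline (not taken from the script itself):

```shell
#!/usr/bin/env bash
# Under plain `set -e`, a pipeline's exit status is that of its LAST command,
# so a failure on the left-hand side is masked when the right side succeeds.
set -e
false | cat
echo "set -e alone: left-side failure was masked, script kept going"

# With pipefail, any failing stage makes the whole pipeline fail.
set -o pipefail
if false | cat; then
    echo "unreached"
else
    echo "pipefail: the same pipeline now reports failure"
fi
```

Note that `grep` pipelines need extra care even with `pipefail`: `grep` exits 1 on "no match", which then also aborts the script, so intentional no-match cases must be handled (e.g., `|| true`).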
```yaml
  - "inference.networking.x-k8s.io"
  resources:
  - "inferenceobjectives"
  - "inferencemodelrewrites"
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Verify whether e2e code really needs list/watch on inferencemodelrewrites.
# Expected: if only direct reads by name exist, keep only `get` + resourceNames.
rg -n -C3 'inferencemodelrewrites|InferenceModelRewrite|modelrewrite' .
rg -n -C3 'watch|list' . | rg -n 'inferencemodelrewrites|InferenceModelRewrite'
rg -n -C3 'serviceAccountName:\s*e2e-epp|name:\s*e2e-epp' .
```

Repository: opendatahub-io/llm-d-inference-scheduler
Length of output: 4279
Restrict `inferencemodelrewrites` to least privilege; the `list`/`watch` verbs are unused in the codebase (CWE-269, CWE-200).
Line 10 grants `e2e-epp` namespace-wide `get`/`list`/`watch` on `inferencemodelrewrites`. Severity: Major. The codebase contains no evidence of `list` or `watch` operations on this resource type; reduce to `get` only, scoped to explicit `resourceNames` where applicable. This tightens the blast radius if the ServiceAccount is compromised.
Remediation diff

```diff
 - apiGroups:
   - "inference.networking.x-k8s.io"
   resources:
   - "inferenceobjectives"
-  - "inferencemodelrewrites"
   verbs:
   - "get"
   - "watch"
   - "list"
+
+- apiGroups:
+  - "inference.networking.x-k8s.io"
+  resources:
+  - "inferencemodelrewrites"
+  resourceNames:
+  - "<specific-rewrite-name>"
+  verbs:
+  - "get"
```

Note: the same pattern appears in the production config at deploy/components/inference-gateway/rbac.yaml:11; apply the same fix there.
🤖 Prompt for AI Agents

```text
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/yaml/rbac.yaml` at line 10: the rule granting verbs
["get","list","watch"] on the inferencemodelrewrites resource should be
reduced to least privilege. Remove "list" and "watch" so only "get" is
allowed, and where the role is intended to target specific objects, add a
resourceNames entry listing the explicit inferencemodelrewrites resource
names to scope access. Update the entry referencing inferencemodelrewrites
in this YAML, and make the identical change in the production file that also
references inferencemodelrewrites (deploy/components/inference-gateway/rbac.yaml)
so that both dev and prod use only "get" with optional resourceNames scoping.
```
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Gregory-Pereira, zdtsw

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details
Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
Cannot approve the pull request: Error: openshift-ci[bot] is not included in the approvers role in the OWNERS file |
Dupe of #129, but opened from a non-main branch so I can force-push to it to resolve any merge conflicts.
Summary by CodeRabbit
New Features
Improvements