42 commits
a0c8d17 Fix panic in SGLang proxy handling of concurrent requests (#632) · yangligt2, Feb 19, 2026
1519a28 Add otel tracing instrumentation (#506) · sallyom, Feb 25, 2026
ed55c9c bump kvc import to v0.5.1-rc2 (#657) · vMaroon, Feb 27, 2026
1dae683 pull in v0.6.0 of kvcache (#660) · Gregory-Pereira, Feb 28, 2026
f278bfe deps(go): bump go.opentelemetry.io/otel/sdk from 1.39.0 to 1.40.0 (#661) · dependabot[bot], Feb 28, 2026
9fe0948 deps(go): bump the go-dependencies group across 1 directory with 2 up… · dependabot[bot], Feb 28, 2026
416d4a0 fix(docs): Updatede development guide (#666) · gyliu513, Mar 1, 2026
8437ae1 Optimized request prefill error messages (#652) · learner0810, Mar 1, 2026
f21dcac fix(makefile): use shell variable for kv-cache path in UDS tokenizer … · gyliu513, Mar 1, 2026
dbb0b5d fix: remove kustomize dependency (#665) · gyliu513, Mar 1, 2026
70a3fd0 deps(go): bump the kubernetes group with 5 updates (#673) · dependabot[bot], Mar 2, 2026
db7a7ba deps(go): bump the go-dependencies group across 1 directory with 5 up… · dependabot[bot], Mar 3, 2026
0ccf2ed deps(actions): bump lycheeverse/lychee-action from 2.7.0 to 2.8.0 (#671) · dependabot[bot], Mar 3, 2026
9903474 ci: add dev image workflow for main and release branches (#668) · pierDipi, Mar 3, 2026
a4f9d45 deps(actions): bump crate-ci/typos from 1.43.5 to 1.44.0 (#670) · dependabot[bot], Mar 3, 2026
c34ad8f fix(ci): update Trivy to v0.69.2 (#675) · pierDipi, Mar 4, 2026
c910eeb Allow sidecar server to reload TLS certificates (#607) · pierDipi, Mar 4, 2026
091312c use trivy action for scanning (#688) · elevran, Mar 9, 2026
da0d089 deps(go): bump the go-dependencies group with 7 updates (#692) · dependabot[bot], Mar 10, 2026
927052d feat(sidecar): simplify TLS command line options with StringSlice fla… · gyliu513, Mar 10, 2026
bd3ba8c fix terminolgy and add links (#695) · elevran, Mar 10, 2026
3ce43fe replace map[string]bool with map[string]struct{} (#696) · roytman, Mar 10, 2026
12c2dd7 add make targets for presubmit (#687) · elevran, Mar 10, 2026
7675f18 run newer version with explicit auth tokens (#698) · elevran, Mar 10, 2026
f5a626e remove extra trivy params (#702) · elevran, Mar 10, 2026
8112a3a fix: simplify InferencePool flag to namespace/name format (#685) · gyliu513, Mar 10, 2026
9083ec0 Trivy complains of user without password (#704) · elevran, Mar 11, 2026
0f30fa6 fix(test): Add unit test for pd_prerequest.go (#706) · gyliu513, Mar 11, 2026
700325d remove trivy cache and enable workflow dispatch (#713) · elevran, Mar 12, 2026
be97ee1 initial E/PD extension of the sidecar (#643) · roytman, Mar 12, 2026
3e62967 Check for uniqueness of media URLs (#717) · roytman, Mar 15, 2026
e0f7b8d move typo checking from tools makefile to main, under lint (#719) · elevran, Mar 15, 2026
d1a19ef rename EncoderPodsHeader according to other constants (#721) · roytman, Mar 16, 2026
2078503 Implement Options pattern for sidecar proxy (#697) · Mohamedma96, Mar 16, 2026
fb7e3af rename common constants (#722) · roytman, Mar 16, 2026
b9a4a82 deps(actions): bump dorny/paths-filter from 3 to 4 (#723) · dependabot[bot], Mar 17, 2026
2d38fc1 enable major version updates to gh actions (#714) · elevran, Mar 17, 2026
e8e709d NonCachedTokens defines the minimum number of non-cached tokens requi… · modassarrana89-new, Mar 17, 2026
89cbbbb Add external tokenizer PrepareData plugin and TokenizedPrompt scorer … · acardace, Mar 17, 2026
37fac64 Deprecate the workaround used to support vLLM Data Parallel on Istio … · shmuelk, Mar 18, 2026
63914ae build: remove CGO dependency by migrating to pure-Go ZMQ (#728) · elevran, Mar 18, 2026
a238100 Merge branch 'main' into sync-upstream-to-odh-march-18 · Gregory-Pereira, Mar 18, 2026
2 changes: 1 addition & 1 deletion .github/actions/docker-build-and-push/action.yml
@@ -27,7 +27,7 @@ runs:
using: "composite"
steps:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
uses: docker/setup-buildx-action@v4
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n .github/actions/docker-build-and-push/action.yml | head -50

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 1833


🏁 Script executed:

wc -l .github/actions/docker-build-and-push/action.yml

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 135


🏁 Script executed:

cat -n .github/actions/docker-build-and-push/action.yml | tail -25

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 1171


Pin docker/setup-buildx-action to a full commit SHA.

Line 30 pins the action to a mutable tag. Supply-chain risk: if the tag is moved or the action is compromised, attacker-controlled code runs in the build pipeline and can exfiltrate registry credentials (CWE-829, CWE-494).

Remediation
-      uses: docker/setup-buildx-action@v4
+      uses: docker/setup-buildx-action@<full_commit_sha>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/actions/docker-build-and-push/action.yml at line 30, replace the
mutable tag reference uses: docker/setup-buildx-action@v4 with the action's full
commit SHA (obtain the canonical SHA from the action's GitHub releases or commit
history), so the workflow runs an immutable revision and is protected against
supply-chain tampering.
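One way to find the SHA for the remediation above is to read it from `git ls-remote`. The sketch below is a hypothetical helper (`pick_commit_sha` is not part of this repository). It assumes the standard `git ls-remote --tags` output of `<sha><TAB><ref>` lines, in which an annotated tag appears twice and the `^{}` entry is the peeled commit that a `uses:` line should reference.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pick_commit_sha TAG
# Reads `git ls-remote --tags <repo-url>` output on stdin and prints the commit
# SHA that TAG points to. Annotated tags are listed twice: once as the tag
# object ("refs/tags/TAG") and once peeled to the commit ("refs/tags/TAG^{}").
# The peeled commit is preferred; lightweight tags only have the direct entry.
pick_commit_sha() {
  local tag="$1" sha ref direct="" peeled=""
  while IFS=$'\t' read -r sha ref; do
    case "$ref" in
      "refs/tags/${tag}^{}") peeled="$sha" ;;
      "refs/tags/${tag}") direct="$sha" ;;
    esac
  done
  if [ -n "$peeled" ]; then
    echo "$peeled"
  elif [ -n "$direct" ]; then
    echo "$direct"
  else
    echo "tag not found: ${tag}" >&2
    return 1
  fi
}
```

For example, `git ls-remote --tags https://github.com/docker/setup-buildx-action | pick_commit_sha v4` prints the commit that `v4` points to at the moment you run it. Major tags such as `v4` are typically moved between releases, so resolving an exact release tag gives a more reproducible pin. Keeping the tag in a trailing comment (`uses: docker/setup-buildx-action@<sha> # v4`) preserves the human-readable version.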


- name: Login to GitHub Container Registry
run: echo "${{ inputs.github-token }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
22 changes: 10 additions & 12 deletions .github/actions/trivy-scan/action.yml
@@ -1,19 +1,17 @@
name: Trivy Scan
description: Scan container image with Trivy
description: Scan container image with official Aqua Security Trivy action
inputs:
image:
required: true
description: "Image to scan (e.g., 'my-repo/my-image:latest')"

runs:
using: "composite"
steps:
- name: Install Trivy
run: |
wget https://github.com/aquasecurity/trivy/releases/download/v0.44.1/trivy_0.44.1_Linux-64bit.deb
sudo dpkg -i trivy_0.44.1_Linux-64bit.deb
shell: bash


- name: Scan image
run: |
trivy image --severity HIGH,CRITICAL --no-progress ${{ inputs.image }}
shell: bash
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@0.35.0
with:
image-ref: ${{ inputs.image }}
format: 'table'
severity: 'HIGH,CRITICAL'
exit-code: '1'
11 changes: 8 additions & 3 deletions .github/dependabot.yml
@@ -24,9 +24,9 @@ updates:
update-types: ["version-update:semver-major", "version-update:semver-minor"]
- dependency-name: "sigs.k8s.io/*"
update-types: ["version-update:semver-major", "version-update:semver-minor"]
# Ignore major updates for all packages
# Ignore major updates for all Go packages
- dependency-name: "*"
update-types: ["version-update:semver-major"]
update-types: ["version-update:semver-major"]
groups:
go-dependencies:
patterns:
@@ -46,8 +46,13 @@ updates:
- "release-note-none"
commit-message:
prefix: "deps(actions)"
# No "ignore" block here: This allows major version updates
groups:
github-actions:
patterns:
- "*"

# 3. Docker base image updates (e.g., for Dockerfile FROM lines)
# 3. Docker base image updates
- package-ecosystem: "docker"
directory: "/"
schedule:
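The `update-types` values in the ignore rules above refer to which component of a semantic version changes. As a rough sketch (the `update_type` helper is hypothetical and only handles plain `X.Y.Z` versions with an optional leading `v`), the bumps in this PR classify as follows: `dorny/paths-filter` 3 → 4 is a major update, which the `github-actions` ecosystem now receives because it has no `ignore` block, while `crate-ci/typos` 1.43.5 → 1.44.0 is a minor update.

```bash
#!/usr/bin/env bash
# Hypothetical helper: update_type OLD NEW
# Prints the Dependabot update-type name for a bump from OLD to NEW.
# Only plain X.Y.Z versions (optionally prefixed with "v") are handled;
# missing components are treated as 0, so "3" compares like "3.0.0".
update_type() {
  local old="${1#v}" new="${2#v}"
  local o_major o_minor o_patch n_major n_minor n_patch
  IFS=. read -r o_major o_minor o_patch <<< "$old"
  IFS=. read -r n_major n_minor n_patch <<< "$new"
  if [ "${o_major:-0}" != "${n_major:-0}" ]; then
    echo "version-update:semver-major"
  elif [ "${o_minor:-0}" != "${n_minor:-0}" ]; then
    echo "version-update:semver-minor"
  elif [ "${o_patch:-0}" != "${n_patch:-0}" ]; then
    echo "version-update:semver-patch"
  else
    echo "unchanged"
  fi
}
```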
2 changes: 1 addition & 1 deletion .github/workflows/check-typos.yaml
@@ -13,5 +13,5 @@ jobs:
uses: actions/checkout@v6

- name: Check typos
uses: crate-ci/typos@v1.43.5
uses: crate-ci/typos@v1.44.0

62 changes: 62 additions & 0 deletions .github/workflows/ci-build-images.yaml
@@ -0,0 +1,62 @@
name: Build and Push Container Images

on:
workflow_call:
inputs:
epp-image-name:
required: true
type: string
sidecar-image-name:
required: true
type: string
tag:
required: true
type: string
prerelease:
required: true
type: string
secrets:
GHCR_TOKEN:
required: true

jobs:
docker-build-and-push:
runs-on: ubuntu-latest
Comment on lines +22 to +24
🛠️ Refactor suggestion | 🟠 Major

Add explicit permissions block with least-privilege scope.

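No permissions block is defined, so the job runs with whatever GITHUB_TOKEN permissions it inherits (the repository default, or the caller's when invoked via workflow_call), which may be broader than it needs. Define the minimal permissions this job requires (e.g., contents: read, packages: write).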

Suggested fix
 jobs:
   docker-build-and-push:
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
     steps:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
jobs:
docker-build-and-push:
runs-on: ubuntu-latest
jobs:
docker-build-and-push:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci-build-images.yaml around lines 22 - 24, add a minimal
explicit permissions block to the GitHub Actions job "docker-build-and-push" to
enforce least privilege: include a permissions mapping in the job definition
(for example, contents: read and packages: write, plus id-token: write if using
OIDC) instead of relying on defaults, so the job only has the scopes it needs.

steps:
- name: Free Disk Space (Ubuntu)
uses: jlumbroso/free-disk-space@main
with:
tool-cache: false

- name: Checkout source
uses: actions/checkout@v6

- name: Build and push EPP image
uses: ./.github/actions/docker-build-and-push
with:
docker-file: Dockerfile.epp
tag: ${{ inputs.tag }}
image-name: ${{ inputs.epp-image-name }}
registry: ghcr.io/llm-d
github-token: ${{ secrets.GHCR_TOKEN }}
prerelease: ${{ inputs.prerelease }}

- name: Build and push sidecar image
uses: ./.github/actions/docker-build-and-push
with:
docker-file: Dockerfile.sidecar
tag: ${{ inputs.tag }}
image-name: ${{ inputs.sidecar-image-name }}
registry: ghcr.io/llm-d
github-token: ${{ secrets.GHCR_TOKEN }}
prerelease: ${{ inputs.prerelease }}

- name: Run Trivy scan on EPP image
uses: ./.github/actions/trivy-scan
with:
image: ghcr.io/llm-d/${{ inputs.epp-image-name }}:${{ inputs.tag }}

- name: Run Trivy scan on sidecar image
uses: ./.github/actions/trivy-scan
with:
image: ghcr.io/llm-d/${{ inputs.sidecar-image-name }}:${{ inputs.tag }}
39 changes: 39 additions & 0 deletions .github/workflows/ci-dev.yaml
@@ -0,0 +1,39 @@
name: CI - Dev - Docker Container Image

on:
push:
branches:
- main
- 'release-*'
workflow_dispatch:

jobs:
set-params:
runs-on: ubuntu-latest
outputs:
project_name: ${{ steps.version.outputs.project_name }}
sidecar_name: ${{ steps.version.outputs.sidecar_name }}
tag: ${{ steps.tag.outputs.tag }}
steps:
- name: Set image names
id: version
run: |
repo="${GITHUB_REPOSITORY##*/}"
echo "project_name=${repo}-dev" >> "$GITHUB_OUTPUT"
echo "sidecar_name=llm-d-routing-sidecar-dev" >> "$GITHUB_OUTPUT"

- name: Set branch name as tag
id: tag
run: |
echo "tag=${GITHUB_REF_NAME}" >> "$GITHUB_OUTPUT"

build-and-push:
needs: set-params
uses: ./.github/workflows/ci-build-images.yaml
with:
epp-image-name: ${{ needs.set-params.outputs.project_name }}
sidecar-image-name: ${{ needs.set-params.outputs.sidecar_name }}
tag: ${{ needs.set-params.outputs.tag }}
prerelease: "true"
secrets:
GHCR_TOKEN: ${{ secrets.GHCR_TOKEN }}
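The `set-params` job derives its outputs with plain shell parameter expansion: `${GITHUB_REPOSITORY##*/}` removes the longest prefix ending in `/`, leaving only the repository name. The function below is a hypothetical local re-creation of those two steps (`dev_image_params` is not part of the workflow). It is useful for checking which image names and tag a given push would produce.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring ci-dev.yaml's set-params job.
# dev_image_params REPOSITORY REF_NAME prints the key=value lines that the
# workflow appends to $GITHUB_OUTPUT.
dev_image_params() {
  local repository="$1" ref_name="$2"
  # "owner/name" -> "name": strip the longest prefix that ends in "/".
  local repo="${repository##*/}"
  echo "project_name=${repo}-dev"
  echo "sidecar_name=llm-d-routing-sidecar-dev"
  # The branch name is used unchanged as the image tag.
  echo "tag=${ref_name}"
}
```

For a push to `main` in this repository, `dev_image_params opendatahub-io/llm-d-inference-scheduler main` yields `project_name=llm-d-inference-scheduler-dev` and `tag=main`, so `ci-build-images.yaml` pushes `ghcr.io/llm-d/llm-d-inference-scheduler-dev:main` and `ghcr.io/llm-d/llm-d-routing-sidecar-dev:main`. Because the tag is the branch name, every push to the same branch re-points the same tag.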
14 changes: 2 additions & 12 deletions .github/workflows/ci-pr-checks.yaml
@@ -16,7 +16,7 @@ jobs:
steps:
- name: Checkout source
uses: actions/checkout@v6
- uses: dorny/paths-filter@v3
- uses: dorny/paths-filter@v4
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -path ".github/workflows/ci-pr-checks.yaml" -type f

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 66


🏁 Script executed:

cat -n .github/workflows/ci-pr-checks.yaml | head -30

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 870


🏁 Script executed:

grep -n "uses:" .github/workflows/ci-pr-checks.yaml

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 331


Pin dorny/paths-filter by immutable commit SHA.

Line 19 uses tag pinning (@v4). A compromised or retagged upstream action can execute arbitrary code in CI with access to repository-scoped credentials (CWE-829, CWE-494).

Remediation
-      - uses: dorny/paths-filter@v4
+      - uses: dorny/paths-filter@<full_commit_sha>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci-pr-checks.yaml at line 19, Replace the tag pinning for
the GitHub Action "dorny/paths-filter" (the uses: dorny/paths-filter@v4 entry)
with an immutable commit SHA to prevent supply-chain risks; locate the
dorny/paths-filter repo, pick the commit SHA corresponding to the v4 release you
want, and update the uses line to use that SHA (e.g., uses:
dorny/paths-filter@<commit-sha>), then run the workflow locally or in a safe
branch to verify it behaves the same and document the chosen SHA for future
updates.
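This finding and the docker/setup-buildx-action finding can both be checked mechanically. The following is a minimal sketch, assuming references appear on single `uses:` lines as they do in this repository's workflow files. `list_unpinned` is a hypothetical helper, not an existing tool. It does not parse YAML, so multi-line or templated values are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list_unpinned FILE...
# Prints "file:line: ref" for every `uses:` reference that is not pinned to a
# full 40-character commit SHA. Local actions ("./...") and container images
# ("docker://...") are skipped. This is a line-based check, not a YAML parser.
list_unpinned() {
  local uses_re='^[[:space:]]*(-[[:space:]]+)?uses:[[:space:]]*([^[:space:]#]+)'
  local sha_re='^[0-9a-f]{40}$'
  local file lineno line ref
  for file in "$@"; do
    lineno=0
    while IFS= read -r line || [ -n "$line" ]; do
      lineno=$((lineno + 1))
      [[ $line =~ $uses_re ]] || continue
      ref=${BASH_REMATCH[2]}
      ref=${ref//\"/}   # drop double quotes, if any
      ref=${ref//\'/}   # drop single quotes, if any
      case "$ref" in
        ./*|docker://*) continue ;;
      esac
      # Everything after the last "@" must be a full commit SHA.
      if [[ ! ${ref##*@} =~ $sha_re ]]; then
        echo "${file}:${lineno}: ${ref}"
      fi
    done < "$file"
  done
}
```

Running `list_unpinned .github/workflows/*.yaml .github/workflows/*.yml .github/actions/*/action.yml` against the files in this PR would flag `dorny/paths-filter@v4` and `aquasecurity/trivy-action@0.35.0`, among others.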

id: filter
with:
filters: |
@@ -52,25 +52,15 @@ jobs:
go-version: "${{ env.GO_VERSION }}"
cache-dependency-path: ./go.sum

- name: Configure CGO for ZMQ
run: |
echo "CGO_ENABLED=1" >> $GITHUB_ENV
echo "PKG_CONFIG_PATH=/usr/lib/pkgconfig" >> $GITHUB_ENV

- name: Install dependencies
run: |
go mod tidy
sudo -E env "PATH=$PATH" make install-dependencies
run: go mod tidy

- name: Run lint checks
uses: golangci/golangci-lint-action@v9
with:
version: "v2.8.0"
args: "--config=./.golangci.yml"
skip-cache: true
env:
CGO_ENABLED: ${{ env.CGO_ENABLED }}
PKG_CONFIG_PATH: ${{ env.PKG_CONFIG_PATH }}

- name: Run make build
shell: bash
57 changes: 16 additions & 41 deletions .github/workflows/ci-release.yaml
@@ -8,27 +8,21 @@ on:
types: [published] # Also runs when a GitHub release is published

jobs:
docker-build-and-push:
set-params:
runs-on: ubuntu-latest
outputs:
project_name: ${{ steps.version.outputs.project_name }}
sidecar_name: ${{ steps.version.outputs.sidecar_name }}
tag: ${{ steps.tag.outputs.tag }}
prerelease: ${{ steps.tag.outputs.prerelease }}
steps:
- name: Free Disk Space (Ubuntu)
uses: jlumbroso/free-disk-space@main
with:
tool-cache: false

- name: Checkout source
uses: actions/checkout@v6

- name: Set image names
id: version
run: |
repo="${GITHUB_REPOSITORY##*/}"
echo "project_name=$repo" >> "$GITHUB_OUTPUT"
echo "sidecar_name=llm-d-routing-sidecar" >> "$GITHUB_OUTPUT"

- name: Print project name
run: echo "Project is ${{ steps.version.outputs.project_name }}"

- name: Determine tag name
id: tag
run: |
@@ -43,32 +37,13 @@ jobs:
echo "prerelease=${PRE_RELEASE}" >> "$GITHUB_OUTPUT"
shell: bash

- name: Build and push EPP image
uses: ./.github/actions/docker-build-and-push
with:
docker-file: Dockerfile.epp
tag: ${{ steps.tag.outputs.tag }}
image-name: ${{ steps.version.outputs.project_name }}
registry: ghcr.io/llm-d
github-token: ${{ secrets.GHCR_TOKEN }}
prerelease: ${{ steps.tag.outputs.prerelease }}

- name: Build and push sidecar image
uses: ./.github/actions/docker-build-and-push
with:
docker-file: Dockerfile.sidecar
tag: ${{ steps.tag.outputs.tag }}
image-name: ${{ steps.version.outputs.sidecar_name }}
registry: ghcr.io/llm-d
github-token: ${{ secrets.GHCR_TOKEN }}
prerelease: ${{ steps.tag.outputs.prerelease }}

- name: Run Trivy scan on EPP image
uses: ./.github/actions/trivy-scan
with:
image: ghcr.io/llm-d/${{ steps.version.outputs.project_name }}:${{ steps.tag.outputs.tag }}

- name: Run Trivy scan on sidecar image
uses: ./.github/actions/trivy-scan
with:
image: ghcr.io/llm-d/${{ steps.version.outputs.sidecar_name }}:${{ steps.tag.outputs.tag }}
build-and-push:
needs: set-params
uses: ./.github/workflows/ci-build-images.yaml
with:
epp-image-name: ${{ needs.set-params.outputs.project_name }}
sidecar-image-name: ${{ needs.set-params.outputs.sidecar_name }}
tag: ${{ needs.set-params.outputs.tag }}
prerelease: ${{ needs.set-params.outputs.prerelease }}
secrets:
GHCR_TOKEN: ${{ secrets.GHCR_TOKEN }}
2 changes: 1 addition & 1 deletion .github/workflows/md-link-check.yml
@@ -17,7 +17,7 @@ jobs:
uses: actions/checkout@v6

- name: Run lychee link checker
uses: lycheeverse/lychee-action@v2.7.0
uses: lycheeverse/lychee-action@v2.8.0
with:
args: '--config .lychee.toml --verbose --no-progress **/*.md'
fail: true
22 changes: 13 additions & 9 deletions DEVELOPMENT.md
@@ -8,16 +8,20 @@ Documentation for developing the inference scheduler.
- [Golang] `v1.24`+
- [Docker] (or [Podman])
- [Kubernetes in Docker (KIND)]
- [Kustomize]
- [ZeroMQ]
- [Kubectl] `v1.14`+
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

What is the official kubectl version skew policy relative to the Kubernetes API server, and what minimum kubectl version is appropriate for clusters using Gateway API v1.3-era tooling?

💡 Result:

The official kubectl version skew policy relative to the Kubernetes API server is that kubectl is supported within one minor version older or newer than the kube-apiserver. For example, if the kube-apiserver is at v1.35, kubectl versions v1.34, v1.35, and v1.36 are supported. In highly-available clusters with version skew between kube-apiserver instances (e.g., v1.34 and v1.35), the supported kubectl versions are narrowed to those within one minor version of all apiservers (e.g., only v1.34 and v1.35). For clusters using Gateway API v1.3-era tooling, there is no specific minimum kubectl version beyond the general skew policy, as Gateway API is provided via CRDs (not built-in Kubernetes APIs) and works on Kubernetes v1.26+. Implementations like kgateway (conformant to v1.3) support Kubernetes v1.27+, so kubectl within one minor version of the cluster's API server (e.g., minimum matching the cluster version, such as v1.27 for a v1.27+ cluster) is appropriate. No sources indicate additional kubectl requirements for Gateway API v1.3.


🏁 Script executed:

# Get file size first
wc -l DEVELOPMENT.md

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 102


🏁 Script executed:

# Read the relevant sections around lines 11 and 81
cat -n DEVELOPMENT.md | head -30

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 1342


🏁 Script executed:

# Also check around line 81
cat -n DEVELOPMENT.md | sed -n '75,95p'

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 1017


🏁 Script executed:

# Check for go.mod to see Kubernetes dependencies
head -50 go.mod

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 2031


🏁 Script executed:

# Check Makefile or other version indicators
grep -i "kubernetes\|kind\|kube" Makefile 2>/dev/null | head -20

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 649


🏁 Script executed:

# Look for any version constants or documentation about cluster versions
grep -r "kubernetes.*version\|kind.*version" . --include="*.md" --include="*.go" | head -10

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 66


Update kubectl minimum to v1.34+; v1.14 is 21 minor versions behind the project target.

Line 11 specifies v1.14+, but the project depends on Kubernetes v1.35 (k8s.io/* v0.35.2). Per the official version-skew policy, kubectl must be within one minor version of kube-apiserver. For v1.35 clusters, only kubectl v1.34–v1.36 are supported. Setting v1.14+ will cause immediate setup failures.

Additionally, line 81 uses @latest in go install sigs.k8s.io/cloud-provider-kind@latest, which is non-reproducible and introduces supply-chain risk. Pin to a specific version (e.g., v0.5.0 or the version matching the deployed cloud-provider-kind release).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DEVELOPMENT.md` at line 11, update the DEVELOPMENT.md entries to require a
modern, compatible kubectl by changing the kubectl minimum from "v1.14+" to
"v1.34+" (so it’s within one minor of the project’s k8s v1.35 dependency) and
replace the non-reproducible "go install sigs.k8s.io/cloud-provider-kind@latest"
reference with a pinned release (e.g., "@v0.5.0" or the exact
cloud-provider-kind version that matches your deployed release) so the document
and bootstrap commands reflect supported kubectl versions and a reproducible
cloud-provider-kind install.
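The skew rule this comment cites can be checked against a live cluster. The sketch below assumes `kubectl` and `jq` are installed (the guide already pipes output through `jq`), and `skew_ok` and `check_kubectl_skew` are hypothetical helper names. `kubectl version -o json` reports `clientVersion.minor` and `serverVersion.minor` as strings, which on some distributions carry a trailing `+`, so the helpers strip that suffix before comparing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: skew_ok CLIENT_MINOR SERVER_MINOR
# Succeeds when kubectl's minor version is at most one minor version older or
# newer than kube-apiserver's, per the Kubernetes version-skew policy.
skew_ok() {
  local diff=$(( $1 - $2 ))
  (( diff >= -1 && diff <= 1 ))
}

# Hypothetical wrapper: check_kubectl_skew CONTEXT
# Reads both versions from `kubectl version -o json` for the given context.
check_kubectl_skew() {
  local json client server
  json=$(kubectl --context "$1" version -o json) || return 1
  client=$(jq -r '.clientVersion.minor' <<< "$json")
  server=$(jq -r '.serverVersion.minor' <<< "$json")
  # Some providers report e.g. "35+"; keep only the number.
  client=${client%+}
  server=${server%+}
  if skew_ok "$client" "$server"; then
    echo "ok: kubectl 1.${client} vs apiserver 1.${server}"
  else
    echo "unsupported skew: kubectl 1.${client} vs apiserver 1.${server}" >&2
    return 1
  fi
}
```

With the KIND cluster from this guide running, `check_kubectl_skew kind-llm-d-inference-scheduler-dev` reports whether the installed kubectl falls within the supported window.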


[Make]:https://www.gnu.org/software/make/
[Golang]:https://go.dev/
[Docker]:https://www.docker.com/
[Podman]:https://podman.io/
[Kubernetes in Docker (KIND)]:https://github.com/kubernetes-sigs/kind
[Kustomize]:https://kubectl.docs.kubernetes.io/installation/kustomize/
[ZeroMQ]:https://zeromq.org/
[Kubectl]:https://kubectl.docs.kubernetes.io/installation/kubectl/

> [!NOTE]
> Before committing and pushing changes to an upstream repository, you may want to
> explicitly run the `make presubmit` target to avoid failing PR checks. The checks
> are also performed as part of a GitHub action, but running locally can save time
> and an iteration.

> [!NOTE]
> **Python is NOT required** as of v0.5.1. Tokenization is handled by a separate UDS (Unix Domain Socket) tokenizer sidecar container. Previous versions (< v0.5.1) used embedded Python tokenizers with daulet/tokenizers bindings, but these are now deprecated.
@@ -59,30 +63,30 @@ There are several ways to access the gateway:
**Port forward**:

```bash
$ kubectl --context llm-d-inference-scheduler-dev port-forward service/inference-gateway 8080:80
kubectl --context kind-llm-d-inference-scheduler-dev port-forward service/inference-gateway-istio 8080:80
```

**NodePort**

```bash
# Determine the k8s node address
$ kubectl --context llm-d-inference-scheduler-dev get node -o yaml | grep address
kubectl --context kind-llm-d-inference-scheduler-dev get node -o yaml | grep address
# The service is accessible over port 80 of the worker IP address.
```

**LoadBalancer**

```bash
# Install and run cloud-provider-kind:
$ go install sigs.k8s.io/cloud-provider-kind@latest && cloud-provider-kind &
$ kubectl --context llm-d-inference-scheduler-dev get service inference-gateway
go install sigs.k8s.io/cloud-provider-kind@latest && cloud-provider-kind &
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd DEVELOPMENT.md -type f

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 254


🏁 Script executed:

# Check the context around line 81 in DEVELOPMENT.md
if [ -f "DEVELOPMENT.md" ]; then
  sed -n '70,90p' DEVELOPMENT.md
fi

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 876


🌐 Web query:

cloud-provider-kind github sigs.k8s.io versions releases

💡 Result:

cloud-provider-kind is a Kubernetes SIGs project that provides a cloud provider implementation for KIND clusters, enabling LoadBalancer services, Ingress, and Gateway API support in local KIND environments. It fills the gap in KIND for testing cloud-provider dependent features like Load Balancers. GitHub: https://github.com/kubernetes-sigs/cloud-provider-kind (kubernetes-sigs/cloud-provider-kind) Go module: sigs.k8s.io/cloud-provider-kind Homepage/docs: https://kubernetes-sigs.github.io/cloud-provider-kind/ Latest stable release: v0.10.0 (released 2025-11-29). Changes include fixes for Ingress on Mac/Windows, mirror registry support, Gateway channel options, and improved logging. Recent pre-releases (not tagged as stable releases): - v0.10.1-0.20260302143654-b328f96e9b24 (2026-03-02) - v0.10.1-0.20260201214224-cddfafcbff87 (2026-02-01) Repo last push: 2026-03-15 (active development). Installation: go install sigs.k8s.io/cloud-provider-kind@latest Docker image (v0.4.0+): registry.k8s.io/cloud-provider-kind/cloud-controller-manager Usage example: 1. kind create cluster 2. Run cloud-provider-kind (may need sudo on macOS/WSL2) 3. Deploy LoadBalancer/Ingress/Gateway services for external access. Dependencies (v0.10.0): Kubernetes v0.34.2 libs, sigs.k8s.io/gateway-api v1.4.0, sigs.k8s.io/kind v0.30.0, Envoy v1.36.0. Not integrated into KIND CLI (separate issue #3968 closed as not planned). Compatible with recent KIND v0.31.0 (defaults to K8s v1.35).


🏁 Script executed:

grep -n "@latest\|@main\|@master" DEVELOPMENT.md

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 161


🏁 Script executed:

# Also check for other go install commands without version pinning
grep -n "go install" DEVELOPMENT.md

Repository: opendatahub-io/llm-d-inference-scheduler

Length of output: 161


Replace @latest with a pinned version of cloud-provider-kind.

Line 81 introduces non-deterministic behavior during contributor setup. Severity: Medium (CWE-494). A compromised future release would be silently executed by developers. Pin to a specific release (e.g., @v0.10.0):

Remediation
-go install sigs.k8s.io/cloud-provider-kind@latest && cloud-provider-kind &
+go install sigs.k8s.io/cloud-provider-kind@v0.10.0 && cloud-provider-kind &
📝 Committable suggestion

Suggested change
go install sigs.k8s.io/cloud-provider-kind@latest && cloud-provider-kind &
go install sigs.k8s.io/cloud-provider-kind@v0.10.0 && cloud-provider-kind &
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DEVELOPMENT.md` at line 81, the install command uses the unpinned dependency
"sigs.k8s.io/cloud-provider-kind@latest", which introduces non-determinism.
Pin the go install invocation to a specific release (for example, replace
"@latest" with "@v0.10.0") so contributors install a known release, and
document the chosen version so it can be updated deliberately when needed.

kubectl --context kind-llm-d-inference-scheduler-dev get service inference-gateway-istio
# Wait for the LoadBalancer External-IP to become available. The service is accessible over port 80.
```

You can now make requests matching the IP:port of one of the access mode above:

```bash
$ curl -s -w '\n' http://<IP:port>/v1/completions -H 'Content-Type: application/json' -d '{"model":"food-review","prompt":"hi","max_tokens":10,"temperature":0}' | jq
curl -s -w '\n' http://<IP:port>/v1/completions -H 'Content-Type: application/json' -d '{"model":"food-review","prompt":"hi","max_tokens":10,"temperature":0}' | jq
```

By default the created inference gateway, can be accessed on port 30080. This can