
[pull] main from llm-d:main #129

Open
pull[bot] wants to merge 48 commits into opendatahub-io:main from llm-d:main




pull[bot] commented Feb 18, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


* feat: update to use new kv-cache UDS tokenizer

- change preprocessing to types from kv-cache
- add new unit test case: same tests from the old ones
- keep the old test case but mark it as unused (it can be removed later)
- add new make target to build UDS image
- update image to use the one from llm-d in deploy
- remove parts in Dockerfile to only build go code

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: more changes for UDS in makefile and docs

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: add comments for download-tokenizer and remove it as a dependency of build

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* GHAction: remove lint-and-test, which still uses Python

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: fix rebase

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: lint with 2.8.0

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: code review

- remove env variables:
	LDFLAGS
	PYTHON_CONFIG
	CGO_CFLAGS
	TOKENIZER_ARCH
	PYTHON_VERSION
	epp_* and sidecar_* for CGO
- update documentation
- remove make targets related to the tokenizer and Python

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
pull[bot] locked and limited conversation to collaborators Feb 18, 2026
pull[bot] added the ⤵️ pull and merge-conflict (Resolve conflicts manually) labels Feb 18, 2026
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
* Fix panic in SGLang proxy handling of concurrent requests

Signed-off-by: YANG LI <yangligt@google.com>

* Add concurrency unit test for SGLang context logic

Signed-off-by: YANG LI <yangligt@google.com>

---------

Signed-off-by: YANG LI <yangligt@google.com>
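The commit message does not say what caused the panic. A frequent source of this class of bug in Go is a map shared by request-handling goroutines without synchronization, which the runtime aborts on. As a general illustration only (not the PR's actual fix), the hypothetical `requestTracker` below serializes map access with a `sync.Mutex`, and `hammer` drives it from many goroutines the way a concurrency test might. Running such a test with `go test -race` also flags unsynchronized access.

```go
package main

import (
	"fmt"
	"sync"
)

// requestTracker is a hypothetical registry of in-flight requests shared by
// all proxy handler goroutines. Every map access holds mu, so concurrent
// handlers cannot corrupt it.
type requestTracker struct {
	mu       sync.Mutex
	inFlight map[string]string // request ID -> target address
}

func newRequestTracker() *requestTracker {
	return &requestTracker{inFlight: make(map[string]string)}
}

func (t *requestTracker) add(id, target string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.inFlight[id] = target
}

func (t *requestTracker) remove(id string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.inFlight, id)
}

func (t *requestTracker) count() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.inFlight)
}

// hammer simulates n concurrent requests: each registers itself, and the
// even-numbered ones finish (deregister) immediately.
func hammer(t *requestTracker, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			id := fmt.Sprintf("req-%d", i)
			t.add(id, "10.0.0.1:8000")
			if i%2 == 0 {
				t.remove(id)
			}
		}(i)
	}
	wg.Wait()
}

func main() {
	tracker := newRequestTracker()
	hammer(tracker, 1000)
	fmt.Println("still in flight:", tracker.count()) // 500 (the odd-numbered requests)
}
```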
* Add opentelemetry tracing

    Add centralized telemetry package and custom spans
    following the llm-d distributed tracing proposal.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>

* update Dockerfile.sidecar

Signed-off-by: sallyom <somalley@redhat.com>

* tracing: remove extra success results & startup spans and cleanup

Signed-off-by: sallyom <somalley@redhat.com>

* fix: avoid os.Exit bypassing defer in main

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
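`os.Exit` terminates the program immediately and does not run deferred functions, so a `defer` that flushes traces is silently skipped if `main` exits that way. A common remedy, sketched here on the assumption that the PR followed a similar pattern (the commit message does not show the code), is to move the body of `main` into a function that returns an exit code. `run` and `shutdownTracing` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
)

// shutdownCalled records whether cleanup ran; shutdownTracing is a
// hypothetical stand-in for flushing a tracer provider.
var shutdownCalled bool

func shutdownTracing() { shutdownCalled = true }

// run contains what main used to do. It returns an exit code instead of
// calling os.Exit itself, so its deferred cleanup always executes.
func run(args []string) int {
	defer shutdownTracing()
	for _, a := range args {
		if a == "--fail" {
			fmt.Fprintln(os.Stderr, "error: simulated startup failure")
			return 1
		}
	}
	return 0
}

func main() {
	// os.Exit skips deferred calls, so it is only called here, after run
	// (and every defer inside it) has already finished.
	if code := run(os.Args[1:]); code != 0 {
		os.Exit(code)
	}
}
```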

* fix: address review nits for tracing PR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>

* test: add edge case tests for StripScheme

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
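The commit does not show `StripScheme` itself. The hypothetical `stripScheme` below assumes it removes a leading `http://` or `https://` from a collector endpoint (for example, so a URL-style value can be used where a bare `host:port` is expected) and leaves other input untouched. The loop in `main` lists the kind of edge cases such tests typically cover.

```go
package main

import (
	"fmt"
	"strings"
)

// stripScheme is a hypothetical stand-in for StripScheme: it removes a
// leading "http://" or "https://" and returns any other input unchanged.
func stripScheme(endpoint string) string {
	for _, scheme := range []string{"http://", "https://"} {
		if rest, ok := strings.CutPrefix(endpoint, scheme); ok {
			return rest
		}
	}
	return endpoint
}

func main() {
	for _, in := range []string{"", "localhost:4317", "http://localhost:4317", "https://otel:4317", "http://"} {
		fmt.Printf("%q -> %q\n", in, stripScheme(in))
	}
}
```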

* remove extra comments from sidecar spans

Signed-off-by: sallyom <somalley@redhat.com>

* fix lint error

Signed-off-by: sallyom <somalley@redhat.com>

* protect against segfault on tests

Signed-off-by: greg pereira <grpereir@redhat.com>

---------

Signed-off-by: sallyom <somalley@redhat.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: greg pereira <grpereir@redhat.com>
vMaroon and others added 12 commits February 27, 2026 06:59
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.39.0 to 1.40.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.40.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.40.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…dates (#662)

Bumps the go-dependencies group with 2 updates in the / directory: [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) and [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go).


Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.64.0 to 0.65.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.64.0...zpages/v0.65.0)

Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.39.0 to 1.40.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.40.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
  dependency-version: 0.65.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
  dependency-version: 1.40.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
…build (#664)

Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Bumps the kubernetes group with 5 updates:

| Package | From | To |
| --- | --- | --- |
| [k8s.io/api](https://github.com/kubernetes/api) | `0.34.4` | `0.34.5` |
| [k8s.io/apiextensions-apiserver](https://github.com/kubernetes/apiextensions-apiserver) | `0.34.4` | `0.34.5` |
| [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) | `0.34.4` | `0.34.5` |
| [k8s.io/client-go](https://github.com/kubernetes/client-go) | `0.34.4` | `0.34.5` |
| [k8s.io/component-base](https://github.com/kubernetes/component-base) | `0.34.4` | `0.34.5` |


Updates `k8s.io/api` from 0.34.4 to 0.34.5
- [Commits](kubernetes/api@v0.34.4...v0.34.5)

Updates `k8s.io/apiextensions-apiserver` from 0.34.4 to 0.34.5
- [Release notes](https://github.com/kubernetes/apiextensions-apiserver/releases)
- [Commits](kubernetes/apiextensions-apiserver@v0.34.4...v0.34.5)

Updates `k8s.io/apimachinery` from 0.34.4 to 0.34.5
- [Commits](kubernetes/apimachinery@v0.34.4...v0.34.5)

Updates `k8s.io/client-go` from 0.34.4 to 0.34.5
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](kubernetes/client-go@v0.34.4...v0.34.5)

Updates `k8s.io/component-base` from 0.34.4 to 0.34.5
- [Commits](kubernetes/component-base@v0.34.4...v0.34.5)

---
updated-dependencies:
- dependency-name: k8s.io/api
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apiextensions-apiserver
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apimachinery
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/client-go
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/component-base
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…dates (#674)

Bumps the go-dependencies group with 2 updates in the / directory: [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) and [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go).


Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.65.0 to 0.66.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.65.0...zpages/v0.66.0)

Updates `go.opentelemetry.io/otel` from 1.40.0 to 1.41.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0)

Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.40.0 to 1.41.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0)

Updates `go.opentelemetry.io/otel/sdk` from 1.40.0 to 1.41.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0)

Updates `go.opentelemetry.io/otel/trace` from 1.40.0 to 1.41.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
  dependency-version: 0.66.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/trace
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [lycheeverse/lychee-action](https://github.com/lycheeverse/lychee-action) from 2.7.0 to 2.8.0.
- [Release notes](https://github.com/lycheeverse/lychee-action/releases)
- [Commits](lycheeverse/lychee-action@v2.7.0...v2.8.0)

---
updated-dependencies:
- dependency-name: lycheeverse/lychee-action
  dependency-version: 2.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* ci: add dev image workflow for main and release branches

Build and push -dev variants of EPP and sidecar container images
on pushes to main and release-* branches, tagged with commit SHA.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* ci: extract reusable build workflow and tag dev images by branch

Refactor ci-release and ci-dev to call a shared ci-build-images
reusable workflow, reducing duplication. Tag dev images with the
branch name instead of commit SHA so each branch has exactly one
image that gets overwritten on push, avoiding image accumulation.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Newlines at EOF

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

---------

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
gyliu513 and others added 9 commits March 10, 2026 12:18
…gs (#682)

* feat(sidecar): simplify TLS command line options with StringSlice flags

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

* address comments from Etai

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

* keep deprecated flags

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

---------

Signed-off-by: Guangya Liu <gyliu513@gmail.com>
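Go's standard `flag` package has no slice flag, but a custom `flag.Value` gives behavior similar to pflag's `StringSlice`. This sketch is not the sidecar's actual flag set: `--use-tls`, `--prefiller-use-tls`, and `--decoder-use-tls` are hypothetical names, used to show how one list-valued flag can replace several booleans while the deprecated booleans keep working.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// stringSlice is a flag.Value that accepts comma-separated values and may
// be repeated on the command line.
type stringSlice []string

func (s *stringSlice) String() string { return strings.Join(*s, ",") }

func (s *stringSlice) Set(v string) error {
	for _, part := range strings.Split(v, ",") {
		if part = strings.TrimSpace(part); part != "" {
			*s = append(*s, part)
		}
	}
	return nil
}

// parseTLSFlags parses the hypothetical new list flag and folds the
// hypothetical deprecated booleans into the same list.
func parseTLSFlags(args []string) ([]string, error) {
	fs := flag.NewFlagSet("sidecar", flag.ContinueOnError)
	var useTLS stringSlice
	fs.Var(&useTLS, "use-tls", "components that use TLS, e.g. prefiller,decoder")
	oldPrefiller := fs.Bool("prefiller-use-tls", false, "DEPRECATED: use --use-tls=prefiller")
	oldDecoder := fs.Bool("decoder-use-tls", false, "DEPRECATED: use --use-tls=decoder")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	// Keep the deprecated flags working by translating them.
	if *oldPrefiller {
		useTLS = append(useTLS, "prefiller")
	}
	if *oldDecoder {
		useTLS = append(useTLS, "decoder")
	}
	return useTLS, nil
}

func main() {
	got, err := parseTLSFlags([]string{"--use-tls=prefiller,decoder"})
	fmt.Println(got, err) // [prefiller decoder] <nil>
}
```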
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: roytman <roytman@il.ibm.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* fix: simplify InferencePool flag to namespace/name format

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

* Address comments from Etai

Signed-off-by: Guangya Liu <gyliu513@gmail.com>

---------

Signed-off-by: Guangya Liu <gyliu513@gmail.com>
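A `namespace/name` value can be split with `strings.Cut`. In this minimal sketch, `parsePoolRef` is a hypothetical helper, and rejecting empty parts or extra slashes is an assumption about what the real flag accepts.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePoolRef splits an InferencePool reference of the form
// "namespace/name" into its two parts.
func parsePoolRef(value string) (namespace, name string, err error) {
	ns, n, found := strings.Cut(value, "/")
	if !found || ns == "" || n == "" || strings.Contains(n, "/") {
		return "", "", fmt.Errorf("invalid InferencePool %q: expected namespace/name", value)
	}
	return ns, n, nil
}

func main() {
	ns, name, err := parsePoolRef("llm-d/qwen-pool")
	fmt.Println(ns, name, err) // llm-d qwen-pool <nil>
}
```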
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
elevran and others added 11 commits March 12, 2026 12:13
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* initial E/PD extension of the sidecar

Signed-off-by: roytman <roytman@il.ibm.com>

* add Encoding span.SetAttributes, fix bug in connector_sglang_test.go

Signed-off-by: roytman <roytman@il.ibm.com>

* fix e2e tests

Signed-off-by: roytman <roytman@il.ibm.com>

* Update pkg/sidecar/proxy/connector_epd_shared_storage.go

Co-authored-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Alexey Roytman <roytman@il.ibm.com>

* flags order

Signed-off-by: roytman <roytman@il.ibm.com>

* fix comments

Signed-off-by: roytman <roytman@il.ibm.com>

* remove redundant comment

Signed-off-by: roytman <roytman@il.ibm.com>

---------

Signed-off-by: roytman <roytman@il.ibm.com>
Signed-off-by: Alexey Roytman <roytman@il.ibm.com>
Co-authored-by: Etai Lev Ran <elevran@gmail.com>
* Check for uniqueness of media URLs

Signed-off-by: roytman <roytman@il.ibm.com>

* fix comments

Signed-off-by: roytman <roytman@il.ibm.com>

---------

Signed-off-by: roytman <roytman@il.ibm.com>
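The commit message does not say whether duplicate media URLs are rejected or merged. This sketch assumes rejection: `checkUniqueMediaURLs` is a hypothetical helper that reports the first repeated URL, using a map as a set.

```go
package main

import "fmt"

// checkUniqueMediaURLs returns an error for the first URL that appears twice.
func checkUniqueMediaURLs(urls []string) error {
	seen := make(map[string]struct{}, len(urls))
	for i, u := range urls {
		if _, dup := seen[u]; dup {
			return fmt.Errorf("media URL at index %d is a duplicate: %s", i, u)
		}
		seen[u] = struct{}{}
	}
	return nil
}

func main() {
	fmt.Println(checkUniqueMediaURLs([]string{"https://example.com/a.png", "https://example.com/b.png"}))
	fmt.Println(checkUniqueMediaURLs([]string{"https://example.com/a.png", "https://example.com/a.png"}))
}
```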
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: roytman <roytman@il.ibm.com>
* Implement Options pattern for sidecar proxy

Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>

* options.go and options_test.go review fixes and code enhancements

Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>

* enhance tests, enhance options pattern implementation

Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>

* simplify InferencePool flag to namespace/name format - PR #685

Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>

* code enhancements, resolve review comments

Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>

* set default connector and KVConnector and migrate connector flag to KVConnector

Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>

* call Complete before Validate in tests; align KVConnector and Connector initialization with the previous behavior

Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>

---------

Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>
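The Options pattern commonly follows a Kubernetes-style split: `Complete` fills defaults and derived fields, then `Validate` checks the result, which is why tests call `Complete` before `Validate`. The fields, the `default-connector` value, and the migration rule below are hypothetical and only mirror the commit messages.

```go
package main

import (
	"errors"
	"fmt"
)

// Options is a hypothetical subset of sidecar proxy settings.
type Options struct {
	Connector   string // deprecated flag, migrated into KVConnector
	KVConnector string
	Port        int
}

// NewOptions returns Options with illustrative defaults.
func NewOptions() *Options {
	return &Options{Port: 8000}
}

// Complete fills derived and defaulted fields; call it before Validate.
func (o *Options) Complete() error {
	// Migrate the deprecated field when only it was set.
	if o.KVConnector == "" && o.Connector != "" {
		o.KVConnector = o.Connector
	}
	if o.KVConnector == "" {
		o.KVConnector = "default-connector" // hypothetical default
	}
	return nil
}

// Validate checks the completed options.
func (o *Options) Validate() error {
	if o.Port <= 0 || o.Port > 65535 {
		return fmt.Errorf("invalid port %d", o.Port)
	}
	if o.Connector != "" && o.Connector != o.KVConnector {
		return errors.New("connector and KV connector disagree")
	}
	return nil
}

func main() {
	o := NewOptions()
	o.Connector = "legacy-connector" // as if set via the deprecated flag
	if err := o.Complete(); err != nil {
		panic(err)
	}
	fmt.Println(o.KVConnector, o.Validate()) // legacy-connector <nil>
}
```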
Signed-off-by: roytman <roytman@il.ibm.com>
Bumps [dorny/paths-filter](https://github.com/dorny/paths-filter) from 3 to 4.
- [Release notes](https://github.com/dorny/paths-filter/releases)
- [Changelog](https://github.com/dorny/paths-filter/blob/master/CHANGELOG.md)
- [Commits](dorny/paths-filter@v3...v4)

---
updated-dependencies:
- dependency-name: dorny/paths-filter
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
…red (#691)

* NonCachedTokens defines the minimum number of non-cached tokens required to trigger disaggregated PD. A value of 0 disables disaggregation.

Signed-off-by: Modassar-Rana <modassar.rana@ibm.com>

* updated the signature

Signed-off-by: Modassar-Rana <modassar.rana@ibm.com>

* updated logging in the constructor

Signed-off-by: Modassar-Rana <modassar.rana@ibm.com>

---------

Signed-off-by: Modassar-Rana <modassar.rana@ibm.com>
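The rule stated above (disaggregate only when at least `NonCachedTokens` prompt tokens miss the cache, and never when it is 0) reduces to a small predicate. `shouldDisaggregate` and its parameters are hypothetical, and where the cached-token count comes from is outside this sketch.

```go
package main

import "fmt"

// shouldDisaggregate reports whether a request should take the P/D path.
// A threshold of 0 disables disaggregation; treating negative values the
// same way is an assumption of this sketch.
func shouldDisaggregate(nonCachedTokens, promptTokens, cachedTokens int) bool {
	if nonCachedTokens <= 0 {
		return false
	}
	uncached := promptTokens - cachedTokens
	if uncached < 0 {
		uncached = 0
	}
	return uncached >= nonCachedTokens
}

func main() {
	fmt.Println(shouldDisaggregate(0, 4096, 0))      // false: disabled
	fmt.Println(shouldDisaggregate(256, 4096, 4000)) // false: only 96 uncached
	fmt.Println(shouldDisaggregate(256, 4096, 1024)) // true: 3072 uncached
}
```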
…support (#694)

* Bump kv-cache and GAIE

This is needed to consume the new changes related to consuming tokens
directly.

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Update import paths for GAIE and kv-cache API changes

Adapt import paths for packages renamed in GAIE:
- pkg/common/util/logging -> pkg/common/observability/logging
- pkg/epp/util/metrics -> pkg/common/observability/metrics
- pkg/epp/datalayer/http -> pkg/epp/framework/plugins/datalayer/source/http
- pkg/epp/datalayer/plugins/approximateprefix -> pkg/epp/framework/plugins/datalayer/attribute/prefix

Also adapt to kv-cache API change where NewHTTPDataSource now
returns (ds, error) instead of just ds.

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Adapt precise-prefix-cache-scorer to kv-cache API changes

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Add external tokenizer PrepareData plugin and TokenizedPrompt scorer path

Add a PrepareData plugin that calls a tokenizer sidecar over UDS gRPC
to tokenize prompts before scheduling. The plugin populates
request.TokenizedPrompt with token IDs from the sidecar. It supports
both completions (Render) and chat completions (RenderChat) requests.
The plugin is fail-open: tokenization errors are logged and the request
proceeds without TokenizedPrompt.

Update precise-prefix-cache-scorer to use pre-tokenized input when
TokenizedPrompt is present on the request, calling
GetPodScoresFromTokens to skip internal tokenization. Falls back to
existing internal tokenization path when TokenizedPrompt is not set.

Register the new tokenizer plugin in RegisterAllPlugins.

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Add deployment config and kind environment for external tokenizer

Add EPP config with prepareDataPlugins feature gate and tokenizer
plugin pointing at /tmp/tokenizer/tokenizer-uds.socket.

Update kind-dev-env.sh to support EXTERNAL_TOKENIZER_ENABLED flag
for selecting the tokenizer config.

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* tokenizer: use interface to be able to mock in unit tests

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Add tokenizer plugin unit tests

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* precise-prefix-cache-scorer: use interface to be able to mock in unit tests

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Add precise-prefix-cache tokenized unit tests

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Add e2e test for external tokenizer PrepareData plugin

Signed-off-by: Antonio Cardace <acardace@redhat.com>

* Move Consumes() from PrefixBasedPDDecider to PdProfileHandler

The GAIE dependency bump introduced plugin layer execution order
validation. PrefixBasedPDDecider has no scheduling interface so it
falls into DefaultLayer (-1), while its producer prefix-cache-scorer
is SchedulingLayer (2). The check (2 > -1) fails.

Move Consumes() to PdProfileHandler which is a ProfileHandler
(SchedulingLayer = 2), matching the producer's layer. This is
semantically correct as PdProfileHandler is the actual scheduling
plugin that consumes the data through the decider.

Signed-off-by: Antonio Cardace <acardace@redhat.com>
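The ordering rule described above can be re-created in a few lines. This is a hypothetical sketch, not GAIE's validation code: a consumer fails validation when the producer of its data sits in a later (higher) layer, so a consumer in `DefaultLayer` (-1) cannot read from a producer in `SchedulingLayer` (2). The data key `prefix-match` is invented for illustration.

```go
package main

import "fmt"

// Hypothetical layer constants mirroring the values quoted above.
const (
	DefaultLayer    = -1
	SchedulingLayer = 2
)

type plugin struct {
	name     string
	layer    int
	produces []string
	consumes []string
}

// validateOrder fails when a consumer runs in an earlier layer than the
// producer of data it consumes (producer layer > consumer layer).
func validateOrder(plugins []plugin) error {
	producerLayer := map[string]int{}
	producerName := map[string]string{}
	for _, p := range plugins {
		for _, key := range p.produces {
			producerLayer[key] = p.layer
			producerName[key] = p.name
		}
	}
	for _, c := range plugins {
		for _, key := range c.consumes {
			if pl, ok := producerLayer[key]; ok && pl > c.layer {
				return fmt.Errorf("%s (layer %d) consumes %q from %s (layer %d)",
					c.name, c.layer, key, producerName[key], pl)
			}
		}
	}
	return nil
}

func main() {
	scorer := plugin{name: "prefix-cache-scorer", layer: SchedulingLayer, produces: []string{"prefix-match"}}
	decider := plugin{name: "PrefixBasedPDDecider", layer: DefaultLayer, consumes: []string{"prefix-match"}}
	handler := plugin{name: "PdProfileHandler", layer: SchedulingLayer, consumes: []string{"prefix-match"}}
	fmt.Println(validateOrder([]plugin{scorer, decider})) // error
	fmt.Println(validateOrder([]plugin{scorer, handler})) // <nil>
}
```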

* Add end-to-end benchmark for external tokenizer + scorer flow

Benchmarks the request latency through the EPP when using the external
tokenizer PrepareData plugin with the precise-prefix-cache-scorer.

Includes completion, chat completion, shared-prefix, and multi-message
scenarios with prompt token count reporting.

Usage:
  EXTERNAL_TOKENIZER_ENABLED=true KV_CACHE_ENABLED=true make env-dev-kind
  make bench-external-tokenizer

Signed-off-by: Antonio Cardace <acardace@redhat.com>

---------

Signed-off-by: Antonio Cardace <acardace@redhat.com>
@openshift-merge-robot

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

shmuelk and others added 7 commits March 18, 2026 11:45
…1.28 (#727)

* Corrected configuration file

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* No longer set primaryPort plugin parameter

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Deprecate use of primaryPort plugin parameter

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Deprecate use of the x-data-parallel-host-port header

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Add a real plugin Handle to the DP tests

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

* Make sure user understands that Istio >= 1.28.1 is needed

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

---------

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
* build: remove CGO dependency by migrating to pure-Go ZMQ

Update llm-d-kv-cache to v0.6.1-0.20260317063900-80aba2cb5a99 (main snapshot),
pending merge of llm-d/llm-d-kv-cache#431 which switches from pebbe/zmq4 (CGO)
to a pure-Go ZMQ implementation.

- go.mod/go.sum: bump kv-cache to current main pseudo-version (placeholder;
  will be updated to a real tag once llm-d/llm-d-kv-cache#431 is merged)
- Makefile: set CGO_ENABLED=0; drop check-dependencies prereq from test targets
- Makefile.tools.mk: remove ##@ Dependencies section (check/install-dependencies)
- Dockerfile.epp: remove EPEL + zeromq install steps; set CGO_ENABLED=0 in build
- DEVELOPMENT.md: remove ZeroMQ from prerequisites list
- .github/workflows/ci-pr-checks.yaml: remove CGO configuration and
  install-dependencies steps; remove CGO env vars from lint step

NOTE: This commit is intentionally a draft: go.mod must be updated to the
tagged kv-cache release that includes the pure-Go ZMQ changes before merging.

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* update kv-cache (pure Go zmq, tip of main)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* revert to micro image once CGO is disabled

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

---------

Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.79.2 to 1.79.3.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.79.2...v1.79.3)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.79.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Add idleThreshold and maxBusyScore parameters to create a scoring gap
between idle and busy pods, helping distribute prefix cache warmup.

- idleThreshold: max requests to be considered idle (default: 0)
- maxBusyScore: max score for busy pods (default: 1.0 for current behavior)

Examples:
- Binary mode: idleThreshold=0, maxBusyScore=0 (idle=1.0, busy=0.0)
- Hybrid mode: idleThreshold=0, maxBusyScore=0.5 (idle=1.0, busy=0-0.5)
- Flexible: idleThreshold=2, maxBusyScore=0.5 (≤2 req = idle)

Signed-off-by: David Whyte-Gray <40244437+dagrayvid@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
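The commit does not say how busy pods are spread between 0 and `maxBusyScore`. This sketch assumes a linear scale against the most loaded candidate pod, which reproduces the three example modes above. `scoreByLoad` and the per-pod request counts are hypothetical.

```go
package main

import "fmt"

// scoreByLoad: loads[i] is the number of requests on pod i. Pods at or below
// idleThreshold score 1.0; busier pods are scaled linearly into
// [0, maxBusyScore], with the most loaded pod scoring 0.
func scoreByLoad(loads []int, idleThreshold int, maxBusyScore float64) []float64 {
	maxLoad := 0
	for _, l := range loads {
		if l > maxLoad {
			maxLoad = l
		}
	}
	scores := make([]float64, len(loads))
	for i, l := range loads {
		if l <= idleThreshold {
			scores[i] = 1.0
			continue
		}
		// l > idleThreshold implies maxLoad > idleThreshold, so the divisor is positive.
		frac := float64(l-idleThreshold) / float64(maxLoad-idleThreshold)
		scores[i] = maxBusyScore * (1 - frac)
	}
	return scores
}

func main() {
	loads := []int{0, 2, 4}
	fmt.Println(scoreByLoad(loads, 0, 0))   // binary: [1 0 0]
	fmt.Println(scoreByLoad(loads, 0, 0.5)) // hybrid: [1 0.25 0]
	fmt.Println(scoreByLoad(loads, 2, 0.5)) // flexible: [1 1 0]
}
```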
* Docker build enhancements

- reorder steps so go.mod/go.sum are in a cacheable layer (i.e., reused when not modified)
- strip debug info by default (stack traces are NOT affected); can be overridden with LD_FLAGS
- cache CI/CD builds in GitHub Actions
- other nits and cleanups (e.g., unused Python arg, comments)

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* pass Go build vars inline, not as ENV settings

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

* allow EPP and sidecar images to run concurrently

Signed-off-by: Etai Lev Ran <elevran@gmail.com>

---------

Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* feat: speculative indexing for PrecisePrefixCacheScorer

Signed-off-by: bongwoobak <bongwoobak@gmail.com>

* fix: use ip:port format for PodIdentifier to match KV event topics

Signed-off-by: bongwoobak <bongwoobak@gmail.com>

* fix: split Address/Port in test endpoints to match production ip:port format

Signed-off-by: bongwoobak <bongwoobak@gmail.com>
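Formatting the identifier with `net.JoinHostPort` keeps `ip:port` consistent and also handles IPv6 literals, which need brackets. Whether the PR uses this helper is not stated; `podIdentifier` is hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// podIdentifier formats an endpoint as ip:port. net.JoinHostPort also adds
// the brackets an IPv6 literal needs.
func podIdentifier(address string, port int) string {
	return net.JoinHostPort(address, strconv.Itoa(port))
}

func main() {
	fmt.Println(podIdentifier("10.0.0.5", 8000)) // 10.0.0.5:8000
	fmt.Println(podIdentifier("fd00::5", 8000))  // [fd00::5]:8000
}
```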

* make SpeculativeIndexing optional

Signed-off-by: bongwoobak <bongwoobak@gmail.com>

* feat: use confirmed-only scores for PrefixCacheServers cycle state

Signed-off-by: bongwoobak <bongwoobak@gmail.com>

* refactor: update PodEntry usage for Annotations struct

Signed-off-by: bongwoobak <bongwoobak@gmail.com>

* refactor: replace speculativeCache.Start() with cleanCachePeriodically()

Signed-off-by: bongwoobak <bongwoobak@gmail.com>

* fix: add nil metadata guard in PrepareRequestData

Signed-off-by: bongwoobak <bongwoobak@gmail.com>

* refactor: remove confirmedScores and simplify Annotations usage

Signed-off-by: bongwoobak <bongwoobak@gmail.com>

* fix: adapt to NewChunkedTokenDatabase signature change from PR 415

Signed-off-by: bongwoobak <bongwoobak@gmail.com>

* refactor: use KVBlockScorer for scoring and human-readable speculativeTTL

Replace computeScoresFromKeyToPods with kvcache.KVBlockScorer.Score()
to properly integrate device-backend weight configuration.

Change SpeculativeTTL from time.Duration (nanoseconds) to string,
parsed via time.ParseDuration, for human-readable config values
like "2s" or "500ms".

Compact docs/configuration.md per review feedback.

Signed-off-by: bongwoobak <bongwoobak@gmail.com>

* fix: gofmt import ordering in precise_prefix_cache.go

Signed-off-by: bongwoobak <bongwoobak@gmail.com>

* docs: move speculative indexing config into architecture.md

Signed-off-by: bongwoobak <bongwoobak@gmail.com>

---------

Signed-off-by: bongwoobak <bongwoobak@gmail.com>
