* feat: update to use new kv-cache UDS tokenizer
- change preprocessing to types from kv-cache
- add new unit test case: same tests as the old ones
- keep old test case but mark it as unused (can be removed later)
- add new make target to build UDS image
- update image to use the one from llm-d in deploy
- remove parts in Dockerfile to only build go code
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* update: more changes for UDS in makefile and docs
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* update: add comments for download-tokenizer and remove as dependency to build
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* GHAction: remove lint-and-test, which still uses python
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* update: fix rebase
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* fix: lint with 2.8.0
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* update: code review
- remove env variable LDFLAGS PYTHON_CONFIG CGO_CFLAGS TOKENIZER_ARCH PYTHON_VERSION epp_* and sidecar_* for CGO
- update documentation
- remove make targets related to tokenizer, python
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
---------
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
* Fix panic in SGLang proxy handling of concurrent requests Signed-off-by: YANG LI <yangligt@google.com> * Add concurrency unit test for SGLang context logic Signed-off-by: YANG LI <yangligt@google.com> --------- Signed-off-by: YANG LI <yangligt@google.com>
* Add opentelemetry tracing
Add centralized telemetry package and custom spans
following the llm-d distributed tracing proposal.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
* update Dockerfile.sidecar
Signed-off-by: sallyom <somalley@redhat.com>
* tracing: remove extra success results & startup spans and cleanup
Signed-off-by: sallyom <somalley@redhat.com>
* fix: avoid os.Exit bypassing defer in main
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
* fix: address review nits for tracing PR
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
* test: add edge case tests for StripScheme
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
* remove extra comments from sidecar spans
Signed-off-by: sallyom <somalley@redhat.com>
* fix lint error
Signed-off-by: sallyom <somalley@redhat.com>
* protect against segfault on tests
Signed-off-by: greg pereira <grpereir@redhat.com>
---------
Signed-off-by: sallyom <somalley@redhat.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: greg pereira <grpereir@redhat.com>
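The "avoid os.Exit bypassing defer in main" fix above refers to a common Go pitfall: `os.Exit` terminates the process immediately, so deferred calls in the same function never run. A minimal sketch of the usual remedy follows, assuming the program logic moves into a helper that returns an error. The `run` function and the `events` slice are hypothetical stand-ins for illustration and are not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// run holds the program logic. Because it returns instead of calling
// os.Exit, its deferred cleanup always executes.
// The events slice is a hypothetical stand-in for real side effects,
// such as shutting down a tracer provider.
func run(events *[]string, fail bool) error {
	*events = append(*events, "telemetry started")
	defer func() {
		*events = append(*events, "telemetry shutdown")
	}()

	if fail {
		return errors.New("startup failed")
	}
	return nil
}

func main() {
	var events []string
	err := run(&events, false)
	fmt.Println(events)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1) // safe here: run has returned, so its defers already ran
	}
}
```

Keeping `os.Exit` in `main` only, after the helper has returned, is one way to make sure cleanup such as flushing spans is not skipped on the error path.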
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
Signed-off-by: greg pereira <grpereir@redhat.com>
Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.39.0 to 1.40.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.40.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/otel/sdk dependency-version: 1.40.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…dates (#662) Bumps the go-dependencies group with 2 updates in the / directory: [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) and [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go). Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.64.0 to 0.65.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.64.0...zpages/v0.65.0) Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.39.0 to 1.40.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.40.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp dependency-version: 0.65.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc dependency-version: 1.40.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: learner0810 <zhongjun.li@daocloud.io>
…build (#664) Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Bumps the kubernetes group with 5 updates:

| Package | From | To |
| --- | --- | --- |
| [k8s.io/api](https://github.com/kubernetes/api) | `0.34.4` | `0.34.5` |
| [k8s.io/apiextensions-apiserver](https://github.com/kubernetes/apiextensions-apiserver) | `0.34.4` | `0.34.5` |
| [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) | `0.34.4` | `0.34.5` |
| [k8s.io/client-go](https://github.com/kubernetes/client-go) | `0.34.4` | `0.34.5` |
| [k8s.io/component-base](https://github.com/kubernetes/component-base) | `0.34.4` | `0.34.5` |

Updates `k8s.io/api` from 0.34.4 to 0.34.5
- [Commits](kubernetes/api@v0.34.4...v0.34.5)

Updates `k8s.io/apiextensions-apiserver` from 0.34.4 to 0.34.5
- [Release notes](https://github.com/kubernetes/apiextensions-apiserver/releases)
- [Commits](kubernetes/apiextensions-apiserver@v0.34.4...v0.34.5)

Updates `k8s.io/apimachinery` from 0.34.4 to 0.34.5
- [Commits](kubernetes/apimachinery@v0.34.4...v0.34.5)

Updates `k8s.io/client-go` from 0.34.4 to 0.34.5
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](kubernetes/client-go@v0.34.4...v0.34.5)

Updates `k8s.io/component-base` from 0.34.4 to 0.34.5
- [Commits](kubernetes/component-base@v0.34.4...v0.34.5)

---
updated-dependencies:
- dependency-name: k8s.io/api
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apiextensions-apiserver
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/apimachinery
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/client-go
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
- dependency-name: k8s.io/component-base
  dependency-version: 0.34.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: kubernetes
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…dates (#674) Bumps the go-dependencies group with 2 updates in the / directory: [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) and [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go). Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.65.0 to 0.66.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.65.0...zpages/v0.66.0) Updates `go.opentelemetry.io/otel` from 1.40.0 to 1.41.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0) Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.40.0 to 1.41.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0) Updates `go.opentelemetry.io/otel/sdk` from 1.40.0 to 1.41.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0) Updates `go.opentelemetry.io/otel/trace` from 1.40.0 to 1.41.0 - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.40.0...v1.41.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp 
dependency-version: 0.66.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel dependency-version: 1.41.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc dependency-version: 1.41.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel/sdk dependency-version: 1.41.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: go.opentelemetry.io/otel/trace dependency-version: 1.41.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [lycheeverse/lychee-action](https://github.com/lycheeverse/lychee-action) from 2.7.0 to 2.8.0. - [Release notes](https://github.com/lycheeverse/lychee-action/releases) - [Commits](lycheeverse/lychee-action@v2.7.0...v2.8.0) --- updated-dependencies: - dependency-name: lycheeverse/lychee-action dependency-version: 2.8.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* ci: add dev image workflow for main and release branches Build and push -dev variants of EPP and sidecar container images on pushes to main and release-* branches, tagged with commit SHA. Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> * ci: extract reusable build workflow and tag dev images by branch Refactor ci-release and ci-dev to call a shared ci-build-images reusable workflow, reducing duplication. Tag dev images with the branch name instead of commit SHA so each branch has exactly one image that gets overwritten on push, avoiding image accumulation. Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> * Newlines at EOF Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com> --------- Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
…gs (#682) * feat(sidecar): simplify TLS command line options with StringSlice flags Signed-off-by: Guangya Liu <gyliu513@gmail.com> * address comments from Etai Signed-off-by: Guangya Liu <gyliu513@gmail.com> * keep deprecated flags Signed-off-by: Guangya Liu <gyliu513@gmail.com> --------- Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: roytman <roytman@il.ibm.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* fix: simplify InferencePool flag to namespace/name format Signed-off-by: Guangya Liu <gyliu513@gmail.com> * Address comments from Etai Signed-off-by: Guangya Liu <gyliu513@gmail.com> --------- Signed-off-by: Guangya Liu <gyliu513@gmail.com>
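The commit above moves the InferencePool flag to a single `namespace/name` value. As a rough sketch of how such a value could be split and validated, assuming both parts are required and no extra `/` is allowed (the commit text does not state these rules), a hypothetical `parsePoolRef` helper might look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// parsePoolRef is a hypothetical helper that splits "namespace/name".
// Assumption: both parts are mandatory and the name may not contain "/".
func parsePoolRef(s string) (string, string, error) {
	namespace, name, found := strings.Cut(s, "/")
	if !found || namespace == "" || name == "" || strings.Contains(name, "/") {
		return "", "", fmt.Errorf("invalid pool reference %q: want namespace/name", s)
	}
	return namespace, name, nil
}

func main() {
	ns, name, err := parsePoolRef("default/my-pool")
	fmt.Println(ns, name, err)

	_, _, err = parsePoolRef("my-pool")
	fmt.Println(err)
}
```

Whether the actual flag accepts a bare name with a default namespace is not stated in the commit message, so the strict form here is only one possible reading.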
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: Guangya Liu <gyliu513@gmail.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* initial E/PD extension of the sidecar Signed-off-by: roytman <roytman@il.ibm.com> * add Encoding span.SetAttributes, fix bug in connector_sglang_test.go Signed-off-by: roytman <roytman@il.ibm.com> * fix e2e tests Signed-off-by: roytman <roytman@il.ibm.com> * Update pkg/sidecar/proxy/connector_epd_shared_storage.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Alexey Roytman <roytman@il.ibm.com> * flags order Signed-off-by: roytman <roytman@il.ibm.com> * fix comments Signed-off-by: roytman <roytman@il.ibm.com> * remove redundant comment Signed-off-by: roytman <roytman@il.ibm.com> --------- Signed-off-by: roytman <roytman@il.ibm.com> Signed-off-by: Alexey Roytman <roytman@il.ibm.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com>
* Check for uniqueness of media URLs Signed-off-by: roytman <roytman@il.ibm.com> * fix comments Signed-off-by: roytman <roytman@il.ibm.com> --------- Signed-off-by: roytman <roytman@il.ibm.com>
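The commit above mentions checking media URLs for uniqueness without saying whether duplicates are dropped or rejected. A minimal sketch of the drop-duplicates reading, using a hypothetical `uniqueURLs` helper that preserves first-seen order, could be:

```go
package main

import "fmt"

// uniqueURLs is a hypothetical helper that returns the input URLs with
// duplicates removed, keeping the first occurrence of each.
func uniqueURLs(urls []string) []string {
	seen := make(map[string]struct{}, len(urls))
	out := make([]string, 0, len(urls))
	for _, u := range urls {
		if _, ok := seen[u]; ok {
			continue // already recorded, skip the repeat
		}
		seen[u] = struct{}{}
		out = append(out, u)
	}
	return out
}

func main() {
	urls := []string{
		"https://example.com/a.png",
		"https://example.com/b.png",
		"https://example.com/a.png",
	}
	fmt.Println(uniqueURLs(urls))
}
```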
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Signed-off-by: roytman <roytman@il.ibm.com>
* Implement Options pattern for sidecar proxy Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com> * options.go and options_test.go review fixes and code enhancements Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com> * enhance tests, enhance options pattern implementation Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com> * simplify InferencePool flag to namespace/name format - PR #685 Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com> * code enhancements, resolve review comments Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com> * set default connector and KVConnector and migrate connector flag to KVConnector Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com> * add complete before validate in tests, align KVConnector and Connector initialization as before Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com> --------- Signed-off-by: mohamedmahameed <mohamed.mahameed@ibm.com>
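The commits above describe an Options pattern in which a deprecated connector flag is migrated to KVConnector, defaults are filled in, and tests call complete before validate. The following is a minimal sketch of that shape, not the sidecar's actual code: the `Options` fields, the `Complete` and `Validate` method names, the `Port` field, and the `defaultKVConnector` value are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultKVConnector is a placeholder value, not the project's real default.
const defaultKVConnector = "example-connector"

// Options is a hypothetical configuration struct.
type Options struct {
	Connector   string // deprecated flag, kept for compatibility
	KVConnector string
	Port        int
}

// NewOptions returns options with baseline defaults.
func NewOptions() *Options {
	return &Options{Port: 8000}
}

// Complete migrates deprecated settings and fills in defaults.
func (o *Options) Complete() {
	if o.KVConnector == "" && o.Connector != "" {
		o.KVConnector = o.Connector // migrate the deprecated flag
	}
	if o.KVConnector == "" {
		o.KVConnector = defaultKVConnector
	}
}

// Validate checks the completed options; it expects Complete to have run.
func (o *Options) Validate() error {
	if o.KVConnector == "" {
		return errors.New("KVConnector is empty; call Complete first")
	}
	if o.Port < 1 || o.Port > 65535 {
		return fmt.Errorf("port %d out of range", o.Port)
	}
	return nil
}

func main() {
	o := NewOptions()
	o.Connector = "legacy-connector"
	o.Complete()
	fmt.Println(o.KVConnector, o.Validate())
}
```

Separating the two steps means validation only ever sees fully defaulted values, which is presumably why the tests were changed to call complete first.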
Signed-off-by: roytman <roytman@il.ibm.com>
Bumps [dorny/paths-filter](https://github.com/dorny/paths-filter) from 3 to 4. - [Release notes](https://github.com/dorny/paths-filter/releases) - [Changelog](https://github.com/dorny/paths-filter/blob/master/CHANGELOG.md) - [Commits](dorny/paths-filter@v3...v4) --- updated-dependencies: - dependency-name: dorny/paths-filter dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
…red (#691) * NonCachedTokens defines the minimum number of non-cached tokens required to trigger disaggregated PD. A value of 0 disables disaggregation. Signed-off-by: Modassar-Rana <modassar.rana@ibm.com> * updated the signature Signed-off-by: Modassar-Rana <modassar.rana@ibm.com> * updated logging to Constructor Signed-off-by: Modassar-Rana <modassar.rana@ibm.com> --------- Signed-off-by: Modassar-Rana <modassar.rana@ibm.com>
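The NonCachedTokens rule described above can be sketched as a small predicate. This is an illustration under assumptions: the function name `shouldDisaggregate` and its parameters are hypothetical, and the non-cached count is taken to be prompt tokens minus cached tokens.

```go
package main

import "fmt"

// shouldDisaggregate is a hypothetical helper. It reports whether a request
// has at least nonCachedThreshold non-cached tokens. A threshold of 0
// disables disaggregation, as the commit message states.
func shouldDisaggregate(promptTokens, cachedTokens, nonCachedThreshold int) bool {
	if nonCachedThreshold <= 0 {
		return false // disaggregation disabled
	}
	nonCached := promptTokens - cachedTokens
	return nonCached >= nonCachedThreshold
}

func main() {
	fmt.Println(shouldDisaggregate(1000, 900, 256)) // 100 non-cached tokens
	fmt.Println(shouldDisaggregate(1000, 100, 256)) // 900 non-cached tokens
	fmt.Println(shouldDisaggregate(1000, 100, 0))   // threshold 0
}
```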
…support (#694) * Bump kv-cache and GAIE This is needed to consume the new changes related to consuming tokens directly. Signed-off-by: Antonio Cardace <acardace@redhat.com> * Update import paths for GAIE and kv-cache API changes Adapt import paths for packages renamed in GAIE: - pkg/common/util/logging -> pkg/common/observability/logging - pkg/epp/util/metrics -> pkg/common/observability/metrics - pkg/epp/datalayer/http -> pkg/epp/framework/plugins/datalayer/source/http - pkg/epp/datalayer/plugins/approximateprefix -> pkg/epp/framework/plugins/datalayer/attribute/prefix Also adapt to kv-cache API change where NewHTTPDataSource now returns (ds, error) instead of just ds. Signed-off-by: Antonio Cardace <acardace@redhat.com> * Adapt precise-prefix-cache-scorer to kv-cache API changes Signed-off-by: Antonio Cardace <acardace@redhat.com> * Add external tokenizer PrepareData plugin and TokenizedPrompt scorer path Add a PrepareData plugin that calls a tokenizer sidecar over UDS gRPC to tokenize prompts before scheduling. The plugin populates request.TokenizedPrompt with token IDs from the sidecar. It supports both completions (Render) and chat completions (RenderChat) requests. The plugin is fail-open: tokenization errors are logged and the request proceeds without TokenizedPrompt. Update precise-prefix-cache-scorer to use pre-tokenized input when TokenizedPrompt is present on the request, calling GetPodScoresFromTokens to skip internal tokenization. Falls back to existing internal tokenization path when TokenizedPrompt is not set. Register the new tokenizer plugin in RegisterAllPlugins. Signed-off-by: Antonio Cardace <acardace@redhat.com> * Add deployment config and kind environment for external tokenizer Add EPP config with prepareDataPlugins feature gate and tokenizer plugin pointing at /tmp/tokenizer/tokenizer-uds.socket. Update kind-dev-env.sh to support EXTERNAL_TOKENIZER_ENABLED flag for selecting the tokenizer config. 
Signed-off-by: Antonio Cardace <acardace@redhat.com> * tokenizer: use interface to be able to mock in unit tests Signed-off-by: Antonio Cardace <acardace@redhat.com> * Add tokenizer plugin unit tests Signed-off-by: Antonio Cardace <acardace@redhat.com> * pprefix-cache-scorer: use interface to be able to mock in unit tests Signed-off-by: Antonio Cardace <acardace@redhat.com> * Add precise-prefix-cache tokenized unit tests Signed-off-by: Antonio Cardace <acardace@redhat.com> * Add e2e test for external tokenizer PrepareData plugin Signed-off-by: Antonio Cardace <acardace@redhat.com> * Move Consumes() from PrefixBasedPDDecider to PdProfileHandler The GAIE dependency bump introduced plugin layer execution order validation. PrefixBasedPDDecider has no scheduling interface so it falls into DefaultLayer (-1), while its producer prefix-cache-scorer is SchedulingLayer (2). The check (2 > -1) fails. Move Consumes() to PdProfileHandler which is a ProfileHandler (SchedulingLayer = 2), matching the producer's layer. This is semantically correct as PdProfileHandler is the actual scheduling plugin that consumes the data through the decider. Signed-off-by: Antonio Cardace <acardace@redhat.com> * Add end-to-end benchmark for external tokenizer + scorer flow Benchmarks the request latency through the EPP when using the external tokenizer PrepareData plugin with the precise-prefix-cache-scorer. Includes completion, chat completion, shared-prefix, and multi-message scenarios with prompt token count reporting. Usage: EXTERNAL_TOKENIZER_ENABLED=true KV_CACHE_ENABLED=true make env-dev-kind make bench-external-tokenizer Signed-off-by: Antonio Cardace <acardace@redhat.com> --------- Signed-off-by: Antonio Cardace <acardace@redhat.com>
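The fail-open behavior described above (tokenization errors are logged and the request proceeds without TokenizedPrompt) can be sketched as follows. The `Tokenizer` interface, `Request` struct, `prepareData` function, and `fakeTokenizer` mock are hypothetical simplifications; the real plugin talks to a sidecar over UDS gRPC, which is not modeled here.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Tokenizer is a hypothetical interface, small enough to mock in tests.
type Tokenizer interface {
	Render(prompt string) ([]uint32, error)
}

// Request is a simplified stand-in for the scheduler's request type.
type Request struct {
	Prompt          string
	TokenizedPrompt []uint32
}

// errUnavailable simulates a sidecar that cannot be reached.
var errUnavailable = errors.New("tokenizer unavailable")

// prepareData is fail-open: on error it logs and leaves TokenizedPrompt
// unset, so later stages can fall back to internal tokenization.
func prepareData(t Tokenizer, req *Request) {
	ids, err := t.Render(req.Prompt)
	if err != nil {
		log.Printf("tokenization failed, continuing without TokenizedPrompt: %v", err)
		return
	}
	req.TokenizedPrompt = ids
}

// fakeTokenizer is a test double returning canned results.
type fakeTokenizer struct {
	ids []uint32
	err error
}

func (f fakeTokenizer) Render(string) ([]uint32, error) { return f.ids, f.err }

func main() {
	ok := &Request{Prompt: "hello"}
	prepareData(fakeTokenizer{ids: []uint32{1, 2, 3}}, ok)
	fmt.Println(ok.TokenizedPrompt)

	failed := &Request{Prompt: "hello"}
	prepareData(fakeTokenizer{err: errUnavailable}, failed)
	fmt.Println(failed.TokenizedPrompt == nil)
}
```

Putting the tokenizer behind an interface mirrors the later commits in this list that introduce interfaces so the plugin and scorer can be mocked in unit tests.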
PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
…1.28 (#727) * Corrected configuration file Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * No longer set primaryPort plugin parameter Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Deprecate use of primaryPort plugin parameter Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Deprecate use of the x-data-parallel-host-port header Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Add a real plugin Handle to the DP tests Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Make sure user understands that Istio >= 1.28.1 is needed Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
* build: remove CGO dependency by migrating to pure-Go ZMQ Update llm-d-kv-cache to v0.6.1-0.20260317063900-80aba2cb5a99 (main snapshot), pending merge of llm-d/llm-d-kv-cache#431 which switches from pebbe/zmq4 (CGO) to a pure-Go ZMQ implementation. - go.mod/go.sum: bump kv-cache to current main pseudo-version (placeholder; will be updated to a real tag once llm-d/llm-d-kv-cache#431 is merged) - Makefile: set CGO_ENABLED=0; drop check-dependencies prereq from test targets - Makefile.tools.mk: remove ##@ Dependencies section (check/install-dependencies) - Dockerfile.epp: remove EPEL + zeromq install steps; set CGO_ENABLED=0 in build - DEVELOPMENT.md: remove ZeroMQ from prerequisites list - .github/workflows/ci-pr-checks.yaml: remove CGO configuration and install-dependencies steps; remove CGO env vars from lint step NOTE: This commit is intentionally draft — go.mod must be updated to the tagged kv-cache release that includes the pure-Go ZMQ changes before merging. Signed-off-by: Etai Lev Ran <elevran@gmail.com> * update kv-cache (pure Go zmq, tip of main) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * revert to micro image once CGO is disabled Signed-off-by: Etai Lev Ran <elevran@gmail.com> --------- Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.79.2 to 1.79.3. - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](grpc/grpc-go@v1.79.2...v1.79.3) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-version: 1.79.3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Etai Lev Ran <elevran@gmail.com>
Add idleThreshold and maxBusyScore parameters to create a scoring gap between idle and busy pods, helping distribute prefix cache warmup. - idleThreshold: max requests to be considered idle (default: 0) - maxBusyScore: max score for busy pods (default: 1.0 for current behavior) Examples: - Binary mode: idleThreshold=0, maxBusyScore=0 (idle=1.0, busy=0.0) - Hybrid mode: idleThreshold=0, maxBusyScore=0.5 (idle=1.0, busy=0-0.5) - Flexible: idleThreshold=2, maxBusyScore=0.5 (≤2 req = idle) Signed-off-by: David Whyte-Gray <40244437+dagrayvid@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
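The idle/busy scoring gap described above can be illustrated with a short sketch. The commit message gives the score ranges but not the formula for busy pods, so the linear scaling against the most loaded pod used here is an assumption, and `scoreByLoad` is a hypothetical function name.

```go
package main

import "fmt"

// scoreByLoad is a hypothetical scorer. Pods with at most idleThreshold
// requests score 1.0. Busy pods are scaled linearly into [0, maxBusyScore]
// relative to the most loaded pod (assumed formula, not from the source).
func scoreByLoad(requests []int, idleThreshold int, maxBusyScore float64) []float64 {
	maxReq := 0
	for _, r := range requests {
		if r > maxReq {
			maxReq = r
		}
	}
	scores := make([]float64, len(requests))
	for i, r := range requests {
		switch {
		case r <= idleThreshold:
			scores[i] = 1.0
		case maxReq == 0:
			// Only reachable with a negative threshold; avoids dividing by zero.
			scores[i] = maxBusyScore
		default:
			scores[i] = maxBusyScore * float64(maxReq-r) / float64(maxReq)
		}
	}
	return scores
}

func main() {
	load := []int{0, 2, 4}
	fmt.Println(scoreByLoad(load, 0, 0))   // binary mode
	fmt.Println(scoreByLoad(load, 0, 0.5)) // hybrid mode
	fmt.Println(scoreByLoad(load, 2, 0.5)) // flexible mode
}
```

With `maxBusyScore` below 1.0, every idle pod outscores every busy pod, which is the gap the commit message says helps distribute prefix cache warmup.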
* Docker build enhancements - reorder steps so go.mod/go.sum are in a cacheable layer (ie when not modified) - strip debug info by default (stacktraces are NOT affected). Can override with LD_FLAGS - cache CICD on GH action - other nits and clean ups (e.g., unused Python arg, comments) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * pass GO build vars inline, not as ENV settings Signed-off-by: Etai Lev Ran <elevran@gmail.com> * allow EPP and sidecar images to run concurrently Signed-off-by: Etai Lev Ran <elevran@gmail.com> --------- Signed-off-by: Etai Lev Ran <elevran@gmail.com>
* feat: speculative indexing for PrecisePrefixCacheScorer Signed-off-by: bongwoobak <bongwoobak@gmail.com> * fix: use ip:port format for PodIdentifier to match KV event topics Signed-off-by: bongwoobak <bongwoobak@gmail.com> * fix: split Address/Port in test endpoints to match production ip:port format Signed-off-by: bongwoobak <bongwoobak@gmail.com> * make SpeculativeIndexing optional Signed-off-by: bongwoobak <bongwoobak@gmail.com> * feat: use confirmed-only scores for PrefixCacheServers cycle state Signed-off-by: bongwoobak <bongwoobak@gmail.com> * refactor: update PodEntry usage for Annotations struct Signed-off-by: bongwoobak <bongwoobak@gmail.com> * refactor: replace speculativeCache.Start() with cleanCachePeriodically() Signed-off-by: bongwoobak <bongwoobak@gmail.com> * fix: add nil metadata guard in PrepareRequestData Signed-off-by: bongwoobak <bongwoobak@gmail.com> * refactor: remove confirmedScores and simplify Annotations usage Signed-off-by: bongwoobak <bongwoobak@gmail.com> * fix: adapt to NewChunkedTokenDatabase signature change from PR 415 Signed-off-by: bongwoobak <bongwoobak@gmail.com> * refactor: use KVBlockScorer for scoring and human-readable speculativeTTL Replace computeScoresFromKeyToPods with kvcache.KVBlockScorer.Score() to properly integrate device-backend weight configuration. Change SpeculativeTTL from time.Duration (nanoseconds) to string, parsed via time.ParseDuration, for human-readable config values like "2s" or "500ms". Compact docs/configuration.md per review feedback. Signed-off-by: bongwoobak <bongwoobak@gmail.com> * fix: gofmt import ordering in precise_prefix_cache.go Signed-off-by: bongwoobak <bongwoobak@gmail.com> * docs: move speculative indexing config into architecture.md Signed-off-by: bongwoobak <bongwoobak@gmail.com> --------- Signed-off-by: bongwoobak <bongwoobak@gmail.com>
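One of the commits above changes SpeculativeTTL from a `time.Duration` to a string parsed with `time.ParseDuration`, so configuration can say "2s" or "500ms". A minimal sketch of that parsing step follows. The `speculativeConfig` struct, the `ttl` method, the fallback default, and the rejection of non-positive values are assumptions for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// speculativeConfig is a hypothetical config fragment.
type speculativeConfig struct {
	SpeculativeTTL string // human-readable duration, e.g. "2s" or "500ms"
}

// ttl parses the configured string, falling back to def when it is empty.
// Rejecting non-positive durations is an assumption, not stated in the PR.
func (c speculativeConfig) ttl(def time.Duration) (time.Duration, error) {
	if c.SpeculativeTTL == "" {
		return def, nil
	}
	d, err := time.ParseDuration(c.SpeculativeTTL)
	if err != nil {
		return 0, fmt.Errorf("invalid speculativeTTL %q: %w", c.SpeculativeTTL, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("speculativeTTL must be positive, got %s", d)
	}
	return d, nil
}

func main() {
	for _, s := range []string{"2s", "500ms", "", "soon"} {
		d, err := speculativeConfig{SpeculativeTTL: s}.ttl(time.Second)
		fmt.Println(d, err)
	}
}
```

A bare integer such as "2000000000" is rejected by `time.ParseDuration` because it lacks a unit, which is part of what makes the string form less error-prone than raw nanoseconds.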
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)