Releases: llm-d/llm-d-inference-scheduler

v0.2.0-RC2 (Pre-release)

Released 21 Jul 13:30 · tag v0.2.0-rc.2 · commit eabb332

Image:

ghcr.io/llm-d/llm-d-inference-scheduler:v0.2.0-rc.2
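
For reference, a minimal sketch of consuming this image in a Kubernetes Deployment; only the image reference comes from this release, while the Deployment name, labels, port-free container spec, and pull policy are illustrative assumptions:

```yaml
# Illustrative only: everything except the image reference is an
# assumption, not taken from the release manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-d-inference-scheduler    # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llm-d-inference-scheduler
  template:
    metadata:
      labels:
        app: llm-d-inference-scheduler
    spec:
      containers:
      - name: epp
        image: ghcr.io/llm-d/llm-d-inference-scheduler:v0.2.0-rc.2
        imagePullPolicy: IfNotPresent   # assumed value; the pull-policy change below doesn't state one
```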

What's Changed

  • Bump GIE to v0.5.0-RC3
  • Disable the Prefix-Cache-Aware decision, making P/D the default
  • build: change the image pull policy in the default epp-config manifests
  • Update the Prefix-Cache-Scorer cache_tracking mode to use the v0.2 KVCache.Indexer (see the sketch after this list)
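
A minimal sketch of how the scorer's tracking mode might be selected in the scheduler's configuration file; the plugin type, parameter name, and mode values are assumptions inferred from these notes, not a confirmed schema:

```yaml
# Hypothetical plugin entry; "prefix-cache-scorer", "mode", and the
# two mode values are assumed from the notes above, not verified.
plugins:
- type: prefix-cache-scorer
  parameters:
    mode: cache_tracking   # accurate, KV-cache-based tracking (v0.2 KVCache.Indexer)
    # mode: estimate       # approximate tracking, no indexer required
```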

Full Changelog: v0.2.0-RC1...v0.2.0-rc.2

v0.2.0-RC1 (Pre-release)

Released 17 Jul 16:39 · commit 80cd926

Image:

ghcr.io/llm-d/llm-d-inference-scheduler:v0.2.0-RC1

What's Changed

  • Update to GIE v0.5.0
  • Flexible, file-based configuration (breaking change: the inference scheduler is no longer configured via environment variables); a configuration sketch follows this list.
  • Improved UX for prefix-aware routing, allowing simple selection of either estimated or accurate (KV-cache-based) tracking of the request prefix distribution.
  • Support for secure communication between the Prefill and Decode workers (via a sidecar).
  • Label-selector-based filtering, to implement various model-server topologies (e.g., LWS-based).
  • Numerous bug fixes from multiple contributors.
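
Since the scheduler now reads a configuration file instead of environment variables, here is a minimal sketch of what such a file could look like, assuming GIE's EndpointPickerConfig format; the specific plugin names, parameters, and weights are illustrative assumptions rather than a documented example from this release:

```yaml
# Illustrative only: the field layout follows the EndpointPickerConfig
# shape introduced with GIE's file-based configuration, but the exact
# plugin names and parameters here are assumptions.
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:
- type: queue-scorer
- type: prefix-cache-scorer
  parameters:
    mode: estimate           # or cache_tracking, per the item above
- type: max-score-picker
schedulingProfiles:
- name: default
  plugins:
  - pluginRef: queue-scorer
    weight: 1
  - pluginRef: prefix-cache-scorer
    weight: 2
  - pluginRef: max-score-picker
```

Such a file would typically be mounted into the scheduler container and referenced at startup; these notes do not name the flag, so that detail is left out here.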

New Contributors

Full Changelog: v0.1.0...v0.2.0-RC1

v0.1.0 (Pre-release)

Released 20 May 12:03 · commit 55c58aa

What's Changed

New Contributors

Full Changelog: 0.0.3...v0.1.0