v0.2.0-RC1

Pre-release

@elevran released this 17 Jul 16:39
· 106 commits to main since this release
80cd926

Image:

ghcr.io/llm-d/llm-d-inference-scheduler:v0.2.0-RC1

What's Changed

  • Update to Gateway API Inference Extension (GIE) v0.5.0.
  • Flexible file-based configuration (breaking change: inference-scheduler is no longer configured via environment variables).
  • Improved UX for prefix-aware routing: users can now simply select either estimated or accurate (KV-cache-based) tracking of request prefix distribution.
  • Support for secure communication between Prefill and Decode workers (via sidecar).
  • Label-selector-based filtering, to support various model server topologies (e.g., LWS-based).
  • Numerous bug fixes from multiple contributors.

New Contributors

Full Changelog: v0.1.0...v0.2.0-RC1