Releases: llm-d/llm-d-inference-scheduler
v0.2.0-RC2
Image:
ghcr.io/llm-d/llm-d-inference-scheduler:v0.2.0-rc.2
What's Changed
- bump GIE to v0.5.0-RC3
- Disable Prefix-Cache-Aware decision, making P/D the default
- build: change epp-config default manifests' image pull policy
- Update Prefix-Cache-Scorer cache_tracking mode to use v0.2 KVCache.Indexer
Full Changelog: v0.2.0-RC1...v0.2.0-rc.2
v0.2.0-RC1
Image:
ghcr.io/llm-d/llm-d-inference-scheduler:v0.2.0-RC1
What's Changed
- Update to GIE v0.5.0
- Flexible file-based configuration (breaking change: inference-scheduler is no longer configured via environment variables).
- Improved UX for prefix-aware routing by allowing simple selection of either estimated or accurate (KV-cache-based) tracking of request prefix distribution.
- Support for secure communications between Prefill and Decode workers (via sidecar).
- Label-selector-based filtering, to implement various model server topologies (e.g., LWS-based).
- Numerous bug fixes from multiple contributors.
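To illustrate the label selector filtering item above, here is a minimal sketch of equality-based label matching in the style of Kubernetes label selectors. It is not the scheduler's actual implementation: the `Pod` type, the `matches` and `filter_pods` helpers, and the `role`/`group` label keys are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical stand-in for a model server pod; not a scheduler type."""
    name: str
    labels: dict[str, str] = field(default_factory=dict)


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based match: every selector key must be present with the same value.

    An empty selector matches every pod.
    """
    return all(labels.get(key) == value for key, value in selector.items())


def filter_pods(pods: list[Pod], selector: dict[str, str]) -> list[Pod]:
    """Keep only the pods whose labels satisfy the selector, preserving order."""
    return [pod for pod in pods if matches(selector, pod.labels)]


# Assumed example topology: one prefill worker and two decode workers.
pods = [
    Pod("worker-0", {"role": "prefill", "group": "a"}),
    Pod("worker-1", {"role": "decode", "group": "a"}),
    Pod("worker-2", {"role": "decode", "group": "b"}),
]
decode_pods = filter_pods(pods, {"role": "decode"})
```

Under these assumptions, a filter of this kind narrows the candidate set before any scoring happens, which is how a single selector can carve different topologies out of the same pool of pods.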
New Contributors
- @irar2 made their first contribution in #139
- @david-martin made their first contribution in #138
- @clubanderson made their first contribution in #130
- @kfirtoledo made their first contribution in #155
- @relyt0925 made their first contribution in #172
- @d0w made their first contribution in #175
- @russellb made their first contribution in #182
- @nekomeowww made their first contribution in #205
- @terrytangyuan made their first contribution in #213
- @carlory made their first contribution in #216
- @sagar0x0 made their first contribution in #222
- @Jooho made their first contribution in #232
Full Changelog: v0.1.0...v0.2.0-RC1
v0.1.0
What's Changed
- Merge Dev by @vMaroon in #96
- fixing cross builds by @Gregory-Pereira in #112
- bug fixes related to pd scheduling by @mayabar in #115
- fix bug in session scorer by @mayabar in #117
- docs/tutorial: creating a new scheduler filter by @elevran in #118
- Support running disaggregated PD on Kind by @shmuelk in #121
- Make prefix block size configurable via env var by @oglok in #120
- fixed broken link by @nirrozenbaum in #122
- Drop llm-d Private Pull Token & Release on Push by @vMaroon in #124
New Contributors
- @Gregory-Pereira made their first contribution in #112
- @nirrozenbaum made their first contribution in #122
Full Changelog: 0.0.3...v0.1.0