Releases: llm-d/llm-d-inference-scheduler

v0.2.0-RC2 (Pre-release)

Released 21 Jul 13:30 · tag v0.2.0-rc.2 · commit eabb332

Image:

ghcr.io/llm-d/llm-d-inference-scheduler:v0.2.0-rc.2
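
For reference, a minimal sketch of consuming this image in a Kubernetes Deployment; only the image reference comes from this release, while the Deployment name, labels, port-free container spec, and pull policy are illustrative assumptions:

```yaml
# Illustrative only: everything except the image reference is an
# assumption, not taken from the release manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-d-inference-scheduler    # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llm-d-inference-scheduler
  template:
    metadata:
      labels:
        app: llm-d-inference-scheduler
    spec:
      containers:
      - name: epp
        image: ghcr.io/llm-d/llm-d-inference-scheduler:v0.2.0-rc.2
        imagePullPolicy: IfNotPresent   # assumed value; the pull-policy change below doesn't state one
```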

What's Changed

  • Bump GIE to v0.5.0-RC3
  • Disable the Prefix-Cache-Aware decision, making P/D the default
  • build: change the image pull policy in the default epp-config manifests
  • Update the Prefix-Cache-Scorer cache_tracking mode to use the v0.2 KVCache.Indexer (see the sketch after this list)
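
A minimal sketch of how the scorer's tracking mode might be selected in the scheduler's configuration file; the plugin type, parameter name, and mode values are assumptions inferred from these notes, not a confirmed schema:

```yaml
# Hypothetical plugin entry; "prefix-cache-scorer", "mode", and the
# two mode values are assumed from the notes above, not verified.
plugins:
- type: prefix-cache-scorer
  parameters:
    mode: cache_tracking   # accurate, KV-cache-based tracking (v0.2 KVCache.Indexer)
    # mode: estimate       # approximate tracking, no indexer required
```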

Full Changelog: v0.2.0-RC1...v0.2.0-rc.2

v0.2.0-RC1 (Pre-release)

Released 17 Jul 16:39 · commit 80cd926

Image:

ghcr.io/llm-d/llm-d-inference-scheduler:v0.2.0-RC1

What's Changed

  • Update to GIE v0.5.0
  • Flexible, file-based configuration (breaking change: the inference scheduler is no longer configured via environment variables); a configuration sketch follows this list.
  • Improved UX for prefix-aware routing, allowing simple selection of either estimated or accurate (KV-cache-based) tracking of the request prefix distribution.
  • Support for secure communication between the Prefill and Decode workers (via a sidecar).
  • Label-selector-based filtering, to implement various model-server topologies (e.g., LWS-based).
  • Numerous bug fixes from multiple contributors.
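
Since the scheduler now reads a configuration file instead of environment variables, here is a minimal sketch of what such a file could look like, assuming GIE's EndpointPickerConfig format; the specific plugin names, parameters, and weights are illustrative assumptions rather than a documented example from this release:

```yaml
# Illustrative only: the field layout follows the EndpointPickerConfig
# shape introduced with GIE's file-based configuration, but the exact
# plugin names and parameters here are assumptions.
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:
- type: queue-scorer
- type: prefix-cache-scorer
  parameters:
    mode: estimate           # or cache_tracking, per the item above
- type: max-score-picker
schedulingProfiles:
- name: default
  plugins:
  - pluginRef: queue-scorer
    weight: 1
  - pluginRef: prefix-cache-scorer
    weight: 2
  - pluginRef: max-score-picker
```

Such a file would typically be mounted into the scheduler container and referenced at startup; these notes do not name the flag, so that detail is left out here.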

New Contributors

Full Changelog: v0.1.0...v0.2.0-RC1

v0.1.0 (Pre-release)

Released 20 May 12:03 · commit 55c58aa

What's Changed

New Contributors

Full Changelog: 0.0.3...v0.1.0