v0.2.0-RC1

Pre-release

elevran released this 17 Jul 16:39

80cd926

Image:

ghcr.io/llm-d/llm-d-inference-scheduler:v0.2.0-RC1

What's Changed

Update to GIE v0.5.0
Flexible file-based configuration (breaking change: inference-scheduler is no longer configured via environment variables).
Improved UX for prefix-aware routing: you can now select either estimated or accurate (KV-cache-based) tracking of request prefix distribution.
Support for secure communication between Prefill and Decode workers (via sidecar).
Label-selector-based filtering to implement various model server topologies (e.g., LWS-based).
Numerous bug fixes from multiple contributors.
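
For readers unfamiliar with label selectors, here is a rough sketch of the general idea behind label-selector-based filtering: keep only the endpoints whose labels contain every key/value pair in the selector. This is an illustration only, not the scheduler's implementation or configuration format. The endpoint names, the `role` and `group` labels, and both function names are hypothetical.

```python
from typing import Dict, List

Labels = Dict[str, str]


def matches(selector: Labels, labels: Labels) -> bool:
    """Return True when every key/value pair in the selector appears in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def filter_endpoints(endpoints: List[dict], selector: Labels) -> List[dict]:
    """Keep only the endpoints whose labels satisfy the selector."""
    return [ep for ep in endpoints if matches(selector, ep["labels"])]


# Hypothetical endpoints; the label keys and values are made up for this sketch.
ENDPOINTS = [
    {"name": "pod-a", "labels": {"role": "prefill", "group": "lws-0"}},
    {"name": "pod-b", "labels": {"role": "decode", "group": "lws-0"}},
    {"name": "pod-c", "labels": {"role": "decode", "group": "lws-1"}},
]

# Select all decode workers, then narrow to a single group.
decode_names = [ep["name"] for ep in filter_endpoints(ENDPOINTS, {"role": "decode"})]
group_names = [
    ep["name"]
    for ep in filter_endpoints(ENDPOINTS, {"role": "decode", "group": "lws-1"})
]
```

An empty selector matches every endpoint, which mirrors the usual convention for equality-based selectors.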

New Contributors

@irar2 made their first contribution in #139
@david-martin made their first contribution in #138
@clubanderson made their first contribution in #130
@kfirtoledo made their first contribution in #155
@relyt0925 made their first contribution in #172
@d0w made their first contribution in #175
@russellb made their first contribution in #182
@nekomeowww made their first contribution in #205
@terrytangyuan made their first contribution in #213
@carlory made their first contribution in #216
@sagar0x0 made their first contribution in #222
@Jooho made their first contribution in #232

Full Changelog: v0.1.0...v0.2.0-RC1

