v0.2.0-RC1
Pre-release
Image:
ghcr.io/llm-d/llm-d-inference-scheduler:v0.2.0-RC1
What's Changed
- Update to GIE v0.5.0
- Flexible file-based configuration (breaking change: the inference scheduler is no longer configured via environment variables).
- Improved UX for prefix-aware routing: simple selection of either estimated or accurate (KV-cache-based) tracking of the request prefix distribution.
- Support for secure communication between Prefill and Decode workers (via a sidecar).
- Label-selector-based filtering, enabling various model-server topologies (e.g., LWS-based).
- Numerous bug fixes from multiple contributors.
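To illustrate the breaking change above, a file-based configuration might look something like the sketch below. The keys, plugin names, and file layout here are purely hypothetical assumptions for illustration — they are not the scheduler's actual schema; consult the project documentation for the real format.

```yaml
# Hypothetical config file (illustrative only; NOT the actual
# llm-d-inference-scheduler schema). Settings like these were
# previously supplied via environment variables.
plugins:
  - type: prefix-cache-scorer     # hypothetical plugin name
    parameters:
      mode: estimated             # hypothetical: estimated vs. accurate (KV-cache-based) tracking
```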
New Contributors
- @irar2 made their first contribution in #139
- @david-martin made their first contribution in #138
- @clubanderson made their first contribution in #130
- @kfirtoledo made their first contribution in #155
- @relyt0925 made their first contribution in #172
- @d0w made their first contribution in #175
- @russellb made their first contribution in #182
- @nekomeowww made their first contribution in #205
- @terrytangyuan made their first contribution in #213
- @carlory made their first contribution in #216
- @sagar0x0 made their first contribution in #222
- @Jooho made their first contribution in #232
Full Changelog: v0.1.0...v0.2.0-RC1