v0.2.0
Overview
The image is available via: docker pull ghcr.io/llm-d/llm-d-inference-scheduler:v0.2.0
Major Highlights
- Enhanced Scheduling Framework: The scheduling framework now includes additional extension points and a well-defined mechanism for communication between scheduler plugins.
- New Config API: Plugins can now be configured through a config file, without touching core code.
- Helm Charts: The Helm chart was updated to make it easy to reuse the Config API.
- Plugins Improvements: Multiple plugin implementations were improved, and some of the GIE and llm-d plugins were consolidated into a single robust implementation.
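The idea behind the Config API and plugin factory functions can be illustrated with a minimal Go sketch. Everything here is an assumption made for illustration: the Plugin interface, the Register and LoadPlugins helpers, the queue-scorer plugin, and its threshold parameter are hypothetical and are not the scheduler's actual types or config schema. JSON is used only to keep the sketch within the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Plugin is a hypothetical minimal plugin interface used only in this sketch.
type Plugin interface {
	Name() string
}

// Factory builds a plugin from its raw, plugin-specific parameters.
type Factory func(params json.RawMessage) (Plugin, error)

// registry maps a plugin type name to its factory function.
var registry = map[string]Factory{}

// Register makes a plugin type available to the config loader.
func Register(pluginType string, f Factory) { registry[pluginType] = f }

// PluginSpec is one entry of the (hypothetical) config file.
type PluginSpec struct {
	Type       string          `json:"type"`
	Parameters json.RawMessage `json:"parameters"`
}

// Config is the top-level (hypothetical) config document.
type Config struct {
	Plugins []PluginSpec `json:"plugins"`
}

// queueScorer is a made-up plugin with a single tunable parameter.
type queueScorer struct{ threshold int }

func (q *queueScorer) Name() string {
	return fmt.Sprintf("queue-scorer(threshold=%d)", q.threshold)
}

// newQueueScorer is the factory: it applies a default, decodes the
// parameters, and rejects invalid values before building the plugin.
func newQueueScorer(params json.RawMessage) (Plugin, error) {
	cfg := struct {
		Threshold int `json:"threshold"`
	}{Threshold: 128}
	if len(params) > 0 {
		if err := json.Unmarshal(params, &cfg); err != nil {
			return nil, fmt.Errorf("decode parameters: %w", err)
		}
	}
	if cfg.Threshold <= 0 {
		return nil, fmt.Errorf("threshold must be positive, got %d", cfg.Threshold)
	}
	return &queueScorer{threshold: cfg.Threshold}, nil
}

// LoadPlugins instantiates every plugin listed in the config via the registry,
// so adding or tuning a plugin needs a config change rather than a code change.
func LoadPlugins(raw []byte) ([]Plugin, error) {
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("parse config: %w", err)
	}
	plugins := make([]Plugin, 0, len(cfg.Plugins))
	for _, spec := range cfg.Plugins {
		factory, ok := registry[spec.Type]
		if !ok {
			return nil, fmt.Errorf("unknown plugin type %q", spec.Type)
		}
		p, err := factory(spec.Parameters)
		if err != nil {
			return nil, fmt.Errorf("plugin %q: %w", spec.Type, err)
		}
		plugins = append(plugins, p)
	}
	return plugins, nil
}

func main() {
	Register("queue-scorer", newQueueScorer)
	raw := []byte(`{"plugins":[{"type":"queue-scorer","parameters":{"threshold":64}}]}`)
	plugins, err := LoadPlugins(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range plugins {
		fmt.Println(p.Name()) // prints "queue-scorer(threshold=64)"
	}
}
```

A registry of factories keyed by type name is one common way to get this behavior; the scheduler's real configuration format and plugin interfaces are described in the project documentation.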
What's Changed
- Fixes according to review comments in issue #109 by @mayabar in #129
- update GIE upstream version + code adaptations by @nirrozenbaum in #135
- test: Improve scheduler test by @irar2 in #139
- Wait for storage pods to exist before moving on by @david-martin in #138
- Dependabot configuration for Go, GH actions and Docker by @elevran in #141
- link checker - fails PR if links are broken by @clubanderson in #130
- fix: Cleanup build and kind development environment by @shmuelk in #156
- Replace ubi9 image for ubi-minimal to reduce footprint by @oglok in #153
- [Build]: Update clean-env-dev-kind target in Makefile and fix vllm-sim image version by @kfirtoledo in #155
- remove version.json references from Makefile by @elevran in #137
- simplify config by @nirrozenbaum in #171
- Refactor Redis configuration handling in KV cache scorer by @relyt0925 in #172
- fix: Add MODEL_NAME to kind deploy by @kfirtoledo in #178
- feat: A small set of CI updates by @shmuelk in #176
- DEVELOPMENT.md minor fixes by @d0w in #175
- deps(go): bump the go-dependencies group across 1 directory with 7 updates by @dependabot[bot] in #180
- docs: clarify scorer and filter configuration reference by @kfirtoledo in #185
- Prevent auto updates of GIE by @elevran in #186
- Make URL construction IPv6-compatible by @russellb in #182
- Prefix aware scorer initialization by @mayabar in #143
- update scheduler to use latest GIE by @nirrozenbaum in #179
- fixed by labels link in the docs example markdown by @nirrozenbaum in #194
- removed unused passthrough and random filter/scorer by @nirrozenbaum in #195
- sync with latest GIE after some changes to the scheduling plugins by @nirrozenbaum in #197
- build: add support for development on kubernetes cluster by @kfirtoledo in #190
- chore: Update dependencies to use the latest GIE by @shmuelk in #202
- deps(go): bump github.com/redis/go-redis/v9 from 9.10.0 to 9.11.0 in the go-dependencies group across 1 directory by @dependabot[bot] in #203
- fix: rename kvcache-aware to use underscore to match other src files by @nekomeowww in #205
- Change Prefill and Decode filters to be based on a common filter by @mayabar in #188
- fix: rename pd-profile-handler to use underscore to match other src files by @nekomeowww in #210
- refactor: removes the prefix-aware scorer in favor of the unified GIE prefix scorer by @kfirtoledo in #207
- feat: Add factory functions for all plugins by @shmuelk in #208
- fix: allow prefill when prefixState is unavailable by @kfirtoledo in #211
- Add IDE files to .gitignore by @terrytangyuan in #213
- updated README by @nirrozenbaum in #204
- Fixed some typos by @terrytangyuan in #212
- Fix outdated debug info by @carlory in #216
- Migrate the llm-d-inference-scheduler's configuration to the new text based configuration by @shmuelk in #214
- remove dependency on datastore + updated TypedName of plugins by @nirrozenbaum in #215
- filter rename by @nirrozenbaum in #224
- fixed broken link by @nirrozenbaum in #226
- Add unit tests for session affinity scorer by @sagar0x0 in #222
- nit: fix typo by @Jooho in #232
- prefill header is set in the form of ip:port by @nirrozenbaum in #233
- updated scheduler unit test by @nirrozenbaum in #229
- ignore score in scheduler unit test by @nirrozenbaum in #234
- updated prefill header name to x-prefill-host-port by @nirrozenbaum in #236
- fix: add validation for load-aware scorer to handle invalid queue thresholds by @kfirtoledo in #240
- build: sync with latest GIE v0.0.0-20250715021823 by @kfirtoledo in #239
- Integrate prefix-cache configuration into a single knob by @kfirtoledo in #237
- Clear prefill target header if set in incoming request by @elevran in #244
- bump gie version to latest 0.5.0 rc by @nirrozenbaum in #247
- updating GIE CRD version by @Gregory-Pereira in #248
- bump gie to rc3 by @nirrozenbaum in #250
- Always P/D - Disable Prefix-Cache-Aware Decision Making for P/D by Default by @vMaroon in #253
- build: change epp-config default yamls and image pull policy by @kfirtoledo in #249
- Update Prefix-Cache-Scorer cache_trackingMode with The v0.2 KVCache.Indexer by @dmitripikus in #245
New Contributors
- @irar2 made their first contribution in #139
- @david-martin made their first contribution in #138
- @clubanderson made their first contribution in #130
- @kfirtoledo made their first contribution in #155
- @relyt0925 made their first contribution in #172
- @d0w made their first contribution in #175
- @dependabot[bot] made their first contribution in #180
- @russellb made their first contribution in #182
- @nekomeowww made their first contribution in #205
- @terrytangyuan made their first contribution in #213
- @carlory made their first contribution in #216
- @sagar0x0 made their first contribution in #222
- @Jooho made their first contribution in #232
Full Changelog: v0.1.0...v0.2.0