v0.4.0-rc.1
Pre-release
TL;DR
- We have made a major refactor of the EPP, resulting in a more modular and maintainable system.
- As part of this overhaul, we have implemented a pluggable, extensible scheduler system that lets users create their own custom, sophisticated routing logic.
- We have also included native support for Prefix Cache Aware Routing.
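To give a feel for what a pluggable scheduler of this kind looks like, here is a minimal, self-contained sketch of a filter, weighted scorer, and max-score picker flow, including a naive prefix-aware scorer. It is an illustration only: `Pod`, `Filter`, `Scorer`, `WeightedScorer`, `Schedule`, and the scoring formulas are all hypothetical names and assumptions made for this example, not the project's actual plugin interfaces. See the scheduler PRs listed below for the real design.

```go
package main

import (
	"fmt"
	"strings"
)

// Pod is a hypothetical, simplified view of a model server endpoint.
type Pod struct {
	Name         string
	QueueDepth   int    // number of requests waiting on this pod
	CachedPrefix string // prompt prefix assumed to be in the pod's cache
}

// Filter removes pods that should not receive the request.
type Filter func(prompt string, pods []Pod) []Pod

// Scorer rates a single pod for a request; higher is better.
type Scorer func(prompt string, pod Pod) float64

// WeightedScorer pairs a scorer with its relative weight.
type WeightedScorer struct {
	Score  Scorer
	Weight float64
}

// maxQueueFilter keeps only pods whose queue depth is at most limit.
func maxQueueFilter(limit int) Filter {
	return func(_ string, pods []Pod) []Pod {
		kept := make([]Pod, 0, len(pods))
		for _, p := range pods {
			if p.QueueDepth <= limit {
				kept = append(kept, p)
			}
		}
		return kept
	}
}

// queueScorer prefers pods with shorter queues (assumed formula).
func queueScorer(_ string, pod Pod) float64 {
	return 1.0 / float64(1+pod.QueueDepth)
}

// prefixScorer returns the fraction of the prompt covered by the pod's
// cached prefix, or 0 when the prompt does not start with that prefix.
func prefixScorer(prompt string, pod Pod) float64 {
	if pod.CachedPrefix == "" || !strings.HasPrefix(prompt, pod.CachedPrefix) {
		return 0
	}
	return float64(len(pod.CachedPrefix)) / float64(len(prompt))
}

// Schedule applies all filters, sums the weighted scores for each
// remaining pod, and picks the pod with the highest total.
func Schedule(prompt string, pods []Pod, filters []Filter, scorers []WeightedScorer) (Pod, bool) {
	for _, f := range filters {
		pods = f(prompt, pods)
	}
	if len(pods) == 0 {
		return Pod{}, false
	}
	best, bestScore := pods[0], -1.0
	for _, p := range pods {
		total := 0.0
		for _, ws := range scorers {
			total += ws.Weight * ws.Score(prompt, p)
		}
		if total > bestScore {
			best, bestScore = p, total
		}
	}
	return best, true
}

func main() {
	pods := []Pod{
		{Name: "pod-a", QueueDepth: 0},
		{Name: "pod-b", QueueDepth: 2, CachedPrefix: "You are a helpful assistant."},
		{Name: "pod-c", QueueDepth: 9, CachedPrefix: "You are a helpful assistant."},
	}
	prompt := "You are a helpful assistant. Summarize this text."

	chosen, ok := Schedule(prompt, pods,
		[]Filter{maxQueueFilter(5)},
		[]WeightedScorer{{queueScorer, 1}, {prefixScorer, 2}},
	)
	if !ok {
		fmt.Println("no pods available")
		return
	}
	fmt.Println("routing to", chosen.Name)
}
```

In this sketch, the busy pod is filtered out first, and the prefix scorer's higher weight lets a pod with a warm cache win over an idle pod with a cold one. Changing the weights or swapping in a different scorer changes the routing policy without touching `Schedule`, which is the general idea behind a plugin-based scheduler.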
What's Changed
- Adding larger logo by @robscott in #630
- Minor fixes to the user guide by @nicolexin in #633
- Add istio to implementations.md by @LiorLieberman in #631
- Update e2e test config by @kfswain in #636
- Fix parsing issue in BBR helm by @rramkumar1 in #638
- fixed bug - sleep is expecting to get a string by @nirrozenbaum in #618
- #632 Add favicon for doc site by @Conor0Callaghan in #634
- Move integration test utils to central package by @rramkumar1 in #626
- BBR readme fixes by @rramkumar1 in #640
- Add integration tests to exercise streaming mode in BBR by @rramkumar1 in #627
- Adding 2 new reviewers to the reviewers alias by @kfswain in #644
- Add initial implementer's guide by @nicolexin in #635
- Update BBR istio.yaml to use FULL_DUPLEX_STREAMED mode by @rramkumar1 in #629
- Docs: Bumps Kgateway to v2.0.0 by @danehans in #646
- remove deprecated v1alpha2.AddToScheme and use v1alpha2.Install instead by @nirrozenbaum in #649
- removed time.sleep and using ticker instead by @nirrozenbaum in #648
- update release version in README by @nirrozenbaum in #653
- fix some issues in e2e tests by @nirrozenbaum in #621
- Refactor scheduler to make it more readable by @liu-cong in #645
- Getting started docs version bump by @SachinVarghese in #654
- expose "Normalized Time Per Output Token" (NTPOT) metric by @kaushikmitr in #643
- Bump github.com/onsi/ginkgo/v2 from 2.23.3 to 2.23.4 by @dependabot in #657
- Bump google.golang.org/grpc from 1.71.0 to 1.71.1 by @dependabot in #658
- Fix links and description in implementations.md by @xiaolin593 in #650
- fix manifests and description in the user guides by @cr7258 in #652
- Bump github.com/onsi/gomega from 1.36.3 to 1.37.0 by @dependabot in #659
- adjust the gpu deployment to increase max batch size by @ahg-g in #642
- Cleaning up config pkg by @ahg-g in #663
- Rename pkg/body-based-routing to pkg/bbr by @rramkumar1 in #664
- deploy: Enable logging for GKE gateway by default by @smarterclayton in #666
- moved IsPodReady func to podutils by @nirrozenbaum in #662
- removed double loop on docs in hermetic test by @nirrozenbaum in #668
- fix bbr dockerfile that was broken in PR #664 by @nirrozenbaum in #669
- Use dedicated namespace for e2e test code by @rramkumar1 in #661
- cleaning up inferencePool helm docs by @ahg-g in #665
- move inf model IsCritial func out of datastore by @nirrozenbaum in #670
- Consolidating down to FULL_DUPLEX_STREAMED supported ext-proc server by @kfswain in #672
- Document model server compatibility and config options by @liu-cong in #537
- Bump github.com/prometheus/client_model from 0.6.1 to 0.6.2 by @dependabot in #687
- Bump github.com/prometheus/client_golang from 1.21.1 to 1.22.0 by @dependabot in #688
- added badges to README by @nirrozenbaum in #682
- Bump sigs.k8s.io/structured-merge-diff/v4 from 4.6.0 to 4.7.0 by @dependabot in #686
- docs(gateways): fix Envoy AI Gateway link by @maxbrunet in #700
- minor changes in few places by @nirrozenbaum in #702
- Docs: Adds Kgateway Cleanup to Quickstart by @danehans in #701
- using namespaced name by @nirrozenbaum in #707
- EPP Architecture proposal by @kfswain in #683
- removed unused Fake struct by @nirrozenbaum in #723
- epp: return correct response for trailers by @howardjohn in #726
- Refactor scheduler to run plugins by @liu-cong in #677
- Complete the InferencePool documentation by @nicolexin in #673
- reduce log level in metrics logger not to trash the log by @nirrozenbaum in #708
- few updates in datastore by @nirrozenbaum in #713
- scheduler restructuring by @nirrozenbaum in #730
- filter irrelevant pods in pod controller by @nayihz in #696
- EPP: Update GetRandomPod() to return nil if no pods exist by @danehans in #731
- Move filter and scorer plugins registration to a separate file by @mayabar in #729
- Update issue templates by @kfswain in #738
- docs: add concepts and definitions to README.md by @shaneutt in #734
- Add unit tests for pod APIs under pkg/datastore by @rlakhtakia in #712
- added a target dedicated for running unit-test only by @nirrozenbaum in #739
- Updating proposal directories to match their PR number by @kfswain in #741
- Fixing errors in new template & disabling the default blank template by @kfswain in #742
- fixed broken link to implementations by @nirrozenbaum in #750
- Weighted scorers by @nirrozenbaum in #737
- add max score picker by @nirrozenbaum in #752
- Add GetEnvString helper function by @liu-cong in #758
- Bump the kubernetes group with 6 updates by @dependabot in #754
- extract pod representation from backend/metrics to backend by @nirrozenbaum in #751
- Request for adding Alibaba Cloud Container Service for Kubernetes (ACK) into implementations by @delavet in #748
- fixed error message in scheduler when no pods are available by @nirrozenbaum in #759
- feat: Initial setup for conformance test suite by @SinaChavoshi in #720
- Move scheduler initialization up to the main by @liu-cong in #757
- Add inference_extension_info metric for project metadata by @JeffLuoo in #744
- chore: make SchedulerConfig fields configurable by @shaneutt in #764
- fix: pass commit hash from the cloud build default variable by @JeffLuoo in #763
- Small refactor to capture request data for route. by @kfswain in #765
- Add queue and kv-cache scorers by @liu-cong in #762
- Add scheduler e2e latency metric by @liu-cong in #767
- Parse request x-request-id and expose it in contextual logger by @delavet in #746
- put SchedulerConfig fields private again. added NewSchedulerConfig func by @nirrozenbaum in #771
- Create unit test for request handler by @rlakhtakia in #745
- Add feature request link for adding Triton LoRA metric by @liu-cong in #773
- remove EndpointSlice from RBAC by @nirrozenbaum in #774
- passing headers to scheduler plugins by @nirrozenbaum in #775
- add labels to pod metadata for the use of scheduler plugins by @nirrozenbaum in #779
- Update istio version by @LiorLieberman in #780
- feat: Add metric that records length of queue for each model server pods by @JeffLuoo in #776
- chore: update golang.google.org/grpc dep from v1.71.1 to v1.72.0 by @shaneutt in #777
- docs: fixed inference pool docs by @capri-xiyue in #784
- Bump sigs.k8s.io/gateway-api from 1.2.1 to 1.3.0 by @dependabot in #785
- Healthcheck fix by @kfswain in #788
- Docs: Updates Benchmark Guide by @danehans in #789
- e2e: Fixes 404 Not Found Error by @danehans in #793
- EPP architectural refactor by @kfswain in #781
- remove empty request_test.go file. by @nirrozenbaum in #796
- Clean up filters by @liu-cong in #802
- Refactor: Improve env utility by @LukeAVanDrie in #803
- refactor scheduler filters package by @nirrozenbaum in #797
- fix labels not cloned bug by @nirrozenbaum in #804
- fixed datastore bug to clean all go routines when pool is unset by @nirrozenbaum in #810
- Optimize Dockerfile for Multiple Extensions by @GunaKKIBM in #811
- merge has capacity filter with sheddable filter. by @nirrozenbaum in #809
- feat(conformance): Add initial InferencePool tests and shared Gateway setup by @SinaChavoshi in #772
- Add prefix cache aware scheduling by @liu-cong in #768
- merge functions in env utils by @nirrozenbaum in #819
- generalize scheduling cycle state concept by @nirrozenbaum in #818
- remove Model field from LLMRequest by @nirrozenbaum in #782
- feat: Add support to invoke PostResponse plugins by @shmuelk in #800
- Add prefix aware request scheduling proposal by @liu-cong in #602
- Docs: Bumps Kgateway to v2.0.2 by @danehans in #823
- renamed Metrics to MetricsState and move to a separate file by @nirrozenbaum in #822
- feat: Add build reference to the info metrics by @JeffLuoo in #817
- Introduce SaturationDetector component by @LukeAVanDrie in #808
- support extracting prompt from chat completions API by @delavet in #798
- Fix Test Flakiness by adding short sleep in TestMetricsRefresh by @LukeAVanDrie in #824
- chore(conformance): Add timeout configuration by @SinaChavoshi in #795
- Scheduler subsystem high level design proposal by @smarterclayton in #603
- Updating top level readme by @kfswain in #831
- Meeting is at 10am, not 8 by @alexsnaps in #836
- docs: roll out guide by @capri-xiyue in #829
- reduce log level of "prefix cached servers" to TRACE by @nirrozenbaum in #842
- add regression testing docs by @kaushikmitr in #755
- fixed log before picker by @nirrozenbaum in #844
- Reorganize scheduling plugins by @liu-cong in #837
- updated godoc on scheduler filters, pickers and prefix plugin by @nirrozenbaum in #850
- Fix: Ignore header order in hermetic test by @LukeAVanDrie in #849
- Bump the kubernetes group with 6 updates by @dependabot in #851
- Bump github.com/prometheus/common from 0.63.0 to 0.64.0 by @dependabot in #853
- Updating readme to reflect llm-d collab! by @kfswain in #855
- fix: typo ('endpoing' -> 'endpoint') by @t3hmrman in #857
- Updating readme wording by @kfswain in #858
- adding logging & support for better Client response by @kfswain in #847
- Adding util func for splitting large bodies into chunks by @kfswain in #859
- Scheduler config refactor for simplifying plugins registration by @nirrozenbaum in #835
- Chunk implementation by @kfswain in #860
- feat: merge two metric servers by @nayihz in #728
- docs: added examples to address various generative AI application scenarios by using gateway api inference extension by @capri-xiyue in #812
- docs: Update link to Slack channel by @terrytangyuan in #867
- Multi cycle scheduler by @nirrozenbaum in #862
- feat(conformance): Add test for HTTPRouteInvalidInferencePoolRef by @SinaChavoshi in #807
- feat(conformance): tests for inferencepool_resolvedrefs_condition by @SinaChavoshi in #832
- Update 002-api-proposal/ to reflect api/v1alpha2 inferencePool and InferenceModel by @shotarok in #870
- use namespacedname instead of name/namespace as separate args in tests by @nirrozenbaum in #873
- remove the PreCycle plugin from scheduler by @nirrozenbaum in #876
- feat(conformance): Update InferencePoolResolvedRefsCondition test for E2E request validation by @SinaChavoshi in #866
- minor changes to saturation detector by @nirrozenbaum in #882
- updated controller-runtime to v0.21.0 and its dependencies by @nirrozenbaum in #890
- fix: broken ext-proc links by @Xunzhuo in #894
- Initial Scheduler Subsystem interface by @kfswain in #845
- fix(README): typo on dashboard by @EyalPazz in #904
- Fix typos and lint errors to pass golangci-lint by @shotarok in #902
- docs: update Istio gateway name for consistency by @shotarok in #903
- chor(conformance): fix header and remove extra comments by @SinaChavoshi in #883
- Tools: Fixes test-e2e.sh script by @danehans in #900
- Amend the endpoint picker protocol to support multiple fallback endpoints by @wbpcode in #761
- remove SchedulingContext, flatten scheduler interfaces by @nirrozenbaum in #889
- Boilerplate verification to ensure LICENSE information is present by @bharathbrat in #880
- Update the Cleanup section for Istio in Getting Started by @shotarok in #906
- Refactor: Externalize Scheduler's saturation logic and criticality-based service differentiation by @LukeAVanDrie in #805
- chore(deps): bump the kubernetes group with 6 updates by @dependabot in #908
- chore(deps): bump github.com/go-logr/logr from 1.4.2 to 1.4.3 by @dependabot in #909
- Adds vLLM Simulator Support by @danehans in #898
- fixed typo in makefile by @nirrozenbaum in #913
- test chat completions api in e2e case by @delavet in #868
- added GetEnvBool function and unit-tests by @nirrozenbaum in #916
- [Refactor] Simplify hermetic test setup for EPP by @LukeAVanDrie in #917
- move PostResponse plugins to requestcontrol instead of scheduler by @nirrozenbaum in #914
- renamed interface from PostResponsePlugin to PostResponse by @nirrozenbaum in #919
- Remove redundant SheddableCapacityFilter. by @LukeAVanDrie in #910
- Add the option to specify epp env vars in helm chart by @liu-cong in #924
- remove Critical boolean from scheduling request by @nirrozenbaum in #921
- Add prefix cache plugin configuration guide by @liu-cong in #923
- added context argument to scheduling profile picker by @nirrozenbaum in #926
- Bumps vLLM Simulator Tag by @danehans in #930
- feat(Conformance): Add a header based filter to make a controllable epp behavior determined by request header. by @zetxqx in #922
- Docs: Fixes Meeting Recording Link by @danehans in #931
- metrics: Add documentation for sample alert rules by @JeffLuoo in #912
- scheduler proposal continuation by @nirrozenbaum in #905
- Changes to multi-model guide in documentation by @elevran in #941
- docs: use inference gateway terminology by @capri-xiyue in #891
- scheduler redesign continuation by @nirrozenbaum in #937
- fix: Mark alert block as yaml to fix syntax error by @JeffLuoo in #954
- docs: Try to polish the go doc comments for InferenceModelSpec by @waltforme in #948
- docs: dashboards README metrics link fix by @EyalPazz in #952
- chore: update golang.google.org/grpc dep from v1.71.1 to v1.72.0 by @ahg-g in #965
- Pin PyPI package versions in requirements.txt by @shotarok in #963
- Fix the DestinationRule's referenced model service name by @keithmattix in #958
- moved main code to runner package under epp/cmd by @nirrozenbaum in #956
New Contributors
- @Conor0Callaghan made their first contribution in #634
- @SachinVarghese made their first contribution in #654
- @xiaolin593 made their first contribution in #650
- @cr7258 made their first contribution in #652
- @maxbrunet made their first contribution in #700
- @howardjohn made their first contribution in #726
- @nayihz made their first contribution in #696
- @mayabar made their first contribution in #729
- @shaneutt made their first contribution in #734
- @rlakhtakia made their first contribution in #712
- @delavet made their first contribution in #748
- @SinaChavoshi made their first contribution in #720
- @capri-xiyue made their first contribution in #784
- @LukeAVanDrie made their first contribution in #803
- @GunaKKIBM made their first contribution in #811
- @shmuelk made their first contribution in #800
- @alexsnaps made their first contribution in #836
- @t3hmrman made their first contribution in #857
- @shotarok made their first contribution in #870
- @EyalPazz made their first contribution in #904
- @wbpcode made their first contribution in #761
- @bharathbrat made their first contribution in #880
- @zetxqx made their first contribution in #922
- @elevran made their first contribution in #941
- @waltforme made their first contribution in #948
Full Changelog: v0.3.0...v0.4.0-rc.1