The stack deployment of vLLM
What's Changed
- [Feat] Added option to specify priority class by @Fabhiahn in #557
- [CI/Build] Change CI runner to L4 by @Shaoting-Feng in #595
- [Bugfix] fix dynamic config by @zerofishnoodles in #598
- [refactor] redesign RST documentation by @kobe0938 in #592
- [Misc] revert uv.lock by @kobe0938 in #604
- [CI/Build] Specify transformers version in router end to end test by @Shaoting-Feng in #607
- [Feat] allow service discovery by service names by @learner0810 in #586
- feat: add HPA for router by @BrianPark314 in #568
- [Misc] add helm configuration values table by @zerofishnoodles in #599
- [Feat] Use sidecar to download lora model for helm deployment by @zerofishnoodles in #618
- [Router] Improve performance of round-robin router by @zhouwfang in #584
- [Feat] Use sidecar to download lora model for operator deployment by @zerofishnoodles in #622
- [Feature] Add method to check Pod termination status and update Pod readiness logic by @KevinCheung2259 in #602
- [Feat] Add Sentry Continuous Profiling Support to vLLM Router by @ikaadil in #624
- [Feat][Helm] Add HTTPRoute template for Gateway API support by @Hexoplon in #610
- [Feat] Init Container "extraVolumeMount" by @cm-enfuse in #600
- [Bugfix][Router]: simplify test payload by @max-wittig in #613
- [Feat] Add support for HAMi resource variables by @andresd95 in #579
- feature/KV-cache-aware-routing by @BrianPark314 in #550
- [Router] Replace httpx with aiohttp in vllm_router for enhanced high-concurrency performance by @ikaadil in #589
- feature/prefix-aware-routing by @BrianPark314 in #546
- [Feat][Router]: add extra support for YAML config file by @antoineauger in #621
- [CI] Add stress testing for router by @kobe0938 in #633
- [Misc] Auto-size Minikube memory via calculate_safe_memory by @fulvius31 in #637
- [Bugfix][Router]: reconfigure callbacks with dynamic config by @antoineauger in #642
- [Doc] Add a missing word in the description by @JiangJiaWei1103 in #645
- [Bugfix] Correct the routing logic for KV cache aware routing by @JiangJiaWei1103 in #648
- [Router][CI/CD and misc.] Add RoundRobinRouter logic testing by @lucas-tucker in #639
- [Docs] Modify the KV-aware routing doc by @zerofishnoodles in #652
- [Router] Optimize request parsing by removing duplicate await calls by @ikaadil in #629
- [Feat][Router] Add configurable timeout_seconds for Kubernetes watchers by @ikaadil in #654
- [Misc] Change community meeting time by @zerofishnoodles in #662
- [Bugfix] Fix install script path prefix by @nicolasj92 in #665
- [Feat] Env from secret by @redno2 in #641
- [Bugfix] Fix routing to delete endpoint by @zerofishnoodles in #668
- Bugfix(vllm-operator): add missing RBAC permissions for PVCs and Ingresses by @mahmoudk1000 in #647
- [feat]: add transcription API endpoint using OpenAI Whisper-small by @davidgao7 in #469
- Bump helm chart version by @philandstuff in #674
New Contributors
- @Fabhiahn made their first contribution in #557
- @ikaadil made their first contribution in #624
- @cm-enfuse made their first contribution in #600
- @andresd95 made their first contribution in #579
- @antoineauger made their first contribution in #621
- @fulvius31 made their first contribution in #637
- @JiangJiaWei1103 made their first contribution in #645
- @nicolasj92 made their first contribution in #665
- @redno2 made their first contribution in #641
- @mahmoudk1000 made their first contribution in #647
- @davidgao7 made their first contribution in #469
- @philandstuff made their first contribution in #674
Full Changelog: vllm-stack-0.1.6...vllm-stack-0.1.7