vllm-stack-0.1.8

github-actions released this 19 Nov 21:31
a2576d6

The stack deployment of vLLM

What's Changed

  • [Feat] Add GKE example for lmcache cpu ram + local disk offloading by @dannawang0221 in #678
  • [Feat] Use the lmcache 0.3.5 for kvaware routing by @zerofishnoodles in #673
  • [Feat]: add pull policy option to ray-cluster.yaml (helm chart) by @moriabs88 in #686
  • [Feat] Add support for scaling down to zero in KEDA by @Romero027 in #679
  • [bugfix] Small fix to observability tutorial by @Romero027 in #695
  • [Feat][Router]: add vision model type by @max-wittig in #603
  • Adding Support for Sleep Mode for vLLM Container without Command Args by @dumb0002 in #696
  • [Bugfix] Increase liveness failure threshold for crd by @zerofishnoodles in #688
  • [bugfix] Add close method for static discovery by @zerofishnoodles in #692
  • [Bugfix][Router]: loop through model_names by @max-wittig in #694
  • [Misc] bump up otel col version and use a simplified image by @JaredTan95 in #698
  • [vllm-router] fall back to remote tokenizer as 2nd path by @panpan0000 in #702
  • [Bugfix][Router]: do not filter by model label in transcription by @max-wittig in #712
  • [CI] move e2e machine to self hosted by @zerofishnoodles in #716
  • [Feat] Add Production-ready vLLM EKS terraform stack tutorial by @brokedba in #704
  • [bugfix] Add annotation to pod after loading the lora adapter to trigger the modify event by @zerofishnoodles in #703
  • [Feat] [Router] [Misc] [Doc] increased configurability of affinity and probes by @Garrukh in #715
  • [Bugfix] fix pd client initialization issue by @zerofishnoodles in #717
  • [Bugfix] Update aiohttp to resolve CVE-2024-23334 vulnerability by @ikaadil in #722
  • [Bugfix/Feature] Support extraPorts in service-vllm by @NargiT in #725
  • Update gateway-inference-extension.rst by @linsun in #728
  • feat(helm): Use emptyDir as pvcStorage by @Jimmy-Newtron in #616
  • [Bugfix] Support service discovery by service name: add missing role and rolebinding for #586 by @NargiT in #724
  • Update doc 04-GCP-GKE-lmcache-local-disk.md by @dannawang0221 in #727
  • [Feat] Enable MIG support for Ray Head Node using chart.resources helper by @shima8823 in #732
  • [feat] Enable session key in request body by @zerofishnoodles in #741
  • [Feat] Add basic integration path for semantic router by @zerofishnoodles in #740
  • [Bugfix] Pod rolebindings are required even with k8s_discovery_mode=service-name by @NargiT in #744
  • [Feat] allow annotation on router pod by @NargiT in #743
  • [Integration]: Add Intelligent Semantic Routing with vLLM-SR by @Xunzhuo in #750
  • [Integration]: Update Docs with vLLM-SR by @Xunzhuo in #752
  • [Bugfix] kv aware routing for lmcache 0.3.9 by @zerofishnoodles in #697
  • [Feat] Ability to add labels to model pvc by @NargiT in #754
  • [Bugfix] Helm: Add security context support, fix #756 by @aplufr in #757
  • [Bugfix] lmcache server points to wrong file in entrypoint by @Senne-Mennes in #730
  • [Feat] Add per-model runtimeClassName configuration support by @HanFa in #755
  • Bumping version to 0.1.8 by @YuhanLiu11 in #738

New Contributors

Full Changelog: vllm-stack-0.1.7...vllm-stack-0.1.8