Releases: vllm-project/production-stack

vllm-stack-0.1.10

27 Feb 23:44
62e8137

The stack deployment of vLLM

What's Changed

  • Add servingEngineSpec environment variable by @shernshiou in #799
  • [Fix] Handle missing max_tokens in disaggregated prefill requests by @keyuchen21 in #797
  • [Router]: add routes for Image and Audio API by @nmiguel in #820
  • [Router][Fix]: fixed name of images/edits endpoint by @nmiguel in #822
  • Update contact information in README.md by @ruizhang0101 in #821
  • Fix OCI OKE deployment script (entry_point.sh) — end-to-end tested by @fede-kamel in #811
  • mention resources at the values.yaml as valid option by @eladmotola in #806
  • [Doc] Update README for global env on servingEngineSpec by @shernshiou in #814
  • feat(helm): add standard Kubernetes labels to deployments and services by @keyuchen21 in #810
  • [BugFix][Feat]: fix serviceEngineSpec probe field and improve probe management in helm template by @emanuelecassese in #809
  • [Bugfix] Increase router default memory size by @ruizhang0101 in #804
  • [FEAT] Add per-model token and error Prometheus metrics (part of #699) by @ardecode in #813
  • [CI/CD] Add stable router image by @ruizhang0101 in #823
  • [Feat] Add toleration for vllmRunTimes by @mahmoudk1000 in #825
  • [Feat] Operator: add GPUType for resources to replace "nvidia.com/gpu" in vllmruntime by @dotmobo in #829
  • [Bugfix] Update aiohttp and python-multipart by @shernshiou in #831
  • fix: make --log-level CLI argument actually control router log levels by @keyuchen21 in #832
  • fix: Exclude content-length from response headers in route_general_transcriptions by @fidoriel in #733
  • [Feat] Reorder hfTokenSecret for vllmRunTimes by @mahmoudk1000 in #826
  • feat(router): add initial support for anthropic messages endpoint by @nejch in #775
  • [Feat] Add token redaction for logger debug by @shernshiou in #824
  • refactor: replace logging.getLogger() with init_logger() across codebase by @keyuchen21 in #835
  • [CI/CD] add ci/cd for production stack operator by @ruizhang0101 in #843
  • fix: filter hop-by-hop headers from streaming responses by @keyuchen21 in #836
  • fix: upgrade h11 to 0.16.0 to resolve GHSA-vqfr-h8mv-ghfj by @keyuchen21 in #837
  • Increase timeout values in e2e test workflow by @ruizhang0101 in #848
  • [Feat][Router] Add request migration with configurable failover reroute attempts by @ikaadil in #839
  • feat(helm): add support for extra manifests and annotation on pvc by @enneitex in #847
  • feat: add --root-path CLI option for hosting router under a subpath by @keyuchen21 in #844
  • [Misc] Expose LMCache log level as configurable Helm value and default to INFO. by @NargiT in #846
  • [Feat] Add --log-format json option for structured logging by @keyuchen21 in #849
  • [Router]: image edit routes multi-part form request by @nmiguel in #850
  • [Docs] Update readme by @ruizhang0101 in #856
  • Bump chart version to 0.1.10 by @ruizhang0101 in #859
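
Among the router fixes above, the hop-by-hop header filtering for streaming responses (#836) follows standard HTTP proxy behavior. A minimal sketch of the idea, assuming the RFC 7230 §6.1 header set; `filter_response_headers` is an illustrative name, not the router's actual API:

```python
# Sketch only: drop hop-by-hop headers before forwarding a backend response.
# The fixed set below comes from RFC 7230 §6.1; headers named in the
# Connection header are also hop-by-hop and must be dropped.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}

def filter_response_headers(headers: dict) -> dict:
    """Return only end-to-end headers, safe to relay to the client."""
    named = {
        h.strip().lower()
        for h in headers.get("Connection", "").split(",") if h.strip()
    }
    return {
        k: v for k, v in headers.items()
        if k.lower() not in HOP_BY_HOP and k.lower() not in named
    }
```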

Full Changelog: vllm-stack-0.1.9...vllm-stack-0.1.10

vllm-stack-0.1.9

30 Jan 00:15
20a6580

What's Changed

  • [Feat] Add imagePullSecrets support for router and cache-server deplo… by @HanFa in #762
  • [Feat] Add production-ready vLLM Nebius MK8s terraform tutorial by @brokedba in #748
  • [Feat] Allow declaring modelSpec resources directly by @danhubern in #729
  • [Router] Introduction of /v1/responses endpoint by @sebastiaanvduijn in #691
  • [Bugfix][Router] Fix router startup race when using multiple replicas by @bcdonadio in #768
  • [Docs] Correct parameter in transcription API tutorial by @davidgao7 in #685
  • [Bugfix] Concurrent requests to model are currently limited to 100 due to aiohttp default by @dermodmaster in #767
  • Update nixlPeerHost to pd-llama-decode-engine-service by @Xunzhuo in #771
  • [Feat] Production Stack Router: Add OpenTelemetry tracing support with W3C context propagation by @HanFa in #772
  • [Feat]: Add support for chatTemplates by @mahmoudk1000 in #779
  • [Build][Router] Update vllm to v0.13.0 by @shernshiou in #770
  • [Feat] Add nodeSelectorTerms for vllmRunTimes by @mahmoudk1000 in #778
  • Update calendar link for community meetings by @ruizhang0101 in #783
  • Update the documentation of the semantic router deployment to use helm by @szedan-rh in #786
  • Fix incorrect import path in batch processor initialization by @keyuchen21 in #784
  • [Build][Router] Update aiohttp by @shernshiou in #793
  • Update Slack channel link in README by @keyuchen21 in #798
  • [Doc] Remove official email link from README by @ruizhang0101 in #805
  • feat(oci): Add Oracle Cloud Infrastructure (OKE) deployment support by @fede-kamel in #794
  • [Feat] add keda support by @eladmotola in #781
  • [CI/Build] Add stable version tags to Docker images during release by @ardecode in #801
  • Fix score payload typo and add regression test by @keyuchen21 in #769
  • [Feat] Include runner and convert flag by @shernshiou in #803
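
Several of the Helm-facing changes above (imagePullSecrets for router and cache-server, declaring modelSpec resources directly, nodeSelectorTerms) surface as chart values. A hypothetical values.yaml fragment; the key names are taken from the PR titles and may not match the chart's exact schema, so check the chart's own values.yaml:

```yaml
# Hypothetical fragment -- key names follow the PR titles above, not a
# verified schema.
routerSpec:
  imagePullSecrets:              # pull the router image from a private registry
    - name: my-registry-secret
servingEngineSpec:
  modelSpec:
    - name: "llama3"
      nodeSelectorTerms:         # schedule engine pods onto GPU nodes
        - matchExpressions:
            - key: nvidia.com/gpu.present
              operator: In
              values: ["true"]
      resources:                 # declared directly on the modelSpec entry
        limits:
          nvidia.com/gpu: 1
```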

Full Changelog: vllm-stack-0.1.8...vllm-stack-0.1.9

vllm-stack-0.1.8

19 Nov 21:31
a2576d6

What's Changed

  • [Feat] Add GKE example for lmcache cpu ram + local disk offloading by @dannawang0221 in #678
  • [Feat] Use the lmcache 0.3.5 for kvaware routing by @zerofishnoodles in #673
  • [Feat]: add pull policy option to ray-cluster.yaml (helm chart) by @moriabs88 in #686
  • [Feat] Add support for scaling down to zero in KEDA by @Romero027 in #679
  • [bugfix] Small fix to observability tutorial by @Romero027 in #695
  • [Feat][Router]: add vision model type by @max-wittig in #603
  • Adding Support for Sleep Mode for vLLM Container without Command Args by @dumb0002 in #696
  • [Bugfix] Increase liveness failure threshold for crd by @zerofishnoodles in #688
  • [bugfix] Add close method for static discovery by @zerofishnoodles in #692
  • [Bugfix][Router]: loop through model_names by @max-wittig in #694
  • [Misc] bump up otel col version and use a simplified image by @JaredTan95 in #698
  • [vllm-router] fall back to remote tokenizer as 2nd path by @panpan0000 in #702
  • [Bugfix][Router]: do not filter by model label in transcription by @max-wittig in #712
  • [CI] move e2e machine to self hosted by @zerofishnoodles in #716
  • [Feat] Add Production-ready vLLM EKS terraform stack tutorial by @brokedba in #704
  • [bugfix] Add annotation to pod after loading the lora adapter to trigger the modify event by @zerofishnoodles in #703
  • [Feat] [Router] [Misc] [Doc] increased configurability of affinity and probes by @Garrukh in #715
  • [Bugfix] fix pd client initialization issue by @zerofishnoodles in #717
  • [Bugfix] Update aiohttp to resolve CVE-2024-23334 vulnerability by @ikaadil in #722
  • [Bugfix/Feature] Support extraPorts in service-vllm by @NargiT in #725
  • Update gateway-inference-extension.rst by @linsun in #728
  • feat(helm): Use emptyDir as pvcStorage by @Jimmy-Newtron in #616
  • [Bugfix] Support service discovery by service name: add missing role and rolebinding for #586 by @NargiT in #724
  • Update doc 04-GCP-GKE-lmcache-local-disk.md by @dannawang0221 in #727
  • [Feat] Enable MIG support for Ray Head Node using chart.resources helper by @shima8823 in #732
  • [feat] Enable session key in request body by @zerofishnoodles in #741
  • [Feat] Add basic integration path for semantic router by @zerofishnoodles in #740
  • [Bugfix] Pod rolebinding is required even with k8s_discovery_mode=service-name by @NargiT in #744
  • [Feat] allow annotation on router pod by @NargiT in #743
  • [Integration]: Add Intelligent Semantic Routing with vLLM-SR by @Xunzhuo in #750
  • [Integration]: Update Docs with vLLM-SR by @Xunzhuo in #752
  • [Bugfix] kv aware routing for lmcache 0.3.9 by @zerofishnoodles in #697
  • [Feat] Ability to add labels to model pvc by @NargiT in #754
  • [Bugfix] Helm: Add security context support, fix #756 by @aplufr in #757
  • [Bugfix] lmcache server points to wrong file in entrypoint by @Senne-Mennes in #730
  • [Feat] Add per-model runtimeClassName configuration support by @HanFa in #755
  • Bumping version to 0.1.8 by @YuhanLiu11 in #738
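
Several items above (security context support, extraPorts on the engine service, per-model runtimeClassName, PVC labels) are again Helm-chart knobs. An illustrative values.yaml fragment only; key names follow the PR titles and should be verified against the chart:

```yaml
# Illustrative fragment -- verify key names against the chart's values.yaml.
servingEngineSpec:
  securityContext:               # security context support (#757)
    runAsNonRoot: true
    allowPrivilegeEscalation: false
  extraPorts:                    # extra ports on the engine service (#725)
    - name: metrics
      port: 9090
  modelSpec:
    - name: "llama3"
      runtimeClassName: nvidia   # per-model runtimeClassName (#755)
      pvcLabels:                 # labels on the model PVC (#754)
        team: inference
```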

Full Changelog: vllm-stack-0.1.7...vllm-stack-0.1.8

vllm-stack-0.1.7

03 Sep 22:10
b6dd717

What's Changed

  • [Feat] Added option to specify priority class by @Fabhiahn in #557
  • [CI/Build] Change CI runner to L4 by @Shaoting-Feng in #595
  • [Bugfix] fix dynamic config by @zerofishnoodles in #598
  • [refactor] redesign RST documentation by @kobe0938 in #592
  • [Misc] revert uv.lock by @kobe0938 in #604
  • [CI/Build] Specify transformers version in router end to end test by @Shaoting-Feng in #607
  • [Feat] allow service discovery by service names by @learner0810 in #586
  • feat: add hpa for router by @BrianPark314 in #568
  • [Misc] add helm configuration values table by @zerofishnoodles in #599
  • [Feat] Use sidecar to download lora model for helm deployment by @zerofishnoodles in #618
  • [Router] Improve performance of round-robin router by @zhouwfang in #584
  • [Feat] Use sidecar to download lora model for operator deployment by @zerofishnoodles in #622
  • [Feature] Add method to check Pod termination status and update Pod readiness logic by @KevinCheung2259 in #602
  • [Feat] Add Sentry Continuous Profiling Support to vLLM Router by @ikaadil in #624
  • [Feat][Helm] Add HTTPRoute template for Gateway API support by @Hexoplon in #610
  • [Feat] Init Container "extraVolumeMount" by @cm-enfuse in #600
  • [Bugfix][Router]: simplify test payload by @max-wittig in #613
  • [Feat] Add Support HAMi resources variables by @andresd95 in #579
  • feature/KV-cache-aware-routing by @BrianPark314 in #550
  • [Router] Replace httpx with aiohttp in vllm_router for enhanced high-concurrency performance by @ikaadil in #589
  • feature/prefix-aware-routing by @BrianPark314 in #546
  • [Feat][Router]: add extra support for YAML config file by @antoineauger in #621
  • [CI] Add stress testing for router by @kobe0938 in #633
  • [Misc] Auto-size Minikube memory via calculate_safe_memory by @fulvius31 in #637
  • [Bugfix][Router]: reconfigure callbacks with dynamic config by @antoineauger in #642
  • [Doc] Add a missing word in the description by @JiangJiaWei1103 in #645
  • [Bugfix] Correct the routing logic for KV cache aware routing by @JiangJiaWei1103 in #648
  • [Router][CI/CD and misc.] Add RoundRobinRouter logic testing by @lucas-tucker in #639
  • [Docs] Modify the kvaware routing doc by @zerofishnoodles in #652
  • [Router] Optimize request parsing by removing duplicate await calls by @ikaadil in #629
  • [Feat][Router] Add configurable timeout_seconds for Kubernetes watchers by @ikaadil in #654
  • [Misc] Change community meeting time by @zerofishnoodles in #662
  • [Bugfix] Fix install script path prefix by @nicolasj92 in #665
  • [Feat] Env from secret by @redno2 in #641
  • [Bugfix] Fix routing to delete endpoint by @zerofishnoodles in #668
  • Bugfix(vllm-operator): add missing RBAC permissions for PVCs and Ingresses by @mahmoudk1000 in #647
  • [feat]: add transcription API endpoint using OpenAI Whisper-small by @davidgao7 in #469
  • Bump helm chart version by @philandstuff in #674
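
The round-robin router work above (the #584 performance improvement, the #639 logic tests) boils down to cheap cycling over the current endpoint list. A toy sketch of the idea, not the vllm_router implementation:

```python
from itertools import count

class RoundRobinRouter:
    """Toy round-robin endpoint selection; the real router handles
    dynamic endpoint discovery and health, which this sketch omits."""

    def __init__(self) -> None:
        # A monotonically increasing counter avoids rescanning any state
        # per request; modulo maps it onto the current endpoint list.
        self._counter = count()

    def route(self, endpoints: list) -> str:
        return endpoints[next(self._counter) % len(endpoints)]
```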

Full Changelog: vllm-stack-0.1.6...vllm-stack-0.1.7

vllm-stack-0.1.6

22 Jul 04:55
3bb6b73

What's Changed

  • [CI]: change the entrypoint of nightly docker images (#514) (by @sammshen)
  • Add support for sleep and wake_up endpoints (#498) (by @dumb0002)
  • [Bugfix] add health probe for lmcache server (#520) (by @zerofishnoodles)
  • [Doc, Feat] basic KEDA support and tutorials (#487) (by @Romero027)
  • [Misc] Delete Unnecessary file (#521) (by @zerofishnoodles)
  • change keda name (#529) (by @zerofishnoodles)
  • [CI/CD] Add roundrobin router e2e test (#525) (by @zerofishnoodles)
  • [Doc] Add CRD deployment docs (#530) (by @kobe0938)
  • [Doc] Kubernetes in Docker (kind) tutorial (#534) (by @lucas-tucker)
  • FEAT introduce ruff to project 1 - tests (#527) (by @BrianPark314)
  • [CI/CD] Add static e2e test for prefixaware (#532) (by @zerofishnoodles)
  • fix(request): make sure to extend full_response (#536) (by @max-wittig)
  • [CI/CD] Add prefix aware routing test (#523) (by @zerofishnoodles)
  • [Bugfix][Helm] prevent duplicate securitycontext entry for containers (#544) (by @Hexoplon)
  • feature/gateway-inference-extension (#537) (by @BrianPark314)
  • Add Artifact Hub metadata for verified publisher (#540) (by @kobe0938)
  • [CI/CD] Add multiple routing logic test (#547) (by @zerofishnoodles)
  • [Doc] Adding security context for disaggregated prefill (#555) (by @YuhanLiu11)
  • [CI/CD] Add checkov security check for information (by @zerofishnoodles)
  • fix(reconciler): trigger update when image or replicas are changed (#554) (by @googs1025)
  • [Feat] Terraform Quickstart Tutorials for MS Azure (#552) (by @falconlee236)
  • [Router] Expose /tokenize and /detokenize endpoints (#541) (by @Exchioz)
  • feature/ruff-router (#553) (by @BrianPark314)
  • [Doc] Adding tutorial for Gateway Inference Extension support (#570) (by @YuhanLiu11)
  • fix: race condition in trie insert (by @zhouwfang)
  • [Feature] Moving default vLLM version from v0 to v1 (#580) (by @YuhanLiu11)
  • feat(helm): make imagePullPolicy configurable & fix router service annotation for LoadBalancer (#573) (by @lonelygo)
  • perf: minimize lock contention (#581) (by @zhouwfang)
  • [BugFix] fix lora controller reconcile logic (#565) (by @zerofishnoodles)
  • [FEAT] Add LoRA helm deployment (#563) (by @zerofishnoodles)
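
The prefix-aware routing thread above (the prefix-aware routing feature, the trie insert race fix, and the lock-contention work) centers on a trie keyed by prompt prefixes, so a request lands on the engine that already holds a matching KV cache. A toy, lock-guarded sketch; it is not the router's actual data structure:

```python
import threading
from typing import Optional

class PrefixTrie:
    """Toy character-level trie for prefix-aware routing: map a prompt to
    the endpoint that served the longest previously seen prefix."""

    def __init__(self) -> None:
        self._root: dict = {}
        self._lock = threading.Lock()  # guards insert, cf. the race fix above

    def insert(self, prompt: str, endpoint: str) -> None:
        with self._lock:
            node = self._root
            for ch in prompt:
                node = node.setdefault(ch, {})
            node["$ep"] = endpoint  # "$ep" cannot collide with 1-char keys

    def longest_prefix_endpoint(self, prompt: str) -> Optional[str]:
        node, best = self._root, None
        for ch in prompt:
            if "$ep" in node:
                best = node["$ep"]
            node = node.get(ch)
            if node is None:
                break
        else:
            if "$ep" in node:
                best = node["$ep"]
        return best
```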

vllm-stack-0.1.5

17 Jun 04:23
0b6a61c

vllm-stack-0.1.4

05 Jun 21:10
6e3c06f

vllm-stack-0.1.3

30 May 06:39
ff7a6c1

vllm-stack-0.1.2

29 Apr 19:56
2404918

What's Changed

  • [Feat] Adding support to turn on/off engine deployment by @dumb0002 in #311
  • [Feat] Add nodeSelectorTerms for router & cacher servers by @kinoute in #314
  • [Bugfix] Update logger handler to handle stdout/stderr properly by @corona10 in #320
  • [CI] Always upload logs of Helm functionality checks by @pwuersch in #321
  • [CI/Build] Remove sudo requirements in CI/CD by @Shaoting-Feng in #325
  • [Feat] Multiple service creation when multiple models specified by @lucas-tucker in #326
  • [CI] Add coverage tracking by @zhuohangu in #330
  • [CLI/Doc] Update on gke deployment with gpu quota by @EaminC in #334
  • [Bugfix] Fix thread creation to pass parameters properly by @corona10 in #336
  • [Feat] OpenTelemetry Support Example by @lucas-tucker in #346
  • [Feat] Tool calling support for MCP client integration by @YuhanLiu11 in #352
  • [Benchmark] Add api key option by @Kimdongui in #354
  • [Bugfix] fix init container pvc volume mount by @zerofishnoodles in #359
  • [Feat] Enabled latency monitor and added average latency computation logic by @insukim1994 in #362
  • [Feat] Added a tutorial document for deploying production stack on amd gpus by @insukim1994 in #364
  • [Bugfix] Deprecated least loaded routing logic by @insukim1994 in #366
  • [Bugfix] added model name to deployment selector by @TamKej in #367
  • [Feat] helm: add routerSpec.serviceType value by @marquiz in #368
  • [Feat] Support Multi-Model Deployment with Enhanced vLLM Configurations by @haitwang-cloud in #371
  • [Bugfix] Fixing issues on the engine svc labels by @dumb0002 in #376
  • [Bugfix] Declare logger properly for protocols.py by @corona10 in #381
  • [Feat] Adding a tutorial for using vLLM v1 in production stack by @YuhanLiu11 in #390
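
The multi-model and service-exposure changes above (routerSpec.serviceType, one service per model, toggling the engine deployment) suggest a values.yaml shape like the following. This is illustrative only; the key names follow the PR titles, not a verified schema:

```yaml
# Illustrative fragment -- key names follow the PR titles above.
routerSpec:
  serviceType: LoadBalancer      # routerSpec.serviceType (#368)
servingEngineSpec:
  modelSpec:                     # one Deployment and Service per model (#326, #371)
    - name: "opt125m"
      modelURL: "facebook/opt-125m"
    - name: "llama3"
      modelURL: "meta-llama/Llama-3.1-8B-Instruct"
```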

vllm-stack-0.1.1

19 Mar 18:39
82b47eb

Full Changelog: vllm-stack-0.1.0...vllm-stack-0.1.1