vLLM Worker v0.15.0 — Upgrade from v0.11.x to v0.15.0 #259
Merged
TimPietruskyRunPod merged 15 commits into runpod-workers:main on Feb 12, 2026
Conversation
DeJayDev suggested changes on Feb 4, 2026
| "gpuCount": 1, | ||
| "allowedCudaVersions": ["12.9", "12.8", "12.7", "12.6", "12.5", "12.4"], | ||
| "allowedCudaVersions": ["12.9", "12.8"], | ||
| "presets": [ |
Contributor
Is it possible for hub listings to use `minimumCudaVersion`? This is more sustainable than updating this list. For example, CUDA 13 is already an available target for `allowedCudaVersions`.
Contributor
This does exist, but since we need to get this version out, I will test it in a different repo first; if it actually works, I will do the same for vllm.
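For reference, a rough sketch of what a hub.json preset using `minimumCudaVersion` might look like. Whether the Hub schema supports the field spelled exactly this way is precisely what the follow-up test above is meant to confirm:

```json
{
  "gpuCount": 1,
  "minimumCudaVersion": "12.8",
  "presets": []
}
```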
Contributor
A lot of settings were removed from hub.json but not from this file. Are the changes to this file accurate?
Contributor
Author
Haven't cleaned up the documentation files yet.
Co-authored-by: Dj Isaac <contact@dejaydev.com>
velaraptor-runpod approved these changes on Feb 12, 2026
vLLM Worker v0.15.0 — Upgrade from v0.11.x to v0.15.0
This PR upgrades the vLLM serverless worker from vLLM v0.11.x to v0.15.0, spanning four major releases (v0.12.0, v0.13.0, v0.14.0, v0.15.0) and roughly 1,900 upstream commits.
Worker-Specific Fixes
- Fixed an `EngineDeadError` on the first request when LoRA is enabled. Without LoRA, startup behavior is unchanged.
- Updated imports for the new `vllm.entrypoints.openai.*` module structure (`chat_completion`, `completion`, `engine`, `models` submodules).
- Replaced `DISABLE_LOG_REQUESTS` with `ENABLE_LOG_REQUESTS`.
- Updated engine args: renamed env vars (`VLLM_ATTENTION_BACKEND` → `ATTENTION_BACKEND`, `DISABLE_LOG_REQUESTS` → `ENABLE_LOG_REQUESTS`), removed `use_v2_block_manager` (V2 is now the only option), and added new args (`attention_backend`, `async_scheduling`, `stream_interval`, `cpu_offload_gb`); a sketch of this mapping follows the list.
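A minimal sketch of how the renamed and newly added environment variables might be mapped onto vLLM engine arguments. The helper name, truthy parsing, and defaults are assumptions for illustration, not the worker's actual implementation:

```python
import os


def engine_arg_overrides() -> dict:
    """Map renamed/new env vars onto vLLM engine arguments (illustrative)."""
    truthy = ("1", "true", "yes")
    overrides = {}

    # Old VLLM_ATTENTION_BACKEND env var -> ATTENTION_BACKEND -> attention_backend arg.
    backend = os.getenv("ATTENTION_BACKEND")
    if backend:
        overrides["attention_backend"] = backend

    # Old DISABLE_LOG_REQUESTS (inverted) was replaced by ENABLE_LOG_REQUESTS.
    enable_logs = os.getenv("ENABLE_LOG_REQUESTS")
    if enable_logs is not None:
        overrides["enable_log_requests"] = enable_logs.lower() in truthy

    # Args newly exposed in this release.
    if os.getenv("ASYNC_SCHEDULING") is not None:
        overrides["async_scheduling"] = os.environ["ASYNC_SCHEDULING"].lower() in truthy
    if os.getenv("STREAM_INTERVAL"):
        overrides["stream_interval"] = int(os.environ["STREAM_INTERVAL"])
    if os.getenv("CPU_OFFLOAD_GB"):
        overrides["cpu_offload_gb"] = float(os.environ["CPU_OFFLOAD_GB"])

    # use_v2_block_manager is intentionally absent: V2 is now the only option.
    return overrides
```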
Major vLLM Changes (v0.12.0 → v0.15.0)
Performance
Model Support
Engine
- `--max-model-len auto`: automatically fits the context length to available GPU memory (v0.14.0); a sketch of forwarding this value follows
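A hedged sketch of how a worker could forward a `MAX_MODEL_LEN` env var to the engine while passing the special value `auto` through unchanged. That the engine accepts the literal string `"auto"` wherever the `--max-model-len` CLI flag does is an assumption here, as is the env var name:

```python
import os


def resolve_max_model_len(raw: str | None):
    """Forward MAX_MODEL_LEN, letting "auto" (v0.14.0+) pass through."""
    if not raw:
        return None           # unset: use the model's native context length
    if raw.lower() == "auto":
        return "auto"         # v0.14.0+: fit context to available GPU memory
    return int(raw)


max_model_len = resolve_max_model_len(os.getenv("MAX_MODEL_LEN"))
```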
Hardware
Quantization
API & Frontend
Breaking Changes
- V2 block manager is now the only option (`use_v2_block_manager` removed)
- Removed engine args: `num_lookahead_slots`, `best_of`, `lora_extra_vocab`, `quantization_param_path`, `guided_decoding_backend`, `worker_use_ray`, `rope_scaling`, `rope_theta`, tokenizer pool args, preemption args, and all individual speculative decoding args (now a single `speculative_config` JSON; see the sketch after this list)
- `disable_log_requests` deprecated in favor of `enable_log_requests`
- `VLLM_ATTENTION_BACKEND` env var replaced with the `--attention-backend` CLI arg
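One of the removals above folds all individual speculative decoding flags into a single JSON config. A sketch of consuming it, where the `SPECULATIVE_CONFIG` env var name and example keys are illustrative assumptions (check the vLLM docs for the supported `speculative_config` fields):

```python
import json
import os

# Old style (removed): individual flags such as --speculative-model and
# --num-speculative-tokens. New style: one JSON blob.
raw = os.getenv("SPECULATIVE_CONFIG", "")
speculative_config = json.loads(raw) if raw else None

# Example:
#   SPECULATIVE_CONFIG='{"model": "draft-model", "num_speculative_tokens": 5}'
```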