Commit 1b73b4c

Fix torch 2.8 release
Signed-off-by: Luka Govedič <[email protected]>
1 parent: 4f994c8

File tree

1 file changed: +1 -1 lines changed

_posts/2025-08-20-torch-compile.md

Lines changed: 1 addition & 1 deletion
@@ -181,7 +181,7 @@ Passes can be added via the `PostGradPassManager`, CLI (`--compilation-config`),
 We’ve come very far on the vLLM-torch.compile integration. Here are some areas that we’re focusing on in the next six months.
 
 **Improving stability**
-The vLLM-torch.compile integration uses many private torch.compile APIs (those beginning with an underscore) and relies on unstable implementation details. We did this because the public torch.compile API wasn’t sufficient to fulfill our requirements \- vLLM wants fast serving performance and no recompilations during model serving. This has led to problems like strange caching behavior, or needing to disable vLLM’s torch.compile cache for certain models. The PyTorch compiler team is working on upstreaming vLLM-specific (and general inference) features into torch.compile and migrating vLLM to more stable APIs. A lot of these features are already present in torch 2.8, which will likely be added in the next vLLM release (v0.11.0 at the time of writing).
+The vLLM-torch.compile integration uses many private torch.compile APIs (those beginning with an underscore) and relies on unstable implementation details. We did this because the public torch.compile API wasn’t sufficient to fulfill our requirements \- vLLM wants fast serving performance and no recompilations during model serving. This has led to problems like strange caching behavior, or needing to disable vLLM’s torch.compile cache for certain models. The PyTorch compiler team is working on upstreaming vLLM-specific (and general inference) features into torch.compile and migrating vLLM to more stable APIs. A lot of these features are already present in torch 2.8, which is coming to vLLM [soon](https://github.com/vllm-project/vllm/pull/20358)!
 
 **Improving start-up time**
 We’ve heard that start-up time is a huge pain point with vLLM’s torch.compile and CUDAGraphs, especially in the autoscaling setting where one dynamically spins up new machines according to demand. We plan to significantly reduce both cold (first-time) and warm (subsequent) start-up times for vLLM, especially as they relate to Dynamo and Inductor compilation. Please follow the [startup-ux label](https://github.com/vllm-project/vllm/issues?q=is%3Aissue%20state%3Aopen%20label%3Astartup-ux) on GitHub or join the [\#feat-startup-ux](https://vllm-dev.slack.com/archives/C0911AKUZQX) channel on [vLLM Slack](http://slack.vllm.ai) to stay updated on the progress\!
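
For reference, the changed paragraph mentions sometimes needing to disable vLLM’s torch.compile cache for certain models. Below is a minimal sketch of what that workaround can look like; it assumes the `VLLM_DISABLE_COMPILE_CACHE` environment variable and the integer form of the `compilation_config` argument behave as in recent vLLM releases, so treat it as illustrative rather than authoritative.

```python
import os

# Assumption: VLLM_DISABLE_COMPILE_CACHE is the environment variable that turns
# off vLLM's on-disk torch.compile cache; set it before importing vLLM so it is
# definitely picked up.
os.environ["VLLM_DISABLE_COMPILE_CACHE"] = "1"

from vllm import LLM

# Keep torch.compile enabled (compilation level 3) but recompile at start-up
# instead of reusing cached artifacts. The model name and the integer form of
# compilation_config are illustrative assumptions, not taken from this commit.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", compilation_config=3)
```

From the command line, the same idea is setting that environment variable before `vllm serve` and choosing the compilation level via `--compilation-config`, the flag referenced in the hunk header above.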
