Commit 1b73b4c

Fix torch 2.8 release
Signed-off-by: Luka Govedič <[email protected]>
1 parent: 4f994c8

File tree

1 file changed: +1 -1 lines changed

_posts/2025-08-20-torch-compile.md

Lines changed: 1 addition & 1 deletion
@@ -181,7 +181,7 @@ Passes can be added via the `PostGradPassManager`, CLI (`--compilation-config`),
 We’ve come very far on the vLLM-torch.compile integration. Here are some areas that we’re focusing on in the next six months.
 
 **Improving stability**
-The vLLM-torch.compile integration uses many private torch.compile APIs (those beginning with an underscore) and relies on unstable implementation details. We did this because the public torch.compile API wasn’t sufficient to fulfill our requirements \- vLLM wants fast serving performance and no recompilations during model serving. This has led to problems like strange caching behavior, or needing to disable vLLM’s torch.compile cache for certain models. The PyTorch compiler team is working on upstreaming vLLM-specific (and general inference) features into torch.compile and migrating vLLM to more stable APIs. A lot of these features are already present in torch 2.8, which will likely be added in the next vLLM release (v0.11.0 at the time of writing).
+The vLLM-torch.compile integration uses many private torch.compile APIs (those beginning with an underscore) and relies on unstable implementation details. We did this because the public torch.compile API wasn’t sufficient to fulfill our requirements \- vLLM wants fast serving performance and no recompilations during model serving. This has led to problems like strange caching behavior, or needing to disable vLLM’s torch.compile cache for certain models. The PyTorch compiler team is working on upstreaming vLLM-specific (and general inference) features into torch.compile and migrating vLLM to more stable APIs. A lot of these features are already present in torch 2.8, which is coming to vLLM [soon](https://github.com/vllm-project/vllm/pull/20358)!
 
 **Improving start-up time**
 We’ve heard that start-up time is a huge pain point with vLLM’s torch.compile and CUDAGraphs, especially in the autoscaling setting where one dynamically spins up new machines according to demand. We plan to significantly reduce both cold (first-time) and warm (subsequent) start-up times for vLLM, especially as they relate to Dynamo and Inductor compilation. Please follow the [startup-ux label](https://github.com/vllm-project/vllm/issues?q=is%3Aissue%20state%3Aopen%20label%3Astartup-ux) on GitHub or join the [\#feat-startup-ux](https://vllm-dev.slack.com/archives/C0911AKUZQX) channel on [vLLM Slack](http://slack.vllm.ai) to stay updated on the progress\!
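
For reference, the changed paragraph mentions sometimes needing to disable vLLM’s torch.compile cache for certain models. Below is a minimal sketch of what that workaround can look like; it assumes the `VLLM_DISABLE_COMPILE_CACHE` environment variable and the integer form of the `compilation_config` argument behave as in recent vLLM releases, so treat it as illustrative rather than authoritative.

```python
import os

# Assumption: VLLM_DISABLE_COMPILE_CACHE is the environment variable that turns
# off vLLM's on-disk torch.compile cache; set it before importing vLLM so it is
# definitely picked up.
os.environ["VLLM_DISABLE_COMPILE_CACHE"] = "1"

from vllm import LLM

# Keep torch.compile enabled (compilation level 3) but recompile at start-up
# instead of reusing cached artifacts. The model name and the integer form of
# compilation_config are illustrative assumptions, not taken from this commit.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", compilation_config=3)
```

From the command line, the same idea is setting that environment variable before `vllm serve` and choosing the compilation level via `--compilation-config`, the flag referenced in the hunk header above.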
