[Doc]: Building for CUDA 12.8 and other issues #29

mendeel · 2025-12-03T17:00:02Z

mendeel
Dec 3, 2025

📚 The doc issue

Hello y'all! So, I was following the readme instructions for building from source and I stumbled on multiple errors during the process. Both because I have CUDA 12.8 but others appeared for reasons you'll see below.

First of I had to clone a specific branch and change the cuda version on the pytorch installation.
Then, all commands ran smoothly until the last pip install . --no-build-isolation which gave me a lot of errors so I needed some time with trials and errors until it worked. Gemini 3 Pro helped me a lot here. Here are all the steps I had to go through to make it work.

I always use command prompt unless I need to use powershell for some specific commands.

Create venv

python -m venv .venv
./.venv/Scripts/Activate

Cloned vLLM for Windows v0.11.0 instead of latest

As per @SystemPanic comment here, I used branch v0.11.0 when cloning the repo because this version does not require pytorch 2.8 built from source.
git clone --single-branch --branch v0.11.0 https://github.com/SystemPanic/vllm-windows.git

Pytorch installation

When installing pytorch, I made sure to use the cu128 versions:
pip install torch==2.7.1+cu128 torchaudio==2.7.1+cu128 torchvision==0.22.1+cu128 --index-url https://download.pytorch.org/whl/cu128

Visual Studio installation

This one is not related to the CUDA version but I think it would be nice to add to the README.md file because it simply says "Visual Studio 2019 or newer is required" and it doesn't directly say what you need to actually install.

Download Visual Studio 2026 installer (Community free download): https://visualstudio.microsoft.com/downloads/
When the installer opens, install the workload "Desktop development with C++"

This makes sure that you'll have all the needed dependencies to compile the package.

Fix NVCC error caused by Visual Studio being too recent

I installed the latest VS version and NVCC was erroring out so I had to add these env variables

set NVCC_APPEND_FLAGS=-allow-unsupported-compiler
set NVCC=nvcc --allow-unsupported-compiler

Set env variables

I used this variables but it obviously will depend on your context and machine so don't simply copy them blindly specially the cuDNN because you may not want to use it/may not have it installed.

set DISTUTILS_USE_SDK=1
set VLLM_TARGET_DEVICE=cuda
set MAX_JOBS=12
set USE_CUDNN=1
set CUDNN_LIBRARY_PATH=C:\cudnn-89729-cuda12\lib\x64
set CUDNN_INCLUDE_PATH=C:\cudnn-89729-cuda12\include
set NVCC_APPEND_FLAGS=-allow-unsupported-compiler
set NVCC=nvcc --allow-unsupported-compiler

Replace any cu126 mention to cu128

I replaced all "cu126" occurrences with "cu128" in all files inside the project folder before running the pip install commands.

Enable long paths

I was getting "filename too long" errors when compiling vLLM for Windows so I had to enable long paths in Windows

Powershell commands

In PowerShell running as administrator, run these commands:
git config --global core.longpaths true
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1

Rename vllm folder

I renamed my folder from C:\vllm-windows to C:\v

Edit Marlin "generate_kernels.py"

This file was erroring out because it tries to use Linux command "rm -rf" so I changed it to use python os lib instead

Step 1: Open the file in your preferred code editor

C:\v\csrc\quantization\gptq_marlin\generate_kernels.py

Step 2: Edit line 54

From subprocess.call(["rm", "-f", filename])
To

if os.path.exists(filename):
    os.remove(filename)

Shorten CUDA folder path creating a junction folder

The CUDA folder with spaces was also erroring out when building so I created a simpler junction folder.

In the the same CMD:

mklink /J "C:\CUDA" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8"
set CUDA_PATH=C:\CUDA
set CUDA_HOME=C:\CUDA

Build commands

Every time I needed to retry the build process, it's important to delete the build folder to delete any cache.

rmdir /s /q build
rmdir /s /q .deps
pip install . --no-build-isolation

Suggest a potential alternative/fix

No response

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

SystemPanic · 2025-12-03T17:14:05Z

SystemPanic
Dec 3, 2025
Maintainer

Hello, thank you for your contribution. Moved to discussions as this is not exactly an Issue.

Feel free to update vllm-for-windows branch with your code changes on generate_kernels.py and update README guide if you consider it appropriate.

0 replies

Vesemir · 2026-02-25T22:56:40Z

Vesemir
Feb 25, 2026

Shorten CUDA folder path creating a junction folder

you can actually escape the path here:

if(VLLM_GPU_LANG STREQUAL "CUDA")
-  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -I${CUDA_TOOLKIT_ROOT_DIR}/include")
+  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -I\"${CUDA_TOOLKIT_ROOT_DIR}/include\"")

0 replies

Vesemir · 2026-02-25T23:24:45Z

Vesemir
Feb 25, 2026

On another note: trying to build with pinned pytorch 2.11 (torch==2.11.0.dev20260216+cu126) nightly build + CUDA 12.8 + msvc2022 I actually got

      layernorm_quant_kernels.cu
      E:\vllm-windows\.venv\Lib\site-packages\torch\include\c10/cuda/CUDACachingAllocator.h(105): error: invalid combination of type specifiers
          StreamSegmentSize(cudaStream_t s, bool char, size_t sz)
                                                 ^

      E:\vllm-windows\.venv\Lib\site-packages\torch\include\c10/cuda/CUDACachingAllocator.h(106): error: type name is not allowed
              : stream(s), is_small_pool(char), total_size(sz) {}
                                         ^

      2 errors detected in the compilation of "E:/vllm-windows/csrc/layernorm_quant_kernels.cu".

cuda kernel compilation error (fixed by renaming small -> is_small in kernel because apparently some macro renames "small" to "char" in that versions combination).

0 replies

wzgrx · 2026-03-15T15:22:24Z

wzgrx
Mar 15, 2026

cu130support？

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Doc]: Building for CUDA 12.8 and other issues #29

Uh oh!

{{title}}

Uh oh!

Replies: 4 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

[Doc]: Building for CUDA 12.8 and other issues #29

Uh oh!

mendeel Dec 3, 2025

📚 The doc issue

Create venv

Cloned vLLM for Windows v0.11.0 instead of latest

Pytorch installation

Visual Studio installation

Fix NVCC error caused by Visual Studio being too recent

Set env variables

Replace any cu126 mention to cu128

Enable long paths

Powershell commands

Rename vllm folder

Edit Marlin "generate_kernels.py"

Step 1: Open the file in your preferred code editor

Step 2: Edit line 54

Shorten CUDA folder path creating a junction folder

Build commands

Suggest a potential alternative/fix

Before submitting a new issue...

Replies: 4 comments

Uh oh!

SystemPanic Dec 3, 2025 Maintainer

Uh oh!

Vesemir Feb 25, 2026

Uh oh!

Vesemir Feb 25, 2026

Uh oh!

wzgrx Mar 15, 2026

mendeel
Dec 3, 2025

SystemPanic
Dec 3, 2025
Maintainer

Vesemir
Feb 25, 2026

Vesemir
Feb 25, 2026

wzgrx
Mar 15, 2026