[pytorch] Extend build_prod_wheels.py to build pytorch nightly. #959
Merged
* Includes a couple of workarounds that are unfortunate but would be hard to patch/fix at the root in one step:
  * Some of the environment variables needed to locate ROCm for the PyTorch build (which shouldn't be necessary at all but c'est la vie for now) conflict badly with the clang driver heuristics for locating device bitcode. Workaround is to also manually set the `HIP_DEVICE_LIB_PATH` and curse at the stars about removing all of these legacy special vars.
  * PyTorch at head uses rocm-smi-lib for distributed, and including those headers does not advertise the transitive include dirs for the sysdeps, which causes us to not find libdrm (presumably most old ROCm installs treated that as a system library, whereas we vendor it and need to propagate its header path through find_package). Workaround is to manually add the rocm_sysdeps include and lib dir.
* Adds `--build-triton` / `--no-build-triton` flags for ergonomics when iterating.
* Unconditionally sets `USE_ROCM=ON` in all paths, not just for Windows: new branches require this.
* Adds the `_rocm_init.py` for rocm wheel bootstrapping logic as described in `docs/packaging/python_packaging.md` and also updates that doc to match how it was actually landed in PyTorch.
* Adds docs indicating that it is valid to check out the pytorch `nightly` branch, which tracks the most recent pytorch.org nightly build.

Tested: Local build with gfx94X wheels produced a working torch/torchaudio/torchvision install. I wasn't actually running on a 942 system so it didn't do much from there, but it did import and let me create tensors. The radeon rocm wheels are known broken today due to a CK bug, so can pick up with those tomorrow.
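For readers who have not opened the diff, the flag pair and the environment workaround described above can be sketched roughly as follows. This is a minimal illustration and not the code that landed in `build_prod_wheels.py`: the function names `make_parser` and `rocm_build_env` are hypothetical, and the `lib/llvm/amdgcn/bitcode` layout under the ROCm directory is an assumption about where the device bitcode lives.

```python
import argparse
from pathlib import Path


def make_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing a paired --build-triton/--no-build-triton flag."""
    parser = argparse.ArgumentParser(prog="build_prod_wheels_sketch")
    # BooleanOptionalAction (Python 3.9+) registers both --build-triton and
    # --no-build-triton from a single add_argument call.
    parser.add_argument(
        "--build-triton",
        action=argparse.BooleanOptionalAction,
        default=None,  # None means "not specified, let the script decide"
        help="Force building triton on or off",
    )
    return parser


def rocm_build_env(rocm_dir: Path, base_env: dict[str, str]) -> dict[str, str]:
    """Return a copy of base_env with the ROCm settings discussed in the PR."""
    env = dict(base_env)
    # The PR sets USE_ROCM=ON on every platform, not only on Windows.
    env["USE_ROCM"] = "ON"
    # ASSUMPTION: device bitcode lives under lib/llvm/amdgcn/bitcode.
    # The real script may compute this location differently.
    bitcode_dir = rocm_dir / "lib" / "llvm" / "amdgcn" / "bitcode"
    # Setting the path explicitly avoids relying on clang's own search heuristics.
    env["HIP_DEVICE_LIB_PATH"] = str(bitcode_dir)
    return env
```

Leaving the default at `None` lets a caller distinguish "flag not given" from an explicit `--no-build-triton`, which is useful when the script otherwise decides automatically.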
ScottTodd
approved these changes
Jul 2, 2025
Co-authored-by: Scott Todd <[email protected]>
Addressed ergonomic comments and verified locally that the resulting torch+triton installs and functions.
stellaraccident
added a commit
that referenced
this pull request
Jul 2, 2025
ScottTodd
added a commit
that referenced
this pull request
Jul 10, 2025
Progress on #827. Follow-up to #959, expanding support on Windows.

Without this I get a warning from `build_prod_wheels.py` on Windows:

```
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=False -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=D:\b\pytorch_v2.7.0\torch -DCMAKE_PREFIX_PATH=D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages;D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\_rocm_sdk_devel\lib\cmake -DPython_EXECUTABLE=D:\projects\TheRock\external-builds\pytorch\.venv\Scripts\python.exe -DTORCH_BUILD_VERSION=2.7.0a0+rocmsdk20250709 -DUSE_FLASH_ATTENTION=0 -DUSE_GLOO=OFF -DUSE_KINETO=OFF -DUSE_MEM_EFF_ATTENTION=0 -DUSE_NUMPY=True -DUSE_ROCM=ON D:\b\pytorch_v2.7.0
cmake --build . --target install --config Release
rocm version 7.0.0.dev0:
PYTHON VERSION: 3.12.8 (tags/v3.12.8:2dc476b, Dec 3 2024, 19:30:04) [MSC v.1942 64 bit (AMD64)]
CMAKE_PREFIX_PATH = D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\_rocm_sdk_devel\lib\cmake
BIN = D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\_rocm_sdk_devel\bin
ROCM_HOME = D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\_rocm_sdk_devel
PATH = ...
Using default PYTORCH_ROCM_ARCH from rocm-sdk targets: gfx1100;gfx1101;gfx1102
WARNING: Default location of device libs not found. Relying on clang heuristics which are known to be buggy in this configuration
--- Not building triton (no --triton-dir)
Default PYTORCH_BUILD_VERSION: 2.7.0a0+rocmsdk20250709
--- PYTORCH_EXTRA_INSTALL_REQUIREMENTS = rocm[libraries]==7.0.0.dev0
```

Followed by errors late into the build:

```
[6566/7081] Linking CXX shared library bin\torch_cpu.dll
[6567/7081] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/hip/torch_hip_generated_Sleep.hip.obj
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/hip/torch_hip_generated_Sleep.hip.obj D:/b/pytorch_v2.7.0/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/hip/torch_hip_generated_Sleep.hip.obj
C:\Windows\system32\cmd.exe /C "cd /D D:\b\pytorch_v2.7.0\build\caffe2\CMakeFiles\torch_hip.dir\__\aten\src\ATen\hip && D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\cmake\data\bin\cmake.exe -E make_directory D:/b/pytorch_v2.7.0/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/hip/. && D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\cmake\data\bin\cmake.exe -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=D:/b/pytorch_v2.7.0/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/hip/./torch_hip_generated_Sleep.hip.obj -P D:/b/pytorch_v2.7.0/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/hip/torch_hip_generated_Sleep.hip.obj.cmake"
clang: warning: argument unused during compilation: '--offload-compress' [-Wunused-command-line-argument]
clang: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
clang: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
clang: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without
ROCm device library failed to execute:""D:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/lib/llvm/bin\clang.exe" --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 -O3 -c -x hip "D:/b/pytorch_v2.7.0/aten/src/ATen/hip/Sleep.hip" -o "D:/b/pytorch_v2.7.0/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/hip/./torch_hip_generated_Sleep.hip.obj" --offload-compress -fclang-abi-compat=17 -DUSE_ROCM -D__HIP_PLATFORM_AMD__ -DTORCH_HIP_BUILD_MAIN_LIB -DROCM_ON_WINDOWS -DROCM_VERSION=85772 -DTORCH_HIP_VERSION=605 -DONNX_ML=1 -DONNXIFI_ENABLE_EXT=1 -DONNX_NAMESPACE=onnx_torch -D_CRT_SECURE_NO_DEPRECATE=1 -DUSE_EXTERNAL_MZCRC -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DEXPORT_AOTI_FUNCTIONS -DWIN32_LEAN_AND_MEAN -D_UCRT_LEGACY_INFINITY -DNOMINMAX -DUSE_MIMALLOC -DUSE_PROF_API=1 -DAT_PER_OPERATOR_HEADERS -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_AMD__ -DROCM_USE_FLOAT16 -D__HIP_PLATFORM_AMD__ -DFMT_HEADER_ONLY=1 -fms-runtime-lib=dll -D__HIP_PLATFORM_AMD__=1 -DCUDA_HAS_FP16=1 -DUSE_ROCM -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -DTORCH_HIP_VERSION=605 -Wno-shift-count-negative -Wno-shift-count-overflow -Wno-duplicate-decl-specifier -DCAFFE2_USE_MIOPEN -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_HIP -std=c++17 -DHIPBLAS_V2 -fms-extensions -Wno-ignored-attributes -fno-gpu-rdc -ID:/b/pytorch_v2.7.0/build/aten/src -ID:/b/pytorch_v2.7.0/aten/src -ID:/b/pytorch_v2.7.0/build -ID:/b/pytorch_v2.7.0 -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/b/pytorch_v2.7.0/third_party/protobuf/src -ID:/b/pytorch_v2.7.0/third_party/XNNPACK/include -ID:/b/pytorch_v2.7.0/third_party/ittapi/include -ID:/b/pytorch_v2.7.0/cmake/../third_party/eigen -ID:/b/pytorch_v2.7.0/third_party/onnx -ID:/b/pytorch_v2.7.0/build/third_party/onnx -ID:/b/pytorch_v2.7.0/torch/include -ID:/b/pytorch_v2.7.0/third_party/ideep/include -ID:/b/pytorch_v2.7.0/nlohmann -ID:/b/pytorch_v2.7.0/INTERFACE 
-ID:/b/pytorch_v2.7.0/third_party/nlohmann/include -ID:/b/pytorch_v2.7.0/third_party/mimalloc/include -I/include -I/hcc/include -I/rocblas/include -I/hipsparse/include -I/include/rccl/ -ID:/b/pytorch_v2.7.0/aten/src/THH -ID:/b/pytorch_v2.7.0/aten/src/ATen/hip -ID:/b/pytorch_v2.7.0/aten/src/ATen/../../../third_party/composable_kernel/include -ID:/b/pytorch_v2.7.0/aten/src/ATen/../../../third_party/composable_kernel/library/include -ID:/b/pytorch_v2.7.0/build/caffe2/aten/src/ATen/composable_kernel -ID:/b/pytorch_v2.7.0/third_party/fmt/include -ID:/b/pytorch_v2.7.0/aten/src -ID:/b/pytorch_v2.7.0/build/caffe2/aten/src -ID:/b/pytorch_v2.7.0/build/aten/src -ID:/b/pytorch_v2.7.0/aten/src -ID:/b/pytorch_v2.7.0/aten/src/ATen/.. -ID:/b/pytorch_v2.7.0/c10/hip/../.. -ID:/b/pytorch_v2.7.0/build -ID:/b/pytorch_v2.7.0/c10/../ -ID:/b/pytorch_v2.7.0/build -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/b/pytorch_v2.7.0/torch/csrc/api -ID:/b/pytorch_v2.7.0/torch/csrc/api/include -ID:/b/pytorch_v2.7.0/third_party/protobuf/src -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include 
-ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include/hiprand -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include/rocrand -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/b/pytorch_v2.7.0/build/aten/src -ID:/b/pytorch_v2.7.0/aten/src -ID:/b/pytorch_v2.7.0/build -ID:/b/pytorch_v2.7.0 -ID:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/_rocm_sdk_devel/include -ID:/b/pytorch_v2.7.0/third_party/protobuf/src -ID:/b/pytorch_v2.7.0/third_party/XNNPACK/include -ID:/b/pytorch_v2.7.0/third_party/ittapi/include -ID:/b/pytorch_v2.7.0/cmake/../third_party/eigen -ID:/b/pytorch_v2.7.0/third_party/onnx -ID:/b/pytorch_v2.7.0/build/third_party/onnx -ID:/b/pytorch_v2.7.0/torch/include -ID:/b/pytorch_v2.7.0/third_party/ideep/include -ID:/b/pytorch_v2.7.0/nlohmann -ID:/b/pytorch_v2.7.0/INTERFACE -ID:/b/pytorch_v2.7.0/third_party/nlohmann/include -ID:/b/pytorch_v2.7.0/third_party/mimalloc/include" CMake Error at torch_hip_generated_Sleep.hip.obj.cmake:200 (message): Error generating file D:/b/pytorch_v2.7.0/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/hip/./torch_hip_generated_Sleep.hip.obj 
```

Also sets `env` entries using an explicit `str()` instead of a raw `Path`, since passing a raw `Path` led to other errors.
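The commit message does not say which errors the raw `Path` values caused. One plausible reading is that the environment mapping handed to `subprocess` should contain only plain strings, which is stricter on Windows than on POSIX. A minimal sketch of that normalization, using a hypothetical helper named `stringify_env` that is not part of the actual script:

```python
import os
from pathlib import Path
from typing import Mapping, Union

EnvValue = Union[str, os.PathLike]


def stringify_env(env: Mapping[str, EnvValue]) -> dict[str, str]:
    """Return a copy of env in which every value is a plain str.

    Path objects are converted with str(), mirroring the explicit str()
    calls mentioned in the commit message.
    """
    return {key: str(value) for key, value in env.items()}


# Example: mix a Path and a string, then normalize before launching a build.
raw_env = {"ROCM_HOME": Path("some") / "rocm" / "dir", "USE_ROCM": "ON"}
clean_env = stringify_env(raw_env)
```

The resulting dictionary can then be merged into a copy of `os.environ` and passed as `env=` to `subprocess.run`.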
This should be NFC (no functional change) on every other pytorch build. A follow-up needs to add an actual head-on-head nightly build pipeline.