Fix torch.version.hip issue containing trailing hash code #2809
Fixes #SWDEV-565427, github issue
Motivation
Apex fails to build because the HIP version reported by hipcc --version does not match torch.version.hip.
torch.version.hip was changed by a recent commit made to fix an issue. That commit stores the ROCm version in torch.version.hip.
The solution is to fix torch.version.hip so that it uses the hipcc header values and drops the trailing hash code. In addition, a torch.version.rocm variable is created to store the ROCm version.
Technical Details
Fix torch.version.hip
The HIP_VERSION variable is computed in https://github.com/ROCm/hip/blob/develop/cmake/FindHIP.cmake. This runs hipcc --version and extracts the value from the HIP version line of the output.
e.g.
In recent Docker images, the HIP_VERSION variable contains a hash code at the end.
For torch.version.hip to be parsable by the packaging library, it must not contain the hash code.
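As a rough illustration of the extraction step described above, the sketch below pulls the version token out of hipcc --version style output in Python. It is not the FindHIP.cmake logic itself. The sample output text, its version number and hash, and the helper name extract_hip_version are assumptions made up for this example.

```python
import re
from typing import Optional

# Hypothetical hipcc --version output; the version and hash are illustrative only.
SAMPLE_HIPCC_OUTPUT = """\
HIP version: 7.1.25424-4179531dcd
AMD clang version 20.0.0git
"""


def extract_hip_version(hipcc_output: str) -> Optional[str]:
    """Return the token following 'HIP version:', or None if no such line exists."""
    for line in hipcc_output.splitlines():
        match = re.match(r"\s*HIP version:\s*(\S+)", line)
        if match:
            return match.group(1)
    return None


# The extracted token still carries the trailing hash after the dash.
print(extract_hip_version(SAMPLE_HIPCC_OUTPUT))
```

The token returned here keeps the suffix after the dash, which is the part that makes the string hard to parse as a version.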
torch.version.hip is defined in torch/version.py, which is generated by tools/generate_torch_version.py. That script is invoked during the installation process from torch/CMakeLists.txt.
Before the revert, torch.version.hip was based on the HIP_VERSION variable. For torch.version.hip to be parsable, HIP_VERSION must also be parsable.
This extra code removes the trailing hash code from the HIP_VERSION variable so that torch.version.hip is parsable by the packaging version parse method.
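The idea of dropping the trailing hash can be sketched in Python as follows. This is only an illustration, not the code added by this PR. The helper name strip_trailing_hash and the sample version strings are hypothetical, and the sketch assumes the hash is whatever follows the leading dotted numeric part.

```python
import re


def strip_trailing_hash(hip_version: str) -> str:
    """Keep only the leading dotted numeric part, e.g. 'X.Y.Z' from 'X.Y.Z-<hash>'.

    Assumption: everything after the leading run of digits and dots is the hash.
    If the string does not start with a digit, it is returned unchanged.
    """
    match = re.match(r"\d+(?:\.\d+)*", hip_version)
    return match.group(0) if match else hip_version


# Illustrative values only; real HIP_VERSION strings depend on the ROCm install.
for raw in ("7.1.25424-4179531dcd", "6.2.41133"):
    print(raw, "->", strip_trailing_hash(raw))
```

A string that is already a plain dotted version passes through unchanged, so the cleanup is safe to apply whether or not the hash is present.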
Add torch.version.rocm
Code changes:
Add a unit test that checks that both torch.version.hip and torch.version.rocm are parsable.
Testing
Tested on docker registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16831_ubuntu24.04_py3.12_pytorch_rocm7.1_internal_testing_5fc1aeaa
Successfully built PyTorch and Apex. Tested the torch.version.hip parsing code above.