[develop] Initial IFU on Oct 29, 2025 #2766
|
Jenkins build for ca8f06fe939367ae00f2983008327c864a49c90a commit finished as FAILURE |
ca8f06f to de516b7
|
Jenkins build for de516b7f83e67572f85ded54f55add882a0ccb90 commit finished as FAILURE |
========================================== Triton build conditionalized on ROCM_VERSION Include the ROCm version in triton version (cherry picked from commit 7d33910) (cherry picked from commit 0412eb4) Update triton-rocm.txt to triton.txt (cherry picked from commit 0ce9f6e) Use ROCm/triton for install_triton.sh (cherry picked from commit 6e9714b) update triton commit Revert "Use ROCm/triton for install_triton.sh" This reverts commit 81b0cbc8435122030044049c661f252ee8aa7ae5. change triton repo Update triton.txt to use release/internal/3.3.x branch Use ROCm/triton Use ROCm/triton for install_triton.sh (cherry picked from commit 0036db5)
…on (#2482) Related to https://github.com/ROCm/builder/pull/90/files http://rocm-ci.amd.com/job/mainline-pytorch_internal-manylinux-wheels/305/ PyTorch wheel installs successfully when building torchvision/torchaudio (cherry picked from commit c1ee54d)
Fixes #ISSUE_NUMBER (cherry picked from commit 0ea0592)
…A helper functions ======================================================================================= Implementation of PyTorch ut parsing script - QA helper function (#1386) * Initial implementation of PyTorch ut parsing script * Extracted path variables * Use nested dict to save results * Fixes typo * Cleanup * Fixes several issues * Minor name change * Update run_pytorch_unit_tests.py * Added file banners * Supported running from API * Added more help info * Consistent naming * Format help text --------- Co-authored-by: Jithun Nair <[email protected]> Co-authored-by: Jithun Nair <[email protected]> Print consolidated log file for pytorch unit test automation scripts (#1433) * Print consolidated log file for pytorch uts * Update run_entire_tests subprocess call as well * lint * Add ERROR string [SWDEV-466849] Enhancements for PyTorch UT helper scripts (#1491) * Check that >1 GPUs are visible when running TEST_CONFIG=distributed * Add EXECUTION_TIME to file-level and aggregate statistics PyTorch unit test helper scripts enhancements (#1517) * Fail earlier for distributed-on-1-GPU scenario * print cmd in consolidated log with prettier formatting * python->python3 Fixes https://ontrack-internal.amd.com/browse/SWDEV-477264 --------- Co-authored-by: blorange-amd <[email protected]> Several issues fix of QA helper script (#1564) Fixes SWDEV-475071: https://ontrack-internal.amd.com/browse/SWDEV-475071 Removed args inside function (#1595) Fixes SWDEV-475071 (cherry picked from commit 041aa1b47978154de63edc6b7ffcdea218a847a3) QA script - Added multi gpu check with priority_tests (#1604) Fixes SWDEV-487907. Verified throwing exception for distributed is working correctly on single gpu with command: python .automation_scripts/run_pytorch_unit_tests.py --priority_test (cherry picked from commit 57cc742271cbf4547f9213710e57f6444bbc983e) (cherry picked from commit 6d5c3dc) (cherry picked from commit 2ee3aa2)
de516b7 to ff7b627
|
Jenkins build for ff7b627e284ce3b40732f5a48e383d90bba1cf3b commit finished as FAILURE |
|
Jenkins build for ff7b627e284ce3b40732f5a48e383d90bba1cf3b commit finished as FAILURE Detected error during Pytorch building: |
ff7b627 to 15f16c6
|
Jenkins build for 15f16c689c43d26c21e5fee59f6a370c3488ddcb commit finished as NOT_BUILT |
|
Jenkins build for 15f16c689c43d26c21e5fee59f6a370c3488ddcb commit finished as FAILURE Detected error during Pytorch building: |
|
Jenkins build for c77773b206173ff2a76c1d15ce57dd2adb15839d commit finished as ABORTED |
|
Jenkins build for c77773b206173ff2a76c1d15ce57dd2adb15839d commit finished as FAILURE |
* Use triton commit same as that used for release/2.6 branch since both are triton version 3.2.0, so assuming they're compatible. Relates to: https://github.com/ROCm/rocAutomation/pull/660/files https://github.com/ROCm/builder/pull/70/files Validation http://ml-ci-internal.amd.com:8080/job/pytorch/job/manylinux_rocm_wheels/568/ --------- Co-authored-by: Jithun Nair <[email protected]> Co-authored-by: Jithun Nair <[email protected]> (cherry picked from commit 14c1417) (cherry picked from commit c20a8f8)
* Add trailing comma for consistency in gfx architecture list Signed-off-by: Jagadish Krishnamoorthy <[email protected]> * ROCm: Enable tf32 testing on test_nn Signed-off-by: Jagadish Krishnamoorthy <[email protected]> --------- Signed-off-by: Jagadish Krishnamoorthy <[email protected]> (cherry picked from commit c113e14)
…-deps flags (#2121) Cherry-pick of #2103 Co-authored-by: Ethan Wee <[email protected]> (cherry picked from commit 1dea6e8)
Relates to: ROCm/builder#82 Validation: http://rocm-ci.amd.com/job/mainline-pytorch_internal-manylinux-wheels/98/ Using `registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16180_ubuntu24.04_py3.12_pytorch_lw_rocm7.0_IT_upgrade_numpy_452f3df6`: ``` root@d92befdbb2a6:/# pip list | egrep "numpy|pandas" numpy 2.1.2 pandas 2.2.3 root@d92befdbb2a6:/# python3 Python 3.12.3 (main, Feb 4 2025, 14:48:35) [GCC 13.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import pandas >>> import torch >>> import numpy >>> exit() root@d92befdbb2a6:/data/pytorch-micro-benchmarking# HIP_VISIBLE_DEVICES=1 python3 micro_benchmarking_pytorch.py --network resnet50 INFO: running forward and backward for warmup. INFO: running the benchmark.. OK: finished running benchmark.. --------------------SUMMARY-------------------------- Microbenchmark for network : resnet50 Num devices: 1 Dtype: FP32 Mini batch size [img] : 64 Time per mini-batch : 0.11369450092315674 Throughput [img/sec] : 562.9120096428937 ``` --------- Co-authored-by: Jithun Nair <[email protected]> (cherry picked from commit cf32479)
…2269) Fixes SWDEV-536456 Fixes error post-#2256: ``` 00:12:44.248 #22 155.3 ERROR: Ignored the following versions that require a different python version: 0.52.0 Requires-Python >=3.6,<3.9; 0.52.0rc3 Requires-Python >=3.6,<3.9; 0.61.0 Requires-Python >=3.10; 0.61.0rc1 Requires-Python >=3.10; 0.61.0rc2 Requires-Python >=3.10; 0.61.1rc1 Requires-Python >=3.10; 0.61.2 Requires-Python >=3.10; 3.3 Requires-Python >=3.10; 3.3rc0 Requires-Python >=3.10; 3.4 Requires-Python >=3.10; 3.4.1 Requires-Python >=3.10; 3.4.2 Requires-Python >=3.10; 3.4rc0 Requires-Python >=3.10; 3.5 Requires-Python >=3.11; 3.5rc0 Requires-Python >=3.11; 8.2.0 Requires-Python >=3.10; 8.2.1 Requires-Python >=3.10 00:12:44.248 #22 155.3 ERROR: Could not find a version that satisfies the requirement numba==0.61.2 (from versions: 0.1, 0.2, 0.3, 0.5.0, 0.6.0, 0.7.0, 0.7.1, 0.7.2, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.12.1, 0.12.2, 0.13.0, 0.13.2, 0.13.3, 0.13.4, 0.14.0, 0.15.1, 0.16.0, 0.17.0, 0.18.1, 0.18.2, 0.19.1, 0.19.2, 0.20.0, 0.21.0, 0.22.0, 0.22.1, 0.23.0, 0.23.1, 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.1, 0.29.0, 0.30.0, 0.30.1, 0.31.0, 0.32.0, 0.33.0, 0.34.0, 0.35.0, 0.36.1, 0.36.2, 0.37.0, 0.38.0, 0.38.1, 0.39.0, 0.40.0, 0.40.1, 0.41.0, 0.42.0, 0.42.1, 0.43.0, 0.43.1, 0.44.0, 0.44.1, 0.45.0, 0.45.1, 0.46.0, 0.47.0, 0.48.0, 0.49.0, 0.49.1rc1, 0.49.1, 0.50.0rc1, 0.50.0, 0.50.1, 0.51.0rc1, 0.51.0, 0.51.1, 0.51.2, 0.52.0rc2, 0.53.0rc1.post1, 0.53.0rc2, 0.53.0rc3, 0.53.0, 0.53.1, 0.54.0rc2, 0.54.0rc3, 0.54.0, 0.54.1rc1, 0.54.1, 0.55.0rc1, 0.55.0, 0.55.1, 0.55.2, 0.56.0rc1, 0.56.0, 0.56.2, 0.56.3, 0.56.4, 0.57.0rc1, 0.57.0, 0.57.1rc1, 0.57.1, 0.58.0rc1, 0.58.0rc2, 0.58.0, 0.58.1, 0.59.0rc1, 0.59.0, 0.59.1, 0.60.0rc1, 0.60.0) 00:12:44.248 #22 155.3 ERROR: No matching distribution found for numba==0.61.2 ``` Validation: * Docker image: http://rocm-ci.amd.com/job/mainline-framework-pytorch-internal-cs9-ci/132 * Wheels: 
http://rocm-ci.amd.com/job/mainline-pytorch_internal-manylinux-wheels/102/ From `registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16180_ubuntu22.04_py3.9_pytorch_lw_rocm7.0_IT_py3.9_a11d94ad`: ``` root@f43861a0a856:/# pip list | egrep "numpy|pandas" numpy 2.0.2 pandas 2.2.3 root@f43861a0a856:/# python Python 3.9.23 (main, Jun 4 2025, 08:55:38) [GCC 11.4.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import torch >>> import numpy >>> import pandas root@f43861a0a856:/data/pytorch-micro-benchmarking# HIP_VISIBLE_DEVICES=1 python3 micro_benchmarking_pytorch.py --network resnet50 INFO: running forward and backward for warmup. INFO: running the benchmark.. OK: finished running benchmark.. --------------------SUMMARY-------------------------- Microbenchmark for network : resnet50 Num devices: 1 Dtype: FP32 Mini batch size [img] : 64 Time per mini-batch : 0.11354223489761353 Throughput [img/sec] : 563.6669038416574 ``` (cherry picked from commit a0a9d81)
…cm7.0/7.1 (#2239) Revamped version of #2108 PR to: - enable complex data types for sparse matmul on ROCm - fix sparse addmm/baddbmm on ROCm - fix sparse hipification for ROCm - fix/enable sparse tests on ROCm (~50 tests total for non-fp16/bf16): - enable fp16/bf16 sparse path for rocm7.0 - enable fp16/bf16 sparse tests for rocm7.0/7.1 ``` test_sparse_csr.py::TestSparseCSRCUDA::test_bmm_cuda_* test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_* test_sparse_csr.py::TestSparseCSRCUDA::test_mm_cuda_float64 test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCS* test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_* test_sparse_csr.py::TestSparseCSRCUDA::test_sparse_addmm_cuda_float16 ``` (cherry picked from commit cc2a69c)
#2326) Fixes https://ontrack-internal.amd.com/browse/SWDEV-541809 Upgrading tensorboard after numpy upgrade Ran in **registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16381_ubuntu24.04_py3.12_pytorch_lw_rocm7.0_internal_testing_afe8b782** ``` 7 git checkout rocm7.0_IT_upgrade_tensorboard 8 pip install .ci/docker/requirements-ci.txt 9 pip install -r .ci/docker/requirements-ci.txt 10 PYTORCH_TEST_WITH_ROCM=1 python test/test_monitor.py TestMonitorTensorboard.test_event_handler root@ubb4-rack-22:/var/lib/jenkins/pytorch# PYTORCH_TEST_WITH_ROCM=1 python test/test_monitor.py TestMonitorTensorboard.test_event_handler /opt/venv/lib/python3.12/site-packages/google/protobuf/internal/well_known_types.py:91: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC). _EPOCH_DATETIME_NAIVE = datetime.datetime.utcfromtimestamp(0) . ---------------------------------------------------------------------- Ran 1 test in 0.327s OK root@ubb4-rack-22:/var/lib/jenkins/pytorch# ``` (cherry picked from commit c7f61f4)
Tested locally successfully ``` root@rocm-framework-47:/var/lib/jenkins/pytorch# pip install -r requirements.txt Ignoring numpy: markers 'python_version == "3.9"' don't match your environment Requirement already satisfied: setuptools<80.0,>=70.1.0 in /opt/venv/lib/python3.10/site-packages (from -r /var/lib/jenkins/pytorch/requirements-build.txt (line 2)) (79.0.1) Requirement already satisfied: cmake>=3.31.4 in /opt/venv/lib/python3.10/site-packages (from -r /var/lib/jenkins/pytorch/requirements-build.txt (line 3)) (4.0.0) Requirement already satisfied: ninja==1.11.1.3 in /opt/venv/lib/python3.10/site-packages (from -r /var/lib/jenkins/pytorch/requirements-build.txt (line 4)) (1.11.1.3) Requirement already satisfied: numpy==2.1.2 in /opt/venv/lib/python3.10/site-packages (from -r /var/lib/jenkins/pytorch/requirements-build.txt (line 5)) (2.1.2) Requirement already satisfied: packaging==25.0 in /opt/venv/lib/python3.10/site-packages (from -r /var/lib/jenkins/pytorch/requirements-build.txt (line 6)) (25.0) Requirement already satisfied: pyyaml==6.0.2 in /opt/venv/lib/python3.10/site-packages (from -r /var/lib/jenkins/pytorch/requirements-build.txt (line 7)) (6.0.2) Requirement already satisfied: requests==2.32.4 in /opt/venv/lib/python3.10/site-packages (from -r /var/lib/jenkins/pytorch/requirements-build.txt (line 8)) (2.32.4) Requirement already satisfied: six==1.17.0 in /opt/venv/lib/python3.10/site-packages (from -r /var/lib/jenkins/pytorch/requirements-build.txt (line 9)) (1.17.0) Requirement already satisfied: typing-extensions==4.14.1 in /opt/venv/lib/python3.10/site-packages (from -r /var/lib/jenkins/pytorch/requirements-build.txt (line 10)) (4.14.1) Requirement already satisfied: expecttest==0.3.0 in /opt/venv/lib/python3.10/site-packages (from -r requirements.txt (line 8)) (0.3.0) Requirement already satisfied: filelock==3.18.0 in /opt/venv/lib/python3.10/site-packages (from -r requirements.txt (line 9)) (3.18.0) Requirement already satisfied: 
fsspec==2025.7.0 in /opt/venv/lib/python3.10/site-packages (from -r requirements.txt (line 10)) (2025.7.0) Requirement already satisfied: hypothesis==5.35.1 in /opt/venv/lib/python3.10/site-packages (from -r requirements.txt (line 11)) (5.35.1) Requirement already satisfied: jinja2==3.1.6 in /opt/venv/lib/python3.10/site-packages (from -r requirements.txt (line 12)) (3.1.6) Requirement already satisfied: lintrunner==0.12.7 in /opt/venv/lib/python3.10/site-packages (from -r requirements.txt (line 13)) (0.12.7) Requirement already satisfied: networkx==2.8.8 in /opt/venv/lib/python3.10/site-packages (from -r requirements.txt (line 14)) (2.8.8) Requirement already satisfied: optree==0.13.0 in /opt/venv/lib/python3.10/site-packages (from -r requirements.txt (line 18)) (0.13.0) Requirement already satisfied: psutil==7.0.0 in /opt/venv/lib/python3.10/site-packages (from -r requirements.txt (line 19)) (7.0.0) Requirement already satisfied: sympy==1.13.3 in /opt/venv/lib/python3.10/site-packages (from -r requirements.txt (line 20)) (1.13.3) Requirement already satisfied: wheel==0.45.1 in /opt/venv/lib/python3.10/site-packages (from -r requirements.txt (line 22)) (0.45.1) Requirement already satisfied: build[uv] in /opt/venv/lib/python3.10/site-packages (from -r requirements.txt (line 7)) (1.3.0) Requirement already satisfied: charset_normalizer<4,>=2 in /opt/venv/lib/python3.10/site-packages (from requests==2.32.4->-r /var/lib/jenkins/pytorch/requirements-build.txt (line 8)) (3.4.3) Requirement already satisfied: idna<4,>=2.5 in /opt/venv/lib/python3.10/site-packages (from requests==2.32.4->-r /var/lib/jenkins/pytorch/requirements-build.txt (line 8)) (3.10) Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/venv/lib/python3.10/site-packages (from requests==2.32.4->-r /var/lib/jenkins/pytorch/requirements-build.txt (line 8)) (2.5.0) Requirement already satisfied: certifi>=2017.4.17 in /opt/venv/lib/python3.10/site-packages (from requests==2.32.4->-r 
/var/lib/jenkins/pytorch/requirements-build.txt (line 8)) (2025.8.3) Requirement already satisfied: attrs>=19.2.0 in /opt/venv/lib/python3.10/site-packages (from hypothesis==5.35.1->-r requirements.txt (line 11)) (25.3.0) Requirement already satisfied: sortedcontainers<3.0.0,>=2.1.0 in /opt/venv/lib/python3.10/site-packages (from hypothesis==5.35.1->-r requirements.txt (line 11)) (2.4.0) Requirement already satisfied: MarkupSafe>=2.0 in /opt/venv/lib/python3.10/site-packages (from jinja2==3.1.6->-r requirements.txt (line 12)) (3.0.2) Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/venv/lib/python3.10/site-packages (from sympy==1.13.3->-r requirements.txt (line 20)) (1.3.0) Requirement already satisfied: pyproject_hooks in /opt/venv/lib/python3.10/site-packages (from build[uv]->-r requirements.txt (line 7)) (1.2.0) Requirement already satisfied: tomli>=1.1.0 in /opt/venv/lib/python3.10/site-packages (from build[uv]->-r requirements.txt (line 7)) (2.2.1) Requirement already satisfied: uv>=0.1.18 in /opt/venv/lib/python3.10/site-packages (from build[uv]->-r requirements.txt (line 7)) (0.8.10) root@rocm-framework-47:/var/lib/jenkins/pytorch# pip install -r requirements-build.txt ``` (cherry picked from commit 6e6e454)
Signed-off-by: Jagadish Krishnamoorthy <[email protected]> (cherry picked from commit 1ad5bb95d796283d5f56ac1edd16f1731d24a49d) (cherry picked from commit 519160d)
c77773b to ba4531d
|
Jenkins build for ba4531d2560231e22e80f0d0cae1ec7d555d7ea1 commit finished as FAILURE |
|
Jenkins build for ba4531d2560231e22e80f0d0cae1ec7d555d7ea1 commit finished as FAILURE |
This PR is an attempt to reduce the extra ROCm commits in `rocm7.1_internal_testing` and place them on top of `upstream/main`. The extra commits were identified using the following command:
`git log --oneline upstream/main..rocm7.1_internal_testing | grep -n "^" > ~/commits.txt`
I initially did this on 10/14/2025, when there were 153 extra commits. Here is the list:
commits_rocm7.1_1014.txt
By 10/29/2025, 10 more commits had gone into `rocm7.1_internal_testing`. Those additional commits are listed here:
commits_rocm7.1_1029.txt
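The listing step can also be scripted. The following is a minimal sketch, assuming a local clone that has an `upstream` remote and the `rocm7.1_internal_testing` branch available; the function names `number_lines` and `list_extra_commits` are hypothetical and not part of this PR. `number_lines` reproduces what `grep -n "^"` does to the `git log` output, prefixing each line with `N:`.

```python
import subprocess


def number_lines(text: str) -> list[str]:
    """Prefix each line with its 1-based index, like `grep -n "^"`."""
    return [f"{i}:{line}" for i, line in enumerate(text.splitlines(), start=1)]


def list_extra_commits(base: str = "upstream/main",
                       branch: str = "rocm7.1_internal_testing") -> list[str]:
    """Return numbered one-line summaries of commits on `branch` but not on `base`.

    Assumes it is run inside a git checkout where both refs resolve.
    """
    result = subprocess.run(
        ["git", "log", "--oneline", f"{base}..{branch}"],
        capture_output=True, text=True, check=True,
    )
    return number_lines(result.stdout)
```

Calling `len(list_extra_commits())` gives the number of extra commits at the time it is run, which is the figure quoted for each date above.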
I have labeled each commit with the following labels:
CHERRY_PICKED - These are the commits that I cherry-picked in this PR and will carry forward as rocm only commits.
UPSTREAMED - These commits no longer need to be cherry-picked, as they are already in upstream/main.
SKIPPED - These were skipped because they are either no longer relevant or superseded by a newer commit (e.g. triton bumps).
SKIPPED_UT - These were changes that only skip UTs; we will not carry these commits forward.
NEED_UPSTREAMING - For the commits that were cherry-picked, this indicates that we still need to upstream them.
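To get a quick per-label summary of a labeled commit list, something like the sketch below could be used. It is only an illustration: the layout of the attached `.txt` files is not shown here, so the code assumes each line of the list carries its label(s) somewhere as plain text, and `tally_labels` is a hypothetical helper name. A commit can carry more than one label (e.g. CHERRY_PICKED together with NEED_UPSTREAMING), so each label is counted once per line.

```python
import re
from collections import Counter

# Labels used in this PR. The \b word boundaries keep SKIPPED from matching
# inside SKIPPED_UT, because "_" counts as a word character.
LABELS = ("CHERRY_PICKED", "UPSTREAMED", "SKIPPED_UT", "SKIPPED", "NEED_UPSTREAMING")
LABEL_RE = re.compile(r"\b(" + "|".join(LABELS) + r")\b")


def tally_labels(lines):
    """Count how many lines carry each label (assumed line format, see above)."""
    counts = Counter()
    for line in lines:
        # set() so a label repeated on one line is counted only once
        counts.update(set(LABEL_RE.findall(line)))
    return counts
```

For example, `tally_labels(open("commits_rocm7.1_1014.txt"))` would return a `Counter` keyed by label, provided the file follows the assumed layout.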
Wheel Build Job: http://rocm-ci.amd.com/job/pytorch-manylinux-wheel-builder_rel-preview/102/