[release/2.9]Update requirements-ci.txt to use upstream's pin of onnx #2808

ethanwee1 · 2025-11-17T16:23:38Z

Fixes SWDEV-561962

(cherry picked from commit e294d4d with modifications for release/2.8) Reintroduce CIRCLE_TAG to be able to set PYTORCH_BUILD_VERSION without date (cherry picked from commit 71a30ea)

…for py3.9; upgrade tensorboard to a version compatible with numpy 2. Co-authored-by: Ethan Wee <[email protected]> (cherry picked from commit e867a3d) (cherry picked from commit c7a1e32) (cherry picked from commit 2a215e4) (cherry picked from commit 866cc1d) (cherry picked from commit 4b46310)

(cherry picked from commit 3d102a0)

(cherry picked from commit cb98724)

(cherry picked from commit ba1ba26) (cherry picked from commit 4e3462e) (cherry picked from commit 85ac538)

…_rcpf(x) instead of 1.f/x (#1800) Cherry-pick of #1688 Co-authored-by: Michael Halkenhäuser <[email protected]> Co-authored-by: Hashem Hashemi <[email protected]> (cherry picked from commit f8544af) (cherry picked from commit ed48754) (cherry picked from commit d62a39e) (cherry picked from commit b26ddb8)

Related to c7a1e32. Fixes https://ontrack-internal.amd.com/browse/SWDEV-537835

Not a Navi-specific failure:

```
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 1412, in only_fn
    return fn(slf, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/jenkins/pytorch/test/test_binary_ufuncs.py", line 1671, in test_cuda_tensor_pow_scalar_tensor
    self._test_pow(base, exp)
  File "/var/lib/jenkins/pytorch/test/test_binary_ufuncs.py", line 1482, in _test_pow
    self.assertEqual(actual, expected)
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 4052, in assertEqual
    raise error_metas.pop()[0].to_error(
AssertionError: The values for attribute 'dtype' do not match: torch.float32 != torch.float64.
```

Using `.to(actual)` without specifying dtype/device assumes `actual` is a tensor or tensor-like, which may fail silently or promote. Fixed by explicitly matching dtype and device. Based on pytorch#107302.

Fix:

```
root@ubb4-rack-22:/var/lib/jenkins/pytorch# TEST_CONFIG=default HIP_VISIBLE_DEVICES=0 PYTORCH_TEST_WITH_ROCM=1 python test/test_binary_ufuncs.py TestBinaryUfuncsCUDA.test_cuda_tensor_pow_scalar_tensor_cuda
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
Running tests...
----------------------------------------------------------------------
.
----------------------------------------------------------------------
Ran 1 test in 0.141s

OK

Generating XML reports...
root@ubb4-rack-22:/var/lib/jenkins/pytorch# pip list | grep numpy
numpy 2.1.2
```

(cherry picked from commit a4d60fa) (cherry picked from commit 9f11871)

This PR fixes the failing unit test test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction (FAILED [0.1163s]):

```
Traceback (most recent call last):
  File "/var/lib/jenkins/pytorch/test/test_cuda.py", line 471, in test_set_per_process_memory_fraction
    tmp_tensor = torch.empty(application, dtype=torch.int8, device="cuda")
RuntimeError: Trying to create tensor with negative dimension -5681285432: [-5681285432]
```

This error occurs only on the gfx1101 arch. It comes from an integer overflow: another unit test, test/test_cuda.py::TestCuda::test_randint_generation_for_large_numel, creates a tensor with a huge numel, which results in a much higher torch.cuda.max_memory_reserved() when test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction runs afterward. To avoid this, we call torch.cuda.empty_cache() and torch.cuda.reset_peak_memory_stats() to clean up the CUDA state.

JIRA: https://ontrack-internal.amd.com/browse/SWDEV-535295 (cherry picked from commit f86d184) (cherry picked from commit 1b44228)

…g torch and numpy tensors (#2362) Cherry-pick of #2340 Co-authored-by: Dmitry Nikolaev <[email protected]> (cherry picked from commit 22c98ea) (cherry picked from commit 2d72fcd)

pip-installed requirements.txt and .ci/docker/requirements-ci.txt. Local validation: `Successfully installed jinja2-3.1.6 lintrunner-0.12.7 mypy-1.14.0 onnxscript-0.2.2 sympy-1.13.3 tlparse-0.3.30 z3-solver-4.12.6.0` (cherry picked from commit 30508ff) (cherry picked from commit 22d02e8)

Adds initial autotuning for foreach support, required for https://ontrack-internal.amd.com/browse/SWDEV-539076. This gives up to a 4x improvement for some kernels.

Before:
- triton_for_fused_18.kd | 4.986 ms | 4.986 ms | 2.493 ms | 2
- triton_for_fused_6.kd | 0.098 ms | 0.098 ms | 0.049 ms | 2
- triton_for_fused_7.kd | 0.036 ms | 0.036 ms | 0.018 ms | 2

After:
- triton_for_fused_18.kd | 1.273 ms | 1.273 ms | 0.636 ms | 2
- triton_for_fused_6.kd | 0.044 ms | 0.044 ms | 0.022 ms | 2
- triton_for_fused_7.kd | 0.024 ms | 0.024 ms | 0.012 ms | 2

(cherry picked from commit f07b7f7) (cherry picked from commit ed0d0a7)
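The per-kernel speedup can be read off the Before/After rows as a simple time ratio. The sketch below uses only the first millisecond column and assumes it is the total kernel time, since the original column headers are not included; the dictionary and function names are illustrative only.

```python
# Timings copied from the Before/After rows above (first "ms" column only).
# Assumption: that column is the kernel's total time; the original headers
# were not included in the description.
BEFORE_MS = {
    "triton_for_fused_18": 4.986,
    "triton_for_fused_6": 0.098,
    "triton_for_fused_7": 0.036,
}
AFTER_MS = {
    "triton_for_fused_18": 1.273,
    "triton_for_fused_6": 0.044,
    "triton_for_fused_7": 0.024,
}


def speedups(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Return before/after time ratios; a value of 2.0 means twice as fast."""
    return {name: before[name] / after[name] for name in before}


if __name__ == "__main__":
    for name, ratio in speedups(BEFORE_MS, AFTER_MS).items():
        print(f"{name}: {ratio:.2f}x")
```

This yields about 3.92x for triton_for_fused_18, 2.23x for triton_for_fused_6 and 1.50x for triton_for_fused_7, so the 4x figure corresponds to the largest kernel.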

Relands #2416 with a caching fix. Upstream equivalent: pytorch#159146 --------- Co-authored-by: Jithun Nair <[email protected]> (cherry picked from commit f0aebdc) (cherry picked from commit 9c429dd)

… Fix warps runtime part 2 (#2455) Cherry-pick of #2442 Co-authored-by: Jack Taylor <[email protected]> (cherry picked from commit 77a6760)

…ersistent reduction and no_x_dim removal (#2454) Cherry-pick of #2417 Need to resolve conflicts --------- Co-authored-by: Jack Taylor <[email protected]> (cherry picked from commit eb47158)

Perf improvement for triton tanh (cherry picked from commit 4febbd8)

… rocm version (#2529) Cherry-pick of #2518 Co-authored-by: Ethan Wee <[email protected]> (cherry picked from commit c03be63)

Fixes SWDEV-543698 (https://ontrack-internal.amd.com/browse/SWDEV-543698). Cherry-picked from #2502.

This PR fixes errors like the one below:

```
[rank3]: RuntimeError: The following operation failed in the TorchScript interpreter.
[rank3]: Traceback of TorchScript (most recent call last):
[rank3]: RuntimeError: /tmp/comgr-28f951/input/CompileSourceACC062:67:7: error: unknown type name 'uint32_t'; did you mean '__hip_internal::uint32_t'?
[rank3]:    67 |       uint32_t int32;
[rank3]:       |       ^~~~~~~~
[rank3]:       |       __hip_internal::uint32_t
```

Previously, uint32_t was defined in the std namespace in the HIP headers. In ROCm 7.0 it moved to the __hip_internal namespace. (cherry picked from commit b2fb688)

…2598) Cherry-pick of #2597 Co-authored-by: Jerry Mannil <[email protected]> (cherry picked from commit 9ea02c4)

Original PR (#2417) had incorrect indentation. This update makes autotune always add the tiny configs and otherwise use only the hinted configs.

Tested locally on test_torchinductor: Ran 894 tests in 952.242s, FAILED (failures=1, skipped=28).

Also completed autotune runs for the microbenchmark models:
- Microbenchmark for network: resnet152
- Num devices: 1
- Dtype: FP32
- Mini batch size [img]: 64
- Time per mini-batch: 0.09107530117034912
- Throughput [img/sec]: 702.7152167226226

(cherry picked from commit db3ba66)
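As a quick sanity check on the resnet152 microbenchmark numbers, throughput should equal the mini-batch size divided by the time per mini-batch. The sketch below assumes the time is in seconds (the output does not state the unit) and uses a hypothetical helper name; it only reproduces the arithmetic, not the benchmark itself.

```python
# Figures copied from the resnet152 microbenchmark output above.
# Assumption: "Time per mini-batch" is in seconds (the original omits the unit).
MINI_BATCH_SIZE_IMG = 64
TIME_PER_MINI_BATCH_S = 0.09107530117034912
REPORTED_THROUGHPUT = 702.7152167226226  # img/sec, as printed


def throughput_img_per_s(batch_size: int, seconds_per_batch: float) -> float:
    """Images processed per second for one mini-batch (hypothetical helper)."""
    return batch_size / seconds_per_batch


if __name__ == "__main__":
    computed = throughput_img_per_s(MINI_BATCH_SIZE_IMG, TIME_PER_MINI_BATCH_S)
    print(f"computed {computed:.4f} img/s vs reported {REPORTED_THROUGHPUT:.4f} img/s")
```

With these values the computed throughput agrees with the reported 702.7152167226226 img/sec to well within 0.001, which is consistent with the seconds assumption.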

cherry-pick of 8d42697 (cherry picked from commit 0b82d9a)

* cherry-pick of pytorch@2aadcea (cherry picked from commit bd74018)

cherry-pick of pytorch#163869 (cherry picked from commit dfd386f)

…IFU_2025-10-14

[AUTOGENERATED] release/2.9_IFU_2025-10-14

Cherry-pick of #2693 Co-authored-by: Gheorghe-Teodor Bercea <[email protected]>

Validation: http://rocm-ci.amd.com/job/mainline-pytorch2.9-manylinux-wheels/21/

Cherry-pick of #2710 Co-authored-by: Jerry Mannil <[email protected]>

…2722) These changes from upstream break loading of an external library:

```
61170: calling init: /opt/venv/lib/python3.12/site-packages/torchvision/_C.so
61170: terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
Fatal Python error: Aborted

Current thread 0x00007f229fb36080 (most recent call first):
  File "/usr/lib/python3.12/ctypes/__init__.py", line 379 in __init__
  File "/pytorch/torch/_ops.py", line 1488 in load_library
  File "/opt/venv/lib/python3.12/site-packages/torchvision/extension.py", line 34 in <module>
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 995 in exec_module
  File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
  File "/opt/venv/lib/python3.12/site-packages/torchvision/__init__.py", line 9 in <module>
```

This was already reverted in rocm/7.1_internal_testing; we still need to investigate whether upstream needs a fix.

These changes are currently being upstreamed. They are brought into release 2.9 for customer model performance improvements. --------- Co-authored-by: Nichols A. Romero <[email protected]> Co-authored-by: Sampsa Riikonen <[email protected]> Co-authored-by: Nichols A. Romero <[email protected]> Co-authored-by: AmdSampsa <[email protected]>

Latest updates from triton

Fixes [2.9 board issue](https://github.com/orgs/ROCm/projects/18/views/1?filterQuery=sprint%3A%40current+assignee%3A%22ethanwee1%22&pane=issue&itemId=132061338&issue=ROCm%7Cframeworks-internal%7C13983) Validation: http://rocm-ci.amd.com/job/mainline-pytorch2.9-manylinux-wheels/22/

Ref.: pytorch#164572

Validation: http://rocm-ci.amd.com/job/mainline-pytorch2.9-manylinux-wheels/26/

…indows. (pytorch#162330) Enables flash attention and/or memory-efficient attention on Windows with scaled_dot_product_attention via aotriton. Already tested and working on Windows with TheRock. Steps to enable: simply set `USE_FLASH_ATTENTION=1` and `USE_MEM_EFF_ATTENTION=1` as usual. See https://github.com/ROCm/TheRock/blob/main/external-builds/pytorch/build_prod_wheels.py#L578-L604 Pull Request resolved: pytorch#162330 Approved by: https://github.com/jeffdaily Co-authored-by: Scott Todd <[email protected]>

A few UT failures are caused by `HIPBLASLT_ALLOW_TF32`. Fixes pytorch#157094 Fixes pytorch#157093 Fixes pytorch#157092 Fixes pytorch#157091 Fixes pytorch#157064 Fixes pytorch#157063 Fixes pytorch#157062 Fixes pytorch#157061 Fixes pytorch#157042 Fixes pytorch#157041 Fixes pytorch#157039 Fixes pytorch#157004 Pull Request resolved: pytorch#162998 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <[email protected]>

…3373) Early assignment of `__AOTRITON_LIB` breaks the use of the environment variable `$AOTRITON_INSTALLED_PREFIX`. Pull Request resolved: pytorch#163373 Approved by: https://github.com/pruthvistony, https://github.com/jeffdaily

## Major Changes
* Efficient Attention on ROCm requires the last dimension of input tensors to be aligned to 16 bytes.
  - Unlike FA, ME does not pad input tensors in `scaled_dot_product_attention`, hence this requirement.
* Fix `atomic_counter` handling in the varlen FA API.
* Unskips a few unit tests.

Fixes pytorch#157120 Fixes pytorch#157121 Fixes pytorch#157122 Fixes pytorch#157167 Fixes pytorch#155217 Fixes pytorch#157043 Fixes pytorch#157060 Pull Request resolved: pytorch#163745 Approved by: https://github.com/jeffdaily
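The 16-byte requirement for Efficient Attention can be illustrated with a little arithmetic. The helpers below are hypothetical (they are not part of PyTorch or aotriton) and assume the rule means that the byte length of the last dimension, `last_dim * element_size`, must be a multiple of 16. They show which sizes satisfy that and the smallest padded size that would.

```python
import math

# Hypothetical helpers illustrating the 16-byte rule described above.
# Assumption: "last dimension aligned to 16 bytes" means
# last_dim * element_size_in_bytes must be a multiple of 16.
ALIGN_BYTES = 16


def is_aligned(last_dim: int, elem_size: int) -> bool:
    """True if the last dimension spans a whole number of 16-byte chunks."""
    return (last_dim * elem_size) % ALIGN_BYTES == 0


def padded_last_dim(last_dim: int, elem_size: int) -> int:
    """Smallest size >= last_dim whose byte length is a multiple of 16."""
    # Number of elements that always advances by a multiple of 16 bytes.
    step = ALIGN_BYTES // math.gcd(ALIGN_BYTES, elem_size)
    return -(-last_dim // step) * step  # ceiling division, then scale back up


if __name__ == "__main__":
    # fp16 (2 bytes): a last dim of 100 is 200 bytes, not a multiple of 16.
    print(is_aligned(100, 2), padded_last_dim(100, 2))
    # fp32 (4 bytes): a last dim of 100 is 400 bytes, already aligned.
    print(is_aligned(100, 4), padded_last_dim(100, 4))
```

For example, with 2-byte fp16 elements a last dimension of 100 is 200 bytes and would need padding to 104, while the same size with 4-byte fp32 elements is already aligned.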

- TheRock build system for ROCm builds OpenBLAS from source and uses a custom name for the library. - Following existing conventions in `FindOpenBLAS.cmake` to support finding a custom named version of OpenBLAS.

Cherry-pick of #2738 Co-authored-by: Jerry Mannil <[email protected]>

Cherry-pick of #2740 Co-authored-by: Jerry Mannil <[email protected]>

Cherry-pick of #2743 Co-authored-by: Jerry Mannil <[email protected]>

Cherry-pick of #2675 (original commit #2055). For Navi4x only, these test cases are skipped until support for the following kernels is added (progress can be tracked in ROCm/rocm-libraries#2237):
- test_freeze_conv_relu_fusion_not_forward and test_freeze_conv_relu_fusion: ConvBinWinogradRxSf2x3g1Fused
- test_cudnn_convolution_relu: ConvBinWinogradRxSf2x3g1, ConvBinWinogradRxSf2x3g1Fused and ConvWinoFuryRxS<2-3>

Fixes #SWDEV-555401 Co-authored-by: Divin Honnappa <[email protected]> Co-authored-by: Dmitry Nikolaev <[email protected]>

…2778) Added the missing test routine to release/2.9; it is already present in release/2.8 and in upstream's main branch. Fixes #SWDEV-555401 Signed-off-by: Artem Kuzmitckii <[email protected]>

… Native accuracy issue (#2788) Skip for `test_batchnorm_3D_train_NCHW_vs_native_mixed_float16` Cherry-pick of #2370 ~Need to resolve conflicts~ - resolved --------- Co-authored-by: Dmitry Nikolaev <[email protected]>

hipblaslt should provide better performance in general

Cherry-pick of #2786 Co-authored-by: Ethan Wee <[email protected]>

Added a check that includes autotune configs for 2D POI only if their size is big enough.

causten · 2025-11-17T16:30:18Z

.ci/docker/requirements-ci.txt

-#Description: Required by mypy and test_public_bindings.py when checking torch.onnx._internal
+onnx==1.19.1 ; python_version < "3.14"
+# Unpin once Python 3.14 is supported. See  onnxruntime issue 26309.
+onnx==1.18.0 ; python_version == "3.14"


If Python is 3.12, you will be installing onnx 1.19.1. If Python is newer (3.14), you install an older version of onnx (1.18.0). Was that intentional?
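The reviewer's question comes down to how the `python_version` environment markers in the suggested lines are evaluated. The sketch below is a hypothetical, deliberately minimal evaluator: it only understands the two markers in that suggestion (`<` and `==` against `python_version`) and compares versions numerically, whereas pip implements the full PEP 508 marker grammar. Treat it as an illustration of the suggested pins, not as pip's behavior in every case.

```python
# Hypothetical helper: a minimal evaluator for the two pins in the suggested
# requirements-ci.txt change. It only understands `python_version` markers with
# the `<` and `==` operators; real pip uses the full PEP 508 grammar.
import operator

# (requirement, marker operator, python_version bound) from the suggestion.
PINS = [
    ('onnx==1.19.1', '<', '3.14'),
    ('onnx==1.18.0', '==', '3.14'),
]

OPS = {'<': operator.lt, '==': operator.eq}


def as_tuple(version: str) -> tuple[int, ...]:
    """Turn '3.12' into (3, 12) so versions compare numerically, not lexically."""
    return tuple(int(part) for part in version.split('.'))


def onnx_pins_for(python_version: str) -> list[str]:
    """Return every requirement whose marker matches the given python_version."""
    current = as_tuple(python_version)
    return [req for req, op, bound in PINS if OPS[op](current, as_tuple(bound))]


if __name__ == '__main__':
    for py in ['3.10', '3.12', '3.13', '3.14', '3.15']:
        print(py, onnx_pins_for(py) or 'no onnx pin')
```

Running it prints the pin each listed Python version would select under this simplified model: 3.10 through 3.13 pick up `onnx==1.19.1`, 3.14 picks up `onnx==1.18.0`, and 3.15 matches neither line.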

rocm-repo-management-api · 2025-11-17T16:59:29Z

Jenkins build for 5b09b489290cb8680743a38c3f5abac067a6ed8d commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

jithunnair-amd and others added 30 commits October 10, 2025 14:55

[release/2.8] Enable wheels

476d3c1

Use ROCm/triton and update triton.txt

3ff7844

Add related_commits file (#2396)

fe59c33

Add QA automation scripts for running PyTorch unit tests

14b0f0e

[AUTOGENERATED] [release/2.7] [release/2.6] Fix dtype before comparin…

4361e47

[release/2.7] [SWDEV-543214] Reland #2416 Fix warps runtime (#2421)

4142eef

[AUTOGENERATED] [release/2.8] [release/2.7] [SWDEV-543214] Reland #2416…

5e67be1

[AUTOGENERATED] [release/2.8] [SWDEV-539215] - Autotune support for p…

c58ceb1

[SWDEV-539119] [release/2.8] Add fast_tanh support (#2484)

406100f

[AUTOGENERATED] [release/2.8] Change triton package name depending on…

fcc0d85

[AUTOGENERATED] [release/2.8] [ROCm] OffsetCalc Unroll Optimization (#…

2711b3e

[ROCm] Fix indexing_backward_kernel perf (#2667)

7b8bc05

[ROCm] Improve perf for elementwise broadcast with mixed dtype (#2672)

506d5ce

[ROCm] Implement float32 copy kernel (#2683)

55b2445

Bump triton to 3.5.x

123b638

Update fbgemm submodule to avoid ck errors

426b2e8

Merge remote-tracking branch 'upstream/release/2.9' into release/2.9_…

31b3b8e

Merge pull request #2709 from ROCm/release/2.9_IFU_2025-10-14

06ee6e4

Update version to 2.9.0

c126ff5

[ROCm] Fix non-stride-one backwards indexing performance

4fe15f2

[release/2.9] remove amdgpu-coerce-illegal-types=1 (#2720)

9bb5bae

[ROCm] Adjust grid size for non-unit stride backwards indexing

fa57f9c

jataylo and others added 22 commits October 17, 2025 09:18

Update to tip of release/internal/3.5.x (#2727)

2ef9fcd

Optimized Bilinear 2D Upsampling for AMD MI devices (#2729)

5ed482a

[release/2.9] Corrective PR (#2733)

7dc7c69

[ROCm] Custom OpenBLAS library name (#2752)

8b386d1

[ROCm] [Normalization] Update block size

642c8c0

[ROCm] Deserialize loads in planar sum portion of reduce() of norm

38af0bd

[ROCm] deserialize loads in planar sum portion of stats() of norm

25a49ce

new autotuning configs for wri0 (#2767)

55c9130

[release/2.9] add AMD routine to common_utils.py of test framework (#…

e7df144

[release/2.9] Add gfx110X and gfx115X to preferred hipBLASLt list (#2742)

c8b6bc9

Remove --no-use-pep517

832d147

[NO CP] triton sanity check for 2D POI (#2798)

3082a53

Update requirements-ci.txt to use upstream's pin of onnx

5b09b48

ethanwee1 requested a review from jeffdaily as a code owner November 17, 2025 16:23

causten reviewed Nov 17, 2025

jeffdaily force-pushed the release/2.9 branch from 78640c9 to 6ecd7c5 Compare November 17, 2025 20:54

jeffdaily requested review from jataylo, jithunnair-amd and pruthvistony as code owners November 17, 2025 20:54

18 participants
