Conversation

@jerrymannil
Collaborator

Cherry-pick of pytorch#162112

Fixes #SWDEV-567460

@rocm-repo-management-api

rocm-repo-management-api bot commented Nov 19, 2025

Jenkins build for bd4bf5ba56fe97f9417050d8b8eaa66bbfd4ab1f commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

It was found that the MIOpen batchnorm integration was causing the output to always be in the default contiguous memory format, even when the input was channels-last. This change also unskips a number of related unit tests.

Pull Request resolved: pytorch#162112
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <[email protected]>
Co-authored-by: Dmitry Nikolaev <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>
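For reference, a minimal sketch of the behavior this fixes (assuming a ROCm or CUDA build of PyTorch with a GPU available; shapes are arbitrary): batchnorm output should preserve the channels-last memory format of its input.

```python
import torch
import torch.nn as nn

# Minimal sketch, assuming a ROCm (or CUDA) build with a GPU available.
# Before the fix, the MIOpen batchnorm path returned a contiguous (NCHW) output
# even when the input was channels-last; with the fix the format is preserved.
x = torch.randn(8, 32, 16, 16, device="cuda").to(memory_format=torch.channels_last)
bn = nn.BatchNorm2d(32).to("cuda")
out = bn(x)
print(out.is_contiguous(memory_format=torch.channels_last))  # expected: True with the fix
```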
@rocm-repo-management-api

rocm-repo-management-api bot commented Nov 19, 2025

Jenkins build for f4c2dd57acfe5382ff8d12c072df9bc2bc8a6fef commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@jerrymannil
Collaborator Author

The unit tests also ran fine.
The one failure in test_nn happens with the baseline as well.

PYTORCH_TEST_WITH_ROCM=1 python test/nn/test_convolution.py --verbose
----------------------------------------------------------------------
Ran 1164 tests in 375.355s

OK (skipped=422, expected failures=25)
PYTORCH_TEST_WITH_ROCM=1 python test/test_nn.py --verbose
----------------------------------------------------------------------
FAIL: test_rnn_check_device (__main__.TestNN.test_rnn_check_device)
----------------------------------------------------------------------
RuntimeError: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__miopen_rnn)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/jenkins/pytorch/torch/testing/_internal/common_utils.py", line 3241, in wrapper
    method(*args, **kwargs)
  File "/var/lib/jenkins/pytorch/test/test_nn.py", line 3958, in test_rnn_check_device
    with self.assertRaisesRegex(RuntimeError,
AssertionError: "Input and parameter tensors are not at the same device" does not match "Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__miopen_rnn)"

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_WITH_ROCM=1 python test/test_nn.py TestNN.test_rnn_check_device

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
----------------------------------------------------------------------
Ran 3634 tests in 476.463s

FAILED (failures=1, skipped=984, expected failures=3)
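For reference, a minimal sketch of why this assertion fails: the regex that test_rnn_check_device expects does not match the message raised by the MIOpen RNN path (strings copied from the log above).

```python
import re

# The test expects the RuntimeError message to match this pattern:
expected_pattern = "Input and parameter tensors are not at the same device"
# The MIOpen RNN path raises this message instead (from the log above):
actual_message = (
    "Expected all tensors to be on the same device, but got weight is on cpu, "
    "different from other tensors on cuda:0 (when checking argument in method "
    "wrapper_CUDA__miopen_rnn)"
)
# assertRaisesRegex uses re.search on the exception text, so the mismatch
# surfaces as the AssertionError seen in the log:
print(bool(re.search(expected_pattern, actual_message)))  # False
```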

@jerrymannil jerrymannil merged commit 846316e into release/2.9 Nov 19, 2025
2 of 4 checks passed
@jerrymannil jerrymannil deleted the 2.9_batchnorm_fix branch November 19, 2025 22:16