[rocm7.0_internal_testing] skip 3D NCHW FP16 batchnorm test due to Native accuracy issue #2370

dnikolaev-amd · 2025-07-15T15:30:25Z

Skips test_nn.py::TestNN.test_batchnorm_3D_train_NCHW_vs_native_mixed_float16.
The test fails when the weight gradient from MIOpen/cuDNN batchnorm is compared against Native batchnorm.

The corresponding CPU test, test_batchnorm_3D_train_NCHW_vs_cpu_mixed_float16, passes,
so this looks like an accuracy issue in FP16 Native batchnorm.

The test fails on MI200/MI300 and V100.
It passes on Navi (with MIOpen enabled), for reasons not yet understood.

Fixes SWDEV-541024, SWDEV-539171

python test_nn.py -v -k test_batchnorm_3D_train_NCHW_vs_native_mixed_float16

test_batchnorm_3D_train_NCHW_vs_native_mixed_float16 (__main__.TestNN) ... skipped '3D float16 NCHW train failed on CUDA and ROCm due to Native batchnorm accuracy issue SWDEV-541024'

OK (skipped=1)
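
The diff itself is not shown in this conversation, so the following is only a minimal sketch of how a single test can be skipped with a reason string so that the runner reports `skipped=1` as above. The `TestNN` class here is a hypothetical stand-in with placeholder bodies, not the real class from `test_nn.py`, and the actual change may use a different mechanism (for example a conditional skip inside a parametrized test). Only the test names and the skip message are taken from the PR.

```python
import io
import unittest

# Reason string copied from the skip message in the test output above.
SKIP_REASON = (
    "3D float16 NCHW train failed on CUDA and ROCm due to "
    "Native batchnorm accuracy issue SWDEV-541024"
)


class TestNN(unittest.TestCase):
    # Hypothetical stand-in for the real test; the body is a placeholder
    # and never runs because the decorator skips it first.
    @unittest.skip(SKIP_REASON)
    def test_batchnorm_3D_train_NCHW_vs_native_mixed_float16(self):
        self.fail("unreachable while the skip is in place")

    # The CPU comparison stays enabled, matching the PR description.
    # Placeholder body (assumption): the real test compares gradients.
    def test_batchnorm_3D_train_NCHW_vs_cpu_mixed_float16(self):
        pass


def run_suite():
    """Run the stand-in suite and return (result, captured runner output)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNN)
    stream = io.StringIO()
    runner = unittest.TextTestRunner(stream=stream, verbosity=2)
    result = runner.run(suite)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, output = run_suite()
    print(output)
```

A skip with an explicit reason keeps the test visible in reports and ties it to the tracking ticket, which makes it easier to re-enable once the Native batchnorm accuracy issue is resolved.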

Cherry-picked to release/2.7 branch via #2390

Cherry-picked to release/2.6 branch via #2391

Cherry-picked to release/2.8 branch via #2652

Cherry-picked to release/2.9 branch via #2788

rocm-repo-management-api · 2025-07-15T15:58:40Z

Jenkins build for 2f9e18c5fb255cbd1f554c070bcf6d852ab9b848 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

dnikolaev-amd · 2025-07-21T18:34:55Z

! cherry-pick --onto release/2.7

dnikolaev-amd · 2025-07-21T18:35:40Z

! cherry-pick --onto release/2.6

okakarpa · 2025-07-21T19:24:11Z

Created branch autogenerated/release/2.7_cherry-pick_pr-2370 and #2390

okakarpa · 2025-07-21T19:28:07Z

Created branch autogenerated/release/2.6_cherry-pick_pr-2370 and #2391

#2440) This PR has fixes for P1 Jira https://ontrack-internal.amd.com/browse/SWDEV-542659. In this Jira, there are 3 test files with failing tests:

1) distributed.test_distributed_spawn
2) test_binary_ufuncs
3) test_nn

The test files **distributed.test_distributed_spawn** and **test_binary_ufuncs** pass with the latest mainline build, **registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16426_ubuntu22.04_py3.10_pytorch_lw_release-2.7_fe3d37a9**.

The test file **test_nn** has 2 failing tests: **test_batchnorm_3D_train_NCHW_vs_native_mixed_float16** and **test_RNN_dropout_state**. The **test_batchnorm_3D_train_NCHW_vs_native_mixed_float16** test is skipped by PR #2370. The **test_RNN_dropout_state** test is fixed by cherry-picking upstream commit 1aa971a.

Tested on MI200 with docker image **registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16426_ubuntu22.04_py3.10_pytorch_lw_release-2.7_fe3d37a9**.

---------

Co-authored-by: Iurii Paikov <[email protected]>
Co-authored-by: Jeff Daily <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>

dnikolaev-amd · 2025-09-17T23:21:29Z

! cherry-pick --onto release/2.8

dhonnappa-amd · 2025-09-18T00:02:26Z

Created branch autogenerated/release/2.8_cherry-pick_pr-2370 and #2652

Comment processed by Build

dnikolaev-amd · 2025-11-05T19:57:39Z

! cherry-pick --onto release/2.9

rocm-repo-management-api · 2025-11-05T20:19:05Z

Created branch autogenerated/release/2.9_cherry-pick_pr-2370 and #2788. It contains a merge conflict. Please resolve it

Comment processed by Build

… Native accuracy issue (#2788)

Skip for `test_batchnorm_3D_train_NCHW_vs_native_mixed_float16`

Cherry-pick of #2370

~Need to resolve conflicts~ - resolved

---------

Co-authored-by: Dmitry Nikolaev <[email protected]>

Skip 3D NCHW FP16 batchnorm test due to Native accuracy issue

2f9e18c

dnikolaev-amd requested a review from jithunnair-amd July 15, 2025 15:30

dnikolaev-amd changed the title ~~Skip 3D NCHW FP16 batchnorm test due to Native accuracy issue~~ [rocm7.0_internal_testing] skip 3D NCHW FP16 batchnorm test due to Native accuracy issue Jul 15, 2025

pruthvistony approved these changes Jul 19, 2025

pruthvistony merged commit 4eaa5bf into rocm7.0_internal_testing Jul 19, 2025
0 of 4 checks passed

pruthvistony deleted the skip_fp16_nchw_native_batchnorm_test branch July 19, 2025 05:26

okakarpa mentioned this pull request Jul 21, 2025

[AUTOGENERATED] [release/2.7] skip 3D NCHW FP16 batchnorm test due to Native accuracy issue #2390

Merged

okakarpa mentioned this pull request Jul 21, 2025

[AUTOGENERATED] [release/2.6] skip 3D NCHW FP16 batchnorm test due to Native accuracy issue #2391

Merged

jithunnair-amd pushed a commit that referenced this pull request Jul 24, 2025

[AUTOGENERATED] [release/2.6] skip 3D NCHW FP16 batchnorm test due to…

9663f2d

… Native accuracy issue (#2391) Cherry-pick of #2370 Co-authored-by: Dmitry Nikolaev <[email protected]>

jithunnair-amd pushed a commit that referenced this pull request Jul 24, 2025

[AUTOGENERATED] [release/2.7] skip 3D NCHW FP16 batchnorm test due to…

d5542b8

… Native accuracy issue (#2390) Cherry-pick of #2370 Co-authored-by: Dmitry Nikolaev <[email protected]>

akashveramd mentioned this pull request Jul 31, 2025

[release/2.7] Fix test_rnn_check_device tests for P1 Jira SWDEV-542659 #2440

Merged

dhonnappa-amd mentioned this pull request Sep 18, 2025

[AUTOGENERATED] [release/2.8] skip 3D NCHW FP16 batchnorm test due to Native accuracy issue #2652

Merged

dhonnappa-amd added a commit that referenced this pull request Sep 18, 2025

skip 3D NCHW FP16 batchnorm test due to Native accuracy issue

665147a

Cherry-pick of #2370 Co-authored-by: Dmitry Nikolaev <[email protected]>

rocm-repo-management-api bot mentioned this pull request Nov 5, 2025

[AUTOGENERATED] [release/2.9] skip 3D NCHW FP16 batchnorm test due to Native accuracy issue #2788

Merged
