Fix distributed failures
- Skip *_stress_cuda unit tests on all archs
- Symmetric Memory is not yet supported on the rocm7.0_internal_testing
branch
- test_extra_cuda_context: add a barrier to ensure all nodes finish
init_process_group before continuing with the test
- test_sac_ilp: skip on all ROCm archs (was already skipped on MI300
and NAVI)
- test_fsdp2_mem_tracker: update tolerance
- test_scaled_mm: depends on row-wise scaling, skipped for now
- test_allreduce_inductor_cudagraph_trees: skipped, also flaky
upstream
- test_distributed_spawn: skipped, will be fixed in the next IFU
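
The barrier added to test_extra_cuda_context can be illustrated with a
minimal sketch. This is not the test's actual code: it uses only the
Python standard library, with threads standing in for ranks and a
hypothetical fake_init_process_group standing in for
init_process_group. In torch.distributed the corresponding pattern
would be a dist.barrier() call placed right after
dist.init_process_group(...).

```python
import threading
import time

WORLD_SIZE = 4  # hypothetical number of ranks for this sketch

barrier = threading.Barrier(WORLD_SIZE)
initialized = []          # ranks that have finished their (fake) init
seen_after_barrier = {}   # rank -> number of initialized ranks seen on resume
lock = threading.Lock()


def fake_init_process_group(rank):
    # Hypothetical stand-in for init_process_group: ranks finish at
    # different times, which is what makes the barrier necessary.
    time.sleep(0.01 * rank)
    with lock:
        initialized.append(rank)


def worker(rank):
    fake_init_process_group(rank)
    # No rank passes this point until every rank has finished init.
    barrier.wait()
    with lock:
        seen_after_barrier[rank] = len(initialized)


def run():
    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(WORLD_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen_after_barrier


result = run()
```

Without the barrier.wait() call, a fast rank could continue into the
test body while slower ranks are still initializing.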
Also fixes: https://ontrack-internal.amd.com/browse/SWDEV-544875
Cherry-pick of #2425
Co-authored-by: Prachi Gupta <[email protected]>