Commit 3708a47

fix: skip spare ranks gracefully in end-to-end test
Spare ranks would call sys.exit(0) during NTP initialization, which pytest treats as a failure. Now spare ranks skip the test gracefully before that happens.
1 parent 5c63ff2 commit 3708a47
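
The failure mode described in the commit message comes from how `sys.exit` works: it does not stop the process directly but raises `SystemExit`, which propagates out of the test body instead of a normal return. The sketch below illustrates this with a stand-in function; `init_that_exits` and `run_and_classify` are hypothetical names used only for illustration and are not part of Megatron or pytest.

```python
import sys


def init_that_exits():
    """Hypothetical stand-in for an initializer that calls sys.exit(0) on spare ranks."""
    sys.exit(0)


def run_and_classify(fn):
    """Report how a call ends: a normal return, SystemExit, or another exception."""
    try:
        fn()
    except SystemExit as exc:
        # sys.exit(code) raises SystemExit; the exit code is stored on the exception.
        return f"SystemExit(code={exc.code})"
    except Exception as exc:
        return f"error: {type(exc).__name__}"
    return "returned"


outcome = run_and_classify(init_that_exits)
print(outcome)
```

Note that `SystemExit` derives from `BaseException`, not `Exception`, so a plain `except Exception` handler around the initializer would not intercept it. Skipping before the call, as this commit does, avoids raising it at all.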

1 file changed, 9 additions and 0 deletions

tests/unit_tests/distributed/test_nonuniform_tp.py

Lines changed: 9 additions & 0 deletions
@@ -408,9 +408,18 @@ def test_ntp_end_to_end_with_8_gpus(self):
             non_active_ranks_per_dp={(0, 0, 0): [2, 3]}, # DP=0: GPUs 2,3 are spares
         )
 
+        # Check if this rank is a spare (will exit during initialization)
+        # Spare ranks: DP=0 with tp_rank=2,3
+        is_spare = dp_rank == 0 and tp_rank in [2, 3]
+
         # Reconfigure process groups for NTP
+        # Note: spare ranks will call sys.exit(0) in initialize_nonuniform_tp_process_groups
         from megatron.core.distributed.nonuniform_tp import initialize_nonuniform_tp_process_groups
 
+        if is_spare:
+            # For spare ranks in test, just mark as passed and exit gracefully
+            pytest.skip(f"Rank {rank} is a spare rank, skipping test gracefully")
+
         initialize_nonuniform_tp_process_groups(ddp_config)
 
         # After reconfiguration, check TP size
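
The `is_spare` check in this diff hard-codes the same ranks that appear in `non_active_ranks_per_dp`. A minimal sketch of deriving the check from that mapping is shown below, so the two cannot drift apart. It assumes the mapping keys are `(dp_rank, cp_rank, pp_rank)` tuples and the values are TP-local ranks. Only the DP position and the TP meaning are suggested by the comments in the diff; the remaining key positions are an assumption, and `is_spare_rank` is a hypothetical helper, not a Megatron function.

```python
def is_spare_rank(dp_rank, tp_rank, non_active_ranks_per_dp, cp_rank=0, pp_rank=0):
    """Return True if tp_rank is listed as non-active for this (dp, cp, pp) group.

    Assumption: keys are (dp_rank, cp_rank, pp_rank) and values are TP-local ranks.
    """
    spares = non_active_ranks_per_dp.get((dp_rank, cp_rank, pp_rank), [])
    return tp_rank in spares


# Same mapping as in the test: in DP group 0, TP ranks 2 and 3 are spares.
non_active = {(0, 0, 0): [2, 3]}

spare_flags = [is_spare_rank(0, tp, non_active) for tp in range(4)]
print(spare_flags)
```

In the test, the result of such a helper could replace the hard-coded `dp_rank == 0 and tp_rank in [2, 3]` expression before the `pytest.skip` call.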
