Fix --local-rank argument compatibility with torchrun#165

Open
Mr-Neutr0n wants to merge 1 commit into SysCV:main from Mr-Neutr0n:fix-local-rank-arg
Conversation

@Mr-Neutr0n
Summary

  • Accept both --local-rank (hyphen) and --local_rank (underscore) for the distributed training argument
  • Fixes compatibility with PyTorch's torchrun which passes --local-rank instead of --local_rank

Problem

When using torchrun for distributed training:

torchrun --nproc_per_node=6 train.py ...

The script fails with:

HQ-SAM: error: unrecognized arguments: --local-rank=5

This happens because torchrun passes --local-rank (with a hyphen), but the script only accepted --local_rank (with an underscore).
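A minimal repro of the failure, assuming the script's parser only defines the underscore form (the parser name and defaults here are illustrative, matching the error message above):

```python
import argparse

# Parser that only knows the underscore spelling, as before this PR.
parser = argparse.ArgumentParser("HQ-SAM")
parser.add_argument("--local_rank", type=int, default=0)

# torchrun invokes the script with the hyphenated flag: --local-rank=5.
# parse_known_args shows it lands in the "unknown" bucket instead of
# crashing; plain parse_args would raise the "unrecognized arguments" error.
args, unknown = parser.parse_known_args(["--local-rank=5"])
print(unknown)  # ['--local-rank=5']
```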

Solution

Changed the argument definition to accept both forms using argparse's multiple option names feature.
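A sketch of that change (the parser name and help text are illustrative; the actual definition lives in the repo's training script). argparse accepts multiple option strings for a single argument, so both spellings resolve to the same destination:

```python
import argparse

parser = argparse.ArgumentParser("HQ-SAM")
# Both option strings map to args.local_rank via the explicit dest.
parser.add_argument("--local-rank", "--local_rank", dest="local_rank",
                    type=int, default=0,
                    help="local rank passed by the distributed launcher")

print(parser.parse_args(["--local-rank=5"]).local_rank)  # 5 (torchrun form)
print(parser.parse_args(["--local_rank=5"]).local_rank)  # 5 (legacy form)
```

Note that torch.distributed.launch (deprecated) passes the underscore form, so keeping both spellings preserves backward compatibility.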

Test plan

  • Run distributed training with torchrun --nproc_per_node=N train.py
  • Verify backward compatibility with torch.distributed.launch (deprecated)

Fixes #153

Accept both --local-rank and --local_rank for distributed training.
PyTorch's torchrun uses --local-rank (with hyphen), but the script
only accepted --local_rank (with underscore), causing the error:
"unrecognized arguments: --local-rank=5"

Fixes SysCV#153
@Mr-Neutr0n
Author

bump — this fixes the --local-rank argument to work with torchrun, which passes the hyphenated form rather than --local_rank; without it the script crashes on launch. lmk if anything needs changing


Development

Successfully merging this pull request may close these issues.

local-rank argument issue