Fix --local-rank argument compatibility with torchrun #165
Open
Mr-Neutr0n wants to merge 1 commit into SysCV:main
Accept both --local-rank and --local_rank for distributed training. PyTorch's torchrun uses --local-rank (with hyphen), but the script only accepted --local_rank (with underscore), causing the error: "unrecognized arguments: --local-rank=5" Fixes SysCV#153
bump: this makes the script accept torchrun's --local-rank (hyphen) alongside the existing --local_rank (underscore). otherwise it crashes on launch. lmk if anything needs changing
Summary
- Accept both --local-rank (hyphen) and --local_rank (underscore) for the distributed training argument
- Fix compatibility with torchrun, which passes --local-rank instead of --local_rank

Problem
When using torchrun for distributed training, the script fails with:

unrecognized arguments: --local-rank=5
This happens because torchrun passes --local-rank (with hyphen), but the script only accepted --local_rank (with underscore).

Solution
Changed the argument definition to accept both forms using argparse's multiple option names feature.
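As a rough illustration of the approach (not the exact diff in this PR), the sketch below registers both spellings on a single argparse option and checks that each one parses. The `build_parser` helper, the `default=0` value, and the help text are assumptions made for the example; only the two option names come from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that accepts both spellings of the local-rank flag."""
    parser = argparse.ArgumentParser(description="hypothetical training script")
    parser.add_argument(
        "--local-rank", "--local_rank",  # two option names, one argument
        dest="local_rank",               # both spellings store to args.local_rank
        type=int,
        default=0,                       # assumed default, not taken from the PR
        help="local rank for distributed training",
    )
    return parser


parser = build_parser()

# Hyphenated form, as torchrun passes it.
hyphen = parser.parse_args(["--local-rank=5"])

# Underscore form, as the script accepted before the change.
underscore = parser.parse_args(["--local_rank", "5"])

print(hyphen.local_rank, underscore.local_rank)
```

Setting `dest` explicitly is optional here, since argparse derives the attribute name from the first long option and converts hyphens to underscores, but it makes the intent obvious to a reader.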
Test plan
- torchrun --nproc_per_node=N train.py
- torch.distributed.launch (deprecated)

Fixes #153