Commit 3b2b91a
Fixes for multi-node execution with torchrun + LocalExecutor (#251)
- do prepare stage only from single process or rank
- for --node-rank, also look for SLURM_NODEID
Signed-off-by: Pramod Kumbhar <[email protected]>
1 parent a61734b commit 3b2b91a
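The two bullets in the commit message can be illustrated with a minimal sketch. This is not the code from the commit: the helper names `resolve_node_rank` and `is_prepare_process` are hypothetical, and the variable names and lookup order are assumptions. `RANK` is the global rank exported by torchrun, and `SLURM_NODEID` and `SLURM_PROCID` are set by Slurm. `NODE_RANK` is assumed here as a conventional override.

```python
import os


def resolve_node_rank(env=None, default=0):
    """Return the node rank from the first environment variable that is set.

    Hypothetical helper: checks NODE_RANK (assumed convention) first,
    then falls back to SLURM_NODEID, then to `default`.
    """
    env = os.environ if env is None else env
    for name in ("NODE_RANK", "SLURM_NODEID"):
        value = env.get(name)
        if value not in (None, ""):
            return int(value)
    return default


def is_prepare_process(env=None):
    """Return True only for the process that should run the prepare stage.

    Hypothetical helper: treats global rank 0 as the single preparing
    process. With no rank information at all, a single process is assumed.
    """
    env = os.environ if env is None else env
    rank = env.get("RANK") or env.get("SLURM_PROCID") or "0"
    return int(rank) == 0


if __name__ == "__main__":
    if is_prepare_process():
        print("running prepare stage on node", resolve_node_rank())
```

Gating on a single rank keeps several processes from running the same one-time setup concurrently. The Slurm fallback lets each node find its rank without an explicit `--node-rank` value.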
File tree
2 files changed, +7 -1 lines changed
- nemo_run/run
- torchx_backend/components
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 665 | 665 | |
| 666 | 666 | |
| 667 | 667 | |
| 668 | | - |
| | 668 | + |
| | 669 | + |
| | 670 | + |
| | 671 | + |
| 669 | 672 | |
| 670 | 673 | |
| 671 | 674 | |


| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 128 | 128 | |
| 129 | 129 | |
| 130 | 130 | |
| | 131 | + |
| 131 | 132 | |
| 132 | 133 | |
| | 134 | + |
| | 135 | + |
| 133 | 136 | |
| 134 | 137 | |
| 135 | 138 | |