Using multi-node with LocalExecutor #130

@LopezGG

I noticed LocalExecutor has a hard-coded value for nnodes.

https://github.com/NVIDIA/NeMo-Run/blob/b4e2258f61b88c53b77996b5f9ed871ee666d85f/src/nemo_run/core/execution/local.py#L53-L54

Is there a reason multi-node is disabled? The value feeds into torch_run, which seems to support multi-node.

https://github.com/NVIDIA/NeMo-Run/blob/b4e2258f61b88c53b77996b5f9ed871ee666d85f/src/nemo_run/run/torchx_backend/components/torchrun.py#L104-L124

I'm asking because I am using this with AML, where I can usually get multi-node working with torchrun.
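
For reference, here is a minimal sketch of the kind of multi-node torchrun launch I mean, using torchrun's standard `--nnodes`, `--nproc_per_node`, `--node_rank`, `--master_addr` and `--master_port` flags. The helper name `build_torchrun_cmd`, the script name `train.py` and the address `10.0.0.4` are hypothetical and only for illustration; this is not how NeMo-Run builds the command.

```python
import shlex


def build_torchrun_cmd(script, nnodes, nproc_per_node, node_rank,
                       master_addr, master_port=29500, script_args=()):
    """Build the torchrun argv for one node of a static multi-node job.

    Hypothetical helper: it only assembles standard torchrun flags and
    does not launch anything.
    """
    if nnodes < 1:
        raise ValueError("nnodes must be at least 1")
    if not 0 <= node_rank < nnodes:
        raise ValueError("node_rank must be in [0, nnodes)")
    return [
        "torchrun",
        f"--nnodes={nnodes}",                  # total number of nodes in the job
        f"--nproc_per_node={nproc_per_node}",  # worker processes on this node
        f"--node_rank={node_rank}",            # index of this node, 0-based
        f"--master_addr={master_addr}",        # address of the rank-0 node
        f"--master_port={master_port}",        # port on the rank-0 node
        script,
        *script_args,
    ]


if __name__ == "__main__":
    # Each node runs the same command with its own node_rank.
    for rank in range(2):
        cmd = build_torchrun_cmd("train.py", nnodes=2, nproc_per_node=8,
                                 node_rank=rank, master_addr="10.0.0.4")
        print(shlex.join(cmd))
```

With `nnodes` fixed at 1, only the single-node form of this command can be produced, which is the limitation I am asking about.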
