Make sure all Torch Distributed initializations use device_id=torch.device(f"cuda:{device_id}") #202

@mawad-amd

Description

We have some code that looks like this:

    dist.init_process_group(
        backend="nccl",
        rank=rank,
        world_size=world_size,
        init_method="tcp://127.0.0.1:29500"
    )

ALL code should look like this:

    dist.init_process_group(
        backend="nccl",
        rank=rank,
        world_size=world_size,
        init_method="tcp://127.0.0.1:29500",
        device_id=torch.device(f"cuda:{device_id}")
    )
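As a sketch of the required pattern in context, the helper below wraps the call above (the `init_distributed` name and the assumption that `device_id` equals the rank, i.e. one process per GPU on a single node, are illustrative, not from the issue):

```python
import torch
import torch.distributed as dist


def init_distributed(rank: int, world_size: int) -> None:
    # Assumption for this sketch: one process per GPU on one node,
    # so the local device index equals the global rank.
    device_id = rank
    torch.cuda.set_device(device_id)
    dist.init_process_group(
        backend="nccl",
        rank=rank,
        world_size=world_size,
        init_method="tcp://127.0.0.1:29500",
        # Passing device_id binds the NCCL communicator to a specific
        # GPU at init time instead of relying on lazy device selection.
        device_id=torch.device(f"cuda:{device_id}"),
    )
```

Passing `device_id` tells the process group exactly which GPU this rank owns, which avoids ambiguity when a process can see more than one device.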

Metadata

Labels

bug (Something isn't working) · examples (Examples showcasing Iris APIs and usage) · help wanted (Extra attention is needed) · iris (Iris project issue)
