Status: Open
Labels: Documentation

Description
I wrote some code involving CPU offloading in a DDP context, and I called Fabric.all_reduce with arguments that were CPU tensors on each rank's process.
This fails silently: each rank's tensor is left unchanged afterwards, so no reduction actually happens. The docs are also silent about this. They should state that all_reduce only works if, for rank k, the tensor passed as the argument is on device("cuda", k); otherwise it fails silently.
When you do CPU offloading, there are valid reasons to exchange CPU-resident tensors between processes, so this is (I think) not an unreasonable thing to try. The docs should state the device requirement explicitly, and better still, an exception should be raised instead of failing silently. A minimal sketch of the setup I mean follows below.
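
For reference, here is a minimal sketch of the kind of script I am describing (illustrative only; it assumes a two-GPU machine, the DDP strategy, and a recent lightning.fabric, and the exact printed values are what I would expect, not verified output):

```python
import torch
from lightning.fabric import Fabric


def main():
    # Two processes, one per CUDA device, using the DDP strategy.
    fabric = Fabric(accelerator="cuda", devices=2, strategy="ddp")
    fabric.launch()

    # Each rank contributes a different value, so a correct sum is easy to spot.
    cpu_tensor = torch.tensor([float(fabric.global_rank + 1)])  # stays on the CPU

    # Behaviour described above: with a CPU tensor the reduction silently
    # leaves each rank's value unchanged instead of producing the cross-rank sum.
    reduced_cpu = fabric.all_reduce(cpu_tensor, reduce_op="sum")
    print(f"rank {fabric.global_rank}: CPU input  -> {reduced_cpu}")

    # Moving the tensor to this rank's CUDA device first gives the expected sum.
    cuda_tensor = cpu_tensor.to(fabric.device)
    reduced_cuda = fabric.all_reduce(cuda_tensor, reduce_op="sum")
    print(f"rank {fabric.global_rank}: CUDA input -> {reduced_cuda}")


if __name__ == "__main__":
    main()
```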