Closed
Labels: bug, distributed, ver: 2.5.x
Description
Bug description
As of torch 2.8.0, a warning is issued whenever no device id is provided via `init_process_group` or `barrier`.
This warning is issued at every step of training and clogs logs for long training runs.
This warning is issued due to a call to `torch.distributed.barrier`. This call doesn't pass `device_ids`, and the `ProcessGroup` passed lacks a `bound_device_id`, which triggers the warning.
The warning is
~/minimal-no-device-id/.venv/lib/python3.13/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
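Until the `barrier` call is given a device id, one possible stopgap is to filter this specific message with Python's standard `warnings` module. The sketch below is only that: the helper name `silence_no_device_id_warning` is hypothetical, the pattern is copied from the warning text quoted above, and it hides the log noise without changing how the device is selected.

```python
import warnings

# Prefix of the warning quoted in this issue. `message` is treated as a regex
# matched against the start of the warning text, so a prefix is enough.
NO_DEVICE_ID_PATTERN = r"No device id is provided via `init_process_group` or `barrier"


def silence_no_device_id_warning() -> None:
    """Hypothetical helper: ignore only the 'No device id' UserWarning."""
    warnings.filterwarnings(
        "ignore",
        message=NO_DEVICE_ID_PATTERN,
        category=UserWarning,
    )
```

Calling `silence_no_device_id_warning()` once before `trainer.fit(...)` should be enough in the main process; other `UserWarning`s are left untouched.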
Versions
lightning==2.5.3
torch==2.8.0
What version are you seeing the problem on?
v2.5
Reproduced in studio
No response
How to reproduce the bug
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from torch import nn

# Minimal synthetic dataset: 100 samples, 10 features
X = torch.randn(100, 10)
y = torch.randint(0, 2, (100,))
dataset = TensorDataset(X, y)
dataloader = DataLoader(dataset, batch_size=16)


class DummyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(10, 2)
        self.loss = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = self.loss(logits, y)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)


if __name__ == "__main__":
    model = DummyModel()
    trainer = pl.Trainer(max_epochs=2, accelerator="auto")
    trainer.fit(model, dataloader)
Error messages and logs
~/minimal-no-device-id/.venv/lib/python3.13/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
Environment
Current environment
#- PyTorch Lightning Version (e.g., 2.5.0):
#- PyTorch Version (e.g., 2.5):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
More info
No response