Closed
Labels: bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), ver: 2.1.x
Description
Bug description
Hi, my problem is that even though my environment has multiple GPUs, training only runs on one GPU. Could you help me?
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------
You are using a CUDA device ('NVIDIA RTX A6000') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [1,2]
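The log above registers only 1 process even though `CUDA_VISIBLE_DEVICES: [1,2]` exposes two GPUs. As an illustration of the relationship being reported — the number of usable workers is bounded by both the `devices=` request and what `CUDA_VISIBLE_DEVICES` exposes — here is a minimal sketch (the helper name `usable_devices` is hypothetical, not a Lightning API):

```python
from typing import Optional

def usable_devices(requested: int, visible: Optional[str]) -> int:
    # If CUDA_VISIBLE_DEVICES is unset, all requested devices may be used;
    # otherwise only as many GPUs as the variable actually exposes.
    if visible is None:
        return requested
    return min(requested, len([d for d in visible.split(",") if d]))

print(usable_devices(2, "1,2"))  # both listed GPUs are available
print(usable_devices(2, "1"))   # only one GPU is actually exposed
```

With `CUDA_VISIBLE_DEVICES=1,2` and `devices=2`, two DDP processes would be expected, so the `Starting with 1 processes` line is the anomaly being reported.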
What version are you seeing the problem on?
v2.1
How to reproduce the bug
The running script:
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    max_epochs=100,
    strategy="ddp",
)
trainer.fit(model, data_module)
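For completeness, the sketch below shows one way such a script could be launched so that both GPUs are visible to the process (`train.py` is a hypothetical filename; the actual launch command was not included in the report):

```shell
# Expose GPUs 1 and 2 to the training process; with devices=2 and
# strategy="ddp", Lightning spawns one worker per visible GPU.
CUDA_VISIBLE_DEVICES=1,2 python train.py
```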
Error messages and logs
Environment
Current environment
python 3.9.19 h955ad1f_1
pytorch-lightning 2.0.0 pypi_0 pypi
pyyaml 6.0.2 pypi_0 pypi
readline 8.2 h5eee18b_0
requests 2.32.3 pypi_0 pypi
setuptools 72.1.0 py39h06a4308_0
sqlite 3.45.3 h5eee18b_0
sympy 1.13.2 pypi_0 pypi
tk 8.6.14 h39e8969_0
torch 2.0.0+cu118 pypi_0 pypi
torchaudio 2.0.0+cu118 pypi_0 pypi
torchmetrics 1.4.1 pypi_0 pypi
torchvision 0.15.0+cu118 pypi_0 pypi
More info
No response