Closed
Description
Bug description
When using the FSDP strategy, it is not possible to pass an Accelerator instance directly to the Trainer; the accelerator argument must instead be the string "gpu" or "cuda". This is counter-intuitive, since passing an instance works with the DDP strategy.
Example to reproduce below.
What version are you seeing the problem on?
v2.5
Reproduced in studio
No response
How to reproduce the bug
import torch
from torch.utils.data import DataLoader, TensorDataset
import lightning as L
from lightning.pytorch.accelerators.cuda import CUDAAccelerator
from torch.nn import Linear
from my_lightning_module import MyLightningModule

if __name__ == "__main__":
    param_size = 8192 * 2
    # Same parameters as DDP script
    x = torch.randn(256, param_size)
    y = torch.randn(256, param_size)
    dataset = TensorDataset(x, y)
    dataloader = DataLoader(dataset, batch_size=256)
    model = MyLightningModule(param_size, param_size)
    # Passing an accelerator instance together with strategy="fsdp" raises the ValueError
    trainer = L.Trainer(max_epochs=5, devices=4, strategy="fsdp", accelerator=CUDAAccelerator())
    trainer.fit(model, dataloader)
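As a workaround sketch based on the description above, the accelerator can be passed as the string "gpu" (or "cuda") instead of an instance. The helper names `fsdp_trainer_kwargs` and `build_trainer` are hypothetical and exist only for illustration; the Trainer arguments are the ones from the script above.

```python
def fsdp_trainer_kwargs(devices=4, max_epochs=5):
    """Trainer arguments from the reproduction script, with the accelerator
    given as a string instead of a CUDAAccelerator() instance."""
    return {
        "max_epochs": max_epochs,
        "devices": devices,
        "strategy": "fsdp",
        "accelerator": "gpu",  # per the report, "cuda" is accepted as well
    }


def build_trainer():
    # Imported lazily so the helper above can be inspected without Lightning installed.
    import lightning as L

    return L.Trainer(**fsdp_trainer_kwargs())
```

Calling `build_trainer()` in place of the failing `L.Trainer(...)` line should avoid the ValueError, assuming the machine has the requested number of CUDA devices.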
Error messages and logs
Traceback (most recent call last):
  File "<anonymized>/train_lightning_fsdp.py", line 20, in <module>
    trainer = L.Trainer(max_epochs=5, devices=4, strategy="fsdp", accelerator=CUDAAccelerator())
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<anonymized>/.venv/lib/python3.12/site-packages/lightning/pytorch/utilities/argparse.py", line 70, in insert_env_defaults
    return fn(self, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "<anonymized>/.venv/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 404, in __init__
    self._accelerator_connector = _AcceleratorConnector(
                                  ^^^^^^^^^^^^^^^^^^^^^^
  File "<anonymized>/.venv/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/accelerator_connector.py", line 156, in __init__
    self._check_strategy_and_fallback()
  File "<anonymized>/.venv/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/accelerator_connector.py", line 463, in _check_strategy_and_fallback
    raise ValueError(
ValueError: The strategy `fsdp` requires a GPU accelerator, but got: <lightning.pytorch.accelerators.cuda.CUDAAccelerator object at 0x71ecdb3f2240>
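The error message suggests that the check compares the accelerator argument against string names, so an instance never matches. The following is a minimal sketch of that failure mode, not the actual Lightning source: `FakeCUDAAccelerator`, `GPU_NAMES`, `check_strings_only`, and `check_strings_or_instance` are hypothetical stand-ins, and the second function is only one possible way such a check could be relaxed.

```python
class FakeCUDAAccelerator:
    """Hypothetical stand-in for Lightning's CUDAAccelerator (no GPU required)."""


GPU_NAMES = ("cuda", "gpu")


def check_strings_only(accelerator):
    # Assumed behavior: only the string names pass, so any instance is rejected.
    if accelerator not in GPU_NAMES:
        raise ValueError(
            f"The strategy `fsdp` requires a GPU accelerator, but got: {accelerator}"
        )


def check_strings_or_instance(accelerator, accelerator_cls=FakeCUDAAccelerator):
    # One possible relaxation: also accept an instance of the accelerator class.
    if accelerator in GPU_NAMES or isinstance(accelerator, accelerator_cls):
        return
    raise ValueError(
        f"The strategy `fsdp` requires a GPU accelerator, but got: {accelerator}"
    )
```

With these stand-ins, `check_strings_only("gpu")` passes while `check_strings_only(FakeCUDAAccelerator())` raises a ValueError of the same shape as the one in the traceback; `check_strings_or_instance` accepts both.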
Environment
Current environment
#- PyTorch Lightning Version (e.g., 2.5.0):
#- PyTorch Version (e.g., 2.5):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
More info
No response