Strategy fsdp requires a GPU accelerator, but got CUDAAccelerator #20957

@liopeer

Bug description

When using the FSDP strategy, it is not possible to pass an Accelerator instance directly to the Trainer; only the "gpu" or "cuda" keywords are accepted. This is counterintuitive, since passing an instance works with the DDP strategy.

Example to reproduce below.

What version are you seeing the problem on?

v2.5

Reproduced in studio

No response

How to reproduce the bug

import torch
from torch.utils.data import DataLoader, TensorDataset
import lightning as L
from lightning.pytorch.accelerators.cuda import CUDAAccelerator
from torch.nn import Linear

from my_lightning_module import MyLightningModule

if __name__ == "__main__":
    param_size = 8192 * 2

    # Same parameters as DDP script
    x = torch.randn(256, param_size)
    y = torch.randn(256, param_size)
    dataset = TensorDataset(x, y)
    dataloader = DataLoader(dataset, batch_size=256)
    
    model = MyLightningModule(param_size, param_size)
    trainer = L.Trainer(max_epochs=5, devices=4, strategy="fsdp", accelerator=CUDAAccelerator())
    trainer.fit(model, dataloader)

Error messages and logs

Traceback (most recent call last):
  File "<anonymized>/train_lightning_fsdp.py", line 20, in <module>
    trainer = L.Trainer(max_epochs=5, devices=4, strategy="fsdp", accelerator=CUDAAccelerator())
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<anonymized>/.venv/lib/python3.12/site-packages/lightning/pytorch/utilities/argparse.py", line 70, in insert_env_defaults
    return fn(self, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "<anonymized>/.venv/lib/python3.12/site-packages/lightning/pytorch/trainer/trainer.py", line 404, in __init__
    self._accelerator_connector = _AcceleratorConnector(
                                  ^^^^^^^^^^^^^^^^^^^^^^
  File "<anonymized>/.venv/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/accelerator_connector.py", line 156, in __init__
    self._check_strategy_and_fallback()
  File "<anonymized>/.venv/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/accelerator_connector.py", line 463, in _check_strategy_and_fallback
    raise ValueError(
ValueError: The strategy `fsdp` requires a GPU accelerator, but got: <lightning.pytorch.accelerators.cuda.CUDAAccelerator object at 0x71ecdb3f2240>
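
The error message is consistent with the connector comparing the accelerator argument against the string keywords only, so an instance never matches. The following is a minimal sketch of that suspected behavior, not Lightning's actual source: the `CUDAAccelerator` class here is a stand-in, and `check_keyword_only` and `check_with_instances` are hypothetical names used purely for illustration.

```python
class CUDAAccelerator:
    """Stand-in for lightning.pytorch.accelerators.cuda.CUDAAccelerator (assumption)."""


def check_keyword_only(accelerator_flag):
    # Suspected current behavior (assumption): only the string keywords pass.
    # An accelerator instance is never equal to "cuda" or "gpu", so it is rejected.
    if accelerator_flag not in ("cuda", "gpu"):
        raise ValueError(
            f"The strategy `fsdp` requires a GPU accelerator, but got: {accelerator_flag}"
        )


def check_with_instances(accelerator_flag):
    # Hypothetical alternative: accept the keywords and CUDAAccelerator instances.
    is_keyword = isinstance(accelerator_flag, str) and accelerator_flag in ("cuda", "gpu")
    is_instance = isinstance(accelerator_flag, CUDAAccelerator)
    if not (is_keyword or is_instance):
        raise ValueError(
            f"The strategy `fsdp` requires a GPU accelerator, but got: {accelerator_flag}"
        )


if __name__ == "__main__":
    check_keyword_only("cuda")  # passes, matching the keyword workaround
    try:
        check_keyword_only(CUDAAccelerator())
    except ValueError as err:
        print(f"keyword-only check rejected the instance: {err}")
    check_with_instances(CUDAAccelerator())  # passes under the hypothetical check
```

Until the check accepts instances, passing `accelerator="cuda"` (or `"gpu"`) instead of `CUDAAccelerator()` in the reproduction script should avoid the error, as noted in the description.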

Environment

Current environment
#- PyTorch Lightning Version (e.g., 2.5.0):
#- PyTorch Version (e.g., 2.5):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):

More info

No response

cc @justusschock @lantiga
