Abnormally slow both single-gpu & DDP training, what is the problem here? #20702

@Mollylulu

Bug description

I just adapted my training code to the Lightning framework for convenient DDP model training, but it now runs almost 10 times slower than my previous manual torch DDP training; the speed is shown below. I have no idea what is wrong here. Could anyone help me figure out what may cause this problem and how to fix it?

[screenshot: training speed comparison]

I have set:

pl.Trainer(
    accelerator='gpu',
    devices=num_gpus,              # placeholder for the actual GPU count
    strategy='ddp',
    sync_batchnorm=True,
    deterministic=True,
    gradient_clip_val=clip_value,  # placeholder for the actual clip value
)
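
For what it's worth, two of these flags are known throughput costs: deterministic=True forces deterministic (often slower) CUDA kernels, and sync_batchnorm=True adds an extra all-reduce for every BatchNorm layer on every step. Below is a minimal sketch to test whether they account for the slowdown, assuming a hypothetical `model` (a LightningModule) and `train_loader` in place of the real ones:

import pytorch_lightning as pl

# Short timing run with the potentially expensive options disabled;
# `model` and `train_loader` are placeholders for the actual objects.
trainer = pl.Trainer(
    accelerator='gpu',
    devices=2,               # placeholder GPU count
    strategy='ddp',
    sync_batchnorm=False,    # skip the per-step BatchNorm all-reduce
    deterministic=False,     # allow faster non-deterministic kernels
    max_steps=100,           # a short run is enough to compare speed
    profiler='simple',       # prints a per-hook timing report at the end
)
trainer.fit(model, train_dataloaders=train_loader)

If this short run is still slow, the 'simple' profiler report should show whether the time goes to the dataloader, the optimizer step, or elsewhere.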

What version are you seeing the problem on?

v2.3

How to reproduce the bug

Error messages and logs

# Error messages and logs here please

Environment

#- PyTorch Lightning Version (e.g., 2.5.0):
#- PyTorch Version (e.g., 2.5):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):

More info

No response

cc @justusschock @lantiga
