PyTorchProfiler does not profile GPU

### Bug description

With `PyTorchProfiler`, the TensorBoard view shows no GPU profiling data, even though the logs indicate that the GPU is being used.

![Image](https://github.com/user-attachments/assets/8029e8b9-3689-49b4-9411-9de0fc2d8aca)

### What version are you seeing the problem on?

v2.5

### How to reproduce the bug

```python
# Conda env
#
# name: lightning_tutorials
# channels:
#   - conda-forge
# dependencies:
#   - python=3.12
#   - lightning
#   - torchvision=0.21.0=cuda126_py312_h361dbbe_0
#   - tensorboard
#   - torch-tb-profiler

# train.py

import lightning as L
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data as data
import torchvision as tv
from lightning.pytorch.loggers import TensorBoardLogger
from lightning.pytorch.profilers import PyTorchProfiler

# --------------------------------
# Step 1: Define a LightningModule
# --------------------------------
# A LightningModule (nn.Module subclass) defines a full *system*
# (e.g. an LLM, diffusion model, autoencoder, or simple image classifier).


class LitAutoEncoder(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 3)
        )
        self.decoder = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 28 * 28)
        )

    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop. It is independent of forward
        x, _ = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = F.mse_loss(x_hat, x)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer


# -------------------
# Step 2: Define data
# -------------------
dataset = tv.datasets.MNIST(".", download=True, transform=tv.transforms.ToTensor())
train, val = data.random_split(dataset, [55000, 5000])

# -------------------
# Step 3: Train
# -------------------
autoencoder = LitAutoEncoder()
logger = TensorBoardLogger(save_dir="tb_logs")
profiler = PyTorchProfiler()
trainer = L.Trainer(logger=logger, profiler=profiler, max_epochs=2)
trainer.fit(autoencoder, data.DataLoader(train), data.DataLoader(val))
```
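
To tell a recording problem from a display problem, one option is to open the trace file the profiler writes and count its events by category. The sketch below is an illustration, not part of the original reproduction. It assumes the trace is a Chrome-trace style JSON object with a top-level `traceEvents` list, and that GPU work would show up under a category such as `kernel`. The category names and the sample events are assumptions made up for the example, and `load_trace` and `count_event_categories` are hypothetical helper names.

```python
import json
from collections import Counter
from pathlib import Path


def load_trace(path: Path) -> dict:
    """Load a Chrome-trace style JSON file into a dict."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def count_event_categories(trace: dict) -> Counter:
    """Count events per "cat" value; events without one are grouped as "<none>"."""
    events = trace.get("traceEvents", [])
    return Counter(ev.get("cat", "<none>") for ev in events if isinstance(ev, dict))


# Hand-written sample (assumed shape, hypothetical event names), not real profiler output.
sample = {
    "traceEvents": [
        {"name": "aten::linear", "cat": "cpu_op", "ph": "X"},
        {"name": "aten::addmm", "cat": "cpu_op", "ph": "X"},
        {"name": "hypothetical_gemm_kernel", "cat": "kernel", "ph": "X"},
    ]
}
counts = count_event_categories(sample)
print(counts)
```

If a real trace from the run has no GPU-side categories at all, the problem is probably in recording; if it has them, the TensorBoard plugin is the more likely suspect.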

### Error messages and logs

```
(lightning_tutorials) PS C:\Users\anguzo\Projects\work\Machine-Learning-Collection> & C:/Users/anguzo/.local/share/mamba/envs/lightning_tutorials/python.exe "c:/Users/anguzo/Projects/work/Machine-Learning-Collection/ML/Pytorch/pytorch_lightning/9.1 Prof/train.py"
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
C:\Users\anguzo\.local\share\mamba\envs\lightning_tutorials\Lib\site-packages\lightning\pytorch\trainer\configuration_validator.py:68: You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name    | Type       | Params | Mode
-----------------------------------------------
0 | encoder | Sequential | 100 K  | train
1 | decoder | Sequential | 101 K  | train
-----------------------------------------------
202 K     Trainable params
0         Non-trainable params
202 K     Total params
0.810     Total estimated model params size (MB)
8         Modules in train mode
0         Modules in eval mode
C:\Users\anguzo\.local\share\mamba\envs\lightning_tutorials\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.
Epoch 0:   0%|                                                                                                                                                 | 4/55000 [00:00<39:03, 23.47it/s, v_num=0][W226 15:37:54.000000000 collection.cpp:647] Warning: Optimizer.step#Adam.step (function operator ())
Epoch 1: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 55000/55000 [03:34<00:00, 256.89it/s, v_num=0]`Trainer.fit` stopped: `max_epochs=2` reached.
Epoch 1: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 55000/55000 [03:34<00:00, 256.88it/s, v_num=0] 
FIT Profiler Report
Profile stats for: records
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------        
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls       
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------        
                                          ProfilerStep*        18.41%      10.089ms        60.02%      32.896ms      10.965ms       8.532ms        15.48%      32.903ms      10.968ms             3        
                        [pl][profile]run_training_batch         0.22%     118.400us        27.99%      15.341ms       7.670ms     107.000us         0.19%      15.349ms       7.675ms             2        
[pl][profile][LightningModule]LitAutoEncoder.optimiz...         0.13%      70.600us        27.77%      15.222ms       7.611ms      48.000us         0.09%      15.242ms       7.621ms             2        
                               Optimizer.step#Adam.step        20.39%      11.178ms        27.64%      15.152ms       7.576ms      11.207ms        20.33%      15.194ms       7.597ms             2        
[pl][profile][Strategy]SingleDeviceStrategy.backward...        19.32%      10.590ms        20.25%      11.099ms       3.700ms      10.435ms        18.93%      11.136ms       3.712ms             3        
[pl][profile][Strategy]SingleDeviceStrategy.training...         3.26%       1.788ms        10.56%       5.790ms       1.930ms       1.524ms         2.77%       5.809ms       1.936ms             3        
    autograd::engine::evaluate_function: AddmmBackward0         1.48%     812.100us         5.97%       3.271ms     272.583us     473.000us         0.86%       3.406ms     283.833us            12        
                                         AddmmBackward0         1.47%     806.000us         4.16%       2.280ms     189.967us     578.000us         1.05%       2.577ms     214.750us            12        
                                                aten::t         1.95%       1.066ms         3.62%       1.986ms      34.840us       1.088ms         1.97%       2.560ms      44.912us            57        
[pl][profile][_TrainingEpochLoop].train_dataloader_n...         0.21%     113.100us         3.95%       2.167ms     722.333us      94.000us         0.17%       2.214ms     738.000us             3        
enumerate(DataLoader)#_SingleProcessDataLoaderIter._...         2.32%       1.273ms         3.75%       2.054ms     684.633us     924.000us         1.68%       2.120ms     706.667us             3        
[pl][module]torch.nn.modules.container.Sequential: e...         0.58%     315.500us         3.02%       1.653ms     551.067us     300.000us         0.54%       1.691ms     563.667us             3        
                                        aten::transpose         1.61%     880.300us         1.68%     919.700us      16.135us       1.056ms         1.92%       1.472ms      25.825us            57        
autograd::engine::evaluate_function: torch::autograd...         0.49%     267.800us         2.12%       1.161ms      48.367us     425.000us         0.77%       1.408ms      58.667us            24        
                                           aten::linear         0.42%     232.500us         2.34%       1.285ms     107.042us     233.000us         0.42%       1.398ms     116.500us            12        
[pl][profile][Callback]TQDMProgressBar.on_train_batc...         2.27%       1.246ms         2.34%       1.284ms     427.967us       1.278ms         2.32%       1.333ms     444.333us             3        
                                             aten::item         1.50%     821.700us         1.53%     836.700us      16.406us     818.000us         1.48%       1.254ms      24.588us            51        
[pl][module]torch.nn.modules.container.Sequential: d...         0.48%     265.400us         2.17%       1.191ms     397.033us     201.000us         0.36%       1.201ms     400.333us             3        
                                           aten::detach         1.34%     737.000us         1.46%     802.700us      17.838us     737.000us         1.34%       1.142ms      25.378us            45        
                                      aten::result_type         0.03%      14.100us         0.03%      14.100us       0.117us     997.000us         1.81%     997.000us       8.308us           120        
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------        
Self CPU time total: 54.812ms
Self CUDA time total: 55.115ms
```
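
Note that the text report above includes `Self CUDA` and `CUDA total` columns with non-zero values, which suggests the profiler is recording some CUDA timing and that the gap may be limited to the TensorBoard view. A minimal sketch for pulling the two footer totals out of such a report follows. The footer line format is copied from the log above; the unit table and the helper name `parse_totals` are assumptions for illustration.

```python
import re

# Matches footer lines such as "Self CUDA time total: 55.115ms".
_TOTAL_RE = re.compile(
    r"^Self (CPU|CUDA) time total: ([\d.]+)(us|ms|s)\s*$", re.MULTILINE
)
# Conversion factors to milliseconds (assumed set of units).
_TO_MS = {"us": 1e-3, "ms": 1.0, "s": 1e3}


def parse_totals(report: str) -> dict[str, float]:
    """Return the CPU/CUDA totals found in the report text, in milliseconds."""
    return {
        device: float(value) * _TO_MS[unit]
        for device, value, unit in _TOTAL_RE.findall(report)
    }


# The two footer lines from the report in this issue.
report = """\
Self CPU time total: 54.812ms
Self CUDA time total: 55.115ms
"""
totals = parse_totals(report)
print(totals)
```

A missing or zero `CUDA` entry would point at the profiler itself rather than at the TensorBoard plugin.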


### Environment

<details>
  <summary>Current environment</summary>

* CUDA:
        - GPU:
                - NVIDIA GeForce GTX 1080
        - available:         True
        - version:           12.6
* Lightning:
        - lightning:         2.5.0.post0
        - lightning-utilities: 0.12.0
        - pytorch-lightning: 2.5.0.post0
        - torch:             2.6.0
        - torch-tb-profiler: 0.4.3
        - torchmetrics:      1.6.1
        - torchvision:       0.21.0
* Packages:
        - absl-py:           2.1.0
        - autocommand:       2.2.2
        - backports.tarfile: 1.2.0
        - brotli:            1.1.0
        - certifi:           2025.1.31
        - charset-normalizer: 3.4.1
        - colorama:          0.4.6
        - filelock:          3.17.0
        - fsspec:            2025.2.0
        - grpcio:            1.67.1
        - idna:              3.10
        - importlib-metadata: 8.6.1
        - inflect:           7.3.1
        - jaraco.collections: 5.1.0
        - jaraco.context:    5.3.0
        - jaraco.functools:  4.0.1
        - jaraco.text:       3.12.1
        - jinja2:            3.1.5
        - lightning:         2.5.0.post0
        - lightning-utilities: 0.12.0
        - markdown:          3.6
        - markupsafe:        3.0.2
        - more-itertools:    10.3.0
        - mpmath:            1.3.0
        - networkx:          3.4.2
        - numpy:             2.2.3
        - optree:            0.14.0
        - packaging:         24.2
        - pandas:            2.2.3
        - pillow:            11.1.0
        - pip:               25.0.1
        - platformdirs:      4.2.2
        - protobuf:          5.28.3
        - pybind11:          2.13.6
        - pybind11-global:   2.13.6
        - pysocks:           1.7.1
        - python-dateutil:   2.9.0.post0
        - pytorch-lightning: 2.5.0.post0
        - pytz:              2024.1
        - pyyaml:            6.0.2
        - requests:          2.32.3
        - setuptools:        75.8.0
        - six:               1.17.0
        - sympy:             1.13.3
        - tensorboard:       2.19.0
        - tensorboard-data-server: 0.7.0
        - tomli:             2.0.1
        - torch:             2.6.0
        - torch-tb-profiler: 0.4.3
        - torchmetrics:      1.6.1
        - torchvision:       0.21.0
        - tqdm:              4.67.1
        - typeguard:         4.3.0
        - typing-extensions: 4.12.2
        - tzdata:            2025.1
        - urllib3:           2.2.2
        - werkzeug:          3.1.3
        - wheel:             0.45.1
        - win-inet-pton:     1.1.0
        - zipp:              3.21.0
* System:
        - OS:                Windows
        - architecture:
                - 64bit
                - WindowsPE
        - processor:         AMD64 Family 25 Model 116 Stepping 1, AuthenticAMD
        - python:            3.12.9
        - release:           11
        - version:           10.0.22631

</details>

### More info

_No response_


PyTorchProfiler does not profile GPU #20604
