Bug description
When using PyTorchProfiler, no GPU profiling data appears in the TensorBoard view, even though the logs indicate that the GPU is being used.
What version are you seeing the problem on?
v2.5
How to reproduce the bug
Conda env
name: lightning_tutorials
channels:
  - conda-forge
dependencies:
  - python=3.12
  - lightning
  - torchvision=0.21.0=cuda126_py312_h361dbbe_0
  - tensorboard
  - torch-tb-profiler
train.py
import lightning as L
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data as data
import torchvision as tv
from lightning.pytorch.loggers import TensorBoardLogger
from lightning.pytorch.profilers import PyTorchProfiler
# --------------------------------
# Step 1: Define a LightningModule
# --------------------------------
# A LightningModule (nn.Module subclass) defines a full *system*
# (ie: an LLM, diffusion model, autoencoder, or simple image classifier).
class LitAutoEncoder(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 3)
        )
        self.decoder = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 28 * 28)
        )

    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop. It is independent of forward
        x, _ = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = F.mse_loss(x_hat, x)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer
# -------------------
# Step 2: Define data
# -------------------
dataset = tv.datasets.MNIST(".", download=True, transform=tv.transforms.ToTensor())
train, val = data.random_split(dataset, [55000, 5000])
# -------------------
# Step 3: Train
# -------------------
autoencoder = LitAutoEncoder()
logger = TensorBoardLogger(save_dir="tb_logs")
profiler = PyTorchProfiler()
trainer = L.Trainer(logger=logger, profiler=profiler, max_epochs=2)
trainer.fit(autoencoder, data.DataLoader(train), data.DataLoader(val))
Error messages and logs
(lightning_tutorials) PS C:\Users\anguzo\Projects\work\Machine-Learning-Collection> & C:/Users/anguzo/.local/share/mamba/envs/lightning_tutorials/python.exe "c:/Users/anguzo/Projects/work/Machine-Learning-Collection/ML/Pytorch/pytorch_lightning/9.1 Prof/train.py"
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
C:\Users\anguzo\.local\share\mamba\envs\lightning_tutorials\Lib\site-packages\lightning\pytorch\trainer\configuration_validator.py:68: You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params | Mode
-----------------------------------------------
0 | encoder | Sequential | 100 K | train
1 | decoder | Sequential | 101 K | train
-----------------------------------------------
202 K Trainable params
0 Non-trainable params
202 K Total params
0.810 Total estimated model params size (MB)
8 Modules in train mode
0 Modules in eval mode
C:\Users\anguzo\.local\share\mamba\envs\lightning_tutorials\Lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.
Epoch 0: 0%| | 4/55000 [00:00<39:03, 23.47it/s, v_num=0][W226 15:37:54.000000000 collection.cpp:647] Warning: Optimizer.step#Adam.step (function operator ())
Epoch 1: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 55000/55000 [03:34<00:00, 256.89it/s, v_num=0]`Trainer.fit` stopped: `max_epochs=2` reached.
Epoch 1: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 55000/55000 [03:34<00:00, 256.88it/s, v_num=0]
FIT Profiler Report
Profile stats for: records
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
ProfilerStep* 18.41% 10.089ms 60.02% 32.896ms 10.965ms 8.532ms 15.48% 32.903ms 10.968ms 3
[pl][profile]run_training_batch 0.22% 118.400us 27.99% 15.341ms 7.670ms 107.000us 0.19% 15.349ms 7.675ms 2
[pl][profile][LightningModule]LitAutoEncoder.optimiz... 0.13% 70.600us 27.77% 15.222ms 7.611ms 48.000us 0.09% 15.242ms 7.621ms 2
Optimizer.step#Adam.step 20.39% 11.178ms 27.64% 15.152ms 7.576ms 11.207ms 20.33% 15.194ms 7.597ms 2
[pl][profile][Strategy]SingleDeviceStrategy.backward... 19.32% 10.590ms 20.25% 11.099ms 3.700ms 10.435ms 18.93% 11.136ms 3.712ms 3
[pl][profile][Strategy]SingleDeviceStrategy.training... 3.26% 1.788ms 10.56% 5.790ms 1.930ms 1.524ms 2.77% 5.809ms 1.936ms 3
autograd::engine::evaluate_function: AddmmBackward0 1.48% 812.100us 5.97% 3.271ms 272.583us 473.000us 0.86% 3.406ms 283.833us 12
AddmmBackward0 1.47% 806.000us 4.16% 2.280ms 189.967us 578.000us 1.05% 2.577ms 214.750us 12
aten::t 1.95% 1.066ms 3.62% 1.986ms 34.840us 1.088ms 1.97% 2.560ms 44.912us 57
[pl][profile][_TrainingEpochLoop].train_dataloader_n... 0.21% 113.100us 3.95% 2.167ms 722.333us 94.000us 0.17% 2.214ms 738.000us 3
enumerate(DataLoader)#_SingleProcessDataLoaderIter._... 2.32% 1.273ms 3.75% 2.054ms 684.633us 924.000us 1.68% 2.120ms 706.667us 3
[pl][module]torch.nn.modules.container.Sequential: e... 0.58% 315.500us 3.02% 1.653ms 551.067us 300.000us 0.54% 1.691ms 563.667us 3
aten::transpose 1.61% 880.300us 1.68% 919.700us 16.135us 1.056ms 1.92% 1.472ms 25.825us 57
autograd::engine::evaluate_function: torch::autograd... 0.49% 267.800us 2.12% 1.161ms 48.367us 425.000us 0.77% 1.408ms 58.667us 24
aten::linear 0.42% 232.500us 2.34% 1.285ms 107.042us 233.000us 0.42% 1.398ms 116.500us 12
[pl][profile][Callback]TQDMProgressBar.on_train_batc... 2.27% 1.246ms 2.34% 1.284ms 427.967us 1.278ms 2.32% 1.333ms 444.333us 3
aten::item 1.50% 821.700us 1.53% 836.700us 16.406us 818.000us 1.48% 1.254ms 24.588us 51
[pl][module]torch.nn.modules.container.Sequential: d... 0.48% 265.400us 2.17% 1.191ms 397.033us 201.000us 0.36% 1.201ms 400.333us 3
aten::detach 1.34% 737.000us 1.46% 802.700us 17.838us 737.000us 1.34% 1.142ms 25.378us 45
aten::result_type 0.03% 14.100us 0.03% 14.100us 0.117us 997.000us 1.81% 997.000us 8.308us 120
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 54.812ms
Self CUDA time total: 55.115ms
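One way to check whether the exported trace contains GPU events at all, independent of the TensorBoard plugin, is to count the trace events by category. This is only a minimal sketch. It assumes the profiler wrote an uncompressed Chrome-trace style JSON file with a top-level `traceEvents` list, and that GPU activity is tagged with `cat` values such as `kernel`, `gpu_memcpy`, or `gpu_memset`. The helper names `count_trace_categories` and `has_gpu_events` are hypothetical and are not part of Lightning or PyTorch.

```python
import json
from collections import Counter


def count_trace_categories(trace_path):
    """Count events per `cat` value in a Chrome-trace style JSON file."""
    with open(trace_path, "r", encoding="utf-8") as f:
        trace = json.load(f)
    # Assumption: events live under a top-level "traceEvents" list;
    # a bare list of events is also accepted.
    events = trace.get("traceEvents", []) if isinstance(trace, dict) else trace
    return Counter(
        event.get("cat", "<none>") for event in events if isinstance(event, dict)
    )


def has_gpu_events(counts, gpu_categories=("kernel", "gpu_memcpy", "gpu_memset")):
    """Return True if any assumed GPU-side category has at least one event."""
    return any(counts.get(category, 0) > 0 for category in gpu_categories)
```

If `has_gpu_events(count_trace_categories(path))` returns False for a trace written during the run above, the GPU events were probably never recorded in the file. If it returns True, the problem is more likely on the TensorBoard display side.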
Environment
Current environment
- CUDA:
- GPU:
- NVIDIA GeForce GTX 1080
- available: True
- version: 12.6
- Lightning:
- lightning: 2.5.0.post0
- lightning-utilities: 0.12.0
- pytorch-lightning: 2.5.0.post0
- torch: 2.6.0
- torch-tb-profiler: 0.4.3
- torchmetrics: 1.6.1
- torchvision: 0.21.0
- Packages:
- absl-py: 2.1.0
- autocommand: 2.2.2
- backports.tarfile: 1.2.0
- brotli: 1.1.0
- certifi: 2025.1.31
- charset-normalizer: 3.4.1
- colorama: 0.4.6
- filelock: 3.17.0
- fsspec: 2025.2.0
- grpcio: 1.67.1
- idna: 3.10
- importlib-metadata: 8.6.1
- inflect: 7.3.1
- jaraco.collections: 5.1.0
- jaraco.context: 5.3.0
- jaraco.functools: 4.0.1
- jaraco.text: 3.12.1
- jinja2: 3.1.5
- lightning: 2.5.0.post0
- lightning-utilities: 0.12.0
- markdown: 3.6
- markupsafe: 3.0.2
- more-itertools: 10.3.0
- mpmath: 1.3.0
- networkx: 3.4.2
- numpy: 2.2.3
- optree: 0.14.0
- packaging: 24.2
- pandas: 2.2.3
- pillow: 11.1.0
- pip: 25.0.1
- platformdirs: 4.2.2
- protobuf: 5.28.3
- pybind11: 2.13.6
- pybind11-global: 2.13.6
- pysocks: 1.7.1
- python-dateutil: 2.9.0.post0
- pytorch-lightning: 2.5.0.post0
- pytz: 2024.1
- pyyaml: 6.0.2
- requests: 2.32.3
- setuptools: 75.8.0
- six: 1.17.0
- sympy: 1.13.3
- tensorboard: 2.19.0
- tensorboard-data-server: 0.7.0
- tomli: 2.0.1
- torch: 2.6.0
- torch-tb-profiler: 0.4.3
- torchmetrics: 1.6.1
- torchvision: 0.21.0
- tqdm: 4.67.1
- typeguard: 4.3.0
- typing-extensions: 4.12.2
- tzdata: 2025.1
- urllib3: 2.2.2
- werkzeug: 3.1.3
- wheel: 0.45.1
- win-inet-pton: 1.1.0
- zipp: 3.21.0
- System:
- OS: Windows
- architecture:
- 64bit
- WindowsPE
- processor: AMD64 Family 25 Model 116 Stepping 1, AuthenticAMD
- python: 3.12.9
- release: 11
- version: 10.0.22631
More info
No response