PyTorch Profiler Stats Only Showing for "Records" #18400
Unanswered
alexander-zhang asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment
Just bumping this topic.
Hello Lightning,
I am trying to use the PyTorch Lightning Profiler (Lightning 1.9.0) to find bottlenecks in my script, and I am getting output I do not understand.
Here is an example Profiler output file (3 more tables not shown):
I notice that the tables are titled “FIT Profiler Report”, and that the stats are for “records rank: 0”.
I expected output similar to the Lightning docs, where we’d see “Profile stats for: training_step”.
https://lightning.ai/docs/pytorch/LTS/tuning/profiler_intermediate.html?highlight=profiler
I am also getting some negative time measurements. Another forum thread reports a similar issue, but it has not been resolved:
https://lightning.ai/forums/t/pytorch-profiler-only-reports-stats-for-records/1104
Context: We are experimenting with 3D parallelism in NeMo Megatron using the PyTorch Lightning Profiler, and we want to see CPU utilization.
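The script itself is not shown in this post. As a rough sketch only, this is how the PyTorch profiler is typically attached to a Trainer in Lightning 1.9. The keyword values, the `profiler_logs` directory, and the helper name `build_trainer` are illustrative assumptions, not the actual NeMo Megatron configuration.

```python
# Illustrative sketch only: the values below are assumptions, not the
# settings used in the original (unshown) NeMo Megatron script.
PROFILER_KWARGS = {
    "dirpath": "profiler_logs",      # hypothetical output directory
    "filename": "profile",           # report file name prefix
    "row_limit": 20,                 # rows printed per table
    "sort_by_key": "cpu_time_total", # sort rows by total CPU time
}


def build_trainer(max_steps=20):
    """Return a Trainer with the PyTorch profiler attached (hypothetical helper)."""
    # Imported lazily so this module can be loaded without Lightning installed.
    from pytorch_lightning import Trainer
    from pytorch_lightning.profilers import PyTorchProfiler

    profiler = PyTorchProfiler(**PROFILER_KWARGS)
    # A short run is usually enough to collect a representative report.
    return Trainer(profiler=profiler, max_steps=max_steps)
```

Calling `build_trainer().fit(model)` with a LightningModule would then write the profiler report under the assumed `profiler_logs` directory.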
Expected Behavior:
“Profile stats for: training_step”
A single table for each rank showing metrics for the training_step.
Actual Behavior:
“Profile stats for: records”
5 tables for each rank, with nothing in the titles to tell them apart. (We are using 5-node pipeline parallelism.)
Could someone explain the output and why it looks so different from what the Lightning docs show? https://lightning.ai/docs/pytorch/LTS/tuning/profiler_intermediate.html?highlight=profiler
Just as a reminder, I am using Lightning 1.9.0
Thank you,
Alexander