Closed
Labels: bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), ver: 2.5.x
Description
Bug description
I am using the wandb logger on a single TPU core. If I invoke self.log in training_step or validation_step, an XLA graph recompile is triggered. I observed this by setting PT_XLA_DEBUG=1 and noticing many "Compilation Cause: most likely user code trying to access tensor value before mark_step" messages in the log.
I have attached sample code for those two functions; without any self.log calls, the code runs fine.
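For context, XLA tensors are lazy: reading a tensor's value on the host (e.g. via .item(), which is what Lightning's logger connector ultimately does through convert_tensors_to_scalars, per the frames in the log below) forces the pending graph to compile and execute. A minimal sketch of that host-read boundary, written in plain torch on CPU since the sync behavior only shows up on an XLA device:

```python
import torch

# On an XLA device, `loss` would be a lazy tensor: the multiply below is only
# recorded into a graph, not executed yet.
loss = torch.tensor(2.0) * torch.tensor(0.75)

# .item() copies the value to the host as a Python float. On torch_xla this is
# the call that forces an early compile/execute ("user code trying to access
# tensor value before mark_step"); on CPU it is just a cheap read.
scalar = loss.item()
print(scalar)  # 1.5
```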
What version are you seeing the problem on?
master
How to reproduce the bug
def training_step(self, batch: tuple[torch.Tensor, torch.Tensor]) -> torch.Tensor:
    sample, target = batch
    pred = self(sample)
    loss = self.train_loss(pred, target)
    self.log("Training Loss", loss)
    return loss

def validation_step(self, batch: tuple[torch.Tensor, torch.Tensor]) -> torch.Tensor:
    sample, target = batch
    pred = self(sample)
    loss = self.valid_loss(pred, target)
    self.log("Validation Accuracy Top 1", self.valid_acc_top_1(pred, target))
    self.log("Validation Accuracy Top 5", self.valid_acc_top_5(pred, target))
    self.log("Validation Loss", loss)
    return loss
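As a possible mitigation (my assumption, not a confirmed fix): self.log accepts on_step/on_epoch flags, so logging with on_step=False, on_epoch=True should at least reduce how often a host read can happen. The underlying idea, sketched in plain torch: keep per-step losses as tensors during the epoch and convert to a Python scalar only once at the end, instead of once per step.

```python
import torch

def run_epoch(step_losses: list[torch.Tensor]) -> float:
    """Average per-step losses with a single host read at epoch end.

    Keeping the intermediate values as tensors avoids calling .item() (a
    device-to-host sync point on XLA) once per step.
    """
    epoch_loss = torch.stack(step_losses).mean()
    return epoch_loss.item()  # one sync for the whole epoch

losses = [torch.tensor(0.9), torch.tensor(0.6), torch.tensor(0.3)]
print(run_epoch(losses))  # approximately 0.6 (float32 rounding)
```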
Error messages and logs
INFO: GPU available: False, used: False
INFO:lightning.pytorch.utilities.rank_zero:GPU available: False, used: False
INFO: TPU available: True, using: 1 TPU cores
INFO:lightning.pytorch.utilities.rank_zero:TPU available: True, using: 1 TPU cores
INFO: HPU available: False, using: 0 HPUs
INFO:lightning.pytorch.utilities.rank_zero:HPU available: False, using: 0 HPUs
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 2
wandb: You chose 'Use an existing W&B account'
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: catalpa to https://api.wandb.ai/. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.8
wandb: Run data is saved locally in ./wandb/run-20250322_222633-ttri785i
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run morning-feather-1
wandb: ⭐️ View project at https://wandb.ai/catalpa/Theia
wandb: 🚀 View run at https://wandb.ai/catalpa/Theia/runs/ttri785i
| Name | Type | Params | Mode
-------------------------------------------------------------------
0 | stem | Sequential | 38.7 K | train
1 | blocks | ModuleList | 12.1 M | train
2 | classifier | Sequential | 775 K | train
3 | train_loss | SoftTargetCrossEntropy | 0 | train
4 | valid_loss | CrossEntropyLoss | 0 | train
5 | valid_acc_top_1 | MulticlassAccuracy | 0 | train
6 | valid_acc_top_5 | MulticlassAccuracy | 0 | train
-------------------------------------------------------------------
12.9 M Trainable params
0 Non-trainable params
12.9 M Total params
51.709 Total estimated model params size (MB)
339 Modules in train mode
0 Modules in eval mode
Compilation Analysis: ================================================================================
Compilation Analysis: Compilation Cause
Compilation Analysis: most likely user code trying to access tensor value before mark_step
Compilation Analysis: Graph Info:
Compilation Analysis: Graph Hash: b9a9a96de5d2e35627c50c1849afaa51
Compilation Analysis: Number of Graph Inputs: 1
Compilation Analysis: Number of Graph Outputs: 1
Compilation Analysis: Python Frame Triggered Execution:
Compilation Analysis: has_len_all_ranks (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/utilities/data.py:105)
Compilation Analysis: setup_data (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/fit_loop.py:265)
Compilation Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/fit_loop.py:208)
Compilation Analysis: _run_stage (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1056)
Compilation Analysis: _run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1012)
Compilation Analysis: _fit_impl (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:599)
Compilation Analysis: _call_and_handle_interrupt (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/call.py:48)
Compilation Analysis: fit (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:561)
Compilation Analysis: ..........
Compilation Analysis: --------------------------------------------------------------------------------
Compilation Analysis: ================================================================================
Post Compilation Analysis: ================================================================================
Post Compilation Analysis: Graph input size: 0.000002 GB
Post Compilation Analysis: Graph output size: 0.000002 GB
Post Compilation Analysis: Aliased Input size: 0.000000 GB
Post Compilation Analysis: Intermediate tensor size: 0.000000 GB
Post Compilation Analysis: Compiled program size: 0.000025 GB
Post Compilation Analysis: --------------------------------------------------------------------------------
Post Compilation Analysis: ================================================================================
Execution Analysis: ================================================================================
Execution Analysis: Execution Cause
Execution Analysis: most likely user code trying to access tensor value before mark_step
Execution Analysis: Graph Info:
Execution Analysis: Graph Hash: b9a9a96de5d2e35627c50c1849afaa51
Execution Analysis: Number of Graph Inputs: 1
Execution Analysis: Number of Graph Outputs: 1
Execution Analysis: Python Frame Triggered Execution:
Execution Analysis: has_len_all_ranks (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/utilities/data.py:105)
Execution Analysis: setup_data (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/fit_loop.py:265)
Execution Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/fit_loop.py:208)
Execution Analysis: _run_stage (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1056)
Execution Analysis: _run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1012)
Execution Analysis: _fit_impl (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:599)
Execution Analysis: _call_and_handle_interrupt (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/call.py:48)
Execution Analysis: fit (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:561)
Execution Analysis: ..........
Execution Analysis: --------------------------------------------------------------------------------
Execution Analysis: ================================================================================
Compilation Analysis: ================================================================================
Compilation Analysis: Compilation Cause
Compilation Analysis: most likely user code trying to access tensor value before mark_step
Compilation Analysis: Graph Info:
Compilation Analysis: Graph Hash: adfea99482db8ca265e5f21e69b2412c
Compilation Analysis: Number of Graph Inputs: 1
Compilation Analysis: Number of Graph Outputs: 1
Compilation Analysis: Python Frame Triggered Execution:
Compilation Analysis: has_len_all_ranks (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/utilities/data.py:110)
Compilation Analysis: setup_data (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/fit_loop.py:265)
Compilation Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/fit_loop.py:208)
Compilation Analysis: _run_stage (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1056)
Compilation Analysis: _run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1012)
Compilation Analysis: _fit_impl (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:599)
Compilation Analysis: _call_and_handle_interrupt (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/call.py:48)
Compilation Analysis: fit (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:561)
Compilation Analysis: ..........
Compilation Analysis: --------------------------------------------------------------------------------
Compilation Analysis: ================================================================================
Post Compilation Analysis: ================================================================================
Post Compilation Analysis: Graph input size: 0.000002 GB
Post Compilation Analysis: Graph output size: 0.000002 GB
Post Compilation Analysis: Aliased Input size: 0.000000 GB
Post Compilation Analysis: Intermediate tensor size: 0.000000 GB
Post Compilation Analysis: Compiled program size: 0.000025 GB
Post Compilation Analysis: --------------------------------------------------------------------------------
Post Compilation Analysis: ================================================================================
Execution Analysis: ================================================================================
Execution Analysis: Execution Cause
Execution Analysis: most likely user code trying to access tensor value before mark_step
Execution Analysis: Graph Info:
Execution Analysis: Graph Hash: adfea99482db8ca265e5f21e69b2412c
Execution Analysis: Number of Graph Inputs: 1
Execution Analysis: Number of Graph Outputs: 1
Execution Analysis: Python Frame Triggered Execution:
Execution Analysis: has_len_all_ranks (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/utilities/data.py:110)
Execution Analysis: setup_data (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/fit_loop.py:265)
Execution Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/fit_loop.py:208)
Execution Analysis: _run_stage (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1056)
Execution Analysis: _run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1012)
Execution Analysis: _fit_impl (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:599)
Execution Analysis: _call_and_handle_interrupt (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/call.py:48)
Execution Analysis: fit (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:561)
Execution Analysis: ..........
Execution Analysis: --------------------------------------------------------------------------------
Execution Analysis: ================================================================================
Execution Analysis: ================================================================================
Execution Analysis: Execution Cause
Execution Analysis: most likely user code trying to access tensor value before mark_step
Execution Analysis: Graph Info:
Execution Analysis: Graph Hash: b9a9a96de5d2e35627c50c1849afaa51
Execution Analysis: Number of Graph Inputs: 1
Execution Analysis: Number of Graph Outputs: 1
Execution Analysis: Python Frame Triggered Execution:
Execution Analysis: has_len_all_ranks (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/utilities/data.py:105)
Execution Analysis: setup_data (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/fit_loop.py:278)
Execution Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/fit_loop.py:208)
Execution Analysis: _run_stage (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1056)
Execution Analysis: _run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1012)
Execution Analysis: _fit_impl (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:599)
Execution Analysis: _call_and_handle_interrupt (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/call.py:48)
Execution Analysis: fit (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:561)
Execution Analysis: ..........
Execution Analysis: --------------------------------------------------------------------------------
Execution Analysis: ================================================================================
Execution Analysis: ================================================================================
Execution Analysis: Execution Cause
Execution Analysis: most likely user code trying to access tensor value before mark_step
Execution Analysis: Graph Info:
Execution Analysis: Graph Hash: adfea99482db8ca265e5f21e69b2412c
Execution Analysis: Number of Graph Inputs: 1
Execution Analysis: Number of Graph Outputs: 1
Execution Analysis: Python Frame Triggered Execution:
Execution Analysis: has_len_all_ranks (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/utilities/data.py:110)
Execution Analysis: setup_data (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/fit_loop.py:278)
Execution Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/fit_loop.py:208)
Execution Analysis: _run_stage (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1056)
Execution Analysis: _run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1012)
Execution Analysis: _fit_impl (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:599)
Execution Analysis: _call_and_handle_interrupt (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/call.py:48)
Execution Analysis: fit (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:561)
Execution Analysis: ..........
Execution Analysis: --------------------------------------------------------------------------------
Execution Analysis: ================================================================================
Execution Analysis: ================================================================================
Execution Analysis: Execution Cause
Execution Analysis: most likely user code trying to access tensor value before mark_step
Execution Analysis: Graph Info:
Execution Analysis: Graph Hash: b9a9a96de5d2e35627c50c1849afaa51
Execution Analysis: Number of Graph Inputs: 1
Execution Analysis: Number of Graph Outputs: 1
Execution Analysis: Python Frame Triggered Execution:
Execution Analysis: has_len_all_ranks (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/utilities/data.py:105)
Execution Analysis: setup_data (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/evaluation_loop.py:202)
Execution Analysis: on_run_start (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/fit_loop.py:414)
Execution Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/fit_loop.py:212)
Execution Analysis: _run_stage (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1056)
Execution Analysis: _run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1012)
Execution Analysis: _fit_impl (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:599)
Execution Analysis: _call_and_handle_interrupt (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/call.py:48)
Execution Analysis: ..........
Execution Analysis: --------------------------------------------------------------------------------
Execution Analysis: ================================================================================
Execution Analysis: ================================================================================
Execution Analysis: Execution Cause
Execution Analysis: most likely user code trying to access tensor value before mark_step
Execution Analysis: Graph Info:
Execution Analysis: Graph Hash: adfea99482db8ca265e5f21e69b2412c
Execution Analysis: Number of Graph Inputs: 1
Execution Analysis: Number of Graph Outputs: 1
Execution Analysis: Python Frame Triggered Execution:
Execution Analysis: has_len_all_ranks (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/utilities/data.py:110)
Execution Analysis: setup_data (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/evaluation_loop.py:202)
Execution Analysis: on_run_start (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/fit_loop.py:414)
Execution Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/fit_loop.py:212)
Execution Analysis: _run_stage (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1056)
Execution Analysis: _run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1012)
Execution Analysis: _fit_impl (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:599)
Execution Analysis: _call_and_handle_interrupt (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/call.py:48)
Execution Analysis: ..........
Execution Analysis: --------------------------------------------------------------------------------
Execution Analysis: ================================================================================
Epoch 0: 0% 0/73 [00:00<?, ?it/s]
Compilation Analysis: ================================================================================
Compilation Analysis: Compilation Cause
Compilation Analysis: user mark_step
Compilation Analysis: Graph Info:
Compilation Analysis: Graph Hash: fbba2ba25aff1cda058689b761890f05
Compilation Analysis: Number of Graph Inputs: 323
Compilation Analysis: Number of Graph Outputs: 1138
Compilation Analysis: Python Frame Triggered Execution:
Compilation Analysis: mark_step (/usr/local/lib/python3.11/dist-packages/torch_xla/core/xla_model.py:1061)
Compilation Analysis: optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/plugins/precision/xla.py:75)
Compilation Analysis: optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/strategy.py:239)
Compilation Analysis: step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/core/optimizer.py:154)
Compilation Analysis: optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/core/module.py:1302)
Compilation Analysis: _call_lightning_module_hook (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/call.py:176)
Compilation Analysis: _optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/optimization/automatic.py:270)
Compilation Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/optimization/automatic.py:192)
Compilation Analysis: ..........
Compilation Analysis: --------------------------------------------------------------------------------
Compilation Analysis: ================================================================================
Post Compilation Analysis: ================================================================================
Post Compilation Analysis: Graph input size: 0.072663 GB
Post Compilation Analysis: Graph output size: 0.222699 GB
Post Compilation Analysis: Aliased Input size: 0.048717 GB
Post Compilation Analysis: Intermediate tensor size: 4.298608 GB
Post Compilation Analysis: Compiled program size: 0.093338 GB
Post Compilation Analysis: --------------------------------------------------------------------------------
Post Compilation Analysis: ================================================================================
Execution Analysis: ================================================================================
Execution Analysis: Execution Cause
Execution Analysis: user mark_step
Execution Analysis: Graph Info:
Execution Analysis: Graph Hash: fbba2ba25aff1cda058689b761890f05
Execution Analysis: Number of Graph Inputs: 323
Execution Analysis: Number of Graph Outputs: 1138
Execution Analysis: Python Frame Triggered Execution:
Execution Analysis: mark_step (/usr/local/lib/python3.11/dist-packages/torch_xla/core/xla_model.py:1061)
Execution Analysis: optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/plugins/precision/xla.py:75)
Execution Analysis: optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/strategy.py:239)
Execution Analysis: step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/core/optimizer.py:154)
Execution Analysis: optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/core/module.py:1302)
Execution Analysis: _call_lightning_module_hook (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/call.py:176)
Execution Analysis: _optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/optimization/automatic.py:270)
Execution Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/optimization/automatic.py:192)
Execution Analysis: ..........
Execution Analysis: --------------------------------------------------------------------------------
Execution Analysis: ================================================================================
pt-xla-profiler: TransferFromDeviceTime too frequent: 6 counts during 1 steps
Epoch 0: 1% 1/73 [01:37<1:56:46, 97.31s/it, v_num=785i]
Compilation Analysis: ================================================================================
Compilation Analysis: Compilation Cause
Compilation Analysis: user mark_step
Compilation Analysis: Graph Info:
Compilation Analysis: Graph Hash: e61678c26ffda4f9f726a4af30dae523
Compilation Analysis: Number of Graph Inputs: 830
Compilation Analysis: Number of Graph Outputs: 1135
Compilation Analysis: Python Frame Triggered Execution:
Compilation Analysis: mark_step (/usr/local/lib/python3.11/dist-packages/torch_xla/core/xla_model.py:1061)
Compilation Analysis: optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/plugins/precision/xla.py:75)
Compilation Analysis: optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/strategy.py:239)
Compilation Analysis: step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/core/optimizer.py:154)
Compilation Analysis: optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/core/module.py:1302)
Compilation Analysis: _call_lightning_module_hook (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/call.py:176)
Compilation Analysis: _optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/optimization/automatic.py:270)
Compilation Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/optimization/automatic.py:192)
Compilation Analysis: ..........
Compilation Analysis: --------------------------------------------------------------------------------
Compilation Analysis: ================================================================================
Post Compilation Analysis: ================================================================================
Post Compilation Analysis: Graph input size: 0.169890 GB
Post Compilation Analysis: Graph output size: 0.222694 GB
Post Compilation Analysis: Aliased Input size: 0.145945 GB
Post Compilation Analysis: Intermediate tensor size: 4.341870 GB
Post Compilation Analysis: Compiled program size: 0.095390 GB
Post Compilation Analysis: --------------------------------------------------------------------------------
Post Compilation Analysis: ================================================================================
Execution Analysis: ================================================================================
Execution Analysis: Execution Cause
Execution Analysis: user mark_step
Execution Analysis: Graph Info:
Execution Analysis: Graph Hash: e61678c26ffda4f9f726a4af30dae523
Execution Analysis: Number of Graph Inputs: 830
Execution Analysis: Number of Graph Outputs: 1135
Execution Analysis: Python Frame Triggered Execution:
Execution Analysis: mark_step (/usr/local/lib/python3.11/dist-packages/torch_xla/core/xla_model.py:1061)
Execution Analysis: optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/plugins/precision/xla.py:75)
Execution Analysis: optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/strategy.py:239)
Execution Analysis: step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/core/optimizer.py:154)
Execution Analysis: optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/core/module.py:1302)
Execution Analysis: _call_lightning_module_hook (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/call.py:176)
Execution Analysis: _optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/optimization/automatic.py:270)
Execution Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/optimization/automatic.py:192)
Execution Analysis: ..........
Execution Analysis: --------------------------------------------------------------------------------
Execution Analysis: ================================================================================
pt-xla-profiler: TransferFromDeviceTime too frequent: 6 counts during 2 steps
Epoch 0: 3% 2/73 [03:17<1:56:50, 98.75s/it, v_num=785i]
[... skipping to the end of the epoch ...]
Epoch 0: 99% 72/73 [04:04<00:03, 3.40s/it, v_num=785i]
Execution Analysis: ================================================================================
Execution Analysis: Execution Cause
Execution Analysis: user mark_step
Execution Analysis: Graph Info:
Execution Analysis: Graph Hash: e61678c26ffda4f9f726a4af30dae523
Execution Analysis: Number of Graph Inputs: 830
Execution Analysis: Number of Graph Outputs: 1135
Execution Analysis: Python Frame Triggered Execution:
Execution Analysis: mark_step (/usr/local/lib/python3.11/dist-packages/torch_xla/core/xla_model.py:1061)
Execution Analysis: optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/plugins/precision/xla.py:75)
Execution Analysis: optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/strategy.py:239)
Execution Analysis: step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/core/optimizer.py:154)
Execution Analysis: optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/core/module.py:1302)
Execution Analysis: _call_lightning_module_hook (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/call.py:176)
Execution Analysis: _optimizer_step (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/optimization/automatic.py:270)
Execution Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/optimization/automatic.py:192)
Execution Analysis: ..........
Execution Analysis: --------------------------------------------------------------------------------
Execution Analysis: ================================================================================
Epoch 0: 100% 73/73 [04:05<00:00, 3.36s/it, v_num=785i]
Validation: | | 0/? [00:00<?, ?it/s]
Validation: 0% 0/31 [00:00<?, ?it/s]
Validation DataLoader 0: 0% 0/31 [00:00<?, ?it/s]
Validation DataLoader 0: 3% 1/31 [00:01<00:39, 1.33s/it]
Validation DataLoader 0: 6% 2/31 [00:01<00:27, 1.05it/s]
Validation DataLoader 0: 10% 3/31 [00:02<00:21, 1.28it/s]
Validation DataLoader 0: 13% 4/31 [00:02<00:19, 1.41it/s]
Validation DataLoader 0: 16% 5/31 [00:03<00:16, 1.55it/s]
Validation DataLoader 0: 19% 6/31 [00:03<00:15, 1.66it/s]
Validation DataLoader 0: 23% 7/31 [00:04<00:13, 1.75it/s]
Validation DataLoader 0: 26% 8/31 [00:04<00:12, 1.84it/s]
Validation DataLoader 0: 29% 9/31 [00:04<00:11, 1.94it/s]
Validation DataLoader 0: 32% 10/31 [00:05<00:10, 2.00it/s]
Validation DataLoader 0: 35% 11/31 [00:05<00:09, 2.06it/s]
Validation DataLoader 0: 39% 12/31 [00:05<00:08, 2.13it/s]
Validation DataLoader 0: 42% 13/31 [00:05<00:08, 2.19it/s]
Validation DataLoader 0: 45% 14/31 [00:06<00:07, 2.22it/s]
Validation DataLoader 0: 48% 15/31 [00:06<00:07, 2.26it/s]
Validation DataLoader 0: 52% 16/31 [00:06<00:06, 2.31it/s]
Validation DataLoader 0: 55% 17/31 [00:07<00:05, 2.35it/s]
Validation DataLoader 0: 58% 18/31 [00:07<00:05, 2.37it/s]
Validation DataLoader 0: 61% 19/31 [00:07<00:04, 2.40it/s]
Validation DataLoader 0: 65% 20/31 [00:08<00:04, 2.44it/s]
Validation DataLoader 0: 68% 21/31 [00:08<00:04, 2.47it/s]
Validation DataLoader 0: 71% 22/31 [00:08<00:03, 2.50it/s]
Validation DataLoader 0: 74% 23/31 [00:09<00:03, 2.52it/s]
Validation DataLoader 0: 77% 24/31 [00:09<00:02, 2.54it/s]
Validation DataLoader 0: 81% 25/31 [00:09<00:02, 2.57it/s]
Validation DataLoader 0: 84% 26/31 [00:10<00:01, 2.59it/s]
Validation DataLoader 0: 87% 27/31 [00:10<00:01, 2.61it/s]
Validation DataLoader 0: 90% 28/31 [00:10<00:01, 2.64it/s]
Validation DataLoader 0: 94% 29/31 [00:10<00:00, 2.65it/s]
Validation DataLoader 0: 97% 30/31 [00:11<00:00, 2.65it/s]
Validation DataLoader 0: 100% 31/31 [00:11<00:00, 2.64it/s]
Compilation Analysis: ================================================================================
Compilation Analysis: Compilation Cause
Compilation Analysis: most likely user code trying to access tensor value before mark_step
Compilation Analysis: Graph Info:
Compilation Analysis: Graph Hash: 58a97574047f2c30444efbb2217f260d
Compilation Analysis: Number of Graph Inputs: 360
Compilation Analysis: Number of Graph Outputs: 1
Compilation Analysis: Python Frame Triggered Execution:
Compilation Analysis: to_item (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:134)
Compilation Analysis: <dictcomp> (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:72)
Compilation Analysis: apply_to_collection (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:72)
Compilation Analysis: convert_tensors_to_scalars (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:136)
Compilation Analysis: log_metrics (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:106)
Compilation Analysis: log_eval_end_metrics (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:151)
Compilation Analysis: on_run_end (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/evaluation_loop.py:306)
Compilation Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/evaluation_loop.py:152)
Compilation Analysis: ..........
Compilation Analysis: --------------------------------------------------------------------------------
Compilation Analysis: ================================================================================
Post Compilation Analysis: ================================================================================
Post Compilation Analysis: Graph input size: 2.256581 GB
Post Compilation Analysis: Graph output size: 0.000002 GB
Post Compilation Analysis: Aliased Input size: 0.000000 GB
Post Compilation Analysis: Intermediate tensor size: 0.905377 GB
Post Compilation Analysis: Compiled program size: 0.043901 GB
Post Compilation Analysis: --------------------------------------------------------------------------------
Post Compilation Analysis: ================================================================================
Execution Analysis: ================================================================================
Execution Analysis: Execution Cause
Execution Analysis: most likely user code trying to access tensor value before mark_step
Execution Analysis: Graph Info:
Execution Analysis: Graph Hash: 58a97574047f2c30444efbb2217f260d
Execution Analysis: Number of Graph Inputs: 360
Execution Analysis: Number of Graph Outputs: 1
Execution Analysis: Python Frame Triggered Execution:
Execution Analysis: to_item (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:134)
Execution Analysis: <dictcomp> (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:72)
Execution Analysis: apply_to_collection (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:72)
Execution Analysis: convert_tensors_to_scalars (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:136)
Execution Analysis: log_metrics (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:106)
Execution Analysis: log_eval_end_metrics (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:151)
Execution Analysis: on_run_end (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/evaluation_loop.py:306)
Execution Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/evaluation_loop.py:152)
Execution Analysis: ..........
Execution Analysis: --------------------------------------------------------------------------------
Execution Analysis: ================================================================================
Compilation Analysis: ================================================================================
Compilation Analysis: Compilation Cause
Compilation Analysis: most likely user code trying to access tensor value before mark_step
Compilation Analysis: Graph Info:
Compilation Analysis: Graph Hash: 5c3d7a91c419c821771bf8f85eeb7e0a
Compilation Analysis: Number of Graph Inputs: 360
Compilation Analysis: Number of Graph Outputs: 1
Compilation Analysis: Python Frame Triggered Execution:
Compilation Analysis: to_item (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:134)
Compilation Analysis: <dictcomp> (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:72)
Compilation Analysis: apply_to_collection (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:72)
Compilation Analysis: convert_tensors_to_scalars (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:136)
Compilation Analysis: log_metrics (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:106)
Compilation Analysis: log_eval_end_metrics (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:151)
Compilation Analysis: on_run_end (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/evaluation_loop.py:306)
Compilation Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/evaluation_loop.py:152)
Compilation Analysis: ..........
Compilation Analysis: --------------------------------------------------------------------------------
Compilation Analysis: ================================================================================
Post Compilation Analysis: ================================================================================
Post Compilation Analysis: Graph input size: 2.256593 GB
Post Compilation Analysis: Graph output size: 0.000002 GB
Post Compilation Analysis: Aliased Input size: 0.000000 GB
Post Compilation Analysis: Intermediate tensor size: 0.909032 GB
Post Compilation Analysis: Compiled program size: 0.044282 GB
Post Compilation Analysis: --------------------------------------------------------------------------------
Post Compilation Analysis: ================================================================================
Execution Analysis: ================================================================================
Execution Analysis: Execution Cause
Execution Analysis: most likely user code trying to access tensor value before mark_step
Execution Analysis: Graph Info:
Execution Analysis: Graph Hash: 5c3d7a91c419c821771bf8f85eeb7e0a
Execution Analysis: Number of Graph Inputs: 360
Execution Analysis: Number of Graph Outputs: 1
Execution Analysis: Python Frame Triggered Execution:
Execution Analysis: to_item (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:134)
Execution Analysis: <dictcomp> (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:72)
Execution Analysis: apply_to_collection (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:72)
Execution Analysis: convert_tensors_to_scalars (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:136)
Execution Analysis: log_metrics (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:106)
Execution Analysis: log_eval_end_metrics (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:151)
Execution Analysis: on_run_end (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/evaluation_loop.py:306)
Execution Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/evaluation_loop.py:152)
Execution Analysis: ..........
Execution Analysis: --------------------------------------------------------------------------------
Execution Analysis: ================================================================================
Compilation Analysis: ================================================================================
Compilation Analysis: Compilation Cause
Compilation Analysis: most likely user code trying to access tensor value before mark_step
Compilation Analysis: Graph Info:
Compilation Analysis: Graph Hash: 56dedd005204d85a65e1f9272d88cc0b
Compilation Analysis: Number of Graph Inputs: 358
Compilation Analysis: Number of Graph Outputs: 1
Compilation Analysis: Python Frame Triggered Execution:
Compilation Analysis: to_item (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:134)
Compilation Analysis: <dictcomp> (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:72)
Compilation Analysis: apply_to_collection (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:72)
Compilation Analysis: convert_tensors_to_scalars (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:136)
Compilation Analysis: log_metrics (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:106)
Compilation Analysis: log_eval_end_metrics (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:151)
Compilation Analysis: on_run_end (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/evaluation_loop.py:306)
Compilation Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/evaluation_loop.py:152)
Compilation Analysis: ..........
Compilation Analysis: --------------------------------------------------------------------------------
Compilation Analysis: ================================================================================
Post Compilation Analysis: ================================================================================
Post Compilation Analysis: Graph input size: 2.256577 GB
Post Compilation Analysis: Graph output size: 0.000002 GB
Post Compilation Analysis: Aliased Input size: 0.000000 GB
Post Compilation Analysis: Intermediate tensor size: 0.904372 GB
Post Compilation Analysis: Compiled program size: 0.043720 GB
Post Compilation Analysis: --------------------------------------------------------------------------------
Post Compilation Analysis: ================================================================================
Execution Analysis: ================================================================================
Execution Analysis: Execution Cause
Execution Analysis: most likely user code trying to access tensor value before mark_step
Execution Analysis: Graph Info:
Execution Analysis: Graph Hash: 56dedd005204d85a65e1f9272d88cc0b
Execution Analysis: Number of Graph Inputs: 358
Execution Analysis: Number of Graph Outputs: 1
Execution Analysis: Python Frame Triggered Execution:
Execution Analysis: to_item (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:134)
Execution Analysis: <dictcomp> (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:72)
Execution Analysis: apply_to_collection (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:72)
Execution Analysis: convert_tensors_to_scalars (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:136)
Execution Analysis: log_metrics (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:106)
Execution Analysis: log_eval_end_metrics (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:151)
Execution Analysis: on_run_end (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/evaluation_loop.py:306)
Execution Analysis: run (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/loops/evaluation_loop.py:152)
Execution Analysis: ..........
Execution Analysis: --------------------------------------------------------------------------------
Execution Analysis: ================================================================================
Epoch 1: 0% 0/73 [00:00<?, ?it/s, v_num=785i]INFO:
Detected KeyboardInterrupt, attempting graceful shutdown ...
INFO:lightning.pytorch.utilities.rank_zero:
Detected KeyboardInterrupt, attempting graceful shutdown ...
Compilation Analysis: ================================================================================
Compilation Analysis: Compilation Cause
Compilation Analysis: most likely user code trying to access tensor value before mark_step
Compilation Analysis: Graph Info:
Compilation Analysis: Graph Hash: a0fddc367f604e909016927c79f693e5
Compilation Analysis: Number of Graph Inputs: 316
Compilation Analysis: Number of Graph Outputs: 1
Compilation Analysis: Python Frame Triggered Execution:
Compilation Analysis: batch_to (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:104)
Compilation Analysis: apply_to_collection (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:66)
Compilation Analysis: move_data_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:110)
Compilation Analysis: _optimizer_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/optimizer.py:41)
Compilation Analysis: _optimizers_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/optimizer.py:27)
Compilation Analysis: teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/strategy.py:532)
Compilation Analysis: teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/single_xla.py:121)
Compilation Analysis: _teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1035)
Compilation Analysis: ..........
Compilation Analysis: --------------------------------------------------------------------------------
Compilation Analysis: ================================================================================
Post Compilation Analysis: ================================================================================
Post Compilation Analysis: Graph input size: 0.072671 GB
Post Compilation Analysis: Graph output size: 0.000018 GB
Post Compilation Analysis: Aliased Input size: 0.000000 GB
Post Compilation Analysis: Intermediate tensor size: 4.321018 GB
Post Compilation Analysis: Compiled program size: 0.084074 GB
Post Compilation Analysis: --------------------------------------------------------------------------------
Post Compilation Analysis: ================================================================================
Execution Analysis: ================================================================================
Execution Analysis: Execution Cause
Execution Analysis: most likely user code trying to access tensor value before mark_step
Execution Analysis: Graph Info:
Execution Analysis: Graph Hash: a0fddc367f604e909016927c79f693e5
Execution Analysis: Number of Graph Inputs: 316
Execution Analysis: Number of Graph Outputs: 1
Execution Analysis: Python Frame Triggered Execution:
Execution Analysis: batch_to (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:104)
Execution Analysis: apply_to_collection (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:66)
Execution Analysis: move_data_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:110)
Execution Analysis: _optimizer_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/optimizer.py:41)
Execution Analysis: _optimizers_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/optimizer.py:27)
Execution Analysis: teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/strategy.py:532)
Execution Analysis: teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/single_xla.py:121)
Execution Analysis: _teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1035)
Execution Analysis: ..........
Execution Analysis: --------------------------------------------------------------------------------
Execution Analysis: ================================================================================
Compilation Analysis: ================================================================================
Compilation Analysis: Compilation Cause
Compilation Analysis: most likely user code trying to access tensor value before mark_step
Compilation Analysis: Graph Info:
Compilation Analysis: Graph Hash: 49ba580755d1d589f2d6554e98e81d23
Compilation Analysis: Number of Graph Inputs: 316
Compilation Analysis: Number of Graph Outputs: 1
Compilation Analysis: Python Frame Triggered Execution:
Compilation Analysis: batch_to (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:104)
Compilation Analysis: apply_to_collection (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:66)
Compilation Analysis: move_data_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:110)
Compilation Analysis: _optimizer_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/optimizer.py:41)
Compilation Analysis: _optimizers_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/optimizer.py:27)
Compilation Analysis: teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/strategy.py:532)
Compilation Analysis: teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/single_xla.py:121)
Compilation Analysis: _teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1035)
Compilation Analysis: ..........
Compilation Analysis: --------------------------------------------------------------------------------
Compilation Analysis: ================================================================================
Post Compilation Analysis: ================================================================================
Post Compilation Analysis: Graph input size: 0.072672 GB
Post Compilation Analysis: Graph output size: 0.000018 GB
Post Compilation Analysis: Aliased Input size: 0.000000 GB
Post Compilation Analysis: Intermediate tensor size: 4.321048 GB
Post Compilation Analysis: Compiled program size: 0.084075 GB
Post Compilation Analysis: --------------------------------------------------------------------------------
Post Compilation Analysis: ================================================================================
Execution Analysis: ================================================================================
Execution Analysis: Execution Cause
Execution Analysis: most likely user code trying to access tensor value before mark_step
Execution Analysis: Graph Info:
Execution Analysis: Graph Hash: 49ba580755d1d589f2d6554e98e81d23
Execution Analysis: Number of Graph Inputs: 316
Execution Analysis: Number of Graph Outputs: 1
Execution Analysis: Python Frame Triggered Execution:
Execution Analysis: batch_to (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:104)
Execution Analysis: apply_to_collection (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:66)
Execution Analysis: move_data_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:110)
Execution Analysis: _optimizer_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/optimizer.py:41)
Execution Analysis: _optimizers_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/optimizer.py:27)
Execution Analysis: teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/strategy.py:532)
Execution Analysis: teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/single_xla.py:121)
Execution Analysis: _teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1035)
Execution Analysis: ..........
Execution Analysis: --------------------------------------------------------------------------------
Execution Analysis: ================================================================================
Compilation Analysis: ================================================================================
Compilation Analysis: Compilation Cause
Compilation Analysis: most likely user code trying to access tensor value before mark_step
Compilation Analysis: Graph Info:
Compilation Analysis: Graph Hash: bd36ce82dec854af372a639cd429280e
Compilation Analysis: Number of Graph Inputs: 316
Compilation Analysis: Number of Graph Outputs: 1
Compilation Analysis: Python Frame Triggered Execution:
Compilation Analysis: batch_to (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:104)
Compilation Analysis: apply_to_collection (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:66)
Compilation Analysis: move_data_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:110)
Compilation Analysis: _optimizer_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/optimizer.py:41)
Compilation Analysis: _optimizers_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/optimizer.py:27)
Compilation Analysis: teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/strategy.py:532)
Compilation Analysis: teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/single_xla.py:121)
Compilation Analysis: _teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1035)
Compilation Analysis: ..........
Compilation Analysis: --------------------------------------------------------------------------------
Compilation Analysis: ================================================================================
Post Compilation Analysis: ================================================================================
Post Compilation Analysis: Graph input size: 0.072928 GB
Post Compilation Analysis: Graph output size: 0.000276 GB
Post Compilation Analysis: Aliased Input size: 0.000000 GB
Post Compilation Analysis: Intermediate tensor size: 4.321064 GB
Post Compilation Analysis: Compiled program size: 0.084111 GB
Post Compilation Analysis: --------------------------------------------------------------------------------
Post Compilation Analysis: ================================================================================
Execution Analysis: ================================================================================
Execution Analysis: Execution Cause
Execution Analysis: most likely user code trying to access tensor value before mark_step
Execution Analysis: Graph Info:
Execution Analysis: Graph Hash: bd36ce82dec854af372a639cd429280e
Execution Analysis: Number of Graph Inputs: 316
Execution Analysis: Number of Graph Outputs: 1
Execution Analysis: Python Frame Triggered Execution:
Execution Analysis: batch_to (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:104)
Execution Analysis: apply_to_collection (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:66)
Execution Analysis: move_data_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:110)
Execution Analysis: _optimizer_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/optimizer.py:41)
Execution Analysis: _optimizers_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/optimizer.py:27)
Execution Analysis: teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/strategy.py:532)
Execution Analysis: teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/single_xla.py:121)
Execution Analysis: _teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1035)
Execution Analysis: ..........
Execution Analysis: --------------------------------------------------------------------------------
Execution Analysis: ================================================================================
Compilation Analysis: ================================================================================
Compilation Analysis: Compilation Cause
Compilation Analysis: most likely user code trying to access tensor value before mark_step
Compilation Analysis: Graph Info:
Compilation Analysis: Graph Hash: 89e1a79ffb6333962ca03038f475745d
Compilation Analysis: Number of Graph Inputs: 316
Compilation Analysis: Number of Graph Outputs: 1
Compilation Analysis: Python Frame Triggered Execution:
Compilation Analysis: batch_to (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:104)
Compilation Analysis: apply_to_collection (/usr/local/lib/python3.11/dist-packages/lightning_utilities/core/apply_func.py:66)
Compilation Analysis: move_data_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/apply_func.py:110)
Compilation Analysis: _optimizer_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/optimizer.py:41)
Compilation Analysis: _optimizers_to_device (/usr/local/lib/python3.11/dist-packages/lightning_fabric/utilities/optimizer.py:27)
Compilation Analysis: teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/strategy.py:532)
Compilation Analysis: teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/strategies/single_xla.py:121)
Compilation Analysis: _teardown (/usr/local/lib/python3.11/dist-packages/pytorch_lightning/trainer/trainer.py:1035)
Compilation Analysis: ..........
Compilation Analysis: --------------------------------------------------------------------------------
Compilation Analysis: ================================================================================
[... many more compilation/execution cycles follow ...]
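Every trace above bottoms out in `to_item` / `convert_tensors_to_scalars`, i.e. Lightning converting each logged metric tensor to a Python scalar. A toy, torch-free sketch of why that is costly on XLA (the `LazyTensor` class below is hypothetical, a model of lazy-tensor semantics only, not the torch_xla API):

```python
# Toy model of XLA lazy tensors: reading a concrete value ("item")
# before an explicit mark_step forces a compile + execute of the
# pending graph. This mirrors what convert_tensors_to_scalars does
# to every metric passed to self.log.

class LazyTensor:
    compile_count = 0  # counts forced compilations, like PT_XLA_DEBUG does

    def __init__(self, value):
        self._value = value
        self._materialized = False

    def item(self):
        # "Compilation Cause: most likely user code trying to access
        #  tensor value before mark_step"
        if not self._materialized:
            LazyTensor.compile_count += 1
            self._materialized = True
        return self._value

metrics = {"loss": LazyTensor(0.5), "acc": LazyTensor(0.9)}
# Analogue of convert_tensors_to_scalars: each .item() forces a compile.
scalars = {k: v.item() for k, v in metrics.items()}
```

Each `.item()` call on a not-yet-materialized tensor corresponds to one "user code trying to access tensor value before mark_step" entry in the debug output above.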
Environment
- PyTorch Lightning Version: 2.5.1
- PyTorch Version: 2.6
- Python version: 3.11
- OS: Linux
- CUDA/cuDNN version: N/A
- GPU models and configuration: Colab TPU v2-8
- How you installed Lightning (`conda`, `pip`, source): pip
More info
No response