Bug description
There seems to be a bottleneck in the training pipeline, so I tried profiling and found that setting `on_step=True` in `self.log` causes the progress bar update to take very long. This makes sense, as logging to the progress bar requires calling `.item()` on a GPU tensor, which triggers `cudaStreamSynchronize`.
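For context, the logging call looks roughly like this. The model and batch handling below are simplified placeholders; only the `self.log(..., on_step=True, prog_bar=True)` line reflects my actual setup:

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl


class SingleLabelClassification(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        # Placeholder batch/model handling for illustration only.
        x, y = batch
        logits = self.model(x)
        loss = F.cross_entropy(logits, y)
        # Logging a step-level metric to the progress bar forces a
        # GPU -> CPU read (.item()), which blocks on cudaStreamSynchronize.
        self.log("train/loss", loss, on_step=True, prog_bar=True)
        return loss
```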
```
FIT Profiler Report
--------------------------------------------------------------------------------------------------------------------
| Action                                         | Mean duration (s) | Num calls | Total time (s) | Percentage %   |
--------------------------------------------------------------------------------------------------------------------
| Total                                          | -                 | 32523     | 164.08         | 100 %          |
--------------------------------------------------------------------------------------------------------------------
| run_training_epoch                             | 8.0065            | 12        | 96.078         | 58.556         |
| [Callback]ExportCallback.on_fit_end            | 62.609            | 1         | 62.609         | 38.158         |
| [Callback]TQDMProgressBar.on_train_batch_end   | 0.24358           | 144       | 35.075         | 21.377         |
| [Callback]ModelCheckpoint{'monitor': 'valid/macro_f1', 'mode': 'max', 'every_n_train_steps': 0, 'every_n_epochs': 1, 'train_time_interval': None}.on_validation_end | 1.3881 | 12 | 16.657 | 10.152 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0040199         | 1056      | 4.245          | 2.5872         |
--------------------------------------------------------------------------------------------------------------------
```
But when I change to `on_step=False`, although `TQDMProgressBar.on_train_batch_end` takes much less time, `transfer_batch_to_device` takes almost 10 times longer. This behavior is completely baffling, and I don't know what to suspect. Is this expected, and what's causing it?
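If it helps with triage, this is a rough sketch of how I plan to time the batch transfer with explicit CUDA synchronization, to see whether the wall-clock cost is really in the copy itself or in earlier GPU work still queued on the stream. `timed_transfer` is a hypothetical helper, and the dict-of-tensors batch layout is an assumption:

```python
import time
import torch


def timed_transfer(batch: dict, device: torch.device):
    # Drain any pending GPU work first, so the measured time reflects
    # only the host-to-device copy and not kernels queued earlier
    # (e.g. by the training step that just finished).
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    moved = {k: v.to(device, non_blocking=True) for k, v in batch.items()}
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return moved, time.perf_counter() - start
```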
```
FIT Profiler Report
-------------------------------------------------------------------------------------------------------------------------------------
| Action                                                               | Mean duration (s) | Num calls | Total time (s) | Percentage % |
-------------------------------------------------------------------------------------------------------------------------------------
| Total                                                                | -                 | 32523     | 163.98         | 100 %        |
-------------------------------------------------------------------------------------------------------------------------------------
| run_training_epoch                                                   | 7.9402            | 12        | 95.282         | 58.104       |
| [Strategy]SingleDeviceStrategy.batch_to_device                       | 0.036941          | 1056      | 39.01          | 23.789       |
| [LightningModule]SingleLabelClassification.transfer_batch_to_device  | 0.036852          | 1056      | 38.916         | 23.732       |
| [Callback]ModelCheckpoint{'monitor': 'valid/macro_f1', 'mode': 'max', 'every_n_train_steps': 0, 'every_n_epochs': 1, 'train_time_interval': None}.on_validation_end | 1.3867 | 12 | 16.64 | 10.147 |
-------------------------------------------------------------------------------------------------------------------------------------
```