
Changing on_step in self.log causes batch_to_device time to change #20995

@davidgill97

Bug description

There seems to be a bottleneck in the training pipeline, so I profiled a run and found that setting on_step=True in self.log makes the progress bar take very long. This makes sense: logging to the progress bar requires calling .item() on a GPU tensor, which triggers a cudaStreamSynchronize.
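For context, here is a minimal sketch of the kind of logging call in question (the module, model attribute, and metric name are placeholders, not the actual code behind the reports below):

```python
import torch
import lightning.pytorch as pl


class LitClassifier(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self.model(x)  # placeholder backbone
        loss = torch.nn.functional.cross_entropy(logits, y)
        # With on_step=True (and prog_bar=True), the value is shown on the
        # progress bar every step; formatting a GPU tensor for display calls
        # .item(), which blocks the host until queued CUDA work has finished.
        self.log("train/loss", loss, on_step=True, prog_bar=True)
        return loss
```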

FIT Profiler Report

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
|---|---|---|---|---|
| Total | - | 32523 | 164.08 | 100 % |
| run_training_epoch | 8.0065 | 12 | 96.078 | 58.556 |
| [Callback]ExportCallback.on_fit_end | 62.609 | 1 | 62.609 | 38.158 |
| [Callback]TQDMProgressBar.on_train_batch_end | 0.24358 | 144 | 35.075 | 21.377 |
| [Callback]ModelCheckpoint{'monitor': 'valid/macro_f1', 'mode': 'max', 'every_n_train_steps': 0, 'every_n_epochs': 1, 'train_time_interval': None}.on_validation_end | 1.3881 | 12 | 16.657 | 10.152 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0040199 | 1056 | 4.245 | 2.5872 |

But when I change to on_step=False, TQDMProgressBar.on_train_batch_end takes much less time, yet transfer_batch_to_device takes almost 10 times longer. This behavior is completely baffling, and I don't know what to suspect.

Is this expected, and what is causing it?

FIT Profiler Report

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
|---|---|---|---|---|
| Total | - | 32523 | 163.98 | 100 % |
| run_training_epoch | 7.9402 | 12 | 95.282 | 58.104 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.036941 | 1056 | 39.01 | 23.789 |
| [LightningModule]SingleLabelClassification.transfer_batch_to_device | 0.036852 | 1056 | 38.916 | 23.732 |
| [Callback]ModelCheckpoint{'monitor': 'valid/macro_f1', 'mode': 'max', 'every_n_train_steps': 0, 'every_n_epochs': 1, 'train_time_interval': None}.on_validation_end | 1.3867 | 12 | 16.64 | 10.147 |
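Since CUDA kernels launch asynchronously and .item() is one of the calls that forces the host to wait, one possible explanation is that removing the per-step sync does not remove the cost but only moves the blocking point to the next operation that has to touch the GPU, so the time gets attributed to batch_to_device instead. A minimal diagnostic sketch to test that (SyncPointCallback is a hypothetical helper, not part of Lightning):

```python
import torch
import lightning.pytorch as pl


class SyncPointCallback(pl.Callback):
    """Force a CUDA sync at the end of every training batch so that pending
    GPU work is charged to the batch itself rather than to the next
    batch_to_device call in the profiler report."""

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        if torch.cuda.is_available():
            torch.cuda.synchronize()
```

If batch_to_device drops back to a few milliseconds with this callback enabled, the extra time was inherited synchronization rather than a real transfer cost.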

Labels: bug, needs triage, ver: 2.5.x