In my training loop I use tqdm to iterate over batches, and I update the tqdm status bar with a summary of metrics (loss, acc, etc.). Looking at a trace of the 3rd epoch, I see a lot of time spent in `MemcpyD2H`. I realize this is just waiting for CUDA streams to complete, but if I disable the status bar update (so tqdm just logs `it/s`) I see a 15% performance improvement (77ms/batch -> 66ms/batch). I only sometimes care about live metrics, so I'd rather not pay this penalty on every step. Is this a common problem? Is there a way to lazily copy results back to the host?
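A minimal sketch of the pattern described in the question, assuming a PyTorch-style loop (the thread doesn't name the framework); `train_step` and the synthetic `loader` are hypothetical stand-ins for the real training code:

```python
import torch
from tqdm import tqdm

def train_step(x):
    # Hypothetical placeholder: any work that leaves the loss on the GPU.
    return (x ** 2).mean()

loader = [torch.randn(1024, device="cuda") for _ in range(100)]

pbar = tqdm(loader)
for x in pbar:
    loss = train_step(x)
    # .item() forces a blocking device-to-host copy (the MemcpyD2H in the
    # trace): the host stalls here until all queued CUDA work has finished.
    pbar.set_postfix(loss=f"{loss.item():.4f}")
```

The copy itself is tiny; the cost is the synchronization it implies, since the host must wait for every queued kernel before it can read the value.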
Replies: 1 comment

-

I'm also curious if there's a clever way to do something here. Usually what I do is just log a summary every N steps, so that its effect on runtime is negligible. For my wandb logging I have a callback which aggregates metrics, but that's probably overkill for this.
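The every-N-steps idea might look like the sketch below, reusing the hypothetical `train_step` and `loader` from the snippet above (`N = 50` is an arbitrary choice). Accumulating the running loss on-device means the blocking device-to-host copy happens once per N steps instead of once per step:

```python
N = 50  # sync with the host only once every N steps

pbar = tqdm(loader)
running = torch.zeros((), device="cuda")  # accumulates on-device, no sync
for step, x in enumerate(pbar):
    loss = train_step(x)
    running += loss.detach()
    if (step + 1) % N == 0:
        # One MemcpyD2H per N steps; the status bar lags by at most N steps.
        pbar.set_postfix(loss=f"{(running / N).item():.4f}")
        running.zero_()
```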