Progress tracking in progress bar #9305

awaelchli · 2021-09-03T10:35:47Z


A missing piece for fault-tolerant training is to properly update the progress bar when we resume the training.

In our `ProgressBarBase` class we track progress independently of the loop. If I recall correctly, there was an idea to replace that with the progress tracking from the loops. Note, however, that if we do that, the progress bar will be locked to a particular loop structure (fit loop -> epoch loop) and the corresponding progress attributes. Is this acceptable? For a new loop structure, potentially a new progress bar callback would be needed. What are your thoughts on this?

Alternatives:

Save the progress bar's state dict with the `on_save_checkpoint` callback hook.

tchaton · 2021-09-03T10:38:30Z

Maintainer

Yes, I think `on_save_checkpoint` would be the cleanest solution.

5 replies

carmocca Sep 8, 2021

(copying my offline answer)
I don't get how the alternative solves the issue.

What if a given loop structure automatically generated as many tqdm bars as it has loops, with each bar relying on the `progress.current` values?
There would also be a mechanism to disable the bars for specific loop children.
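
A rough sketch of what this proposal could look like, using plain text lines instead of real tqdm bars so it runs without dependencies. The `Loop` class, its `current`/`total`/`show_bar` fields, and `render_bars` are all hypothetical names made up for illustration; they are not the Lightning loop or progress API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Loop:
    """Hypothetical loop node; `current`/`total` stand in for loop progress tracking."""
    name: str
    current: int = 0
    total: int = 0
    show_bar: bool = True  # assumed switch to disable the bar for this loop
    children: List["Loop"] = field(default_factory=list)


def render_bars(loop: Loop, depth: int = 0) -> List[str]:
    """Walk the loop tree and emit one bar line per loop that has its bar enabled."""
    lines = []
    if loop.show_bar:
        lines.append(f"{'  ' * depth}{loop.name}: {loop.current}/{loop.total}")
    for child in loop.children:
        lines.extend(render_bars(child, depth + 1))
    return lines


# fit loop -> epoch loop -> batch loop, with the innermost bar disabled
fit = Loop("fit", current=2, total=10, children=[
    Loop("epoch", current=40, total=100, children=[
        Loop("batch", current=1, total=1, show_bar=False),
    ]),
])
bars = render_bars(fit)
print("\n".join(bars))
```

Because the bars read their values straight from the loops, resuming from a checkpoint that restores loop progress would also restore what the bars display, at the cost of tying the bars to the loop tree.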

awaelchli Sep 8, 2021
Author

> I don't get how the alternative solves the issue.

The progress bar maintains its own counter (like now) and is updated solely based on hook calls. To save the state of all this, we implement the `on_save_checkpoint` and `on_load_checkpoint` hooks in the progress bar callback.
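
A minimal sketch of that idea, written as a standalone class so it runs without Lightning installed. `CounterProgressBar` and its counter names are hypothetical, and the hook signatures are simplified assumptions; the real callback hooks receive additional arguments such as the trainer and module.

```python
class CounterProgressBar:
    """Hypothetical stand-in for a progress bar callback; not the Lightning class."""

    def __init__(self):
        # Counters owned by the callback itself, independent of any loop object.
        self.train_batch_idx = 0
        self.epoch_idx = 0

    # The hooks below use simplified, assumed signatures.
    def on_train_batch_end(self):
        self.train_batch_idx += 1

    def on_train_epoch_end(self):
        self.epoch_idx += 1
        self.train_batch_idx = 0

    def on_save_checkpoint(self):
        # Return the callback's own state so it can be stored in the checkpoint.
        return {"train_batch_idx": self.train_batch_idx, "epoch_idx": self.epoch_idx}

    def on_load_checkpoint(self, state):
        # Restore the counters when training resumes.
        self.train_batch_idx = state["train_batch_idx"]
        self.epoch_idx = state["epoch_idx"]


# Simulate one full epoch of 3 batches, then 2 batches of the next epoch.
bar = CounterProgressBar()
for _ in range(3):
    bar.on_train_batch_end()
bar.on_train_epoch_end()
for _ in range(2):
    bar.on_train_batch_end()

saved = bar.on_save_checkpoint()

# A fresh callback picks up where the old one stopped.
resumed = CounterProgressBar()
resumed.on_load_checkpoint(saved)
```

The callback only depends on the hooks being called, so it stays agnostic of the loop structure as long as those hooks exist.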

carmocca Sep 8, 2021

But how does it resolve the following?

> For a new loop structure, potentially a new progress bar callback would be needed

awaelchli Sep 8, 2021
Author

That's what I explained. It would rely on the hooks, not the loop structure, like it does today. The only difference is that the callback would handle saving and loading its own state.

ananthsub Sep 8, 2021

My question is: for a different loop structure, what guarantees are there that the same hooks exist or that they'll be called in the same order? Would these reuse the same callback API we have today? For a custom project, it'd likely be easier to skip writing a callback in favor of adding the logic directly into the loop, right?
