[fix] Correct progress bar initialization in train_instruct_pix2pix.py #9791
Conversation
This isn't correct. It's supposed to resume from global_step.
Hi, thank you for the feedback! I believe there's still an issue: the progress bar is updated within the resume-skipping section of the code (L855). To resolve this, we could either remove the progress bar update in that section or adjust its range accordingly. This issue was already addressed in
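For reference, the problematic pattern looks roughly like this (a minimal sketch, not the verbatim script; names like resume_step and first_epoch follow the script's conventions):

```python
# Sketch of the resume-skip branch described above. The bar is advanced
# while skipping already-trained batches, even though its range was already
# shortened to start at global_step, so the skipped steps are counted twice.
for step, batch in enumerate(train_dataloader):
    if args.resume_from_checkpoint and epoch == first_epoch and step < resume_step:
        if step % args.gradient_accumulation_steps == 0:
            progress_bar.update(1)  # advances the bar during skipping
        continue
    # ... actual training step ...
```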
The other scripts, IIRC, simply jump straight to the resume range position.
It's debated whether stepping the dataloader is still needed at all, since the random states are resumed by accelerate. I believe that before accelerate was as robust and full-featured as it is now, stepping the loop N times was needed to continue training in a reproducible manner.
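For context, the "jump straight to the resume position" approach can be expressed with accelerate's skip_first_batches. A minimal sketch, assuming a recent accelerate release:

```python
from accelerate import Accelerator

def fast_forward(accelerator: Accelerator, dataloader, resume_step: int):
    """Skip already-trained batches without manually stepping the loop.

    `Accelerator.skip_first_batches` is available in recent accelerate
    releases; older scripts had to `continue` through the first N batches
    by hand to reach the same position.
    """
    return accelerator.skip_first_batches(dataloader, resume_step)
```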
Exactly. Using continue in this way is safer, but it can lead to unnecessary data loading. In this code, a shortened range and continue are used together, which results in an incorrect range. For example, if I want to train for 20k steps and resume from an 8k checkpoint, the range is set to range(8k, 20k). However, after all the continue statements, the iteration counter has already advanced to 16k, leaving only 4k steps instead of the expected 12k for training. So I think we should adjust the range.
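To make the arithmetic concrete, here is a toy reproduction of the double counting (standalone numbers for illustration, not the script itself):

```python
# 20k total steps, resuming from an 8k checkpoint.
max_train_steps = 20_000
global_step = 8_000  # restored from the checkpoint

bar_total = max_train_steps - global_step  # range(8k, 20k) -> a 12k-step bar
bar_position = 0

for step in range(max_train_steps):
    if step < global_step:   # resume-skip branch
        bar_position += 1    # bar advanced while merely skipping
        continue
    pass                     # actual training would happen here

print(bar_total - bar_position)  # 4000 -- only 4k of bar left for 12k real steps
```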
Well, no: remove the continue statement, so that this script becomes consistent with the project's other example scripts.
OK, that solves the problem too. I've removed the continue statement and set the progress bar's initial value to global_step instead.
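For illustration, the resulting setup would look roughly like this (a sketch assuming the script's existing args, accelerator, and global_step; the keyword arguments mirror the tqdm pattern used in train_instruct_pix2pix_sdxl.py):

```python
from tqdm.auto import tqdm

# Full-length bar seeded at the resumed step via `initial`; the manual
# progress_bar.update(1) inside the skip branch is removed.
progress_bar = tqdm(
    range(args.max_train_steps),
    initial=global_step,
    desc="Steps",
    disable=not accelerator.is_local_main_process,
)
```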
```python
if step % args.gradient_accumulation_steps == 0:
    progress_bar.update(1)
```
Why are we removing this?
Because we have set initial=global_step in the progress_bar, so it will be consistent with train_instruct_pix2pix_sdxl.py's progress bar setup, as proposed by bghira here. If we don't remove this statement, initial should be set to zero instead.
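Put differently, the bar's final position is initial plus the manual updates, so only one of the two should account for the skipped steps. A sketch of the two self-consistent options (again assuming the script's args and global_step):

```python
from tqdm.auto import tqdm

# Option A (this PR): seed the bar at global_step and do NOT update it
# inside the resume-skip branch.
progress_bar = tqdm(range(args.max_train_steps), initial=global_step)

# Option B: start at zero and let the skip branch advance the bar once per
# (gradient-accumulated) skipped step.
progress_bar = tqdm(range(args.max_train_steps), initial=0)
```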
Can we try to harmonize the changes by following what we do here? I think it's better to have consistency that way.
Yes, we can work towards harmonizing the changes. The main difference here lies in whether or not we skip the loop N times: this file does, while the SDXL file doesn't. However, based on our previous discussion, skipping the loop N times in this implementation is needed to maintain reproducibility in training. So, to ensure consistency, it may actually be the SDXL version that needs to be aligned with this approach. Or, if we believe accelerate is robust enough, we could remove the continue in this file to match the SDXL version. Thoughts?
Thanks for explaining further! Could you please provide a visual example comparing the changes you would like to see, if it's not too much trouble?
Gentle ping @GhostCai
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
This PR ensures the progress bar starts from 0 when resuming from a checkpoint, fixing an issue where it previously started at global_step and caused incorrect progress tracking.