[bugfix] reduce float value error when adding noise #9004

gameofdimension · 2024-07-29T12:55:56Z

What does this PR do?

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

reduce float value error for bfloat16

bghira · 2024-08-07T15:31:19Z

cc @sayakpaul @linoytsaban @yiyixuxu

gameofdimension · 2024-08-08T07:38:30Z

it actually adds no noise at all for step 0/1 when using torch.bfloat16. run the following code to reproduce that.

a demo output

import torch
from diffusers import DDPMScheduler


def demo(step, latents, noise_scheduler):
    """Return max |noisy_latents - latents| after adding noise at a fixed timestep."""
    bsz = latents.shape[0]
    noise = torch.randn_like(latents)
    # Use the same timestep for every sample in the batch.
    timesteps = torch.full(
        (bsz,), step, dtype=torch.long, device=latents.device)

    noisy_latents = noise_scheduler.add_noise(
        latents, noise, timesteps)

    delta = (noisy_latents - latents).abs().max()
    return delta.item()


def main():
    pretrained_model_name_or_path = '/apps/dat/file/llm/model/stable-diffusion-v1-5'
    # pretrained_model_name_or_path = 'runwayml/stable-diffusion-v1-5'
    noise_scheduler = DDPMScheduler.from_pretrained(
        pretrained_model_name_or_path, subfolder="scheduler")
    bsz = 2
    latents = torch.randn(
        (bsz, 4, 64, 64),
        dtype=torch.bfloat16,
        device='cuda')

    # eps of the default dtype (float32 unless changed); a delta below it is effectively zero
    finfo = torch.finfo()
    step = 0
    delta = demo(step, latents, noise_scheduler)
    print(f"delta after add step {step} noise", delta)
    assert delta < finfo.eps

    step = 1
    delta = demo(step, latents, noise_scheduler)
    print(f"delta after add step {step} noise", delta)
    assert delta < finfo.eps

    step = 2
    delta = demo(step, latents, noise_scheduler)
    print(f"delta after add step {step} noise", delta)


if __name__ == '__main__':
    main()
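A possible explanation for the step 0/1 result, sketched without torch or diffusers. It assumes the scheduler casts its alphas_cumprod table to the dtype of the samples and that the checkpoint uses the usual Stable Diffusion "scaled_linear" schedule (beta_start=0.00085, beta_end=0.012, 1000 steps); neither is stated in this thread. The helpers to_bfloat16, scaled_linear_alphas_cumprod and noise_scale are hypothetical names that emulate the rounding with the standard library only.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (via float32, ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 is the top 16 bits of a float32; round using the dropped half.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def scaled_linear_alphas_cumprod(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative product of (1 - beta_t) for a scaled-linear beta schedule (assumed values)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    alphas_cumprod, prod = [], 1.0
    for t in range(num_steps):
        beta = (lo + (hi - lo) * t / (num_steps - 1)) ** 2
        prod *= 1.0 - beta
        alphas_cumprod.append(prod)
    return alphas_cumprod


def noise_scale(alpha_cumprod, cast=float):
    """sqrt(1 - alpha_cumprod), with every intermediate passed through cast."""
    a = cast(alpha_cumprod)
    return cast(math.sqrt(cast(1.0 - a)))


alphas_cumprod = scaled_linear_alphas_cumprod()
for t in range(3):
    full = noise_scale(alphas_cumprod[t])
    bf16 = noise_scale(alphas_cumprod[t], cast=to_bfloat16)
    print(f"t={t}  full precision: {full:.4f}  bfloat16: {bf16:.4f}")
```

Under these assumptions, alphas_cumprod at t=0 and t=1 lies closer to 1.0 than to the next bfloat16 value below it (0.99609375), so it rounds to exactly 1.0 and the noise coefficient sqrt(1 - 1.0) becomes 0. At t=2 the value rounds down instead and the coefficient is nonzero, which would match the reproduction above.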

bghira · 2024-08-08T14:12:19Z

well, step 0 is timestep 0. and there's an off-by-one error throughout Diffusers because of how the scheduler steps in order with the rest of the steps. so you get to the end of the schedule and there's one prediction missing. it has to add an extra zero sigma step to complete it. especially with DDIM.

Timesteps: tensor([0], device='mps:0')
delta after add step 0 noise 0.0
Timesteps: tensor([1000], device='mps:0')
delta after add step 1000 noise 5.6875

is that what you're observing? it's normal not to add noise when the sigma is zero.

bghira · 2024-08-08T14:14:45Z

ah i see. using float32 makes the first value 0.13. but the batch size changes the amount of noise too.

gameofdimension · 2024-08-08T14:24:15Z

> well, step 0 is timestep 0. and there's an off-by-one error throughout Diffusers because of how the scheduler steps in order with the rest of the steps. so you get to the end of the schedule and there's one prediction missing. it has to add an extra zero sigma step to complete it. especially with DDIM.
> Timesteps: tensor([0], device='mps:0')
> delta after add step 0 noise 0.0
> Timesteps: tensor([1000], device='mps:0')
> delta after add step 1000 noise 5.6875
> is that what you're observing? it's normal not to add noise when the sigma is zero.

what about step 1

gameofdimension · 2024-08-08T14:28:09Z

> ah i see. using float32 makes the first value 0.13. but the batch size changes the amount of noise too.

i don't think so. batch size is irrelevant.

bghira · 2024-08-08T14:31:38Z

no, it literally changes the result

gameofdimension · 2024-08-08T14:36:53Z

my conclusion is, for step 0/1 you didn't add any noise when it's torch.bfloat16, regardless of any value of batch size

bghira · 2024-08-08T14:43:00Z

    def add_noise(
        self,
        original_samples: torch.Tensor,
        noise: torch.Tensor,
        timesteps: torch.IntTensor,
    ) -> torch.Tensor:
        if original_samples.dtype == torch.bfloat16:
            original_samples = original_samples.to(dtype=torch.float32)
        if noise.dtype == torch.bfloat16:
            noise = noise.to(original_samples.device, dtype=torch.float32)

update DDPM/DDIMScheduler to have this
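A minimal sketch of how the upcast suggested above could be carried through a whole add_noise computation. This is a standalone function and not the diffusers implementation: add_noise_fp32 is a hypothetical name, the schedule values are assumed Stable Diffusion defaults, and the choice to cast the result back to the input dtype is an assumption as well.

```python
import torch


def add_noise_fp32(alphas_cumprod, original_samples, noise, timesteps):
    """Forward-diffusion mix computed in float32, returned in the input dtype."""
    out_dtype = original_samples.dtype
    # Keep the schedule in float32 so 1 - alpha_cumprod does not collapse to 0.
    alphas = alphas_cumprod.to(device=original_samples.device, dtype=torch.float32)
    samples = original_samples.to(torch.float32)
    noise = noise.to(device=original_samples.device, dtype=torch.float32)

    # Reshape the per-sample scales so they broadcast over the remaining dims.
    shape = (-1,) + (1,) * (samples.ndim - 1)
    sqrt_alpha = alphas[timesteps].sqrt().view(shape)
    sqrt_one_minus_alpha = (1.0 - alphas[timesteps]).sqrt().view(shape)

    noisy = sqrt_alpha * samples + sqrt_one_minus_alpha * noise
    return noisy.to(out_dtype)


# Assumed schedule: "scaled_linear" betas with beta_start=0.00085, beta_end=0.012.
betas = torch.linspace(0.00085**0.5, 0.012**0.5, 1000, dtype=torch.float32) ** 2
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

torch.manual_seed(0)
latents = torch.randn((2, 4, 8, 8), dtype=torch.bfloat16)
noise = torch.randn_like(latents)
timesteps = torch.zeros(2, dtype=torch.long)  # timestep 0 for every sample

noisy = add_noise_fp32(alphas_cumprod, latents, noise, timesteps)
delta = (noisy.float() - latents.float()).abs().max().item()
print("max |noisy - latents| at timestep 0:", delta)
```

Because the coefficients are computed in float32, the noise term survives even at timestep 0; only the final result is rounded to bfloat16.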

bghira · 2024-08-08T14:43:47Z

we're already handling casting on mps using UniPC scheduler but DDPM or DDIM doesn't seem to do it. probably others with the issue

bghira · 2024-08-08T14:46:56Z

    noise_scheduler = EulerDiscreteScheduler.from_pretrained(
        pretrained_model_name_or_path, subfolder="scheduler", rescale_betas_zero_snr=True, timestep_spacing="trailing")
    bsz = 1
    print(f"Batch size: {bsz}")
    latents = torch.randn(
        (bsz, 4, 64, 64),
        dtype=torch.bfloat16,
        device='mps')

Batch size: 1
Timesteps: tensor([0], device='mps:0')
delta after add step 0 noise 0.115234375

Euler works on bf16 noise/latents

gameofdimension · 2024-08-08T15:39:20Z

> we're already handling casting on mps using UniPC scheduler but DDPM or DDIM doesn't seem to do it. probably others with the issue

it seems that DDPM is usually used for training.

bghira · 2024-08-08T15:41:37Z

huggingface uses euler for training

bghira · 2024-08-08T15:43:44Z

also this is an inference/training issue not necessarily just for training.. though it really only applies during img2img then and i'm not sure what the implications are. it's definitely worse for training.

github-actions · 2024-09-14T15:04:59Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

bghira · 2024-09-14T18:04:31Z

again cc @sayakpaul @linoytsaban @yiyixuxu

sayakpaul · 2024-09-15T06:33:48Z

Sorry for the delay on my part, @gameofdimension! Did you notice the same behaviour for other scripts or is this specific to ControlNet?

gameofdimension · 2024-09-16T17:03:30Z

it seems like all calls of noise_scheduler.add_noise are susceptible to this issue, given bf16 training is used

github-actions · 2024-10-15T15:03:58Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

yiyixuxu · 2024-10-16T00:54:48Z

let's fix this for training first
if we run into issues with inference then we can look into scheduler
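A training-side workaround along these lines could upcast at the call site and leave the scheduler untouched. The sketch below is an illustration under assumptions, not the merged diff: add_noise_upcast is a hypothetical helper, and ToyScheduler is a stand-in that mimics a scheduler computing in the dtype of its inputs, so the example runs without diffusers.

```python
import torch


def add_noise_upcast(scheduler, latents, noise, timesteps, out_dtype):
    """Call scheduler.add_noise on float32 inputs, then cast the result back."""
    noisy = scheduler.add_noise(latents.float(), noise.float(), timesteps)
    return noisy.to(dtype=out_dtype)


class ToyScheduler:
    """Hypothetical stand-in whose add_noise follows the input dtype."""

    def __init__(self):
        # Assumed "scaled_linear" schedule with Stable Diffusion style values.
        betas = torch.linspace(0.00085**0.5, 0.012**0.5, 1000) ** 2
        self.alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    def add_noise(self, original_samples, noise, timesteps):
        a = self.alphas_cumprod.to(dtype=original_samples.dtype)[timesteps]
        a = a.view(-1, 1, 1, 1)
        return a**0.5 * original_samples + (1 - a) ** 0.5 * noise


torch.manual_seed(0)
scheduler = ToyScheduler()
latents = torch.randn((2, 4, 8, 8), dtype=torch.bfloat16)
noise = torch.randn_like(latents)
timesteps = torch.zeros(2, dtype=torch.long)  # timestep 0 for every sample

direct = scheduler.add_noise(latents, noise, timesteps)
upcast = add_noise_upcast(scheduler, latents, noise, timesteps, torch.bfloat16)

delta_direct = (direct.float() - latents.float()).abs().max().item()
delta_upcast = (upcast.float() - latents.float()).abs().max().item()
print("direct bfloat16 call:", delta_direct)
print("float32 call, cast back:", delta_upcast)
```

With the stand-in, the direct bfloat16 call leaves the latents unchanged at timestep 0, while the upcast call adds noise before the result is cast back to the training dtype.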

gameofdimension · 2024-10-16T02:18:49Z

> let's fix this for training first
> if we run into issues with inference then we can look into scheduler

what else should i do?

bghira · 2024-10-21T13:34:22Z

idk honestly it's ready for merge. diffusers team are you all ok?

bghira · 2024-10-21T13:35:25Z

@sayakpaul @yiyixuxu this is an issue for all noise addition and makes training with the examples produce worse results

HuggingFaceDocBuilderDev · 2024-10-21T18:25:40Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

* Update train_controlnet.py

  reduce float value error for bfloat16

* Update train_controlnet_sdxl.py
* style

---------

Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: yiyixuxu <[email protected]>

gameofdimension added 2 commits July 29, 2024 20:53

Update train_controlnet.py

11547f0

reduce float value error for bfloat16

Update train_controlnet_sdxl.py

eff5bcd

github-actions bot added the stale label (Issues that haven't received updates) Sep 14, 2024

Merge branch 'main' into main

d669244

github-actions bot removed the stale label (Issues that haven't received updates) Sep 15, 2024

yiyixuxu self-assigned this Sep 17, 2024

yiyixuxu added the scheduler label Sep 20, 2024

github-actions bot added the stale label (Issues that haven't received updates) Oct 15, 2024

a-r-r-o-w removed the stale label (Issues that haven't received updates) Oct 15, 2024

a-r-r-o-w requested a review from yiyixuxu October 15, 2024 15:06

Merge branch 'main' into main

b814df7

style

d761630

yiyixuxu approved these changes Oct 21, 2024

yiyixuxu merged commit 63a0c9e into huggingface:main Oct 21, 2024
8 checks passed

6 participants
