
Conversation

@DN6 (Collaborator) commented Nov 14, 2024

What does this PR do?

PR #9711 changed the latent preparation step so that the vae_scale_factor and default sample size are clearer.

However, we overlooked the fact that this operation (with scale_factor == 16)

        height = 2 * (int(height) // self.vae_scale_factor)
        width = 2 * (int(width) // self.vae_scale_factor)

is not equivalent to this operation with scale_factor == 8:

height = int(height) // self.vae_scale_factor

For resolutions that are divisible by 8 but not by 16, the current pipeline on main errors out, because the implementation does not account for the fact that the latent width and height must be divisible by 2 for the packing step to work.
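
For context, the packing step rearranges the latents into 2x2 patches, which is why both latent dimensions must be even. Here is a minimal sketch of that rearrangement (simplified from the Flux pipeline's _pack_latents; shapes and names are illustrative):

    import torch

    # Sketch of the Flux latent packing step: group the latent grid into 2x2 patches.
    # This only works if the latent height and width are both divisible by 2.
    def pack_latents(latents: torch.Tensor) -> torch.Tensor:
        b, c, h, w = latents.shape
        assert h % 2 == 0 and w % 2 == 0, "latent height/width must be even for packing"
        latents = latents.view(b, c, h // 2, 2, w // 2, 2)
        latents = latents.permute(0, 2, 4, 1, 3, 5)
        # result shape: (batch, num_patches, channels * 4)
        return latents.reshape(b, (h // 2) * (w // 2), c * 4)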

The old implementation floor-divided the height and width by 16 and then multiplied by 2, which results in slightly resized images if the height or width is not a multiple of 16.
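
To make the difference concrete, take a height of 1032 px, which is divisible by 8 but not by 16 (values are illustrative):

    height = 1032  # divisible by 8 but not by 16

    # old code (vae_scale_factor == 16): latent height is forced to be even,
    # and the image is silently resized to 2 * 64 * 8 = 1024 px
    old_latent_height = 2 * (height // 16)   # 128

    # current code on main (vae_scale_factor == 8): latent height is odd,
    # so the 2x2 packing step fails
    new_latent_height = height // 8          # 129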

This PR:

  1. Adds steps to account for latent packing, which will result in resized output images (as before); see the sketch after this list.
  2. Adds a warning that the image will be resized if an incompatible image is provided
  3. Adds fast tests to check Flux with expected and unexpected image shapes to verify that the scaling is properly applied.
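
A minimal sketch of the adjustment described in points 1 and 2; the helper name, default value, and warning text are illustrative rather than the exact merged code:

    def adjust_image_size(height: int, width: int, vae_scale_factor: int = 8):
        # the image must be divisible by the VAE compression factor *and* by 2
        # for the latent packing step, i.e. by vae_scale_factor * 2
        multiple_of = vae_scale_factor * 2
        new_height = (height // multiple_of) * multiple_of
        new_width = (width // multiple_of) * multiple_of
        if (new_height, new_width) != (height, width):
            print(
                f"Height and width must be divisible by {multiple_of}; "
                f"resizing from {height}x{width} to {new_height}x{new_width}."
            )
        return new_height, new_width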

Fixes # (issue)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@DN6 DN6 requested a review from yiyixuxu November 14, 2024 11:23
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@bghira (Contributor) commented Nov 16, 2024

this seems really important but considering how slowly things move around here, not so sure it will be merged in a timely manner.

@yiyixuxu (Collaborator) left a comment


oh thanks for the fix!
I made a comment

        width = int(width) // self.vae_scale_factor
        # VAE applies 8x compression on images but we must also account for packing which requires
        # latent height and width to be divisible by 2.
        height = int(height) // self.vae_scale_factor - ((int(height) // self.vae_scale_factor) % 2)

is this the same as this? (basically revert back to the original code, but divisible by self.vae_scale_factor*2)
a little bit easier to understand, no?

height = 2 * (int(height) // (self.vae_scale_factor*2))
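
Both expressions pick the largest even value not exceeding int(height) // self.vae_scale_factor, so they are indeed equivalent; a quick brute-force check (illustrative, not part of the PR) confirms it:

    vae_scale_factor = 8
    for height in range(4096):
        a = height // vae_scale_factor - ((height // vae_scale_factor) % 2)
        b = 2 * (height // (vae_scale_factor * 2))
        assert a == b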

@DN6 (Collaborator, Author) commented Nov 20, 2024

Failing LoRA tests are unrelated. Merging.

@DN6 DN6 merged commit f6f7afa into main Nov 20, 2024
17 of 18 checks passed
@sayakpaul sayakpaul deleted the flux-latents-fix branch November 21, 2024 03:24
sayakpaul added a commit that referenced this pull request Dec 23, 2024
