@preetam1407

Fixes #12755.

This PR documents the expected shape of the latents argument in Flux2Pipeline.__call__.

For the default AutoencoderKLFlux2 VAE used by FLUX.2, the pipeline first applies 8× spatial compression in the VAE,
and then a 2×2 patch packing step in the pipeline. This results in:

  • an effective 16× downsampling in height and width, and
  • 4× as many channels in the packed latents as in the raw VAE output.

The expected shape for user-provided latents is therefore:

(batch_size, 128, height // 16, width // 16)

where height and width are the requested output image dimensions in pixels. Passing latents with any other shape leads to
shape mismatches inside the VAE and transformer.
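
The shape arithmetic above can be sketched as a small helper. This is only an illustration, not code from the pipeline: the function name `expected_latents_shape` and the constant names are hypothetical, and the values (8× VAE compression, 2×2 packing, 128 channels) are taken directly from this description.

```python
from typing import Tuple

# Values taken from the PR description; the names are hypothetical.
VAE_SCALE_FACTOR = 8    # 8x spatial compression applied by the VAE
PATCH_SIZE = 2          # 2x2 patch packing applied by the pipeline
PACKED_CHANNELS = 128   # channel count of the packed latents


def expected_latents_shape(batch_size: int, height: int, width: int) -> Tuple[int, int, int, int]:
    """Return the latents shape described above for a requested image size."""
    factor = VAE_SCALE_FACTOR * PATCH_SIZE  # effective 16x downsampling
    return (batch_size, PACKED_CHANNELS, height // factor, width // factor)


print(expected_latents_shape(1, 1024, 1024))  # (1, 128, 64, 64)
```

A tensor of this shape (for example, one created with `torch.randn(shape)`) is what the description suggests passing as `latents`. Whether the pipeline rounds sizes that are not multiples of 16 is not covered here.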

Tests

  • Docs-only change; no functional behavior modified.
  • Verified that providing latents of shape (1, 128, H // 16, W // 16) runs end-to-end with the FLUX.2-dev checkpoint.

Development

Successfully merging this pull request may close these issues.

Flux 2: The shape of the latent argument is undocumented