Document Flux2Pipeline latents shape #12807

preetam1407 · 2025-12-08T13:59:36Z

This PR documents the expected shape of the latents argument in Flux2Pipeline.__call__.

With the default AutoencoderKLFlux2 VAE used by FLUX.2, the VAE first applies 8× spatial compression, and the pipeline
then applies a 2×2 patch packing step. This results in:

- an effective 16× downsampling in height and width (8 × 2), and
- 4× as many channels as the VAE latent space (2 × 2 = 4), for 128 channels in total.

The expected shape for user-provided latents is therefore:

(batch_size, 128, height // 16, width // 16)

where height and width are the requested output image dimensions in pixels. Passing latents with a different shape
leads to shape mismatches inside the VAE and transformer.
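
The arithmetic above can be sketched as a small helper. This is only an illustration: `expected_latents_shape` is a hypothetical function, not part of diffusers, and the default of 32 VAE latent channels is an assumption inferred from the 128 packed channels divided by the 4× packing factor.

```python
def expected_latents_shape(batch_size, height, width,
                           vae_scale_factor=8, patch_size=2,
                           vae_latent_channels=32):
    """Return the (batch, channels, h, w) shape described in this PR.

    Hypothetical helper for illustration only. The default of 32 VAE
    latent channels is an assumption (128 packed channels / 4).
    """
    # 8x VAE compression followed by 2x2 patch packing gives 16x downsampling.
    downsample = vae_scale_factor * patch_size
    # Each 2x2 patch is folded into the channel axis: 32 * 4 = 128.
    channels = vae_latent_channels * patch_size ** 2
    return (batch_size, channels, height // downsample, width // downsample)


print(expected_latents_shape(1, 1024, 768))  # (1, 128, 64, 48)
```

A tensor of this shape (for example, created with `torch.randn`) is what the description above expects for the latents argument.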

Tests

- Docs-only change; no functional behavior modified.
- Verified that providing latents of shape (1, 128, H // 16, W // 16) runs end-to-end with the FLUX.2-dev checkpoint.

Document Flux2Pipeline latents shape

f6e1fad
