Describe the bug
For prompts much shorter than max_sequence_length (which is most prompts, since the default is 512), the prompt is not followed well, because the attention calculation spends most of its attention on the padding tokens of the encoder hidden states.
Prompt: Portrait photo of an angry man
First picture: Flux pipeline with defaults.
Second picture: no attention to padding, achieved here with the workaround of setting a short max_sequence_length. A generic implementation would instead use attention masking (and would not always pad to max_sequence_length, but only up to the longest prompt in the batch); see the sketch below.
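A minimal sketch of the masking idea, using PyTorch's scaled_dot_product_attention. The shapes and the valid token count are illustrative assumptions, not taken from the Flux implementation (Flux actually runs joint attention over concatenated text and image tokens; this only demonstrates the mechanism):

import torch
import torch.nn.functional as F

batch, heads, img_tokens, txt_tokens, head_dim = 1, 4, 16, 512, 64
q = torch.randn(batch, heads, img_tokens, head_dim)
k = torch.randn(batch, heads, txt_tokens, head_dim)
v = torch.randn(batch, heads, txt_tokens, head_dim)

# Suppose only the first 7 text tokens are real and the rest is padding.
valid_len = 7
attn_mask = torch.zeros(batch, 1, img_tokens, txt_tokens, dtype=torch.bool)
attn_mask[..., :valid_len] = True  # True = may attend, False = masked out

out_masked = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
out_unmasked = F.scaled_dot_product_attention(q, k, v)  # attends to padding too

With the boolean mask, the softmax is computed only over the real text tokens, so padding cannot dilute the attention weights.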
Other model pipelines might be affected similarly.
This has been discussed in more detail in #10194 and other issues, but I decided to open a separate issue to bring to your attention that this also affects default use of the pipeline, not only fine-tuning.
Reproduction
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "Portrait photo of an angry man"
prompt_T5_token_length = 7  # T5 token count of this prompt

# Default behavior: the prompt is padded to max_sequence_length=512,
# and attention is paid to the padding tokens.
generator = torch.Generator(device="cuda").manual_seed(1)
images = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    height=1024,
    width=1024,
    num_inference_steps=20,
    generator=generator,
).images
images[0].save("attention-to-padding.jpg")

# Workaround: set max_sequence_length to the prompt's token count,
# so no padding tokens are added at all.
generator = torch.Generator(device="cuda").manual_seed(1)
images = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    height=1024,
    width=1024,
    num_inference_steps=20,
    generator=generator,
    max_sequence_length=prompt_T5_token_length,
).images
images[0].save("no-attention-to-padding.jpg")
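As a side note, the token length does not have to be hard-coded; assuming pipe.tokenizer_2 is the T5 tokenizer (as it is in FluxPipeline), it can be computed from the prompt:

prompt_T5_token_length = len(pipe.tokenizer_2(prompt).input_ids)  # includes the EOS token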
Logs
System Info
certifi==2025.4.26
charset-normalizer==3.4.2
diffusers==0.33.1
filelock==3.18.0
fsspec==2025.3.2
hf-xet==1.1.0
huggingface-hub==0.31.1
idna==3.10
importlib_metadata==8.7.0
Jinja2==3.1.6
MarkupSafe==3.0.2
mpmath==1.3.0
networkx==3.4.2
numpy==2.2.5
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
packaging==25.0
pillow==11.2.1
protobuf==6.30.2
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.3
sentencepiece==0.2.0
sympy==1.14.0
tokenizers==0.21.1
torch==2.7.0
tqdm==4.67.1
transformers==4.51.3
triton==3.3.0
typing_extensions==4.13.2
urllib3==2.4.0
zipp==3.21.0

