Describe the bug
If I use `pipe.enable_model_cpu_offload(device=device)`, inference works correctly after a warmup pass. However, if I comment out that line and instead move the whole pipeline to the GPU with `.to(device, dtype=dtype)`, the results are noisy.
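The two configurations being compared differ only in how the pipeline reaches the GPU. A minimal sketch of the toggle (`prepare_pipeline` is a hypothetical helper for illustration, not part of diffusers):

```python
def prepare_pipeline(pipe, device, offload: bool):
    """Hypothetical helper: set up the pipeline one of the two ways under test.

    offload=True  -> enable_model_cpu_offload (works correctly)
    offload=False -> move the whole pipeline to the device (produces noise)
    """
    if offload:
        pipe.enable_model_cpu_offload(device=device)
    else:
        pipe = pipe.to(device)
    return pipe
```

Only the `offload=False` path in the full reproduction below triggers the noisy output.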
Reproduction
```python
import torch
from diffusers import (
    FluxPipeline,
    FluxTransformer2DModel,
    FlowMatchEulerDiscreteScheduler,
    AutoencoderKL,
)
from transformers import T5EncoderModel, CLIPTextModel, CLIPTokenizer, T5TokenizerFast
from optimum.quanto import freeze, qfloat8, quantize

dtype = torch.bfloat16
bfl_repo = "black-forest-labs/FLUX.1-dev"
device = "cuda"

scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(bfl_repo, subfolder="scheduler", torch_dtype=dtype)
text_encoder = CLIPTextModel.from_pretrained(bfl_repo, subfolder="text_encoder", torch_dtype=dtype)
tokenizer = CLIPTokenizer.from_pretrained(bfl_repo, subfolder="tokenizer", clean_up_tokenization_spaces=True)
text_encoder_2 = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", torch_dtype=dtype)
tokenizer_2 = T5TokenizerFast.from_pretrained(bfl_repo, subfolder="tokenizer_2", clean_up_tokenization_spaces=True)
vae = AutoencoderKL.from_pretrained(bfl_repo, subfolder="vae", torch_dtype=dtype)
transformer = FluxTransformer2DModel.from_single_file(
    "https://huggingface.co/Kijai/flux-fp8/blob/main/flux1-dev-fp8.safetensors", torch_dtype=dtype
)

quantize(transformer, weights=qfloat8)
freeze(transformer)
quantize(text_encoder_2, weights=qfloat8)
freeze(text_encoder_2)

pipe = FluxPipeline(
    scheduler=scheduler,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    text_encoder_2=text_encoder_2,
    tokenizer_2=tokenizer_2,
    vae=vae,
    transformer=transformer,
).to(device, dtype=dtype)  # moving the whole pipeline to the GPU -> noisy results
# pipe.enable_model_cpu_offload(device=device)  # with this line instead, results are correct

# warmup pass
params = {
    "prompt": "a cat",
    "num_images_per_prompt": 1,
    "num_inference_steps": 1,
    "width": 64,
    "height": 64,
    "guidance_scale": 7,
}
image = pipe(**params).images[0]

# actual inference
params = {
    "prompt": "a cat",
    "num_images_per_prompt": 1,
    "num_inference_steps": 25,
    "width": 512,
    "height": 512,
    "guidance_scale": 7,
}
image = pipe(**params).images[0]
image.save("1.jpg")
```
Logs
No response
System Info
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.5.1+cu121 with CUDA 1201 (you have 2.4.1+cu121)
Python 3.10.15 (you have 3.10.13)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.
- 🤗 Diffusers version: 0.32.0.dev0
- Platform: Linux-6.8.0-49-generic-x86_64-with-glibc2.35
- Running on Google Colab?: No
- Python version: 3.10.13
- PyTorch version (GPU?): 2.4.1+cu121 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.26.2
- Transformers version: 4.46.2
- Accelerate version: 0.31.0
- PEFT version: 0.14.0
- Bitsandbytes version: not installed
- Safetensors version: 0.4.3
- xFormers version: 0.0.28.post3
- Accelerator: NVIDIA GeForce RTX 3090, 24576 MiB
  NVIDIA GeForce RTX 3090, 24576 MiB
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help?