
Using FP8 for inference without CPU offloading can introduce noise. #10302

@todochenxi

Description


Describe the bug

If I use pipe.enable_model_cpu_offload(device=device), the model performs inference correctly after a warmup pass. However, if I comment out that line and instead move the pipeline to the GPU directly with .to(device, dtype=dtype), the inference results are noisy.

Reproduction

from diffusers import (
    AutoencoderKL,
    FlowMatchEulerDiscreteScheduler,
    FluxPipeline,
    FluxTransformer2DModel,
)
from transformers import T5EncoderModel, CLIPTextModel, CLIPTokenizer, T5TokenizerFast
from optimum.quanto import freeze, qfloat8, quantize
import torch

dtype = torch.bfloat16
bfl_repo = "black-forest-labs/FLUX.1-dev"
device = "cuda"

scheduler      = FlowMatchEulerDiscreteScheduler.from_pretrained(bfl_repo, subfolder="scheduler", torch_dtype=dtype)
text_encoder   = CLIPTextModel.from_pretrained(bfl_repo, subfolder="text_encoder", torch_dtype=dtype)
tokenizer      = CLIPTokenizer.from_pretrained(bfl_repo, subfolder="tokenizer", clean_up_tokenization_spaces=True)
text_encoder_2 = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", torch_dtype=dtype)
tokenizer_2    = T5TokenizerFast.from_pretrained(bfl_repo, subfolder="tokenizer_2", clean_up_tokenization_spaces=True)
vae            = AutoencoderKL.from_pretrained(bfl_repo, subfolder="vae", torch_dtype=dtype)

transformer = FluxTransformer2DModel.from_single_file(
    "https://huggingface.co/Kijai/flux-fp8/blob/main/flux1-dev-fp8.safetensors", torch_dtype=dtype
)
quantize(transformer, weights=qfloat8)
freeze(transformer)
quantize(text_encoder_2, weights=qfloat8)
freeze(text_encoder_2)

pipe = FluxPipeline(
    scheduler=scheduler,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    text_encoder_2=text_encoder_2,
    tokenizer_2=tokenizer_2,
    vae=vae,
    transformer=transformer,
).to(device, dtype=dtype)  # with this, results are noisy

# pipe.enable_model_cpu_offload(device=device)  # with this instead, results are correct

# warmup pass
params = {
    "prompt": "a cat",
    "num_images_per_prompt": 1,
    "num_inference_steps": 1,
    "width": 64,
    "height": 64,
    "guidance_scale": 7,
}
image = pipe(**params).images[0]

# actual inference
params = {
    "prompt": "a cat",
    "num_images_per_prompt": 1,
    "num_inference_steps": 25,
    "width": 512,
    "height": 512,
    "guidance_scale": 7,
}
image = pipe(**params).images[0]
image.save("1.jpg")

Logs

No response

System Info

WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.5.1+cu121 with CUDA 1201 (you have 2.4.1+cu121)
Python 3.10.15 (you have 3.10.13)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details


  • πŸ€— Diffusers version: 0.32.0.dev0
  • Platform: Linux-6.8.0-49-generic-x86_64-with-glibc2.35
  • Running on Google Colab?: No
  • Python version: 3.10.13
  • PyTorch version (GPU?): 2.4.1+cu121 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.26.2
  • Transformers version: 4.46.2
  • Accelerate version: 0.31.0
  • PEFT version: 0.14.0
  • Bitsandbytes version: not installed
  • Safetensors version: 0.4.3
  • xFormers version: 0.0.28.post3
  • Accelerator: NVIDIA GeForce RTX 3090, 24576 MiB
    NVIDIA GeForce RTX 3090, 24576 MiB
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@yiyixuxu @DN6
