Closed
Labels
bug (Something isn't working)
Description
Describe the bug
Flux FP8 model quantized with optimum.quanto:
pipe.enable_model_cpu_offload() - Works
pipe.enable_sequential_cpu_offload() - Doesn't work (raises the TypeError shown under Logs)
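Judging from the traceback under Logs, the failure seems to come from the generic re-wrapping call `param_cls(new_value, requires_grad=old_value.requires_grad)`, which assumes a parameter class can be rebuilt from a single value. The sketch below only illustrates that pattern in plain Python. `PlainParam`, `QuantizedParam`, and `rewrap` are hypothetical stand-ins, not the real accelerate or optimum.quanto code, and the `QuantizedParam` argument list is an assumption modelled on the names in the error message.

```python
# Hypothetical stand-ins for illustration only; these are NOT the real
# accelerate / optimum.quanto classes.

class PlainParam:
    """Mimics a parameter class that can be rebuilt from one value."""

    def __new__(cls, data, requires_grad=False):
        obj = super().__new__(cls)
        obj.data = data
        obj.requires_grad = requires_grad
        return obj


class QuantizedParam:
    """Mimics a quantized tensor subclass whose constructor needs extra metadata.

    The argument names are assumed from the error message in the traceback.
    """

    def __new__(cls, qtype, axis, size, stride, data, scale,
                activation_qtype, requires_grad=False):
        obj = super().__new__(cls)
        obj.qtype, obj.axis, obj.size, obj.stride = qtype, axis, size, stride
        obj.data, obj.scale = data, scale
        obj.activation_qtype = activation_qtype
        obj.requires_grad = requires_grad
        return obj


def rewrap(old_value, new_value):
    """Rebuild a parameter generically, as the failing call in the traceback does."""
    param_cls = type(old_value)
    return param_cls(new_value, requires_grad=old_value.requires_grad)


# Works: the plain class accepts a single positional value.
plain = rewrap(PlainParam([1.0]), [2.0])

# Fails: the quantized class needs six more positional arguments.
quantized = QuantizedParam("qfloat8", 0, (1,), (1,), [1], [1.0], None)
try:
    rewrap(quantized, [2.0])
except TypeError as err:
    print(f"TypeError: {err}")
```

If this reading is right, any code path that rebuilds parameters this way would hit the same error for such tensor subclasses. enable_model_cpu_offload() presumably avoids it because it does not go through this re-wrapping step.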
Reproduction
import torch
from diffusers import FluxTransformer2DModel, FluxPipeline
from transformers import T5EncoderModel, CLIPTextModel
from optimum.quanto import freeze, qfloat8, quantize
bfl_repo = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16
transformer = FluxTransformer2DModel.from_single_file("https://huggingface.co/Kijai/flux-fp8/blob/main/flux1-dev-fp8.safetensors", torch_dtype=dtype)
quantize(transformer, weights=qfloat8)
freeze(transformer)
text_encoder_2 = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", torch_dtype=dtype)
quantize(text_encoder_2, weights=qfloat8)
freeze(text_encoder_2)
pipe = FluxPipeline.from_pretrained(bfl_repo, transformer=None, text_encoder_2=None, torch_dtype=dtype)
pipe.transformer = transformer
pipe.text_encoder_2 = text_encoder_2
# pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()
prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    guidance_scale=3.5,
    output_type="pil",
    num_inference_steps=20,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-fp8-dev.png")
Logs
(venv) C:\ai1\diffuser_t2i>python FLUX_FP8_optimum-quanto.py
Downloading shards: 100%|██████████| 2/2 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.25it/s]
Loading pipeline components...: 60%|██████    | 3/5 [00:00<00:00, 4.05it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|██████████| 5/5 [00:01<00:00, 3.27it/s]
Traceback (most recent call last):
  File "C:\ai1\diffuser_t2i\FLUX_FP8_optimum-quanto.py", line 22, in <module>
    pipe.enable_sequential_cpu_offload()
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 1180, in enable_sequential_cpu_offload
    cpu_offload(model, device, offload_buffers=offload_buffers)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\big_modeling.py", line 204, in cpu_offload
    attach_align_device_hook(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 512, in attach_align_device_hook
    attach_align_device_hook(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 512, in attach_align_device_hook
    attach_align_device_hook(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 512, in attach_align_device_hook
    attach_align_device_hook(
  [Previous line repeated 4 more times]
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 503, in attach_align_device_hook
    add_hook_to_module(module, hook, append=True)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 161, in add_hook_to_module
    module = hook.init_hook(module)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 308, in init_hook
    set_module_tensor_to_device(module, name, "meta")
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\utils\modeling.py", line 368, in set_module_tensor_to_device
    new_value = param_cls(new_value, requires_grad=old_value.requires_grad).to(device)
TypeError: WeightQBytesTensor.__new__() missing 6 required positional arguments: 'axis', 'size', 'stride', 'data', 'scale', and 'activation_qtype'
System Info
Make sure to merge 365/head and https://github.com/huggingface/optimum-quanto/pull/366/files locally.
Windows 11
(venv) C:\ai1\diffuser_t2i>python --version
Python 3.10.11
(venv) C:\ai1\diffuser_t2i>echo %CUDA_PATH%
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6
(venv) C:\ai1\diffuser_t2i>pip list
Package            Version
------------------ ------------
accelerate         1.1.0.dev0
bitsandbytes       0.45.0
diffusers          0.33.0.dev0
gguf               0.13.0
numpy              2.2.1
optimum-quanto     0.2.6.dev0
torch              2.5.1+cu124
torchao            0.7.0
torchvision        0.20.1+cu124
transformers       4.47.1
Who can help?
No response