Closed
Labels: bug (Something isn't working)
Description
Describe the bug
After trying out the new quantization method added to the diffusers library, I encountered a bug: I could not move the pipeline to CUDA, as I got this error:
Traceback (most recent call last):
File "/workspace/test.py", line 12, in <module>
pipe.to("cuda")
File "/usr/local/lib/python3.11/dist-packages/diffusers/pipelines/pipeline_utils.py", line 414, in to
raise ValueError(
ValueError: It seems like you have activated sequential model offloading by calling `enable_sequential_cpu_offload`, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline `.to('cpu')` or consider removing the move altogether if you use sequential offloading.
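For context, sequential offloading is something you normally opt into explicitly, which is what makes this error surprising: the reproduction below never enables it. A minimal sketch of the two mutually exclusive code paths the error message refers to (assuming a pipeline loaded without any quantized components):

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)

# Option A: let accelerate stream submodules to the GPU on demand.
pipe.enable_sequential_cpu_offload()

# Option B: keep the whole pipeline resident on the GPU instead.
# pipe.to("cuda")  # calling this after Option A raises the ValueError shown above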
Reproduction
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel
from transformers import T5EncoderModel

transformer = FluxTransformer2DModel.from_pretrained(
    "cozy-creator/Flux.1-schnell-8bit", torch_dtype=torch.bfloat16, subfolder="transformer"
)
text_encoder_2 = T5EncoderModel.from_pretrained(
    "cozy-creator/Flux.1-schnell-8bit", torch_dtype=torch.bfloat16, subfolder="text_encoder_2"
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=0.0,
    num_inference_steps=4,
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-dev.png")
Logs
root@4e27fd69b49a:/workspace# python test.py
Fetching 2 files: 100%|██████████| 2/2 [00:00<00:00, 3650.40it/s]
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Downloading shards: 100%|██████████| 2/2 [00:00<00:00, 3533.53it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:04<00:00, 2.44s/it]
Loading pipeline components...:  57%|█████▋    | 4/7 [00:00<00:00, 20.54it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 15.46it/s]
Traceback (most recent call last):
File "/workspace/test.py", line 10, in <module>
pipe.to("cuda")
File "/usr/local/lib/python3.11/dist-packages/diffusers/pipelines/pipeline_utils.py", line 414, in to
raise ValueError(
ValueError: It seems like you have activated sequential model offloading by calling `enable_sequential_cpu_offload`, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline `.to('cpu')` or consider removing the move altogether if you use sequential offloading.
System Info
- 🤗 Diffusers version: 0.32.0.dev0
- Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
- Running on Google Colab?: No
- Python version: 3.11.10
- PyTorch version (GPU?): 2.4.1+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.26.2
- Transformers version: 4.47.0.dev0
- Accelerate version: 1.1.1
- PEFT version: not installed
- Bitsandbytes version: 0.44.1
- Safetensors version: 0.4.5
- xFormers version: not installed
- Accelerator: NVIDIA A100-SXM4-80GB, 81920 MiB
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help?
It's also worth noting that it doesn't crash if only the transformer is passed in; it just gives this warning:
The module 'FluxTransformer2DModel' has been loaded in `bitsandbytes` 8bit and moving it to cuda via `.to()` is not supported. Module is still on cuda:0.
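A minimal sketch of that non-crashing variant (same quantized checkpoint, but only the transformer swapped in and everything else left at the default bf16 weights):

import torch
from diffusers import FluxPipeline, FluxTransformer2DModel

# Load only the 8-bit transformer from the quantized repo.
transformer = FluxTransformer2DModel.from_pretrained(
    "cozy-creator/Flux.1-schnell-8bit", subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.to("cuda")  # only emits the bitsandbytes warning above; the pipeline still runs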