Moving a pipeline with a quantized component to CUDA raises an error

### Describe the bug

While trying out the new quantization support in diffusers, I found that a pipeline containing a quantized component cannot be moved to CUDA. Calling `pipe.to("cuda")` raises this error:

```
Traceback (most recent call last):
  File "/workspace/test.py", line 12, in <module>
    pipe.to("cuda")
  File "/usr/local/lib/python3.11/dist-packages/diffusers/pipelines/pipeline_utils.py", line 414, in to
    raise ValueError(
ValueError: It seems like you have activated sequential model offloading by calling `enable_sequential_cpu_offload`, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline `.to('cpu')` or consider removing the move altogether if you use sequential offloading.
```
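Since `enable_sequential_cpu_offload` is never called in the script, it may help to check which components look "offloaded" to the pipeline. The sketch below is only a diagnostic, not a fix. It assumes the check is triggered by markers that accelerate and transformers attach to a module (`_hf_hook` and `hf_device_map`); the helper name `find_hooked_components` is hypothetical and not part of diffusers.

```python
def find_hooked_components(components):
    """Return {component_name: [marker attributes found]} for a pipeline's components.

    Assumption: `_hf_hook` (set by accelerate hooks) and `hf_device_map`
    (set when a model is loaded with a device map) are what make a
    component look offloaded. This is a guess at the cause, not a
    confirmed diagnosis.
    """
    report = {}
    for name, component in components.items():
        if component is None:
            continue
        markers = [
            attr for attr in ("_hf_hook", "hf_device_map")
            if hasattr(component, attr)
        ]
        if markers:
            report[name] = markers
    return report

# Intended usage with the pipeline from the reproduction script:
# print(find_hooked_components(pipe.components))
```

If only `text_encoder_2` shows up in the report, that would be consistent with the observation that passing just the quantized transformer does not crash.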



### Reproduction

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel
from transformers import T5EncoderModel


transformer = FluxTransformer2DModel.from_pretrained("cozy-creator/Flux.1-schnell-8bit", torch_dtype=torch.bfloat16, subfolder="transformer")
text_encoder_2 = T5EncoderModel.from_pretrained("cozy-creator/Flux.1-schnell-8bit", torch_dtype=torch.bfloat16, subfolder="text_encoder_2")
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", transformer=transformer, text_encoder_2=text_encoder_2, torch_dtype=torch.bfloat16)

pipe.to("cuda")

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=0.0,
    num_inference_steps=4,
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("flux-dev.png")
```

### Logs

```shell
root@4e27fd69b49a:/workspace# python test.py
Fetching 2 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3650.40it/s]
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Downloading shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3533.53it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.44s/it]
Loading pipeline components...:  57%|████████████████████████████████████████████████████                                       | 4/7 [00:00<00:00, 20.54it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 15.46it/s]
Traceback (most recent call last):
  File "/workspace/test.py", line 10, in <module>
    pipe.to("cuda")
  File "/usr/local/lib/python3.11/dist-packages/diffusers/pipelines/pipeline_utils.py", line 414, in to
    raise ValueError(
ValueError: It seems like you have activated sequential model offloading by calling `enable_sequential_cpu_offload`, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline `.to('cpu')` or consider removing the move altogether if you use sequential offloading.
```


### System Info

- 🤗 Diffusers version: 0.32.0.dev0
- Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
- Running on Google Colab?: No
- Python version: 3.11.10
- PyTorch version (GPU?): 2.4.1+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.26.2
- Transformers version: 4.47.0.dev0
- Accelerate version: 1.1.1
- PEFT version: not installed
- Bitsandbytes version: 0.44.1
- Safetensors version: 0.4.5
- xFormers version: not installed
- Accelerator: NVIDIA A100-SXM4-80GB, 81920 MiB
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No

### Who can help?

@sayakpaul 

Also worth noting: it does not crash if only the quantized transformer is passed in. In that case it only emits this warning:

```
The module 'FluxTransformer2DModel' has been loaded in `bitsandbytes` 8bit and moving it to cuda via `.to()` is not supported. Module is still on cuda:0.
```
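A possible workaround until this is fixed is to skip `pipe.to("cuda")` and move the components one by one, leaving the quantized ones alone, since the warning above says they are already on `cuda:0`. The following is a minimal sketch and has not been verified against this bug. It assumes quantized models expose `is_loaded_in_8bit` / `is_loaded_in_4bit` flags; the helper name `move_unquantized_components` is hypothetical.

```python
def move_unquantized_components(components, device):
    """Move every movable, non-quantized component to `device`.

    Returns two lists of component names: (moved, skipped).
    Assumption: bitsandbytes-quantized models carry the flags
    `is_loaded_in_8bit` or `is_loaded_in_4bit`.
    """
    moved, skipped = [], []
    for name, component in components.items():
        # Entries such as tokenizers or None have no `.to()` method.
        if component is None or not callable(getattr(component, "to", None)):
            continue
        is_quantized = (
            getattr(component, "is_loaded_in_8bit", False)
            or getattr(component, "is_loaded_in_4bit", False)
        )
        if is_quantized:
            # Leave quantized modules where they were loaded.
            skipped.append(name)
            continue
        component.to(device)
        moved.append(name)
    return moved, skipped

# Intended usage in place of `pipe.to("cuda")`:
# moved, skipped = move_unquantized_components(pipe.components, "cuda")
```

This bypasses the pipeline-level check entirely, so it only makes sense when no offloading has actually been enabled.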
