Flux FP8 with optimum.quanto TypeError: WeightQBytesTensor.__new__() missing 6 required positional arguments: 'axis', 'size', 'stride', 'data', 'scale', and 'activation_qtype'

### Describe the bug

Flux FP8 model quantized with `optimum.quanto` (`qfloat8` weights, frozen):

- `pipe.enable_model_cpu_offload()`: works
- `pipe.enable_sequential_cpu_offload()`: fails with the `TypeError` in the title

### Reproduction

```python
import torch
from diffusers import FluxTransformer2DModel, FluxPipeline
from transformers import T5EncoderModel, CLIPTextModel
from optimum.quanto import freeze, qfloat8, quantize

bfl_repo = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16

transformer = FluxTransformer2DModel.from_single_file("https://huggingface.co/Kijai/flux-fp8/blob/main/flux1-dev-fp8.safetensors", torch_dtype=dtype)
quantize(transformer, weights=qfloat8)
freeze(transformer)

text_encoder_2 = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", torch_dtype=dtype)
quantize(text_encoder_2, weights=qfloat8)
freeze(text_encoder_2)

pipe = FluxPipeline.from_pretrained(bfl_repo, transformer=None, text_encoder_2=None, torch_dtype=dtype)
pipe.transformer = transformer
pipe.text_encoder_2 = text_encoder_2

# pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    guidance_scale=3.5,
    output_type="pil",
    num_inference_steps=20,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]

image.save("flux-fp8-dev.png")
```

### Logs

```shell
(venv) C:\ai1\diffuser_t2i>python FLUX_FP8_optimum-quanto.py
Downloading shards: 100%|███████████████████████████████████████████████████| 2/2 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|████████████████████████████████████| 2/2 [00:01<00:00,  1.25it/s]
Loading pipeline components...:  60%|██████████████████▌            | 3/5 [00:00<00:00,  4.05it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|███████████████████████████████| 5/5 [00:01<00:00,  3.27it/s]
Traceback (most recent call last):
  File "C:\ai1\diffuser_t2i\FLUX_FP8_optimum-quanto.py", line 22, in <module>
    pipe.enable_sequential_cpu_offload()
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 1180, in enable_sequential_cpu_offload
    cpu_offload(model, device, offload_buffers=offload_buffers)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\big_modeling.py", line 204, in cpu_offload
    attach_align_device_hook(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 512, in attach_align_device_hook
    attach_align_device_hook(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 512, in attach_align_device_hook
    attach_align_device_hook(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 512, in attach_align_device_hook
    attach_align_device_hook(
  [Previous line repeated 4 more times]
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 503, in attach_align_device_hook
    add_hook_to_module(module, hook, append=True)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 161, in add_hook_to_module
    module = hook.init_hook(module)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 308, in init_hook
    set_module_tensor_to_device(module, name, "meta")
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\utils\modeling.py", line 368, in set_module_tensor_to_device
    new_value = param_cls(new_value, requires_grad=old_value.requires_grad).to(device)
TypeError: WeightQBytesTensor.__new__() missing 6 required positional arguments: 'axis', 'size', 'stride', 'data', 'scale', and 'activation_qtype'
```
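The last frame of the traceback rebuilds the parameter by calling its own class as `param_cls(new_value, requires_grad=...)`. The sketch below reproduces that call shape in plain Python, without torch or quanto. `FakeQuantizedWeight` and `rewrap` are hypothetical stand-ins, and the constructor signature is an assumption inferred from the argument names in the error message, not copied from the `optimum.quanto` source.

```python
class FakeQuantizedWeight:
    """Hypothetical stand-in for a tensor subclass whose constructor
    needs quantization metadata in addition to the raw data."""

    def __new__(cls, qtype, axis, size, stride, data, scale,
                activation_qtype, requires_grad=False):
        obj = super().__new__(cls)
        obj.qtype, obj.axis, obj.size, obj.stride = qtype, axis, size, stride
        obj.data, obj.scale, obj.activation_qtype = data, scale, activation_qtype
        obj.requires_grad = requires_grad
        return obj


def rewrap(old_value, new_value):
    # Same call shape as the failing line in the traceback: the class of the
    # old value is called with a single positional argument plus requires_grad.
    param_cls = type(old_value)
    return param_cls(new_value, requires_grad=old_value.requires_grad)


# Building the object with all of its metadata works.
weight = FakeQuantizedWeight("qfloat8", 0, (4, 4), (4, 1), [0] * 16, [1.0] * 4, None)

# Rebuilding it generically does not: only the first positional slot is filled.
try:
    rewrap(weight, new_value=[0] * 16)
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

If this reading is right, the failure comes from the generic re-wrapping step assuming a `Parameter`-style constructor, which would explain why only the sequential offload path (which moves each weight to the `meta` device) is affected.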


### System Info

To reproduce, optimum-quanto needs two pull requests merged locally: [365/head](https://github.com/huggingface/optimum-quanto.git@refs/pull/365/head) and https://github.com/huggingface/optimum-quanto/pull/366/files

Windows 11
```
(venv) C:\ai1\diffuser_t2i>python --version
Python 3.10.11

(venv) C:\ai1\diffuser_t2i>echo %CUDA_PATH%
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6
```

```
(venv) C:\ai1\diffuser_t2i>pip list
Package            Version
------------------ ------------
accelerate         1.1.0.dev0
bitsandbytes       0.45.0
diffusers          0.33.0.dev0
gguf               0.13.0
numpy              2.2.1
optimum-quanto     0.2.6.dev0
torch              2.5.1+cu124
torchao            0.7.0
torchvision        0.20.1+cu124
transformers       4.47.1

```

### Who can help?

_No response_
