Describe the bug
As explained in the documentation, I am trying to use enable_sequential_cpu_offload to save memory:
https://github.com/huggingface/diffusers/blob/main/docs/source/en/optimization/memory.md#cpu-offloading
I understand that enable_sequential_cpu_offload currently does not work with bitsandbytes int4, for which a bug is already logged: bitsandbytes-foundation/bitsandbytes#1525
But, as shown in the example in the docs, it should work for pipelines without quantization.
I can confirm it works for kandinsky3 / AutoPipelineForText2Image (rough sketch below).
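For reference, this is roughly how I exercised sequential offload with kandinsky3; the checkpoint id "kandinsky-community/kandinsky-3" and the prompt are written from memory here, so treat this as an approximation rather than the exact script I ran:
import torch
from diffusers import AutoPipelineForText2Image

# Same offloading call as in the Lumina2 script below; this one completes without errors.
pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3", torch_dtype=torch.bfloat16
)
pipe.enable_sequential_cpu_offload()
image = pipe("a photo of a cat", num_inference_steps=25).images[0]
image.save("kandinsky3.png")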
Reproduction
import torch
from diffusers import Lumina2Text2ImgPipeline
pipe = Lumina2Text2ImgPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Image-2.0", torch_dtype=torch.bfloat16
)
# pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()
prompt = "Hitoshi Ashinano style. A young girl with vibrant green hair and large purple eyes peeks out from behind a white wooden door. She is wearing a white shirt and have a curious expression on her face. The background shows a blue sky with a few clouds, and there's a white fence visible. Green leaves hang down from the top left corner, and a small white circle can be seen in the sky. The scene captures a moment of innocent curiosity and wonder."
image = pipe(
    prompt, 
    negative_prompt="blurry, ugly, bad, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, cropped, out of frame, worst quality, low quality, jpeg artifacts, fused fingers, morbid, mutilated, extra fingers, mutated hands, bad anatomy, bad proportion, extra limbs", 
    guidance_scale=6,
    num_inference_steps=35, 
    generator=torch.manual_seed(10)
).images[0]
image.save("lumina2.png")

Logs
(venv) C:\aiOWN\diffuser_webui>python lumina2_lora.py
Loading checkpoint shards: 100%|████████████████████████████████████| 2/2 [00:07<00:00,  3.54s/it]
Loading checkpoint shards: 100%|████████████████████████████████████| 3/3 [00:09<00:00,  3.17s/it]
Loading pipeline components...: 100%|███████████████████████████████| 5/5 [00:17<00:00,  3.59s/it]
The 'batch_size' argument of HybridCache is deprecated and will be removed in v4.49. Use the more precisely named 'max_batch_size' argument instead.
The 'batch_size' attribute of HybridCache is deprecated and will be removed in v4.49. Use the more precisely named 'self.max_batch_size' attribute instead.
Traceback (most recent call last):
  File "C:\aiOWN\diffuser_webui\lumina2_lora.py", line 13, in <module>
    image = pipe(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\pipelines\lumina2\pipeline_lumina2.py", line 648, in __call__
    ) = self.encode_prompt(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\pipelines\lumina2\pipeline_lumina2.py", line 293, in encode_prompt
    prompt_embeds, prompt_attention_mask = self._get_gemma_prompt_embeds(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\pipelines\lumina2\pipeline_lumina2.py", line 221, in _get_gemma_prompt_embeds
    prompt_embeds = self.text_encoder(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\accelerate\hooks.py", line 176, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\transformers\models\gemma2\modeling_gemma2.py", line 575, in forward
    past_key_values = HybridCache(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\transformers\cache_utils.py", line 1657, in __init__
    cache_shape = global_cache_shape if not self.is_sliding[i] else sliding_cache_shape
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\_meta_registrations.py", line 6471, in meta_local_scalar_dense
    raise RuntimeError("Tensor.item() cannot be called on meta tensors")
RuntimeError: Tensor.item() cannot be called on meta tensors
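From the traceback, the failure appears to come from transformers' HybridCache evaluating self.is_sliding[i] while the text encoder's tensors are still on the meta device after sequential offload; that is my reading, not a confirmed root cause. A minimal sketch of the underlying PyTorch behavior, independent of diffusers:
import torch

# A tensor placed on the "meta" device has no storage, so .item() cannot
# materialize a Python scalar and raises the same error as in the traceback.
t = torch.zeros(1, dtype=torch.bool, device="meta")
t.item()  # RuntimeError: Tensor.item() cannot be called on meta tensors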
System Info
(venv) C:\aiOWN\diffuser_webui>diffusers-cli env
Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.
- 🤗 Diffusers version: 0.33.0.dev0
 - Platform: Windows-10-10.0.26100-SP0
 - Running on Google Colab?: No
 - Python version: 3.10.11
 - PyTorch version (GPU?): 2.5.1+cu124 (True)
 - Flax version (CPU?/GPU?/TPU?): not installed (NA)
 - Jax version: not installed
 - JaxLib version: not installed
 - Huggingface_hub version: 0.27.1
 - Transformers version: 4.48.1
 - Accelerate version: 1.4.0.dev0
 - PEFT version: 0.14.0
 - Bitsandbytes version: 0.45.3.dev0
 - Safetensors version: 0.5.2
 - xFormers version: not installed
 - Accelerator: NVIDIA GeForce RTX 4060 Laptop GPU, 8188 MiB
 - Using GPU in script?:
 - Using distributed or parallel set-up in script?:
 
Who can help?