Tensor.item() cannot be called on meta tensors #10869

Description

@nitinmukesh

Describe the bug

As explained in the documentation, I am trying to use sequential CPU offloading to save memory:
https://github.com/huggingface/diffusers/blob/main/docs/source/en/optimization/memory.md#cpu-offloading

I understand that enable_sequential_cpu_offload is currently not possible with bitsandbytes int4 quantization; that bug is tracked in bitsandbytes-foundation/bitsandbytes#1525.

But as shown in the example, it should work for pipelines without quantization.

I can confirm it works for kandinsky3 / AutoPipelineForText2Image (see the sketch below).
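
For reference, the kandinsky3 script below completes for me with sequential offloading enabled. This is a minimal sketch: the model id "kandinsky-community/kandinsky-3", the fp16 variant, and the prompt/settings are illustrative rather than the exact values I ran.

import torch
from diffusers import AutoPipelineForText2Image

# Same offloading call as in the failing Lumina reproduction below,
# but with kandinsky3 the pipeline runs end to end.
pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16
)
pipe.enable_sequential_cpu_offload()

image = pipe(
    "A photograph of the inside of a subway train.",
    num_inference_steps=25,
    generator=torch.manual_seed(0),
).images[0]
image.save("kandinsky3.png")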

Reproduction

import torch
from diffusers import Lumina2Text2ImgPipeline

pipe = Lumina2Text2ImgPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Image-2.0", torch_dtype=torch.bfloat16
)
# pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()  # triggers the RuntimeError in the logs below

prompt = "Hitoshi Ashinano style. A young girl with vibrant green hair and large purple eyes peeks out from behind a white wooden door. She is wearing a white shirt and have a curious expression on her face. The background shows a blue sky with a few clouds, and there's a white fence visible. Green leaves hang down from the top left corner, and a small white circle can be seen in the sky. The scene captures a moment of innocent curiosity and wonder."

image = pipe(
    prompt, 
    negative_prompt="blurry, ugly, bad, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, cropped, out of frame, worst quality, low quality, jpeg artifacts, fused fingers, morbid, mutilated, extra fingers, mutated hands, bad anatomy, bad proportion, extra limbs", 
    guidance_scale=6,
    num_inference_steps=35, 
    generator=torch.manual_seed(10)
).images[0]
image.save("lumina2.png")

Logs

(venv) C:\aiOWN\diffuser_webui>python lumina2_lora.py
Loading checkpoint shards: 100%|████████████████████████████████████| 2/2 [00:07<00:00,  3.54s/it]
Loading checkpoint shards: 100%|████████████████████████████████████| 3/3 [00:09<00:00,  3.17s/it]
Loading pipeline components...: 100%|███████████████████████████████| 5/5 [00:17<00:00,  3.59s/it]
The 'batch_size' argument of HybridCache is deprecated and will be removed in v4.49. Use the more precisely named 'max_batch_size' argument instead.
The 'batch_size' attribute of HybridCache is deprecated and will be removed in v4.49. Use the more precisely named 'self.max_batch_size' attribute instead.
Traceback (most recent call last):
  File "C:\aiOWN\diffuser_webui\lumina2_lora.py", line 13, in <module>
    image = pipe(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\pipelines\lumina2\pipeline_lumina2.py", line 648, in __call__
    ) = self.encode_prompt(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\pipelines\lumina2\pipeline_lumina2.py", line 293, in encode_prompt
    prompt_embeds, prompt_attention_mask = self._get_gemma_prompt_embeds(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\pipelines\lumina2\pipeline_lumina2.py", line 221, in _get_gemma_prompt_embeds
    prompt_embeds = self.text_encoder(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\accelerate\hooks.py", line 176, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\transformers\models\gemma2\modeling_gemma2.py", line 575, in forward
    past_key_values = HybridCache(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\transformers\cache_utils.py", line 1657, in __init__
    cache_shape = global_cache_shape if not self.is_sliding[i] else sliding_cache_shape
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\_meta_registrations.py", line 6471, in meta_local_scalar_dense
    raise RuntimeError("Tensor.item() cannot be called on meta tensors")
RuntimeError: Tensor.item() cannot be called on meta tensors

System Info

(venv) C:\aiOWN\diffuser_webui>diffusers-cli env

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

  • 🤗 Diffusers version: 0.33.0.dev0
  • Platform: Windows-10-10.0.26100-SP0
  • Running on Google Colab?: No
  • Python version: 3.10.11
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.27.1
  • Transformers version: 4.48.1
  • Accelerate version: 1.4.0.dev0
  • PEFT version: 0.14.0
  • Bitsandbytes version: 0.45.3.dev0
  • Safetensors version: 0.5.2
  • xFormers version: not installed
  • Accelerator: NVIDIA GeForce RTX 4060 Laptop GPU, 8188 MiB
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@DN6

Labels

bug (Something isn't working)