Describe the bug
I've been trying to run CogView4 using separate pipelines to encode the text and generate the image, in order to save memory (Unified Memory, so I can't use offloading), with the aim of running multiple prompts,
e.g.
te_pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B",
                                            transformer=None,
                                            vae=None,
                                            torch_dtype=torch.bfloat16).to("mps")
with torch.no_grad():
    prompt_embeds, negative_prompt_embeds = te_pipe.encode_prompt(
        prompt,
        negative_prompt,
        num_images_per_prompt=num_images_per_prompt,
    )
del te_pipe
pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", text_encoder=None, tokenizer=None, torch_dtype=torch.bfloat16).to("mps")

and I get a failure with the following error:
ValueError: `prompt_embeds` and `negative_prompt_embeds` must have the same shape when passed directly, but got: `prompt_embeds` torch.Size([1, 144, 4096]) != `negative_prompt_embeds` torch.Size([1, 48, 4096]).

I'm using the same encode function the pipeline uses internally, so I can't see why the embeds would be any different from the ones it generates itself. Given that, I'm not sure why the size check is needed, and I assume either the check is a bug or the size of the embeddings that encode_prompt generates is a bug.
If I try to skip the negative embeds, the pipeline tries to generate negative prompt embeddings itself, which fails because the new pipe doesn't have a text encoder.
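A possible workaround I considered (my own assumption, not verified against the pipeline internals) is to pad the shorter embedding along the sequence dimension so both tensors have the same shape before calling the pipe; whether zero padding matches what the pipeline would do internally is an open question:

import torch.nn.functional as F

# Hypothetical workaround (untested): zero-pad the shorter tensor along the
# sequence dimension so prompt_embeds and negative_prompt_embeds share a shape
# and pass check_inputs.
seq_len = max(prompt_embeds.shape[1], negative_prompt_embeds.shape[1])
negative_prompt_embeds = F.pad(
    negative_prompt_embeds, (0, 0, 0, seq_len - negative_prompt_embeds.shape[1])
)
prompt_embeds = F.pad(
    prompt_embeds, (0, 0, 0, seq_len - prompt_embeds.shape[1])
)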
Reproduction
from diffusers import CogView4Pipeline
import torch
import gc
te_pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B",
                                            transformer=None,
                                            vae=None,
                                            torch_dtype=torch.bfloat16).to("mps")
prompt = "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background."
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured, jpeg artefacts"
num_images_per_prompt=1
with torch.no_grad():
    prompt_embeds, negative_prompt_embeds = te_pipe.encode_prompt(
        prompt,
        negative_prompt,
        num_images_per_prompt=num_images_per_prompt,
    )
def flush():
    gc.collect()
    torch.mps.empty_cache()
    gc.collect()
    torch.mps.empty_cache()
del te_pipe.text_encoder
del te_pipe
flush()
pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", text_encoder=None, tokenizer=None, torch_dtype=torch.bfloat16).to("mps")
# Enable these to reduce GPU memory usage
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    guidance_scale=3.5,
    num_images_per_prompt=num_images_per_prompt,
    num_inference_steps=50,
    width=1024,
    height=1024,
).images[0]
image.save("cogview4.png")

Logs
$ python cogview4_split.py
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 12.11it/s]
Loading pipeline components...: 100%|█████████████████████████████████████████████████████| 3/3 [00:00<00:00, 5.17it/s]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 29.83it/s]
Loading pipeline components...: 100%|█████████████████████████████████████████████████████| 3/3 [00:00<00:00, 18.05it/s]
Traceback (most recent call last):
File "/Volumes/SSD2TB/AI/Diffusers/cogview4_split.py", line 41, in <module>
image = pipe(
^^^^^
File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/diffusers/pipelines/cogview4/pipeline_cogview4.py", line 515, in __call__
self.check_inputs(
File "/Volumes/SSD2TB/AI/Diffusers/lib/python3.11/site-packages/diffusers/pipelines/cogview4/pipeline_cogview4.py", line 366, in check_inputs
raise ValueError(
ValueError: `prompt_embeds` and `negative_prompt_embeds` must have the same shape when passed directly, but got: `prompt_embeds` torch.Size([1, 144, 4096]) != `negative_prompt_embeds` torch.Size([1, 48, 4096]).System Info
- 🤗 Diffusers version: 0.33.0.dev0
- Platform: macOS-15.3.1-arm64-arm-64bit
- Running on Google Colab?: No
- Python version: 3.11.10
- PyTorch version (GPU?): 2.6.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.27.1
- Transformers version: 4.49.0
- Accelerate version: 0.34.2
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.4.5
- xFormers version: not installed
- Accelerator: Apple M3
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help?
No response