Cannot load SD3.5M with from_single_file, mismatched shape for pos_embed.pos_embed in empty state dict #10016

@CodeExplode

Describe the bug

When trying to load the Stable Diffusion 3.5 Medium checkpoint from Stability AI with the following command:

pipe = diffusers.StableDiffusion3Pipeline.from_single_file( '...path/sd3.5_medium.safetensors', text_encoder=None, text_encoder_2=None, text_encoder_3=None )

The following error is produced: "Cannot load because pos_embed.pos_embed expected shape torch.Size([1, 36864, 1536]), but got torch.Size([1, 147456, 1536])"
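Both sequence lengths in that message are perfect squares, which hints that the two position-embedding tables were built for different maximum grid sizes. Below is a quick check of the arithmetic. The helper name `grid_side` is hypothetical, and reading the middle dimension as a flattened square grid is an assumption rather than something the traceback confirms.

```python
import math

def grid_side(num_positions: int) -> int:
    """Return the side length if num_positions is a perfect square."""
    side = math.isqrt(num_positions)
    if side * side != num_positions:
        raise ValueError(f"{num_positions} is not a perfect square")
    return side

# Sequence lengths taken from the error message above.
expected_side = grid_side(36864)     # shape the instantiated model expects
checkpoint_side = grid_side(147456)  # shape stored in the checkpoint file

print(expected_side, checkpoint_side)  # 192 384
```

If that reading is right, the model was instantiated with a position-embedding grid of 192 per side, while the checkpoint stores one of 384 per side.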

Please note that I slightly altered model_loading_utils.py to print this error, as the original line was missing the .shape property.

f"Cannot load {model_name_or_path_str} because {param_name} expected shape {empty_state_dict[param_name]}, but got {param.shape}"

I added .shape to empty_state_dict[param_name] as I think that was the intention.
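A minimal sketch of the changed message, using stand-in objects instead of real tensors so it runs without torch. `FakeTensor` and `format_mismatch` are hypothetical names for illustration and are not part of diffusers.

```python
from dataclasses import dataclass

@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor; it only carries a shape."""
    shape: tuple

def format_mismatch(param_name, expected, actual):
    # Report the shapes on both sides, not the objects themselves.
    return (
        f"Cannot load because {param_name} expected shape "
        f"{expected.shape}, but got {actual.shape}"
    )

expected = FakeTensor(shape=(1, 36864, 1536))
actual = FakeTensor(shape=(1, 147456, 1536))
message = format_mismatch("pos_embed.pos_embed", expected, actual)
print(message)
```

With `.shape` on both sides, the message compares like with like and does not print the whole expected object.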

I downloaded the checkpoint within about a day of its release, so it's possible the file has been changed since then, though the Hugging Face repo doesn't appear to show any update.

Reproduction

import diffusers

pipe = diffusers.StableDiffusion3Pipeline.from_single_file(
    '...path/sd3.5_medium.safetensors',
    text_encoder=None,
    text_encoder_2=None,
    text_encoder_3=None,
)

Logs

Traceback (most recent call last):
  File "...path\sd_trainer.py", line 360, in <module>
    pipe = diffusers.StableDiffusion3Pipeline.from_single_file( os.path.join(config.models_dir, config.init_model), text_encoder=None, text_encoder_2=None, text_encoder_3=None )
  File "...path\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "...path\venv\src\diffusers\src\diffusers\loaders\single_file.py", line 495, in from_single_file
    loaded_sub_model = load_single_file_sub_model(
  File "...path\venv\src\diffusers\src\diffusers\loaders\single_file.py", line 102, in load_single_file_sub_model
    loaded_sub_model = load_method(
  File "...path\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "...path\venv\src\diffusers\src\diffusers\loaders\single_file_model.py", line 299, in from_single_file
    unexpected_keys = load_model_dict_into_meta(model, diffusers_format_checkpoint, dtype=torch_dtype)
  File "...path\venv\src\diffusers\src\diffusers\models\model_loading_utils.py", line 223, in load_model_dict_into_meta
    raise ValueError(
ValueError: Cannot load  because pos_embed.pos_embed expected shape torch.Size([1, 36864, 1536]), but got torch.Size([1, 147456, 1536]). If you want to instead overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.

System Info

Windows 10, using the following diffusers version, which is the latest as of this post.

-e git+https://github.com/huggingface/diffusers.git@074e123#egg=diffusers
transformers==4.46.3

Who can help?

@yiyixuxu @sayakpaul @DN6 @asomoza
