Conversation

@stevhliu (Member)

Refactors the Model files and layouts guide to take a more top-down approach, beginning with the formats (Diffusers/single-file) and then discussing the individual file types.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@stevhliu stevhliu marked this pull request as ready for review September 2, 2025 23:41
@stevhliu stevhliu requested review from DN6 and sayakpaul September 2, 2025 23:41

@sayakpaul (Member) left a comment


Thank you!

import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors",
    torch_dtype=torch.float16,
    device_map="cuda",
)

@sayakpaul (Member)

I don't think we support device_map="cuda" in from_single_file yet. Cc: @DN6 should we add support?

@DN6 (Collaborator)

Supported now

device_map = _determine_device_map(model, device_map, None, torch_dtype, keep_in_fp32_modules, hf_quantizer)

Snippet looks good 👍🏽

pipeline = StableDiffusionPipeline.from_single_file(
    "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/v1-5-pruned.ckpt"
)

ckpt_path = "https://huggingface.co/lightx2v/Qwen-Image-Lightning/blob/main/Qwen-Image-Lightning-8steps-V1.1-bf16.safetensors"
original_config = "https://raw.githubusercontent.com/Wan-Video/Wan2.2/refs/heads/main/wan/configs/wan_ti2v_5B.py"

@sayakpaul (Member)

A Python file as a config? 💡

@stevhliu (Member, Author)

Oops, just realized Wan doesn't support from_single_file. Reverted to the original code example!
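
For reference, from_single_file does accept an original_config argument for checkpoints that ship with their original training config; here is a minimal sketch with the more typical YAML config (the SD 1.5 checkpoint and config URLs are illustrative, not the reverted example from the guide):

from diffusers import StableDiffusionPipeline

# Hypothetical pairing of a single-file checkpoint with its original YAML config
pipeline = StableDiffusionPipeline.from_single_file(
    "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/v1-5-pruned.ckpt",
    original_config="https://raw.githubusercontent.com/CompVis/stable-diffusion/main/configs/stable-diffusion/v1-inference.yaml",
)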

## File types

## Single-file layout usage
Models can be stored in several file types. Safetensors is the most common file type, but you may encounter other file types on the Hub or in the diffusion community.

@sayakpaul (Member)

Should safetensors be hyperlinked?

@stevhliu (Member, Author)

Hyperlinked in the Safetensors section below :)
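
As an aside, a safetensors checkpoint is just a flat mapping from tensor names to tensors; a minimal sketch of inspecting one directly with the safetensors library (the file name is illustrative):

from safetensors.torch import load_file

# load_file returns a dict mapping tensor names to torch.Tensor
state_dict = load_file("sd_xl_base_1.0.safetensors")
print(f"{len(state_dict)} tensors loaded")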

@stevhliu stevhliu requested a review from sayakpaul September 4, 2025 21:07

@sayakpaul (Member) left a comment


@DN6 please give it a review as well.


- Easier to download and share a single file.

Use the [`~loaders.FromSingleFileMixin.from_single_file`] method to load a model with all the weights stored in a single safetensors file.
Use [`~loaders.FromSingleFileMixin.from_single_file`] to load a single file. Pass `"cuda"` to the `device_map` argument to pre-allocate GPU memory and reduce model loading time (refer to the [parallel loading](../using-diffusers/loading#parallel-loading) docs for more details).

@DN6 (Collaborator)

Single file does support pre-allocating GPU memory to reduce model loading time, but parallel loading only works for sharded checkpoints (one model spread over multiple files).

@stevhliu (Member, Author)

I thought with #12305, it is supported for from_single_file now?

@DN6 (Collaborator)

Parallel loading only works for multiple files (sharded checkpoints) since you load them simultaneously. Since single file is just one file, you won't see any advantage.

@stevhliu (Member, Author)

Ah, I see! Ok, removed mention of pre-allocating :)
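
To make the distinction concrete, here is a hedged sketch of parallel loading with a sharded, multi-file Diffusers-format checkpoint; the HF_ENABLE_PARALLEL_LOADING toggle is assumed from the parallel-loading docs referenced above, and it has no effect on a single-file checkpoint:

import os

# Assumed opt-in from the parallel loading docs; set before loading the model
os.environ["HF_ENABLE_PARALLEL_LOADING"] = "yes"

import torch
from diffusers import DiffusionPipeline

# A sharded Diffusers-format repo: its shards can be loaded simultaneously
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    device_map="cuda",  # pre-allocates GPU memory to reduce load time
)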

@stevhliu stevhliu requested a review from DN6 September 26, 2025 21:45

@DN6 (Collaborator) left a comment


LGTM 👍🏽

@stevhliu stevhliu merged commit c07fcf7 into huggingface:main Sep 29, 2025
1 check passed
@stevhliu stevhliu deleted the formats branch September 29, 2025 18:36