
Commit fd9ed52

Merge branch 'main' into auraflow-lora
2 parents 12dc911 + 5b27f8a commit fd9ed52

234 files changed: +6129 -5019 lines changed


.github/workflows/nightly_tests.yml

Lines changed: 1 addition & 1 deletion
@@ -417,7 +417,7 @@ jobs:
 additional_deps: ["peft"]
 - backend: "gguf"
 test_location: "gguf"
-additional_deps: []
+additional_deps: ["peft"]
 - backend: "torchao"
 test_location: "torchao"
 additional_deps: []

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -265,6 +265,8 @@
 sections:
 - local: api/models/overview
 title: Overview
+- local: api/models/auto_model
+title: AutoModel
 - sections:
 - local: api/models/controlnet
 title: ControlNetModel
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# AutoModel
+
+The `AutoModel` is designed to make it easy to load a checkpoint without needing to know the specific model class. `AutoModel` automatically retrieves the correct model class from the checkpoint `config.json` file.
+
+```python
+from diffusers import AutoModel, AutoPipelineForText2Image
+
+unet = AutoModel.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet")
+pipe = AutoPipelineForText2Image.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", unet=unet)
+```
+
+
+## AutoModel
+
+[[autodoc]] AutoModel
+- all
+- from_pretrained
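
The new doc page says `AutoModel` picks the model class from the checkpoint's `config.json`. The snippet below is a minimal, standard-library-only sketch of that general idea, not the diffusers implementation. It assumes the config stores the class name under a `_class_name` key; the registry, the placeholder classes, and the `resolve_model_class` helper are hypothetical names introduced only for illustration.

```python
import json


# Hypothetical placeholder classes standing in for real model classes.
class UNet2DConditionModel:
    def __init__(self, config: dict):
        self.config = config


class AutoencoderKL:
    def __init__(self, config: dict):
        self.config = config


# Hypothetical registry mapping class names to classes.
MODEL_REGISTRY = {cls.__name__: cls for cls in (UNet2DConditionModel, AutoencoderKL)}


def resolve_model_class(config_text: str) -> type:
    """Return the class named by the (assumed) `_class_name` key of a config."""
    config = json.loads(config_text)
    class_name = config["_class_name"]
    if class_name not in MODEL_REGISTRY:
        raise ValueError(f"Unknown model class: {class_name!r}")
    return MODEL_REGISTRY[class_name]


# Example config text, as it might appear in a checkpoint's config.json.
example_config = '{"_class_name": "UNet2DConditionModel", "sample_size": 64}'
model_cls = resolve_model_class(example_config)
model = model_cls(json.loads(example_config))
print(model_cls.__name__)
```

The caller never names the concrete class. It only supplies the config, which is what lets one entry point serve many model types.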

docs/source/en/api/pipelines/sana_sprint.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License. -->

-# SanaSprintPipeline
+# SANA-Sprint

 <div class="flex flex-wrap space-x-1">
 <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>

docs/source/en/optimization/memory.md

Lines changed: 4 additions & 0 deletions
@@ -178,6 +178,9 @@ pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch
 # We can utilize the enable_group_offload method for Diffusers model implementations
 pipe.transformer.enable_group_offload(onload_device=onload_device, offload_device=offload_device, offload_type="leaf_level", use_stream=True)

+# Uncomment the following to also allow recording the current streams.
+# pipe.transformer.enable_group_offload(onload_device=onload_device, offload_device=offload_device, offload_type="leaf_level", use_stream=True, record_stream=True)
+
 # For any other model implementations, the apply_group_offloading function can be used
 apply_group_offloading(pipe.text_encoder, onload_device=onload_device, offload_type="block_level", num_blocks_per_group=2)
 apply_group_offloading(pipe.vae, onload_device=onload_device, offload_type="leaf_level")
@@ -205,6 +208,7 @@ Group offloading (for CUDA devices with support for asynchronous data transfer s
 - The `use_stream` parameter can be used with CUDA devices to enable prefetching layers for onload. It defaults to `False`. Layer prefetching allows overlapping computation and data transfer of model weights, which drastically reduces the overall execution time compared to other offloading methods. However, it can increase the CPU RAM usage significantly. Ensure that available CPU RAM that is at least twice the size of the model when setting `use_stream=True`. You can find more information about CUDA streams [here](https://pytorch.org/docs/stable/generated/torch.cuda.Stream.html)
 - If specifying `use_stream=True` on VAEs with tiling enabled, make sure to do a dummy forward pass (possibly with dummy inputs) before the actual inference to avoid device-mismatch errors. This may not work on all implementations. Please open an issue if you encounter any problems.
 - The parameter `low_cpu_mem_usage` can be set to `True` to reduce CPU memory usage when using streams for group offloading. This is useful when the CPU memory is the bottleneck, but it may counteract the benefits of using streams and increase the overall execution time. The CPU memory savings come from creating pinned-tensors on-the-fly instead of pre-pinning them. This parameter is better suited for using `leaf_level` offloading.
+- When using `use_stream=True`, users can additionally specify `record_stream=True` to get better speedups at the expense of slightly increased memory usage. Refer to the [official PyTorch docs](https://pytorch.org/docs/stable/generated/torch.Tensor.record_stream.html) to know more about this.

 For more information about available parameters and an explanation of how group offloading works, refer to [`~hooks.group_offloading.apply_group_offloading`].

docs/source/en/using-diffusers/loading.md

Lines changed: 1 addition & 1 deletion
@@ -105,7 +105,7 @@ import torch

 pipe = HunyuanVideoPipeline.from_pretrained(
 "hunyuanvideo-community/HunyuanVideo",
-torch_dtype={'transformer': torch.bfloat16, 'default': torch.float16},
+torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
 )
 print(pipe.transformer.dtype, pipe.vae.dtype) # (torch.bfloat16, torch.float16)
 ```
Lines changed: 4 additions & 3 deletions
@@ -1,7 +1,8 @@
-accelerate>=0.16.0
+accelerate>=0.31.0
 torchvision
-transformers>=4.25.1
+transformers>=4.41.2
 ftfy
 tensorboard
 Jinja2
-peft==0.7.0
+peft>=0.11.1
+sentencepiece
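
The requirements change above raises three minimum versions and relaxes the `peft` pin from an exact version to a lower bound. As a rough illustration of what a `>=` bound means in practice, the sketch below checks installed packages against those minimums using only the standard library. It is a simplification: it compares numeric release components only and ignores pre-release ordering, whereas real installers follow the full version-specifier rules. `MINIMUMS`, `parse_release`, `meets_minimum`, and `check_environment` are hypothetical names introduced here.

```python
from importlib.metadata import PackageNotFoundError, version

# Lower bounds taken from the updated requirements shown in the diff.
MINIMUMS = {"accelerate": "0.31.0", "transformers": "4.41.2", "peft": "0.11.1"}


def parse_release(v: str) -> tuple:
    """Parse leading numeric components, e.g. '0.11.1' -> (0, 11, 1).

    Suffixes such as 'rc1' or '.dev0' are dropped (a simplification).
    """
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numerically, so '0.11.1' correctly ranks above '0.7.0'."""
    return parse_release(installed) >= parse_release(minimum)


def check_environment(minimums: dict) -> dict:
    """Return {package: problem} for packages that are missing or too old."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not meets_minimum(installed, minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems


if __name__ == "__main__":
    for package, problem in check_environment(MINIMUMS).items():
        print(f"{package}: {problem}")
```

Comparing parsed tuples rather than raw strings matters here: as strings, `"0.7.0"` sorts after `"0.11.1"`, which would wrongly accept the old `peft` pin.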
