
Commit 2857f33

Merge branch 'main' into deprecate-jax
2 parents e889530 + 5fcd5f5 commit 2857f33

File tree

158 files changed: +734, -899 lines changed


docs/source/en/_toctree.yml

Lines changed: 3 additions & 3 deletions
@@ -9,11 +9,11 @@
   - local: stable_diffusion
     title: Basic performance

-- title: DiffusionPipeline
+- title: Pipelines
   isExpanded: false
   sections:
   - local: using-diffusers/loading
-    title: Load pipelines
+    title: DiffusionPipeline
   - local: tutorials/autopipeline
     title: AutoPipeline
   - local: using-diffusers/custom_pipeline_overview

@@ -77,7 +77,7 @@
   - local: optimization/memory
     title: Reduce memory usage
   - local: optimization/speed-memory-optims
-    title: Compile and offloading quantized models
+    title: Compiling and offloading quantized models
   - title: Community optimizations
     sections:
     - local: optimization/pruna

docs/source/en/api/pipelines/skyreels_v2.md

Lines changed: 127 additions & 148 deletions
Large diffs are not rendered by default.

docs/source/en/api/pipelines/wan.md

Lines changed: 2 additions & 2 deletions
@@ -20,7 +20,7 @@
 </div>
 </div>

-# Wan2.1
+# Wan

 [Wan-2.1](https://huggingface.co/papers/2503.20314) by the Wan Team.

@@ -42,7 +42,7 @@ The following Wan models are supported in Diffusers:
 - [Wan 2.2 TI2V 5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers)

 > [!TIP]
-> Click on the Wan2.1 models in the right sidebar for more examples of video generation.
+> Click on the Wan models in the right sidebar for more examples of video generation.

 ### Text-to-Video Generation
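
As an aside on the renamed page: a minimal, hedged sketch of the text-to-video usage that wan.md documents, assuming the WanPipeline API in Diffusers. The model ID, prompt, and generation settings below are illustrative and not taken from the page.

# Illustrative sketch only: model ID, prompt, and settings are assumptions.
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipeline = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# Generate a short clip and write it to disk.
frames = pipeline(
    prompt="a cat surfing a small wave at sunset",
    num_frames=81,
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "wan_t2v.mp4", fps=16)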

docs/source/en/optimization/fp16.md

Lines changed: 1 addition & 1 deletion
@@ -209,7 +209,7 @@ There is also a [compile_regions](https://github.com/huggingface/accelerate/blob
 # pip install -U accelerate
 import torch
 from diffusers import StableDiffusionXLPipeline
-from accelerate.utils import compile regions
+from accelerate.utils import compile_regions

 pipeline = StableDiffusionXLPipeline.from_pretrained(
     "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16

docs/source/en/optimization/speed-memory-optims.md

Lines changed: 3 additions & 2 deletions
@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->

-# Compile and offloading quantized models
+# Compiling and offloading quantized models

 Optimizing models often involves trade-offs between [inference speed](./fp16) and [memory-usage](./memory). For instance, while [caching](./cache) can boost inference speed, it also increases memory consumption since it needs to store the outputs of intermediate attention layers. A more balanced optimization strategy combines quantizing a model, [torch.compile](./fp16#torchcompile) and various [offloading methods](./memory#offloading).

@@ -28,7 +28,8 @@ The table below provides a comparison of optimization strategy combinations and
 | quantization | 32.602 | 14.9453 |
 | quantization, torch.compile | 25.847 | 14.9448 |
 | quantization, torch.compile, model CPU offloading | 32.312 | 12.2369 |
-<small>These results are benchmarked on Flux with a RTX 4090. The transformer and text_encoder components are quantized. Refer to the [benchmarking script](https://gist.github.com/sayakpaul/0db9d8eeeb3d2a0e5ed7cf0d9ca19b7d) if you're interested in evaluating your own model.</small>
+
+<small>These results are benchmarked on Flux with a RTX 4090. The transformer and text_encoder components are quantized. Refer to the <a href="https://gist.github.com/sayakpaul/0db9d8eeeb3d2a0e5ed7cf0d9ca19b7d">benchmarking script</a> if you're interested in evaluating your own model.</small>

 This guide will show you how to compile and offload a quantized model with [bitsandbytes](../quantization/bitsandbytes#torchcompile). Make sure you are using [PyTorch nightly](https://pytorch.org/get-started/locally/) and the latest version of bitsandbytes.
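
To make the benchmarked combination concrete, here is a hedged sketch of quantizing Flux with bitsandbytes, enabling model CPU offloading, and compiling the transformer. The model ID, 4-bit settings, and prompt are assumptions rather than the guide's exact script, and the benchmark also quantizes the text encoder, which this sketch skips.

# Sketch of quantization + torch.compile + model CPU offloading on Flux;
# model ID, 4-bit settings, and prompt are illustrative assumptions.
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()  # move each component to the GPU only while it runs

# Compiling a quantized model requires a recent PyTorch and bitsandbytes, per the guide.
pipeline.transformer = torch.compile(pipeline.transformer)

image = pipeline("a tiny astronaut hatching from an egg on the moon").images[0]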

docs/source/en/tutorials/using_peft_for_inference.md

Lines changed: 2 additions & 2 deletions
@@ -94,7 +94,7 @@ pipeline = AutoPipelineForText2Image.from_pretrained(
 pipeline.unet.load_lora_adapter(
     "jbilcke-hf/sdxl-cinematic-1",
     weight_name="pytorch_lora_weights.safetensors",
-    adapter_name="cinematic"
+    adapter_name="cinematic",
     prefix="unet"
 )
 # use cnmt in the prompt to trigger the LoRA

@@ -688,4 +688,4 @@ Browse the [LoRA Studio](https://lorastudio.co/models) for different LoRAs to us

 You can find additional LoRAs in the [FLUX LoRA the Explorer](https://huggingface.co/spaces/multimodalart/flux-lora-the-explorer) and [LoRA the Explorer](https://huggingface.co/spaces/multimodalart/LoraTheExplorer) Spaces.

-Check out the [Fast LoRA inference for Flux with Diffusers and PEFT](https://huggingface.co/blog/lora-fast) blog post to learn how to optimize LoRA inference with methods like FlashAttention-3 and fp8 quantization.
+Check out the [Fast LoRA inference for Flux with Diffusers and PEFT](https://huggingface.co/blog/lora-fast) blog post to learn how to optimize LoRA inference with methods like FlashAttention-3 and fp8 quantization.
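
For completeness, a brief sketch of how the corrected load_lora_adapter snippet might be exercised after loading; the adapter weight and prompt are illustrative assumptions, not taken from the tutorial.

# Continuation sketch: the adapter weight and prompt are assumptions.
pipeline.set_adapters("cinematic", adapter_weights=0.8)  # activate the loaded LoRA

# "cnmt" is the trigger word noted in the snippet's comment.
image = pipeline("cnmt, a rainy neon-lit street at night").images[0]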

0 commit comments