Commit d537a00

Merge branch 'main' into allow-non-list-component
2 parents dea2745 + cf1ca72

3 files changed: +6 lines, −5 lines

docs/source/en/_toctree.yml

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@
 - local: optimization/memory
   title: Reduce memory usage
 - local: optimization/speed-memory-optims
-  title: Compile and offloading quantized models
+  title: Compiling and offloading quantized models
 - title: Community optimizations
   sections:
   - local: optimization/pruna

docs/source/en/api/pipelines/wan.md

Lines changed: 2 additions & 2 deletions
@@ -20,7 +20,7 @@
 </div>
 </div>

-# Wan2.1
+# Wan

 [Wan-2.1](https://huggingface.co/papers/2503.20314) by the Wan Team.

@@ -42,7 +42,7 @@ The following Wan models are supported in Diffusers:
 - [Wan 2.2 TI2V 5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers)

 > [!TIP]
-> Click on the Wan2.1 models in the right sidebar for more examples of video generation.
+> Click on the Wan models in the right sidebar for more examples of video generation.

 ### Text-to-Video Generation
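Since the page edited above documents the Wan video pipelines, a minimal text-to-video sketch with `WanPipeline` is included here for orientation; the checkpoint id, prompt, and generation settings are illustrative placeholders rather than anything added by this commit.

```py
# Illustrative sketch only: short text-to-video generation with WanPipeline.
# The checkpoint id, prompt, and settings are placeholders, not from this commit.
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

video = pipe(
    prompt="A cat walks on the grass, realistic style",
    negative_prompt="blurry, low quality",
    num_frames=81,
    guidance_scale=5.0,
).frames[0]
export_to_video(video, "wan_t2v.mp4", fps=16)
```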

docs/source/en/optimization/speed-memory-optims.md

Lines changed: 3 additions & 2 deletions
@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->

-# Compile and offloading quantized models
+# Compiling and offloading quantized models

 Optimizing models often involves trade-offs between [inference speed](./fp16) and [memory-usage](./memory). For instance, while [caching](./cache) can boost inference speed, it also increases memory consumption since it needs to store the outputs of intermediate attention layers. A more balanced optimization strategy combines quantizing a model, [torch.compile](./fp16#torchcompile) and various [offloading methods](./memory#offloading).

@@ -28,7 +28,8 @@ The table below provides a comparison of optimization strategy combinations and
 | quantization | 32.602 | 14.9453 |
 | quantization, torch.compile | 25.847 | 14.9448 |
 | quantization, torch.compile, model CPU offloading | 32.312 | 12.2369 |
-<small>These results are benchmarked on Flux with a RTX 4090. The transformer and text_encoder components are quantized. Refer to the [benchmarking script](https://gist.github.com/sayakpaul/0db9d8eeeb3d2a0e5ed7cf0d9ca19b7d) if you're interested in evaluating your own model.</small>
+
+<small>These results are benchmarked on Flux with a RTX 4090. The transformer and text_encoder components are quantized. Refer to the <a href="https://gist.github.com/sayakpaul/0db9d8eeeb3d2a0e5ed7cf0d9ca19b7d">benchmarking script</a> if you're interested in evaluating your own model.</small>

 This guide will show you how to compile and offload a quantized model with [bitsandbytes](../quantization/bitsandbytes#torchcompile). Make sure you are using [PyTorch nightly](https://pytorch.org/get-started/locally/) and the latest version of bitsandbytes.
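For context, the guide retitled above combines the three techniques compared in its table: bitsandbytes quantization, torch.compile, and model CPU offloading. A rough sketch of that combination follows; the checkpoint, quantized components, prompt, and settings are illustrative placeholders, not code from this commit.

```py
# Illustrative sketch only: quantize with bitsandbytes, offload whole models to
# the CPU, and compile the transformer. The checkpoint, quantized components,
# and prompt are placeholders, not taken from this commit.
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    components_to_quantize=["transformer", "text_encoder_2"],
)

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# Keep whole models on the CPU and move each to the GPU only while it runs.
pipeline.enable_model_cpu_offload()

# Compile the transformer; the first call pays the compilation cost.
pipeline.transformer.compile()

image = pipeline("a photo of a cat holding a sign", num_inference_steps=28).images[0]
image.save("flux.png")
```

Per the table above, adding model CPU offloading on top of quantization and torch.compile lowers peak memory further at the cost of some inference speed.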
