Commit fce2f9e

Merge branch 'main' into main
2 parents e511864 + 4aaa0d2

41 files changed: +5451 −25 lines

docs/source/en/_toctree.yml

Lines changed: 14 additions & 0 deletions

```diff
@@ -76,6 +76,14 @@
   - local: advanced_inference/outpaint
     title: Outpainting
   title: Advanced inference
+- sections:
+  - local: hybrid_inference/overview
+    title: Overview
+  - local: hybrid_inference/vae_decode
+    title: VAE Decode
+  - local: hybrid_inference/api_reference
+    title: API Reference
+  title: Hybrid Inference
 - sections:
   - local: using-diffusers/cogvideox
     title: CogVideoX
@@ -316,6 +324,8 @@
     title: Transformer2DModel
   - local: api/models/transformer_temporal
     title: TransformerTemporalModel
+  - local: api/models/wan_transformer_3d
+    title: WanTransformer3DModel
   title: Transformers
 - sections:
   - local: api/models/stable_cascade_unet
@@ -348,6 +358,8 @@
     title: AutoencoderKLMagvit
   - local: api/models/autoencoderkl_mochi
     title: AutoencoderKLMochi
+  - local: api/models/autoencoder_kl_wan
+    title: AutoencoderKLWan
   - local: api/models/asymmetricautoencoderkl
     title: AsymmetricAutoencoderKL
   - local: api/models/autoencoder_dc
@@ -540,6 +552,8 @@
     title: UniDiffuser
   - local: api/pipelines/value_guided_sampling
     title: Value-guided sampling
+  - local: api/pipelines/wan
+    title: Wan
   - local: api/pipelines/wuerstchen
     title: Wuerstchen
   title: Pipelines
```
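The new Hybrid Inference block is ordinary YAML; as a quick sanity sketch (assuming PyYAML is available; it is not a diffusers dependency), the block parses into one section with three pages:

```python
import yaml  # PyYAML; assumed available in your environment

# The Hybrid Inference section added to _toctree.yml, as a standalone snippet.
toctree_block = """\
- sections:
  - local: hybrid_inference/overview
    title: Overview
  - local: hybrid_inference/vae_decode
    title: VAE Decode
  - local: hybrid_inference/api_reference
    title: API Reference
  title: Hybrid Inference
"""

entries = yaml.safe_load(toctree_block)
titles = [page["title"] for page in entries[0]["sections"]]
print(titles)  # ['Overview', 'VAE Decode', 'API Reference']
```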
docs/source/en/api/models/autoencoder_kl_wan.md

Lines changed: 32 additions & 0 deletions

<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLWan

The 3D variational autoencoder (VAE) model with KL loss used in [Wan 2.1](https://github.com/Wan-Video/Wan2.1) by the Alibaba Wan Team.

The model can be loaded with the following code snippet.

```python
import torch

from diffusers import AutoencoderKLWan

vae = AutoencoderKLWan.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", subfolder="vae", torch_dtype=torch.float32)
```

## AutoencoderKLWan

[[autodoc]] AutoencoderKLWan
  - decode
  - all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
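For background (this is generic VAE math, not Wan-specific, and not how the class is implemented internally): the "KL loss" named above is the closed-form KL divergence between the encoder's diagonal Gaussian posterior and a standard normal prior. A pure-Python sketch for a single latent element:

```python
import math

def kl_per_element(mu: float, logvar: float) -> float:
    """KL( N(mu, exp(logvar)) || N(0, 1) ) for one latent element:
    0.5 * (mu**2 + sigma**2 - 1 - log(sigma**2))."""
    return 0.5 * (mu * mu + math.exp(logvar) - 1.0 - logvar)

# The penalty is zero exactly when the posterior equals the prior,
# and grows as the encoder drifts away from N(0, 1).
print(kl_per_element(0.0, 0.0))  # 0.0
print(kl_per_element(1.0, 0.0))  # 0.5
```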
docs/source/en/api/models/wan_transformer_3d.md

Lines changed: 30 additions & 0 deletions

<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# WanTransformer3DModel

A Diffusion Transformer model for 3D video-like data, introduced in [Wan 2.1](https://github.com/Wan-Video/Wan2.1) by the Alibaba Wan Team.

The model can be loaded with the following code snippet.

```python
import torch

from diffusers import WanTransformer3DModel

transformer = WanTransformer3DModel.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
```

## WanTransformer3DModel

[[autodoc]] WanTransformer3DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
docs/source/en/api/pipelines/wan.md

Lines changed: 62 additions & 0 deletions

<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

# Wan

[Wan 2.1](https://github.com/Wan-Video/Wan2.1) is a family of video generation models by the Alibaba Wan Team.

<!-- TODO(aryan): update abstract once paper is out -->

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

Recommendations for inference:
- Keep the VAE in `torch.float32` for better decoding quality.
- `num_frames` should be of the form `4 * k + 1`, for example `49` or `81`.
- For lower-resolution videos, try lower values of `shift` (between `2.0` and `5.0`) in the [Scheduler](https://huggingface.co/docs/diffusers/main/en/api/schedulers/flow_match_euler_discrete#diffusers.FlowMatchEulerDiscreteScheduler.shift). For higher-resolution videos, try higher values (between `7.0` and `12.0`). The default value for Wan is `3.0`.
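The `num_frames` rule above is easy to enforce with a small helper (hypothetical, not part of diffusers):

```python
def nearest_valid_num_frames(requested: int) -> int:
    """Round `requested` to the nearest frame count of the form 4 * k + 1,
    the shape Wan expects (e.g. 49 or 81)."""
    k = max(0, round((requested - 1) / 4))
    return 4 * k + 1

print(nearest_valid_num_frames(80))  # 81
print(nearest_valid_num_frames(49))  # 49
```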
### Using a custom scheduler

Wan can be used with many different schedulers, each trading off speed against generation quality. By default, Wan uses `UniPCMultistepScheduler(prediction_type="flow_prediction", use_flow_sigmas=True, flow_shift=3.0)`. You can switch to a different scheduler as follows:

```python
from diffusers import FlowMatchEulerDiscreteScheduler, UniPCMultistepScheduler, WanPipeline

scheduler_a = FlowMatchEulerDiscreteScheduler(shift=5.0)
scheduler_b = UniPCMultistepScheduler(prediction_type="flow_prediction", use_flow_sigmas=True, flow_shift=4.0)

# Pass the scheduler at load time:
pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", scheduler=<CUSTOM_SCHEDULER_HERE>)

# or swap it on an existing pipeline:
pipe.scheduler = <CUSTOM_SCHEDULER_HERE>
```
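For intuition about `shift`: flow-matching schedulers remap their noise levels σ ∈ [0, 1] as σ' = s·σ / (1 + (s − 1)·σ), so larger `s` concentrates sampling steps at higher noise, which is why larger videos benefit from larger values. A standalone sketch of that remapping (the formula mirrors how diffusers' flow-matching schedulers apply a static shift; treat it as illustrative):

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Remap a flow-matching noise level sigma in [0, 1]:
    sigma' = shift * sigma / (1 + (shift - 1) * sigma)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

print(shift_sigma(0.5, 1.0))  # 0.5  (shift=1 leaves the schedule unchanged)
print(shift_sigma(0.5, 3.0))  # 0.75 (larger shift pushes steps toward high noise)
```

Note that the endpoints σ = 0 and σ = 1 are fixed points of the remapping, so only the interior of the schedule is reshaped.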
## WanPipeline

[[autodoc]] WanPipeline
  - all
  - __call__

## WanImageToVideoPipeline

[[autodoc]] WanImageToVideoPipeline
  - all
  - __call__

## WanPipelineOutput

[[autodoc]] pipelines.wan.pipeline_output.WanPipelineOutput
docs/source/en/hybrid_inference/api_reference.md

Lines changed: 5 additions & 0 deletions

# Hybrid Inference API Reference

## Remote Decode

[[autodoc]] utils.remote_utils.remote_decode
docs/source/en/hybrid_inference/overview.md

Lines changed: 54 additions & 0 deletions

<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Hybrid Inference

**Empowering local AI builders with Hybrid Inference**

> [!TIP]
> Hybrid Inference is an [experimental feature](https://huggingface.co/blog/remote_vae).
> Feedback can be provided [here](https://github.com/huggingface/diffusers/issues/new?template=remote-vae-pilot-feedback.yml).

## Why use Hybrid Inference?

Hybrid Inference offers a fast and simple way to offload local generation requirements.

- 🚀 **Reduced Requirements:** Access powerful models without expensive hardware.
- 💎 **Without Compromise:** Achieve the highest quality without sacrificing performance.
- 💰 **Cost Effective:** It's free! 🤑
- 🎯 **Diverse Use Cases:** Fully compatible with Diffusers 🧨 and the wider community.
- 🔧 **Developer-Friendly:** Simple requests, fast responses.

---

## Available Models

* **VAE Decode 🖼️:** Quickly decode latent representations into high-quality images without compromising performance or workflow speed.
* **VAE Encode 🔢 (coming soon):** Efficiently encode images into latent representations for generation and training.
* **Text Encoders 📃 (coming soon):** Compute text embeddings for your prompts quickly and accurately, ensuring a smooth and high-quality workflow.

---

## Integrations

* **[SD.Next](https://github.com/vladmandic/sdnext):** All-in-one UI with built-in support for Hybrid Inference.
* **[ComfyUI-HFRemoteVae](https://github.com/kijai/ComfyUI-HFRemoteVae):** ComfyUI node for Hybrid Inference.

## Contents

The documentation is organized into two sections:

* **VAE Decode:** Learn the basics of how to use VAE Decode with Hybrid Inference.
* **API Reference:** Dive into task-specific settings and parameters.
