Merged
Changes from 32 commits
38 commits
d1e75ab
Add wanx pipeline, model and example
yitongh Feb 18, 2025
ca5c724
wanx_merged_v1
wan-x-ai Feb 25, 2025
5dd22a9
change WanX into Wan
wan-x-ai Feb 25, 2025
fea59a6
fix i2v fp32 oom error
yitongh Feb 25, 2025
768995f
support t2v load fp32 ckpt
yitongh Feb 26, 2025
128f1af
add example
yitongh Feb 26, 2025
8220482
final merge v1
wan-x-ai Feb 26, 2025
9cad60e
Update autoencoder_kl_wan.py
wan-x-ai Feb 26, 2025
2e12d1b
up
yiyixuxu Feb 27, 2025
9c19bda
update middle, test up_block
yiyixuxu Feb 28, 2025
2da1feb
up up
yiyixuxu Feb 28, 2025
b7a3900
one less nn.sequential
yiyixuxu Feb 28, 2025
5f2518a
up more
yiyixuxu Feb 28, 2025
89f1c6a
up
yiyixuxu Feb 28, 2025
9e8ef93
more
yiyixuxu Feb 28, 2025
2e1924a
[refactor] [wip] Wan transformer/pipeline (#10926)
a-r-r-o-w Feb 28, 2025
425a3b0
make style
a-r-r-o-w Feb 28, 2025
157a24d
update tests
a-r-r-o-w Feb 28, 2025
a9768d2
tests
a-r-r-o-w Feb 28, 2025
d9f615d
conversion script
a-r-r-o-w Feb 28, 2025
0122271
conversion script
a-r-r-o-w Feb 28, 2025
22ea488
update
a-r-r-o-w Feb 28, 2025
4c88dbb
docs
a-r-r-o-w Feb 28, 2025
2a4cfb1
Merge branch 'main' into yiyi-refactor-wan-vae
a-r-r-o-w Feb 28, 2025
72e6eb2
remove unused code
a-r-r-o-w Feb 28, 2025
094db80
fix _toctree.yml
a-r-r-o-w Feb 28, 2025
81c5dca
update dtype
yiyixuxu Feb 28, 2025
3eb31f9
fix test
yiyixuxu Feb 28, 2025
99787be
fix tests: scale
yiyixuxu Mar 1, 2025
34a5e4b
up
yiyixuxu Mar 1, 2025
cfb32bb
more
yiyixuxu Mar 1, 2025
6e27c43
Merge branch 'main' into yiyi-refactor-wan-vae
yiyixuxu Mar 1, 2025
38e3c48
Apply suggestions from code review
yiyixuxu Mar 2, 2025
590cf97
Apply suggestions from code review
yiyixuxu Mar 2, 2025
71fa990
style
yiyixuxu Mar 2, 2025
3629794
Update scripts/convert_wan_to_diffusers.py
a-r-r-o-w Mar 2, 2025
67940a9
update docs
a-r-r-o-w Mar 2, 2025
913f83c
fix
a-r-r-o-w Mar 2, 2025
6 changes: 6 additions & 0 deletions docs/source/en/_toctree.yml
@@ -314,6 +314,8 @@
title: Transformer2DModel
- local: api/models/transformer_temporal
title: TransformerTemporalModel
- local: api/models/wan_transformer_3d
title: WanTransformer3DModel
title: Transformers
- sections:
- local: api/models/stable_cascade_unet
@@ -344,6 +346,8 @@
title: AutoencoderKLLTXVideo
- local: api/models/autoencoderkl_mochi
title: AutoencoderKLMochi
- local: api/models/autoencoder_kl_wan
title: AutoencoderKLWan
- local: api/models/asymmetricautoencoderkl
title: AsymmetricAutoencoderKL
- local: api/models/autoencoder_dc
@@ -534,6 +538,8 @@
title: UniDiffuser
- local: api/pipelines/value_guided_sampling
title: Value-guided sampling
- local: api/pipelines/wan
title: Wan
- local: api/pipelines/wuerstchen
title: Wuerstchen
title: Pipelines
32 changes: 32 additions & 0 deletions docs/source/en/api/models/autoencoder_kl_wan.md
@@ -0,0 +1,32 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLWan

The 3D variational autoencoder (VAE) model with KL loss used in [Wan 2.1](https://github.com/Wan-Video/Wan2.1) by the Alibaba Wan Team.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import AutoencoderKLWan

# Load the VAE weights in float32, as recommended for decoding quality.
vae = AutoencoderKLWan.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B", subfolder="vae", torch_dtype=torch.float32)
```

## AutoencoderKLWan

[[autodoc]] AutoencoderKLWan
  - decode
  - all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
30 changes: 30 additions & 0 deletions docs/source/en/api/models/wan_transformer_3d.md
@@ -0,0 +1,30 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# WanTransformer3DModel

A Diffusion Transformer model for 3D video-like data, introduced in [Wan 2.1](https://github.com/Wan-Video/Wan2.1) by the Alibaba Wan Team.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import WanTransformer3DModel

# Load the transformer weights in bfloat16 to reduce memory use.
transformer = WanTransformer3DModel.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B", subfolder="transformer", torch_dtype=torch.bfloat16)
```

## WanTransformer3DModel

[[autodoc]] WanTransformer3DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
46 changes: 46 additions & 0 deletions docs/source/en/api/pipelines/wan.md
@@ -0,0 +1,46 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

# Wan

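[Wan 2.1](https://github.com/Wan-Video/Wan2.1) is a family of video generation models by the Alibaba Wan Team. The pipelines documented here cover text-to-video (`WanPipeline`) and image-to-video (`WanImageToVideoPipeline`) generation.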

<!-- TODO(aryan): update abstract once paper is out -->

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

Recommendations for inference:
- Keep the VAE in `torch.float32` for better decoding quality.
- `num_frames` should be of the form `4 * k + 1`, for example `49` or `81`.
- For smaller resolution videos, try lower values of `shift` (between `2.0` and `5.0`) in the [Scheduler](https://huggingface.co/docs/diffusers/main/en/api/schedulers/flow_match_euler_discrete#diffusers.FlowMatchEulerDiscreteScheduler.shift). For larger resolution videos, try higher values (between `7.0` and `12.0`). The default value is `3.0` for Wan.
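
The `num_frames` and `shift` recommendations above can be sketched as two small helpers. This is only an illustration: the function names are hypothetical and not part of diffusers, and the `pixel_threshold` that separates "smaller" from "larger" resolutions is an assumption, since the recommendations do not define a cutoff.

```python
from typing import Tuple


def nearest_valid_num_frames(requested: int) -> int:
    """Round a requested frame count to the nearest value of the form 4 * k + 1."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    # Integer arithmetic that rounds (requested - 1) / 4 to the nearest k, ties up.
    k = (requested + 1) // 4
    return 4 * k + 1


def suggest_shift_range(
    height: int,
    width: int,
    pixel_threshold: int = 480 * 832,  # assumption: cutoff for a "smaller" video
) -> Tuple[float, float]:
    """Return the (low, high) range of `shift` values suggested for a resolution."""
    if height * width <= pixel_threshold:
        return (2.0, 5.0)  # lower shift for smaller resolution videos
    return (7.0, 12.0)  # higher shift for larger resolution videos


if __name__ == "__main__":
    print(nearest_valid_num_frames(80))
    print(suggest_shift_range(720, 1280))
```

Adjust `pixel_threshold` to match the resolutions you actually work with; the ranges are starting points for experimentation rather than fixed rules.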

## WanPipeline

[[autodoc]] WanPipeline
  - all
  - __call__
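
A minimal text-to-video sketch that applies the inference recommendations might look like the following. It makes several assumptions: the checkpoint id is copied from the model loading snippets and is assumed to be in the diffusers layout, a CUDA device is assumed to be available, and the function name, default resolution, and `fps` value are hypothetical. The heavy imports are deferred so that the argument check runs before anything is loaded.

```python
def generate_wan_video(
    prompt: str,
    output_path: str = "output.mp4",
    height: int = 480,
    width: int = 832,
    num_frames: int = 81,
    shift: float = 3.0,
) -> str:
    """Generate a video from a text prompt and return the output path."""
    # Validate early: num_frames should be of the form 4 * k + 1.
    if num_frames < 1 or (num_frames - 1) % 4 != 0:
        raise ValueError("num_frames should be of the form 4 * k + 1")

    # Deferred imports: these require torch and diffusers to be installed.
    import torch
    from diffusers import AutoencoderKLWan, FlowMatchEulerDiscreteScheduler, WanPipeline
    from diffusers.utils import export_to_video

    model_id = "Wan-AI/Wan2.1-T2V-1.3B"  # assumption: same checkpoint as the snippets

    # Keep the VAE in float32 for decoding quality; load the rest in bfloat16.
    vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
    pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)

    # Rebuild the scheduler from its existing config with a different shift.
    pipe.scheduler = FlowMatchEulerDiscreteScheduler.from_config(pipe.scheduler.config, shift=shift)
    pipe.to("cuda")  # assumption: a CUDA device is available

    frames = pipe(prompt=prompt, height=height, width=width, num_frames=num_frames).frames[0]
    export_to_video(frames, output_path, fps=16)  # fps is an illustrative choice
    return output_path
```

Calling `generate_wan_video("A cat walking on grass")` would download the checkpoint on first use, so expect the first run to take considerably longer than later ones.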

## WanImageToVideoPipeline

[[autodoc]] WanImageToVideoPipeline
  - all
  - __call__

## WanPipelineOutput

[[autodoc]] pipelines.wan.pipeline_output.WanPipelineOutput