Commit 50abf50

add docs
1 parent 8aa458e commit 50abf50

3 files changed: +143 -0 lines changed
Lines changed: 36 additions & 0 deletions
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLHunyuanVideo15

The 3D variational autoencoder (VAE) model with KL loss used in [HunyuanVideo-1.5](https://github.com/Tencent/HunyuanVideo1-1.5) by Tencent.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import AutoencoderKLHunyuanVideo15

vae = AutoencoderKLHunyuanVideo15.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v", subfolder="vae", torch_dtype=torch.float32
)

# make sure to enable tiling to avoid OOM when decoding long or high-resolution videos
vae.enable_tiling()
```
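
Once loaded, the VAE can round-trip video tensors through latent space. The sketch below is illustrative only: the input shape is hypothetical, and the `latent_dist`/`sample` accessors are assumptions based on the standard Diffusers autoencoder interface exposed by `encode` and `decode` below.

```python
import torch
from diffusers import AutoencoderKLHunyuanVideo15

vae = AutoencoderKLHunyuanVideo15.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v", subfolder="vae", torch_dtype=torch.float32
)
vae.enable_tiling()

# hypothetical input: 1 video, 3 channels, 9 frames, 256x256 pixels
video = torch.randn(1, 3, 9, 256, 256)

with torch.no_grad():
    latents = vae.encode(video).latent_dist.sample()  # encode and sample a latent
    reconstruction = vae.decode(latents).sample       # decode back to pixel space
```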

## AutoencoderKLHunyuanVideo15

[[autodoc]] AutoencoderKLHunyuanVideo15
- decode
- encode
- all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
Lines changed: 30 additions & 0 deletions
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# HunyuanVideo15Transformer3DModel

A Diffusion Transformer model for 3D video-like data used in [HunyuanVideo-1.5](https://github.com/Tencent/HunyuanVideo1-1.5).

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import HunyuanVideo15Transformer3DModel

transformer = HunyuanVideo15Transformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v", subfolder="transformer", torch_dtype=torch.bfloat16
)
```
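
A separately loaded transformer can also be handed to a pipeline via the usual Diffusers component-override pattern; a minimal sketch, assuming the text-to-video pipeline documented elsewhere in these docs:

```python
import torch
from diffusers import HunyuanVideo15Pipeline, HunyuanVideo15Transformer3DModel

transformer = HunyuanVideo15Transformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v", subfolder="transformer", torch_dtype=torch.bfloat16
)

# pass the preloaded transformer in; the remaining components load from the same checkpoint
pipeline = HunyuanVideo15Pipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
```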

## HunyuanVideo15Transformer3DModel

[[autodoc]] HunyuanVideo15Transformer3DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
Lines changed: 77 additions & 0 deletions
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

<div style="float: right;">
  <div class="flex flex-wrap space-x-1">
    <a href="https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference" target="_blank" rel="noopener">
      <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
    </a>
  </div>
</div>

# HunyuanVideo-1.5

HunyuanVideo-1.5 is a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs. It builds on several key components: meticulous data curation, an advanced DiT architecture with selective and sliding tile attention (SSTA), enhanced bilingual understanding through glyph-aware text encoding, progressive pre-training and post-training, and an efficient video super-resolution network. Together, these designs form a unified framework for high-quality text-to-video and image-to-video generation across multiple durations and resolutions, establishing a new state of the art among open-source video models.

You can find all the original HunyuanVideo-1.5 checkpoints under the [Tencent](https://huggingface.co/tencent) organization.

> [!TIP]
> Click on the HunyuanVideo-1.5 models in the right sidebar for more examples of video generation tasks.
>
> The examples below use a checkpoint from [hunyuanvideo-community](https://huggingface.co/hunyuanvideo-community) because the weights are stored in a layout compatible with Diffusers.

The example below demonstrates how to generate a video while keeping memory usage low.

<hfoptions id="usage">

<hfoption id="memory">

Refer to the [Reduce memory usage](../../optimization/memory) guide for more details about the various memory saving techniques.

```py
import torch
from diffusers import HunyuanVideo15Pipeline
from diffusers.utils import export_to_video

pipeline = HunyuanVideo15Pipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v",
    torch_dtype=torch.bfloat16,
)

# model offloading and VAE tiling to reduce peak memory
pipeline.enable_model_cpu_offload()
pipeline.vae.enable_tiling()

prompt = "A fluffy teddy bear sits on a bed of soft pillows surrounded by children's toys."
video = pipeline(prompt=prompt, num_frames=61, num_inference_steps=30).frames[0]
export_to_video(video, "output.mp4", fps=15)
```
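
If model offloading still exceeds your GPU memory, group offloading streams groups of layers between CPU and GPU so that only a few blocks are resident at a time. The sketch below uses the `apply_group_offloading` hook described in the memory guide linked above; applying it uniformly to every component of this particular pipeline is an assumption, so adjust per component if needed.

```py
import torch
from diffusers import HunyuanVideo15Pipeline
from diffusers.hooks import apply_group_offloading
from diffusers.utils import export_to_video

pipeline = HunyuanVideo15Pipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v",
    torch_dtype=torch.bfloat16,
)

# offload groups of blocks for every torch module in the pipeline (assumed to
# work for all components here); smaller groups trade speed for lower memory
for component in pipeline.components.values():
    if isinstance(component, torch.nn.Module):
        apply_group_offloading(
            component,
            onload_device=torch.device("cuda"),
            offload_device=torch.device("cpu"),
            offload_type="block_level",
            num_blocks_per_group=4,
        )

pipeline.vae.enable_tiling()

prompt = "A fluffy teddy bear sits on a bed of soft pillows surrounded by children's toys."
video = pipeline(prompt=prompt, num_frames=61, num_inference_steps=30).frames[0]
export_to_video(video, "output.mp4", fps=15)
```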

</hfoption>
</hfoptions>

## HunyuanVideo15Pipeline

[[autodoc]] HunyuanVideo15Pipeline
- all
- __call__

## HunyuanVideo15ImageToVideoPipeline

[[autodoc]] HunyuanVideo15ImageToVideoPipeline
- all
- __call__
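
The image-to-video pipeline follows the same calling pattern as the text-to-video example above. A minimal sketch, assuming an i2v checkpoint that mirrors the t2v naming (the repo id here is hypothetical) and the usual Diffusers image-to-video `image` argument:

```py
import torch
from diffusers import HunyuanVideo15ImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# hypothetical i2v checkpoint id, mirroring the t2v naming above
pipeline = HunyuanVideo15ImageToVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v",
    torch_dtype=torch.bfloat16,
)
pipeline.enable_model_cpu_offload()
pipeline.vae.enable_tiling()

image = load_image("first_frame.png")  # conditioning image for the first frame
prompt = "The teddy bear slowly waves at the camera."
video = pipeline(image=image, prompt=prompt, num_frames=61, num_inference_steps=30).frames[0]
export_to_video(video, "i2v_output.mp4", fps=15)
```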
## HunyuanVideo15PipelineOutput
[[autodoc]] pipelines.hunyuan_video1_5.pipeline_output.HunyuanVideo15PipelineOutput
