
Commit a138d71

yiyixuxu and sayakpaul authored
HunyuanImage21 (huggingface#12333)
* add hunyuanimage2.1 --------- Co-authored-by: Sayak Paul <[email protected]>
1 parent bc40398 commit a138d71

40 files changed: +6653, -221 lines

docs/source/en/_toctree.yml

Lines changed: 8 additions & 0 deletions
```diff
@@ -347,6 +347,8 @@
       title: HiDreamImageTransformer2DModel
     - local: api/models/hunyuan_transformer2d
       title: HunyuanDiT2DModel
+    - local: api/models/hunyuanimage_transformer_2d
+      title: HunyuanImageTransformer2DModel
     - local: api/models/hunyuan_video_transformer_3d
       title: HunyuanVideoTransformer3DModel
     - local: api/models/latte_transformer3d
@@ -411,6 +413,10 @@
       title: AutoencoderKLCogVideoX
     - local: api/models/autoencoderkl_cosmos
       title: AutoencoderKLCosmos
+    - local: api/models/autoencoder_kl_hunyuanimage
+      title: AutoencoderKLHunyuanImage
+    - local: api/models/autoencoder_kl_hunyuanimage_refiner
+      title: AutoencoderKLHunyuanImageRefiner
     - local: api/models/autoencoder_kl_hunyuan_video
       title: AutoencoderKLHunyuanVideo
     - local: api/models/autoencoderkl_ltx_video
@@ -620,6 +626,8 @@
       title: ConsisID
     - local: api/pipelines/framepack
       title: Framepack
+    - local: api/pipelines/hunyuanimage21
+      title: HunyuanImage2.1
     - local: api/pipelines/hunyuan_video
       title: HunyuanVideo
     - local: api/pipelines/i2vgenxl
```
docs/source/en/api/models/autoencoder_kl_hunyuanimage.md (new file)

Lines changed: 32 additions & 0 deletions
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLHunyuanImage

The 2D variational autoencoder (VAE) model with KL loss used in [HunyuanImage2.1](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1).

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import AutoencoderKLHunyuanImage

vae = AutoencoderKLHunyuanImage.from_pretrained("hunyuanvideo-community/HunyuanImage-2.1-Diffusers", subfolder="vae", torch_dtype=torch.bfloat16)
```

## AutoencoderKLHunyuanImage

[[autodoc]] AutoencoderKLHunyuanImage
  - decode
  - all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
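The "KL loss" in the model's name is the Kullback-Leibler divergence between the encoder's diagonal Gaussian posterior and a standard normal prior. As a reference sketch of that regularizer in plain NumPy (a generic KL-VAE formula, not tied to the diffusers implementation):

```python
import numpy as np

def kl_to_standard_normal(mean, logvar):
    # KL( N(mean, exp(logvar)) || N(0, I) ) for a diagonal Gaussian,
    # summed over latent dimensions -- the regularizer a KL VAE trains with.
    return 0.5 * np.sum(np.square(mean) + np.exp(logvar) - 1.0 - logvar)

# A posterior that already matches the prior contributes zero KL.
print(kl_to_standard_normal(np.zeros(4), np.zeros(4)))  # 0.0
```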
docs/source/en/api/models/autoencoder_kl_hunyuanimage_refiner.md (new file)

Lines changed: 32 additions & 0 deletions
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLHunyuanImageRefiner

The 3D variational autoencoder (VAE) model with KL loss used in [HunyuanImage2.1](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1) for its refiner pipeline.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import AutoencoderKLHunyuanImageRefiner

vae = AutoencoderKLHunyuanImageRefiner.from_pretrained("hunyuanvideo-community/HunyuanImage-2.1-Refiner-Diffusers", subfolder="vae", torch_dtype=torch.bfloat16)
```

## AutoencoderKLHunyuanImageRefiner

[[autodoc]] AutoencoderKLHunyuanImageRefiner
  - decode
  - all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
docs/source/en/api/models/hunyuanimage_transformer_2d.md (new file)

Lines changed: 30 additions & 0 deletions
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# HunyuanImageTransformer2DModel

A Diffusion Transformer model for [HunyuanImage2.1](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1).

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import HunyuanImageTransformer2DModel

transformer = HunyuanImageTransformer2DModel.from_pretrained("hunyuanvideo-community/HunyuanImage-2.1-Diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
```

## HunyuanImageTransformer2DModel

[[autodoc]] HunyuanImageTransformer2DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
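As background for the transformer's input layout: a Diffusion Transformer typically patchifies the 2D latent into a sequence of tokens before attention. The sketch below illustrates that layout in plain NumPy; the patch size and channel count are illustrative placeholders, not this model's actual configuration.

```python
import numpy as np

def patchify(latents, patch_size=2):
    # Split a (C, H, W) latent into non-overlapping patches and flatten each
    # patch into one token -- the standard DiT input layout.
    c, h, w = latents.shape
    ph, pw = h // patch_size, w // patch_size
    x = latents.reshape(c, ph, patch_size, pw, patch_size)
    # Reorder so each (patch_row, patch_col) becomes a token of size C*P*P.
    x = x.transpose(1, 3, 0, 2, 4).reshape(ph * pw, c * patch_size * patch_size)
    return x

tokens = patchify(np.zeros((16, 64, 64)))
print(tokens.shape)  # (1024, 64)
```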
docs/source/en/api/pipelines/hunyuanimage21.md (new file)

Lines changed: 152 additions & 0 deletions
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

# HunyuanImage2.1

HunyuanImage-2.1 is a 17B text-to-image model capable of generating 2K (2048 x 2048) resolution images.

HunyuanImage-2.1 comes in the following variants:

| model type | model id |
|:----------:|:--------:|
| HunyuanImage-2.1 | [hunyuanvideo-community/HunyuanImage-2.1-Diffusers](https://huggingface.co/hunyuanvideo-community/HunyuanImage-2.1-Diffusers) |
| HunyuanImage-2.1-Distilled | [hunyuanvideo-community/HunyuanImage-2.1-Distilled-Diffusers](https://huggingface.co/hunyuanvideo-community/HunyuanImage-2.1-Distilled-Diffusers) |
| HunyuanImage-2.1-Refiner | [hunyuanvideo-community/HunyuanImage-2.1-Refiner-Diffusers](https://huggingface.co/hunyuanvideo-community/HunyuanImage-2.1-Refiner-Diffusers) |

> [!TIP]
> [Caching](../../optimization/cache) may also speed up inference by storing and reusing intermediate outputs.

## HunyuanImage-2.1

HunyuanImage-2.1 applies [Adaptive Projected Guidance (APG)](https://huggingface.co/papers/2410.02416) combined with Classifier-Free Guidance (CFG) in the denoising loop. `HunyuanImagePipeline` has a `guider` component (read more about [Guider](../modular_diffusers/guiders.md)) and does not take a `guidance_scale` parameter at runtime. To change guider-related parameters such as `guidance_scale`, update the `guider` configuration instead.
```python
import torch
from diffusers import HunyuanImagePipeline

pipe = HunyuanImagePipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanImage-2.1-Diffusers",
    torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")
```

You can inspect the `guider` object:

```py
>>> pipe.guider
AdaptiveProjectedMixGuidance {
  "_class_name": "AdaptiveProjectedMixGuidance",
  "_diffusers_version": "0.36.0.dev0",
  "adaptive_projected_guidance_momentum": -0.5,
  "adaptive_projected_guidance_rescale": 10.0,
  "adaptive_projected_guidance_scale": 10.0,
  "adaptive_projected_guidance_start_step": 5,
  "enabled": true,
  "eta": 0.0,
  "guidance_rescale": 0.0,
  "guidance_scale": 3.5,
  "start": 0.0,
  "stop": 1.0,
  "use_original_formulation": false
}

State:
  step: None
  num_inference_steps: None
  timestep: None
  count_prepared: 0
  enabled: True
  num_conditions: 2
  momentum_buffer: None
  is_apg_enabled: False
  is_cfg_enabled: True
```
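For intuition about what the guider computes: APG modifies plain CFG by rescaling the guidance difference and splitting it into components parallel and orthogonal to the conditional prediction, with `eta` downweighting the parallel part (which the APG paper links to oversaturation). The following is a rough NumPy sketch of that idea only; it is not the diffusers implementation, and the momentum term is omitted.

```python
import numpy as np

def apg_update(cond, uncond, guidance_scale=3.5, eta=0.0, rescale_norm=10.0):
    # Guidance difference, as in plain CFG.
    diff = cond - uncond
    # Rescale so the difference's norm never exceeds rescale_norm.
    norm = np.linalg.norm(diff)
    if norm > rescale_norm:
        diff = diff * (rescale_norm / norm)
    # Split into components parallel and orthogonal to the conditional prediction.
    unit = cond / np.linalg.norm(cond)
    parallel = np.dot(diff, unit) * unit
    orthogonal = diff - parallel
    # eta scales down the parallel part; eta=0 keeps only the orthogonal part.
    return cond + (guidance_scale - 1.0) * (eta * parallel + orthogonal)
```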

To update the guider with a different configuration, use the `new()` method. For example, to generate an image with `guidance_scale=5.0` while keeping all other default guidance parameters:

```py
import torch
from diffusers import HunyuanImagePipeline

pipe = HunyuanImagePipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanImage-2.1-Diffusers",
    torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")

# Update the guider configuration
pipe.guider = pipe.guider.new(guidance_scale=5.0)

prompt = (
    "A cute, cartoon-style anthropomorphic penguin plush toy with fluffy fur, standing in a painting studio, "
    "wearing a red knitted scarf and a red beret with the word 'Tencent' on it, holding a paintbrush with a "
    "focused expression as it paints an oil painting of the Mona Lisa, rendered in a photorealistic photographic style."
)

image = pipe(
    prompt=prompt,
    num_inference_steps=50,
    height=2048,
    width=2048,
).images[0]
image.save("image.png")
```
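A note on the design: `new()` does not mutate the existing guider; it returns a fresh one with the given fields overridden, which is why the result is assigned back to `pipe.guider`. A minimal stand-in for that copy-with-overrides pattern (the `GuiderConfig` class below is hypothetical, purely to illustrate the pattern):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class GuiderConfig:
    # Hypothetical stand-in; field names echo the printed guider config above.
    guidance_scale: float = 3.5
    adaptive_projected_guidance_scale: float = 10.0

    def new(self, **overrides):
        # Return a copy with the given fields replaced; self is untouched.
        return replace(self, **overrides)

base = GuiderConfig()
updated = base.new(guidance_scale=5.0)
print(base.guidance_scale, updated.guidance_scale)  # 3.5 5.0
```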

## HunyuanImage-2.1-Distilled

Use `distilled_guidance_scale` with the guidance-distilled checkpoint:

```py
import torch
from diffusers import HunyuanImagePipeline

pipe = HunyuanImagePipeline.from_pretrained("hunyuanvideo-community/HunyuanImage-2.1-Distilled-Diffusers", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

prompt = (
    "A cute, cartoon-style anthropomorphic penguin plush toy with fluffy fur, standing in a painting studio, "
    "wearing a red knitted scarf and a red beret with the word 'Tencent' on it, holding a paintbrush with a "
    "focused expression as it paints an oil painting of the Mona Lisa, rendered in a photorealistic photographic style."
)

generator = torch.Generator("cuda").manual_seed(0)  # fixed seed for reproducibility
out = pipe(
    prompt,
    num_inference_steps=8,
    distilled_guidance_scale=3.25,
    height=2048,
    width=2048,
    generator=generator,
).images[0]
```

## HunyuanImagePipeline

[[autodoc]] HunyuanImagePipeline
  - all
  - __call__

## HunyuanImageRefinerPipeline

[[autodoc]] HunyuanImageRefinerPipeline
  - all
  - __call__

## HunyuanImagePipelineOutput

[[autodoc]] pipelines.hunyuan_image.pipeline_output.HunyuanImagePipelineOutput
