Merged
Changes from 47 commits
61 commits
415805b
initial model
yiyixuxu Sep 14, 2025
feb29c3
add vae
yiyixuxu Sep 16, 2025
bb8f753
style
yiyixuxu Sep 16, 2025
a9def70
add pipeline
yiyixuxu Sep 19, 2025
3287f4b
style
yiyixuxu Sep 19, 2025
7e0311d
add import
yiyixuxu Sep 19, 2025
9938cbb
add
yiyixuxu Sep 19, 2025
cceae4a
add refiner vae
yiyixuxu Sep 22, 2025
419c99d
remove more rearrange
yiyixuxu Sep 22, 2025
02864b5
remove einops
yiyixuxu Sep 22, 2025
790aeff
make style
yiyixuxu Sep 22, 2025
aef133d
add refiner pipeline, not tested yet
yiyixuxu Sep 22, 2025
9e8b94a
up
yiyixuxu Sep 22, 2025
58514f5
fix a bug in vae
yiyixuxu Sep 23, 2025
a9b8b8c
remove more eiops
yiyixuxu Sep 23, 2025
fd5c8b1
ffactor_spatial -> spatial_compression_ratio
yiyixuxu Sep 23, 2025
d30dc2a
work with distilled
yiyixuxu Sep 23, 2025
062c21c
fix imports
yiyixuxu Sep 24, 2025
fb6d99e
add conversion script
yiyixuxu Sep 24, 2025
75ed404
Merge branch 'main' into hunyuan21
yiyixuxu Sep 24, 2025
45da288
copies
yiyixuxu Sep 24, 2025
f9500a5
Merge branch 'hunyuan21' of github.com:huggingface/diffusers into hun…
yiyixuxu Sep 24, 2025
894f148
add guider support
yiyixuxu Oct 13, 2025
5d96356
add apg_mix
yiyixuxu Oct 14, 2025
55ac631
style
yiyixuxu Oct 14, 2025
cf93a8b
up up
yiyixuxu Oct 14, 2025
3499bbf
Update src/diffusers/models/autoencoders/autoencoder_kl_hunyuanimage_…
yiyixuxu Oct 14, 2025
46cda84
update transformer: name, maybe_allow_in_graph
yiyixuxu Oct 14, 2025
4e22f0f
style
yiyixuxu Oct 14, 2025
64cb88d
copies
yiyixuxu Oct 14, 2025
184d312
Merge branch 'hunyuan21' of github.com:huggingface/diffusers into hun…
yiyixuxu Oct 14, 2025
67a721c
remove rearrange
yiyixuxu Oct 14, 2025
69b0fc0
style
yiyixuxu Oct 14, 2025
689566d
Merge branch 'main' into hunyuan21
yiyixuxu Oct 14, 2025
0a2f56b
up
yiyixuxu Oct 15, 2025
1e045af
Merge branch 'hunyuan21' of github.com:huggingface/diffusers into hun…
yiyixuxu Oct 15, 2025
b9fd002
add distilled_guidance_scale to adp
yiyixuxu Oct 15, 2025
012b40d
style
yiyixuxu Oct 15, 2025
ec3290d
fix
yiyixuxu Oct 15, 2025
1792aab
update guider: remove distilled guidannce scale, simplify prepare_inputs
yiyixuxu Oct 23, 2025
3e12970
update pipeline, remove true_cfg_scale etc
yiyixuxu Oct 23, 2025
81d3247
update docstring example
yiyixuxu Oct 23, 2025
02ad165
add doc!
yiyixuxu Oct 23, 2025
b9de16b
style
yiyixuxu Oct 23, 2025
0ca5320
dispatch_attention_fn
yiyixuxu Oct 23, 2025
a36a8c2
Merge branch 'main' into hunyuan21
yiyixuxu Oct 23, 2025
9dc67a0
MomentumBuffer copied from
yiyixuxu Oct 23, 2025
0dbab1f
Update docs/source/en/api/pipelines/hunyuanimage21.md
yiyixuxu Oct 23, 2025
de92bb1
register hyimage
yiyixuxu Oct 23, 2025
8c140cb
remove a hadrcoded 1472
yiyixuxu Oct 23, 2025
f47a855
fix vae_tiling 5d -> 4d
yiyixuxu Oct 23, 2025
8f98b9b
add tests
yiyixuxu Oct 23, 2025
f12ca4f
style
yiyixuxu Oct 23, 2025
ddae49b
Merge branch 'hunyuan21' of github.com:huggingface/diffusers into hun…
yiyixuxu Oct 23, 2025
95fabd1
Apply suggestions from code review
yiyixuxu Oct 23, 2025
bfebde3
fix doc toctree and style
yiyixuxu Oct 23, 2025
f1e8296
update modular x guider
yiyixuxu Oct 24, 2025
6336c19
update tests
yiyixuxu Oct 24, 2025
bd6b3d3
fix
yiyixuxu Oct 24, 2025
e7a8a0c
fix
yiyixuxu Oct 24, 2025
8764ca3
up
yiyixuxu Oct 24, 2025
32 changes: 32 additions & 0 deletions docs/source/en/api/models/autoencoder_kl_hunyuanimage.md
@@ -0,0 +1,32 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLHunyuanImage

The 2D variational autoencoder (VAE) model with KL loss used in [HunyuanImage2.1](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1).

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import AutoencoderKLHunyuanImage

vae = AutoencoderKLHunyuanImage.from_pretrained("hunyuanvideo-community/HunyuanImage-2.1-Diffusers", subfolder="vae", torch_dtype=torch.bfloat16)
```

## AutoencoderKLHunyuanImage

[[autodoc]] AutoencoderKLHunyuanImage
- decode
- all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
32 changes: 32 additions & 0 deletions docs/source/en/api/models/autoencoder_kl_hunyuanimage_refiner.md
@@ -0,0 +1,32 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLHunyuanImageRefiner

The 3D variational autoencoder (VAE) model with KL loss used in [HunyuanImage2.1](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1) for its refiner pipeline.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import AutoencoderKLHunyuanImageRefiner

vae = AutoencoderKLHunyuanImageRefiner.from_pretrained("hunyuanvideo-community/HunyuanImage-2.1-Refiner-Diffusers", subfolder="vae", torch_dtype=torch.bfloat16)
```

## AutoencoderKLHunyuanImageRefiner

[[autodoc]] AutoencoderKLHunyuanImageRefiner
- decode
- all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
30 changes: 30 additions & 0 deletions docs/source/en/api/models/hunyuanimage_transformer_2d.md
@@ -0,0 +1,30 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# HunyuanImageTransformer2DModel

A Diffusion Transformer model for [HunyuanImage2.1](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1).

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import HunyuanImageTransformer2DModel

transformer = HunyuanImageTransformer2DModel.from_pretrained("hunyuanvideo-community/HunyuanImage-2.1-Diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
```

## HunyuanImageTransformer2DModel

[[autodoc]] HunyuanImageTransformer2DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
152 changes: 152 additions & 0 deletions docs/source/en/api/pipelines/hunyuanimage21.md
@@ -0,0 +1,152 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

# HunyuanImage2.1


HunyuanImage-2.1 is a 17B text-to-image model capable of generating 2K (2048 x 2048) resolution images.
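
As a rough sizing aid, the sketch below estimates how much memory the weights alone would occupy at the `bfloat16` precision used in the loading examples. It assumes the quoted 17B figure is the number of parameters held in memory; activations, the VAE, and text encoders are not accounted for, and the helper name `weight_memory_gib` is purely illustrative.

```python
# Back-of-the-envelope estimate. The 17B figure is taken from the sentence
# above; treating it as the in-memory parameter count is an assumption.
PARAMS = 17e9

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(PARAMS, dtype):.1f} GiB")
```

Under these assumptions the `bfloat16` weights come to a little under 32 GiB, half of what `float32` would need.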

HunyuanImage-2.1 comes in the following variants:

| model type | model id |
|:----------:|:--------:|
| HunyuanImage-2.1 | [hunyuanvideo-community/HunyuanImage-2.1-Diffusers](https://huggingface.co/hunyuanvideo-community/HunyuanImage-2.1-Diffusers) |
| HunyuanImage-2.1-Distilled | [hunyuanvideo-community/HunyuanImage-2.1-Distilled-Diffusers](https://huggingface.co/hunyuanvideo-community/HunyuanImage-2.1-Distilled-Diffusers) |
| HunyuanImage-2.1-Refiner | [hunyuanvideo-community/HunyuanImage-2.1-Refiner-Diffusers](https://huggingface.co/hunyuanvideo-community/HunyuanImage-2.1-Refiner-Diffusers) |

> [!TIP]
> [Caching](../../optimization/cache) may also speed up inference by storing and reusing intermediate outputs.

## HunyuanImage-2.1

HunyuanImage-2.1 applies [Adaptive Projected Guidance (APG)](https://huggingface.co/papers/2410.02416) combined with Classifier-Free Guidance (CFG) in the denoising loop. `HunyuanImagePipeline` has a `guider` component (read more about [Guider](../modular_diffusers/guiders.md)) and does not take a `guidance_scale` parameter at runtime. To change guider-related parameters, e.g., `guidance_scale`, you can update the `guider` configuration instead.

```python
import torch
from diffusers import HunyuanImagePipeline

pipe = HunyuanImagePipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanImage-2.1-Diffusers",
    torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")
```

You can inspect the `guider` object:

```py
>>> pipe.guider
AdaptiveProjectedMixGuidance {
"_class_name": "AdaptiveProjectedMixGuidance",
"_diffusers_version": "0.36.0.dev0",
"adaptive_projected_guidance_momentum": -0.5,
"adaptive_projected_guidance_rescale": 10.0,
"adaptive_projected_guidance_scale": 10.0,
"adaptive_projected_guidance_start_step": 5,
"enabled": true,
"eta": 0.0,
"guidance_rescale": 0.0,
"guidance_scale": 3.5,
"start": 0.0,
"stop": 1.0,
"use_original_formulation": false
}

State:
step: None
num_inference_steps: None
timestep: None
count_prepared: 0
enabled: True
num_conditions: 2
momentum_buffer: None
is_apg_enabled: False
is_cfg_enabled: True
```
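
To give a feel for what these parameters control, here is a minimal, framework-free sketch of the two guidance rules named above. It follows the general structure described in the APG paper and is not the `AdaptiveProjectedMixGuidance` implementation: momentum is omitted, the function names are illustrative, and the assumption that `eta` and `adaptive_projected_guidance_rescale` correspond to the `eta` and `norm_threshold` arguments below has not been checked against the library.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cfg(cond, uncond, guidance_scale):
    """Classifier-free guidance: extrapolate from uncond towards cond."""
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


def apg(cond, uncond, guidance_scale, eta=0.0, norm_threshold=0.0):
    """Projected guidance in the spirit of APG (momentum omitted)."""
    diff = [c - u for c, u in zip(cond, uncond)]

    # Optionally cap the norm of the guidance direction.
    if norm_threshold > 0:
        norm = math.sqrt(dot(diff, diff))
        scale = min(1.0, norm_threshold / norm) if norm > 0 else 1.0
        diff = [scale * d for d in diff]

    # Split diff into parts parallel and orthogonal to the conditional prediction.
    cond_sq = dot(cond, cond)
    coeff = dot(diff, cond) / cond_sq if cond_sq > 0 else 0.0
    parallel = [coeff * c for c in cond]
    orthogonal = [d - p for d, p in zip(diff, parallel)]

    # eta scales the parallel part; eta=0 keeps only the orthogonal part.
    update = [o + eta * p for o, p in zip(orthogonal, parallel)]
    return [c + (guidance_scale - 1.0) * u for c, u in zip(cond, update)]


# Toy "predictions" standing in for model outputs.
cond = [1.0, 2.0, 0.5]
uncond = [0.5, 1.0, 1.5]

print(cfg(cond, uncond, 3.5))
print(apg(cond, uncond, 3.5, eta=0.0, norm_threshold=10.0))
```

With `eta=1.0` and no norm cap, this sketch reduces to plain CFG; with `eta=0.0`, the guidance update is orthogonal to the conditional prediction.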

To update the guider with a different configuration, use the `new()` method. For example, to generate an image with `guidance_scale=5.0` while keeping all other default guidance parameters:

```py
import torch
from diffusers import HunyuanImagePipeline

pipe = HunyuanImagePipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanImage-2.1-Diffusers",
    torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")

# Update the guider configuration
pipe.guider = pipe.guider.new(guidance_scale=5.0)

prompt = (
    "A cute, cartoon-style anthropomorphic penguin plush toy with fluffy fur, standing in a painting studio, "
    "wearing a red knitted scarf and a red beret with the word 'Tencent' on it, holding a paintbrush with a "
    "focused expression as it paints an oil painting of the Mona Lisa, rendered in a photorealistic photographic style."
)

image = pipe(
    prompt=prompt,
    num_inference_steps=50,
    height=2048,
    width=2048,
).images[0]
image.save("image.png")
```
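
The "copy with overrides" behavior described for `new()` can be pictured with a small standalone analogy. `ToyGuiderConfig` below is a hypothetical stand-in written for illustration, not a diffusers class, and the real guider may be implemented differently; the sketch only shows the idea that overridden fields change while the remaining defaults carry over to a fresh object.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyGuiderConfig:
    """Hypothetical stand-in for a guider configuration (not a diffusers class)."""

    guidance_scale: float = 3.5
    eta: float = 0.0
    start: float = 0.0
    stop: float = 1.0

    def new(self, **overrides):
        # Return a fresh object: overridden fields change, the rest are copied.
        return replace(self, **overrides)


default = ToyGuiderConfig()
updated = default.new(guidance_scale=5.0)

print(default.guidance_scale, updated.guidance_scale)
print(updated.eta == default.eta)
```

In this toy version the original object is left untouched, which is why the pipeline example assigns the result back to `pipe.guider`.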


## HunyuanImage-2.1-Distilled

Use `distilled_guidance_scale` with the guidance-distilled checkpoint:

```py
import torch
from diffusers import HunyuanImagePipeline

pipe = HunyuanImagePipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanImage-2.1-Distilled-Diffusers", torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")

# Seeded generator for reproducible sampling; the seed value is arbitrary.
generator = torch.Generator(device="cuda").manual_seed(0)

prompt = (
    "A cute, cartoon-style anthropomorphic penguin plush toy with fluffy fur, standing in a painting studio, "
    "wearing a red knitted scarf and a red beret with the word 'Tencent' on it, holding a paintbrush with a "
    "focused expression as it paints an oil painting of the Mona Lisa, rendered in a photorealistic photographic style."
)

out = pipe(
    prompt,
    num_inference_steps=8,
    distilled_guidance_scale=3.25,
    height=2048,
    width=2048,
    generator=generator,
).images[0]
```


## HunyuanImagePipeline

[[autodoc]] HunyuanImagePipeline
- all
- __call__

## HunyuanImageRefinerPipeline

[[autodoc]] HunyuanImageRefinerPipeline
- all
- __call__


## HunyuanImagePipelineOutput

[[autodoc]] pipelines.hunyuan_image.pipeline_output.HunyuanImagePipelineOutput