
Commit d978f18

Merge branch 'main' into rejig-peft-state-dict-kohya

2 parents: 1dff5a8 + aeac0a0
File tree: 118 files changed (+2499 / −200 lines)


.github/workflows/push_tests.yml

Lines changed: 2 additions & 2 deletions
@@ -83,7 +83,7 @@ jobs:
           python utils/print_env.py
       - name: PyTorch CUDA checkpoint tests on Ubuntu
         env:
-          HF_TOKEN: ${{ secrets.HF_TOKEN }}
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
           # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
           CUBLAS_WORKSPACE_CONFIG: :16:8
         run: |

@@ -137,7 +137,7 @@ jobs:

       - name: Run PyTorch CUDA tests
         env:
-          HF_TOKEN: ${{ secrets.HF_TOKEN }}
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
           # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
           CUBLAS_WORKSPACE_CONFIG: :16:8
         run: |

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -179,6 +179,8 @@
     title: TGATE
   - local: optimization/xdit
     title: xDiT
+  - local: optimization/para_attn
+    title: ParaAttention
 - sections:
   - local: using-diffusers/stable_diffusion_jax_how_to
     title: JAX/Flax

docs/source/en/api/pipelines/flux.md

Lines changed: 1 addition & 1 deletion
@@ -367,7 +367,7 @@ transformer_8bit = FluxTransformer2DModel.from_pretrained(

 pipeline = FluxPipeline.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
-    text_encoder=text_encoder_8bit,
+    text_encoder_2=text_encoder_8bit,
     transformer=transformer_8bit,
     torch_dtype=torch.float16,
     device_map="balanced",
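
For reference, the surrounding flux.md example quantizes the T5 text encoder and the Flux transformer to 8-bit with bitsandbytes; the corrected keyword matters because text_encoder is FLUX's CLIP encoder, while the quantized T5 model is the second encoder. A minimal sketch of the corrected usage follows; the quantization of the text encoder is reconstructed from the docs context and should be read as illustrative rather than verbatim:

import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from diffusers import FluxPipeline, FluxTransformer2DModel
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from transformers import T5EncoderModel

# 8-bit configs: transformers quantizes the text encoder, diffusers quantizes the transformer
quant_config_te = TransformersBitsAndBytesConfig(load_in_8bit=True)
quant_config_tf = DiffusersBitsAndBytesConfig(load_in_8bit=True)

text_encoder_8bit = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    quantization_config=quant_config_te,
    torch_dtype=torch.float16,
)

transformer_8bit = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config_tf,
    torch_dtype=torch.float16,
)

# The quantized T5 model is the second text encoder, hence text_encoder_2
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder_2=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)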

docs/source/en/api/pipelines/hunyuan_video.md

Lines changed: 4 additions & 4 deletions
@@ -16,7 +16,7 @@

 [HunyuanVideo](https://www.arxiv.org/abs/2412.03603) by Tencent.

-*Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including data curation, advanced architectural design, progressive model scaling and training, and an efficient infrastructure tailored for large-scale model training and inference. As a result, we successfully trained a video generative model with over 13 billion parameters, making it the largest among all open-source models. We conducted extensive experiments and implemented a series of targeted designs to ensure high visual quality, motion dynamics, text-video alignment, and advanced filming techniques. According to evaluations by professionals, HunyuanVideo outperforms previous state-of-the-art models, including Runway Gen-3, Luma 1.6, and three top-performing Chinese video generative models. By releasing the code for the foundation model and its applications, we aim to bridge the gap between closed-source and open-source communities. This initiative will empower individuals within the community to experiment with their ideas, fostering a more dynamic and vibrant video generation ecosystem. The code is publicly available at [this https URL](https://github.com/Tencent/HunyuanVideo).*
+*Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including data curation, advanced architectural design, progressive model scaling and training, and an efficient infrastructure tailored for large-scale model training and inference. As a result, we successfully trained a video generative model with over 13 billion parameters, making it the largest among all open-source models. We conducted extensive experiments and implemented a series of targeted designs to ensure high visual quality, motion dynamics, text-video alignment, and advanced filming techniques. According to evaluations by professionals, HunyuanVideo outperforms previous state-of-the-art models, including Runway Gen-3, Luma 1.6, and three top-performing Chinese video generative models. By releasing the code for the foundation model and its applications, we aim to bridge the gap between closed-source and open-source communities. This initiative will empower individuals within the community to experiment with their ideas, fostering a more dynamic and vibrant video generation ecosystem. The code is publicly available at [this https URL](https://github.com/tencent/HunyuanVideo).*

 <Tip>

@@ -45,14 +45,14 @@ from diffusers.utils import export_to_video

 quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
 transformer_8bit = HunyuanVideoTransformer3DModel.from_pretrained(
-    "tencent/HunyuanVideo",
+    "hunyuanvideo-community/HunyuanVideo",
     subfolder="transformer",
     quantization_config=quant_config,
-    torch_dtype=torch.float16,
+    torch_dtype=torch.bfloat16,
 )

 pipeline = HunyuanVideoPipeline.from_pretrained(
-    "tencent/HunyuanVideo",
+    "hunyuanvideo-community/HunyuanVideo",
     transformer=transformer_8bit,
     torch_dtype=torch.float16,
     device_map="balanced",
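
For reference, the change above points the hunyuan_video.md example at the Diffusers-format hunyuanvideo-community/HunyuanVideo repository and loads the 8-bit transformer in bfloat16. A minimal sketch of the corrected example follows; the prompt and generation arguments are illustrative additions, not part of the diff:

import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

# Quantize only the 13B transformer to 8-bit to reduce memory use
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = HunyuanVideoTransformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipeline = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

# Illustrative generation call; values chosen for a short clip
prompt = "A cat walks on the grass, realistic style."
video = pipeline(prompt=prompt, num_frames=61, num_inference_steps=30).frames[0]
export_to_video(video, "cat.mp4", fps=15)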

docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ from diffusers import StableDiffusion3Pipeline
 from transformers import SiglipVisionModel, SiglipImageProcessor

 image_encoder_id = "google/siglip-so400m-patch14-384"
-ip_adapter_id = "InstantX/SD3.5-Large-IP-Adapter"
+ip_adapter_id = "guiyrt/InstantX-SD3.5-Large-IP-Adapter-diffusers"

 feature_extractor = SiglipImageProcessor.from_pretrained(
     image_encoder_id,
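
For reference, the new repository id is a Diffusers-format export of the InstantX SD3.5-Large IP-Adapter. A minimal sketch of how the surrounding stable_diffusion_3.md example wires the SigLIP image encoder and this IP-Adapter into the pipeline; the base checkpoint and the adapter scale are assumptions taken from the docs context, not from this diff:

import torch
from diffusers import StableDiffusion3Pipeline
from transformers import SiglipImageProcessor, SiglipVisionModel

image_encoder_id = "google/siglip-so400m-patch14-384"
ip_adapter_id = "guiyrt/InstantX-SD3.5-Large-IP-Adapter-diffusers"

# SigLIP supplies the image features that the IP-Adapter conditions on
feature_extractor = SiglipImageProcessor.from_pretrained(image_encoder_id)
image_encoder = SiglipVisionModel.from_pretrained(
    image_encoder_id,
    torch_dtype=torch.float16,
)

# Assumed base checkpoint for the SD3.5-Large adapter
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.float16,
    feature_extractor=feature_extractor,
    image_encoder=image_encoder,
).to("cuda")

# Load the Diffusers-format IP-Adapter weights and set its influence (0.6 is illustrative)
pipe.load_ip_adapter(ip_adapter_id)
pipe.set_ip_adapter_scale(0.6)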
