Commit 9b411e5

Merge branch 'main' into layerwise-upcasting
2 parents b366b22 + 0c1e63b commit 9b411e5

File tree: 64 files changed (+3472 / -229 lines)


docs/source/en/api/pipelines/cogvideox.md

Lines changed: 20 additions & 23 deletions
````diff
@@ -15,9 +15,7 @@
 
 # CogVideoX
 
-<!-- TODO: update paper with ArXiv link when ready. -->
-
-[CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer](https://github.com/THUDM/CogVideo/blob/main/resources/CogVideoX.pdf) from Tsinghua University & ZhipuAI.
+[CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer](https://arxiv.org/abs/2408.06072) from Tsinghua University & ZhipuAI, by Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong, Jie Tang.
 
 The abstract from the paper is:
 
@@ -43,43 +41,42 @@ from diffusers import CogVideoXPipeline
 from diffusers.utils import export_to_video
 
 pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b").to("cuda")
-prompt = (
-    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
-    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
-    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
-    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
-    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
-    "atmosphere of this unique musical performance."
-)
-video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
-export_to_video(video, "output.mp4", fps=8)
 ```
 
-Then change the memory layout of the pipelines `transformer` and `vae` components to `torch.channels-last`:
+Then change the memory layout of the pipelines `transformer` component to `torch.channels_last`:
 
 ```python
-pipeline.transformer.to(memory_format=torch.channels_last)
-pipeline.vae.to(memory_format=torch.channels_last)
+pipe.transformer.to(memory_format=torch.channels_last)
 ```
 
 Finally, compile the components and run inference:
 
 ```python
-pipeline.transformer = torch.compile(pipeline.transformer)
-pipeline.vae.decode = torch.compile(pipeline.vae.decode)
+pipe.transformer = torch.compile(pipeline.transformer, mode="max-autotune", fullgraph=True)
 
-# CogVideoX works very well with long and well-described prompts
+# CogVideoX works well with long and well-described prompts
 prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."
-video = pipeline(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
+video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
 ```
 
-The [benchmark](TODO: link) results on an 80GB A100 machine are:
+The [benchmark](https://gist.github.com/a-r-r-o-w/5183d75e452a368fd17448fcc810bd3f) results on an 80GB A100 machine are:
 
 ```
-Without torch.compile(): Average inference time: TODO seconds.
-With torch.compile(): Average inference time: TODO seconds.
+Without torch.compile(): Average inference time: 96.89 seconds.
+With torch.compile(): Average inference time: 76.27 seconds.
 ```
 
+### Memory optimization
+
+CogVideoX requires about 19 GB of GPU memory to decode 49 frames (6 seconds of video at 8 FPS) with output resolution 720x480 (W x H), which makes it not possible to run on consumer GPUs or free-tier T4 Colab. The following memory optimizations could be used to reduce the memory footprint. For replication, you can refer to [this](https://gist.github.com/a-r-r-o-w/3959a03f15be5c9bd1fe545b09dfcc93) script.
+
+- `pipe.enable_model_cpu_offload()`:
+  - Without enabling cpu offloading, memory usage is `33 GB`
+  - With enabling cpu offloading, memory usage is `19 GB`
+- `pipe.vae.enable_tiling()`:
+  - With enabling cpu offloading and tiling, memory usage is `11 GB`
+- `pipe.vae.enable_slicing()`
+
 ## CogVideoXPipeline
 
 [[autodoc]] CogVideoXPipeline
````
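The updated CogVideoX page above keeps the speed advice (`channels_last` plus `torch.compile`) and the new memory advice in separate sections. A minimal sketch of how the two could sit in one script, not part of the commit: it assumes the `THUDM/CogVideoX-2b` checkpoint in fp16, and it uses `pipe` throughout, whereas the added doc line mixes `pipe` and `pipeline` in the `torch.compile` call.

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# fp16 weights are an assumption here, not something the diff above prescribes.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16).to("cuda")

# Speed path from the doc: channels_last memory layout, then compile the transformer.
pipe.transformer.to(memory_format=torch.channels_last)
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)

# Memory path from the new "Memory optimization" section, for smaller GPUs
# (typically used instead of moving the whole pipeline to CUDA above):
# pipe.enable_model_cpu_offload()   # ~33 GB -> ~19 GB per the doc
# pipe.vae.enable_tiling()          # with offloading: ~11 GB per the doc
# pipe.vae.enable_slicing()

# CogVideoX works well with long and well-described prompts; shortened here.
prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest."
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
export_to_video(video, "output.mp4", fps=8)
```

Per the doc's own benchmark, the compile path cuts average inference time from 96.89 s to 76.27 s on an 80GB A100, while the offload and tiling path is what brings memory usage down to about 11 GB.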

docs/source/en/api/pipelines/controlnet_sd3.md

Lines changed: 16 additions & 2 deletions
````diff
@@ -1,4 +1,4 @@
-<!--Copyright 2023 The HuggingFace Team and The InstantX Team. All rights reserved.
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
 
 Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at
@@ -22,7 +22,16 @@ The abstract from the paper is:
 
 *We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
 
-This code is implemented by [The InstantX Team](https://huggingface.co/InstantX). You can find pre-trained checkpoints for SD3-ControlNet on [The InstantX Team](https://huggingface.co/InstantX) Hub profile.
+This controlnet code is mainly implemented by [The InstantX Team](https://huggingface.co/InstantX). The inpainting-related code was developed by [The Alimama Creative Team](https://huggingface.co/alimama-creative). You can find pre-trained checkpoints for SD3-ControlNet in the table below:
+
+
+| ControlNet type | Developer | Link |
+| -------- | ---------- | ---- |
+| Canny | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/SD3-Controlnet-Canny) |
+| Pose | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/SD3-Controlnet-Pose) |
+| Tile | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/SD3-Controlnet-Tile) |
+| Inpainting | [The AlimamaCreative Team](https://huggingface.co/alimama-creative) | [link](https://huggingface.co/alimama-creative/SD3-Controlnet-Inpainting) |
+
 
 <Tip>
 
@@ -35,5 +44,10 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
 - all
 - __call__
 
+## StableDiffusion3ControlNetInpaintingPipeline
+[[autodoc]] pipelines.controlnet_sd3.pipeline_stable_diffusion_3_controlnet_inpainting.StableDiffusion3ControlNetInpaintingPipeline
+- all
+- __call__
+
 ## StableDiffusion3PipelineOutput
 [[autodoc]] pipelines.stable_diffusion_3.pipeline_output.StableDiffusion3PipelineOutput
````
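For the checkpoints listed in the new table, a minimal usage sketch with the Canny variant follows. The `SD3ControlNetModel` and `StableDiffusion3ControlNetPipeline` class names, the `stabilityai/stable-diffusion-3-medium-diffusers` base checkpoint, and the edge-map URL are assumptions for illustration, not part of this diff.

```python
import torch
from diffusers import SD3ControlNetModel, StableDiffusion3ControlNetPipeline
from diffusers.utils import load_image

# Canny checkpoint from the table above; the SD3 base checkpoint is an assumption.
controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Canny", torch_dtype=torch.float16)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# A pre-computed Canny edge map conditions the generation (hypothetical URL).
control_image = load_image("https://example.com/canny_edges.png")
image = pipe(
    prompt="a photo of a cat sitting on a windowsill",
    control_image=control_image,
    controlnet_conditioning_scale=0.7,
    num_inference_steps=28,
).images[0]
image.save("sd3_controlnet_canny.png")
```

The inpainting variant added in this commit (`StableDiffusion3ControlNetInpaintingPipeline`) follows the same pattern but additionally takes the mask describing the region to repaint.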

docs/source/en/training/distributed_inference.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -48,7 +48,7 @@ accelerate launch run_distributed.py --num_processes=2
 
 <Tip>
 
-To learn more, take a look at the [Distributed Inference with 🤗 Accelerate](https://huggingface.co/docs/accelerate/en/usage_guides/distributed_inference#distributed-inference-with-accelerate) guide.
+Refer to this minimal example [script](https://gist.github.com/sayakpaul/cfaebd221820d7b43fae638b4dfa01ba) for running inference across multiple GPUs. To learn more, take a look at the [Distributed Inference with 🤗 Accelerate](https://huggingface.co/docs/accelerate/en/usage_guides/distributed_inference#distributed-inference-with-accelerate) guide.
 
 </Tip>
 
@@ -108,4 +108,4 @@ torchrun run_distributed.py --nproc_per_node=2
 ```
 
 > [!TIP]
-> You can use `device_map` within a [`DiffusionPipeline`] to distribute its model-level components on multiple devices. Refer to the [Device placement](../tutorials/inference_with_big_models#device-placement) guide to learn more.
+> You can use `device_map` within a [`DiffusionPipeline`] to distribute its model-level components on multiple devices. Refer to the [Device placement](../tutorials/inference_with_big_models#device-placement) guide to learn more.
````
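A minimal sketch of the `device_map` usage that the tip above points to, assuming two or more visible GPUs; the checkpoint is only an example and is not taken from this diff.

```python
import torch
from diffusers import DiffusionPipeline

# "balanced" is the placement strategy described in the Device placement guide:
# model-level components (text encoders, transformer/UNet, VAE) are spread
# across the visible GPUs rather than loaded onto a single device.
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",  # example checkpoint
    device_map="balanced",
    torch_dtype=torch.float16,
)
image = pipeline("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```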

examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py

Lines changed: 1 addition & 1 deletion
````diff
@@ -71,7 +71,7 @@
 
 
 # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.30.0.dev0")
+check_min_version("0.31.0.dev0")
 
 logger = get_logger(__name__)
 
````

examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py

Lines changed: 1 addition & 1 deletion
````diff
@@ -79,7 +79,7 @@
 import wandb
 
 # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.30.0.dev0")
+check_min_version("0.31.0.dev0")
 
 logger = get_logger(__name__)
 
````

examples/community/marigold_depth_estimation.py

Lines changed: 1 addition & 1 deletion
````diff
@@ -43,7 +43,7 @@
 
 
 # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.30.0.dev0")
+check_min_version("0.31.0.dev0")
 
 
 class MarigoldDepthOutput(BaseOutput):
````

examples/consistency_distillation/train_lcm_distill_lora_sd_wds.py

Lines changed: 1 addition & 1 deletion
````diff
@@ -73,7 +73,7 @@
 import wandb
 
 # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.30.0.dev0")
+check_min_version("0.31.0.dev0")
 
 logger = get_logger(__name__)
 
````

examples/consistency_distillation/train_lcm_distill_lora_sdxl.py

Lines changed: 1 addition & 1 deletion
````diff
@@ -66,7 +66,7 @@
 import wandb
 
 # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.30.0.dev0")
+check_min_version("0.31.0.dev0")
 
 logger = get_logger(__name__)
 
````

examples/consistency_distillation/train_lcm_distill_lora_sdxl_wds.py

Lines changed: 1 addition & 1 deletion
````diff
@@ -79,7 +79,7 @@
 import wandb
 
 # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.30.0.dev0")
+check_min_version("0.31.0.dev0")
 
 logger = get_logger(__name__)
 
````

examples/consistency_distillation/train_lcm_distill_sd_wds.py

Lines changed: 1 addition & 1 deletion
````diff
@@ -72,7 +72,7 @@
 import wandb
 
 # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.30.0.dev0")
+check_min_version("0.31.0.dev0")
 
 logger = get_logger(__name__)
 
````
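The example-script changes above are all the same one-line version bump. Purely illustrative, a sketch of the guard these scripts share, with the new minimum pinned by this commit:

```python
# Illustrative only: the version guard used at the top of the example scripts,
# bumped from 0.30.0.dev0 to 0.31.0.dev0 by this commit.
from diffusers.utils import check_min_version

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.31.0.dev0")
```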

0 commit comments
