
Commit 923da7b

Merge branch 'main' into Add-AnyText
2 parents 930c37a + 2a1d2f6

File tree: 99 files changed, +15482 -440 lines


benchmarks/push_results.py

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@

 import pandas as pd
 from huggingface_hub import hf_hub_download, upload_file
-from huggingface_hub.utils._errors import EntryNotFoundError
+from huggingface_hub.utils import EntryNotFoundError


 sys.path.append(".")
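
The replacement import keeps the same exception class but pulls it from huggingface_hub's public `utils` namespace instead of the private `_errors` module. A minimal sketch of the pattern this script relies on, with a placeholder dataset repo and filename rather than the ones used by the benchmark:

```python
from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError  # public import path used after this change

try:
    # Placeholder repo/filename; the benchmark script fetches its existing results this way.
    csv_path = hf_hub_download(
        repo_id="example-org/benchmark-results",
        filename="results.csv",
        repo_type="dataset",
    )
except EntryNotFoundError:
    # No results file on the Hub yet; fall back to starting fresh.
    csv_path = None
```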

docker/diffusers-onnxruntime-cuda/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ ENV PATH="/opt/venv/bin:$PATH"
 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
 RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
     python3.10 -m uv pip install --no-cache-dir \
-    torch \
+    "torch<2.5.0" \
     torchvision \
     torchaudio \
     "onnxruntime-gpu>=1.13.1" \

docker/diffusers-pytorch-compile-cuda/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ ENV PATH="/opt/venv/bin:$PATH"
 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
 RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
     python3.10 -m uv pip install --no-cache-dir \
-    torch \
+    "torch<2.5.0" \
     torchvision \
     torchaudio \
     invisible_watermark && \

docker/diffusers-pytorch-cpu/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ ENV PATH="/opt/venv/bin:$PATH"
 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
 RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
     python3.10 -m uv pip install --no-cache-dir \
-    torch \
+    "torch<2.5.0" \
     torchvision \
     torchaudio \
     invisible_watermark \

docker/diffusers-pytorch-cuda/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ ENV PATH="/opt/venv/bin:$PATH"
 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
 RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
     python3.10 -m uv pip install --no-cache-dir \
-    torch \
+    "torch<2.5.0" \
     torchvision \
     torchaudio \
     invisible_watermark && \

docker/diffusers-pytorch-xformers-cuda/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ ENV PATH="/opt/venv/bin:$PATH"
 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
 RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
     python3.10 -m pip install --no-cache-dir \
-    torch \
+    "torch<2.5.0" \
     torchvision \
     torchaudio \
     invisible_watermark && \
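
All five Dockerfiles above apply the same change: the pre-installed `torch` is pinned below 2.5.0 while `torchvision` and `torchaudio` stay unpinned. As a purely illustrative check (not part of this commit), a small smoke test run inside a built image could confirm the pin took effect:

```python
# Hypothetical smoke test, not part of this commit: verify the image's pre-installed
# torch satisfies the "torch<2.5.0" pin from the Dockerfiles above.
from packaging.version import Version

import torch

installed = Version(torch.__version__.split("+")[0])  # drop local suffixes like "+cu121"
assert installed < Version("2.5.0"), f"expected torch<2.5.0, found {torch.__version__}"
print(f"torch {torch.__version__} satisfies the <2.5.0 pin")
```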

docs/source/en/_toctree.yml

Lines changed: 8 additions & 0 deletions
@@ -75,6 +75,8 @@
       title: Outpainting
     title: Advanced inference
   - sections:
+    - local: using-diffusers/cogvideox
+      title: CogVideoX
     - local: using-diffusers/sdxl
       title: Stable Diffusion XL
     - local: using-diffusers/sdxl_turbo
@@ -129,6 +131,8 @@
       title: T2I-Adapters
     - local: training/instructpix2pix
       title: InstructPix2Pix
+    - local: training/cogvideox
+      title: CogVideoX
     title: Models
   - isExpanded: false
     sections:
@@ -242,6 +246,8 @@
       title: AuraFlowTransformer2DModel
     - local: api/models/cogvideox_transformer3d
       title: CogVideoXTransformer3DModel
+    - local: api/models/cogview3plus_transformer2d
+      title: CogView3PlusTransformer2DModel
     - local: api/models/dit_transformer2d
       title: DiTTransformer2DModel
     - local: api/models/flux_transformer
@@ -320,6 +326,8 @@
       title: BLIP-Diffusion
     - local: api/pipelines/cogvideox
       title: CogVideoX
+    - local: api/pipelines/cogview3
+      title: CogView3
     - local: api/pipelines/consistency_models
       title: Consistency Models
     - local: api/pipelines/controlnet
docs/source/en/api/models/cogview3plus_transformer2d.md

Lines changed: 30 additions & 0 deletions

@@ -0,0 +1,30 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# CogView3PlusTransformer2DModel
+
+A Diffusion Transformer model for 2D data from [CogView3Plus](https://github.com/THUDM/CogView3) was introduced in [CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion](https://huggingface.co/papers/2403.05121) by Tsinghua University & ZhipuAI.
+
+The model can be loaded with the following code snippet.
+
+```python
+from diffusers import CogView3PlusTransformer2DModel
+
+transformer = CogView3PlusTransformer2DModel.from_pretrained("THUDM/CogView3Plus-3b", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
+```
+
+## CogView3PlusTransformer2DModel
+
+[[autodoc]] CogView3PlusTransformer2DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

docs/source/en/api/pipelines/cogvideox.md

Lines changed: 10 additions & 0 deletions
@@ -36,6 +36,10 @@
 There is one model available that can be used with the image-to-video CogVideoX pipeline:
 - [`THUDM/CogVideoX-5b-I2V`](https://huggingface.co/THUDM/CogVideoX-5b-I2V): The recommended dtype for running this model is `bf16`.
 
+There are two models that support pose-controllable generation (by the [Alibaba-PAI](https://huggingface.co/alibaba-pai) team):
+- [`alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose`](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose): The recommended dtype for running this model is `bf16`.
+- [`alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose`](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose): The recommended dtype for running this model is `bf16`.
+
 ## Inference
 
 Use [`torch.compile`](https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion#torchcompile) to reduce the inference latency.
@@ -118,6 +122,12 @@
 - all
 - __call__
 
+## CogVideoXFunControlPipeline
+
+[[autodoc]] CogVideoXFunControlPipeline
+- all
+- __call__
+
 ## CogVideoXPipelineOutput
 
 [[autodoc]] pipelines.cogvideo.pipeline_output.CogVideoXPipelineOutput
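
The new `CogVideoXFunControlPipeline` combines a text prompt with a control video such as a pose sequence. Below is a hedged sketch of how the Alibaba-PAI pose checkpoints listed above might be used with it; the argument names (for example `control_video`), the prompt, and the input file are illustrative and should be checked against the pipeline's `__call__` documentation:

```python
import torch

from diffusers import CogVideoXFunControlPipeline
from diffusers.utils import export_to_video, load_video

# Load the pose-controllable checkpoint in the recommended bf16 dtype.
pipe = CogVideoXFunControlPipeline.from_pretrained(
    "alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose", torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder pose video; the control frames guide the generated motion.
control_frames = load_video("pose_sequence.mp4")

video = pipe(
    prompt="a person dancing in a sunlit park",
    control_video=control_frames,
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```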
docs/source/en/api/pipelines/cogview3.md

Lines changed: 40 additions & 0 deletions

@@ -0,0 +1,40 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+-->
+
+# CogView3Plus
+
+[CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion](https://huggingface.co/papers/2403.05121) from Tsinghua University & ZhipuAI, by Wendi Zheng, Jiayan Teng, Zhuoyi Yang, Weihan Wang, Jidong Chen, Xiaotao Gu, Yuxiao Dong, Ming Ding, Jie Tang.
+
+The abstract from the paper is:
+
+*Recent advancements in text-to-image generative systems have been largely driven by diffusion models. However, single-stage text-to-image diffusion models still face challenges, in terms of computational efficiency and the refinement of image details. To tackle the issue, we propose CogView3, an innovative cascaded framework that enhances the performance of text-to-image diffusion. CogView3 is the first model implementing relay diffusion in the realm of text-to-image generation, executing the task by first creating low-resolution images and subsequently applying relay-based super-resolution. This methodology not only results in competitive text-to-image outputs but also greatly reduces both training and inference costs. Our experimental results demonstrate that CogView3 outperforms SDXL, the current state-of-the-art open-source text-to-image diffusion model, by 77.0% in human evaluations, all while requiring only about 1/2 of the inference time. The distilled variant of CogView3 achieves comparable performance while only utilizing 1/10 of the inference time by SDXL.*
+
+<Tip>
+
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+
+</Tip>
+
+This pipeline was contributed by [zRzRzRzRzRzRzR](https://github.com/zRzRzRzRzRzRzR). The original codebase can be found [here](https://huggingface.co/THUDM). The original weights can be found under [hf.co/THUDM](https://huggingface.co/THUDM).
+
+## CogView3PlusPipeline
+
+[[autodoc]] CogView3PlusPipeline
+- all
+- __call__
+
+## CogView3PipelineOutput
+
+[[autodoc]] pipelines.cogview3.pipeline_output.CogView3PipelineOutput
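
A hedged usage sketch for the new text-to-image pipeline follows; the checkpoint id mirrors the one used in the transformer docs added in this commit (`THUDM/CogView3Plus-3b`) and may differ from the final Hub repository name, and the prompt and sampling settings are placeholders:

```python
import torch

from diffusers import CogView3PlusPipeline

# Checkpoint id taken from this commit's docs; confirm the exact repo name on the Hub.
pipe = CogView3PlusPipeline.from_pretrained(
    "THUDM/CogView3Plus-3b", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a watercolor painting of a lighthouse at dawn",
    guidance_scale=7.0,
    num_inference_steps=50,
).images[0]
image.save("cogview3plus.png")
```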
