Commit 34d0821

Merge branch 'fixes-issue-11002' of https://github.com/ishan-modi/diffusers into fixes-issue-11002
2 parents 8336e44 + 1a81b20

File tree

693 files changed: +21923 −5612 lines changed

.github/workflows/nightly_tests.yml

Lines changed: 55 additions & 0 deletions
@@ -142,6 +142,7 @@ jobs:
           HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
           # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
           CUBLAS_WORKSPACE_CONFIG: :16:8
+          RUN_COMPILE: yes
         run: |
           python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
             -s -v -k "not Flax and not Onnx" \
@@ -525,6 +526,60 @@ jobs:
           pip install slack_sdk tabulate
           python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

+  run_nightly_pipeline_level_quantization_tests:
+    name: Torch quantization nightly tests
+    strategy:
+      fail-fast: false
+      max-parallel: 2
+    runs-on:
+      group: aws-g6e-xlarge-plus
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "20gb" --ipc host --gpus 0
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: NVIDIA-SMI
+        run: nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install -U bitsandbytes optimum_quanto
+          python -m uv pip install pytest-reportlog
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Pipeline-level quantization tests on GPU
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+          BIG_GPU_MEMORY: 40
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            --make-reports=tests_pipeline_level_quant_torch_cuda \
+            --report-log=tests_pipeline_level_quant_torch_cuda.log \
+            tests/quantization/test_pipeline_level_quantization.py
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_pipeline_level_quant_torch_cuda_stats.txt
+          cat reports/tests_pipeline_level_quant_torch_cuda_failures_short.txt
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_cuda_pipeline_level_quant_reports
+          path: reports
+      - name: Generate Report and Notify Channel
+        if: always()
+        run: |
+          pip install slack_sdk tabulate
+          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
+
   # M1 runner currently not well supported
   # TODO: (Dhruv) add these back when we setup better testing for Apple Silicon
   # run_nightly_tests_apple_m1:
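
For context, the suite this new job runs nightly exercises diffusers' pipeline-level quantization, where selected pipeline components are quantized at load time. Below is a minimal sketch of the kind of usage covered, assuming the `PipelineQuantizationConfig` API and an illustrative FLUX checkpoint; the authoritative cases live in `tests/quantization/test_pipeline_level_quantization.py`.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Quantize only the heavy components with bitsandbytes 4-bit at load time.
quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    components_to_quantize=["transformer", "text_encoder_2"],
)

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # illustrative checkpoint, not from this commit
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("photo of a cat", num_inference_steps=28).images[0]
```

The `BIG_GPU_MEMORY: 40` setting is read by the test suite's big-GPU gating decorators (memory in GB), which matches the `aws-g6e-xlarge-plus` runner this job requests.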

.github/workflows/pr_tests.yml

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ on:
       - "tests/**.py"
       - ".github/**.yml"
       - "utils/**.py"
+      - "setup.py"
   push:
     branches:
       - ci-*

docs/source/en/_toctree.yml

Lines changed: 26 additions & 19 deletions
@@ -17,12 +17,6 @@
     title: AutoPipeline
   - local: tutorials/basic_training
     title: Train a diffusion model
-  - local: tutorials/using_peft_for_inference
-    title: Load LoRAs for inference
-  - local: tutorials/fast_diffusion
-    title: Accelerate inference of text-to-image diffusion models
-  - local: tutorials/inference_with_big_models
-    title: Working with big models
   title: Tutorials
 - sections:
   - local: using-diffusers/loading
@@ -33,11 +27,24 @@
     title: Load schedulers and models
   - local: using-diffusers/other-formats
     title: Model files and layouts
-  - local: using-diffusers/loading_adapters
-    title: Load adapters
   - local: using-diffusers/push_to_hub
     title: Push files to the Hub
   title: Load pipelines and adapters
+- sections:
+  - local: tutorials/using_peft_for_inference
+    title: LoRA
+  - local: using-diffusers/ip_adapter
+    title: IP-Adapter
+  - local: using-diffusers/controlnet
+    title: ControlNet
+  - local: using-diffusers/t2i_adapter
+    title: T2I-Adapter
+  - local: using-diffusers/dreambooth
+    title: DreamBooth
+  - local: using-diffusers/textual_inversion_inference
+    title: Textual inversion
+  title: Adapters
+  isExpanded: false
 - sections:
   - local: using-diffusers/unconditional_image_generation
     title: Unconditional image generation
@@ -59,8 +66,6 @@
     title: Create a server
   - local: training/distributed_inference
     title: Distributed inference
-  - local: using-diffusers/merge_loras
-    title: Merge LoRAs
   - local: using-diffusers/scheduler_features
     title: Scheduler features
   - local: using-diffusers/callback
@@ -97,20 +102,12 @@
     title: SDXL Turbo
   - local: using-diffusers/kandinsky
     title: Kandinsky
-  - local: using-diffusers/ip_adapter
-    title: IP-Adapter
   - local: using-diffusers/omnigen
     title: OmniGen
   - local: using-diffusers/pag
     title: PAG
-  - local: using-diffusers/controlnet
-    title: ControlNet
-  - local: using-diffusers/t2i_adapter
-    title: T2I-Adapter
   - local: using-diffusers/inference_with_lcm
     title: Latent Consistency Model
-  - local: using-diffusers/textual_inversion_inference
-    title: Textual inversion
   - local: using-diffusers/shap-e
     title: Shap-E
   - local: using-diffusers/diffedit
@@ -180,7 +177,7 @@
   title: Quantization Methods
 - sections:
   - local: optimization/fp16
-    title: Speed up inference
+    title: Accelerate inference
   - local: optimization/memory
     title: Reduce memory usage
   - local: optimization/torch2.0
@@ -296,6 +293,8 @@
     title: CogView4Transformer2DModel
   - local: api/models/consisid_transformer3d
     title: ConsisIDTransformer3DModel
+  - local: api/models/cosmos_transformer3d
+    title: CosmosTransformer3DModel
   - local: api/models/dit_transformer2d
     title: DiTTransformer2DModel
   - local: api/models/easyanimate_transformer3d
@@ -364,6 +363,8 @@
     title: AutoencoderKLAllegro
   - local: api/models/autoencoderkl_cogvideox
     title: AutoencoderKLCogVideoX
+  - local: api/models/autoencoderkl_cosmos
+    title: AutoencoderKLCosmos
   - local: api/models/autoencoder_kl_hunyuan_video
     title: AutoencoderKLHunyuanVideo
   - local: api/models/autoencoderkl_ltx_video
@@ -434,6 +435,8 @@
     title: ControlNet-XS with Stable Diffusion XL
   - local: api/pipelines/controlnet_union
     title: ControlNetUnion
+  - local: api/pipelines/cosmos
+    title: Cosmos
   - local: api/pipelines/dance_diffusion
     title: Dance Diffusion
   - local: api/pipelines/ddim
@@ -452,6 +455,8 @@
     title: Flux
   - local: api/pipelines/control_flux_inpaint
     title: FluxControlInpaint
+  - local: api/pipelines/framepack
+    title: Framepack
   - local: api/pipelines/hidream
     title: HiDream-I1
   - local: api/pipelines/hunyuandit
@@ -568,6 +573,8 @@
     title: UniDiffuser
   - local: api/pipelines/value_guided_sampling
     title: Value-guided sampling
+  - local: api/pipelines/visualcloze
+    title: VisualCloze
   - local: api/pipelines/wan
     title: Wan
   - local: api/pipelines/wuerstchen

docs/source/en/api/models/asymmetricautoencoderkl.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.

 # AsymmetricAutoencoderKL

-Improved larger variational autoencoder (VAE) model with KL loss for inpainting task: [Designing a Better Asymmetric VQGAN for StableDiffusion](https://arxiv.org/abs/2306.04632) by Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua.
+Improved larger variational autoencoder (VAE) model with KL loss for inpainting task: [Designing a Better Asymmetric VQGAN for StableDiffusion](https://huggingface.co/papers/2306.04632) by Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua.

 The abstract from the paper is:
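
Since this VAE targets inpainting, its typical use is as a drop-in replacement for the stock VAE of an inpainting pipeline. A minimal sketch, where the checkpoint names are assumptions taken from common diffusers doc examples rather than from this commit:

```python
import torch
from diffusers import AsymmetricAutoencoderKL, StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # illustrative inpainting checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Swap in the asymmetric VAE, whose heavier decoder improves inpainting fidelity.
pipe.vae = AsymmetricAutoencoderKL.from_pretrained(
    "cross-attention/asymmetric-autoencoder-kl-x-1-5", torch_dtype=torch.float16
).to("cuda")
```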

docs/source/en/api/models/autoencoderkl.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.

 # AutoencoderKL

-The variational autoencoder (VAE) model with KL loss was introduced in [Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114v11) by Diederik P. Kingma and Max Welling. The model is used in 🤗 Diffusers to encode images into latents and to decode latent representations into images.
+The variational autoencoder (VAE) model with KL loss was introduced in [Auto-Encoding Variational Bayes](https://huggingface.co/papers/1312.6114v11) by Diederik P. Kingma and Max Welling. The model is used in 🤗 Diffusers to encode images into latents and to decode latent representations into images.

 The abstract from the paper is:
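
The encode/decode roundtrip described above looks roughly like this; a minimal sketch, assuming the `stabilityai/sd-vae-ft-mse` checkpoint and a stand-in image tensor:

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
).to("cuda")

# Stand-in for a real image batch, already normalized to [-1, 1].
image = torch.randn(1, 3, 512, 512, dtype=torch.float16, device="cuda")

with torch.no_grad():
    # encode() returns a posterior distribution; sample and scale to get latents.
    latents = vae.encode(image).latent_dist.sample() * vae.config.scaling_factor
    # decode() maps (unscaled) latents back to image space.
    reconstruction = vae.decode(latents / vae.config.scaling_factor).sample
```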

docs/source/en/api/models/autoencoderkl_cosmos.md

Lines changed: 40 additions & 0 deletions

@@ -0,0 +1,40 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# AutoencoderKLCosmos
+
+[Cosmos Tokenizers](https://github.com/NVIDIA/Cosmos-Tokenizer).
+
+Supported models:
+- [nvidia/Cosmos-1.0-Tokenizer-CV8x8x8](https://huggingface.co/nvidia/Cosmos-1.0-Tokenizer-CV8x8x8)
+
+The model can be loaded with the following code snippet.
+
+```python
+from diffusers import AutoencoderKLCosmos
+
+vae = AutoencoderKLCosmos.from_pretrained("nvidia/Cosmos-1.0-Tokenizer-CV8x8x8", subfolder="vae")
+```
+
+## AutoencoderKLCosmos
+
+[[autodoc]] AutoencoderKLCosmos
+- decode
+- encode
+- all
+
+## AutoencoderKLOutput
+
+[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
+
+## DecoderOutput
+
+[[autodoc]] models.autoencoders.vae.DecoderOutput
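
For orientation, a hedged sketch of an encode/decode roundtrip with this video tokenizer. The 5D (batch, channels, frames, height, width) layout and the clip size are assumptions, not taken from the commit:

```python
import torch
from diffusers import AutoencoderKLCosmos

vae = AutoencoderKLCosmos.from_pretrained(
    "nvidia/Cosmos-1.0-Tokenizer-CV8x8x8", subfolder="vae"
)

# Stand-in for a short video clip: (batch, channels, frames, height, width).
video = torch.randn(1, 3, 9, 256, 256)

with torch.no_grad():
    latents = vae.encode(video).latent_dist.sample()  # 8x8x8 causal compression
    reconstruction = vae.decode(latents).sample
```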

docs/source/en/api/models/consisid_transformer3d.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ specific language governing permissions and limitations under the License. -->

 # ConsisIDTransformer3DModel

-A Diffusion Transformer model for 3D data from [ConsisID](https://github.com/PKU-YuanGroup/ConsisID) was introduced in [Identity-Preserving Text-to-Video Generation by Frequency Decomposition](https://arxiv.org/pdf/2411.17440) by Peking University & University of Rochester & etc.
+A Diffusion Transformer model for 3D data from [ConsisID](https://github.com/PKU-YuanGroup/ConsisID) was introduced in [Identity-Preserving Text-to-Video Generation by Frequency Decomposition](https://huggingface.co/papers/2411.17440) by Peking University & University of Rochester & etc.

 The model can be loaded with the following code snippet.
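
The loading snippet itself falls outside the hunk shown here; for reference, a hedged reconstruction assuming the `BestWishYsh/ConsisID-preview` checkpoint used elsewhere in the diffusers docs:

```python
import torch
from diffusers import ConsisIDTransformer3DModel

transformer = ConsisIDTransformer3DModel.from_pretrained(
    "BestWishYsh/ConsisID-preview", subfolder="transformer", torch_dtype=torch.bfloat16
)
```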

docs/source/en/api/models/controlnet_hunyuandit.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.

 # HunyuanDiT2DControlNetModel

-HunyuanDiT2DControlNetModel is an implementation of ControlNet for [Hunyuan-DiT](https://arxiv.org/abs/2405.08748).
+HunyuanDiT2DControlNetModel is an implementation of ControlNet for [Hunyuan-DiT](https://huggingface.co/papers/2405.08748).

 ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
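
For orientation, a sketch of how this ControlNet plugs into the HunyuanDiT pipeline; the checkpoint names are assumptions based on the diffusers docs, not part of this commit:

```python
import torch
from diffusers import HunyuanDiT2DControlNetModel, HunyuanDiTControlNetPipeline

# Load the canny-conditioned ControlNet, then attach it to the base pipeline.
controlnet = HunyuanDiT2DControlNetModel.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-v1.1-ControlNet-Diffusers-Canny", torch_dtype=torch.float16
)
pipe = HunyuanDiTControlNetPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
```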

docs/source/en/api/models/controlnet_sparsectrl.md

Lines changed: 2 additions & 2 deletions
@@ -11,11 +11,11 @@ specific language governing permissions and limitations under the License. -->

 # SparseControlNetModel

-SparseControlNetModel is an implementation of ControlNet for [AnimateDiff](https://arxiv.org/abs/2307.04725).
+SparseControlNetModel is an implementation of ControlNet for [AnimateDiff](https://huggingface.co/papers/2307.04725).

 ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.

-The SparseCtrl version of ControlNet was introduced in [SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models](https://arxiv.org/abs/2311.16933) for achieving controlled generation in text-to-video diffusion models by Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, and Bo Dai.
+The SparseCtrl version of ControlNet was introduced in [SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models](https://huggingface.co/papers/2311.16933) for achieving controlled generation in text-to-video diffusion models by Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, and Bo Dai.

 The abstract from the paper is:
docs/source/en/api/models/cosmos_transformer3d.md

Lines changed: 30 additions & 0 deletions

@@ -0,0 +1,30 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# CosmosTransformer3DModel
+
+A Diffusion Transformer model for 3D video-like data was introduced in [Cosmos World Foundation Model Platform for Physical AI](https://huggingface.co/papers/2501.03575) by NVIDIA.
+
+The model can be loaded with the following code snippet.
+
+```python
+from diffusers import CosmosTransformer3DModel
+
+transformer = CosmosTransformer3DModel.from_pretrained("nvidia/Cosmos-1.0-Diffusion-7B-Text2World", subfolder="transformer", torch_dtype=torch.bfloat16)
+```
+
+## CosmosTransformer3DModel
+
+[[autodoc]] CosmosTransformer3DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
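
Note that the loading snippet above also needs `import torch` for the `torch.bfloat16` dtype. In practice the transformer is driven through the Cosmos pipeline rather than called directly; a rough, self-contained sketch, assuming the `CosmosTextToWorldPipeline` class added alongside this model:

```python
import torch
from diffusers import CosmosTextToWorldPipeline
from diffusers.utils import export_to_video

# Assumed pipeline class and checkpoint layout; see api/pipelines/cosmos in this commit.
pipe = CosmosTextToWorldPipeline.from_pretrained(
    "nvidia/Cosmos-1.0-Diffusion-7B-Text2World", torch_dtype=torch.bfloat16
).to("cuda")

video = pipe(prompt="A robot arm stacks wooden blocks on a workbench.").frames[0]
export_to_video(video, "output.mp4", fps=30)
```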
