Commit 2db5af1

Merge branch 'main' into compile-ci
2 parents 23e2794 + 53bd367 commit 2db5af1

36 files changed

+4896
-24
lines changed

docs/source/en/_toctree.yml

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -295,6 +295,8 @@
295295
title: CogView4Transformer2DModel
296296
- local: api/models/consisid_transformer3d
297297
title: ConsisIDTransformer3DModel
298+
- local: api/models/cosmos_transformer3d
299+
title: CosmosTransformer3DModel
298300
- local: api/models/dit_transformer2d
299301
title: DiTTransformer2DModel
300302
- local: api/models/easyanimate_transformer3d
@@ -363,6 +365,8 @@
363365
title: AutoencoderKLAllegro
364366
- local: api/models/autoencoderkl_cogvideox
365367
title: AutoencoderKLCogVideoX
368+
- local: api/models/autoencoderkl_cosmos
369+
title: AutoencoderKLCosmos
366370
- local: api/models/autoencoder_kl_hunyuan_video
367371
title: AutoencoderKLHunyuanVideo
368372
- local: api/models/autoencoderkl_ltx_video
@@ -433,6 +437,8 @@
433437
title: ControlNet-XS with Stable Diffusion XL
434438
- local: api/pipelines/controlnet_union
435439
title: ControlNetUnion
440+
- local: api/pipelines/cosmos
441+
title: Cosmos
436442
- local: api/pipelines/dance_diffusion
437443
title: Dance Diffusion
438444
- local: api/pipelines/ddim
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License. -->
11+
12+
# AutoencoderKLCosmos
13+
14+
[Cosmos Tokenizers](https://github.com/NVIDIA/Cosmos-Tokenizer).
15+
16+
Supported models:
17+
- [nvidia/Cosmos-1.0-Tokenizer-CV8x8x8](https://huggingface.co/nvidia/Cosmos-1.0-Tokenizer-CV8x8x8)
18+
19+
The model can be loaded with the following code snippet.
20+
21+
```python
22+
from diffusers import AutoencoderKLCosmos
23+
24+
vae = AutoencoderKLCosmos.from_pretrained("nvidia/Cosmos-1.0-Tokenizer-CV8x8x8", subfolder="vae")
25+
```
26+
27+
## AutoencoderKLCosmos
28+
29+
[[autodoc]] AutoencoderKLCosmos
30+
- decode
31+
- encode
32+
- all
33+
34+
## AutoencoderKLOutput
35+
36+
[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
37+
38+
## DecoderOutput
39+
40+
[[autodoc]] models.autoencoders.vae.DecoderOutput
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
1+
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License. -->
11+
12+
# CosmosTransformer3DModel
13+
14+
A Diffusion Transformer model for 3D video-like data was introduced in [Cosmos World Foundation Model Platform for Physical AI](https://huggingface.co/papers/2501.03575) by NVIDIA.
15+
16+
The model can be loaded with the following code snippet.
17+
```python
import torch
from diffusers import CosmosTransformer3DModel

# torch is imported because torch_dtype below refers to torch.bfloat16
transformer = CosmosTransformer3DModel.from_pretrained("nvidia/Cosmos-1.0-Diffusion-7B-Text2World", subfolder="transformer", torch_dtype=torch.bfloat16)
```
23+
24+
## CosmosTransformer3DModel
25+
26+
[[autodoc]] CosmosTransformer3DModel
27+
28+
## Transformer2DModelOutput
29+
30+
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
2+
#
3+
# Licensed under the Apache License, Version 2.0 (the "License");
4+
# you may not use this file except in compliance with the License.
5+
# You may obtain a copy of the License at
6+
#
7+
# http://www.apache.org/licenses/LICENSE-2.0
8+
#
9+
# Unless required by applicable law or agreed to in writing, software
10+
# distributed under the License is distributed on an "AS IS" BASIS,
11+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
# See the License for the specific language governing permissions and
13+
# limitations under the License. -->
14+
15+
# Cosmos
16+
17+
[Cosmos World Foundation Model Platform for Physical AI](https://huggingface.co/papers/2501.03575) by NVIDIA.
18+
19+
*Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications. Our platform covers a video curation pipeline, pre-trained world foundation models, examples of post-training of pre-trained world foundation models, and video tokenizers. To help Physical AI builders solve the most critical problems of our society, we make our platform open-source and our models open-weight with permissive licenses available via https://github.com/NVIDIA/Cosmos.*
20+
21+
<Tip>
22+
23+
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
24+
25+
</Tip>
26+
27+
## CosmosTextToWorldPipeline
28+
29+
[[autodoc]] CosmosTextToWorldPipeline
30+
- all
31+
- __call__
32+
33+
## CosmosVideoToWorldPipeline
34+
35+
[[autodoc]] CosmosVideoToWorldPipeline
36+
- all
37+
- __call__
38+
39+
## CosmosPipelineOutput
40+
41+
[[autodoc]] pipelines.cosmos.pipeline_output.CosmosPipelineOutput

docs/source/en/quantization/bitsandbytes.md

Lines changed: 9 additions & 3 deletions
@@ -48,7 +48,7 @@ For Ada and higher-series GPUs, we recommend changing `torch_dtype` to `torch.bf
4848
```py
4949
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
5050
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
51-
51+
import torch
5252
from diffusers import AutoModel
5353
from transformers import T5EncoderModel
5454

@@ -88,6 +88,8 @@ Setting `device_map="auto"` automatically fills all available space on the GPU(s
8888
CPU, and finally, the hard drive (the absolute slowest option) if there is still not enough memory.
8989

9090
```py
91+
from diffusers import FluxPipeline
92+
9193
pipe = FluxPipeline.from_pretrained(
9294
"black-forest-labs/FLUX.1-dev",
9395
transformer=transformer_8bit,
@@ -132,7 +134,7 @@ For Ada and higher-series GPUs, we recommend changing `torch_dtype` to `torch.bf
132134
```py
133135
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
134136
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
135-
137+
import torch
136138
from diffusers import AutoModel
137139
from transformers import T5EncoderModel
138140

@@ -171,6 +173,8 @@ Let's generate an image using our quantized models.
171173
Setting `device_map="auto"` automatically fills all available space on the GPU(s) first, then the CPU, and finally, the hard drive (the absolute slowest option) if there is still not enough memory.
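
The fill order described above (GPU first, then CPU, then disk) can be pictured with a small greedy sketch. This is a simplified illustration only, not the algorithm the library actually uses: `place_layers`, the layer names, and the size budgets are all hypothetical and expressed in arbitrary units.

```python
def place_layers(layer_sizes, budgets):
    """Greedily assign each layer to the first device that still has room.

    `budgets` is ordered from fastest to slowest device (hypothetical values).
    """
    remaining = dict(budgets)  # dicts preserve insertion order: fastest first
    placement = {}
    for name, size in layer_sizes.items():
        for device in remaining:
            if remaining[device] >= size:
                remaining[device] -= size
                placement[name] = device
                break
        else:
            # No device, not even the slowest one, can hold this layer
            raise MemoryError(f"no device can hold {name}")
    return placement


# Hypothetical model with four equally sized blocks
layers = {"block_0": 4, "block_1": 4, "block_2": 4, "block_3": 4}
budgets = {"cuda:0": 8, "cpu": 4, "disk": 100}
placement = place_layers(layers, budgets)
print(placement)
```

In this toy setup the first two blocks fit on the GPU, the third spills to the CPU, and the last one lands on disk, which is why a model that overflows GPU memory still loads but runs more slowly.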
172174

173175
```py
176+
from diffusers import FluxPipeline
177+
174178
pipe = FluxPipeline.from_pretrained(
175179
"black-forest-labs/FLUX.1-dev",
176180
transformer=transformer_4bit,
@@ -214,6 +218,8 @@ Check your memory footprint with the `get_memory_footprint` method:
214218
print(model.get_memory_footprint())
215219
```
216220

221+
Note that this only tells you the memory footprint of the model params and does _not_ estimate the inference memory requirements.
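
To make the distinction concrete, here is a rough back-of-the-envelope sketch of what a parameter-only footprint measures. The helper `param_memory_bytes` and the 12-billion parameter count are assumptions chosen for illustration; real quantized checkpoints also store small amounts of quantization state, so actual numbers will be somewhat higher.

```python
def param_memory_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes needed to store the weights alone.

    Activations, intermediate buffers, and framework overhead are NOT counted,
    so this is a lower bound on what inference needs.
    """
    return num_params * bits_per_param // 8


# Hypothetical parameter count, used for illustration only
num_params = 12_000_000_000

for label, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    gib = param_memory_bytes(num_params, bits) / 1024**3
    print(f"{label}: {gib:.1f} GiB of weights")
```

Peak memory during inference is higher than these figures because activations and temporary buffers are allocated on top of the weights.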
222+
217223
Quantized models can be loaded from the [`~ModelMixin.from_pretrained`] method without needing to specify the `quantization_config` parameters:
218224

219225
```py
@@ -413,4 +419,4 @@ transformer_4bit.dequantize()
413419
## Resources
414420

415421
* [End-to-end notebook showing Flux.1 Dev inference in a free-tier Colab](https://gist.github.com/sayakpaul/c76bd845b48759e11687ac550b99d8b4)
416-
* [Training](https://gist.github.com/sayakpaul/05afd428bc089b47af7c016e42004527)
422+
* [Training](https://github.com/huggingface/diffusers/blob/8c661ea586bf11cb2440da740dd3c4cf84679b85/examples/dreambooth/README_hidream.md#using-quantization)
