Commit 32d5b97

Merge branch 'main' into integrations/ltx-097

2 parents f3ab327 + 66e50d4

44 files changed: +4927 −23 lines changed

docs/source/en/_toctree.yml

Lines changed: 6 additions & 0 deletions
```diff
@@ -295,6 +295,8 @@
       title: CogView4Transformer2DModel
     - local: api/models/consisid_transformer3d
       title: ConsisIDTransformer3DModel
+    - local: api/models/cosmos_transformer3d
+      title: CosmosTransformer3DModel
     - local: api/models/dit_transformer2d
       title: DiTTransformer2DModel
     - local: api/models/easyanimate_transformer3d
@@ -363,6 +365,8 @@
       title: AutoencoderKLAllegro
     - local: api/models/autoencoderkl_cogvideox
       title: AutoencoderKLCogVideoX
+    - local: api/models/autoencoderkl_cosmos
+      title: AutoencoderKLCosmos
     - local: api/models/autoencoder_kl_hunyuan_video
       title: AutoencoderKLHunyuanVideo
     - local: api/models/autoencoderkl_ltx_video
@@ -433,6 +437,8 @@
       title: ControlNet-XS with Stable Diffusion XL
     - local: api/pipelines/controlnet_union
       title: ControlNetUnion
+    - local: api/pipelines/cosmos
+      title: Cosmos
     - local: api/pipelines/dance_diffusion
       title: Dance Diffusion
     - local: api/pipelines/ddim
```
docs/source/en/api/models/autoencoderkl_cosmos.md

Lines changed: 40 additions & 0 deletions
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLCosmos

The variational autoencoder from [Cosmos Tokenizers](https://github.com/NVIDIA/Cosmos-Tokenizer), used by the Cosmos video generation pipelines.

Supported models:
- [nvidia/Cosmos-1.0-Tokenizer-CV8x8x8](https://huggingface.co/nvidia/Cosmos-1.0-Tokenizer-CV8x8x8)

The model can be loaded with the following code snippet.

```python
from diffusers import AutoencoderKLCosmos

vae = AutoencoderKLCosmos.from_pretrained("nvidia/Cosmos-1.0-Tokenizer-CV8x8x8", subfolder="vae")
```

## AutoencoderKLCosmos

[[autodoc]] AutoencoderKLCosmos
  - decode
  - encode
  - all

## AutoencoderKLOutput

[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
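The checkpoint name `CV8x8x8` suggests 8x temporal and 8x8 spatial compression. As a rough sketch of what that means for latent sizes (the factors and the causal first-frame handling below are assumptions read off the checkpoint name, not taken from the diffusers implementation), the latent grid for a video can be estimated as:

```python
# Hedged sketch: estimate the latent grid of a causal video VAE with
# 8x temporal and 8x8 spatial compression. Both the factors and the
# "keep the first frame, compress the rest" rule are assumptions here.
def latent_grid(num_frames: int, height: int, width: int,
                t_factor: int = 8, s_factor: int = 8) -> tuple:
    # Causal video VAEs typically encode the first frame on its own and
    # compress the remaining frames, giving 1 + (num_frames - 1) // t_factor
    # latent frames.
    t = 1 + (num_frames - 1) // t_factor
    return (t, height // s_factor, width // s_factor)

print(latent_grid(121, 704, 1280))  # -> (16, 88, 160)
```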
docs/source/en/api/models/cosmos_transformer3d.md

Lines changed: 30 additions & 0 deletions
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# CosmosTransformer3DModel

A Diffusion Transformer model for 3D video-like data, introduced in [Cosmos World Foundation Model Platform for Physical AI](https://huggingface.co/papers/2501.03575) by NVIDIA.

The model can be loaded with the following code snippet.

```python
import torch

from diffusers import CosmosTransformer3DModel

transformer = CosmosTransformer3DModel.from_pretrained("nvidia/Cosmos-1.0-Diffusion-7B-Text2World", subfolder="transformer", torch_dtype=torch.bfloat16)
```

## CosmosTransformer3DModel

[[autodoc]] CosmosTransformer3DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
docs/source/en/api/pipelines/cosmos.md

Lines changed: 41 additions & 0 deletions
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

# Cosmos

[Cosmos World Foundation Model Platform for Physical AI](https://huggingface.co/papers/2501.03575) by NVIDIA.

*Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications. Our platform covers a video curation pipeline, pre-trained world foundation models, examples of post-training of pre-trained world foundation models, and video tokenizers. To help Physical AI builders solve the most critical problems of our society, we make our platform open-source and our models open-weight with permissive licenses available via https://github.com/NVIDIA/Cosmos.*

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## CosmosTextToWorldPipeline

[[autodoc]] CosmosTextToWorldPipeline
  - all
  - __call__

## CosmosVideoToWorldPipeline

[[autodoc]] CosmosVideoToWorldPipeline
  - all
  - __call__

## CosmosPipelineOutput

[[autodoc]] pipelines.cosmos.pipeline_output.CosmosPipelineOutput

examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -430,6 +430,9 @@ def parse_args(input_args=None):
         default=4,
         help=("The dimension of the LoRA update matrices."),
     )
+
+    parser.add_argument("--lora_dropout", type=float, default=0.0, help="Dropout probability for LoRA layers")
+
     parser.add_argument(
         "--with_prior_preservation",
         default=False,
@@ -1554,6 +1557,7 @@ def main(args):
     transformer_lora_config = LoraConfig(
         r=args.rank,
         lora_alpha=args.rank,
+        lora_dropout=args.lora_dropout,
         init_lora_weights="gaussian",
         target_modules=target_modules,
     )
@@ -1562,6 +1566,7 @@ def main(args):
     text_lora_config = LoraConfig(
         r=args.rank,
         lora_alpha=args.rank,
+        lora_dropout=args.lora_dropout,
         init_lora_weights="gaussian",
         target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
     )
```

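The new flag itself is plain argparse. A minimal stand-alone sketch of how it parses (a hypothetical two-argument parser mirroring the scripts' usage, not the scripts' real parser):

```python
import argparse

# Hypothetical minimal parser mirroring the training scripts' new flag;
# the real scripts define many more arguments.
parser = argparse.ArgumentParser()
parser.add_argument("--rank", type=int, default=4, help="The dimension of the LoRA update matrices.")
parser.add_argument("--lora_dropout", type=float, default=0.0, help="Dropout probability for LoRA layers")

args = parser.parse_args(["--lora_dropout", "0.1"])
print(args.lora_dropout)  # -> 0.1
```

Because the default is `0.0`, existing invocations of the scripts keep their previous (no-dropout) behavior.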
examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -658,6 +658,8 @@ def parse_args(input_args=None):
         default=4,
         help=("The dimension of the LoRA update matrices."),
     )
+    parser.add_argument("--lora_dropout", type=float, default=0.0, help="Dropout probability for LoRA layers")
+
     parser.add_argument(
         "--use_dora",
         action="store_true",
@@ -1248,6 +1250,7 @@ def main(args):
     unet_lora_config = LoraConfig(
         r=args.rank,
         lora_alpha=args.rank,
+        lora_dropout=args.lora_dropout,
         use_dora=args.use_dora,
         init_lora_weights="gaussian",
         target_modules=["to_k", "to_q", "to_v", "to_out.0"],
@@ -1260,6 +1263,7 @@ def main(args):
     text_lora_config = LoraConfig(
         r=args.rank,
         lora_alpha=args.rank,
+        lora_dropout=args.lora_dropout,
         use_dora=args.use_dora,
         init_lora_weights="gaussian",
         target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
```

examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -767,6 +767,9 @@ def parse_args(input_args=None):
         default=4,
         help=("The dimension of the LoRA update matrices."),
     )
+
+    parser.add_argument("--lora_dropout", type=float, default=0.0, help="Dropout probability for LoRA layers")
+
     parser.add_argument(
         "--use_dora",
         action="store_true",
@@ -1558,6 +1561,7 @@ def main(args):
         r=args.rank,
         use_dora=args.use_dora,
         lora_alpha=args.rank,
+        lora_dropout=args.lora_dropout,
         init_lora_weights="gaussian",
         target_modules=target_modules,
     )
@@ -1570,6 +1574,7 @@ def main(args):
         r=args.rank,
         use_dora=args.use_dora,
         lora_alpha=args.rank,
+        lora_dropout=args.lora_dropout,
         init_lora_weights="gaussian",
         target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
     )
```

examples/dreambooth/train_dreambooth_lora.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -524,6 +524,9 @@ def parse_args(input_args=None):
         default=4,
         help=("The dimension of the LoRA update matrices."),
     )
+
+    parser.add_argument("--lora_dropout", type=float, default=0.0, help="Dropout probability for LoRA layers")
+
     parser.add_argument(
         "--image_interpolation_mode",
         type=str,
@@ -932,6 +935,7 @@ def main(args):
     unet_lora_config = LoraConfig(
         r=args.rank,
         lora_alpha=args.rank,
+        lora_dropout=args.lora_dropout,
         init_lora_weights="gaussian",
         target_modules=["to_k", "to_q", "to_v", "to_out.0", "add_k_proj", "add_v_proj"],
     )
@@ -942,6 +946,7 @@ def main(args):
     text_lora_config = LoraConfig(
         r=args.rank,
         lora_alpha=args.rank,
+        lora_dropout=args.lora_dropout,
         init_lora_weights="gaussian",
         target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
     )
```
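As context for what `lora_dropout` controls, here is an illustrative plain-PyTorch LoRA layer (a sketch, not peft's actual `LoraConfig`/`LoraLayer` code): dropout is applied only to the input of the low-rank branch, so the frozen base projection is unaffected, and at eval time the layer behaves as if no dropout were configured.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA layer (not peft's implementation): the frozen base
    projection sees the raw input, while dropout is applied only to the
    input of the low-rank update branch."""
    def __init__(self, in_features, out_features, rank=4, lora_alpha=4, lora_dropout=0.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # base weights stay frozen
        self.base.bias.requires_grad_(False)
        self.lora_a = nn.Linear(in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)       # LoRA update starts at zero
        self.dropout = nn.Dropout(lora_dropout)
        self.scaling = lora_alpha / rank

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(self.dropout(x))) * self.scaling

layer = LoRALinear(16, 32, rank=4, lora_alpha=4, lora_dropout=0.1)
layer.eval()                      # dropout is a no-op at eval time
x = torch.randn(2, 16)
out = layer(x)
print(out.shape)                  # -> torch.Size([2, 32])
```

With the zero-initialized `lora_b`, the layer initially reproduces the base projection exactly, which is why enabling dropout does not perturb a freshly initialized run.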

examples/dreambooth/train_dreambooth_lora_flux.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -358,6 +358,9 @@ def parse_args(input_args=None):
         default=4,
         help=("The dimension of the LoRA update matrices."),
     )
+
+    parser.add_argument("--lora_dropout", type=float, default=0.0, help="Dropout probability for LoRA layers")
+
     parser.add_argument(
         "--with_prior_preservation",
         default=False,
@@ -1236,6 +1239,7 @@ def main(args):
     transformer_lora_config = LoraConfig(
         r=args.rank,
         lora_alpha=args.rank,
+        lora_dropout=args.lora_dropout,
         init_lora_weights="gaussian",
         target_modules=target_modules,
     )
@@ -1244,6 +1248,7 @@ def main(args):
     text_lora_config = LoraConfig(
         r=args.rank,
         lora_alpha=args.rank,
+        lora_dropout=args.lora_dropout,
         init_lora_weights="gaussian",
         target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
     )
```

examples/dreambooth/train_dreambooth_lora_hidream.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -417,6 +417,9 @@ def parse_args(input_args=None):
         default=4,
         help=("The dimension of the LoRA update matrices."),
     )
+
+    parser.add_argument("--lora_dropout", type=float, default=0.0, help="Dropout probability for LoRA layers")
+
     parser.add_argument(
         "--with_prior_preservation",
         default=False,
@@ -1161,6 +1164,7 @@ def main(args):
     transformer_lora_config = LoraConfig(
         r=args.rank,
         lora_alpha=args.rank,
+        lora_dropout=args.lora_dropout,
         init_lora_weights="gaussian",
         target_modules=target_modules,
     )
```
