
Commit 3b17e2e

Merge branch 'main' into main

2 parents: e17aa82 + 6dfaec3

90 files changed (+1504, −552 lines)


.github/workflows/pypi_publish.yaml

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ jobs:
       - name: Test installing diffusers and importing
         run: |
           pip install diffusers && pip uninstall diffusers -y
-          pip install -i https://testpypi.python.org/pypi diffusers
+          pip install -i https://test.pypi.org/simple/ diffusers
           python -c "from diffusers import __version__; print(__version__)"
           python -c "from diffusers import DiffusionPipeline; pipe = DiffusionPipeline.from_pretrained('fusing/unet-ldm-dummy-update'); pipe()"
           python -c "from diffusers import DiffusionPipeline; pipe = DiffusionPipeline.from_pretrained('hf-internal-testing/tiny-stable-diffusion-pipe', safety_checker=None); pipe('ah suh du')"

docs/source/en/_toctree.yml

Lines changed: 1 addition & 1 deletion
@@ -435,7 +435,7 @@
   - local: api/pipelines/ledits_pp
     title: LEDITS++
   - local: api/pipelines/ltx_video
-    title: LTX
+    title: LTXVideo
   - local: api/pipelines/lumina
     title: Lumina-T2X
   - local: api/pipelines/marigold

docs/source/en/api/models/autoencoderkl_ltx_video.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import AutoencoderKLLTXVideo
 
-vae = AutoencoderKLLTXVideo.from_pretrained("TODO/TODO", subfolder="vae", torch_dtype=torch.float32).to("cuda")
+vae = AutoencoderKLLTXVideo.from_pretrained("Lightricks/LTX-Video", subfolder="vae", torch_dtype=torch.float32).to("cuda")
 ```
 
 ## AutoencoderKLLTXVideo
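As written, the doc snippet assumes `torch` is already imported. For reference, a self-contained version of the updated example (a minimal sketch, not part of the diff):

```python
import torch
from diffusers import AutoencoderKLLTXVideo

# Load the LTX Video VAE from the official Lightricks checkpoint.
vae = AutoencoderKLLTXVideo.from_pretrained(
    "Lightricks/LTX-Video", subfolder="vae", torch_dtype=torch.float32
).to("cuda")
```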

docs/source/en/api/models/ltx_video_transformer3d.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import LTXVideoTransformer3DModel
 
-transformer = LTXVideoTransformer3DModel.from_pretrained("TODO/TODO", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
+transformer = LTXVideoTransformer3DModel.from_pretrained("Lightricks/LTX-Video", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
 ```
 
 ## LTXVideoTransformer3DModel
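The same caveat applies here: `torch` must be imported for `torch.bfloat16` to resolve. A self-contained sketch that also wires the component into a pipeline (the `LTXPipeline` assembly below is an assumption for illustration, not shown in the diff):

```python
import torch
from diffusers import LTXPipeline, LTXVideoTransformer3DModel

transformer = LTXVideoTransformer3DModel.from_pretrained(
    "Lightricks/LTX-Video", subfolder="transformer", torch_dtype=torch.bfloat16
)
# Assumed wiring: reuse the standalone component when building the full pipeline.
pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
```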

docs/source/en/api/models/sana_transformer2d.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import SanaTransformer2DModel
 
-transformer = SanaTransformer2DModel.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_diffusers", subfolder="transformer", torch_dtype=torch.float16)
+transformer = SanaTransformer2DModel.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
 ```
 
 ## SanaTransformer2DModel
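Here too the snippet assumes `torch` is imported elsewhere; a self-contained version of the updated example:

```python
import torch
from diffusers import SanaTransformer2DModel

# The BF16 checkpoint is the recommended variant in the updated docs.
transformer = SanaTransformer2DModel.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
```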

docs/source/en/api/pipelines/ltx_video.md

Lines changed: 40 additions & 2 deletions
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License. -->
 
-# LTX
+# LTX Video
 
 [LTX Video](https://huggingface.co/Lightricks/LTX-Video) is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 24 FPS videos at a 768x512 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and varied content. We provide a model for both text-to-video as well as image + text-to-video use cases.
 
@@ -22,14 +22,24 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md)
 
 </Tip>
 
+Available models:
+
+| Model name | Recommended dtype |
+|:-------------:|:-----------------:|
+| [`LTX Video 0.9.0`](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.safetensors) | `torch.bfloat16` |
+| [`LTX Video 0.9.1`](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.1.safetensors) | `torch.bfloat16` |
+
+Note: The recommended dtype is for the transformer component. The VAE and text encoders can be either `torch.float32`, `torch.bfloat16` or `torch.float16` but the recommended dtype is `torch.bfloat16` as used in the original repository.
+
 ## Loading Single Files
 
-Loading the original LTX Video checkpoints is also possible with [`~ModelMixin.from_single_file`].
+Loading the original LTX Video checkpoints is also possible with [`~ModelMixin.from_single_file`]. We recommend using `from_single_file` for the Lightricks series of models, as they plan to release multiple models in the future in the single file format.
 
 ```python
 import torch
 from diffusers import AutoencoderKLLTXVideo, LTXImageToVideoPipeline, LTXVideoTransformer3DModel
 
+# `single_file_url` could also be https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.1.safetensors
 single_file_url = "https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.safetensors"
 transformer = LTXVideoTransformer3DModel.from_single_file(
     single_file_url, torch_dtype=torch.bfloat16
@@ -99,6 +109,34 @@ export_to_video(video, "output_gguf_ltx.mp4", fps=24)
 
 Make sure to read the [documentation on GGUF](../../quantization/gguf) to learn more about our GGUF support.
 
+<!-- TODO(aryan): Update this when official weights are supported -->
+
+Loading and running inference with [LTX Video 0.9.1](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.1.safetensors) weights.
+
+```python
+import torch
+from diffusers import LTXPipeline
+from diffusers.utils import export_to_video
+
+pipe = LTXPipeline.from_pretrained("a-r-r-o-w/LTX-Video-0.9.1-diffusers", torch_dtype=torch.bfloat16)
+pipe.to("cuda")
+
+prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
+negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
+
+video = pipe(
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    width=768,
+    height=512,
+    num_frames=161,
+    decode_timestep=0.03,
+    decode_noise_scale=0.025,
+    num_inference_steps=50,
+).frames[0]
+export_to_video(video, "output.mp4", fps=24)
+```
+
 Refer to [this section](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox#memory-optimization) to learn more about optimizing memory consumption.
 
 ## LTXPipeline
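The single-file hunk above is truncated in this view. A plausible completion that assembles the full pipeline from the imports shown (the `vae` load and the `from_pretrained` wiring below are assumptions, not visible in the diff):

```python
import torch
from diffusers import AutoencoderKLLTXVideo, LTXImageToVideoPipeline, LTXVideoTransformer3DModel

single_file_url = "https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.safetensors"
transformer = LTXVideoTransformer3DModel.from_single_file(
    single_file_url, torch_dtype=torch.bfloat16
)
# Assumed continuation: load the VAE from the same checkpoint and pull the
# remaining components (text encoder, tokenizer, scheduler) from the hub repo.
vae = AutoencoderKLLTXVideo.from_single_file(single_file_url, torch_dtype=torch.bfloat16)
pipe = LTXImageToVideoPipeline.from_pretrained(
    "Lightricks/LTX-Video", transformer=transformer, vae=vae, torch_dtype=torch.bfloat16
)
```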

docs/source/en/api/pipelines/sana.md

Lines changed: 1 addition & 1 deletion
@@ -32,9 +32,9 @@ Available models:
 
 | Model | Recommended dtype |
 |:-----:|:-----------------:|
+| [`Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers) | `torch.bfloat16` |
 | [`Efficient-Large-Model/Sana_1600M_1024px_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_diffusers) | `torch.float16` |
 | [`Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers) | `torch.float16` |
-| [`Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers) | `torch.bfloat16` |
 | [`Efficient-Large-Model/Sana_1600M_512px_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_diffusers) | `torch.float16` |
 | [`Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers) | `torch.float16` |
 | [`Efficient-Large-Model/Sana_600M_1024px_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px_diffusers) | `torch.float16` |
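For context, loading the newly promoted BF16 checkpoint at the pipeline level might look like this (a sketch; the prompt and generation call are illustrative, not part of the diff):

```python
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe("A cat holding a sign that says hello world").images[0]
image.save("sana.png")
```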

docs/source/en/community_projects.md

Lines changed: 4 additions & 0 deletions
@@ -79,4 +79,8 @@ Happy exploring, and thank you for being part of the Diffusers community!
     <td><a href="https://github.com/Netwrck/stable-diffusion-server"> Stable Diffusion Server </a></td>
     <td>A server configured for Inpainting/Generation/img2img with one stable diffusion model</td>
   </tr>
+  <tr style="border-top: 2px solid black">
+    <td><a href="https://github.com/suzukimain/auto_diffusers"> Model Search </a></td>
+    <td>Search models on Civitai and Hugging Face</td>
+  </tr>
 </table>

docs/source/en/quantization/torchao.md

Lines changed: 4 additions & 2 deletions
@@ -27,7 +27,7 @@ The example below only quantizes the weights to int8.
 ```python
 from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig
 
-model_id = "black-forest-labs/Flux.1-Dev"
+model_id = "black-forest-labs/FLUX.1-dev"
 dtype = torch.bfloat16
 
 quantization_config = TorchAoConfig("int8wo")
@@ -45,7 +45,9 @@ pipe = FluxPipeline.from_pretrained(
 pipe.to("cuda")
 
 prompt = "A cat holding a sign that says hello world"
-image = pipe(prompt, num_inference_steps=28, guidance_scale=0.0).images[0]
+image = pipe(
+    prompt, num_inference_steps=50, guidance_scale=4.5, max_sequence_length=512
+).images[0]
 image.save("output.png")
 ```
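The two hunks skip the middle of the example. A self-contained sketch that fills the gap, assuming the elided lines quantize the transformer and build the pipeline (the `FluxTransformer2DModel.from_pretrained` call is inferred from the imports, not shown in this diff):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig

model_id = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16

quantization_config = TorchAoConfig("int8wo")
# Inferred middle section: quantize only the transformer's weights to int8.
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=dtype,
)
pipe = FluxPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=dtype)
pipe.to("cuda")

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt, num_inference_steps=50, guidance_scale=4.5, max_sequence_length=512
).images[0]
image.save("output.png")
```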

examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@
 import wandb
 
 # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.32.0.dev0")
+check_min_version("0.33.0.dev0")
 
 logger = get_logger(__name__)
 
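For context, `check_min_version` guards the example script against running with an older `diffusers` than it was written for; a rough sketch of the idea (not the library's actual implementation):

```python
from packaging import version

def check_min_version_sketch(min_version: str) -> None:
    # Illustration only: the real helper ships with diffusers.
    import diffusers

    if version.parse(diffusers.__version__) < version.parse(min_version):
        raise ImportError(
            f"This example requires diffusers >= {min_version}, "
            f"found {diffusers.__version__}."
        )
```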
