Commit b11f228

Merge branch 'main' into main

2 parents f518f5f + 1b202c5

File tree: 88 files changed, +1447 −556 lines


.github/workflows/nightly_tests.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -359,6 +359,8 @@ jobs:
             test_location: "bnb"
           - backend: "gguf"
             test_location: "gguf"
+          - backend: "torchao"
+            test_location: "torchao"
     runs-on:
       group: aws-g6e-xlarge-plus
     container:
```

.github/workflows/pypi_publish.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -68,7 +68,7 @@ jobs:
       - name: Test installing diffusers and importing
         run: |
           pip install diffusers && pip uninstall diffusers -y
-          pip install -i https://testpypi.python.org/pypi diffusers
+          pip install -i https://test.pypi.org/simple/ diffusers
           python -c "from diffusers import __version__; print(__version__)"
           python -c "from diffusers import DiffusionPipeline; pipe = DiffusionPipeline.from_pretrained('fusing/unet-ldm-dummy-update'); pipe()"
           python -c "from diffusers import DiffusionPipeline; pipe = DiffusionPipeline.from_pretrained('hf-internal-testing/tiny-stable-diffusion-pipe', safety_checker=None); pipe('ah suh du')"
```

docs/source/en/_toctree.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -429,7 +429,7 @@
       - local: api/pipelines/ledits_pp
         title: LEDITS++
       - local: api/pipelines/ltx_video
-        title: LTX
+        title: LTXVideo
       - local: api/pipelines/lumina
         title: Lumina-T2X
       - local: api/pipelines/marigold
```

docs/source/en/api/models/autoencoderkl_ltx_video.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import AutoencoderKLLTXVideo
 
-vae = AutoencoderKLLTXVideo.from_pretrained("TODO/TODO", subfolder="vae", torch_dtype=torch.float32).to("cuda")
+vae = AutoencoderKLLTXVideo.from_pretrained("Lightricks/LTX-Video", subfolder="vae", torch_dtype=torch.float32).to("cuda")
 ```
 
 ## AutoencoderKLLTXVideo
````

docs/source/en/api/models/ltx_video_transformer3d.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
 ```python
 from diffusers import LTXVideoTransformer3DModel
 
-transformer = LTXVideoTransformer3DModel.from_pretrained("TODO/TODO", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
+transformer = LTXVideoTransformer3DModel.from_pretrained("Lightricks/LTX-Video", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
 ```
 
 ## LTXVideoTransformer3DModel
````
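Note that both updated doc snippets above reference `torch` dtypes without importing `torch`; a self-contained version of the two loads, using only the repository and subfolder names taken from the diff:

```python
import torch
from diffusers import AutoencoderKLLTXVideo, LTXVideoTransformer3DModel

# VAE in float32 and transformer in bfloat16, matching the updated docs,
# with the missing `import torch` added so the snippet runs on its own.
vae = AutoencoderKLLTXVideo.from_pretrained(
    "Lightricks/LTX-Video", subfolder="vae", torch_dtype=torch.float32
).to("cuda")
transformer = LTXVideoTransformer3DModel.from_pretrained(
    "Lightricks/LTX-Video", subfolder="transformer", torch_dtype=torch.bfloat16
).to("cuda")
```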

docs/source/en/api/pipelines/ltx_video.md

Lines changed: 40 additions & 2 deletions

````diff
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License. -->
 
-# LTX
+# LTX Video
 
 [LTX Video](https://huggingface.co/Lightricks/LTX-Video) is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 24 FPS videos at a 768x512 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and varied content. We provide a model for both text-to-video as well as image + text-to-video usecases.
 
@@ -22,14 +22,24 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.m
 
 </Tip>
 
+Available models:
+
+| Model name | Recommended dtype |
+|:-------------:|:-----------------:|
+| [`LTX Video 0.9.0`](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.safetensors) | `torch.bfloat16` |
+| [`LTX Video 0.9.1`](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.1.safetensors) | `torch.bfloat16` |
+
+Note: The recommended dtype is for the transformer component. The VAE and text encoders can be either `torch.float32`, `torch.bfloat16` or `torch.float16` but the recommended dtype is `torch.bfloat16` as used in the original repository.
+
 ## Loading Single Files
 
-Loading the original LTX Video checkpoints is also possible with [`~ModelMixin.from_single_file`].
+Loading the original LTX Video checkpoints is also possible with [`~ModelMixin.from_single_file`]. We recommend using `from_single_file` for the Lightricks series of models, as they plan to release multiple models in the future in the single file format.
 
 ```python
 import torch
 from diffusers import AutoencoderKLLTXVideo, LTXImageToVideoPipeline, LTXVideoTransformer3DModel
 
+# `single_file_url` could also be https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.1.safetensors
 single_file_url = "https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.safetensors"
 transformer = LTXVideoTransformer3DModel.from_single_file(
   single_file_url, torch_dtype=torch.bfloat16
@@ -99,6 +109,34 @@ export_to_video(video, "output_gguf_ltx.mp4", fps=24)
 
 Make sure to read the [documentation on GGUF](../../quantization/gguf) to learn more about our GGUF support.
 
+<!-- TODO(aryan): Update this when official weights are supported -->
+
+Loading and running inference with [LTX Video 0.9.1](https://huggingface.co/Lightricks/LTX-Video/blob/main/ltx-video-2b-v0.9.1.safetensors) weights.
+
+```python
+import torch
+from diffusers import LTXPipeline
+from diffusers.utils import export_to_video
+
+pipe = LTXPipeline.from_pretrained("a-r-r-o-w/LTX-Video-0.9.1-diffusers", torch_dtype=torch.bfloat16)
+pipe.to("cuda")
+
+prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
+negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
+
+video = pipe(
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    width=768,
+    height=512,
+    num_frames=161,
+    decode_timestep=0.03,
+    decode_noise_scale=0.025,
+    num_inference_steps=50,
+).frames[0]
+export_to_video(video, "output.mp4", fps=24)
+```
+
 Refer to [this section](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox#memory-optimization) to learn more about optimizing memory consumption.
 
 ## LTXPipeline
````
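The single-file hunk above cuts off at the transformer load; a hedged sketch of how the remaining components might be assembled into a pipeline, reusing only the imports and URLs the diff brings in (loading the VAE via `from_single_file` is an assumption, not shown in the hunk):

```python
import torch
from diffusers import AutoencoderKLLTXVideo, LTXImageToVideoPipeline, LTXVideoTransformer3DModel

single_file_url = "https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.safetensors"
transformer = LTXVideoTransformer3DModel.from_single_file(
    single_file_url, torch_dtype=torch.bfloat16
)
# Assumption: the VAE accepts the same single-file checkpoint as the transformer.
vae = AutoencoderKLLTXVideo.from_single_file(single_file_url, torch_dtype=torch.bfloat16)
# Remaining components (text encoder, tokenizer, scheduler) come from the hub repo.
pipe = LTXImageToVideoPipeline.from_pretrained(
    "Lightricks/LTX-Video", transformer=transformer, vae=vae, torch_dtype=torch.bfloat16
)
```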

docs/source/en/community_projects.md

Lines changed: 4 additions & 0 deletions

```diff
@@ -79,4 +79,8 @@ Happy exploring, and thank you for being part of the Diffusers community!
     <td><a href="https://github.com/Netwrck/stable-diffusion-server"> Stable Diffusion Server </a></td>
     <td>A server configured for Inpainting/Generation/img2img with one stable diffusion model</td>
   </tr>
+  <tr style="border-top: 2px solid black">
+    <td><a href="https://github.com/suzukimain/auto_diffusers"> Model Search </a></td>
+    <td>Search models on Civitai and Hugging Face</td>
+  </tr>
 </table>
```

docs/source/en/quantization/torchao.md

Lines changed: 62 additions & 0 deletions

````diff
@@ -25,6 +25,7 @@ Quantize a model by passing [`TorchAoConfig`] to [`~ModelMixin.from_pretrained`]
 The example below only quantizes the weights to int8.
 
 ```python
+import torch
 from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig
 
 model_id = "black-forest-labs/FLUX.1-dev"
@@ -44,6 +45,10 @@ pipe = FluxPipeline.from_pretrained(
 )
 pipe.to("cuda")
 
+# Without quantization: ~31.447 GB
+# With quantization: ~20.40 GB
+print(f"Pipeline memory usage: {torch.cuda.max_memory_reserved() / 1024**3:.3f} GB")
+
 prompt = "A cat holding a sign that says hello world"
 image = pipe(
     prompt, num_inference_steps=50, guidance_scale=4.5, max_sequence_length=512
````
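The memory figures printed above report peak reserved bytes, so they are only meaningful if nothing else has touched the CUDA allocator beforehand; a short sketch of isolating the measurement (both calls are standard PyTorch CUDA APIs):

```python
import torch

# Clear cached blocks and reset the peak counter before loading the pipeline,
# so max_memory_reserved() reflects this pipeline alone (single-GPU assumed).
torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()

# ... load and run the quantized pipeline as in the diff above ...

print(f"Pipeline memory usage: {torch.cuda.max_memory_reserved() / 1024**3:.3f} GB")
```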
````diff
@@ -88,6 +93,63 @@ Some quantization methods are aliases (for example, `int8wo` is the commonly use
 
 Refer to the official torchao documentation for a better understanding of the available quantization methods and the exhaustive list of configuration options available.
 
+## Serializing and Deserializing quantized models
+
+To serialize a quantized model in a given dtype, first load the model with the desired quantization dtype and then save it using the [`~ModelMixin.save_pretrained`] method.
+
+```python
+import torch
+from diffusers import FluxTransformer2DModel, TorchAoConfig
+
+quantization_config = TorchAoConfig("int8wo")
+transformer = FluxTransformer2DModel.from_pretrained(
+    "black-forest-labs/Flux.1-Dev",
+    subfolder="transformer",
+    quantization_config=quantization_config,
+    torch_dtype=torch.bfloat16,
+)
+transformer.save_pretrained("/path/to/flux_int8wo", safe_serialization=False)
+```
+
+To load a serialized quantized model, use the [`~ModelMixin.from_pretrained`] method.
+
+```python
+import torch
+from diffusers import FluxPipeline, FluxTransformer2DModel
+
+transformer = FluxTransformer2DModel.from_pretrained("/path/to/flux_int8wo", torch_dtype=torch.bfloat16, use_safetensors=False)
+pipe = FluxPipeline.from_pretrained("black-forest-labs/Flux.1-Dev", transformer=transformer, torch_dtype=torch.bfloat16)
+pipe.to("cuda")
+
+prompt = "A cat holding a sign that says hello world"
+image = pipe(prompt, num_inference_steps=30, guidance_scale=7.0).images[0]
+image.save("output.png")
+```
+
+Some quantization methods, such as `uint4wo`, cannot be loaded directly and may result in an `UnpicklingError` when trying to load the models, but work as expected when saving them. In order to work around this, one can load the state dict manually into the model. Note, however, that this requires using `weights_only=False` in `torch.load`, so it should be run only if the weights were obtained from a trustable source.
+
+```python
+import torch
+from accelerate import init_empty_weights
+from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig
+
+# Serialize the model
+transformer = FluxTransformer2DModel.from_pretrained(
+    "black-forest-labs/Flux.1-Dev",
+    subfolder="transformer",
+    quantization_config=TorchAoConfig("uint4wo"),
+    torch_dtype=torch.bfloat16,
+)
+transformer.save_pretrained("/path/to/flux_uint4wo", safe_serialization=False, max_shard_size="50GB")
+# ...
+
+# Load the model
+state_dict = torch.load("/path/to/flux_uint4wo/diffusion_pytorch_model.bin", weights_only=False, map_location="cpu")
+with init_empty_weights():
+    transformer = FluxTransformer2DModel.from_config("/path/to/flux_uint4wo/config.json")
+transformer.load_state_dict(state_dict, strict=True, assign=True)
+```
+
 ## Resources
 
 - [TorchAO Quantization API](https://github.com/pytorch/ao/blob/main/torchao/quantization/README.md)
````
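The `uint4wo` workaround above ends with a bare transformer; a hedged continuation showing how it could be wired back into a pipeline, reusing only imports and model IDs already present in that hunk:

```python
# Continuation of the uint4wo load above (assumes `transformer` now holds the
# manually loaded quantized weights and a CUDA device is available).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/Flux.1-Dev", transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.to("cuda")
image = pipe("A cat holding a sign that says hello world", num_inference_steps=30).images[0]
image.save("output_uint4wo.png")
```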

examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -74,7 +74,7 @@
 import wandb
 
 # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.32.0.dev0")
+check_min_version("0.33.0.dev0")
 
 logger = get_logger(__name__)
```

examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -73,7 +73,7 @@
 
 
 # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
-check_min_version("0.32.0.dev0")
+check_min_version("0.33.0.dev0")
 
 logger = get_logger(__name__)
```
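Both training scripts bump the same version guard; a simplified sketch of what `check_min_version` does (the real implementation lives in `diffusers.utils` and may differ in details):

```python
from packaging import version

import diffusers

def check_min_version(min_version: str) -> None:
    # Fail early if the installed diffusers is older than the example expects.
    if version.parse(diffusers.__version__) < version.parse(min_version):
        raise ImportError(
            f"This example requires diffusers>={min_version}, "
            f"but found {diffusers.__version__}."
        )

check_min_version("0.33.0.dev0")
```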
