
Commit 4b01401

Merge branch 'main' into lora-hot-swapping
2 parents 1b834ec + 9a147b8 commit 4b01401

File tree

63 files changed (+2093, −139 lines)


docs/source/en/api/pipelines/lumina2.md

Lines changed: 50 additions & 0 deletions
@@ -26,6 +26,56 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
</Tip>

## Using Single File loading with Lumina Image 2.0

Single file loading for Lumina Image 2.0 is available for the `Lumina2Transformer2DModel`.
```python
import torch
from diffusers import Lumina2Transformer2DModel, Lumina2Text2ImgPipeline

ckpt_path = "https://huggingface.co/Alpha-VLLM/Lumina-Image-2.0/blob/main/consolidated.00-of-01.pth"
transformer = Lumina2Transformer2DModel.from_single_file(
    ckpt_path, torch_dtype=torch.bfloat16
)

pipe = Lumina2Text2ImgPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Image-2.0", transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
image = pipe(
    "a cat holding a sign that says hello",
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("lumina-single-file.png")
```

## Using GGUF Quantized Checkpoints with Lumina Image 2.0

GGUF Quantized checkpoints for the `Lumina2Transformer2DModel` can be loaded via `from_single_file` with the `GGUFQuantizationConfig`.
```python
import torch  # needed for torch.bfloat16 below
from diffusers import Lumina2Transformer2DModel, Lumina2Text2ImgPipeline, GGUFQuantizationConfig

ckpt_path = "https://huggingface.co/calcuis/lumina-gguf/blob/main/lumina2-q4_0.gguf"
transformer = Lumina2Transformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipe = Lumina2Text2ImgPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Image-2.0", transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
image = pipe(
    "a cat holding a sign that says hello",
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("lumina-gguf.png")
```

## Lumina2Text2ImgPipeline

[[autodoc]] Lumina2Text2ImgPipeline

docs/source/en/api/utilities.md

Lines changed: 4 additions & 0 deletions
@@ -45,3 +45,7 @@ Utility and helper functions for working with 🤗 Diffusers.
## apply_layerwise_casting

[[autodoc]] hooks.layerwise_casting.apply_layerwise_casting

## apply_group_offloading

[[autodoc]] hooks.group_offloading.apply_group_offloading

docs/source/en/optimization/memory.md

Lines changed: 40 additions & 0 deletions
@@ -158,6 +158,46 @@ In order to properly offload models after they're called, it is required to run
</Tip>

## Group offloading

Group offloading is the middle ground between sequential and model offloading. It works by offloading groups of internal layers (either `torch.nn.ModuleList` or `torch.nn.Sequential`), which uses less memory than model-level offloading. It is also faster than sequential-level offloading because the number of device synchronizations is reduced.

To enable group offloading, call the [`~ModelMixin.enable_group_offload`] method on the model if it is a Diffusers model implementation. For any other model implementation, use [`~hooks.group_offloading.apply_group_offloading`]:
```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.hooks import apply_group_offloading
from diffusers.utils import export_to_video

# Load the pipeline
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# We can utilize the enable_group_offload method for Diffusers model implementations
pipe.transformer.enable_group_offload(onload_device=onload_device, offload_device=offload_device, offload_type="leaf_level", use_stream=True)

# For any other model implementations, the apply_group_offloading function can be used
apply_group_offloading(pipe.text_encoder, onload_device=onload_device, offload_type="block_level", num_blocks_per_group=2)
apply_group_offloading(pipe.vae, onload_device=onload_device, offload_type="leaf_level")

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
# This run used about 14.79 GB. Memory can be reduced further with tiling and leaf_level offloading throughout the pipeline.
print(f"Max memory allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
export_to_video(video, "output.mp4", fps=8)
```

On CUDA devices that support asynchronous data transfer streams, group offloading overlaps data transfer with computation, which reduces overall execution time compared to sequential offloading. This is enabled through layer prefetching with CUDA streams: the next layer to be executed is loaded onto the accelerator device while the current layer is still running, which slightly increases memory requirements. Group offloading also supports leaf-level offloading (equivalent to sequential CPU offloading), which can be made much faster when using streams.
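
As a minimal sketch (not part of this commit), stream-based prefetching can be combined with block-level offloading as follows; it assumes `enable_group_offload` accepts `num_blocks_per_group` the same way `apply_group_offloading` does:

```python
import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# use_stream=True prefetches the next group of layers on a separate CUDA stream
# while the current group executes, trading a little extra memory for speed.
pipe.transformer.enable_group_offload(
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="block_level",
    num_blocks_per_group=2,  # assumed to be accepted here, as in apply_group_offloading
    use_stream=True,
)
```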

## FP8 layerwise weight-casting

PyTorch supports `torch.float8_e4m3fn` and `torch.float8_e5m2` as weight storage dtypes, but they can't be used for computation in many different tensor operations due to unimplemented kernel support. However, you can use these dtypes to store model weights in fp8 precision and upcast them on-the-fly when the layers are used in the forward pass. This is known as layerwise weight-casting.
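
As a minimal sketch (not part of this commit), layerwise weight-casting on a Diffusers model might look as follows; it assumes the `enable_layerwise_casting` method on Diffusers model implementations (the `apply_layerwise_casting` utility documented above covers other modules):

```python
import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# Store the transformer weights in fp8 and upcast them on the fly to bfloat16
# as each layer runs its forward pass (layerwise weight-casting).
pipe.transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn,
    compute_dtype=torch.bfloat16,
)
```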

docs/source/en/tutorials/using_peft_for_inference.md

Lines changed: 4 additions & 0 deletions
@@ -221,3 +221,7 @@ pipe.delete_adapters("toy")
pipe.get_active_adapters()
["pixel"]
```

## PeftInputAutocastDisableHook

[[autodoc]] hooks.layerwise_casting.PeftInputAutocastDisableHook

examples/community/README.md

Lines changed: 8 additions & 8 deletions
@@ -50,8 +50,9 @@ Please also check out our [Community Scripts](https://github.com/huggingface/dif
 | IADB Pipeline | Implementation of [Iterative α-(de)Blending: a Minimalist Deterministic Diffusion Model](https://arxiv.org/abs/2305.03486) | [IADB Pipeline](#iadb-pipeline) | - | [Thomas Chambon](https://github.com/tchambon) |
 | Zero1to3 Pipeline | Implementation of [Zero-1-to-3: Zero-shot One Image to 3D Object](https://arxiv.org/abs/2303.11328) | [Zero1to3 Pipeline](#zero1to3-pipeline) | - | [Xin Kong](https://github.com/kxhit) |
 | Stable Diffusion XL Long Weighted Prompt Pipeline | A pipeline that supports unlimited-length prompts and negative prompts and uses A1111-style prompt weighting | [Stable Diffusion XL Long Weighted Prompt Pipeline](#stable-diffusion-xl-long-weighted-prompt-pipeline) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1LsqilswLR40XLLcp6XFOl5nKb_wOe26W?usp=sharing) | [Andrew Zhu](https://xhinker.medium.com/) |
-| Stable Diffusion Mixture Tiling Pipeline SD 1.5 | A pipeline that generates cohesive images by integrating multiple diffusion processes, each focused on a specific image region and considering boundary effects for smooth blending | [Stable Diffusion Mixture Tiling Pipeline SD 1.5](#stable-diffusion-mixture-tiling-sd-15) | [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/albarji/mixture-of-diffusers) | [Álvaro B Jiménez](https://github.com/albarji/) |
-| Stable Diffusion Mixture Tiling Pipeline SDXL | A pipeline that generates cohesive images by integrating multiple diffusion processes, each focused on a specific image region and considering boundary effects for smooth blending | [Stable Diffusion Mixture Tiling Pipeline SDXL](#stable-diffusion-mixture-tiling-sdxl) | [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/elismasilva/mixture-of-diffusers-sdxl-tiling) | [Eliseu Silva](https://github.com/DEVAIEXP/) |
+| Stable Diffusion Mixture Tiling Pipeline SD 1.5 | A pipeline that generates cohesive images by integrating multiple diffusion processes, each focused on a specific image region and considering boundary effects for smooth blending | [Stable Diffusion Mixture Tiling Pipeline SD 1.5](#stable-diffusion-mixture-tiling-pipeline-sd-15) | [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/albarji/mixture-of-diffusers) | [Álvaro B Jiménez](https://github.com/albarji/) |
+| Stable Diffusion Mixture Canvas Pipeline SD 1.5 | A pipeline that generates cohesive images by integrating multiple diffusion processes, each focused on a specific image region and considering boundary effects for smooth blending. Works by defining a list of Text2Image region objects that detail the region of influence of each diffuser. | [Stable Diffusion Mixture Canvas Pipeline SD 1.5](#stable-diffusion-mixture-canvas-pipeline-sd-15) | [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/albarji/mixture-of-diffusers) | [Álvaro B Jiménez](https://github.com/albarji/) |
+| Stable Diffusion Mixture Tiling Pipeline SDXL | A pipeline that generates cohesive images by integrating multiple diffusion processes, each focused on a specific image region and considering boundary effects for smooth blending | [Stable Diffusion Mixture Tiling Pipeline SDXL](#stable-diffusion-mixture-tiling-pipeline-sdxl) | [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/elismasilva/mixture-of-diffusers-sdxl-tiling) | [Eliseu Silva](https://github.com/DEVAIEXP/) |
 | FABRIC - Stable Diffusion with feedback Pipeline | A pipeline that supports feedback from liked and disliked images | [Stable Diffusion Fabric Pipeline](#stable-diffusion-fabric-pipeline) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/stable_diffusion_fabric.ipynb) | [Shauray Singh](https://shauray8.github.io/about_shauray/) |
 | sketch inpaint - Inpainting with non-inpaint Stable Diffusion | Sketch inpainting, much like in Automatic1111 | [Masked Im2Im Stable Diffusion Pipeline](#stable-diffusion-masked-im2im) | - | [Anatoly Belikov](https://github.com/noskill) |
 | sketch inpaint xl - Inpainting with non-inpaint Stable Diffusion | Sketch inpainting, much like in Automatic1111 | [Masked Im2Im Stable Diffusion XL Pipeline](#stable-diffusion-xl-masked-im2im) | - | [Anatoly Belikov](https://github.com/noskill) |
@@ -2404,7 +2405,7 @@ pipe_images = mixing_pipeline(
 ![image_mixing_result](https://huggingface.co/datasets/TheDenk/images_mixing/resolve/main/boromir_gigachad.png)

-### Stable Diffusion Mixture Tiling SD 1.5
+### Stable Diffusion Mixture Tiling Pipeline SD 1.5

 This pipeline uses the Mixture-of-Diffusers approach. Refer to the [Mixture of Diffusers](https://arxiv.org/abs/2302.02412) paper for more details.

@@ -2435,7 +2436,7 @@ image = pipeline(
 ![mixture_tiling_results](https://huggingface.co/datasets/kadirnar/diffusers_readme_images/resolve/main/mixture_tiling.png)

-### Stable Diffusion Mixture Canvas
+### Stable Diffusion Mixture Canvas Pipeline SD 1.5

 This pipeline uses the Mixture-of-Diffusers approach. Refer to the [Mixture of Diffusers](https://arxiv.org/abs/2302.02412) paper for more details.

@@ -2470,7 +2471,7 @@ output = pipeline(
 ![Input_Image](https://huggingface.co/datasets/kadirnar/diffusers_readme_images/resolve/main/input_image.png)
 ![mixture_canvas_results](https://huggingface.co/datasets/kadirnar/diffusers_readme_images/resolve/main/canvas.png)

-### Stable Diffusion Mixture Tiling SDXL
+### Stable Diffusion Mixture Tiling Pipeline SDXL

 This pipeline uses the Mixture-of-Diffusers approach. Refer to the [Mixture of Diffusers](https://arxiv.org/abs/2302.02412) paper for more details.

@@ -2516,14 +2517,13 @@ image = pipe(
     tile_col_overlap=256,
     guidance_scale_tiles=[[7, 7, 7]],  # or guidance_scale=7 if it is the same for all prompts
     height=1024,
-    width=3840,
-    target_size=(1024, 3840),
+    width=3840,
     generator=generator,
     num_inference_steps=30,
 )["images"][0]
 ```

-![mixture_tiling_results](https://huggingface.co/datasets/elismasilva/results/resolve/main/mixture_sdxl.png)
+![mixture_tiling_results](https://huggingface.co/datasets/elismasilva/results/resolve/main/mixture_of_diffusers_sdxl_1.png)

 ### TensorRT Inpainting Stable Diffusion Pipeline
