Merged
137 commits
f666908
a
tolgacangoz Aug 12, 2024
c092728
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Aug 12, 2024
cfe8dcc
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Aug 13, 2024
c1b6c0f
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Aug 25, 2024
60021b8
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Aug 26, 2024
0ad7101
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Aug 27, 2024
aabac0a
refactor: add `ff_act_fn` parameter to `UNet2DConditionModel` and `ge…
tolgacangoz Aug 27, 2024
ad4c6a3
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Aug 29, 2024
649baa6
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Sep 4, 2024
279d613
Study as an ordinary UNet model
tolgacangoz Sep 5, 2024
5f5bd08
make style
tolgacangoz Sep 5, 2024
bfd8b9d
make fix-copies
tolgacangoz Sep 5, 2024
c3b004b
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Sep 5, 2024
eaef037
Up
tolgacangoz Sep 5, 2024
2e99ec7
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Sep 5, 2024
99d9099
Up
tolgacangoz Sep 6, 2024
33dd50d
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Sep 7, 2024
56e61f0
Fix timestep embedding conditioning in `MatryoshkaCombinedTimestepTex…
tolgacangoz Sep 12, 2024
376500a
make style
tolgacangoz Sep 12, 2024
4c16e5b
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Sep 12, 2024
8c4dcb3
Revert; cuz I should have created (probably) a new attention processo…
tolgacangoz Sep 14, 2024
ef38541
Revert to create your own custom transformer block
tolgacangoz Sep 15, 2024
8eadb30
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Sep 16, 2024
19d6c17
Init template for the pipeline
tolgacangoz Sep 16, 2024
7d1a0ab
Add `MatryoshkaTransformerBlock` and `MatryoshkaFeedForward` classes
tolgacangoz Sep 16, 2024
5754bc6
Revert
tolgacangoz Sep 16, 2024
bc1f68b
Add `GELU` activation function module
tolgacangoz Sep 16, 2024
906298b
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Sep 17, 2024
23f4ced
Revert
tolgacangoz Sep 17, 2024
a2ca8ef
Revert
tolgacangoz Sep 17, 2024
bcd8939
make fix-copies
tolgacangoz Sep 17, 2024
e014e3e
All in one file
tolgacangoz Sep 17, 2024
f264b9f
Up
tolgacangoz Sep 17, 2024
1a40f68
Replace `MatryoshkaTransformerBlock` with `MatryoshkaTransformer2DModel`
tolgacangoz Sep 17, 2024
221c954
make style
tolgacangoz Sep 17, 2024
c75e723
Refactor `MatryoshkaTransformer2DModel` to add `forward()`and add `Ma…
tolgacangoz Sep 17, 2024
e5db6e3
make style
tolgacangoz Sep 17, 2024
728fb42
Up
tolgacangoz Sep 17, 2024
5b5747d
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Sep 17, 2024
464600d
Remove redundant attention projections in `MatryoshkaTransformerBlock`
tolgacangoz Sep 18, 2024
36d9d29
Up
tolgacangoz Sep 18, 2024
b0bf23f
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Sep 18, 2024
9e37e00
Fix shape issue
tolgacangoz Sep 18, 2024
b573182
Up
tolgacangoz Sep 18, 2024
f35a8f9
make style
tolgacangoz Sep 18, 2024
0f6bce5
Up
tolgacangoz Sep 18, 2024
1d48420
Refactor condition embedding in `MatryoshkaCombinedTimestepTextEmbedd…
tolgacangoz Sep 18, 2024
b476da9
Adapt `DDIMScheduler` for `x_0` prediction by exploiting `gammas`
tolgacangoz Sep 19, 2024
a3fc84d
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Sep 19, 2024
6a978b2
Fix `prev_timestep` index
tolgacangoz Sep 20, 2024
368e044
Up
tolgacangoz Sep 20, 2024
02e67c3
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Sep 20, 2024
a146ae4
Fix normalization group size in `MatryoshkaTransformerBlock`
tolgacangoz Sep 20, 2024
abbb3d4
Refactor class names
tolgacangoz Sep 20, 2024
a09266e
Add `NestedUNet2DConditionModel` template
tolgacangoz Sep 20, 2024
85241b3
Adapt `NestedUNet2DConditionModel` initialization and configuration
tolgacangoz Sep 20, 2024
b7df3bb
make style
tolgacangoz Sep 20, 2024
22c148f
Add template of `forward` for `NestedUNet2DConditionModel`
tolgacangoz Sep 20, 2024
651cd76
Refactor `NestedUNet2DConditionModel` forward method
tolgacangoz Sep 21, 2024
ea60da3
Refactor `NestedUNet2DConditionModel` forward method
tolgacangoz Sep 21, 2024
8b29e7d
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Sep 21, 2024
e01421c
Fix `NestedUNet2DConditionModel` initialization
tolgacangoz Sep 21, 2024
4d06f29
Up
tolgacangoz Sep 21, 2024
29fa257
Generalize `MatryoshkaCombinedTimestepTextEmbedding` for nesting leve…
tolgacangoz Sep 22, 2024
1a22767
make style
tolgacangoz Sep 22, 2024
d0fa5ca
Up
tolgacangoz Sep 22, 2024
6b65f9f
Generalize time projection for different model types in `MatryoshkaUN…
tolgacangoz Sep 23, 2024
62db4b0
Fix `cond_emb` usage
tolgacangoz Sep 23, 2024
a57b5fc
Up
tolgacangoz Sep 24, 2024
db809dc
style
tolgacangoz Sep 24, 2024
ff301b6
`Up`
tolgacangoz Sep 25, 2024
77732bb
style
tolgacangoz Sep 25, 2024
577875a
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Sep 25, 2024
154c1be
Refactor `NestedUNet2DConditionModel` to handle `sample_low` conditio…
tolgacangoz Sep 26, 2024
b363cc1
Simplify
tolgacangoz Sep 26, 2024
028a685
Refactor `MatryoshkaDDIMScheduler` to use `alpha_prod` instead of `ga…
tolgacangoz Sep 26, 2024
a5b3c37
Refactor `MatryoshkaDDIMScheduler` to remove unused import and simpli…
tolgacangoz Sep 26, 2024
30c6881
Refactor `NestedUNet2DConditionModel` to handle `inner_config` condit…
tolgacangoz Sep 26, 2024
e34fb48
Refactor `_set_time_proj` to handle with `micro_conditioning_scale` c…
tolgacangoz Sep 26, 2024
dd88c37
Generalize for `nesting_level=2`
tolgacangoz Sep 27, 2024
02cec3f
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Sep 27, 2024
a7c7c9a
Refactor `MatryoshkaUNet2DConditionModel` and `NestedUNet2DConditionM…
tolgacangoz Sep 28, 2024
0144e14
Cleansing
tolgacangoz Sep 28, 2024
5e2e939
Clean up the `NestedUNet2DConditionModel` constructor
tolgacangoz Sep 28, 2024
5fbba0e
No need for VAE
tolgacangoz Sep 28, 2024
2934570
Up
tolgacangoz Sep 28, 2024
f2f2f9c
style
tolgacangoz Sep 28, 2024
9d85d0d
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Sep 28, 2024
6ed3d63
Refactor `MatryoshkaUNet2DConditionModel` and `MatryoshkaPipeline`
tolgacangoz Sep 29, 2024
1df22a6
Remove safety checker
tolgacangoz Sep 29, 2024
a4be940
style
tolgacangoz Sep 29, 2024
ba39b8d
Refactor 'NestedUNet2DConditionModel' to add 'sample_size' parameter …
tolgacangoz Sep 29, 2024
319a4d6
Up
tolgacangoz Sep 30, 2024
95a293c
Refactor 'MatryoshkaPipeline' to process multiple images for nesting_…
tolgacangoz Sep 30, 2024
b54d9ef
revert the last
tolgacangoz Sep 30, 2024
c9f17bb
Refactor 'MatryoshkaDDIMScheduler' to handle multiple model outputs f…
tolgacangoz Sep 30, 2024
67a2917
Refactor 'MatryoshkaPipeline' to remove unused 'model_type' property
tolgacangoz Oct 2, 2024
60e9e77
Fix masking
tolgacangoz Oct 2, 2024
261f135
style
tolgacangoz Oct 2, 2024
aebb266
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Oct 2, 2024
25b56a8
Fix and improve mask handling
tolgacangoz Oct 3, 2024
38c5455
style
tolgacangoz Oct 3, 2024
d13081e
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Oct 3, 2024
3ada184
style
tolgacangoz Oct 4, 2024
0efa0ae
Fix mask handling
tolgacangoz Oct 4, 2024
5737d95
Refactor attention mask handling in Matryoshka models
tolgacangoz Oct 4, 2024
b691f16
Refactor attention mask handling in Matryoshka models
tolgacangoz Oct 4, 2024
ccdee35
Fix mask handling for `nesting_level=2`
tolgacangoz Oct 5, 2024
2efe7b0
Attempt for scheduler usage generalization
tolgacangoz Oct 5, 2024
5c90be9
Equalize tokenizer usage fully
tolgacangoz Oct 5, 2024
31c73fa
style
tolgacangoz Oct 5, 2024
a21e110
Up
tolgacangoz Oct 6, 2024
bc073fc
Refactor `matryoshka.py` to include proper licensing and attribution
tolgacangoz Oct 7, 2024
33edbdd
Refactor `matryoshka.py` to remove deprecated `_encode_prompt()` method
tolgacangoz Oct 7, 2024
96a788c
Refactor `matryoshka.py` to include nesting levels for the UNet model
tolgacangoz Oct 7, 2024
d4f2911
Up
tolgacangoz Oct 7, 2024
3bc6f80
Fix scaling issue for high resolutions
tolgacangoz Oct 8, 2024
e4259ac
Add `self.change_nesting_level(int)` function
tolgacangoz Oct 8, 2024
942c54a
Refactor `matryoshka.py` to handle multiple model outputs in `Matryos…
tolgacangoz Oct 10, 2024
737bca0
This model uses this.
tolgacangoz Oct 10, 2024
aabba08
Move `extra_step_kwargs`
tolgacangoz Oct 10, 2024
360f57e
style
tolgacangoz Oct 10, 2024
83262f8
Refactor optional components in `MatryoshkaPipeline`
tolgacangoz Oct 11, 2024
e543379
Simplify
tolgacangoz Oct 11, 2024
bd91585
Add 🪆Matryoshka Diffusion Models to community pipelines in `Readme.md`
tolgacangoz Oct 11, 2024
3430345
Update example usage
tolgacangoz Oct 11, 2024
149e8b5
Refactor `MatryoshkaTransformerBlock` to use `MatryoshkaFusedAttnProc…
tolgacangoz Oct 13, 2024
ecca7e3
style
tolgacangoz Oct 13, 2024
6fd62e0
simplify
tolgacangoz Oct 13, 2024
c90993d
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Oct 13, 2024
5009be1
Add `trust_remote_code=True` requirement for custom components
tolgacangoz Oct 13, 2024
29eceb7
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Oct 13, 2024
4c3ba48
revert
tolgacangoz Oct 13, 2024
8f0e888
Merge branch 'Add-Matryoshka-Diffusion-Models' of github.com:tolgacan…
tolgacangoz Oct 13, 2024
1b756d1
Update README.md
tolgacangoz Oct 14, 2024
787b1a5
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Oct 14, 2024
d197cc1
Merge branch 'main' into Add-Matryoshka-Diffusion-Models
tolgacangoz Oct 14, 2024
66 changes: 56 additions & 10 deletions examples/community/README.md
@@ -73,7 +73,8 @@ Please also check out our [Community Scripts](https://github.com/huggingface/dif
| Stable Diffusion BoxDiff Pipeline | Training-free controlled generation with bounding boxes using [BoxDiff](https://github.com/showlab/BoxDiff) | [Stable Diffusion BoxDiff Pipeline](#stable-diffusion-boxdiff) | - | [Jingyang Zhang](https://github.com/zjysteven/) |
| FRESCO V2V Pipeline | Implementation of [[CVPR 2024] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation](https://arxiv.org/abs/2403.12962) | [FRESCO V2V Pipeline](#fresco) | - | [Yifan Zhou](https://github.com/SingleZombie) |
| AnimateDiff IPEX Pipeline | Accelerate AnimateDiff inference pipeline with BF16/FP32 precision on Intel Xeon CPUs with [IPEX](https://github.com/intel/intel-extension-for-pytorch) | [AnimateDiff on IPEX](#animatediff-on-ipex) | - | [Dan Li](https://github.com/ustcuna/) |
| HunyuanDiT Differential Diffusion Pipeline | Applies [Differential Diffusion](https://github.com/exx8/differential-diffusion) to [HunyuanDiT](https://github.com/huggingface/diffusers/pull/8240). | [HunyuanDiT with Differential Diffusion](#hunyuandit-with-differential-diffusion) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1v44a5fpzyr4Ffr4v2XBQ7BajzG874N4P?usp=sharing) | [Monjoy Choudhury](https://github.com/MnCSSJ4x) |
| [🪆Matryoshka Diffusion Models](https://huggingface.co/papers/2310.15111) | A diffusion process that denoises inputs at multiple resolutions jointly and uses a NestedUNet architecture where features and parameters for small scale inputs are nested within those of the large scales. See [original codebase](https://github.com/apple/ml-mdm). | [🪆Matryoshka Diffusion Models](#matryoshka-diffusion-models) | [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/pcuenq/mdm) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/tolgacangoz/1f54875fc7aeaabcf284ebde64820966/matryoshka_hf.ipynb) | [M. Tolga Cangöz](https://github.com/tolgacangoz) |

To load a custom pipeline, pass the `custom_pipeline` argument to `DiffusionPipeline`, set to the name of one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines; we will merge them quickly.

@@ -85,28 +86,28 @@ pipe = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion

### Flux with CFG

Learn more about Flux [here](https://blackforestlabs.ai/announcing-black-forest-labs/). Since Flux doesn't use CFG, this implementation provides one, inspired by the [PuLID Flux adaptation](https://github.com/ToTheBeginning/PuLID/blob/main/docs/pulid_for_flux.md).

Example usage:

```py
from diffusers import DiffusionPipeline
import torch

# Load FLUX.1-dev with the community pipeline that adds true CFG support.
pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    custom_pipeline="pipeline_flux_with_cfg"
)
pipeline.enable_model_cpu_offload()
prompt = "a watercolor painting of a unicorn"
negative_prompt = "pink"

img = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    true_cfg=1.5,
    guidance_scale=3.5,
    num_images_per_prompt=1,
    generator=torch.manual_seed(0)
).images[0]
@@ -2656,7 +2657,7 @@ image with mask mech_painted.png

<img src=https://github.com/noskill/diffusers/assets/733626/c334466a-67fe-4377-9ff7-f46021b9c224 width="25%" >

result:

<img src=https://github.com/noskill/diffusers/assets/733626/5043fb57-a785-4606-a5ba-a36704f7cb42 width="25%" >

@@ -4324,6 +4325,51 @@ image = pipe(

A colab notebook demonstrating all results can be found [here](https://colab.research.google.com/drive/1v44a5fpzyr4Ffr4v2XBQ7BajzG874N4P?usp=sharing). Depth Maps have also been added in the same colab.

### 🪆Matryoshka Diffusion Models

![🪆Matryoshka Diffusion Models](https://github.com/user-attachments/assets/bf90b53b-48c3-4769-a805-d9dfe4a7c572)

The abstract of the paper:
>Diffusion models are the _de-facto_ approach for generating high-quality images and videos but learning high-dimensional models remains a formidable task due to computational and optimization challenges. Existing methods often resort to training cascaded models in pixel space, or using a downsampled latent space of a separately trained auto-encoder. In this paper, we introduce Matryoshka Diffusion (MDM), **a novel framework for high-resolution image and video synthesis**. We propose a diffusion process that denoises inputs at multiple resolutions jointly and uses a **NestedUNet** architecture where features and parameters for small scale inputs are nested within those of the large scales. In addition, MDM enables a progressive training schedule from lower to higher resolutions which leads to significant improvements in optimization for high-resolution generation. We demonstrate the effectiveness of our approach on various benchmarks, including class-conditioned image generation, high-resolution text-to-image, and text-to-video applications. Remarkably, we can train a **_single pixel-space model_ at resolutions of up to 1024 × 1024 pixels**, demonstrating strong zero shot generalization using the **CC12M dataset, which contains only 12 million images**. Code and pre-trained checkpoints are released at https://github.com/apple/ml-mdm.

- `64×64, nesting_level=0`: 1.719 GiB. With `50` DDIM inference steps:

**64x64**
:-------------------------:
| <img src="https://github.com/user-attachments/assets/9e7bb2cd-45a0-4bd1-adb8-23e283baed39" width="222" height="222" alt="bird_64"> |

- `256×256, nesting_level=1`: 1.776 GiB. With `150` DDIM inference steps:

**64x64** | **256x256**
:-------------------------:|:-------------------------:
| <img src="https://github.com/user-attachments/assets/6b724c2e-5e6a-4b63-9b65-c1182cbb67e0" width="222" height="222" alt="64x64"> | <img src="https://github.com/user-attachments/assets/7dbab2ad-bf40-4a73-ab04-f178347cb7d5" width="222" height="222" alt="256x256"> |

- `1024×1024, nesting_level=2`: 1.792 GiB. Compared with the two figures above, the cost of adding another nesting level is small. With `250` DDIM inference steps:

**64x64** | **256x256** | **1024x1024**
:-------------------------:|:-------------------------:|:-------------------------:
| <img src="https://github.com/user-attachments/assets/4a9454e4-e20a-4736-a196-270e2ae796c0" width="222" height="222" alt="64x64"> | <img src="https://github.com/user-attachments/assets/4a96555d-0fda-4303-82b1-a4d886f770b9" width="222" height="222" alt="256x256"> | <img src="https://github.com/user-attachments/assets/e0239b7a-ab73-4d45-8f3e-b4e6b4b50abe" width="222" height="222" alt="1024x1024"> |
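To put the GiB figures listed above in perspective, the short sketch below computes how much each nesting level adds relative to the 64×64 baseline. The three values are copied from the list above; the dictionary and function names are only illustrative and are not part of the pipeline.

```py
# GiB figures copied from the list above, keyed by nesting level.
footprint_gib = {0: 1.719, 1: 1.776, 2: 1.792}


def relative_increase(level: int, baseline: int = 0) -> float:
    """Percentage growth of `level` over `baseline` (illustrative helper)."""
    base = footprint_gib[baseline]
    return 100.0 * (footprint_gib[level] - base) / base


for level in sorted(footprint_gib):
    print(
        f"nesting_level={level}: {footprint_gib[level]:.3f} GiB "
        f"(+{relative_increase(level):.1f}% over nesting_level=0)"
    )
```

With these numbers, going from 64×64 to 1024×1024 adds roughly 4% to the reported footprint.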

```py
from diffusers import DiffusionPipeline
from diffusers.utils import make_image_grid

# nesting_level=0 -> 64x64; nesting_level=1 -> 256x256 - 64x64; nesting_level=2 -> 1024x1024 - 256x256 - 64x64
pipe = DiffusionPipeline.from_pretrained(
    "tolgacangoz/matryoshka-diffusion-models",
    nesting_level=0,
    trust_remote_code=False,  # Set to True to give permission for this repository's custom code to run
).to("cuda")

prompt0 = "a blue jay stops on the top of a helmet of Japanese samurai, background with sakura tree"
prompt = f"breathtaking {prompt0}. award-winning, professional, highly detailed"
negative_prompt = "deformed, mutated, ugly, disfigured, blur, blurry, noise, noisy"
image = pipe(prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=50).images
make_image_grid(image, rows=1, cols=len(image))

# pipe.change_nesting_level(<int>) # 0, 1, or 2
# 50+, 100+, and 250+ num_inference_steps are recommended for nesting levels 0, 1, and 2 respectively.
```
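When switching nesting levels, it can help to keep the resolutions and step counts from the comments above in one place. The following is a minimal sketch under that assumption: `NESTING_LEVELS` and `settings_for` are hypothetical names, not part of the pipeline, and the step counts are the recommended minimums quoted in the example.

```py
# Illustrative lookup built from the comments in the example above.
# The table and helper are hypothetical and not part of the pipeline.
NESTING_LEVELS = {
    0: {"resolutions": (64,), "min_steps": 50},
    1: {"resolutions": (256, 64), "min_steps": 100},
    2: {"resolutions": (1024, 256, 64), "min_steps": 250},
}


def settings_for(nesting_level: int) -> dict:
    """Return the resolutions and recommended minimum steps for a level."""
    if nesting_level not in NESTING_LEVELS:
        raise ValueError(f"nesting_level must be 0, 1, or 2, got {nesting_level}")
    return NESTING_LEVELS[nesting_level]


cfg = settings_for(2)
print(cfg["resolutions"], cfg["min_steps"])

# With a loaded pipeline (see the example above), this could be used as:
# pipe.change_nesting_level(2)
# images = pipe(prompt=prompt, num_inference_steps=cfg["min_steps"]).images
```

Raising `ValueError` for an unknown level is a design choice of this sketch; it surfaces a mistyped level before any model call is made.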

# Perturbed-Attention Guidance

[Project](https://ku-cvlab.github.io/Perturbed-Attention-Guidance/) / [arXiv](https://arxiv.org/abs/2403.17377) / [GitHub](https://github.com/KU-CVLAB/Perturbed-Attention-Guidance)