[Community Pipeline] Add 🪆Matryoshka Diffusion Models
#9157
Conversation
@tolgacangoz would you have cycles to work on this soon? Another contributor has expressed interest in working on it. Maybe you two could collaborate?

I am into the inference code atm. Will the training code in …

For now, we don't have to focus on training.
Thank you for working on this @tolgacangoz!
thanks!

Thanks for merging!

Hey @tolgacangoz, are there any changes we need to make here to incorporate Jiatao's latest changes apple/ml-mdm#21?

Probably. I will look into it tomorrow. Edit: The usage of …
Thanks for the opportunity to work on this model!
The Abstract of the paper:
Paper: 🪆Matryoshka Diffusion Models
Repository: https://github.com/apple/ml-mdm
Hugging Face Space: https://huggingface.co/spaces/pcuenq/mdm
License: MIT license
Key takeaways from the paper:
VAE: None; since Matryoshka Diffusion Models work on the (extended) pixel space(s).
Text encoder: `flan-t5-xl`

TODOs:
✅ The U-Net; in other words, the inner-most structure, `nesting_level=0`; approximately would be as follows:
✅ Scheduler:
It handles `timesteps` and utilizes `prev_timestep` in a slightly different way: it gives the `t-1` timestep to the `unet`, gives `t` to the `scheduler`, and doesn't use the last `timestep`. A `nesting_level=1`-type model uses 2 noise matrices: 3×64×64 and 3×256×256. A `nesting_level=2`-type model uses 3 noise matrices: 3×64×64, 3×256×256, and 3×1024×1024. Each noise matrix has its own calculations in the scheduler. One produces 3 images with 3 different resolutions from a `nesting_level=2` model.
✅ `convert_matryoshka_model_to_diffusers.py`
✅ Show example results:
  - 64×64, `nesting_level=0`: 1.719 GiB. With 50 DDIM inference steps:
  - 256×256, `nesting_level=1`: 1.776 GiB. With 150 DDIM inference steps:
  - 1024×1024, `nesting_level=2`: 1.792 GiB. As one can see, the cost of adding another layer is really negligible in this context! With 250 DDIM inference steps:
✅ Finish HF integration & upload converted checkpoints to HF.
✅ `README.md`
⏳ Make it as simple as possible, but not simpler. Note: I could make small additions/modifications in the future, e.g., for comments, etc.
❓ `examples/**/train_matryoshka.py`

I would like to congratulate you on this great work and thank you for open-sourcing the codebase under the MIT license, @MultiPath, @Shuangfei, @dreasysnail, Josh Susskind, @ndjaitly, @luke-carlson!
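The nested noise layout described under the Scheduler item can be sketched as follows. This is a minimal illustration with a hypothetical helper name (`make_nested_noises` is not part of the pipeline's actual API); it only reproduces the shapes listed above, where each extra nesting level adds a resolution 4× larger on each side:

```python
import numpy as np

BASE_RES = 64  # the inner-most (nesting_level=0) image is 3×64×64

def make_nested_noises(nesting_level, rng=None):
    """Return one Gaussian noise matrix per resolution, inner-most first.

    nesting_level=2 -> shapes (3, 64, 64), (3, 256, 256), (3, 1024, 1024).
    """
    rng = rng or np.random.default_rng()
    return [
        rng.standard_normal((3, BASE_RES * 4**level, BASE_RES * 4**level))
        for level in range(nesting_level + 1)
    ]

print([n.shape for n in make_nested_noises(2)])
# [(3, 64, 64), (3, 256, 256), (3, 1024, 1024)]
```

Each of these matrices then gets its own calculations inside the scheduler, which is why a `nesting_level=2` run yields 3 images at 3 resolutions.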
I believe/anticipate that this kind of representation learning will become popular, that acceleration improvements from contemporary diffusion modeling will be adapted to this model, and that training will be democratized without the need for large resources in the future.
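The timestep handling noted under the Scheduler item — the `unet` receives `t-1` while the `scheduler` receives `t` — can be sketched in plain Python. This is one plausible reading with hypothetical names, not the actual scheduler code: each scheduler timestep is paired with the next (smaller) entry of the descending schedule as the unet input, so the final entry never appears as a scheduler step.

```python
def iterate_timestep_pairs(timesteps):
    """Pair each scheduler timestep t with the unet timestep t-1, i.e. the
    next entry in the descending schedule. The last entry is only ever seen
    as a unet input, never as a scheduler step."""
    return list(zip(timesteps[1:], timesteps[:-1]))  # (unet_t, scheduler_t)

schedule = [999, 799, 599, 399, 199, 0]  # an example descending DDIM schedule
pairs = iterate_timestep_pairs(schedule)
print(pairs[0], pairs[-1])
# (799, 999) (0, 199)
```

Note that the pairing yields one fewer iteration than there are schedule entries, matching the "doesn't use the last `timestep`" remark above.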
@sayakpaul @pcuenca @a-r-r-o-w