BETA-Tuned Timestep Distribution #1225
Does this apply to all models? Only diffusion models are Beta-sampled during inference. Flow matching models are sampled with linear sigmas, often with timestep shifting ("Flux-shift"). Is that correct? Did #1124 also apply only to diffusion, not to flow matching?
It’s a tunable distribution, but it’s specifically intended for diffusion models (SD, SDXL, etc.).
Here are some examples:
The issue is that #1124 lacks a theoretical basis (it's more of a heuristic method), although it functions similarly. Also, while it supports flow matching by accepting sigmas, requiring both betas and sigmas added too much code.
Do we have any results of our own showing this actually works on SD1.5 and SDXL, and not only on these specific datasets? The paper only covers training at 32x32, 128x128 and 256x256, which are not resolutions either model can do.
It is a known observation in diffusion papers that the later timesteps are relatively easy for the model compared to others (since most of the image is still noise).
So we haven't tried it for any training at all?
You mean testing? Yes, I tested it in my recent runs (SDXL - 1024) and they went very well.



This PR implements the timestep distribution proposed in the paper:
Beta-Tuned Timestep Diffusion Model
This method aims to align timestep sampling with the diffusion model's forward process, resulting in faster convergence and improved training performance. The paper observes that the data distribution changes most significantly during the initial timesteps, which makes standard uniform sampling sub-optimal.
Usage
1. Select the BETA timestep distribution.
2. Set Noising bias to 1 (corresponds to Beta in the paper; recommended: 1).
3. Set Noising weight to < 1 (corresponds to Alpha in the paper; recommended: 0.8).

Note: This is compatible with existing loss weighting strategies (e.g., Min-SNR, Debiased, etc.).
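To illustrate what the recommended settings do, here is a minimal sketch of Beta-distributed timestep sampling. It is not the code from this PR: the function name `sample_beta_timesteps`, its parameters, and the mapping of the unit interval onto 1000 discrete training timesteps are assumptions made for the example. It only assumes that Noising weight plays the role of Alpha and Noising bias the role of Beta, as described above.

```python
import random


def sample_beta_timesteps(batch_size, num_train_timesteps=1000,
                          alpha=0.8, beta=1.0, seed=None):
    """Hypothetical helper: draw discrete timesteps from Beta(alpha, beta).

    alpha ~ "Noising weight", beta ~ "Noising bias" (assumed mapping).
    With alpha < 1 and beta = 1 the density is highest near 0, so early
    (low-noise) timesteps are sampled more often than under uniform sampling.
    """
    rng = random.Random(seed)
    timesteps = []
    for _ in range(batch_size):
        u = rng.betavariate(alpha, beta)  # value in [0, 1]
        # Scale to the discrete range and clamp the u == 1.0 edge case.
        t = min(int(u * num_train_timesteps), num_train_timesteps - 1)
        timesteps.append(t)
    return timesteps


if __name__ == "__main__":
    ts = sample_beta_timesteps(20000, seed=0)
    early = sum(1 for t in ts if t < 100) / len(ts)
    print(f"mean timestep: {sum(ts) / len(ts):.1f}")
    print(f"fraction below t=100: {early:.3f}")
```

With alpha = 1 and beta = 1 the same function reduces to uniform sampling, which makes it easy to compare the two distributions side by side.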