
Conversation

@Koratahiu Koratahiu commented Dec 26, 2025

This PR implements the timestep distribution proposed in the paper:
Beta-Tuned Timestep Diffusion Model

This method aims to align timestep sampling with the diffusion model's forward process, yielding faster convergence and improved training performance. The paper observes that the data distribution changes most significantly during the initial timesteps, which makes standard uniform sampling sub-optimal.

[figure from the paper: proposed beta-tuned timestep distribution]

Usage

  • Select the BETA timestep distribution.
  • Set Noising bias to 1 (corresponds to Beta in the paper; recommended value: 1).
  • Set Noising weight to a value below 1 (corresponds to Alpha in the paper; recommended value: 0.8).

Note: this is compatible with existing loss weighting strategies (e.g., Min-SNR, Debiased); see the sketch below.
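For readers who want to see what this looks like mechanically, here is a minimal sketch of Beta-distributed timestep sampling for a discrete DDPM-style schedule. The function name and the exact mapping of Noising weight/bias onto the Beta distribution's (alpha, beta) parameters are illustrative assumptions, not the actual code in this PR:

```python
import torch

def sample_beta_timesteps(batch_size: int, num_train_timesteps: int,
                          noising_weight: float = 0.8,   # Alpha in the paper (assumed mapping)
                          noising_bias: float = 1.0      # Beta in the paper (assumed mapping)
                          ) -> torch.Tensor:
    # Draw t ~ Beta(alpha, beta) on [0, 1], then scale to discrete
    # timestep indices in [0, num_train_timesteps).
    dist = torch.distributions.Beta(noising_weight, noising_bias)
    t = dist.sample((batch_size,))
    return (t * num_train_timesteps).long().clamp(0, num_train_timesteps - 1)

# With (0.8, 1.0) the density is J-shaped and concentrates samples near
# t = 0, i.e. the early forward-process timesteps the paper identifies
# as the ones where the data distribution changes fastest (exact index
# orientation depends on the scheduler's convention).
timesteps = sample_beta_timesteps(16, 1000)
```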

dxqb commented Dec 27, 2025

Does this apply to all models? Only diffusion models are Beta-sampled during inference; flow matching models are sampled with linear sigmas, often with timestep shifting ("Flux-shift").
This would mean that using a beta timestep distribution during training is equivalent to using (dynamic) timestep shifting during training for flow matching models, which we already have.

Is that correct? Did #1124 also only apply to diffusion, not to flow matching?
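For context on the comparison being drawn here: SD3/Flux-style timestep shifting remaps a uniformly drawn t through a shift factor, which also skews the training-time timestep density. A small sketch, with the shift value purely illustrative (Flux derives it dynamically from the image resolution):

```python
import torch

def shift_timesteps(t: torch.Tensor, shift: float = 3.0) -> torch.Tensor:
    # SD3/Flux-style shifting: remaps t in [0, 1] so that larger shift
    # values push more samples toward the high-noise end.
    return shift * t / (1.0 + (shift - 1.0) * t)

t_uniform = torch.rand(16)             # linear/uniform draw on [0, 1]
t_shifted = shift_timesteps(t_uniform)
```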

@Koratahiu
Contributor Author

> Does this apply to all models? Only diffusion models are Beta-sampled during inference.

It's a tunable distribution, but it's specifically intended for diffusion models (SD, SDXL, etc.).
For flow matching, we would first need to identify where the data distribution changes most significantly.

> Flow matching models are sampled with linear sigmas and often with timestep shifting ("Flux-shift"). This would mean that using a beta timestep distribution during training is equivalent to using (dynamic) timestep shifting during training for flow matching models, which we already have.

Here are some examples, written as (Alpha, Beta) pairs:

  1. (0.8, 1) — the paper's J-shaped distribution:
  [figure: Beta(0.8, 1) density]
  2. (2, 2) — very similar to the Chroma timestep distribution:
  [figure: Beta(2, 2) density]
  3. (1, 1.2) — the reverse:
  [figure: Beta(1, 1.2) density]
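For anyone who wants to reproduce these shapes locally, the densities can be plotted directly. This assumes the (weight, bias) pairs map straight onto the Beta distribution's (alpha, beta) parameters, which is my reading of the examples above:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta as beta_dist

# Avoid the exact endpoints, where the pdf can diverge (e.g., alpha < 1).
x = np.linspace(1e-3, 1.0 - 1e-3, 500)
for a, b, label in [(0.8, 1.0, "paper's J-shape"),
                    (2.0, 2.0, "Chroma-like"),
                    (1.0, 1.2, "the reverse")]:
    plt.plot(x, beta_dist.pdf(x, a, b), label=f"({a}, {b}) {label}")
plt.xlabel("normalized timestep t")
plt.ylabel("sampling density")
plt.legend()
plt.show()
```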

> Is that correct? Did #1124 also only apply to diffusion, not to flow matching?

The issue is that #1124 lacks a theoretical basis (it's more of a heuristic method), but it functions similarly. And while it does support flow matching by accepting sigmas, requiring both betas and sigmas added too much code.

O-J1 commented Jan 2, 2026

Do we have any results of our own showing this actually works on SD1.5 and SDXL, and not just on the paper's specific datasets? The paper only covers training at 32x32, 128x128, and 256x256, which are resolutions neither model can actually produce.

Koratahiu commented Jan 2, 2026

> Do we have any results of our own showing this actually works on SD1.5 and SDXL? The paper only covers training at 32x32, 128x128 and 256x256, which are resolutions neither model can actually produce.

It is a known observation in diffusion papers that the later timesteps are relatively easy for the model compared to the rest (since most of the image is still noise), while the initial timesteps have near-infinite possibilities and are relatively hard (e.g., the issue mentioned in #1230).
I implemented the method from this paper because it was straightforward to do; it should provide benefits similar to those seen in #1124.

O-J1 commented Jan 2, 2026

So we haven't tried it in any training at all?

@Koratahiu
Contributor Author

You mean testing? Yes, I tested it in my recent runs (SDXL at 1024) and they went very well.
I haven't done any direct comparisons yet, though I did run some tests using #1124, which was more stable and faster to train (in terms of validation loss) than uniform sampling.
