Hey,
I'm currently tracing out the story of diffusion generative models, and right now I'm studying the denoising score matching (DSM) objective. I've noticed that your multi-scale approach relies heavily on it (and the original paper is quite old), so I decided to ask my question here.
I've gone through the theory of DSM and have a good grip on how it works and why it works. In practice, however, I observe slow convergence (much slower than with ISM) on toy examples. I believe this might be due to the type of noise distribution selected. While DSM doesn't restrict the choice, everyone seems to go with a normal distribution, since it yields a simple derivative: 1/sigma**2 * (orig - perturbed). In practice, I've observed that the 1/sigma**2 prefactor is on the order of 1e4 for sigma=1e-2, and the loss jumps around quite heavily. The smaller sigma is, the slower the convergence. The loss never actually decreases, but the resulting gradient field looks comparable to what ISM gives.
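For concreteness, here is a small NumPy sketch (names and the toy data are illustrative, not from any particular codebase) of the scale issue I mean: for a Gaussian perturbation `x_tilde = x + sigma * eps`, the DSM regression target `(x - x_tilde) / sigma**2` reduces to `-eps / sigma`, so its magnitude blows up as sigma shrinks.

```python
import numpy as np

# Hypothetical toy setup: x is "clean" 2-D data, x_tilde a Gaussian perturbation.
# The conditional score of q(x_tilde | x) = N(x_tilde; x, sigma^2 I) is
#   (x - x_tilde) / sigma**2  ==  -eps / sigma,
# so the DSM target's magnitude grows like 1/sigma.
rng = np.random.default_rng(0)
x = rng.normal(size=(10000, 2))

scales = {}
for sigma in (1.0, 1e-1, 1e-2):
    eps = rng.normal(size=x.shape)
    x_tilde = x + sigma * eps              # perturbed sample
    target = (x - x_tilde) / sigma**2      # DSM target; exactly -eps / sigma
    scales[sigma] = float(np.abs(target).mean())
    print(f"sigma={sigma:g}  mean |target| ~ {scales[sigma]:.1f}")
```

So the target values scale like 1/sigma (roughly 80 per component at sigma=1e-2 here), and the squared loss scales like 1/sigma**2, which would explain the jumpy loss at small sigma.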
Did you observe this in your experiments as well?