- AutoencoderKL ends up using the L1 reconstruction error instead of the L2 reconstruction error during training (https://github.com/CompVis/latent-diffusion/blob/main/ldm/modules/losses/contperceptual.py#L48). This does not match the classical VAE derivation, where the data likelihood conditioned on the latents, $p(x \mid z)$, is Gaussian, so taking the log of the Gaussian PDF yields the L2 reconstruction error up to a scaling factor (see the first sketch after this list). The same question applies to VQ-GAN, although I can see that it supports both L1 and L2 regimes, with L1 being the default: https://github.com/CompVis/latent-diffusion/blob/main/ldm/modules/losses/vqperceptual.py#L103
- The generator loss for both autoencoders is computed as `-torch.mean(logits_fake)` (https://github.com/CompVis/latent-diffusion/blob/main/ldm/modules/losses/vqperceptual.py#L123, https://github.com/CompVis/latent-diffusion/blob/main/ldm/modules/losses/contperceptual.py#L71). Correct me if I'm wrong, but I think this corresponds to the generator loss under the WGAN framework, whereas on the discriminator side only the vanilla (non-saturating) discriminator loss and the hinge discriminator loss are supported (https://github.com/CompVis/latent-diffusion/blob/main/ldm/modules/losses/contperceptual.py#L27, https://github.com/CompVis/latent-diffusion/blob/main/ldm/modules/losses/vqperceptual.py#L73, https://github.com/CompVis/taming-transformers/blob/master/taming/modules/losses/vqperceptual.py#L20). See the second sketch after this list.
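For concreteness on the first point, here is a minimal PyTorch sketch contrasting what the linked line computes (elementwise L1) with what a fixed-variance Gaussian $p(x \mid z)$ would imply. The tensor shapes and `sigma` are placeholders I picked for illustration, not values from the repo:

```python
import torch

# Placeholder batch standing in for real data and decoder outputs.
inputs = torch.randn(4, 3, 256, 256)
reconstructions = torch.randn(4, 3, 256, 256)

# What contperceptual.py#L48 computes: elementwise L1.
rec_loss_l1 = torch.abs(inputs - reconstructions)

# What a Gaussian p(x|z) with a fixed (assumed) variance sigma^2 implies:
#   -log N(x; x_hat, sigma^2 I) = (x - x_hat)^2 / (2 * sigma^2) + const,
# i.e. an L2 penalty up to a scaling factor.
sigma = 1.0
rec_loss_l2 = (inputs - reconstructions) ** 2 / (2 * sigma**2)

# Elementwise L1 would instead correspond to a Laplace likelihood:
#   -log Laplace(x; x_hat, b) = |x - x_hat| / b + const.
```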
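And for the second point, a sketch of the GAN loss pieces in question. The two discriminator losses paraphrase the linked vqperceptual.py code; the comments on `g_loss` are my reading of the objective, not the repo's documentation, and the logit shapes are an assumption:

```python
import torch
import torch.nn.functional as F

def hinge_d_loss(logits_real, logits_fake):
    # Hinge discriminator loss, as in the linked vqperceptual.py.
    loss_real = torch.mean(F.relu(1.0 - logits_real))
    loss_fake = torch.mean(F.relu(1.0 + logits_fake))
    return 0.5 * (loss_real + loss_fake)

def vanilla_d_loss(logits_real, logits_fake):
    # Vanilla (softplus-based, i.e. standard cross-entropy) discriminator loss.
    return 0.5 * (torch.mean(F.softplus(-logits_real)) +
                  torch.mean(F.softplus(logits_fake)))

def g_loss(logits_fake):
    # Generator loss used for both autoencoders: -E[D(x_hat)].
    # This matches the WGAN generator objective (and, incidentally, the
    # generator side of the hinge/geometric-GAN loss), while the
    # discriminator options above are hinge and vanilla only.
    return -torch.mean(logits_fake)

# Quick check with dummy PatchGAN-style logits (shape assumed for illustration):
logits_real = torch.randn(8, 1, 30, 30)
logits_fake = torch.randn(8, 1, 30, 30)
print(hinge_d_loss(logits_real, logits_fake),
      vanilla_d_loss(logits_real, logits_fake),
      g_loss(logits_fake))
```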