When I use the VAE encoder of StableDiffusionXLPipeline to encode an image batch of shape [B, 3, 1024, 1024], the latents come out as NaN. Are there any solutions?
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16").to("cuda")
vae = pipe.vae

# imgs: [B, 3, 1024, 1024] float tensor on "cuda", values in [0, 1]
input_dtype = imgs.dtype
imgs = imgs * 2.0 - 1.0  # rescale to [-1, 1] as the VAE expects
posterior = vae.encode(imgs.to(vae.dtype)).latent_dist
latents = posterior.sample() * vae.config.scaling_factor  # every value is NaN
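
For reference, here is a minimal sketch of the workaround I'm considering, assuming the NaNs come from fp16 overflow in the base SDXL VAE's encoder: run the encode in float32 and cast the latents back afterwards. It reuses the vae, imgs, and input_dtype variables from the snippet above.

import torch

# Assumption: the NaNs are caused by fp16 overflow inside the base SDXL VAE.
# Upcast the VAE to float32 just for the encode, then cast the latents back.
vae.to(torch.float32)
with torch.no_grad():
    posterior = vae.encode(imgs.to(torch.float32)).latent_dist
    latents = posterior.sample() * vae.config.scaling_factor
latents = latents.to(input_dtype)  # back to fp16 for the rest of the pipeline
vae.to(torch.float16)              # optionally return the VAE to fp16 afterwards

An alternative I've seen mentioned is swapping in the fp16-safe community VAE madebyollin/sdxl-vae-fp16-fix via AutoencoderKL.from_pretrained, which avoids the upcast entirely. Is one of these the right fix, or is something else wrong with my code?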