The official code for "Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation".
Prepare the null-text checkpoint:

```bash
python prepare_nulltext_checkpoint.py
python nulltext_unet.py
```

See `generation_with_nulltext_model.py` for details.
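As a rough, hypothetical sketch (not the repository's API), a prepared null-text checkpoint could be loaded with diffusers roughly as below, assuming it is saved in standard `UNet2DConditionModel` format; the paths are placeholders, and `generation_with_nulltext_model.py` remains the authoritative entry point.

```python
# Hypothetical loading sketch; assumes the prepared checkpoint is a standard
# diffusers UNet2DConditionModel. Paths are placeholders, not repo defaults.
import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

nulltext_unet = UNet2DConditionModel.from_pretrained(
    "path/to/nulltext_checkpoint",      # output of prepare_nulltext_checkpoint.py (assumed)
    torch_dtype=torch.float16,
)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # base SD1.5 weights
    unet=nulltext_unet,                 # swap in the null-text UNet
    torch_dtype=torch.float16,
).to("cuda")
```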
The core code of GCFG is as follows:
```python
# Domain guidance: the in-domain diffusion model, conditioned on the target prompt
noise_pred_text = self.unet(
    latent_model_input[1:],
    t,
    encoder_hidden_states=prompt_embeds[2:3],
    cross_attention_kwargs=cross_attention_kwargs,
    return_dict=False,
)[0]

# Control guidance: SD1.5 or a customized SD, conditioned on the control prompt
noise_pred_text_ori = self.unet1(
    latent_model_input[1:],
    t,
    encoder_hidden_states=prompt_embeds[3:4],
    cross_attention_kwargs=cross_attention_kwargs,
    return_dict=False,
)[0]

# Unconditional guidance: SD1.5 with the unconditional embedding
noise_pred_uncond = self.unet0(
    latent_model_input[:1],
    t,
    encoder_hidden_states=prompt_embeds[:1],
    cross_attention_kwargs=cross_attention_kwargs,
    return_dict=False,
)[0]

# Perform guidance: combine the three noise predictions
if do_classifier_free_guidance:
    noise_pred = noise_pred_uncond + \
        guidance_scale * (noise_pred_text - noise_pred_uncond) + \
        guidance_scale_ori * (noise_pred_text_ori - noise_pred_uncond)
```

where `self.unet0` is SD1.5 for unconditional guidance, `self.unet` is the in-domain diffusion model for domain guidance, and `self.unet1` is SD1.5 or a customized SD for control guidance.
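For reference, the combination step can be read as a standalone function on plain tensors. The sketch below is purely illustrative: the function name `gcfg_combine` and the default scale values are not from the repository; it simply restates the formula used above.

```python
# Illustrative sketch of the GCFG combination on plain tensors.
# `gcfg_combine` and its default scale values are hypothetical, not repo API.
import torch

def gcfg_combine(noise_uncond: torch.Tensor,
                 noise_domain: torch.Tensor,
                 noise_control: torch.Tensor,
                 guidance_scale: float = 7.5,
                 guidance_scale_ori: float = 1.0) -> torch.Tensor:
    """Blend unconditional, domain, and control noise predictions."""
    return (noise_uncond
            + guidance_scale * (noise_domain - noise_uncond)
            + guidance_scale_ori * (noise_control - noise_uncond))

# Toy check on random SD1.5-shaped latents (batch 1, 4 channels, 64x64).
eps_u, eps_d, eps_c = (torch.randn(1, 4, 64, 64) for _ in range(3))
print(gcfg_combine(eps_u, eps_d, eps_c).shape)  # torch.Size([1, 4, 64, 64])
```

Setting `guidance_scale_ori` to 0 recovers standard classifier-free guidance driven by the in-domain model alone.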
- Updating training code in UniDiffusion.
- Results on SDXL and SD3.
