Closed
Description
I've noticed a potential inconsistency in how the VAE-encoded control_image is processed between the training script for ControlNet with Stable Diffusion 3 and the corresponding inference pipeline.
In the inference pipeline (pipeline_stable_diffusion_3_controlnet.py):
The control_image latent is processed by both subtracting the vae_shift_factor and multiplying by the scaling_factor.
diffusers/src/diffusers/pipelines/controlnet_sd3/pipeline_stable_diffusion_3_controlnet.py
Line 1074 in 425a715
control_image = (control_image - vae_shift_factor) * self.vae.config.scaling_factor
However, in the provided training example, the VAE-encoded controlnet_image is only multiplied by the scaling_factor, without subtracting the shift_factor.
controlnet_image = controlnet_image * vae.config.scaling_factor
Should the training script instead apply the shift as well, to match the inference pipeline?
controlnet_image = (controlnet_image - vae.config.shift_factor) * vae.config.scaling_factor
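To make the size of the mismatch concrete, here is a minimal sketch that applies both normalizations to the same toy latent values. It uses plain Python lists rather than tensors, the helper names `normalize_inference` and `normalize_training` are hypothetical, and the two constants are assumed illustrative values, not read from an actual VAE config.

```python
# Assumed illustrative constants for this sketch; a real run would read
# them from vae.config.scaling_factor and vae.config.shift_factor.
SCALING_FACTOR = 1.5305
SHIFT_FACTOR = 0.0609


def normalize_inference(latents, shift=SHIFT_FACTOR, scale=SCALING_FACTOR):
    """Subtract the shift, then scale (the form quoted from the pipeline)."""
    return [(x - shift) * scale for x in latents]


def normalize_training(latents, scale=SCALING_FACTOR):
    """Scale only (the form quoted from the training example)."""
    return [x * scale for x in latents]


# Toy stand-in for VAE-encoded control image latents.
latents = [-1.0, 0.0, 0.5, 2.0]

inference = normalize_inference(latents)
training = normalize_training(latents)

# Element-wise difference between the two conventions.
offsets = [t - i for t, i in zip(training, inference)]
print(offsets)
```

Algebraically, `x * scale - (x - shift) * scale` reduces to `shift * scale`, so under these assumptions the two conventions differ by the same constant offset at every latent position. Whether that offset matters in practice depends on how the ControlNet was trained, which this sketch does not address.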