Merged
Changes from 2 commits
6 changes: 5 additions & 1 deletion src/diffusers/pipelines/flux/pipeline_flux.py
@@ -691,7 +691,11 @@ def __call__(
Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
`guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
the text `prompt`, usually at the expense of lower image quality.
the text `prompt`, usually at the expense of lower image quality. In case of Flux, which is a guidance-
Member

Does "mimic" mean it still produces the same result as true CFG? If the effects are the same (despite the implementation), I'm not too sure the end-user will care or notice.

Maybe it'd be better to make a note of it on the Flux model card?

Member Author

We don't know the true CFG results I think. Maybe "resembles" is a better phrase?

> If the effects are the same (despite the implementation), I'm not too sure the end-user will care or notice.

I think docstrings are important and it can be confusing to users if we're not putting the right phrases here.

> Maybe it'd be better to make a note of it on the Flux model card?

Here, I disagree. I think just clarifying it at the docstring level is more than sufficient w.r.t. the info already available in the model card (for example, we already mention that Dev is a guidance-distilled model).

Member

Sure, maybe let's go with something like this then?

"Guidance-distilled models don't implement true classifier-free guidance and for guidance_scale > 1, it only resembles it."

Contributor

resembles/mimics is very confusing terminology to me.

doing CFG on a guidance-distilled model with cfg_scale=X and embedded_cfg_scale=Y is effectively/approximately the same as doing CFG on base model with cfg_scale=X*Y based on how the math works out. this can easily be validated by running inference with same seed and making sure product of true and embedded scale is the same value. the results will not be the exact same but will be similar-ish (because distilled model is a noisy approximator of base model outputs)

better not to change explanation imo. if we want to, we can just say it is guidance-distilled, and leave the interested user to google it and find necessary information
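
A minimal sketch of the scale-product argument in this comment, in plain Python. It assumes an idealized distilled model: its conditional output exactly equals base-model CFG at the embedded scale, and its unconditional output equals the base model's unconditional prediction. Neither assumption holds exactly in practice, and the names `cfg`, `embedded_scale`, and `true_scale` are illustrative only, not diffusers API.

```python
def cfg(uncond, cond, scale):
    """Classifier-free guidance: start at `uncond` and move toward `cond` by `scale`."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy base-model predictions (made-up numbers, for illustration only).
base_uncond = [0.10, -0.40, 0.25]
base_cond = [0.30, 0.20, -0.05]

embedded_scale = 3.5  # Y: scale baked into the distilled model's conditional output
true_scale = 2.0      # X: scale applied by running CFG on top of the distilled model

# Assumption: the distilled model reproduces base-model CFG at scale Y exactly.
distilled_cond = cfg(base_uncond, base_cond, embedded_scale)
# Assumption: the distilled model's unconditional branch matches the base model's.
distilled_uncond = base_uncond

# CFG on the distilled model with scale X ...
combined = cfg(distilled_uncond, distilled_cond, true_scale)
# ... versus CFG on the base model with scale X * Y.
reference = cfg(base_uncond, base_cond, true_scale * embedded_scale)

max_gap = max(abs(a - b) for a, b in zip(combined, reference))
print(max_gap < 1e-9)
```

Under these assumptions the two results agree up to floating-point error. With a real distilled model they would only be similar, since the distilled model is a noisy approximator of the base model's outputs.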

Member Author

Hmm. I hear both of you. I have had complaints over DMs multiple times regarding this.

How about:

Guidance-distilled models (such as …) don't implement true classifier-free guidance and for guidance_scale > 1, it approximates its effects. Refer to https://arxiv.org/abs/2210.03142 for more details.

Member Author

Yes that should be fine. @a-r-r-o-w good with you?

Contributor

SGTM. It's a bit weird here though that guidance_scale actually means the embedded guidance scale, whereas we have true_cfg_scale to actually mean guidance_scale 😞

Maybe clarifying this is very important

Member Author

I agree with that. @stevhliu how should we go about it?

Member

How about? 😅

For guidance_scale:

"Embedded guidance scale is enabled by setting guidance_scale > 1. Higher guidance_scale encourages a model to generate images more aligned with prompt at the expense of lower image quality.

Guidance-distilled models approximate true classifier-free guidance for guidance_scale > 1. Refer to the paper to learn more."

For true_cfg_scale:

"True classifier-free guidance (guidance scale) is enabled when true_cfg_scale > 1 and negative_prompt is provided."
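
A small sketch of how the two proposed descriptions combine. `guidance_modes` is a hypothetical helper, not part of diffusers; it only mirrors the wording proposed in this comment and assumes the threshold for `true_cfg_scale` is 1, as it is for `guidance_scale`. It is not a description of how the pipeline is implemented.

```python
def guidance_modes(guidance_scale, true_cfg_scale=1.0, negative_prompt=None):
    """Report which guidance modes the proposed docstrings describe as enabled.

    Hypothetical helper for illustration; the thresholds follow the
    docstring wording proposed in this thread.
    """
    # Embedded (distilled) guidance: described as enabled for guidance_scale > 1.
    embedded = guidance_scale > 1
    # True CFG: described as needing both true_cfg_scale > 1 and a negative prompt.
    true_cfg = true_cfg_scale > 1 and negative_prompt is not None
    return {"embedded_guidance": embedded, "true_cfg": true_cfg}


print(guidance_modes(3.5))
print(guidance_modes(3.5, true_cfg_scale=4.0, negative_prompt="blurry"))
print(guidance_modes(3.5, true_cfg_scale=4.0))  # no negative prompt given
```

The third call illustrates the naming concern raised earlier in the thread: raising `true_cfg_scale` alone does nothing under this description unless a `negative_prompt` is also supplied.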

Member Author

Yeah let's go with that. Do you mind pushing to this PR directly? I can do it as well.

distilled model, `guidance_scale > 1` doesn't implement true classifier-free guidance. Specifying
`guidance_scale > 1` just mimics it. In case of Flux, which is a guidance- distilled model,
`guidance_scale > 1` doesn't implement true classifier-free guidance. Specifying `guidance_scale > 1`
just mimics it.
Member

This seems like a duplicate sentence

Suggested change
- `guidance_scale > 1` just mimics it. In case of Flux, which is a guidance- distilled model,
- `guidance_scale > 1` doesn't implement true classifier-free guidance. Specifying `guidance_scale > 1`
- just mimics it.
+ `guidance_scale > 1` just mimics it.

num_images_per_prompt (`int`, *optional*, defaults to 1):
The number of images to generate per prompt.
generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
4 changes: 3 additions & 1 deletion src/diffusers/pipelines/flux/pipeline_flux_control.py
@@ -665,7 +665,9 @@ def __call__(
Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
`guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
the text `prompt`, usually at the expense of lower image quality.
the text `prompt`, usually at the expense of lower image quality. In case of Flux, which is a guidance-
distilled model, `guidance_scale > 1` doesn't implement true classifier-free guidance. Specifying
`guidance_scale > 1` just mimics it.
num_images_per_prompt (`int`, *optional*, defaults to 1):
The number of images to generate per prompt.
generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
4 changes: 3 additions & 1 deletion src/diffusers/pipelines/flux/pipeline_flux_kontext.py
@@ -799,7 +799,9 @@ def __call__(
Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
`guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
the text `prompt`, usually at the expense of lower image quality.
the text `prompt`, usually at the expense of lower image quality. In case of Flux, which is a guidance-
distilled model, `guidance_scale > 1` doesn't implement true classifier-free guidance. Specifying
`guidance_scale > 1` just mimics it.
num_images_per_prompt (`int`, *optional*, defaults to 1):
The number of images to generate per prompt.
generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
@@ -1019,7 +1019,9 @@ def __call__(
Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
`guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
the text `prompt`, usually at the expense of lower image quality.
the text `prompt`, usually at the expense of lower image quality. In case of Flux, which is a guidance-
distilled model, `guidance_scale > 1` doesn't implement true classifier-free guidance. Specifying
`guidance_scale > 1` just mimics it.
num_images_per_prompt (`int`, *optional*, defaults to 1):
The number of images to generate per prompt.
generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
12 changes: 7 additions & 5 deletions src/diffusers/pipelines/sana/pipeline_sana_sprint.py
@@ -643,11 +643,13 @@ def __call__(
in their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is
passed will be used. Must be in descending order.
guidance_scale (`float`, *optional*, defaults to 4.5):
Guidance scale as defined in [Classifier-Free Diffusion
Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2.
of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
`guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely linked to
the text `prompt`, usually at the expense of lower image quality.
Guidance scale as defined in [Classifier-Free
Diffusion Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of
equation 2. of [Imagen Paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by
setting `guidance_scale > 1`. Higher guidance scale encourages to generate images that are closely
linked to the text `prompt`, usually at the expense of lower image quality. In case of Flux, which is a
guidance-distilled model, `guidance_scale > 1` doesn't implement true classifier-free guidance.
Specifying `guidance_scale > 1` just mimics it.
num_images_per_prompt (`int`, *optional*, defaults to 1):
The number of images to generate per prompt.
height (`int`, *optional*, defaults to self.unet.config.sample_size):