Important! Flux Dual Prompting #1182
Replies: 4 comments 5 replies
-
Since Forge is easy to develop for, it should be extremely easy to write an extension in 10 minutes that just patches this. See also the extension example here: https://github.com/lllyasviel/stable-diffusion-webui-forge?tab=readme-ov-file#unetpatcher If it really has some merit, maybe someone will write an extension.
-
This experiment was designed to test the effect of prompting Clip_L and T5xxl with separate prompts vs. with identical prompts. Two prompts were created representing the same scene, each formatted appropriately for its text encoder. Renders are seeds 1000 to 1003, with all settings held constant (Sampler Euler, Scheduler Beta, 23 steps).

First, each encoder receives its own prompt. Seeds 1000 and 1003 are perfect in every way; 1001 is missing the hair ribbon; 1002 has too many fingers.

Next, both Clip_L and T5xxl are prompted with full English sentences, using the T5xxl prompt given above. Detailed grading is possible, but the differences are so stark that it is unnecessary. Prompt adherence drops, but more importantly, seeds 1001 and 1003 are now utterly mangled abominations. Conclusion: giving Clip_L full English sentences results in at least a 50% drop in quality across general knowledge.

Next, both Clip_L and T5xxl are prompted using only comma-separated descriptors. Prompt adherence drops to 25%, and mangled abominations still emerge. Conclusion: giving T5xxl comma-separated descriptors causes Flux prompt adherence to fail.

Finally, Clip_L and T5xxl are given the same prompt: a concatenation of full English sentences followed by comma-separated descriptors. Prompt adherence reemerges, but only seed 1003 is perfect; seeds 1000 to 1002 again feature unusably mangled forms.

Conclusion: using a unified prompt for both Clip_L and T5xxl reduces Flux's overall quality by 50% to 75% while mangling forms. This happens regardless of the format of that prompt.
-
Good topic and testing, QuintessentialForms. I was surprised to see that, unlike other multi-text-encoder models (SD3, Hunyuan), Flux doesn't use the full conds from CLIP, only the pooled output. So I'd speculate that the benefit of dual prompts is likely to be smaller, but probably not zero. All images use the same prompt (from earlier in the thread), Euler-simple, 20 steps, flux-dev-bnb-nf4-v2, 512x768 to save my old GPU. First-effort extension: https://github.com/DenOfEquity/forgeFlux_dualPrompt
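A rough sketch of that architectural difference, using illustrative tensor shapes and made-up key names (this is not Flux's actual code, just a picture of which outputs each conditioning style keeps):

```python
import numpy as np

# Illustrative shapes only: CLIP-L yields per-token hidden states plus a
# pooled vector; T5-XXL yields only per-token hidden states.
clip_l_tokens = np.zeros((77, 768))          # per-token CLIP-L states
clip_l_pooled = clip_l_tokens.mean(axis=0)   # stand-in for the pooled output
t5_tokens = np.zeros((256, 4096))            # per-token T5-XXL states

# SD3/Hunyuan-style conditioning keeps the full CLIP token sequence...
sd3_style_cond = {"clip_seq": clip_l_tokens, "t5_seq": t5_tokens}

# ...whereas Flux keeps only the pooled CLIP-L vector beside the T5
# sequence, so a separate CLIP-L prompt has less room to influence output.
flux_style_cond = {"clip_pooled": clip_l_pooled, "t5_seq": t5_tokens}

print(flux_style_cond["clip_pooled"].shape)  # (768,)
```

The pooled vector collapses the whole CLIP-L prompt into a single embedding, which is why the dual-prompt benefit is plausibly smaller for Flux than for SD3 or Hunyuan.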
-
Thanks for the above extension! Below is a shorter one written by me; it may be less maintained, but it shows how to do things in a "Forge" way. Just create a folder for it and put this in a script file:

```python
import gradio as gr

from modules import scripts


class DifferentClipLForForge(scripts.Script):
    def title(self):
        return "Different Clip L Prompt"

    def show(self, is_img2img):
        return scripts.AlwaysVisible

    def ui(self, *args, **kwargs):
        with gr.Accordion(open=False, label=self.title()):
            enabled = gr.Checkbox(label='Enabled', value=False)
            prompt = gr.Textbox(label='CLIP L Prompt')
        return enabled, prompt

    def process(self, p, *script_args, **kwargs):
        self.enabled, self.prompt = script_args

        if not self.enabled:
            return

        # Invalidate cached conds so the patched encoder is actually used.
        p.clear_prompt_cache()

        # Keep a reference to the original CLIP-L engine, then wrap it so it
        # encodes the separate CLIP-L prompt instead of the main prompt.
        if not hasattr(self, 'org_clip_l'):
            self.org_clip_l = p.sd_model.text_processing_engine_l
        p.sd_model.text_processing_engine_l = \
            lambda x: self.org_clip_l([self.prompt] if self.enabled else x)

        return
```

I hope that in the future more people will just write things like this instead of asking me 😹 Remember that we are not Automatic1111: difficulty of development is no longer something that can stop the webui. Again, use https://github.com/DenOfEquity/forgeFlux_dualPrompt instead for real use; my code is not maintained.
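For reference, a minimal sketch of where such a script lives, assuming a standard Forge checkout; the folder and file names here are made up, and any names work as long as the file ends up in an extension's `scripts/` directory:

```shell
# Hypothetical extension name; run from the Forge root directory.
mkdir -p extensions/different_clip_l/scripts

# Save the script above as:
#   extensions/different_clip_l/scripts/different_clip_l.py
# then restart the webui so the script is picked up.
```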
-
For Flux, how can I prompt separately for Clip-L and T5xxl?
Clip-L expects a comma-separated list of descriptors and fails badly when given full English sentences.
T5xxl expects full English sentences and fails badly with comma-separated lists.
In Comfy I prompt them separately. How do I enable that in Forge?
Clip-L example:
cat, relaxing, windowsill, window, streaming sunlight, rich detailed fur
T5xxl example:
A cat is relaxing on a windowsill. The sunlight streaming through the window shows the rich detail of the cat's fur.
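To make the two formats concrete, here they are side by side as plain Python strings (nothing Forge-specific), along with the "unified" concatenation tried in the experiment below:

```python
# CLIP-L style: comma-separated descriptors.
clip_l_prompt = "cat, relaxing, windowsill, window, streaming sunlight, rich detailed fur"

# T5-XXL style: full English sentences.
t5_prompt = ("A cat is relaxing on a windowsill. The sunlight streaming through "
             "the window shows the rich detail of the cat's fur.")

# The "unified" prompt tested in the experiment below: sentences first,
# then descriptors, fed identically to both encoders.
unified_prompt = t5_prompt + " " + clip_l_prompt
```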
Edit: After testing, this separation turned out to be critical. Flux's general domain knowledge dropped by 50%-75% when Clip-L and T5xxl were given the same prompt, regardless of that prompt's format or contents.
These two images say it all, but please see the full experiment methodology and results posted below.
Can you tell which one tried feeding the same prompt to both text encoders?
Both images used identical seed and render settings. (Again, see full experiment with many more samples and strategies tested below.)