[Modular] Qwen #12220
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thank you!
_skip_proc_output_fn_Attention_WanAttnProcessor2_0 = _skip_attention___ret___hidden_states
# not sure what this is yet.
_skip_proc_output_fn_Attention_FluxAttnProcessor = _skip_attention___ret___hidden_states
_skip_proc_output_fn_Attention_QwenDoubleStreamAttnProcessor2_0 = _skip_attention___ret___hidden_states
For my understanding: what is this one for?
for guiders/hooks
    return image

...

class InpaintProcessor(ConfigMixin):
Really nice!
(not for this PR, we could attempt to have an example of the processor for an inpaint pipeline)
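For readers following along: the core postprocessing step an inpaint processor typically performs is mask compositing, i.e. keeping the original pixels outside the masked region and taking the generated pixels inside it. A minimal, self-contained sketch (not the PR's implementation; the helper name is made up):

```python
# Minimal sketch (not the PR's implementation): composite the generated image
# back into the original using the mask. mask == 0 keeps the original pixel,
# mask == 1 takes the generated pixel.

def composite_inpaint(original, generated, mask):
    """Blend per-pixel: mask selects generated content, (1 - mask) keeps the original."""
    return [o * (1 - m) + g * m for o, g, m in zip(original, generated, mask)]

# Toy 1-D "images" with a binary mask over the middle pixel.
original = [0.1, 0.5, 0.9]
generated = [0.7, 0.2, 0.3]
mask = [0.0, 1.0, 0.0]
print(composite_inpaint(original, generated, mask))  # [0.1, 0.2, 0.9]
```

Real processors do the same thing on image tensors, often after blurring the mask for softer seams.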
ComponentSpec(
    "guider",
    ClassifierFreeGuidance,
    config=FrozenDict({"guidance_scale": 4.0}),
For the QwenImage pipeline, guidance_scale is akin to the one we have in Flux. However, I think we also want to enable CFG here, which is done through true_cfg_scale. Should this be taken into account?
Good questions.
The true_cfg_scale in Flux/Qwen is actually just guidance_scale in every other pipeline - it is part of the guider and should be set on the guider.
We had to use a different name (true_cfg_scale) for Flux because guidance_scale was already taken as an input for the distilled model. I think it would have been a lot better if we had given the distilled guidance a different name, so that we could keep the definition of guidance_scale consistent across all pipelines.
I'd like to fix it here in modular. IMO it won't confuse users too much, because they won't be able to use guidance_scale or true_cfg_scale at runtime in modular as it is, so they will have to take some time to figure out how to use guidance properly, and we will have a chance to explain.
cc @DN6 @asomoza too, let me know if you have any thoughts around this
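To make the terminology point concrete: true_cfg_scale in Flux/Qwen plays the same role guidance_scale plays in classic classifier-free guidance, scaling the step from the unconditional prediction toward the conditional one. A minimal illustration (toy lists, not the guider implementation):

```python
# Standard CFG combination: pred = uncond + scale * (cond - uncond).
# scale == 1.0 disables guidance (pred == cond); larger values push further
# in the guidance direction.

def cfg_combine(cond, uncond, guidance_scale):
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]

cond = [1.0, 2.0]
uncond = [0.0, 1.0]
print(cfg_combine(cond, uncond, 1.0))  # [1.0, 2.0] -> guidance effectively off
print(cfg_combine(cond, uncond, 4.0))  # [4.0, 5.0] -> scale 4.0, as in the guider config above
```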
> I think it would have been a lot better if we had given the distilled guidance a different name, so that we could keep the definition of guidance_scale consistent across all pipelines
I like this point a lot! However, we have guidance_scale in Flux (without the use of the Guider component):
InputParam("guidance_scale", default=3.5),
Maybe we could change that to something better suited (something like distilled_guidance_scale). This way, we can keep the meaning of guidance_scale consistent across the pipelines.
I completely agree, let's keep the guidance_scale consistent and use a different one for the distilled models.
So the proposal is that guidance_scale would always imply the CFG guidance scale?
I would argue that keeping guidance_scale for all guidance methods makes sense, since it implies how large of a step you want to take in the guidance direction.
Alternatively, we could introduce the concept of a DistilledGuidance guider, which is effectively a no-op. That makes it more explicit exactly what's happening with the latents, rather than having to introduce new scale parameters, internal checks for negative embeds, or checks like self._is_cfg_enabled.
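A hypothetical sketch of that DistilledGuidance idea (all names here are made up for illustration; this is not the diffusers guider API): both guiders share an interface, and the distilled variant is a pure pass-through, so pipeline code never branches on whether CFG is enabled.

```python
# Hypothetical illustration of a no-op "distilled" guider next to a CFG guider.
# Class names and the num_conds attribute are invented for this sketch.

class ClassifierFreeGuidanceSketch:
    num_conds = 2  # needs conditional + unconditional forward passes

    def __init__(self, guidance_scale):
        self.guidance_scale = guidance_scale

    def __call__(self, cond, uncond):
        # standard CFG combination
        return [u + self.guidance_scale * (c - u) for c, u in zip(cond, uncond)]

class DistilledGuidanceSketch:
    num_conds = 1  # single forward pass; guidance was distilled into the model

    def __call__(self, cond, uncond=None):
        return cond  # no-op: pass the prediction through unchanged

for guider in (ClassifierFreeGuidanceSketch(4.0), DistilledGuidanceSketch()):
    out = guider([1.0], [0.0]) if guider.num_conds == 2 else guider([1.0])
    print(type(guider).__name__, out)
```

The pipeline would just ask the guider how many conditions it needs and call it, with no `_is_cfg_enabled`-style checks.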
Cool. Just for clarity, guidance_scale here would mean what true_cfg_scale means in the QwenImage pipelines, right?
yes
and we don't require passing a negative prompt to use it in modular
Just a few notes for understanding.
I see that the default negative prompt we're using in modular is "":
negative_prompt = block_state.negative_prompt or ""
However, Qwen usually uses " ". So, given that we don't require the user to pass a negative prompt to enable CFG, maybe we could use " " instead of "".
sounds good!
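The suggested change boils down to a one-character default swap; a tiny sketch (the helper name is invented) showing the `or` fallback behavior, including the detail that an explicit `""` is also replaced since empty strings are falsy:

```python
# Sketch of the suggested default: fall back to Qwen's customary " " (a single
# space) instead of "" when no negative prompt is given. Note `x or " "` also
# replaces an explicit "" because empty strings are falsy in Python.

def resolve_negative_prompt(negative_prompt):
    return negative_prompt or " "

print(repr(resolve_negative_prompt(None)))      # ' '
print(repr(resolve_negative_prompt("")))        # ' '
print(repr(resolve_negative_prompt("blurry")))  # 'blurry'
```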
Will there be any features added soon for training Qwen models with HF libraries? What are the major barriers right now to making this happen?

This is not the right PR to discuss Qwen training. You can check out https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_qwen.md as well as https://github.com/ostris/ai-toolkit for training with HF libs.
src/diffusers/image_processor.py
Outdated
else:
    raise ValueError(f"Unsupported image type: {type(image)}")
else:
    raise ValueError(f"Unsupported image type: {type(image)}")
remove this for now, will update in a separate PR since it requires changes to the regular qwen/flux pipelines
Qwen-Image
Test Script for Qwen-Image Auto Pipeline
QwenImage Edit
Test script for QwenImage-Edit in Modular
How to Use
The shorter version; check the test scripts above for complete, runnable examples.
to load from standard repo
add controlnet (we currently only have controlnet for Qwen-Image)
update guider
change guidance_scale
use a different guidance method
to run inference
You can use the same pipeline to run all the tasks we support; the code is pretty much the same as in the regular pipelines.
add controlnet to text2image, img2img, inpaint: just pass control_image along with any other controlnet-related arguments
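For intuition about how the pieces above fit together at inference time, here is a minimal, self-contained sketch of a guider-driven denoising step (this is not the modular diffusers API; the toy model and function names are invented): the model is run once per condition, and the guider combines the predictions.

```python
# Toy sketch of one denoising step with classifier-free guidance.
# A linear "model" stands in for the transformer; in the real pipeline the
# guider component owns guidance_scale and the combination logic.

def toy_model(latents, embeds):
    # stand-in for the denoiser: prediction depends on latents and text embeds
    return [l + e for l, e in zip(latents, embeds)]

def denoise_step(latents, cond_embeds, uncond_embeds, guidance_scale):
    cond_pred = toy_model(latents, cond_embeds)      # conditional forward pass
    uncond_pred = toy_model(latents, uncond_embeds)  # unconditional forward pass
    # CFG combination, then the scheduler would turn this into the next latents
    return [u + guidance_scale * (c - u) for c, u in zip(cond_pred, uncond_pred)]

latents = [0.5, 0.5]
print(denoise_step(latents, cond_embeds=[1.0, 1.0], uncond_embeds=[0.0, 0.0], guidance_scale=4.0))
# [4.5, 4.5]
```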