Support unet model #188
@xadupre Which linters does this repo use? I guess you don't like lintrunner, but it seems you still set up some formatter. Could you tell me the official way to lint, so I can pass the checks?
    else None
)
- assert type(inputs) is dict, f"Unexpected type for inputs {type(inputs)}"
+ assert isinstance(inputs, dict), f"Unexpected type for inputs {type(inputs)}"
You should keep type(inputs) is dict: isinstance(inputs, dict) is True for both plain dicts and output classes, and I need to distinguish between the two.
All of them? diffusers uses FrozenDict, which passes isinstance(x, dict) but not type(x) is dict. Which output class do you mean? I can rule it out.
The problem with checking for FrozenDict is that we would need to import diffusers; is that fine with you? Otherwise, I suggest we rule out the output class instead.
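To make the trade-off in this thread concrete, here is a minimal sketch of the two checks side by side. FrozenDictLike and OutputLike are hypothetical stand-ins, not the real diffusers or transformers classes; the sketch assumes only that both subclass a dict type, as the comments above imply.

```python
from collections import OrderedDict


class FrozenDictLike(OrderedDict):
    """Hypothetical stand-in for a config mapping such as FrozenDict."""


class OutputLike(OrderedDict):
    """Hypothetical stand-in for a model output class that subclasses a dict type."""


def classify(inputs):
    """Label the input by how it relates to dict (illustrative helper only)."""
    # Exact type check: only a plain dict passes, subclasses do not.
    if type(inputs) is dict:
        return "plain dict"
    # Rule out the output class first, then accept any other dict subclass.
    if isinstance(inputs, OutputLike):
        return "output class"
    if isinstance(inputs, dict):
        return "dict subclass"
    return "not a dict"
```

Ruling out the output class explicitly, as suggested above, keeps the distinction the reviewer needs while still accepting other dict subclasses, and it avoids importing diffusers just to name FrozenDict.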
9172e68 to c84d3f7
Closed, as the subfolder parameter should be enough for diffusers models to get their configs.
Prior to this PR, text-to-image models did not work because they usually come from the diffusers package and are not structured like transformers models. Transformers models have only one configuration, and the whole model is trained together even if it is a multi-modal model. Diffusers models, however, usually have 4 configurations for 4 independent models: text encoder (takes text input), unet (backbone: denoising), vae (maps between pixel space ↔ latent space, i.e. compression), and scheduler (controls the denoising process over time steps). Unfortunately, due to the current code design, we can't test a full diffusers model without a big refactor, since the code is written to run LLMs. We will need more follow-ups to test a full model.
This PR enables us to test the unet model only.
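As a rough illustration of the "one configuration per component" layout described above, the sketch below reads a single component's config from its own subfolder. Everything in it is an assumption made for illustration: the helper names, the config.json file name, and the toy directory contents are hypothetical and do not reflect this repo's code or the actual diffusers loading API.

```python
import json
import tempfile
from pathlib import Path

# Assumed layout: one subfolder per component, each holding its own config file.
COMPONENTS = ("text_encoder", "unet", "vae", "scheduler")


def load_component_config(model_dir, subfolder, filename="config.json"):
    """Read the JSON config of one component from its subfolder (hypothetical helper)."""
    path = Path(model_dir) / subfolder / filename
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def make_toy_model_dir(root):
    """Create a toy directory with made-up config values, for illustration only."""
    for name in COMPONENTS:
        folder = Path(root) / name
        folder.mkdir(parents=True)
        config = {"component": name}
        (folder / "config.json").write_text(json.dumps(config), encoding="utf-8")
    return root


def demo():
    """Build the toy layout in a temporary directory and load only the unet config."""
    with tempfile.TemporaryDirectory() as tmp:
        make_toy_model_dir(tmp)
        return load_component_config(tmp, "unet")
```

Selecting a component by subfolder is what lets a single piece such as the unet be configured and tested on its own, without touching the other three.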