@titaiwangms
Collaborator

Before this PR, text-to-image models did not work because they usually come from the diffusers package and are structured differently from transformers models. A transformers model has a single configuration, and the whole model is trained together even when it is multi-modal.

Diffusers models, by contrast, usually have four configurations for four independent components: text encoder (takes text input), unet (the denoising backbone), vae (maps between pixel space and latent space, i.e. compression), and scheduler (controls the denoising process over time steps). Unfortunately, with the current code design we can't test a full diffusers model without a big refactor, since the code is written to run LLMs. More follow-ups will be needed to test a full model.

This PR enables testing the unet model only.

@titaiwangms
Collaborator Author

@xadupre Which linters does this repo use? I guess you don't like lintrunner, but it seems you still set up some formatters. Could you tell me the official way to lint the code, so I can pass the checks?

else None
)
- assert type(inputs) is dict, f"Unexpected type for inputs {type(inputs)}"
+ assert isinstance(inputs, dict), f"Unexpected type for inputs {type(inputs)}"
Collaborator

You should keep `type(inputs) is dict`. `isinstance(inputs, dict)` is True for both dict and output classes, and I need to distinguish between the two.
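The difference between the two checks can be illustrated with a minimal sketch. `FrozenLike` and `OutputLike` are hypothetical stand-ins, not the real diffusers `FrozenDict` or any real output class; the only assumption is that both derive from `dict` through `OrderedDict`.

```python
from collections import OrderedDict


class FrozenLike(OrderedDict):
    """Hypothetical stand-in for a dict-like config container."""


class OutputLike(OrderedDict):
    """Hypothetical stand-in for a dict-like model output class."""


def describe(obj):
    """Return (exact-type check, isinstance check) for obj."""
    return type(obj) is dict, isinstance(obj, dict)


samples = [
    ("plain", {"sample": 1}),
    ("frozen", FrozenLike(sample=1)),
    ("output", OutputLike(sample=1)),
]

results = {name: describe(obj) for name, obj in samples}
for name, (exact, inst) in results.items():
    print(f"{name}: type(x) is dict -> {exact}, isinstance(x, dict) -> {inst}")
```

Only the plain dict passes the exact-type check, while all three pass `isinstance`, which is why switching the assertion changes which inputs are accepted.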

Collaborator Author

@titaiwangms Jul 18, 2025

All of them? Diffusers uses FrozenDict, which passes `isinstance(x, dict)` but not `type(x) is dict`. What is the output class? I can rule it out.

Collaborator Author

The problem with checking for FrozenDict is that we would need to import diffusers. Is that fine with you? Otherwise, I suggest we rule out the output class instead.
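One way to rule out an output class without importing the library that defines it is to compare class names along the MRO. This is only a sketch: `ModelOutput` and `FrozenDict` below are locally defined hypothetical stand-ins, and the excluded name and the helper `is_dict_but_not_output` are assumptions, not code from this PR.

```python
from collections import OrderedDict


class ModelOutput(OrderedDict):
    """Hypothetical stand-in; only its class name matters to the check below."""


class FrozenDict(OrderedDict):
    """Hypothetical stand-in for a frozen config dict."""


# Assumed name of the output base class to exclude.
EXCLUDED_BASE_NAMES = frozenset({"ModelOutput"})


def is_dict_but_not_output(obj):
    """True for dicts and dict subclasses, except those deriving from an excluded base."""
    if not isinstance(obj, dict):
        return False
    # Walk the MRO by name so the defining package never has to be imported.
    return not any(cls.__name__ in EXCLUDED_BASE_NAMES for cls in type(obj).__mro__)


print(is_dict_but_not_output({"a": 1}))
print(is_dict_but_not_output(FrozenDict(a=1)))
print(is_dict_but_not_output(ModelOutput(a=1)))
```

Matching by name is looser than a real `isinstance` check (an unrelated class with the same name would also be excluded), so it trades precision for avoiding the import.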

@titaiwangms
Collaborator Author

Closed, as the subfolder parameter should be enough for diffusers models to get their configs.
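As a rough illustration of the idea, the sketch below reads one component's configuration from a per-component subfolder. It is not the diffusers API: `load_component_config`, the toy directory layout, and the `config.json` file name are all assumptions made for the example; only the four component names come from the PR description.

```python
import json
import tempfile
from pathlib import Path

# Component names taken from the PR description; using them as folder names is an assumption.
COMPONENTS = ("text_encoder", "unet", "vae", "scheduler")


def load_component_config(root, subfolder, filename="config.json"):
    """Hypothetical helper: read one component's config from its subfolder."""
    path = Path(root) / subfolder / filename
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def make_fake_repo(root):
    """Create a toy layout with one config file per component."""
    for name in COMPONENTS:
        folder = Path(root) / name
        folder.mkdir(parents=True)
        (folder / "config.json").write_text(
            json.dumps({"component": name}), encoding="utf-8"
        )


with tempfile.TemporaryDirectory() as tmp:
    make_fake_repo(tmp)
    unet_config = load_component_config(tmp, "unet")

print(unet_config)
```

Selecting the component by subfolder keeps the loading code the same for every component, so no diffusers-specific branch is needed just to find the right config.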
