Regarding textual inversion implementation in webui #6018
AdjointOperator started this conversation in General
In the original textual inversion paper, the proposal is to train the word embedding of a placeholder token (a new vector in CLIP's token-embedding space), and then pass the whole token sequence through CLIP's text transformer to produce the UNet's conditioning embedding.
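For concreteness, here is a minimal sketch of what I mean by the paper's approach, using HuggingFace's `transformers` CLIP classes purely for illustration; the placeholder name and the training details below are my own assumptions, not webui code:

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Register a placeholder token and give it a slot in the token-embedding table.
tokenizer.add_tokens(["<my-concept>"])
text_encoder.resize_token_embeddings(len(tokenizer))
placeholder_id = tokenizer.convert_tokens_to_ids("<my-concept>")

# Training optimizes the word-embedding table; the transformer stays frozen.
embedding_table = text_encoder.get_input_embeddings()
optimizer = torch.optim.AdamW([embedding_table.weight], lr=5e-4)
# (In a full training loop, every row except placeholder_id would be copied
#  back to its original value after each optimizer step, so that effectively
#  only the placeholder's word embedding is learned.)

# Forward pass: the prompt containing the placeholder is tokenized normally,
# and the whole sequence goes through the text transformer. Its output (not
# the raw word embedding) is what conditions the UNet.
input_ids = tokenizer("a photo of <my-concept>", padding="max_length",
                      max_length=tokenizer.model_max_length,
                      return_tensors="pt").input_ids
cond_embedding = text_encoder(input_ids).last_hidden_state  # (1, 77, 768)
```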
In the webui's implementation, it seems that the learned embedding vector is instead written directly into the cond embedding output by the text transformer (the code has since been refactored, so I can no longer find the exact matching part). Since the webui supports using an arbitrary number of vectors at the same time under the same placeholder token "*", I assume the underlying logic is still the same.
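In pseudocode, the behaviour I am describing looks roughly like the following. This is my own hedged reconstruction of the idea, not the actual webui code; all names here are illustrative:

```python
import torch

def inject_into_cond(cond: torch.Tensor,
                     input_ids: torch.Tensor,
                     placeholder_id: int,
                     learned_vectors: torch.Tensor) -> torch.Tensor:
    """Overwrite the text-transformer output at the placeholder's positions.

    cond:            (batch, seq_len, dim) output of the text transformer
    input_ids:       (batch, seq_len) token ids of the prompt
    learned_vectors: (n, dim) trained embedding with n vectors per token
    """
    cond = cond.clone()
    n = learned_vectors.shape[0]
    for b in range(input_ids.shape[0]):
        # every position where the placeholder token occurs in this prompt
        positions = (input_ids[b] == placeholder_id).nonzero(as_tuple=True)[0]
        for pos in positions.tolist():
            end = min(pos + n, cond.shape[1])
            cond[b, pos:end] = learned_vectors[: end - pos]
    return cond
```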
I would like to check whether my understanding here is correct. If it is, has anyone compared the two approaches in terms of output quality, and is there any intention to implement the original version as described in the paper?
References:
original paper: https://arxiv.org/abs/2208.01618
webui's impl (outdated): https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/textual_inversion/textual_inversion.py#L325 (I cannot currently trace how shared.sd_model.cond_stage_model behaves in the latest version.)