Regarding textual inversion implementation in webui #6018
AdjointOperator started this conversation in General
In the original textual inversion paper, the proposal is to train the word embedding of a placeholder token (a new vector in CLIP's token-embedding space), and then pass the whole token sequence through CLIP's text transformer to produce the UNet's conditioning embedding.
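For concreteness, here is a minimal sketch of what I mean by the paper's approach, using HuggingFace's `transformers` CLIP classes purely for illustration; the placeholder name and the training details below are my own assumptions, not webui code:

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Register a placeholder token and give it a slot in the token-embedding table.
tokenizer.add_tokens(["<my-concept>"])
text_encoder.resize_token_embeddings(len(tokenizer))
placeholder_id = tokenizer.convert_tokens_to_ids("<my-concept>")

# Training optimizes the word-embedding table; the transformer stays frozen.
embedding_table = text_encoder.get_input_embeddings()
optimizer = torch.optim.AdamW([embedding_table.weight], lr=5e-4)
# (In a full training loop, every row except placeholder_id would be copied
#  back to its original value after each optimizer step, so that effectively
#  only the placeholder's word embedding is learned.)

# Forward pass: the prompt containing the placeholder is tokenized normally,
# and the whole sequence goes through the text transformer. Its output (not
# the raw word embedding) is what conditions the UNet.
input_ids = tokenizer("a photo of <my-concept>", padding="max_length",
                      max_length=tokenizer.model_max_length,
                      return_tensors="pt").input_ids
cond_embedding = text_encoder(input_ids).last_hidden_state  # (1, 77, 768)
```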
In the webui's implementation, it seems that the learned embedding vector is instead written directly into the cond embedding output by the text transformer (the code has since been refactored, so I can no longer find the exact matching part). Since the webui supports using an arbitrary number of vectors at the same time under the same placeholder token "*", I assume the underlying logic is still the same.
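In pseudocode, the behaviour I am describing looks roughly like the following. This is my own hedged reconstruction of the idea, not the actual webui code; all names here are illustrative:

```python
import torch

def inject_into_cond(cond: torch.Tensor,
                     input_ids: torch.Tensor,
                     placeholder_id: int,
                     learned_vectors: torch.Tensor) -> torch.Tensor:
    """Overwrite the text-transformer output at the placeholder's positions.

    cond:            (batch, seq_len, dim) output of the text transformer
    input_ids:       (batch, seq_len) token ids of the prompt
    learned_vectors: (n, dim) trained embedding with n vectors per token
    """
    cond = cond.clone()
    n = learned_vectors.shape[0]
    for b in range(input_ids.shape[0]):
        # every position where the placeholder token occurs in this prompt
        positions = (input_ids[b] == placeholder_id).nonzero(as_tuple=True)[0]
        for pos in positions.tolist():
            end = min(pos + n, cond.shape[1])
            cond[b, pos:end] = learned_vectors[: end - pos]
    return cond
```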
I would like to check whether my understanding here is correct. If it is, has anyone compared the two approaches in terms of output quality, and is there any intention to implement the original version as described in the paper?
References:
original paper: https://arxiv.org/abs/2208.01618
webui's impl (outdated): https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/textual_inversion/textual_inversion.py#L325 (I cannot currently trace how shared.sd_model.cond_stage_model behaves in the latest version.)