Firstly, thank you everyone for the amazing work so far.
My question is inspired by:
https://pub.towardsai.net/stable-diffusion-based-image-compresssion-6f1f0a399202
Does anyone have any tips on the fastest way to train a textual inversion (TI) embedding to reproduce a single image?
What's the smallest number of vectors you've needed?
Can anyone explain what the seed does in this scenario? Would training have to be done on a fixed seed, or would you have to "find the seed" as well?
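To make the questions concrete, this is roughly the loop I'm picturing — essentially the standard diffusers textual-inversion recipe cut down to one image, with everything frozen except the new token vectors and the training noise drawn from a fixed seed. The model id, token names, file names, vector count and hyperparameters below are just placeholders I've made up:

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "runwayml/stable-diffusion-v1-5"   # placeholder: whichever SD 1.x checkpoint you train against
device = "cuda"

tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder").to(device)
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").to(device)
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet").to(device)
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

# One new placeholder token per vector; "<img-*>" and num_vectors are made up.
num_vectors = 4
placeholder_tokens = [f"<img-{i}>" for i in range(num_vectors)]
tokenizer.add_tokens(placeholder_tokens)
placeholder_ids = tokenizer.convert_tokens_to_ids(placeholder_tokens)
text_encoder.resize_token_embeddings(len(tokenizer))

# Freeze everything except the token-embedding table; only the new rows get trained.
vae.requires_grad_(False)
unet.requires_grad_(False)
text_encoder.requires_grad_(False)
token_embeds = text_encoder.get_input_embeddings()
token_embeds.weight.requires_grad_(True)
orig_embeds = token_embeds.weight.detach().clone()

# The single target image, scaled to [-1, 1] as the VAE expects.
image = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])(Image.open("target.png").convert("RGB")).unsqueeze(0).to(device)

with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample() * vae.config.scaling_factor

prompt = " ".join(placeholder_tokens)
input_ids = tokenizer(prompt, padding="max_length", truncation=True,
                      max_length=tokenizer.model_max_length,
                      return_tensors="pt").input_ids.to(device)

optimizer = torch.optim.AdamW([token_embeds.weight], lr=5e-3)
gen = torch.Generator(device=device).manual_seed(0)   # fixed seed for the sampled noise/timesteps

for step in range(500):                                # overfitting on purpose: few steps, high LR
    noise = torch.randn(latents.shape, generator=gen, device=device)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps, (1,),
                      generator=gen, device=device)
    noisy = noise_scheduler.add_noise(latents, noise, t)
    cond = text_encoder(input_ids)[0]
    pred = unet(noisy, t, encoder_hidden_states=cond).sample
    loss = F.mse_loss(pred, noise)                      # epsilon-prediction objective (SD 1.x)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    with torch.no_grad():                               # keep every pre-existing token unchanged
        token_embeds.weight[:min(placeholder_ids)] = orig_embeds[:min(placeholder_ids)]

learned = token_embeds.weight[placeholder_ids].detach().cpu()
torch.save(learned, "single_image_embedding.pt")
```

(As far as I can tell, the fixed generator is the only place a seed enters on the training side; at generation time the seed sets the initial latent noise instead, which is the part I'm unsure about.)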
Embeddings could then be used anywhere you'd use a CLIP prompt or BLIP caption, but as a vector embedding rather than inefficient human language?
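At generation time that would look something like the snippet below (a sketch; I'm assuming diffusers' `load_textual_inversion` will accept the tensor saved above, and the file/token names are the same made-up ones):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Register the learned vectors under a placeholder token. If a raw tensor isn't
# accepted here, it may need wrapping in the {token: tensor} format first.
pipe.load_textual_inversion("single_image_embedding.pt", token="<img>")

generator = torch.Generator("cuda").manual_seed(1234)   # fixed seed -> deterministic initial noise
image = pipe("<img>", num_inference_steps=50, generator=generator).images[0]
image.save("reconstruction.png")
```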
Embeddings also open up the possibility of new features:
a) "Extract style from embeddings" to extract the common elements in each embedding, so generate an instant style?
b) "Joint object style training" A modified version of textual inversion training that jointly retrains [unique image embedding] and trains a new [joint style embedding] to reproduce each image to a higher fidelity than in the original training, thus training a meaningful joint style embedding?
I'm just thinking that, if optimised, this could be a faster, deterministic approach, since the original embeddings can be created by simple overfitting. Or does this already exist?
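For (b), this is the joint-optimisation structure I have in mind: a shared style vector plus per-image vectors, all updated together. The `recon_loss` below is only a stand-in for the real noise-prediction loss from the single-image loop above; the point is just that gradients reach both the style embedding and the per-image embeddings:

```python
import torch

embed_dim, num_images, vectors_per_image = 768, 8, 2   # toy sizes (768 = SD 1.x text width)

# Stand-ins for the already-overfitted per-image embeddings from the loop above.
overfit = [torch.randn(vectors_per_image, embed_dim) for _ in range(num_images)]

style = torch.nn.Parameter(torch.zeros(1, embed_dim))             # new [joint style embedding]
per_image = [torch.nn.Parameter(e.clone()) for e in overfit]      # retrained [unique image embedding]s
optimizer = torch.optim.AdamW([style, *per_image], lr=1e-3)

def recon_loss(cond, image_index):
    # Placeholder only: in practice this would be the noise-prediction MSE for
    # image `image_index`, conditioned on `cond` via the text encoder and UNet.
    return cond.pow(2).mean() + image_index * 0.0

for step in range(100):
    total = 0.0
    for i, img_vecs in enumerate(per_image):
        cond = torch.cat([style, img_vecs], dim=0)   # shared style vector + that image's own vectors
        total = total + recon_loss(cond, i)
    total.backward()
    optimizer.step()
    optimizer.zero_grad()

torch.save(style.detach(), "joint_style_embedding.pt")
```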