init_input is actually the prompt with the superclass. This will be hard to do in a modular fashion; may have to directly modify stable diffusion so that the text encoder receives the two prompts and passes them directly to all the cross attention modules.
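For context, a minimal sketch of the non-modular route described above: call the text encoder once per prompt and thread both encodings through every cross attention module. `encode_text`, `cross_attn_modules`, and the attention call signature here are assumptions for illustration, not the actual stable diffusion API.

def encode_concept_and_superclass(encode_text, concept_prompt, superclass_prompt):
    # encode the prompt containing <concept>, and the same prompt with
    # <concept> swapped for its <superclass>
    text_enc = encode_text(concept_prompt)                     # (b, n, d)
    text_enc_with_superclass = encode_text(superclass_prompt)  # (b, n, d)
    return text_enc, text_enc_with_superclass

def run_cross_attentions(cross_attn_modules, latents, text_enc, text_enc_with_superclass):
    # hand both encodings to every cross attention module, so each one can
    # use the superclass encoding for key-locked initialization
    for attn in cross_attn_modules:
        latents = attn(latents, text_enc, text_enc_with_superclass = text_enc_with_superclass)
    return latents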
assert text_enc.shape[-2] == self.text_seq_len, f'CLIP text sequence length is set to be {self.text_seq_len}, but received text encoding with length {text_enc.shape[-2]}'
superclass_text_enc = rearrange(superclass_text_enc, 'b 1 d -> b d')
- outputs = torch.where(
-     rearrange(initted, 'b -> b 1 1'),
-     outputs,
-     einsum('o i, b n i -> b n o', weights, text_enc)
- )
+ # take care of initializing with superclass prompt
+ # for key-locking - this assumes stable diffusion was modified so text encoder takes in a prompt with both the <concept> as well as <superclass> - it seems this also has the limitation that <superclass> must be one token

- # update using exponential moving average
+ text_enc_with_superclass_output = einsum('b n i, o i -> b n o', text_enc_with_superclass, weights)
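The added line projects the superclass encoding through the same weights, i.e. a batched matmul against weights.T. A self-contained shape check, with made-up dimensions:

import torch

b, n = 2, 77              # batch, CLIP text sequence length (illustrative values)
dim_in, dim_out = 768, 320

text_enc_with_superclass = torch.randn(b, n, dim_in)
weights = torch.randn(dim_out, dim_in)   # e.g. a cross attention key projection

# einsum('b n i, o i -> b n o', ...) contracts the input dim against weights
out = torch.einsum('b n i, o i -> b n o', text_enc_with_superclass, weights)
assert out.shape == (b, n, dim_out)
assert torch.allclose(out, text_enc_with_superclass @ weights.t(), atol = 1e-5)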