I have read the code of the class `AdaLayerNormZero`, and it seems there are two ways to use this module with the timestep embedding. One way is to pass `emb` in from outside the module; the other is to set the `num_embeddings` init parameter so that this `AdaLayerNormZero` module keeps its own time-embedding layer weights.
The first way saves on time-embedding weights (the `self.emb = CombinedTimestepLabelEmbeddings(...)` layer) by letting all DiT `AdaLayerNormZero` blocks share a single set of them, while the second way gives each `AdaLayerNormZero` its own independent time-embedding weights.
The second way is more costly, but will it give better performance for the model as a whole?
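To make the parameter-cost difference concrete, here is a minimal sketch (not the actual diffusers implementation; `TimestepEmbedding` and `AdaLNZeroBlock` are hypothetical stand-ins) contrasting a single shared embedder against per-block embedders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimestepEmbedding(nn.Module):
    """Stand-in for the timestep/label embedding MLP (simplified)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, t_emb):
        return self.proj(t_emb)

class AdaLNZeroBlock(nn.Module):
    """One adaptive-norm block; owns private embedding weights only if own_embedder=True."""
    def __init__(self, dim, own_embedder=False):
        super().__init__()
        self.emb = TimestepEmbedding(dim) if own_embedder else None
        self.linear = nn.Linear(dim, 6 * dim)  # shift/scale/gate for attn + mlp
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)

    def forward(self, x, t_emb):
        if self.emb is not None:  # way 2: compute the embedding with private weights
            t_emb = self.emb(t_emb)
        shift, scale, *_ = self.linear(F.silu(t_emb)).chunk(6, dim=-1)
        return self.norm(x) * (1 + scale[:, None]) + shift[:, None]

def param_count(m):
    return sum(p.numel() for p in m.parameters())

dim, depth = 64, 4

# Way 1: a single shared embedder feeds every block.
shared_emb = TimestepEmbedding(dim)
shared_blocks = nn.ModuleList(AdaLNZeroBlock(dim) for _ in range(depth))
shared_total = param_count(shared_emb) + sum(param_count(b) for b in shared_blocks)

# Way 2: each block keeps independent embedding weights.
private_blocks = nn.ModuleList(AdaLNZeroBlock(dim, own_embedder=True) for _ in range(depth))
private_total = sum(param_count(b) for b in private_blocks)

print(shared_total, private_total)  # the per-block variant carries extra parameters per layer
```

The extra cost of way 2 grows linearly with depth, since every block duplicates the embedding MLP.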