Hi @davidbrowne17, thank you for spinning this up. I really appreciate the work to make finetuning CSM possible.
I have a quick question about the finetuning behavior. In `forward_and_loss`, it looks like the tokens for all N codebooks are predicted directly from the backbone hidden state, each with its own codebook-specific head.
However, this differs from the original CSM model, where the remaining N-1 codebooks (all except the first) are generated auto-regressively by a separate decoder transformer, again with codebook-specific heads.
Is this difference intentional? I can see the obvious savings in compute and complexity, but I'm curious whether you have any data points on the performance of this simplified architecture.
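For reference, here is a rough PyTorch sketch of the two schemes as I understand them. All names, shapes, and the decoder configuration are made up for illustration; this is not the actual code from this repo or from Sesame's release, just the conceptual difference I'm asking about.

```python
import torch
import torch.nn as nn

# Illustrative shapes only -- not the repo's real dimensions.
B, T, D = 2, 10, 1024          # batch, audio frames, backbone hidden size
N_CODEBOOKS, VOCAB = 32, 2051  # number of RVQ codebooks, codebook vocab size

h = torch.randn(B, T, D)       # backbone hidden state per audio frame

# (a) What forward_and_loss appears to do: one head per codebook, all N
#     codebooks predicted in parallel from the same backbone hidden state.
parallel_heads = nn.ModuleList([nn.Linear(D, VOCAB) for _ in range(N_CODEBOOKS)])
parallel_logits = torch.stack([head(h) for head in parallel_heads], dim=2)  # (B, T, N, VOCAB)

# (b) The original CSM behavior as I understand it: codebook 0 comes from the
#     backbone, then a smaller decoder transformer fills in codebooks 1..N-1
#     auto-regressively within each frame, conditioning on the codebooks
#     already produced for that frame. (The decoder config here is a stand-in,
#     and greedy argmax replaces sampling to keep the sketch short.)
decoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True), num_layers=4
)
embed = nn.Embedding(VOCAB, D)
c0_head = nn.Linear(D, VOCAB)
cb_heads = nn.ModuleList([nn.Linear(D, VOCAB) for _ in range(N_CODEBOOKS - 1)])

c0 = c0_head(h).argmax(-1)                    # (B, T) first codebook from the backbone
seq = torch.stack([h, embed(c0)], dim=2)      # (B, T, 2, D) per-frame decoder input
for i in range(N_CODEBOOKS - 1):
    out = decoder(seq.flatten(0, 1))          # run the decoder over each frame's sequence
    ci = cb_heads[i](out[:, -1]).argmax(-1)   # next codebook token for every frame
    seq = torch.cat([seq, embed(ci).view(B, T, 1, D)], dim=2)
```

If (a) is indeed what the training code does, I'd expect it to trade some audio quality for the reduced compute, which is why I'm asking about any comparisons you may have run.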
Thank you for all your work!