Hi,
Thank you for all your effort in creating this great tool! I am trying to fine-tune a pretrained LlamaGen model on some of my own data. As a sanity check, I tried fine-tuning a c2i GPT-B model on a subset of ImageNet (the first ten classes). A randomly initialized model starts off with a cross-entropy loss of ~9, while the pretrained model starts off with a cross-entropy loss of ~7.
I just wanted to confirm that this result is expected (I naively assumed the pretrained loss would be lower). Also, how should I think about the magnitude of the training and/or validation cross-entropy loss in relation to expected generation performance? Would it also be possible to share the final training losses for the different-sized GPT models on the ImageNet training set?
Thank you!