I can't find where the "per-token KL penalty from the SFT model" is applied during PPO training in model/model_training/trainer_rl.py; maybe I missed something. Could you tell me how these two losses are combined?
I found the loss function "PolyLoss" in model/model_training/losses.py. Is this the loss function for the "per-token KL penalty from the SFT model" part? If so, why is a CE (cross-entropy) term combined with it?
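To make clear which term I mean, here is a minimal sketch of how a per-token KL penalty is commonly folded into the PPO reward in RLHF setups. This is not code from this repository: the function name `kl_penalized_rewards`, the coefficient `beta`, and the choice to add the reward-model score on the last token are assumptions for illustration only.

```python
def kl_penalized_rewards(policy_logprobs, sft_logprobs, score, beta=0.1):
    """Return per-token rewards with a KL penalty against the SFT model.

    Hypothetical helper, not taken from trainer_rl.py.
    policy_logprobs[t] and sft_logprobs[t] are the log-probabilities that the
    current policy and the frozen SFT model assign to the sampled token t.
    """
    if len(policy_logprobs) != len(sft_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    if not policy_logprobs:
        return []

    # Per-token penalty: -beta * (log pi(a_t) - log pi_sft(a_t)).
    rewards = [-beta * (p - s) for p, s in zip(policy_logprobs, sft_logprobs)]

    # Assumption: the scalar reward-model score is credited to the final token.
    rewards[-1] += score
    return rewards


rewards = kl_penalized_rewards(
    policy_logprobs=[-1.0, -2.0, -0.5],
    sft_logprobs=[-1.5, -2.0, -1.0],
    score=2.0,
    beta=0.1,
)
print(rewards)
```

In this formulation the penalty enters through the reward rather than as a separate loss term, which is why I am unsure where to look for it in the trainer.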