Normalize gates on expert dim before calculating seq_aux_loss #11160
Open
lshpku wants to merge 1 commit into PaddlePaddle:dsv3_dev from