fix(101): align KL divergence calculation with GRPO paper and fix test #576
Open
cyk1337 wants to merge 1 commit into Open-Deep-ML:main from