fix(101): fix the computation of kl divergence in GRPO

db888da
Open

fix(101): align KL divergence calculation with GRPO paper and fix test #576

Workflow runs completed with no jobs