
Conversation

cyk1337 commented Nov 28, 2025

  • Problem: The current implementation in question 101 computes a standard KL divergence, which does not match the KL formulation defined in the original GRPO paper. In addition, the expected test outputs did not match the solution's results on the website.
  • Solution:
    1. Updated the grpo_objective function to compute the KL term as defined in the GRPO paper (https://arxiv.org/pdf/2402.03300); see the sketch after this list.
    2. Fixed the corresponding test cases to match the corrected implementation.
  • Impact: This fix ensures the correctness of the GRPO algorithm's core component.
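For context, a minimal sketch of the per-token KL term the corrected implementation is meant to compute, i.e. the estimator used in the GRPO paper, pi_ref/pi_theta - log(pi_ref/pi_theta) - 1, rather than a plain log-ratio. The function name grpo_kl_term and its log-probability arguments are illustrative only and do not mirror the exact signature of grpo_objective on the site:

```python
import numpy as np

def grpo_kl_term(policy_log_probs, ref_log_probs):
    """Per-token KL penalty as defined in the GRPO paper (arXiv:2402.03300):

        pi_ref/pi_theta - log(pi_ref/pi_theta) - 1

    computed from token log-probabilities. This estimator is unbiased for
    KL(pi_theta || pi_ref) under samples from pi_theta and is always
    non-negative, unlike the plain log-ratio of a naive KL term.
    """
    # log(pi_ref / pi_theta) per token
    log_ratio = np.asarray(ref_log_probs, dtype=float) - np.asarray(policy_log_probs, dtype=float)
    # pi_ref/pi_theta - log(pi_ref/pi_theta) - 1, elementwise
    return np.exp(log_ratio) - log_ratio - 1.0

# Identical distributions give a zero penalty; any per-token mismatch is positive.
policy = np.log([0.5, 0.3, 0.2])
ref = np.log([0.4, 0.4, 0.2])
print(grpo_kl_term(policy, policy))  # [0. 0. 0.]
print(grpo_kl_term(policy, ref))     # every entry >= 0
```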

@cyk1337 cyk1337 changed the title fix: align KL divergence calculation with GRPO paper and fix test fix(101): align KL divergence calculation with GRPO paper and fix test Nov 28, 2025
cyk1337 (Author) commented Dec 1, 2025

@moe18 @Open-Deep-ML
