
Conversation

cyk1337 commented Nov 28, 2025

  • Problem: The current implementation in question 101 computes a standard KL divergence, which does not match the KL formulation defined in the original GRPO paper. In addition, the expected test outputs did not match the solution's results on the website.
  • Solution:
    1. Updated the grpo_objective function to compute the KL term as defined in the GRPO paper (https://arxiv.org/pdf/2402.03300); see the sketch after this list.
    2. Fixed the corresponding test cases to match the corrected implementation.
  • Impact: This fix ensures the correctness of the GRPO algorithm's core component.
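For context, a minimal sketch of the per-token KL term the corrected implementation is meant to compute, i.e. the estimator used in the GRPO paper, pi_ref/pi_theta - log(pi_ref/pi_theta) - 1, rather than a plain log-ratio. The function name grpo_kl_term and its log-probability arguments are illustrative only and do not mirror the exact signature of grpo_objective on the site:

```python
import numpy as np

def grpo_kl_term(policy_log_probs, ref_log_probs):
    """Per-token KL penalty as defined in the GRPO paper (arXiv:2402.03300):

        pi_ref/pi_theta - log(pi_ref/pi_theta) - 1

    computed from token log-probabilities. This estimator is unbiased for
    KL(pi_theta || pi_ref) under samples from pi_theta and is always
    non-negative, unlike the plain log-ratio of a naive KL term.
    """
    # log(pi_ref / pi_theta) per token
    log_ratio = np.asarray(ref_log_probs, dtype=float) - np.asarray(policy_log_probs, dtype=float)
    # pi_ref/pi_theta - log(pi_ref/pi_theta) - 1, elementwise
    return np.exp(log_ratio) - log_ratio - 1.0

# Identical distributions give a zero penalty; any per-token mismatch is positive.
policy = np.log([0.5, 0.3, 0.2])
ref = np.log([0.4, 0.4, 0.2])
print(grpo_kl_term(policy, policy))  # [0. 0. 0.]
print(grpo_kl_term(policy, ref))     # every entry >= 0
```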

@cyk1337 cyk1337 changed the title fix: align KL divergence calculation with GRPO paper and fix test fix(101): align KL divergence calculation with GRPO paper and fix test Nov 28, 2025
cyk1337 (Author) commented Dec 1, 2025

@moe18 @Open-Deep-ML
