Hybrid Group Relative Policy Optimization (Hybrid GRPO): A Multi-Sample Approach to Reinforcement Learning #275

Open

Soham4001A wants to merge 8 commits into Stable-Baselines-Team:master from

Commits on Jan 29, 2025

init - untested

Soham Sane authored and committed
Reformatted but yet untested - still need to edit test files

Soham Sane authored and committed
Ready for PR (Untested Still)

Soham Sane authored and committed
Ready for PR - Tested

Soham Sane authored and committed
Changelog updated

Soham Sane authored and committed

Commits on Jan 30, 2025

Updated GRPO to use environment reward function for sampled rewards

Soham Sane authored and committed

Commits on Mar 30, 2025

Updated Method - GRPO is now in Hybrid Implementation & not standard

Soham Sane authored and committed
Formatting + Commenting

Soham Sane authored and committed