Open
Labels: enhancement (New feature or request)
Description
@nikg4 @taenin What other functions should we put into the `GrpoRewards` struct? Other reward functions? Here is a first cut from Sonnet:
For a `grpo_rewards` package written in Rust and wrapped for Python, I'd suggest focusing on these key functions:
- Reward calculation functions:
  - `calculate_reward(actions, state)` - Core function to compute rewards based on agent actions and environment state
  - `discount_rewards(rewards, gamma)` - Apply temporal discounting to reward sequences
  - `normalize_rewards(rewards)` - Standardize rewards for stable training
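A minimal Rust sketch of `discount_rewards(rewards, gamma)`, assuming the rewards for one trajectory arrive as an `f64` slice and that "temporal discounting" means the usual discounted return G_t = r_t + gamma * G_{t+1} at every step. The slice and `Vec<f64>` types and the absence of input validation (for example, checking that `gamma` lies in [0, 1]) are assumptions, not a settled `GrpoRewards` API:

```rust
/// Hypothetical sketch of `discount_rewards(rewards, gamma)`.
/// Returns the discounted return G_t = r_t + gamma * G_{t+1} for every
/// step t, computed in one backward pass with a running accumulator.
/// Assumption: no validation of `gamma`; an empty input yields an empty Vec.
pub fn discount_rewards(rewards: &[f64], gamma: f64) -> Vec<f64> {
    let mut returns = vec![0.0; rewards.len()];
    let mut running = 0.0;
    // Walk from the last step to the first, folding in future rewards.
    for (t, &r) in rewards.iter().enumerate().rev() {
        running = r + gamma * running;
        returns[t] = running;
    }
    returns
}
```

For example, `discount_rewards(&[1.0, 1.0, 1.0], 0.5)` returns `[1.75, 1.5, 1.0]`. The backward pass keeps the cost linear in the sequence length instead of recomputing a sum for each step.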
Implement `discount_rewards` and `normalize_rewards`.
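A minimal sketch of `normalize_rewards(rewards)`, assuming "standardize" means shifting to zero mean and scaling to unit standard deviation. In GRPO this is how group-relative advantages are commonly formed: the rewards of all completions sampled for the same prompt are normalized together. The `EPS` constant and the use of the population (rather than sample) standard deviation are hypothetical design choices:

```rust
/// Hypothetical guard against division by zero when every reward is equal.
const EPS: f64 = 1e-8;

/// Hypothetical sketch of `normalize_rewards(rewards)`.
/// Returns (r - mean) / (std + EPS) for each reward, where `std` is the
/// population standard deviation of the input slice.
pub fn normalize_rewards(rewards: &[f64]) -> Vec<f64> {
    if rewards.is_empty() {
        return Vec::new();
    }
    let n = rewards.len() as f64;
    let mean = rewards.iter().sum::<f64>() / n;
    // Population variance: mean squared deviation from the mean.
    let var = rewards.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt();
    rewards.iter().map(|r| (r - mean) / (std + EPS)).collect()
}
```

Applied to `[1.0, 2.0, 3.0]`, this gives roughly `[-1.22, 0.0, 1.22]`. A group whose rewards are all equal maps to all zeros instead of producing NaN, which matters in GRPO when every sampled completion for a prompt receives the same score.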