Skip to content

Conversation

JenniferWang
Copy link
Contributor

@JenniferWang JenniferWang commented Sep 18, 2025

Before

Policy version is tracked in two places: main controller and Policy Actor (which increments the version number upon update_weights call). This feels fragile and unnecessary.

After

This PR is not to debate the need for a versioning number but rather assuming the concept is valid, only track it in the controller.

Tests

Run on both grpo and toy_rl. But mainly look at progress on the toy_rl

https://www.internalfb.com/phabricator/paste/view/P1954914879

No regression
Before: https://meta.wandb.io/jiyue/sumdigits-training/runs/omqmpkq1?nw=nwuserjiyue
image

After: https://meta.wandb.io/jiyue/sumdigits-training/runs/xv5dduh5?nw=nwuserjiyue
image

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 18, 2025
Copy link
Member

@joecummings joecummings left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any concerns - thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@JenniferWang JenniferWang force-pushed the change-policy-versioning-logic branch from b8934b3 to 29a2143 Compare September 19, 2025 14:56
@JenniferWang JenniferWang changed the title [Pending Tests] Update logic to only track policy version in the main controller Update logic to only track policy version in the main controller Sep 19, 2025
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Move this up to be the first method.

@JenniferWang
Copy link
Contributor Author

I'd like to test on #183 as well...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my other comment, could we leave this out? Or at least switch it to be debug only?

@JenniferWang JenniferWang force-pushed the change-policy-versioning-logic branch from eac4e78 to a0ccc7e Compare September 19, 2025 18:45
@JenniferWang JenniferWang merged commit b92b095 into meta-pytorch:main Sep 19, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants