Check in prototype notebook #102

allenwang28 · 2025-08-31T01:32:28Z

Checks in a notebook for our initial prototype!

pbontrager

I've skimmed it and would want to take another pass, but at a high level I think it covers a lot of good ground. My one concern is that it leads with a lot of infra concepts that won't be grounded in anything for a reader whose background is in RL. It could be easier to grasp the first time through if the reader is introduced to a few high-level concepts (services, multiprocessing, simple orchestration, data movement, etc.) and those are then tied into RL before going deeper on each part.

demo.ipynb

pbontrager · 2025-09-02T00:57:59Z

+    "\n",
+    "This is the role of \"rollouts\" - creating the dataset used to update our policy. Rather than training on a static dataset, RL dynamically generates training data by having the current policy interact with the environment.\n",
+    "\n",
+    "Let's build a step-by-step synchronous training loop to see how these services work together. The basic RL cycle is:\n",


I think this loop is too generic: it doesn't make it obvious why off-the-shelf services, like those found in other libraries, couldn't work instead.

Collect Experience: run tasks, tools, and multi-step workflows

Compute Rewards: run verifiers and judges to calculate rewards

Store Experience: Add the episode to our replay buffer

Sample & Train: Sample a batch and update the policy in parallel

Broadcast Policy: Overlapped policy broadcast for services to update from

Repeat: Continue this cycle to improve the policy

This may go too far into incomplete features, but maybe something in between would work.
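
For illustration, the cycle suggested above could be sketched as a small synchronous loop. This is a toy, single-process sketch and not the notebook's code: `ToyPolicy`, `Episode`, `collect_experience`, `compute_reward`, `train_step`, and `broadcast` are all hypothetical names, the "policy" is a single number, and the "overlapped" broadcast is reduced to a plain copy after each training step.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Episode:
    prompt: int
    response: int
    policy_version: int  # version of the policy that generated this episode
    reward: float = 0.0


class ToyPolicy:
    """Hypothetical stand-in for a policy: guesses an integer near `bias`."""

    def __init__(self):
        self.bias = 0.0
        self.version = 0

    def generate(self, prompt, rng):
        return round(self.bias) + rng.choice([-1, 0, 1])


def collect_experience(policy, prompt, rng):
    # Collect Experience: a real system would run tasks, tools, and workflows.
    return Episode(prompt, policy.generate(prompt, rng), policy.version)


def compute_reward(ep):
    # Compute Rewards: toy verifier, closer to the prompt is better.
    return -float(abs(ep.response - ep.prompt))


def train_step(trainer, batch, lr=0.5):
    # Sample & Train: toy update that moves toward the best response in the batch.
    best = max(batch, key=lambda ep: ep.reward)
    trainer.bias += lr * (best.response - trainer.bias)
    trainer.version += 1


def broadcast(trainer, rollout_policy):
    # Broadcast Policy: copy trainer weights to the copy used for rollouts.
    rollout_policy.bias = trainer.bias
    rollout_policy.version = trainer.version


def run(num_steps=20, batch_size=4, capacity=16, seed=0):
    rng = random.Random(seed)
    trainer, rollout_policy = ToyPolicy(), ToyPolicy()
    buffer = deque(maxlen=capacity)  # Store Experience: bounded replay buffer
    for _ in range(num_steps):  # Repeat
        ep = collect_experience(rollout_policy, prompt=5, rng=rng)
        ep.reward = compute_reward(ep)
        buffer.append(ep)
        batch = rng.sample(list(buffer), min(batch_size, len(buffer)))
        train_step(trainer, batch)
        broadcast(trainer, rollout_policy)
    return trainer, rollout_policy, buffer
```

Each `Episode` records the policy version that produced it, which is one way the loop could differ from generic off-the-shelf services: the buffer can tell how stale a sample is relative to the trainer.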

service env

439e435

meta-cla bot added the CLA Signed label Aug 31, 2025

allenwang28 added 6 commits August 31, 2025 18:52

park demo

cd29251

create a service without nesting

6c3d838

some fixes

efac94f

Merge branch 'reduce_service' into demo

08f0802

final notebook

9166178

Merge branch 'main' into demo

4af740b

allenwang28 changed the title ~~[not for land] - service env~~ Check in prototype notebook Sep 1, 2025

undo some other commits

0e5618d

allenwang28 marked this pull request as ready for review September 1, 2025 19:09

pbontrager reviewed Sep 2, 2025

allenwang28 added 3 commits September 2, 2025 06:37

clean notebook

e0e280e

merge

b8319c5

advantage -> reward

dccdf26

pradeepfn approved these changes Sep 2, 2025

LucasLLC approved these changes Sep 2, 2025

allenwang28 merged commit d4011ea into meta-pytorch:main Sep 2, 2025
4 checks passed

4 participants
