init27 (Contributor) commented Oct 3, 2025

Hi team,

As promised, here is the three-part tutorial series (written in Markdown), covering:

  1. RL and Forge concepts (with conceptual training loops)
  2. Forge internals (with a training loop borrowed from GRPO)
  3. Monarch 101: an overview of how Monarch internals behave and fit together

meta-cla bot added the CLA Signed label Oct 3, 2025

allenwang28 (Contributor) left a comment

Thanks Sanyam, this is awesome! I had a bunch of comments, but I generally like the flow and where we're at with this.


## Enter Forge: RL-Native Architecture

Forge solves these problems by treating each RL component as an **independent, scalable service**

> by treating each RL component as an independent, scalable service

@init27 - we have actually made pieces like the Trainer a regular actor, since the recovery semantics are unclear, whereas handling a vLLM generator going down is pretty straightforward. We could have torchft-style recovery, but it's unclear how this would affect RL numerics (and we just haven't had time to add it yet).

It's also why you see some `call_one()` and `call()` (actor APIs) mixed in with `route()` and `fanout()` (service APIs) in the real RL training step. It's a logical choice, but hard to tell the story at the 101 level. I'm wondering if this is something we should mention at all, or if you have any ideas for how to simplify.
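
As a rough illustration of the distinction in this comment, here is a toy sketch in plain Python. It is not Forge's implementation and does not use Forge's API: `ToyReplica`, `ToyService`, and `ToyTrainer` are hypothetical names, and the real call shapes in Forge differ. It only assumes that a service request can be routed to one healthy replica or fanned out to all of them, while a single stateful trainer has no replica to fail over to.

```python
from itertools import cycle


class ToyReplica:
    """One copy of a generator-like component (hypothetical)."""

    def __init__(self, idx: int) -> None:
        self.idx = idx
        self.healthy = True

    def generate(self, prompt: str) -> str:
        return f"replica{self.idx}:{prompt}"


class ToyService:
    """Hypothetical stand-in for a replicated service; not Forge's implementation."""

    def __init__(self, num_replicas: int) -> None:
        self.replicas = [ToyReplica(i) for i in range(num_replicas)]
        self._order = cycle(self.replicas)

    def route(self, prompt: str) -> str:
        # Send the request to ONE healthy replica, chosen round-robin.
        for _ in range(len(self.replicas)):
            replica = next(self._order)
            if replica.healthy:
                return replica.generate(prompt)
        raise RuntimeError("no healthy replicas")

    def fanout(self, prompt: str) -> list[str]:
        # Send the request to EVERY healthy replica.
        return [r.generate(prompt) for r in self.replicas if r.healthy]


class ToyTrainer:
    """A single stateful actor: no replicas, so nothing to fail over to."""

    def __init__(self) -> None:
        self.step = 0

    def train_step(self, batch: list[str]) -> int:
        self.step += 1
        return self.step


service = ToyService(num_replicas=2)
trainer = ToyTrainer()

first = service.route("hi")          # served by replica 0
service.replicas[1].healthy = False  # simulate a generator going down
second = service.route("hi")         # replica 1 is skipped; replica 0 serves again
everyone = service.fanout("hi")      # only healthy replicas respond
step = trainer.train_step([first, second])
```

In this toy, losing a generator replica only changes which replica answers, whereas the trainer's `step` counter lives in one place. That mirrors why recovery is described above as straightforward for the generator and unclear for the Trainer.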

init27 (Contributor Author) replied

Thanks, yes, it was quite hard to get this right. For now I mention some high-level details and acknowledge that we cover them properly in Part 2.

Open to any other suggestions.

ebsmothers (Contributor) left a comment

This looks great! Just a partial review for now, but I will come back for the rest later.

init27 (Contributor Author) commented Oct 3, 2025

Thanks so much Allen! Will wait for Evan's review and then will address all the comments to land this. Thanks for your time both!

svekars (Contributor) commented Oct 6, 2025

Can we move the files under the https://github.com/meta-pytorch/forge/tree/main/docs/source/tutorial_sources directory and list them in the https://github.com/meta-pytorch/forge/blob/main/docs/source/tutorials.md toctree?

Also, if you want to make them executable and link to a Google Colab, you could convert them into .py files using this template: https://github.com/meta-pytorch/forge/blob/main/docs/source/tutorial_sources/template_tutorial.py

init27 requested a review from ebsmothers October 12, 2025 18:57

init27 (Contributor Author) commented Oct 12, 2025

@allenwang28 @ebsmothers - Sorry for being slow because of the flu. I just addressed all comments, and I think we are good to merge.

Once we merge, I will work with @svekars to maybe create Excalidraw diagrams for the docs.

init27 requested a review from felipemello1 October 13, 2025 22:04

felipemello1 (Contributor) left a comment

Tutorials look great. Thanks for working on them! Approving to unblock. Please check the comments I left.

init27 (Contributor Author) commented Oct 14, 2025

Thanks so much for the detailed review and awesome guidance, @allenwang28, @ebsmothers and @felipemello1. I've addressed all comments and am merging now.

I'll trouble you for examples soon now :)

init27 merged commit 6ec973f into main Oct 14, 2025
9 checks passed
allenwang28 added a commit to allenwang28/forge that referenced this pull request Oct 15, 2025

5 participants