Conversation

@allenwang28 (Contributor) commented Aug 7, 2025

This implements the RLTrainer actor following the pattern from apps/sft_v2. The PR currently covers setup and implements train_step; I'm putting it up now to land the trainer actor and unblock weight-sync work. RLLoss is still missing and is blocked on an update in ForgeEngine. Once this lands and the ForgeEngine update is in, the trainer will be integrated into apps/grpo (pending #58) and apps/rl will be removed.
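
For orientation, a rough sketch of the intended call flow once the follow-ups land; the endpoint names (setup, train_step, push_weights) come from this PR, while the driver loop, the batches iterable, and the endpoint call syntax are assumptions:

# Hypothetical driver sketch, not part of this PR.
await trainer.setup.call()
for batch in batches:                  # batches: placeholder source of training data
    await trainer.train_step.call(batch)
    await trainer.push_weights.call()  # hand updated weights to the policy (weight sync)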

@meta-cla bot added the CLA Signed label (Aug 7, 2025)
@pbontrager self-assigned this (Aug 8, 2025)
@pbontrager changed the title from "[wip] working towards RL prototype..." to "RLTrainer" (Aug 26, 2025)
@pbontrager marked this pull request as ready for review (August 26, 2025, 20:36)
Comment on lines +106 to +114
if env:

    def setup():  # noqa: FB811
        for k, v in env.items():
            os.environ[k] = v

    p = await ProcMesh.from_alloc(alloc, setup=setup)
else:
    p = await ProcMesh.from_alloc(alloc)

Contributor:

nit: No strong pref

Suggested change
-if env:
-    def setup():  # noqa: FB811
-        for k, v in env.items():
-            os.environ[k] = v
-    p = await ProcMesh.from_alloc(alloc, setup=setup)
-else:
-    p = await ProcMesh.from_alloc(alloc)
+def setup():  # noqa: FB811
+    if env:
+        for k, v in env.items():
+            os.environ[k] = v
+p = await ProcMesh.from_alloc(alloc, setup=setup)

Contributor:

I wanted to pass None instead of a noop function in case it's faster, and the linter didn't let me conditionally define setup as None or a function.
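
For reference, the variant described here (pass None instead of a no-op) would look roughly like this; treating setup=None as "no hook" in ProcMesh.from_alloc is an assumption, and the conditional rebinding of setup is exactly what the linter rejected:

# Sketch of the rejected variant: bind `setup` to None or to a function,
# then always pass it through.
setup = None
if env:
    def setup():  # the linter flags this as a redefinition of `setup`
        for k, v in env.items():
            os.environ[k] = v

p = await ProcMesh.from_alloc(alloc, setup=setup)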

Comment on lines -16 to -18
from forge.controller import spawn_actors
from forge.controller.service import ServiceConfig
from forge.controller.spawn import spawn_service

Contributor:

I blame my linter for missing this

Thanks

Comment on lines +58 to +60
@parse
def recipe_main(cfg: DictConfig) -> None:
    asyncio.run(run(cfg))

Contributor:

Is the indirection through run because @parse doesn't play well with async, vs. putting @parse on run directly?

Contributor:

We probably shouldn't have recipe_main just call run, but I left it this way for the reason you said.


async def run(cfg: DictConfig):
    trainer, buffer = await asyncio.gather(
        spawn_actors(

Contributor:

Doesn't matter for the sake of the apps, but why not spawn_services?

Contributor:

I had this before spawn_services existed, and the next step is integration with grpo, so it wasn't worth updating.

tokenizer: Tokenizer
train_dataloader: Dataloader
# val_dataloader: Dataloader
metric_logger: MetricLogger

Contributor:

Suggested change
-metric_logger: MetricLogger
+metric_logger: MetricLogger | None
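
If the field does become optional, call sites would need a guard; a minimal sketch, where the log call shown is illustrative rather than the actual MetricLogger API:

# Sketch: tolerate a missing logger instead of assuming one is configured.
if self.metric_logger is not None:
    self.metric_logger.log("loss", loss.item(), step=self.current_step)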

    return loss

@endpoint
def train_step(self, batch) -> None:

Contributor:

Not in this PR since things are still moving, but let's type batch
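
One possible annotation once the interface settles, assuming batches stay plain dicts of tensors (which is what the body below already expects):

# Sketch: type the batch as a mapping from field name to tensor.
@endpoint
def train_step(self, batch: dict[str, torch.Tensor]) -> None:
    ...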

    for k, v in batch.items():
        if isinstance(v, torch.Tensor):
            batch[k] = v.to("cuda")  # TODO: hardcoded for now
    self.train_step(batch)

Member:

Maybe I'm misreading my indents, but train_step calls train_step?

Contributor:

Forgot to remove that line
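
For clarity, the hunk without the leftover call reads:

# Same hunk with the stray self.train_step(batch) removed; the loss and
# optimizer logic that follows in the method is elided here.
for k, v in batch.items():
    if isinstance(v, torch.Tensor):
        batch[k] = v.to("cuda")  # TODO: hardcoded for now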

    )

@endpoint
def push_weights(self) -> None:

Member:

nit: slightly more descriptive name here.



-async def get_proc_mesh(process_config: ProcessConfig) -> ProcMesh:
+async def get_proc_mesh(process_config: ProcessConfig, set_address=False) -> ProcMesh:

Member:

This does the automatic GPU alloc right? Is there a reference from Ray on how they do this?

Contributor:

This is just adding distributed variables that need to be shared across the proc_mesh. This should also be handled by the host_mesh setup in the future.
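
Concretely, "distributed variables shared across the proc_mesh" means something like the sketch below; the specific variables (MASTER_ADDR, MASTER_PORT) and how the address is chosen are assumptions about what get_proc_mesh does when set_address=True:

# Sketch: pick one rendezvous address for the whole mesh and export it to
# every process via the setup() hook, so torch.distributed can initialize.
import socket

def _pick_port() -> int:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]

env = {
    "MASTER_ADDR": socket.gethostname(),
    "MASTER_PORT": str(_pick_port()),
}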

    self._init_dist()
    super().__init__(job_config)

def _init_dist(self):

Member:

Should probably go inside Monarch or a general helper function at some point

Contributor:

This is brought over from sft_v2. There is a monarch helper that landed yesterday, but in general we're waiting for hostmesh support so we can handle this more robustly.
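
For readers without sft_v2 open, _init_dist is roughly shaped like the sketch below; the rank and world-size lookups are placeholders because the real values come from Monarch, and the exact variable list is an assumption:

import os

def _init_dist(self):
    # Sketch: export the env vars torch.distributed / torchtitan expect,
    # derived from this actor's position in the proc mesh (placeholders).
    env = {
        "RANK": str(self._rank),              # placeholder: mesh rank
        "LOCAL_RANK": str(self._local_rank),  # placeholder
        "WORLD_SIZE": str(self._world_size),  # placeholder: mesh size
    }
    os.environ.update(env)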

@ebsmothers (Contributor) left a comment:

A bunch of small comments, stamping to unblock

num_hosts: 1
num_procs: 1

# policy:

Contributor:

Remove from here on down?

apps/rl/main.py Outdated
"""A working example showcasing a practical example of forge with RL.
Run this with:
    HF_HUB_DISABLE_XET=1 python -m apps.rl.main --config apps/rl/llama3_8b.yaml

Contributor:

Why do we need HF_HUB_DISABLE_XET here? We aren't downloading anything.

apps/rl/main.py Outdated
import logging
import sys

# from forge.actors import Policy, PolicyRouter, RLTrainer, ReplayBuffer

Contributor:

remove


__all__ = ["Collector"]

def __getattr__(name):

Contributor:

You trying to sneak this in or what? 👀

Member:

NO

pyproject.toml Outdated
"torch==2.9.0.dev20250826",
"monarch-no-torch==0.1.0.dev20250826",
]
# cpu = [

Contributor:

What's this about?

Contributor:

Removing it, I forgot it was there. It's broken with that addition, but it needs to be fixed separately.


# apply context parallelism if cp is enabled
# ensure CP handles the separate freqs_cis buffer for each pp stage
inputs = input_dict["tokens"]

Contributor:

FYI: for any models with flex you will need to add the changes from #76.

# self.pbar.set_description(f"{self.current_step}|Loss: {loss}")

self.optimizers.step()
self.optimizers.zero_grad()

Contributor:

👀

@endpoint
def train_step(self, batch) -> None:
    # Move tensors to the appropriate device
    for k, v in batch.items():

Contributor:

nit: we have your utility batch_to_device now
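
For reference, such a helper amounts to the sketch below; the real utility referenced in the comment may differ in name, location, and signature:

# Sketch of a batch_to_device-style helper: move every tensor value in the
# batch dict to the target device in place.
import torch

def batch_to_device(batch: dict, device: torch.device) -> None:
    for k, v in batch.items():
        if isinstance(v, torch.Tensor):
            batch[k] = v.to(device)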

    )

@endpoint
def push_weights(self) -> None:

Contributor:

nit: add a comment explaining what this will be used for
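
Something along these lines would do; the description of the weight-sync flow below is inferred from the PR summary rather than from final code:

@endpoint
def push_weights(self) -> None:
    """Publish the trainer's current weights so policy/generator actors
    can pull them before the next round of rollouts (weight sync).
    Placeholder until the actual sync mechanism lands."""
    ...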

actor_cls: ForgeActor,
cfg: DictConfig,
processes: ProcessConfig,
set_address: bool = False,

Contributor:

Why/when do we actually need this? (I saw you use it in the spawn for ForgeSFTRecipe, but why there and nowhere else?)

Contributor:

Changes will propagate, but you need a proper way to set a unique communication address per proc_mesh. This isn't at the actor level. It will be used for any mesh that uses distributed APIs, including the policy.
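
In other words, every mesh that runs collectives gets its own rendezvous address; a usage sketch, assuming set_address=True is what triggers that per-mesh assignment inside get_proc_mesh (the trainer_procs and policy_procs configs here are hypothetical):

# Sketch: two independent meshes, each with its own MASTER_ADDR/MASTER_PORT,
# so their process groups never collide.
trainer_mesh = await get_proc_mesh(trainer_procs, set_address=True)
policy_mesh = await get_proc_mesh(policy_procs, set_address=True)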

@pbontrager merged commit 13fd9f1 into meta-pytorch:main on Aug 27, 2025
4 checks passed