Publishing weights into torchstore from RLTrainer and getting them from the policy engine. #138
Conversation
@pradeepfn please clarify, does put hang or throw?
No torchstore issues.
This looks really great! Glad we can finally have this started. Mostly left questions and a few nits.
src/forge/actors/trainer.py
Why are we starting at 1? Also, we probably want a todo to update this from the checkpoint
Because the policy engine starts at 1. Let's keep this fragile contract as it is; the true version has to come from a config or an external book-keeping entity.
I don't think we can change this without risking breaking checkpoint expectations from titan side. I'd rather just use a separate variable in the trainer for "checkpoint name" (can be a property that's just current_step + 1 for now). This could also be passed in from the controller which would be better.
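A minimal sketch of that separate variable, assuming the trainer tracks a current_step attribute (names are hypothetical, not the actual trainer code):

```python
# Hypothetical sketch: leave Titan's checkpoint numbering untouched and derive a
# separate name used only when publishing weights to torchstore.
class RLTrainer:
    def __init__(self) -> None:
        self.current_step = 0  # assumed attribute; the real trainer may track this differently

    @property
    def checkpoint_name(self) -> int:
        # The policy engine starts counting versions at 1, hence the +1 offset.
        return self.current_step + 1
```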
src/forge/actors/trainer.py
Where is this coming from? When you call this, does it create the sd right then or did it have to be saved in the train step earlier? Does it return the sd on GPU or CPU? Also does it handle blocking the trainer from updating the weights while it's getting them?
I'm accessing the module state-dict prepped by torchtitan as part of checkpoint save.
- This is an in-memory state-dict (Tensor/DTensor).
- It returns tensors with their original storage, i.e. GPU/UVM-backed tensors.
Also does it handle blocking the trainer from updating the weights while it's getting them?
Hmm, it does not block the trainer. However, ForgeEngine drives the trainer via train_step, so there are no race conditions with the current code.
There are improvements to be made to this code. In the ideal case:
- The state-dict gets prepped for both weight-exchange and checkpoint-save purposes.
- Once the initial prep is done, we can cache the prepped state-dict across later training steps for efficiency (if there is an opportunity).
- We move all the model weights and optimizer state to torchstore.
- The policy engine (only) looks up the model weights from torchstore.
- Async checkpointing looks up the model weights and optimizer states for upload to remote persistent storage.
We don't have all the pieces right now, but tapping into the checkpoint state-dict is the right first step.
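A rough sketch of what that first step could look like, assuming a simple async key/value put() on the store and that sharded tensors can be materialized via DTensor.full_tensor(); none of these names are taken from the actual Forge/torchstore code:

```python
import torch
from torch.distributed.tensor import DTensor


async def push_weights(store, state_dict: dict[str, torch.Tensor], version: int) -> None:
    """Publish the checkpointer-prepped state dict under a versioned key prefix."""
    for name, tensor in state_dict.items():
        # Tensors come straight from the checkpoint state, so they may be
        # GPU/UVM-backed DTensors; materialize and move to host before the put.
        if isinstance(tensor, DTensor):
            tensor = tensor.full_tensor()
        await store.put(f"policy_v{version}/{name}", tensor.detach().cpu())
```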
I guess you're right that it should be mostly safe since we control the update from the controller. But since they're async calls they could be overlapped so we'll have to be careful for now.
awesome!
src/forge/actors/trainer.py
When would "model"
not be in the self.engine.checkpointer.states
? In other words, can we update the assertion error to be more informative? Does this fail if the user didn't initialize the trainer properly/what do they need to do to make it work??
Essentially, this only happens if the checkpoint_manager of torchtitan is not initialized prior to calling the push_weights routine. I can update the error message with that in a follow-up PR.
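For example, something along these lines (the attribute path is the one referenced in the review; the message itself is illustrative):

```python
# Sketch of a more descriptive failure message for the existing assertion.
assert "model" in self.engine.checkpointer.states, (
    "Titan checkpointer has no 'model' entry; initialize the trainer's "
    "checkpoint manager during engine setup before calling push_weights()."
)
```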
Is this new rtol/atol expected? Question for @pbontrager
This is never great, but given the bf16/fp16 comments I could see that. This is also an allclose and not a comparison of the mean so we should be safe here. If we can load the hf side with bf16 instead of fp16 we might be able to regain the tighter tolerance.
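For reference, a sketch of the kind of check being discussed; the tensor names and tolerance values are placeholders, not the ones in the test:

```python
import torch


def check_outputs_match(forge_out: torch.Tensor, hf_out: torch.Tensor) -> None:
    # bf16 keeps only ~8 mantissa bits, so an element-wise comparison against an
    # fp16/fp32 reference generally needs looser tolerances than the defaults.
    torch.testing.assert_close(forge_out.float(), hf_out.float(), rtol=1e-2, atol=1e-2)
```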
maybe we should hardcode the trainer config here rather than load from apps/rl/llama3_8b.yaml
Approving this with a few nits
FYI: @joecummings is working on merging this diff because of recent API changes that he pushed. @joecummings can add more insights. Thanks.
src/forge/actors/trainer.py
Question: are there benefits to doing this at the state-dict level rather than at the key level, where we could parallelize the individual put operations per key?
Right now, the benefit is simplicity.
Eventually I imagine we will want to do this on a per-slice level.
cc @LucasLLC
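A sketch of what a per-key variant could look like, again assuming a hypothetical async put() on the store:

```python
import asyncio

import torch


async def push_weights_per_key(store, state_dict: dict[str, torch.Tensor], version: int) -> None:
    # One put per key, issued concurrently, instead of a single put of the whole
    # state dict; a per-slice scheme would break each tensor down further.
    await asyncio.gather(
        *(store.put(f"policy_v{version}/{name}", tensor) for name, tensor in state_dict.items())
    )
```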
Stamping. Only note is to remember to update the Policy or the controller to remove old policies from the store once all generators are updated.
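A hypothetical sketch of that cleanup, assuming the store exposes a delete() by key prefix; the real torchstore API may differ:

```python
async def evict_old_policies(store, current_version: int, keep_last: int = 1) -> None:
    # Drop policy versions that every generator has already moved past, keeping
    # the most recent `keep_last` versions (including the current one).
    for version in range(1, current_version - keep_last + 1):
        await store.delete(f"policy_v{version}")
```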
src/forge/actors/trainer.py
Curious: what is the reasoning for doing this at the learner and not at the generator? The trainer just pushes its weights, and the generator can modify the sd based on its implementation (vLLM, sglang, etc.).
This is a temporary thing I did. It will be moved to generator sd loading, and it will be moved out of the trainer/generator critical path based on efficiency numbers.
To add to Pradeep's answer, vLLM already handles its own hf -> vllm mapping. The only reason we've recreated it is so we can add a sharded loading solution, which we want to eventually upstream. It will be on the generator side like he said.
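For context, a rough illustration of the generator-side direction being described; the remapping table and helper are hypothetical, and only load_weights mirrors vLLM's usual weight-loading entry point:

```python
import torch

# Hypothetical remapping: identity for most keys, with vLLM's own loader handling
# fused projections (qkv_proj, gate_up_proj) internally.
HF_TO_VLLM: dict[str, str] = {
    "lm_head.weight": "lm_head.weight",
}


def load_published_weights(vllm_model, hf_state_dict: dict[str, torch.Tensor]) -> None:
    # Stream (name, tensor) pairs into the engine's loader after renaming.
    weights = ((HF_TO_VLLM.get(name, name), tensor) for name, tensor in hf_state_dict.items())
    vllm_model.load_weights(weights)
```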
Working integration test with Llama8B.
State-dict conversion to HF format happens during weight save/publish into TS using sd_adaptor (mostly). Custom param concatenations that are not supported by the current sd_adaptor impl of the model were implemented directly within the weight-publish function.
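As an illustration of the kind of custom concatenation mentioned above (the key names and layout are hypothetical, not the actual model's):

```python
import torch


def fuse_gate_up(sd: dict[str, torch.Tensor], layer: int) -> torch.Tensor:
    # Example of concatenating two separately stored projections into the single
    # fused parameter expected by the target format.
    prefix = f"model.layers.{layer}.mlp"
    return torch.cat([sd[f"{prefix}.gate_proj.weight"], sd[f"{prefix}.up_proj.weight"]], dim=0)
```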