
Conversation

@Jack-Khuu Jack-Khuu commented Aug 29, 2025

Replaces the existing HF-based RefModel with a torchtitan-backed TitanRefModel.

Note: This PR keeps the old implementation using AutoModelForCausalLM until GRPO is ready for full migration to torchtitan.


python apps/grpo/main.py

Example Output

Generated 60 rollouts w/ average reward 0.0
Generated 70 rollouts w/ average reward 0.1
Generated 80 rollouts w/ average reward 0.1
Completed 20 training steps
Latest loss: 75.34173269197345
Generated 90 rollouts w/ average reward 0.1
Generated 100 rollouts w/ average reward 0.0
Generated 110 rollouts w/ average reward 0.1
Generated 120 rollouts w/ average reward 0.1
Completed 30 training steps
Latest loss: 32.164904564619064
Generated 130 rollouts w/ average reward 0.0
Generated 140 rollouts w/ average reward 0.1
Generated 150 rollouts w/ average reward 0.1
Generated 160 rollouts w/ average reward 0.1
Completed 40 training steps
Latest loss: 18.617477774620056
Generated 170 rollouts w/ average reward 0.1
Generated 180 rollouts w/ average reward 0.1
Generated 190 rollouts w/ average reward 0.1
Generated 200 rollouts w/ average reward 0.1
Completed 50 training steps
Latest loss: 9.065430700778961

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Aug 29, 2025
@Jack-Khuu Jack-Khuu requested a review from pbontrager August 29, 2025 00:44
self.engine = ForgeEngine(ForgeJobConfig(**engine_config))

@endpoint
async def forward(self, token_ids: list[int]) -> torch.Tensor:
Contributor:
We can keep things bs=1 for now, but we're going to need to add our own batching solution where we take n requests off the queue at a time and process them as a batch.

Contributor Author:

The batching itself is an easy add using the queue in ReferenceActor (the one in this PR that's unused), or we can surface it as a common API in Replica (just processes_single_request at the moment).
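The batching approach described above (take n requests off the queue at a time and process them as one batch) can be sketched with a plain asyncio queue. This is an illustrative sketch only; `batch_worker`, `process_batch`, and `max_batch` are hypothetical names, not APIs from the PR.

```python
import asyncio


async def batch_worker(queue: asyncio.Queue, process_batch, max_batch: int = 8):
    # Illustrative batching loop: block for the first request, then
    # opportunistically drain up to max_batch - 1 more without waiting.
    while True:
        token_ids, fut = await queue.get()
        batch = [(token_ids, fut)]
        while len(batch) < max_batch:
            try:
                batch.append(queue.get_nowait())
            except asyncio.QueueEmpty:
                break
        inputs = [req for req, _ in batch]
        # In the real actor this would be one padded forward pass.
        outputs = process_batch(inputs)
        for (_, fut), out in zip(batch, outputs):
            fut.set_result(out)
```

The single-request path (bs=1) falls out as the degenerate case where the queue is drained empty after the first item.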

@Jack-Khuu Jack-Khuu changed the title [WIP] Creating ReferenceActor [WIP] Adds TitanRefModel in place of HF based Reference Model Sep 1, 2025
if request_output.finished:
_, fut = self.requests.pop(request_output.request_id)
fut.set_result(request_output.outputs)
fut.set_result(request_output)
Contributor Author:

Adopted from #97

Member:
Why this instead of raw outputs?

Contributor Author:

Pragmatically: fewer merge conflicts with Philip's PR.

I don't have a strong preference, but it does make the output self-contained, which is nice when we need to pass the results around.
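The trade-off discussed here (resolving the future with the whole `request_output` instead of just `request_output.outputs`) can be sketched as follows. The `RequestOutput` dataclass below is a hypothetical stand-in for the engine's per-request result object, not the actual class.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class RequestOutput:
    # Hypothetical stand-in for the engine's per-request result object.
    request_id: int
    outputs: list
    finished: bool = True


async def main():
    loop = asyncio.get_running_loop()
    requests = {}  # request_id -> (prompt, future)
    fut = loop.create_future()
    requests[0] = ("prompt", fut)

    request_output = RequestOutput(request_id=0, outputs=[1, 2, 3])
    if request_output.finished:
        _, f = requests.pop(request_output.request_id)
        # Resolving with the whole object keeps request_id and any other
        # metadata attached, rather than handing back the bare outputs list.
        f.set_result(request_output)

    result = await fut
    return result.request_id, result.outputs
```

With the raw-outputs variant, the caller would get only the list and lose the request metadata, which is the "self-contained" point made above.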

@Jack-Khuu Jack-Khuu changed the title [WIP] Adds TitanRefModel in place of HF based Reference Model Adds TitanRefModel in place of HF based Reference Model Sep 1, 2025
@Jack-Khuu Jack-Khuu marked this pull request as ready for review September 1, 2025 20:26
from .replay_buffer import ReplayBuffer

return ReplayBuffer
elif name == "TitanRefModel":
Member:
🙃

if request_output.finished:
_, fut = self.requests.pop(request_output.request_id)
fut.set_result(request_output.outputs)
fut.set_result(request_output)
Member:
Why this instead of raw outputs?

@Jack-Khuu Jack-Khuu merged commit 46deb59 into main Sep 1, 2025
4 checks passed
@pbontrager pbontrager left a comment


Post-land review



@dataclass
class TitanRefModel(ForgeActor):
Contributor:

On naming: I would just call this ReferenceModel; we don't use Titan anywhere else and we haven't generalized it yet.


# Refer to titan JobConfig for enabling more ForgeEngine configuration
model: Model = field(default_factory=Model)
parallelism: Parallelism = field(default_factory=Parallelism)
Contributor:

I feel like we would need Checkpoint and Compile too. Or is checkpoint not needed for loading?

Contributor Author:

Not needed by default: checkpointing is used when we want to write out or load from a checkpoint.

The former doesn't happen with the reference model; the latter does unlock custom ref models (or using checkpoints as ref models).

if parallel_dims.pp_enabled:
raise NotImplementedError("PP not implemented yet")
else:
    # (jackkhuu) Not sure if either context is needed for inference here
Contributor:

Can you separate out the request call from forward? Also look at the updated RefActor in main to make sure they're in sync

return sequence_log_probs


"""
Contributor:

Can we comment out everything below this?

) # Remove batch dimension for single response


def compute_sequence_logprobs(
Contributor:

You should update this to match the one from main. This should probably go in the trainer.py file

Contributor Author:

This was copy-pasted from main (at the time, Joe's).

compute_logprobs in the file is the implementation from #97.
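For readers without the diff at hand, the core of a sequence log-prob computation like the `compute_sequence_logprobs` / `compute_logprobs` functions discussed here is: log-softmax over the vocabulary at each position, gather the log-prob of the target token, and sum over the sequence. A minimal pure-Python sketch (the real implementations operate on torch tensors; this function name and signature are illustrative):

```python
import math


def sequence_logprob(logits, target_ids):
    # logits: list of per-position vocab score lists; target_ids: one token
    # id per position. Returns the summed log-probability of the targets
    # under the softmax of the logits, using a numerically stable log-softmax.
    total = 0.0
    for step_logits, tgt in zip(logits, target_ids):
        m = max(step_logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in step_logits))
        total += step_logits[tgt] - log_z
    return total
```

In torch this collapses to `log_softmax` plus a `gather` along the vocab dimension, which is presumably why both the trainer and the reference model want to share one implementation.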

@Jack-Khuu (Contributor Author):

Follow up changes in #118

