Conversation

@DNXie DNXie commented Sep 24, 2025

Across all of our applications, everything is currently a Service, but some components should be plain actors. For example, in apps/grpo/main, trainer and replay_buffer should be actors.

Summary:

  • Actors: dataloader, trainer, replay_buffer, compute_advantages
  • Services: Policy, ref_model, reward_actor

Changes:

  • Updated apps/grpo, apps/toy_rl (with dcp off)
  • Updated Policy to take use_dcp from config.
  • Dropped apps/rl since it is deprecated.
  • Single-actor calls now use call_one instead of choose. The difference is that call_one asserts the callee is a singleton, i.e. the endpoint is backed by exactly one actor (see the sketch after this list).
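
To make the call_one vs. choose distinction concrete, here is a minimal Python sketch of the semantics. ActorMesh, ReplayBuffer, and the method names below are illustrative stand-ins, not the Monarch or forge API; the only point is that choose routes to an arbitrary replica while call_one requires exactly one actor behind the endpoint.

```python
import random


class ActorMesh:
    """Toy stand-in for a mesh of actor replicas (not the real API)."""

    def __init__(self, actors):
        self.actors = list(actors)

    def choose(self, method, *args, **kwargs):
        # Route the call to an arbitrary replica; fine for services with
        # many interchangeable replicas (e.g. the policy).
        actor = random.choice(self.actors)
        return getattr(actor, method)(*args, **kwargs)

    def call_one(self, method, *args, **kwargs):
        # Require that exactly one actor backs this mesh, then call it.
        # This is the guarantee we want for singletons like the trainer
        # and the replay buffer.
        assert len(self.actors) == 1, "call_one requires a singleton actor"
        return getattr(self.actors[0], method)(*args, **kwargs)


class ReplayBuffer:
    """Toy singleton actor."""

    def __init__(self):
        self._items = []

    def add(self, item):
        self._items.append(item)
        return len(self._items)


buffer = ActorMesh([ReplayBuffer()])
print(buffer.call_one("add", {"episode": 0}))  # ok: exactly one actor behind the mesh
```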

Test

python -m apps.grpo.main --config apps/grpo/qwen3_8b.yaml
python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
python -m apps.toy_rl.sumdigits --config apps/toy_rl/sumdigits.yaml

I didn't run

python -m apps.grpo.main --config apps/grpo/qwen3_multinode.yaml

since the config is outdated. Looks like this one is already deprecated.

cc @Ritesh1905

@meta-cla meta-cla bot added the CLA Signed label Sep 24, 2025
@DNXie DNXie changed the title from "[WIP] Change services to actors except Policy" to "Change services to actors except Policy and drop apps/rl" Sep 25, 2025
@DNXie DNXie requested a review from allenwang28 September 25, 2025 19:33
@DNXie DNXie requested a review from Ritesh1905 September 25, 2025 19:52
@casteryh
Contributor

Would love some clarification on service vs policy.
Especially why we have this distinction at all.
cc @allenwang28

@allenwang28
Contributor

> Would love some clarification on service vs policy. Especially why we have this distinction at all.

I am going to assume you mean service vs actor! It's a good question. For context, the main reason we have services in the first place is exactly to handle load balancing and fault tolerance. These are things that Monarch doesn't give you out of the box, but it does give you the capabilities to implement them. We certainly want this for something like vLLM, or whenever we want the ability to spin up and load balance across multiple execution environments in-band.

We initially had everything just be services for simplicity, but in reality not everything needs to be a service. For the replay buffer, trainer, etc., you don't need the routing capabilities. Additionally, the fault tolerance story for those is not as well defined as that of the policy/environments/reference model.

Since the world will also learn about Monarch, I think the layering is clearer this way: Actors are the base capabilities from Monarch, and Service is a distinct abstraction built on top of them.
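
As a rough mental model of that layering, here is a minimal sketch; Service, PolicyActor, and route below are illustrative names, not the forge implementation. Plain actors do the work, and a service layer adds replica routing and naive fault handling on top of them.

```python
import itertools


class PolicyActor:
    """Toy stand-in for a single policy replica (e.g. one vLLM engine)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id

    def generate(self, prompt):
        return f"[replica {self.replica_id}] completion for: {prompt!r}"


class Service:
    """Toy service layer: round-robin load balancing plus naive fault handling."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._next = itertools.cycle(range(len(self.replicas)))

    def route(self, method, *args, **kwargs):
        # Try each replica at most once, starting from the next in rotation.
        for _ in range(len(self.replicas)):
            replica = self.replicas[next(self._next)]
            try:
                return getattr(replica, method)(*args, **kwargs)
            except RuntimeError:
                continue  # skip a failed replica and try the next one
        raise RuntimeError("all replicas failed")


policy = Service([PolicyActor(i) for i in range(2)])
print(policy.route("generate", "2 + 2 ="))
print(policy.route("generate", "the capital of France is"))
```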

procs: 1
num_replicas: 1
with_gpus: false
ref_model:
Contributor

actually I was wrong - I think we only have trainer and replay buffer be actors, the rest are ok to keep as services

Member Author

It makes sense. Why are dataset, compute_advantages, and reward_actor also services? Do they need replicas?

Member Author

If num_replicas=1, what's the difference between an actor and a service?

Member Author

Done!

Contributor

sorry Danning, Dataset is an actor, compute_advantages is an actor, reward_actor is a service

@DNXie DNXie requested a review from allenwang28 September 26, 2025 04:55
@allenwang28 allenwang28 left a comment

lgtm let's just use call_one() wherever appropriate

import torch
import torch.nn.functional as F
import torchstore as ts
from forge.actors._torchstore_utils import get_param_key
Member Author

Fixed the typo here. CC @casteryh

Contributor

thanks!

@DNXie DNXie merged commit 510a523 into meta-pytorch:main Sep 29, 2025
5 checks passed