Merged
Changes from 8 commits
12 changes: 6 additions & 6 deletions apps/grpo/main.py
@@ -259,17 +259,17 @@ async def main(cfg: DictConfig):
         ref_model,
         reward_actor,
     ) = await asyncio.gather(
-        DatasetActor.options(**cfg.services.dataset).as_service(**cfg.dataset),
+        DatasetActor.options(**cfg.actors.dataset).as_actor(**cfg.dataset),
         Policy.options(**cfg.services.policy).as_service(**cfg.policy),
-        RLTrainer.options(**cfg.services.trainer).as_service(
+        RLTrainer.options(**cfg.actors.trainer).as_actor(
             **cfg.trainer, loss=simple_grpo_loss
         ),
-        ReplayBuffer.options(**cfg.services.replay_buffer).as_service(
+        ReplayBuffer.options(**cfg.actors.replay_buffer).as_actor(
             **cfg.replay_buffer, collate=collate
         ),
-        ComputeAdvantages.options(**cfg.services.compute_advantages).as_service(),
-        ReferenceModel.options(**cfg.services.ref_model).as_service(**cfg.ref_model),
-        RewardActor.options(**cfg.services.reward_actor).as_service(
+        ComputeAdvantages.options(**cfg.actors.compute_advantages).as_actor(),
+        ReferenceModel.options(**cfg.actors.ref_model).as_actor(**cfg.ref_model),
+        RewardActor.options(**cfg.actors.reward_actor).as_actor(
             reward_functions=[MathReward(), ThinkingReward()]
         ),
     )
14 changes: 5 additions & 9 deletions apps/grpo/qwen3_1_7b.yaml
@@ -100,31 +100,27 @@ ref_model:

 # All resource allocations
 services:
-  dataset:
-    procs: 1
-    num_replicas: 1
-    with_gpus: false
   policy:
     procs: ${policy.engine_config.tensor_parallel_size}
     num_replicas: 1
     with_gpus: true
+
+actors:
+  dataset:
+    procs: 1
+    with_gpus: false
   trainer:
     procs: 1
-    num_replicas: 1
     with_gpus: true
   replay_buffer:
     procs: 1
-    num_replicas: 1
     with_gpus: false
   ref_model:

Contributor:
Actually, I was wrong. I think only trainer and replay_buffer need to be actors; the rest are OK to keep as services.

Member Author:
That makes sense. Why are dataset, compute_advantages, and reward_actor also services? Do they need replicas?

Member Author:
If num_replicas=1, what's the difference between an actor and a service?

Member Author:
Done!

Contributor:
Sorry, Danning: dataset is an actor, compute_advantages is an actor, and reward_actor is a service.

     procs: 1
-    num_replicas: 1
     with_gpus: true
   compute_advantages:
     procs: 1
-    num_replicas: 1
     with_gpus: false
   reward_actor:
     procs: 1
-    num_replicas: 1
     with_gpus: false
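
The split discussed in the review thread above can be checked mechanically. The following is a minimal sketch, not code from this PR: `check_resource_layout` is a hypothetical helper, and the rule that actor entries carry no `num_replicas` is an assumption inferred only from the fact that this diff drops that key from every entry moved under `actors`. The example layout reflects one reading of the last review comment (dataset, trainer, replay_buffer, and compute_advantages as actors; policy, ref_model, and reward_actor as services), with values copied from qwen3_1_7b.yaml.

```python
from typing import Any, Mapping


def check_resource_layout(cfg: Mapping[str, Any]) -> list[str]:
    """Return human-readable problems found in a services/actors layout.

    Hypothetical helper for illustration only; the rules are assumptions
    inferred from the diff, not documented behaviour of any library.
    """
    problems: list[str] = []
    services = cfg.get("services", {})
    actors = cfg.get("actors", {})

    # A component should be allocated in exactly one section.
    for name in sorted(set(services) & set(actors)):
        problems.append(f"{name}: listed under both services and actors")

    # Assumption: actor entries do not carry a replica count.
    for name, opts in actors.items():
        if "num_replicas" in opts:
            problems.append(f"{name}: actors do not take num_replicas")

    return problems


# One reading of the final review comment, as a plain dict.
layout = {
    "services": {
        "policy": {
            "procs": "${policy.engine_config.tensor_parallel_size}",
            "num_replicas": 1,
            "with_gpus": True,
        },
        "ref_model": {"procs": 1, "num_replicas": 1, "with_gpus": True},
        "reward_actor": {"procs": 1, "num_replicas": 1, "with_gpus": False},
    },
    "actors": {
        "dataset": {"procs": 1, "with_gpus": False},
        "trainer": {"procs": 1, "with_gpus": True},
        "replay_buffer": {"procs": 1, "with_gpus": False},
        "compute_advantages": {"procs": 1, "with_gpus": False},
    },
}

print(check_resource_layout(layout))
```

A check like this could run against each of the three YAML files after loading them, so that a leftover `num_replicas` under `actors` is caught before launch rather than at spawn time.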
14 changes: 5 additions & 9 deletions apps/grpo/qwen3_8b.yaml
@@ -101,31 +101,27 @@ ref_model:

 # All resource allocations
 services:
-  dataset:
-    procs: 1
-    num_replicas: 1
-    with_gpus: false
   policy:
     procs: ${policy.engine_config.tensor_parallel_size}
     num_replicas: 1
     with_gpus: true
+
+actors:
+  dataset:
+    procs: 1
+    with_gpus: false
   trainer:
     procs: 2
-    num_replicas: 1
     with_gpus: true
   replay_buffer:
     procs: 1
-    num_replicas: 1
     with_gpus: false
   ref_model:
     procs: 1
-    num_replicas: 1
     with_gpus: true
   compute_advantages:
     procs: 1
-    num_replicas: 1
     with_gpus: false
   reward_actor:
     procs: 1
-    num_replicas: 1
     with_gpus: false
14 changes: 5 additions & 9 deletions apps/grpo/qwen3_multinode.yaml
@@ -46,33 +46,29 @@ ref_model:
model_name: ${model}

 services:
-  dataset:
-    procs: 1
-    num_replicas: 1
-    with_gpus: false
   policy:
     procs: 1
     hosts: 1
     num_replicas: 1
     with_gpus: true
+
+actors:
+  dataset:
+    procs: 1
+    with_gpus: false
   trainer:
     procs: 1
     hosts: 1
-    num_replicas: 1
     with_gpus: true
   replay_buffer:
     procs: 1
-    num_replicas: 1
     with_gpus: false
   compute_advantages:
     procs: 1
-    num_replicas: 1
     with_gpus: false
   ref_model:
     procs: 1
-    num_replicas: 1
     with_gpus: true
   reward_actor:
     procs: 1
-    num_replicas: 1
     with_gpus: false
5 changes: 0 additions & 5 deletions apps/rl/__init__.py

This file was deleted.

62 changes: 0 additions & 62 deletions apps/rl/llama3_8b.yaml

This file was deleted.

182 changes: 0 additions & 182 deletions apps/rl/main.py

This file was deleted.
