Commit a179cce

rename to task
1 parent 5851329 commit a179cce

4 files changed: +5 -5 lines changed

torchtitan/experiments/rl/README.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ This directory contains code for RL training using TorchTitan model definitions
 The integration consists of the following components:

 1. **vLLM Model Wrapper** (`models/vllm_wrapper.py`): Adapts TorchTitan models for vLLM's inference engine
-2. **RL Training Loop** (`simple_grpo_sum_digits.py`): GRPO-based RL training with Monarch actors
+2. **RL Training Loop** (`tasks/sum_digits/simple_grpo.py`): GRPO-based RL training with Monarch actors
 3. **Inference Script** (`inference_example.py`): Standalone inference using the vLLM engine


@@ -57,7 +57,7 @@ torchrun --nproc_per_node=2 torchtitan/experiments/rl/inference_example.py

 6. Run simple GRPO RL loop to learn sum digits task
 ```bash
-python torchtitan/experiments/rl/simple_grpo_sum_digits.py --module rl --config rl_grpo_qwen3_0_6b
+python torchtitan/experiments/rl/tasks/sum_digits/simple_grpo.py --module rl --config rl_grpo_qwen3_0_6b
 ```

 **NOTE:** If you downloaded your HF model to a different path than the one in step 4, specify it in your command with `--hf_assets_path=<path_to_model_checkpoint>`.

torchtitan/experiments/rl/config_registry.py

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
     VLLMGenerator,
 )
 from torchtitan.experiments.rl.actors.trainer import PolicyTrainer
-from torchtitan.experiments.rl.simple_grpo_sum_digits import RLTrainer
+from torchtitan.experiments.rl.tasks.sum_digits.simple_grpo import RLTrainer
 from torchtitan.models.qwen3 import model_registry

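
Review note: the rename touches both a file path (used on the command line) and a dotted module path (used in imports), and the two must stay in sync. A minimal sketch of how they relate, assuming the repository root is on `sys.path`; the helper `path_to_module` is hypothetical and not part of TorchTitan:

```python
from pathlib import PurePosixPath


def path_to_module(path: str) -> str:
    """Convert a repo-relative .py path into a dotted module name.

    Hypothetical helper for illustration only; assumes the repo root
    is the import root.
    """
    p = PurePosixPath(path)
    if p.suffix != ".py":
        raise ValueError(f"not a Python source file: {path}")
    # Drop the ".py" suffix and join the path components with dots.
    return ".".join(p.with_suffix("").parts)


old_module = path_to_module("torchtitan/experiments/rl/simple_grpo_sum_digits.py")
new_module = path_to_module("torchtitan/experiments/rl/tasks/sum_digits/simple_grpo.py")
print(old_module)
print(new_module)
```

The second printed name matches the module in the updated `RLTrainer` import above; a string that mixes dots and slashes is neither a valid file path nor a valid module name.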

torchtitan/experiments/rl/simple_grpo_sum_digits.py renamed to torchtitan/experiments/rl/tasks/sum_digits/simple_grpo.py

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@
 The architecture mirrors monarch's grpo_actor.py but adapted for vLLM rollouts + TorchTitan training.

 Command to run:
-python3 torchtitan/experiments/rl/simple_grpo_sum_digits.py \
+python3 torchtitan/experiments/rl/tasks/sum_digits/simple_grpo.py \
 --module rl --config rl_grpo_qwen3_0_6b \
 --hf_assets_path=<path_to_model_checkpoint>
 """
@@ -40,7 +40,7 @@
 from torchtitan.experiments.rl.actors.generator import VLLMGenerator
 from torchtitan.experiments.rl.actors.grader import Grader
 from torchtitan.experiments.rl.actors.trainer import PolicyTrainer
-from torchtitan.experiments.rl.sum_digits import extract_answer, SumDigitsTask
+from torchtitan.experiments.rl.tasks.sum_digits.task import extract_answer, SumDigitsTask
 from torchtitan.experiments.rl.types import Episode
 from torchtitan.protocols.model_spec import ModelSpec


File renamed without changes.
