rename to task

wwwjn · wwwjn · commit a179cce04846 · 2026-03-18T10:44:22.000-07:00
diff --git a/torchtitan/experiments/rl/README.md b/torchtitan/experiments/rl/README.md
@@ -6,7 +6,7 @@ This directory contains code for RL training using TorchTitan model definitions
 The integration consists of the following components:
 
 1. **vLLM Model Wrapper** (`models/vllm_wrapper.py`): Adapts TorchTitan models for vLLM's inference engine
-2. **RL Training Loop** (`simple_grpo_sum_digits.py`): GRPO-based RL training with Monarch actors
+2. **RL Training Loop** (`tasks/sum_digits/simple_grpo.py`): GRPO-based RL training with Monarch actors
 3. **Inference Script** (`inference_example.py`): Standalone inference using the vLLM engine
 
 
@@ -57,7 +57,7 @@ torchrun --nproc_per_node=2 torchtitan/experiments/rl/inference_example.py
 
 6. Run simple GRPO RL loop to learn sum digits task
 ```bash
-python torchtitan/experiments/rl/simple_grpo_sum_digits.py --module rl --config rl_grpo_qwen3_0_6b
+python torchtitan/experiments/rl/tasks/sum_digits/simple_grpo.py --module rl --config rl_grpo_qwen3_0_6b
 ```
 
 **NOTE:** If you downloaded your HF model to a different path than the one in step 4, specify it in your command with `--hf_assets_path=<path_to_model_checkpoint>`.
diff --git a/torchtitan/experiments/rl/config_registry.py b/torchtitan/experiments/rl/config_registry.py
@@ -20,7 +20,7 @@
     VLLMGenerator,
 )
 from torchtitan.experiments.rl.actors.trainer import PolicyTrainer
-from torchtitan.experiments.rl.simple_grpo_sum_digits import RLTrainer
+from torchtitan.experiments.rl.tasks.sum_digits.simple_grpo import RLTrainer
 from torchtitan.models.qwen3 import model_registry
 
 
diff --git a/torchtitan/experiments/rl/tasks/sum_digits/simple_grpo.py b/torchtitan/experiments/rl/tasks/sum_digits/simple_grpo.py
@@ -17,7 +17,7 @@
 The architecture mirrors monarch's grpo_actor.py but adapted for vLLM rollouts + TorchTitan training.
 
 Command to run:
-python3 torchtitan/experiments/rl/simple_grpo_sum_digits.py \
+python3 torchtitan/experiments/rl/tasks/sum_digits/simple_grpo.py \
     --module rl --config rl_grpo_qwen3_0_6b \
     --hf_assets_path=<path_to_model_checkpoint>
 """
@@ -40,7 +40,7 @@
 from torchtitan.experiments.rl.actors.generator import VLLMGenerator
 from torchtitan.experiments.rl.actors.grader import Grader
 from torchtitan.experiments.rl.actors.trainer import PolicyTrainer
-from torchtitan.experiments.rl.sum_digits import extract_answer, SumDigitsTask
+from torchtitan.experiments.rl.tasks.sum_digits.task import extract_answer, SumDigitsTask
 from torchtitan.experiments.rl.types import Episode
 from torchtitan.protocols.model_spec import ModelSpec
 
diff --git a/torchtitan/experiments/rl/tasks/sum_digits/task.py b/torchtitan/experiments/rl/tasks/sum_digits/task.py
