Flatten rl/ directory: remove vllm_compat, consolidate unified #2618
Merged
6 commits
- `4842e48` Flatten rl/ directory: remove vllm_compat/, move unified/ contents to… (wwwjn)
- `ab6b14f` Update README and remove rl_grpo_qwen3_0_6b_tp1 config (wwwjn)
- `378fc67` rename types (wwwjn)
- `62e844b` rename to task (wwwjn)
- `a160e09` add tasks folder (wwwjn)
- `e73c4f9` remove tasks folder (wwwjn)
```diff
@@ -14,6 +14,6 @@
     "autoparallel.deepseek_v3",
     "autoparallel.local_map_deepseek_v3",
     "ft.llama3",
-    "rl.unified",
+    "rl",
 ]
 )
```
````diff
@@ -1,12 +1,69 @@
-# Deterministic RL Training with vLLM
+# RL Training with TorchTitan and vLLM
 
-This package provides two approaches for integrating TorchTitan models with vLLM:
+This directory contains code for RL training using TorchTitan model definitions with the vLLM inference engine for fast rollout generation.
 
-1. vllm_compat/ - vLLM-Compatible approach
-   - Separate model definition matching vLLM's weight format
-   - Support batch-invariant and bit-wise identity between train and inference
-   - Custom backward passes for attention gradient computation
+## Overview
+The integration consists of the following components:
 
-2. unified/ - Unified approach
-   - Uses canonical TorchTitan model definition for inference directly
-   - Replaces attention with vLLM Compatible attention for inference
+1. **vLLM Model Wrapper** (`models/vllm_wrapper.py`): Adapts TorchTitan models for vLLM's inference engine
+2. **RL Training Loop** (`simple_grpo_sum_digits.py`): GRPO-based RL training with Monarch actors
+3. **Inference Script** (`inference_example.py`): Standalone inference using the vLLM engine
+
+## Quick Start
+### Prerequisites
+
+0. Create and activate an environment with uv:
+```bash
+uv venv --python 3.12 titan-rl
+source titan-rl/bin/activate
+```
+
+1. Install Monarch:
+```bash
+uv pip install torchmonarch
+```
+
+2. Install PyTorch nightly for TorchTitan, and the pre-built vLLM wheels that match the PyTorch nightly version:
+```bash
+# Install vLLM with nightly torch
+uv pip install torch vllm xformers --pre \
+  --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
+  --index-strategy unsafe-best-match
+```
+
+**NOTE:** The pre-built vLLM wheels target CUDA 12.8, though they should work with most other CUDA versions. Alternatively, you can install the corresponding vLLM pre-built wheels directly from https://download.pytorch.org/whl/nightly/cu128, for example: `uv pip install vllm-1.0.0.dev20260219+cu130-<suffix>.whl`. Ensure the build version number (e.g., `dev20260219`) matches your PyTorch nightly installation.
+
+3. Install TorchTitan in editable mode:
+```bash
+uv pip install -e .
+```
+
+4. Download the `Qwen/Qwen3-0.6B` (or `Qwen/Qwen3-1.7B`) checkpoint from HuggingFace to the `torchtitan/experiments/rl/example_checkpoint` folder:
+```bash
+python scripts/download_hf_assets.py --repo_id Qwen/Qwen3-0.6B --local_dir torchtitan/experiments/rl/example_checkpoint --all --hf_token=...
+
+python scripts/download_hf_assets.py --repo_id Qwen/Qwen3-1.7B --local_dir torchtitan/experiments/rl/example_checkpoint --all --hf_token=...
+```
+
+5. Run inference with the TorchTitan model definition:
+```bash
+torchrun --nproc_per_node=2 torchtitan/experiments/rl/inference_example.py
+```
+
+**NOTE:** Set `--nproc_per_node` to the world size, which should match the `tensor_parallel_degree` in the `VLLMGenerator` config.
+
+6. Run the simple GRPO RL loop to learn the sum-digits task:
+```bash
+python torchtitan/experiments/rl/simple_grpo_sum_digits.py --module rl --config rl_grpo_qwen3_0_6b
+```
+
+**NOTE:** If you downloaded your HF model to a different path than the one in step 4, specify it with `--hf_assets_path=<path_to_model_checkpoint>`.
+
+We use a unified model definition from TorchTitan for both the trainer and the generator, ensuring bitwise-identical models to address a class of subtle correctness bugs in RL for LLMs.
+
+**Current status:** Batch invariance is only supported for single-GPU configurations (TP=1) for both the trainer and generator. When tensor parallelism is enabled (TP > 1), batch-invariant mode is not yet supported.
````
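Step 6 of the new README trains on a sum-digits task with GRPO. As a rough, self-contained illustration of the two ingredients such a loop needs (a verifiable reward and group-relative advantages), here is a plain-Python sketch; the function names, answer format, and reward scheme are assumptions for illustration, not the actual `simple_grpo_sum_digits.py` implementation:

```python
from statistics import mean, pstdev

def sum_digits_reward(prompt_number: int, completion: str) -> float:
    """Reward 1.0 if the completion's last token parses to the digit sum, else 0.0."""
    target = sum(int(d) for d in str(prompt_number))
    try:
        answer = int(completion.strip().split()[-1])
    except (ValueError, IndexError):
        return 0.0
    return 1.0 if answer == target else 0.0

def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO normalizes each reward against its sampled group: (r - mean) / std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 sampled completions for "sum the digits of 472" (target 13).
rewards = [sum_digits_reward(472, c) for c in ["13", "12", "The answer is 13", "7"]]
advs = grpo_advantages(rewards)  # correct samples get positive advantage
```

Because the reward is exactly checkable, the task doubles as a correctness probe: with bitwise-identical trainer and generator models, any trainer/generator divergence shows up directly rather than being masked by sampling noise.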
4 files renamed without changes.