Commit 3a0721d

Move rl_perf_reproduce.py to tests/ dir and add functional test
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
1 parent: 38bb037 · commit: 3a0721d

File tree

3 files changed (+119, -22 lines)

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
# RL Framework Integration Tests

This directory contains integration tests for TensorRT-LLM with the Ray orchestrator, specifically designed to cover usage patterns from various RL (Reinforcement Learning) frameworks.

## Available Scripts

| Script | Description |
|--------|-------------|
| `run_rl_perf_reproduce.py` | Emulates RL workload performance with multiple AsyncLLM instances distributed across GPUs using Ray placement groups |

## Usage Examples

### RL Performance Reproduction

The `run_rl_perf_reproduce.py` script creates multiple TensorRT-LLM instances in parallel to simulate RL rollout workloads.

**TP=4 with 2 instances (8 GPUs total):**

```bash
python run_rl_perf_reproduce.py \
    --model_dir /path/to/model_dir \
    --data_path /path/to/prompts.json \
    --num_instances 2 \
    --tp_size 4 \
    --top_p 1 \
    --logprobs 1 \
    --max_batch_size 1024 \
    --enable_cuda_graph_padding
```

**TP=1 with 8 instances (8 GPUs total):**

```bash
python run_rl_perf_reproduce.py \
    --model_dir /path/to/model_dir \
    --data_path /path/to/prompts.json \
    --num_instances 8 \
    --tp_size 1 \
    --top_p 1 \
    --logprobs 1 \
    --max_batch_size 384 \
    --enable_cuda_graph_padding
```
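Both invocations above size the job so that `num_instances * tp_size` stays within the single-node limit of 8 GPUs. A minimal sketch of that sizing arithmetic (the helper names are illustrative, not part of the script):

```python
def gpus_required(num_instances: int, tp_size: int) -> int:
    """Total GPUs consumed when each instance holds tp_size ranks."""
    return num_instances * tp_size


def fits_single_node(num_instances: int, tp_size: int, node_gpus: int = 8) -> bool:
    """The reproduction script is single-node only, so the product
    must not exceed the GPUs available on one node."""
    return gpus_required(num_instances, tp_size) <= node_gpus


# The two documented configurations each saturate one 8-GPU node.
print(fits_single_node(2, 4))  # TP=4, 2 instances -> True
print(fits_single_node(8, 1))  # TP=1, 8 instances -> True
print(fits_single_node(4, 4))  # would need 16 GPUs -> False
```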
## Data Format

The `--data_path` argument should point to a JSON file containing a list of prompts, where each prompt is a list of token IDs:

```json
[
    [1, 2345, 6789, ...],
    [1, 3456, 7890, ...],
    ...
]
```

## Notes

- RL perf reproduction scripts support single-node execution only (max 8 GPUs)
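A file in the format above can be produced with a short stdlib-only sketch. In real runs the token IDs come from the model's tokenizer (e.g. `tokenizer.encode(...)`); the IDs below are placeholders:

```python
import json
from pathlib import Path

# Placeholder token-ID prompts; in practice these come from
# encoding real text with the tokenizer of the model under test.
prompts = [
    [1, 2345, 6789],
    [1, 3456, 7890],
]

data_path = Path("prompts.json")
data_path.write_text(json.dumps(prompts))

# Sanity-check the expected shape: a list of lists of ints.
loaded = json.loads(data_path.read_text())
assert all(isinstance(p, list) for p in loaded)
assert all(isinstance(t, int) for p in loaded for t in p)
print(f"wrote {len(loaded)} prompts to {data_path}")
```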

`examples/ray_orchestrator/rl_perf_repro.py` renamed to `tests/integration/defs/ray_orchestrator/RL/run_rl_perf_reproduce.py`

Lines changed: 0 additions & 22 deletions
@@ -1,25 +1,3 @@

```diff
-##############################################################################
-# OVERVIEW:
-# This script is to emulate the performance of running TensorRT-LLM with Ray
-# orchestrator for Reinforcement Learning (RL) workloads. It creates multiple
-# AsyncLLM instances distributed across GPUs using Ray placement groups,
-# enabling parallel generation for RL training scenarios.
-#
-# EXAMPLE USAGE:
-# python rl_perf_repro.py \
-#   --model_dir /path/to/model_dir \
-#   --data_path /path/to/prompts.json \
-#   --num_instances 2 \
-#   --tp_size 4 \
-#   --max_batch_size 1024 \
-#   --enable_cuda_graph_padding \
-#   --enable_block_reuse \
-#   --logprobs 1
-#
-# NOTE:
-# - This script supports single-node execution only (max 8 GPUs)
-##############################################################################
 import argparse
 import asyncio
 import json
```
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@

```python
import json
import tempfile
from pathlib import Path

import pytest
from defs.common import venv_check_call
from defs.conftest import integration_path, llm_models_root
from transformers import AutoTokenizer


@pytest.mark.skip_less_device(4)
@pytest.mark.parametrize(
    "tp_size, num_instances", [(2, 2), (1, 4)], ids=["tp2_instances2", "tp1_instances4"]
)
def test_rl_perf_reproduce(llm_venv, tp_size, num_instances):
    script_path = (
        integration_path() / "defs" / "ray_orchestrator" / "RL" / "run_rl_perf_reproduce.py"
    )
    math_txt_path = integration_path() / "test_input_files" / "math.txt"
    model_dir = f"{llm_models_root()}/Qwen2-7B-Instruct"

    if tp_size == 2:
        max_batch_size = 512
    else:
        max_batch_size = 256

    with tempfile.TemporaryDirectory() as tmpdir:
        prompt_text = "The president of the United States is"

        tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
        token_ids = tokenizer.encode(prompt_text, add_special_tokens=False)

        # Replicate to create a batch of 1024 prompts
        batch_size = 1024
        prompts = [token_ids for _ in range(batch_size)]

        data_path = Path(tmpdir) / "prompts.json"
        with open(data_path, "w") as f:
            json.dump(prompts, f)

        venv_check_call(
            llm_venv,
            [
                str(script_path),
                "--model_dir",
                model_dir,
                "--data_path",
                str(data_path),
                "--num_instances",
                str(num_instances),
                "--tp_size",
                str(tp_size),
                "--logprobs",
                "1",
                "--max_batch_size",
                str(max_batch_size),
                "--enable_block_reuse",
                "--enable_cuda_graph_padding",
            ],
        )
```
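The argv construction in the test can be exercised on its own. A small sketch (the helper name and the example paths are illustrative, not part of the test harness):

```python
def build_reproduce_argv(script_path, model_dir, data_path,
                         num_instances, tp_size, max_batch_size):
    """Assemble the CLI invocation the test hands to the venv runner.
    Every value is stringified because subprocess argv must be strings."""
    return [
        str(script_path),
        "--model_dir", str(model_dir),
        "--data_path", str(data_path),
        "--num_instances", str(num_instances),
        "--tp_size", str(tp_size),
        "--logprobs", "1",
        "--max_batch_size", str(max_batch_size),
        "--enable_block_reuse",
        "--enable_cuda_graph_padding",
    ]


# Mirrors the tp2_instances2 parametrization: tp_size=2 pairs with
# max_batch_size=512 in the test above.
argv = build_reproduce_argv("run_rl_perf_reproduce.py",
                            "/models/Qwen2-7B-Instruct",
                            "/tmp/prompts.json", 2, 2, 512)
print(" ".join(argv))
```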
