@@ -6,7 +6,6 @@ Benchmarking scripts for TensorRT-LLM serving performance tests with configurati
 
 - Run performance benchmarks across multiple model configurations
 - Manage test cases through YAML configuration files
-- Generate comprehensive CSV reports with complete test case coverage
 - Support selective execution of specific test cases
 
 ## Scripts Overview
@@ -16,123 +15,120 @@ Benchmarking scripts for TensorRT-LLM serving performance tests with configurati
 
 **Structure**:
 ```yaml
-test_cases:
-  - id: 1
-    model: "70B-FP8"
-    gpus: 1
-    tp: 1
-    ep: 1
-    attn_backend: "TRTLLM"
-    moe_backend: ""
-    enable_attention_dp: false
-    free_gpu_mem_fraction: 0.9
-    max_batch_size: 512
-    isl: 1024
-    osl: 1024
-    max_num_tokens: 16384
+server_configs:
+  - name: "r1_fp4_dep4"
+    model_name: "deepseek_r1_0528_fp4"
+    tp: 4
+    ep: 4
+    pp: 1
+    attention_backend: "TRTLLM"
+    moe_backend: "CUTLASS"
+    moe_max_num_tokens: ""
+    enable_attention_dp: true
+    enable_chunked_prefill: false
+    max_num_tokens: 2176
+    disable_overlap_scheduler: false
+    kv_cache_dtype: "fp8"
+    enable_block_reuse: false
+    free_gpu_memory_fraction: 0.8
+    max_batch_size: 256
+    enable_padding: true
+    client_configs:
+      - name: "con1_iter1_1024_1024"
+        concurrency: 1
+        iterations: 1
+        isl: 1024
+        osl: 1024
+        random_range_ratio: 0.0
+      - name: "con8_iter1_1024_1024"
+        concurrency: 8
+        iterations: 1
+        isl: 1024
+        osl: 1024
+        random_range_ratio: 0.0
+
+  - name: "r1_fp4_tep4"
+    model_name: "deepseek_r1_0528_fp4"
+    tp: 4
+    ep: 4
+    pp: 1
+    attention_backend: "TRTLLM"
+    moe_backend: "CUTLASS"
     moe_max_num_tokens: ""
-    concurrency_iterations:
-      - [1, 10]
-      - [8, 10]
-      - [64, 5]
-      - [512, 2]
+    enable_attention_dp: false
+    enable_chunked_prefill: false
+    max_num_tokens: 2176
+    disable_overlap_scheduler: false
+    kv_cache_dtype: "fp8"
+    enable_block_reuse: false
+    free_gpu_memory_fraction: 0.8
+    max_batch_size: 256
+    enable_padding: true
+    client_configs:
+      - name: "con1_iter1_1024_1024"
+        concurrency: 1
+        iterations: 1
+        isl: 1024
+        osl: 1024
+        random_range_ratio: 0.0
+      - name: "con8_iter1_1024_1024"
+        concurrency: 8
+        iterations: 1
+        isl: 1024
+        osl: 1024
+        random_range_ratio: 0.0
 ```
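
In this layout, each server config carries its own list of client configs, and one benchmark run is one server/client pair. As a minimal sketch (not the script's actual code), assuming the file has been loaded into a Python dict in the shape PyYAML's `yaml.safe_load` would produce, the runs can be enumerated like this; the `server:client` run label is an assumption for illustration:

```python
# A trimmed version of the YAML above, as yaml.safe_load would return it.
# The "server:client" run-label format is an assumption, not necessarily
# what run_benchmark_serve.py prints.
config = {
    "server_configs": [
        {
            "name": "r1_fp4_dep4",
            "model_name": "deepseek_r1_0528_fp4",
            "tp": 4,
            "ep": 4,
            "client_configs": [
                {"name": "con1_iter1_1024_1024", "concurrency": 1, "isl": 1024, "osl": 1024},
                {"name": "con8_iter1_1024_1024", "concurrency": 8, "isl": 1024, "osl": 1024},
            ],
        },
    ],
}

# One benchmark run = one server config paired with one of its client configs.
runs = [
    f"{server['name']}:{client['name']}"
    for server in config["server_configs"]
    for client in server["client_configs"]
]
print(runs)
```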
 
-**Configuration Fields**:
-- `id`: Unique identifier for the test case
-- `model`: Model name (e.g., "70B-FP8", "Scout-FP4")
-- `gpus`: Number of GPUs to use
-- `tp`: Tensor parallelism size
-- `ep`: Expert parallelism size
-- `attn_backend`: Attention backend ("TRTLLM", "FLASHINFER")
-- `moe_backend`: MoE backend ("DEEPGEMM", "TRTLLM", "CUTLASS", "")
-- `enable_attention_dp`: Enable attention data parallelism
-- `free_gpu_mem_fraction`: GPU memory fraction to reserve
-- `max_batch_size`: Maximum batch size
-- `isl`: Input sequence length
-- `osl`: Output sequence length
-- `max_num_tokens`: Maximum number of tokens
-- `moe_max_num_tokens`: Maximum number of tokens for MoE
-- `concurrency_iterations`: List of [concurrency, iteration] pairs
-
-
 ### 2. `run_benchmark_serve.py` - Main Benchmark Runner
 **Purpose**: Executes performance benchmarks based on YAML configuration files.
 
 **Usage**:
 ```bash
-python run_benchmark_serve.py --output_folder <output_folder> --config_file <config_file> [--skip <skip_pattern>] [--select <select_pattern>]
+python run_benchmark_serve.py --log_folder <log_folder> --config_file <config_file> [--select <select_pattern>] [--timeout 5400]
 ```
 
 **Arguments**:
-- `--output_folder`: Directory to store benchmark results (required)
+- `--log_folder`: Directory to store benchmark logs (required)
 - `--config_file`: Path to YAML configuration file (required)
-- `--skip`: Skip pattern for specific test cases/concurrencies (optional, default: no skipping)
-- `--select`: Select pattern for specific test cases/concurrencies (optional, default: all test cases)
+- `--select`: Comma-separated `server_name:client_name` pairs selecting which server and client configs to run (optional, default: all test cases)
+- `--timeout`: Timeout for server setup (optional, default: 3600 seconds)
 
 **Examples**:
 ```bash
-# Run all test cases
-python run_benchmark_serve.py --output_folder results --config_file benchmark_config.yaml --skip default --select default
-
-# Skip specific test cases
-python run_benchmark_serve.py --output_folder results --config_file benchmark_config.yaml --skip "2-1,4"
-
-# Run specific concurrencies from specific test cases
-python run_benchmark_serve.py --output_folder results --config_file benchmark_config.yaml --select "1,2-3"
+# Run only the selected server:client combinations
+python run_benchmark_serve.py --log_folder ./results --config_file benchmark_config.yaml --select "r1_fp4_dep4:con8_iter1_1024_1024,r1_fp4_tep4:con1_iter1_1024_1024"
 
 ```
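
The select pattern above is a comma-separated list of `server_name:client_name` pairs. A minimal sketch of how such a pattern could be split into pairs, under that format assumption (`parse_select` is a hypothetical helper name, not the script's actual API):

```python
# Sketch: split a --select pattern of the form "server:client,server:client"
# into (server, client) pairs. parse_select is a hypothetical helper,
# not run_benchmark_serve.py's actual API.
def parse_select(pattern: str) -> list[tuple[str, str]]:
    if not pattern or pattern == "default":
        return []  # empty selection: run every server/client combination
    pairs = []
    for item in pattern.split(","):
        server, _, client = item.strip().partition(":")
        pairs.append((server, client))
    return pairs

print(parse_select("r1_fp4_dep4:con8_iter1_1024_1024,r1_fp4_tep4:con1_iter1_1024_1024"))
# -> [('r1_fp4_dep4', 'con8_iter1_1024_1024'), ('r1_fp4_tep4', 'con1_iter1_1024_1024')]
```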
 
-**Skip Pattern**:
-Format: `"test_case1,test_case2,test_case3"` or `"test_case1-concurrency1,test_case2-concurrency3"`
-- `"2,4"`: Skip test cases 2 and 4 entirely
-- `"2-1,4-2"`: Skip test case 2's 1st concurrency and test case 4's 2nd concurrency
-- `"default"` or empty: No skipping (default)
-
-**Select Pattern**:
-Format: `"test_case1,test_case2,test_case3"` or `"test_case1-concurrency1,test_case2-concurrency3"`
-- `"1,3,5"`: Run only test cases 1, 3, and 5 (all concurrencies)
-- `"1-1,2-3"`: Run test case 1's 1st concurrency and test case 2's 3rd concurrency
-- `"default"` or empty: Run all test cases (default)
-
-
 ### 3. `parse_benchmark_results.py` - Results Parser
-**Purpose**: Parses benchmark log files and generates comprehensive CSV reports with all test cases from the configuration file.
-
-**Usage**:
-```bash
-python parse_benchmark_results.py --input_folder <input_folder> --output_csv <output_csv> --config_file <config_file>
-```
+**Purpose**: Parses benchmark log files and prints the performance metrics they contain.
 
 **Arguments**:
-- `input_folder`: Folder containing benchmark log files (serve.*.log) (required)
-- `output_csv`: Output CSV filename for the results table (required)
-- `config_file`: Path to benchmark_config.yaml file (required)
+- `--log_folder`: Directory containing the benchmark logs (required)
 
-**Examples**:
+**Usage**:
 ```bash
-python parse_benchmark_results.py --config_file ./benchmark_logs --output_csv results.csv --input_folder ./benchmark_config.yaml
-
+python parse_benchmark_results.py --log_folder <log_folder>
 ```
 
+
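As a rough illustration of the kind of scan such a parser might do, the sketch below walks a log folder for `serve.*.log` files and pulls out `metric: number` lines. Both the glob pattern and the line format are assumptions for illustration only; check the actual serve logs for the real metric names and layout.

```python
# Sketch: extract "<metric name>: <number>" lines from benchmark logs.
# The serve.*.log glob and the metric-line format are assumptions for
# illustration, not parse_benchmark_results.py's actual behavior.
import re
from pathlib import Path

METRIC_RE = re.compile(r"^\s*([A-Za-z][\w ()/%.-]*?)\s*:\s*([0-9]+(?:\.[0-9]+)?)\s*$")

def scan_log_text(text: str) -> dict[str, float]:
    # Collect every line shaped like "Some metric (unit): 123.4".
    metrics = {}
    for line in text.splitlines():
        m = METRIC_RE.match(line)
        if m:
            metrics[m.group(1)] = float(m.group(2))
    return metrics

def scan_folder(log_folder: str) -> dict[str, dict[str, float]]:
    # One metrics dict per serve.*.log file found in the folder.
    return {p.name: scan_log_text(p.read_text()) for p in Path(log_folder).glob("serve.*.log")}

sample = "Median TTFT (ms): 12.5\nTotal token throughput (tok/s): 4096.0\nnoise line\n"
print(scan_log_text(sample))
```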
 ### 4. `benchmark-serve.sh` - SLURM Job Script
 **Usage**:
 ```bash
-sbatch benchmark-serve.sh [IMAGE] [bench_dir] [output_dir] [select_pattern] [skip_pattern]
+sbatch benchmark-serve.sh [IMAGE] [bench_dir] [log_folder] [select_pattern]
 ```
 
 **Parameters**:
 - `IMAGE`: Docker image (default: tensorrt-llm-staging/release:main-x86_64)
 - `bench_dir`: Directory containing config file and benchmark scripts (default: current directory)
-- `output_dir`: Directory for output logs and CSV files (default: current directory)
+- `log_folder`: Directory for output logs and CSV files (default: current directory)
 - `select_pattern`: Select pattern (default: default - all test cases)
-- `skip_pattern`: Skip pattern (default: default - no skipping)
 
 **Examples**:
 ```bash
 
 bench_dir="/path/to/benchmark/scripts"
-output_dir="/path/to/store/output/files"
-sbatch --reservation=RES--COM-3970 --qos=reservation -D ${output_dir} ${bench_dir}/benchmark-serve.sh urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm-staging/release:main-x86_64 ${bench_dir} ${output_dir} "1-1" " "
+log_folder="/path/to/store/output/files"
+sbatch --reservation=RES--COM-3970 --qos=reservation -D ${log_folder} ${bench_dir}/benchmark-serve.sh urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm-staging/release:main-x86_64 ${bench_dir} ${log_folder} "r1_fp4_dep4:con8_iter1_1024_1024,r1_fp4_tep4:con1_iter1_1024_1024"
 
 ```