Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Add MacOSX DS_Store to gitignore.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Update imports.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Update click group.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Updates to CLI.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Rename.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Add name.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Renamed real dataset command.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Change to group.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Add docstring.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Remove pass_obj.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Fix context subscription.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Updates to output.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Updates to remove stdout.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Add deprecation flag.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Code clean up.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Fix generator call.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Update prepare_dataset in docs.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Update examples.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Update testing for trtllm-bench dataset.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Remove trtllm-bench dataset from run_ex.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Add missed __init__.py
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Re-add check for dataset subcommand.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Fix execution of trtllm-bench dataset.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
```diff
@@ -401,7 +402,7 @@ Average request latency (ms): 181540.5739

 ## Exploring more ISL/OSL combinations

-To benchmark TensorRT LLM on DeepSeek models with more ISL/OSL combinations, you can use `prepare_dataset.py` to generate the dataset and use similar commands mentioned in the previous section. TensorRT LLM is working on enhancements that can make the benchmark process smoother.
+To benchmark TensorRT LLM on DeepSeek models with more ISL/OSL combinations, you can use the `trtllm-bench dataset` subcommand to generate the dataset and use similar commands mentioned in the previous section. TensorRT LLM is working on enhancements that can make the benchmark process smoother.

 ### WIP: Enable more features by default

 Currently, there are some features that need to be enabled through a user-defined file `extra-llm-api-config.yml`, such as CUDA graph, overlap scheduler and attention dp. We're working on enabling those features by default, so that users can get good out-of-the-box performance on DeepSeek models.
```
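The features named in that hunk are switched on through the user-supplied YAML file. A hedged sketch of what such a file might look like — the key names below are assumptions that vary across TensorRT LLM releases, so verify them against your version's documentation before use:

```yaml
# extra-llm-api-config.yml -- illustrative only; key names are
# assumptions and may differ between TensorRT LLM versions.
cuda_graph_config:
  enable_padding: true      # pad batches so CUDA graphs can be reused
disable_overlap_scheduler: false  # keep the overlap scheduler on
enable_attention_dp: true   # attention data parallelism
```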
```diff
@@ -414,7 +415,7 @@ For more details on `max_batch_size` and `max_num_tokens`, refer to [Tuning Max

 ### MLA chunked context

-MLA currently supports the chunked context feature on both Hopper and Blackwell GPUs. You can use `--enable_chunked_context` to enable it. This feature is primarily designed to reduce TPOT (Time Per Output Token). The default chunk size is set to `max_num_tokens`. If you want to achieve a lower TPOT, you can appropriately reduce the chunk size. However, please note that this will also decrease overall throughput. Therefore, a trade-off needs to be considered.
+MLA currently supports the chunked context feature on both Hopper and Blackwell GPUs. You can use `--enable_chunked_context` to enable it. This feature is primarily designed to reduce TPOT (Time Per Output Token). The default chunk size is set to `max_num_tokens`. If you want to achieve a lower TPOT, you can appropriately reduce the chunk size. However, please note that this will also decrease overall throughput. Therefore, a trade-off needs to be considered.

 For more details on `max_num_tokens`, refer to [Tuning Max Batch Size and Max Num Tokens](../performance/performance-tuning-guide/tuning-max-batch-size-and-max-num-tokens.md).
```
````diff
@@ -231,13 +231,13 @@ The PyTorch workflow supports benchmarking with LoRA (Low-Rank Adaptation) adapt

 **Preparing LoRA Dataset**

-Use `prepare_dataset.py` with LoRA-specific options to generate requests with LoRA metadata:
+Use `trtllm-bench dataset` with LoRA-specific options to generate requests with LoRA metadata:

 ```shell
-python3 benchmarks/cpp/prepare_dataset.py \
-    --stdout \
+trtllm-bench \
+    --model /path/to/tokenizer \
+    dataset \
     --rand-task-id 0 1 \
-    --tokenizer /path/to/tokenizer \
     --lora-dir /path/to/loras \
     token-norm-dist \
     --num-requests 100 \
````
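The command above emits benchmark requests as JSON Lines, one request per line. As a minimal sketch of consuming such a file — the field names (`task_id`, `prompt`, `output_tokens`) are illustrative assumptions, not the tool's documented schema:

```python
import io
import json

# Two made-up records standing in for trtllm-bench dataset output;
# the real field names may differ from these assumptions.
sample = io.StringIO(
    '{"task_id": 0, "prompt": "hello", "output_tokens": 128}\n'
    '{"task_id": 1, "prompt": "world", "output_tokens": 128}\n'
)
requests = [json.loads(line) for line in sample]
print(len(requests))           # number of benchmark requests parsed
print(requests[0]["task_id"])  # task id in the range set by --rand-task-id
```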
````diff
@@ -308,17 +308,18 @@ Each subdirectory should contain the LoRA adapter files for that specific task.

 To benchmark multi-modal models with PyTorch workflow, you can follow the similar approach as above.

 First, prepare the dataset:
-```python
-python ./benchmarks/cpp/prepare_dataset.py \
-    --tokenizer Qwen/Qwen2-VL-2B-Instruct \
-    --stdout \
+```bash
+trtllm-bench \
+    --model Qwen/Qwen2-VL-2B-Instruct \
     dataset \
+    --output mm_data.jsonl \
+    real-dataset \
     --dataset-name lmms-lab/MMMU \
     --dataset-split test \
     --dataset-image-key image \
     --dataset-prompt-key question \
     --num-requests 10 \
-    --output-len-dist 128,5> mm_data.jsonl
+    --output-len-dist 128,5
 ```

 It will download the media files to the `/tmp` directory and prepare the dataset with their paths. Note that the `prompt` fields are texts and not tokenized ids, because the `prompt` and the media (image/video) are processed by a preprocessor for multimodal files.
````
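Because the `prompt` stays as raw text and the media is referenced by local path, each record round-trips through JSON without any tokenizer involved. A small sketch assuming hypothetical field names (`prompt`, `media_paths` are illustrative, not the tool's exact schema):

```python
import json

# Illustrative multimodal record; the actual schema written by the
# dataset command may use different field names.
record = {
    "prompt": "What does the image show?",      # raw text, not token ids
    "media_paths": ["/tmp/example_image.jpg"],  # hypothetical local media path
    "output_tokens": 128,
}
line = json.dumps(record)
parsed = json.loads(line)
print(isinstance(parsed["prompt"], str))  # the prompt survives as text
```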