Commit b5116a7

Jenny Liu authored and committed
[TRTLLM-10271][test] Add DGX-Spark QA functional test cases for single node
This commit includes:
- Add QA functional test cases for LLM and VLM models
- Add core test list for DGX-Spark top models
- Update test lists based on reviewer feedback

Signed-off-by: Jenny Liu <[email protected]>
1 parent 56e779d commit b5116a7

115 files changed: +553 -936 lines changed


README.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.<
 [![python](https://img.shields.io/badge/python-3.10-green)](https://www.python.org/downloads/release/python-31012/)
 [![cuda](https://img.shields.io/badge/cuda-13.0.0-green)](https://developer.nvidia.com/cuda-downloads)
 [![torch](https://img.shields.io/badge/torch-2.9.0-green)](https://pytorch.org)
-[![version](https://img.shields.io/badge/release-1.2.0rc8-green)](https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/version.py)
+[![version](https://img.shields.io/badge/release-1.2.0rc7-green)](https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/version.py)
 [![license](https://img.shields.io/badge/license-Apache%202-blue)](https://github.com/NVIDIA/TensorRT-LLM/blob/main/LICENSE)
 
 [Architecture](https://nvidia.github.io/TensorRT-LLM/developer-guide/overview.html)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Performance](https://nvidia.github.io/TensorRT-LLM/developer-guide/perf-overview.html)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Examples](https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Documentation](https://nvidia.github.io/TensorRT-LLM/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Roadmap](https://github.com/NVIDIA/TensorRT-LLM/issues?q=is%3Aissue%20state%3Aopen%20label%3Aroadmap)
Lines changed: 17 additions & 21 deletions
@@ -1,23 +1,19 @@
 # Feature Combination Matrix
 
-| Feature | Overlap Scheduler | CUDA Graph | Tensor Parallelism | Pipeline Parallelism | Expert Parallelism | Helix Parallelism | Attention Data Parallelism | Disaggregated Serving | Chunked Prefill | MTP | EAGLE-3(One Model Engine) | EAGLE-3(Two Model Engine) | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Slide Window Attention | Logits Post Processor | Guided Decoding | LoRA |
-| -------------------------- | ----------------- | ---------- | ------------------ | -------------------- | ------------------ | ----------------- | -------------------------- | --------------------- | --------------- | -------- | ------------------------- | ------------------------- | ------------- | ---------------- | -------------- | ---------------------- | --------------------- | --------------- | -------- |
-| Overlap Scheduler | --- | | | | | | | | | | | | | | | | | | |
-| CUDA Graph | Yes | --- | | | | | | | | | | | | | | | | | |
-| Tensor Parallelism | Yes | Yes | --- | | | | | | | | | | | | | | | | |
-| Pipeline Parallelism | Yes | Yes | Yes | --- | | | | | | | | | | | | | | | |
-| Expert Parallelism | Yes | Yes | Yes | Yes | --- | | | | | | | | | | | | | | |
-| Helix Parallelism | Untested | Yes | Yes | Yes | Yes | --- | | | | | | | | | | | | | |
-| Attention Data Parallelism | Yes | Yes | Yes | Yes | Yes | Known issues | --- | | | | | | | | | | | | |
-| Disaggregated Serving | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | | | | | | | | | | | |
-| Chunked Prefill | Yes | Yes | Yes | Untested | Yes | Yes | Yes | Yes | --- | | | | | | | | | | |
-| MTP | Yes | Yes | Yes | No | Yes | No | Yes | Yes | Yes | --- | | | | | | | | | |
-| EAGLE-3(One Model Engine) | Yes | Yes | Yes | No | Yes | No | Yes | Yes | Yes | No | --- | | | | | | | | |
-| EAGLE-3(Two Model Engine) | Yes | Yes | Yes | No | Yes | No | Yes | Yes | Yes | No | No | --- | | | | | | | |
-| Torch Sampler | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | | | | | | |
-| TLLM C++ Sampler | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | No | --- | | | | | |
-| KV Cache Reuse | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | | | | |
-| Slide Window Attention | Yes | Yes | Yes | Yes | Yes | Untested | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | | | |
-| Logits Post Processor | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | No | No | No | Yes | Yes | Yes | Yes | --- | | |
-| Guided Decoding | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | |
-| LoRA | Yes | No | Yes | Yes | Untested | Untested | Untested | Untested | Yes | Untested | Untested | Untested | Yes | Yes | Yes | Yes | Yes | Untested | --- |
+| Feature | Overlap Scheduler | CUDA Graph | Attention Data Parallelism | Disaggregated Serving | Chunked Prefill | MTP | EAGLE-3(One Model Engine) | EAGLE-3(Two Model Engine) | Torch Sampler | TLLM C++ Sampler | KV Cache Reuse | Slide Window Attention | Logits Post Processor | Guided Decoding | LoRA |
+| -------------------------- | ----------------- | ---------- | -------------------------- | --------------------- | --------------- | -------- | ------------------------- | ------------------------- | ------------- | ---------------- | -------------- | ---------------------- | --------------------- | --------------- | ---- |
+| Overlap Scheduler | --- | | | | | | | | | | | | | | |
+| CUDA Graph | Yes | --- | | | | | | | | | | | | | |
+| Attention Data Parallelism | Yes | Yes | --- | | | | | | | | | | | | |
+| Disaggregated Serving | Yes | Yes | Yes | --- | | | | | | | | | | | |
+| Chunked Prefill | Yes | Yes | Yes | Yes | --- | | | | | | | | | | |
+| MTP | Yes | Yes | Yes | Yes | Yes | --- | | | | | | | | | |
+| EAGLE-3(One Model Engine) | Yes | Yes | Yes | Yes | Yes | No | --- | | | | | | | | |
+| EAGLE-3(Two Model Engine) | Yes | Yes | Yes | Yes | Yes | No | No | --- | | | | | | | |
+| Torch Sampler | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | | | | | | |
+| TLLM C++ Sampler | Yes | Yes | Yes | Yes | Yes | No | No | No | No | --- | | | | | |
+| KV Cache Reuse | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | | | | |
+| Slide Window Attention | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | | | |
+| Logits Post Processor | Yes | Yes | Yes | No | Yes | No | No | No | Yes | Yes | Yes | Yes | --- | | |
+| Guided Decoding | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | |
+| LoRA | Yes | No | Untested | Untested | Untested | Untested | Untested | Untested | Yes | Yes | Yes | Yes | Yes | Untested | --- |
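
The matrix is lower-triangular: each feature pair is recorded once, in the row of whichever feature appears later in the header. As a minimal sketch (not part of this commit or of TensorRT-LLM), the snippet below parses such a Markdown table and looks up a pair regardless of argument order. The names `parse_matrix` and `lookup` are hypothetical, and the sample table is a three-feature excerpt of the updated matrix.

```python
def parse_matrix(markdown: str) -> dict[frozenset, str]:
    """Parse a lower-triangular Markdown feature matrix into a pair -> status map."""
    rows = [
        # Drop the outer pipes, then split the remaining text into cells.
        [cell.strip() for cell in line.strip()[1:-1].split("|")]
        for line in markdown.strip().splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator row
    columns = header[1:]
    matrix = {}
    for row in body:
        name, cells = row[0], row[1:]
        for column, status in zip(columns, cells):
            # Skip the diagonal ("---") and the empty upper triangle.
            if status and status != "---":
                matrix[frozenset((name, column))] = status
    return matrix


def lookup(matrix: dict[frozenset, str], a: str, b: str) -> str:
    """Return the recorded status for a feature pair, in either order."""
    return matrix.get(frozenset((a, b)), "unknown")


# Hypothetical excerpt: three features taken from the updated table.
SAMPLE = """
| Feature           | Overlap Scheduler | CUDA Graph | LoRA |
| ----------------- | ----------------- | ---------- | ---- |
| Overlap Scheduler | ---               |            |      |
| CUDA Graph        | Yes               | ---        |      |
| LoRA              | Yes               | No         | ---  |
"""

matrix = parse_matrix(SAMPLE)
print(lookup(matrix, "CUDA Graph", "LoRA"))
print(lookup(matrix, "LoRA", "Overlap Scheduler"))
```

Keying on a `frozenset` makes the lookup order-independent, which matches how the table stores each pair only once.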

docs/source/models/supported-models.md

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -40,11 +40,10 @@ Note: Support for other models may vary. Features marked "N/A" are not applicabl
4040
| `Qwen3MoeForCausalLM` | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | N/A | Yes | Yes |
4141
| `Qwen3NextForCausalLM` | Yes | Yes | No | Untested | Yes | No | No | No | Yes | Yes | No | No | Untested | Untested |
4242
| `Llama4ForConditionalGeneration` | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Untested | N/A | Yes | Yes |
43-
| `GptOssForCausalLM` | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes [^3] | Yes | Yes | Yes | N/A | Yes | Yes |
43+
| `GptOssForCausalLM` | Yes | Yes | Yes | Yes | No | No | Yes | No | Yes | Yes | No | N/A | Yes | Yes |
4444

4545
[^1]: Chunked Prefill for MLA can only be enabled on SM100/SM103.
4646
[^2]: KV cache reuse for MLA can only be enabled on SM90/SM100/SM103 and in BF16/FP8 KV cache dtype.
47-
[^3]: Overlap scheduler isn't supported when using EAGLE-3(Two Model Engine) for GPT-OSS.
4847

4948

5049
# Multimodal Feature Support Matrix (PyTorch Backend)

examples/constraints.txt

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
-tensorrt_llm==1.2.0rc8
+tensorrt_llm==1.2.0rc7
 evaluate~=0.4.1
 rouge_score~=0.1.2
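
The release badge in `README.md` and the `tensorrt_llm` pin in `examples/constraints.txt` change together in this diff (both go from `1.2.0rc8` to `1.2.0rc7`). A minimal sketch of a consistency check between the two strings follows; it is an illustration only, not something this commit adds, and `versions_agree` along with both regular expressions are assumptions based solely on the two lines shown in the hunks above.

```python
import re

# Assumed patterns, derived from the badge URL and pin line visible in the diff.
BADGE_RE = re.compile(r"badge/release-([0-9A-Za-z.]+)-green")
PIN_RE = re.compile(r"^tensorrt_llm==(\S+)$", re.MULTILINE)


def versions_agree(readme_text: str, constraints_text: str) -> bool:
    """Return True when the README badge and the constraints pin name the same version."""
    badge = BADGE_RE.search(readme_text)
    pin = PIN_RE.search(constraints_text)
    if badge is None or pin is None:
        return False  # treat a missing version on either side as a mismatch
    return badge.group(1) == pin.group(1)


readme = "[![version](https://img.shields.io/badge/release-1.2.0rc7-green)](...)"
constraints = "tensorrt_llm==1.2.0rc7\nevaluate~=0.4.1\nrouge_score~=0.1.2\n"
print(versions_agree(readme, constraints))
```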

examples/models/core/mistral_large_3/README.md

Lines changed: 1 addition & 2 deletions
@@ -19,8 +19,7 @@ mpirun -n 1 --allow-run-as-root --oversubscribe python3 examples/llm-api/quickst
 --max_tokens 100 \
 --checkpoint_format mistral \
 --model_type mistral_large_3 \
---moe_backend TRTLLM \
---image_format pil
+--moe_backend TRTLLM
 ```
 
 ## LLM-only run

jenkins/L0_Test.groovy

Lines changed: 1 addition & 1 deletion
@@ -808,7 +808,7 @@ def getPytestBaseCommandLine(
 portEnvVars,
 pytestUtil,
 "pytest",
-"-vv",
+"-v",
 testFilter[(DETAILED_LOG)] ? "-s" : "",
 "--timeout-method=thread",
 "--apply-test-list-correction",

security_scanning/docs/poetry.lock

Lines changed: 3 additions & 3 deletions

security_scanning/examples/auto_deploy/poetry.lock

Lines changed: 3 additions & 3 deletions
