Commit 0788635

[TRTLLM-9762] [doc] Update documents for GB300 NVL72 (NVIDIA#9987)
Signed-off-by: Kaiyu Xie <[email protected]>
1 parent b57650f commit 0788635

File tree

8 files changed, +15 -14 lines changed


docs/source/legacy/reference/support-matrix.md

Lines changed: 1 addition & 0 deletions
@@ -133,6 +133,7 @@ In addition, older architectures can have limitations for newer software release
 * - GPU Model Architectures
   -
   - [NVIDIA GB200 NVL72](https://www.nvidia.com/en-us/data-center/gb200-nvl72/)
+  - [NVIDIA GB300 NVL72](https://www.nvidia.com/en-us/data-center/gb300-nvl72/)
   - [NVIDIA Blackwell Architecture](https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/)
   - [NVIDIA Grace Hopper Superchip](https://www.nvidia.com/en-us/data-center/grace-hopper-superchip/)
   - [NVIDIA Hopper Architecture](https://www.nvidia.com/en-us/data-center/technologies/hopper-architecture/)

docs/source/overview.md

Lines changed: 3 additions & 3 deletions
@@ -4,7 +4,7 @@
 
 ## About TensorRT LLM
 
-[TensorRT LLM](https://developer.nvidia.com/tensorrt) is NVIDIA's comprehensive open-source library for accelerating and optimizing inference performance of the latest large language models (LLMs) on NVIDIA GPUs.
+[TensorRT LLM](https://developer.nvidia.com/tensorrt) is NVIDIA's comprehensive open-source library for accelerating and optimizing inference performance of the latest large language models (LLMs) on NVIDIA GPUs.
 
 ## Key Capabilities
 
@@ -40,7 +40,7 @@ TensorRT LLM strives to support the most popular models on **Day 0**.
 ### 🚀 **Advanced Optimization & Production Features**
 - **[In-Flight Batching & Paged Attention](./features/paged-attention-ifb-scheduler.md)**: In-flight batching eliminates wait times by dynamically managing request execution, processing context and generation phases together for maximum GPU utilization and reduced latency.
 - **[Multi-GPU Multi-Node Inference](./features/parallel-strategy.md)**: Seamless distributed inference with tensor, pipeline, and expert parallelism across multiple GPUs and nodes through the Model Definition API.
-- **[Advanced Quantization](./features/quantization.md)**:
+- **[Advanced Quantization](./features/quantization.md)**:
   - **FP4 Quantization**: Native support on NVIDIA B200 GPUs with optimized FP4 kernels
   - **FP8 Quantization**: Automatic conversion on NVIDIA H100 GPUs leveraging Hopper architecture
 - **[Speculative Decoding](./features/speculative-decoding.md)**: Multiple algorithms including EAGLE, MTP and NGram
@@ -54,7 +54,7 @@ TensorRT LLM strives to support the most popular models on **Day 0**.
 ### 🔧 **Latest GPU Architecture Support**
 
 TensorRT LLM supports the full spectrum of NVIDIA GPU architectures:
-- **NVIDIA Blackwell**: B200, GB200, RTX Pro 6000 SE with FP4 optimization
+- **NVIDIA Blackwell**: B200, GB200, B300, GB300, and RTX Pro 6000 SE with FP4 optimization
 - **NVIDIA Hopper**: H100, H200,GH200 with FP8 acceleration
 - **NVIDIA Ada Lovelace**: L40/L40S, RTX 40 series with FP8 acceleration
 - **NVIDIA Ampere**: A100, RTX 30 series for production workloads

examples/disaggregated/slurm/benchmark/README.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ slurm:
   job_name: "<job_name>"
   extra_args: "" # Additional SLURM arguments (e.g., "--gres=gpu:4 --exclude=node1")
   set_segment: true # Optional: whether to set the segment for the job
-  numa_bind: true # Enable NUMA binding for GB200 NVL72
+  numa_bind: true # Enable NUMA binding for GB200/GB300 NVL72
 ```
 
 ### 2. Benchmark Configuration
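
For context on the `numa_bind` option above: on GB200/GB300 NVL72 nodes the GPU memory also appears as NUMA nodes, so binding host allocations to the CPU nodes keeps GPU memory from being consumed as host memory (the wide-EP README change later in this commit explains this). A minimal, hedged sketch for checking a node before enabling the flag; the node numbering assumes the CPU memory sits on NUMA nodes 0 and 1, as these docs do:

```bash
# List NUMA nodes; on GB200/GB300 NVL72 the GPU memory shows up as extra nodes
# beyond the CPU nodes 0 and 1.
numactl --hardware

# What numa_bind: true amounts to at run time: restrict memory allocation to
# the CPU nodes while running a command (placeholder).
numactl -m 0,1 <command>
```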

examples/disaggregated/slurm/benchmark/config.yaml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ slurm:
   job_name: "<job_name>"
   extra_args: "" # Cluster specific arguments, e.g. "--gres=gpu:4 --exclude=node1,node2"
   set_segment: true # Optional: whether to set the segment for the job
-  numa_bind: true # Only enable for GB200 NVL72
+  numa_bind: true # Only enable for GB200/GB300 NVL72
 
 # Benchmark Mode
 benchmark:

examples/disaggregated/slurm/benchmark/start_worker.sh

Lines changed: 2 additions & 2 deletions
@@ -27,10 +27,10 @@ done
 
 if [ "${numa_bind}" = "true" ]; then
     numa_bind_cmd="numactl -m 0,1"
-    echo "numactl -m 0,1 - Only allocate memory from nodes on GB200"
+    echo "numactl -m 0,1 - Only allocate memory from nodes on GB200/GB300 NVL72"
 else
     numa_bind_cmd=""
-    echo "Not binding memory. If on GB200, use \"numactl -m 0,1\" to only allocate memory from nodes."
+    echo "Not binding memory. If on GB200/GB300 NVL72, use \"numactl -m 0,1\" to only allocate memory from nodes."
 fi
 
 if [ "${benchmark_mode}" = "gen_only" ]; then

examples/wide_ep/README.md

Lines changed: 5 additions & 5 deletions
@@ -21,13 +21,13 @@ Wide-EP solves these challenges through:
 
 ### Prerequisites
 
-* GPU: GB200 NVL72, H20, or RTX 6000D.
+* GPU: GB200 NVL72, GB300 NVL72, H20, or RTX 6000D.
 * OS: Linux
 * Drivers: CUDA Driver 575 or Later
 * Docker with NVIDIA Container Toolkit installed
 * Python3 and python3-pip (Optional, for accuracy evaluation only)
 
-For GB200 NVL72, to make sure that Multi-Node NVLink (MNNVL) is correctly setup, check if the path `/dev/nvidia-caps-imex-channels` exists in the container. If the path doesn't exist, mount it when launching the Docker container.
+For GB200/GB300 NVL72, to make sure that Multi-Node NVLink (MNNVL) is correctly setup, check if the path `/dev/nvidia-caps-imex-channels` exists in the container. If the path doesn't exist, mount it when launching the Docker container.
 
 For more information on NVIDIA IMEX service for NVLink networks, refer to https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/overview.html.
 
@@ -108,16 +108,16 @@ If `never` is highlighted, enable Transparent HugePages by the following command
 echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
 ```
 
-### GB200 NUMA binding
+### GB200/GB300 NVL72 NUMA binding
 
-GPU memory is also on NUMA nodes on GB200 and the system can also use that. Bind memory to CPU nodes to avoid GPU memory being used as host memory.
+GPU memory is also on NUMA nodes on GB200/GB300 NVL72 and the system can also use that. Bind memory to CPU nodes to avoid GPU memory being used as host memory.
 ```bash
 numactl -m 0,1 <command>
 ```
 
 ### Shared Memory on EPLB
 
-To achieve online load balancing, all expert weights are stored in shared host memory. Four ranks on the same GB200 node share the same expert weights to save memory.
+To achieve online load balancing, all expert weights are stored in shared host memory. Four ranks on the same GB200/GB300 NVL72 node share the same expert weights to save memory.
 
 There is one environment variable `TRTLLM_EPLB_SHM_NAME` to specify the base name of the shared memory. This environment variable may need to be specified if there are multiple instances on one node. If not, you can ignore it.
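On the MNNVL prerequisite above: if `/dev/nvidia-caps-imex-channels` is missing inside the container, one way to pass it through is at `docker run` time. The image name and exact flags below are illustrative assumptions, not the project's official launch command:

```bash
# Hypothetical container launch: expose the GPUs and the IMEX channel devices.
docker run --rm -it \
  --gpus all \
  --device /dev/nvidia-caps-imex-channels \
  <tensorrt_llm_image> bash

# Inside the container, verify the path the README asks for actually exists.
ls /dev/nvidia-caps-imex-channels
```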

examples/wide_ep/slurm_scripts/README.md

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ Before running benchmarks, ensure you have:
 1. **SLURM Cluster Access**: Valid account and partition allocation
 2. **Container Environment**:
    - NVIDIA Container Toolkit configured
-   - Required device mappings (e.g., `/dev/nvidia-caps-imex-channels` for GB200, `/dev/gdrdrv` for GDRCopy)
+   - Required device mappings (e.g., `/dev/nvidia-caps-imex-channels` for GB200/GB300 NVL72, `/dev/gdrdrv` for GDRCopy)
 3. **Model Files**: Checkpoint files accessible from all cluster nodes
 4. **Configuration**: Updated `config.yaml` with your cluster-specific settings
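
A hedged sketch of what the device-mapping prerequisite can look like when the cluster launches containers through enroot/pyxis; whether your site uses pyxis and the image URI are assumptions here, so adapt this to your container runtime:

```bash
# Hypothetical srun invocation: map the IMEX channel and GDRCopy devices into
# the container and check that both are visible.
srun --container-image=<tensorrt_llm_image> \
     --container-mounts=/dev/nvidia-caps-imex-channels:/dev/nvidia-caps-imex-channels,/dev/gdrdrv:/dev/gdrdrv \
     bash -c 'ls /dev/nvidia-caps-imex-channels /dev/gdrdrv'
```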

examples/wide_ep/slurm_scripts/config.yaml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ slurm:
   job_time: "02:00:00"
   job_name: "<job_name>"
   extra_args: "" # Cluster specific arguments, e.g. "--gres=gpu:4 --exclude=node1,node2"
-  numa_bind: true # Only enable for GB200 NVL72
+  numa_bind: true # Only enable for GB200/GB300 NVL72
 
 # Benchmark Mode
 benchmark:
