docs/source/overview.md (3 additions, 3 deletions)
@@ -4,7 +4,7 @@
## About TensorRT LLM
-[TensorRT LLM](https://developer.nvidia.com/tensorrt) is NVIDIA's comprehensive open-source library for accelerating and optimizing inference performance of the latest large language models (LLMs) on NVIDIA GPUs.
+[TensorRT LLM](https://developer.nvidia.com/tensorrt) is NVIDIA's comprehensive open-source library for accelerating and optimizing inference performance of the latest large language models (LLMs) on NVIDIA GPUs.
## Key Capabilities
@@ -40,7 +40,7 @@ TensorRT LLM strives to support the most popular models on **Day 0**.
### 🚀 **Advanced Optimization & Production Features**
-**[In-Flight Batching & Paged Attention](./features/paged-attention-ifb-scheduler.md)**: In-flight batching eliminates wait times by dynamically managing request execution, processing context and generation phases together for maximum GPU utilization and reduced latency.
-**[Multi-GPU Multi-Node Inference](./features/parallel-strategy.md)**: Seamless distributed inference with tensor, pipeline, and expert parallelism across multiple GPUs and nodes through the Model Definition API.

examples/wide_ep/README.md (5 additions, 5 deletions)
@@ -21,13 +21,13 @@ Wide-EP solves these challenges through:
### Prerequisites
-* GPU: GB200 NVL72, H20, or RTX 6000D.
+* GPU: GB200 NVL72, GB300 NVL72, H20, or RTX 6000D.
* OS: Linux
* Drivers: CUDA Driver 575 or Later
* Docker with NVIDIA Container Toolkit installed
* Python3 and python3-pip (Optional, for accuracy evaluation only)
-For GB200 NVL72, to make sure that Multi-Node NVLink (MNNVL) is correctly setup, check if the path `/dev/nvidia-caps-imex-channels` exists in the container. If the path doesn't exist, mount it when launching the Docker container.
+For GB200/GB300 NVL72, to make sure that Multi-Node NVLink (MNNVL) is correctly setup, check if the path `/dev/nvidia-caps-imex-channels` exists in the container. If the path doesn't exist, mount it when launching the Docker container.
For more information on NVIDIA IMEX service for NVLink networks, refer to https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/overview.html.
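As an illustration of the mount described above, here is a minimal sketch. It assumes the container is started with a plain `docker run`; the image name and the other flags are placeholders, not part of the original instructions:

```bash
# Hypothetical launch command: bind-mount the IMEX channel device nodes so
# MNNVL can be set up inside the container. <image> is a placeholder for the
# TensorRT LLM container image you actually use.
docker run --rm -it --gpus all \
  -v /dev/nvidia-caps-imex-channels:/dev/nvidia-caps-imex-channels \
  <image>
```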
@@ -108,16 +108,16 @@ If `never` is highlighted, enable Transparent HugePages by the following command
-GPU memory is also on NUMA nodes on GB200 and the system can also use that. Bind memory to CPU nodes to avoid GPU memory being used as host memory.
+GPU memory is also on NUMA nodes on GB200/GB300 NVL72 and the system can also use that. Bind memory to CPU nodes to avoid GPU memory being used as host memory.
```bash
numactl -m 0,1 <command>
```
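To see which NUMA nodes are CPU nodes before binding, the standard `numactl` tooling can be used. This is a hedged sketch; the node IDs `0,1` are simply the values from the example above and may differ on your system:

```bash
# List all NUMA nodes and their sizes; on GB200/GB300 NVL72 the GPU memory
# shows up as additional NUMA nodes alongside the CPU nodes.
numactl --hardware

# Then bind allocations to the CPU nodes only (0,1 here, as in the example
# above) when launching the workload. <command> is a placeholder.
numactl -m 0,1 <command>
```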
### Shared Memory on EPLB
-To achieve online load balancing, all expert weights are stored in shared host memory. Four ranks on the same GB200 node share the same expert weights to save memory.
+To achieve online load balancing, all expert weights are stored in shared host memory. Four ranks on the same GB200/GB300 NVL72 node share the same expert weights to save memory.
There is one environment variable `TRTLLM_EPLB_SHM_NAME` to specify the base name of the shared memory. This environment variable may need to be specified if there are multiple instances on one node. If not, you can ignore it.
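For example, a hedged sketch of launching two instances on one node with distinct shared-memory base names; the names and the `<command ...>` placeholders are illustrative, not prescribed by the README:

```bash
# Each instance gets its own base name so their EPLB expert-weight
# shared-memory segments do not collide (names are illustrative).
TRTLLM_EPLB_SHM_NAME=moe_shared_inst0 <command for instance 0> &
TRTLLM_EPLB_SHM_NAME=moe_shared_inst1 <command for instance 1> &
```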