
Commit 9ddb48c

[Docs] Fix NCCL typo
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>
1 parent c027541 commit 9ddb48c

File tree

1 file changed: +1 −1 lines changed


docs/design/p2p_nccl_connector.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -52,7 +52,7 @@ Currently, only symmetric TP (Tensor Parallelism) methods are supported for KVCa
 
 ![image2](https://github.com/user-attachments/assets/837e61d6-365e-4cbf-8640-6dd7ab295b36)
 
-Each NCCL group occupies a certain amount of GPU memory buffer for communication, the size of which is primarily influenced by the `NCCL_MAX_NCHANNELS` environment variable. When `NCCL_MAX_NCHANNELS=16`, an NCCL group typically occupies 100MB, while when `NCCL_MAX_NCHANNELS=8`, it usually takes up 52MB. For large-scale xPyD configurations—such as DeepSeek's 96P144D—this implementation is currently not feasible. Moving forward, we are considering using RDMA for point-to-point communication and are also keeping an eye on NCCL.
+Each NCCL group occupies a certain amount of GPU memory buffer for communication, the size of which is primarily influenced by the `NCCL_MAX_NCHANNELS` environment variable. When `NCCL_MAX_NCHANNELS=16`, an NCCL group typically occupies 100MB, while when `NCCL_MAX_NCHANNELS=8`, it usually takes up 52MB. For large-scale xPyD configurations—such as DeepSeek's 96P144D—this implementation is currently not feasible. Moving forward, we are considering using RDMA for point-to-point communication and are also keeping an eye on UCCL.
 
 ### GPU Memory Buffer and Tensor Memory Pool
 
```

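The buffer figures quoted in the changed paragraph make the scaling problem concrete. As a rough sketch (assuming, for illustration, that every prefill instance keeps one dedicated NCCL group with every decode instance, and using the empirical per-group sizes from the doc), the total buffer memory grows with the product of prefill and decode counts:

```python
# Rough estimate of GPU memory consumed by per-pair NCCL groups in an
# xPyD disaggregated deployment. The per-group buffer sizes are the
# empirical figures from the doc (100 MB at NCCL_MAX_NCHANNELS=16,
# 52 MB at NCCL_MAX_NCHANNELS=8); the pairing model (one NCCL group per
# prefill-decode pair) is an illustrative assumption, not vLLM's exact layout.

GROUP_BUFFER_MB = {16: 100, 8: 52}  # NCCL_MAX_NCHANNELS -> approx MB per group

def nccl_buffer_mb(num_prefill: int, num_decode: int, max_nchannels: int = 8) -> int:
    """Total buffer MB if each prefill instance pairs with each decode instance."""
    groups = num_prefill * num_decode
    return groups * GROUP_BUFFER_MB[max_nchannels]

# DeepSeek-scale 96P144D: 96 * 144 = 13824 groups. Even at the smaller
# 52 MB/group this is 718848 MB (~700 GB) of buffer memory across the
# cluster, which is why the doc calls the NCCL-group approach infeasible.
print(nccl_buffer_mb(96, 144, 8))
```

This back-of-the-envelope growth in group count is the motivation for the RDMA/UCCL direction mentioned in the fixed sentence.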