Commit 5c2c8df
Update turbomind communication library (#3736)
* relax fp8 tp requirement
* initial NVLS support
* refactor allocation & registration
* more nvls collectives
* semaphore v2
* allgather v2
* add broadcast experiments
* minor
* allreduce
* refactor
* per group semaphore
* add NCCL 2.27 window registration
* bf16
* minor
* fix lint
* fix lint
* minor
* guard sm90 & cu12 features
* disable multimem for world size < 4
* fix semaphore init
* add test for broadcast
* fix semaphore init
* disable low latency kernels
* enable low latency kernels

1 parent d78115d commit 5c2c8df
18 files changed (+1995, -1053 lines)
File tree
- src/turbomind/comm
- cuda_ipc
- nccl