Commit 5c2c8df

Update turbomind communication library (#3736)
* relax fp8 tp requirement
* initial NVLS support
* refactor allocation & registration
* more nvls collectives
* semaphore v2
* allgather v2
* add broadcast experiments
* minor
* allreduce
* refactor
* per group semaphore
* add NCCL 2.27 window registration
* bf16
* minor
* fix lint
* fix lint
* minor
* guard sm90 & cu12 features
* disable multimem for world size < 4
* fix semaphore init
* add test for broadcast
* fix semaphore init
* disable low latency kernels
* enable low latency kernels
1 parent d78115d commit 5c2c8df
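
The commit message items "guard sm90 & cu12 features" and "disable multimem for world size < 4" suggest a capability check of roughly the following shape. This is only an illustrative host-side sketch: the function name `can_use_multimem` and its parameters are hypothetical and do not appear in the commit, and the real check in turbomind may use different conditions.

```cpp
// Hypothetical sketch, not code from this commit.
// Assumption: multimem (NVLS) paths are only considered when the device is
// sm90 or newer, the toolkit is CUDA 12 or newer, and the communicator has
// at least 4 ranks, as the commit message items suggest.
constexpr bool can_use_multimem(int sm_major, int cuda_major, int world_size)
{
    const bool arch_ok    = sm_major >= 9;     // sm90 corresponds to compute capability 9.x
    const bool toolkit_ok = cuda_major >= 12;  // "cu12" in the commit message
    const bool size_ok    = world_size >= 4;   // "disable multimem for world size < 4"
    return arch_ok && toolkit_ok && size_ok;
}

// Compile-time checks of the sketch itself.
static_assert(can_use_multimem(9, 12, 8), "sm90 + CUDA 12 + 8 ranks qualifies");
static_assert(!can_use_multimem(9, 12, 2), "fewer than 4 ranks does not qualify");
static_assert(!can_use_multimem(8, 12, 8), "pre-sm90 devices do not qualify");
```

In a real build the first two inputs would typically come from the device properties and the toolkit version macros rather than from plain integers; they are kept as parameters here so the sketch stays independent of the CUDA headers.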

18 files changed, +1995 -1053 lines changed
src/turbomind/comm/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -23,6 +23,6 @@ if (BUILD_MULTI_GPU)
     if (BUILD_TEST)
         add_executable(test_comm test_comm.cu)
         target_link_libraries(test_comm PRIVATE device_comm host_comm core pthread nvtx_utils)
-        target_compile_options(test_comm PRIVATE -O3 -march=native -mtune=native)
+        target_compile_options(test_comm PRIVATE -march=native -mtune=native)
     endif ()
 endif ()

src/turbomind/comm/cuda_ipc/CMakeLists.txt

Lines changed: 2 additions & 1 deletion
@@ -7,7 +7,8 @@ add_library(cuda_ipc_comm STATIC
     allreduce.cu
     allgather.cu
     fused_allreduce.cu
-    fused_allreduce_ex.cu)
+    fused_allreduce_ex.cu
+    broadcast.cu)
 
 target_link_libraries(cuda_ipc_comm PRIVATE
     rms_norm
