Commit d91c85b

fduwjj authored and pytorchmergebot committed
[c10d][fr] Split cuda and non-cuda fr logic into two cpp files (pytorch#154929)
While integrating the flight recorder (fr) with gloo, I found that putting all the logic inside one cpp file guarded by both build macros does not work with the current linkage setup in the Bazel build file. If we put the cpp file only in libtorch_cpu, the CUDA-side build fails. If we list it on both sides, the linker complains: ld.lld: error: duplicate symbol: typeinfo for c10d::DebugInfoWriter. To fix this, we move the common logic into a separate header file and use different cpp files for CPU and CUDA, so that fr can be used in both cases.

Pull Request resolved: pytorch#154929
Approved by: https://github.com/kwen2501
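The split described above (shared logic in a header, one cpp file per build target) can be sketched roughly as follows. This is a single-file illustration under stated assumptions, not the code from this commit: `RecorderSketch`, `CpuEventSketch`, and `CudaEventSketch` are hypothetical names, and the two explicit instantiations at the bottom stand in for what would be two separate cpp files, each linked into exactly one library so that no symbol is defined twice.

```cpp
// Hypothetical sketch: none of these names come from the PyTorch sources.
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// ---- "Common header" part: logic shared by the CPU and CUDA builds ----
// The recorder is templated on the event type, so the header itself emits
// no non-template symbols that two libraries could both define.
template <typename EventT>
class RecorderSketch {
 public:
  explicit RecorderSketch(std::size_t capacity) : capacity_(capacity) {}

  // Store an entry; once the ring buffer is full, overwrite the oldest one.
  void record(const std::string& op, EventT event) {
    if (capacity_ == 0) {
      return;  // recording disabled
    }
    Entry e{op, std::move(event)};
    if (entries_.size() < capacity_) {
      entries_.push_back(std::move(e));
    } else {
      entries_[next_] = std::move(e);
    }
    next_ = (next_ + 1) % capacity_;
  }

  // Return "<backend>:<op>" strings ordered from oldest to newest.
  std::vector<std::string> dump() const {
    std::vector<std::string> out;
    const std::size_t n = entries_.size();
    const std::size_t start = (n < capacity_) ? 0 : next_;
    for (std::size_t i = 0; i < n; ++i) {
      const Entry& e = entries_[(start + i) % n];
      out.push_back(std::string(EventT::backend()) + ":" + e.op);
    }
    return out;
  }

 private:
  struct Entry {
    std::string op;
    EventT event;
  };
  std::size_t capacity_;
  std::size_t next_ = 0;
  std::vector<Entry> entries_;
};

// Stand-ins for a CPU-side event type and a CUDA-side event type.
struct CpuEventSketch {
  static const char* backend() { return "cpu"; }
};
struct CudaEventSketch {
  static const char* backend() { return "cuda"; }
};

// ---- Would live in the CPU-only cpp file (linked into the CPU library) ----
template class RecorderSketch<CpuEventSketch>;

// ---- Would live in the CUDA-only cpp file (linked into the CUDA library) ----
template class RecorderSketch<CudaEventSketch>;
```

With this layout, a CPU backend would use `RecorderSketch<CpuEventSketch>` and a CUDA backend `RecorderSketch<CudaEventSketch>`. Because each instantiation is emitted by a different cpp file, each library's source list can name exactly one of them.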
1 parent 13044b2 commit d91c85b

4 files changed: +674 -663 lines changed

build_variables.bzl

Lines changed: 2 additions & 1 deletion
@@ -497,6 +497,7 @@ libtorch_distributed_base_sources = [
     "torch/csrc/distributed/c10d/Backoff.cpp",
     "torch/csrc/distributed/c10d/DMAConnectivity.cpp",
     "torch/csrc/distributed/c10d/control_collectives/StoreCollectives.cpp",
+    "torch/csrc/distributed/c10d/FlightRecorder.cpp",
     "torch/csrc/distributed/c10d/FileStore.cpp",
     "torch/csrc/distributed/c10d/Functional.cpp",
     "torch/csrc/distributed/c10d/GlooDeviceFactory.cpp",
@@ -696,7 +697,7 @@ libtorch_cuda_distributed_base_sources = [
 libtorch_cuda_distributed_extra_sources = [
     "torch/csrc/distributed/c10d/CudaDMAConnectivity.cpp",
     "torch/csrc/distributed/c10d/NCCLUtils.cpp",
-    "torch/csrc/distributed/c10d/FlightRecorder.cpp",
+    "torch/csrc/distributed/c10d/FlightRecorderCuda.cpp",
     "torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp",
     "torch/csrc/distributed/c10d/ProcessGroupGlooCuda.cpp",
     "torch/csrc/distributed/c10d/ProcessGroupUCC.cpp",
