Commit 22eb163

[None][bug] Set NCCL_GRAPH_REGISTER to false to avoid hang (#8413)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
1 parent 46ee7ac commit 22eb163

1 file changed, +5 -1 lines changed

cpp/tensorrt_llm/common/opUtils.cpp

Lines changed: 5 additions & 1 deletion
@@ -115,12 +115,16 @@ std::shared_ptr<ncclComm_t> getComm(std::set<int> const& group)
             ncclCommDestroy(*comm);
             delete comm;
         });
-    // Need static connection initialization for accurate KV cache size estimation
 #if defined(_WIN32)
+    // Need static connection initialization for accurate KV cache size estimation
     if (getenv("NCCL_RUNTIME_CONNECT") == nullptr)
         _putenv_s("NCCL_RUNTIME_CONNECT", "0");
+    // Disable graph register to avoid startup hangs
+    if (getenv("NCCL_GRAPH_REGISTER") == nullptr)
+        _putenv_s("NCCL_GRAPH_REGISTER", "0");
 #else
     setenv("NCCL_RUNTIME_CONNECT", "0", 0);
+    setenv("NCCL_GRAPH_REGISTER", "0", 0);
 #endif // _WIN32
     NCCLCHECK_THROW(ncclCommInitRank(ncclComm.get(), group.size(), id, groupRank));
     commMap[group] = ncclComm;
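
Both branches of the change follow the same pattern: apply a default to an environment variable only when the user has not set it already. On POSIX this is `setenv` with `overwrite` set to `0`. On Windows it is a `getenv` check followed by `_putenv_s`. As a minimal sketch, that pattern could be factored into one helper. The name `setEnvDefault` is hypothetical and does not appear in this commit or, as far as the diff shows, in TensorRT-LLM.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of this commit): set `name` to `value` only
// when the variable is absent from the environment, so a value exported by
// the user always wins. Returns true if this call set the variable.
inline bool setEnvDefault(std::string const& name, std::string const& value)
{
    if (std::getenv(name.c_str()) != nullptr)
    {
        return false; // Already set by the user; leave it untouched.
    }
#if defined(_WIN32)
    // _putenv_s always overwrites, hence the explicit getenv check above.
    return _putenv_s(name.c_str(), value.c_str()) == 0;
#else
    // overwrite == 0 keeps any existing value; redundant with the check
    // above but harmless.
    return setenv(name.c_str(), value.c_str(), 0) == 0;
#endif
}
```

Under this assumption, the two defaults in `getComm` would reduce to `setEnvDefault("NCCL_RUNTIME_CONNECT", "0");` and `setEnvDefault("NCCL_GRAPH_REGISTER", "0");`, called before `ncclCommInitRank`. As in the original code, the check-then-set sequence is not atomic, so it should run before other threads read or modify the environment.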