Commit d76ebd7

Author: yi.wu
Message: fix nccl dist train bug
Parent: 88fa9c2

1 file changed, 4 additions, 0 deletions

paddle/fluid/operators/gen_nccl_id_op.cc (4 additions, 0 deletions)
@@ -67,6 +67,10 @@ class GenNCCLIdOp : public framework::OperatorBase {
       client->AsyncSendVar(ep, dev_ctx, *scope, NCCL_ID_VARNAME);
     }
     client->Wait();
+    for (auto& ep : endpoint_list) {
+      client->AsyncSendBatchBarrier(ep);
+    }
+    client->Wait();
     VLOG(3) << "sending completed...";
   }

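The commit message says only "fix nccl dist train bug". One plausible reading, which the commit itself does not confirm, is that the trainers receiving the NCCL id block until they see a batch-barrier message from the sender, so sending the id variable alone leaves them waiting forever. The sketch models that two-phase protocol (send the variable to every endpoint and wait, then send a barrier to every endpoint and wait) with standard-library threads. `ToyEndpoint`, `ToyClient`, `BroadcastId`, and the `"NCCLID"` name are hypothetical stand-ins chosen for illustration, not Paddle's RPC classes or API.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical variable name standing in for NCCL_ID_VARNAME.
const std::string kIdVarName = "NCCLID";

// Hypothetical receiving trainer: stores variables and counts batch barriers.
class ToyEndpoint {
 public:
  void ReceiveVar(const std::string& name, const std::string& value) {
    std::lock_guard<std::mutex> lock(mu_);
    vars_[name] = value;
  }

  void ReceiveBatchBarrier() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      ++barrier_count_;
    }
    cv_.notify_all();
  }

  // Blocks until `fan_in` barriers have arrived; returns false on timeout.
  bool WaitBarrier(int fan_in, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, timeout,
                        [&] { return barrier_count_ >= fan_in; });
  }

  std::string GetVar(const std::string& name) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = vars_.find(name);
    return it == vars_.end() ? std::string() : it->second;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::map<std::string, std::string> vars_;
  int barrier_count_ = 0;
};

// Hypothetical client with the same call shape as the diff: each Async* call
// starts a task, and Wait() blocks until every started task has finished.
class ToyClient {
 public:
  void AsyncSendVar(ToyEndpoint* ep, const std::string& name,
                    const std::string& value) {
    pending_.push_back(std::async(std::launch::async,
                                  [=] { ep->ReceiveVar(name, value); }));
  }

  void AsyncSendBatchBarrier(ToyEndpoint* ep) {
    pending_.push_back(std::async(std::launch::async,
                                  [=] { ep->ReceiveBatchBarrier(); }));
  }

  void Wait() {
    for (auto& f : pending_) f.get();
    pending_.clear();
  }

 private:
  std::vector<std::future<void>> pending_;
};

// Sender side. send_barrier == false models the code before the commit;
// send_barrier == true models it with the four added lines.
void BroadcastId(ToyClient& client, const std::vector<ToyEndpoint*>& eps,
                 const std::string& id, bool send_barrier) {
  assert(!eps.empty());
  for (auto* ep : eps) client.AsyncSendVar(ep, kIdVarName, id);
  client.Wait();  // every endpoint now holds the id
  if (send_barrier) {
    for (auto* ep : eps) client.AsyncSendBatchBarrier(ep);
    client.Wait();  // every endpoint has counted one barrier
  }
}
```

In this model, calling `BroadcastId(client, {&a, &b}, "id-123", false)` delivers the id, yet `a.WaitBarrier(1, ...)` still times out; with `send_barrier` set to `true` the wait returns immediately. The second `Wait()` plays the same role as the second `client->Wait()` in the diff: the sender does not report completion until every barrier has actually been delivered.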