Commit ff4dddf
[c10d] Turn off default non-blocking API mode to work around hang in NCCL 2.26 (pytorch#154085)
[c10d] Turn off default non-blocking API mode to work around hang in NCCL 2.26 (pytorch#154055)
Work around issues like pytorch#153960, pytorch#152623
NCCL 2.26 appears to introduce random hangs in non-blocking API mode. This PR opts out of non-blocking mode by default to work around them. Previously, torch turned it on by default under eager init (i.e. when `device_id` is passed) to avoid init overhead.
Pull Request resolved: pytorch#154055
Approved by: https://github.com/atalman
(cherry picked from commit 87fc5af)
Co-authored-by: Ke Wen <[email protected]>
1 parent e8f8a35 commit ff4dddf
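The default change described in the commit message can be modeled with a small sketch. This is a hypothetical illustration, not the code touched by this commit: `resolve_nonblocking` and `eager_default` are made-up names, and both the precedence of the `TORCH_NCCL_USE_COMM_NONBLOCKING` environment variable over the default and the set of accepted truthy strings are assumptions.

```python
import os

# PyTorch environment variable that controls NCCL communicator mode.
ENV_VAR = "TORCH_NCCL_USE_COMM_NONBLOCKING"


def resolve_nonblocking(eager_init, environ=None, eager_default=False):
    """Hypothetical model of how the non-blocking default could be chosen.

    eager_init:    True when `device_id` was passed at process-group init.
    environ:       mapping to read the env var from (defaults to os.environ).
    eager_default: what eager init implies when nothing is set explicitly;
                   True models the previous behavior, False models this commit.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR)
    if raw is not None:
        # Assumption: an explicit user setting wins over any default.
        return raw.strip().lower() in {"1", "true", "t", "y", "yes"}
    # With no explicit setting, only eager init could enable the mode.
    return eager_init and eager_default


if __name__ == "__main__":
    print("previous default:", resolve_nonblocking(True, {}, eager_default=True))
    print("default after this commit:", resolve_nonblocking(True, {}))
```

Under these assumptions, a job that still wants non-blocking mode with eager init would set the environment variable explicitly before initializing the process group.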
1 file changed, +6 -3 lines changed
Diff hunk @@ -1064,9 +1064,12 @@: original lines 1067-1069 removed, new lines 1067-1072 added.