Summary:
Pull Request resolved: #858
Two tests are still failing with the following error:
```
[Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=BROADCAST, NumelIn=2, NumelOut=2, Timeout(ms)=60000) ran for 60033 milliseconds before timing out.
```
Let's try increasing the collective timeout for these tests. This is not guaranteed to fix them, but it is worth trying. If it doesn't help, we may consider deleting the failing tests to avoid flakiness.
Reviewed By: galrotem
Differential Revision: D59342738
fbshipit-source-id: 220f1f359eb0f98e5175e93badc7e998ae00db64