Commit 50da5af
torchx - fix race condition where local_scheduler LogIterator reads the log too early (meta-pytorch#1099)
Summary:
torchx/cli/test:cmd_run_test - test_run_with_log (https://www.internalfb.com/intern/test/281475186013299?ref_report_id=0) regularly fails because its assertion on the local_scheduler output finds expected content missing. This creates noise for the oncall, since the failing release test blocks the torchx release. https://fburl.com/conveyor/a5u31rby
The issue appears to be that LogIterator aborts early if the content has not been written yet: https://www.internalfb.com/code/fbsource/[922fd5827417][history]/fbcode/torchx/schedulers/local_scheduler.py?lines=1185-1189
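To illustrate the failure mode, here is a minimal Python sketch. `read_until_done` is a hypothetical stand-in, not torchx's actual LogIterator: it stops at the first EOF once `is_done()` reports the writer as finished, so if it runs before the output reaches the file, it yields nothing.

```python
import os
import tempfile
import time
from typing import Callable, Iterator


def read_until_done(
    path: str, is_done: Callable[[], bool], poll_interval: float = 0.1
) -> Iterator[str]:
    """Hypothetical reader: yields lines, stopping at the first EOF once is_done() is True."""
    with open(path, "rt") as fp:
        while True:
            line = fp.readline()
            if line:
                yield line.rstrip("\n")
            elif is_done():
                return  # EOF and the writer reports it has finished: stop immediately
            else:
                time.sleep(poll_interval)  # writer still running; wait for more output


# Simulate the race deterministically: the writer is already reported as finished,
# but its output has not reached the file when the reader first runs.
with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "stdout.log")
    open(log, "w").close()  # file exists but is still empty
    early = list(read_until_done(log, is_done=lambda: True))

    with open(log, "a") as f:
        f.write("hello\n")  # the output arrives after the reader already gave up
    late = list(read_until_done(log, is_done=lambda: True))

print(early, late)  # [] ['hello']
```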
The proposed fix is to add a small delay before fp_log is set up.
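A minimal sketch of the delay-based mitigation, using a hypothetical reader: `read_after_delay` and its `startup_delay` parameter are illustrative names, not torchx API. The writer thread stands in for a process whose output lands in the log shortly after it is already reported as finished.

```python
import os
import tempfile
import threading
import time
from typing import Callable, Iterator


def read_after_delay(
    path: str,
    is_done: Callable[[], bool],
    startup_delay: float = 0.5,  # hypothetical knob, not a torchx parameter
    poll_interval: float = 0.1,
) -> Iterator[str]:
    """Hypothetical reader: sleep briefly, then read until EOF once is_done() is True."""
    # The mitigation: give the writer a moment to flush before the file is opened.
    time.sleep(startup_delay)
    with open(path, "rt") as fp:
        while True:
            line = fp.readline()
            if line:
                yield line.rstrip("\n")
            elif is_done():
                return  # EOF and the writer reports it has finished
            else:
                time.sleep(poll_interval)  # writer still running; wait for more output


with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "stdout.log")
    open(log, "w").close()  # the log file exists but is still empty

    def writer() -> None:
        # Output lands shortly after the reader starts, while is_done() is already True.
        time.sleep(0.05)
        with open(log, "a") as f:
            f.write("hello\n")

    t = threading.Thread(target=writer)
    t.start()
    lines = list(read_after_delay(log, is_done=lambda: True))
    t.join()

print(lines)
```

A fixed delay narrows the race window rather than closing it: if the writer needs longer than `startup_delay`, the reader can still exit before any content arrives.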
Differential Revision: D80716088
Co-authored-by: Tony Kao <[email protected]>
1 parent 29472a9 commit 50da5af
1 file changed: +1 -0
| Original line | New line | Change |
|---|---|---|
| 1159 | 1159 | |
| 1160 | 1160 | |
| 1161 | 1161 | |
| | 1162 | + |
| 1162 | 1163 | |
| 1163 | 1164 | |
| 1164 | 1165 | |