Commit d1933d4

Add error handling for executor deserialization in dgxcloud scheduler (#166)
Signed-off-by: Hemil Desai <[email protected]>
1 parent 51556e9 commit d1933d4

1 file changed (+5, -1 lines)


nemo_run/run/torchx_backend/schedulers/dgxcloud.py

Lines changed: 5 additions & 1 deletion
```diff
@@ -240,6 +240,10 @@ def _get_job_dirs() -> dict[str, dict[str, str]]:
 
     serializer = ZlibJSONSerializer()
     for app in data.values():
-        app["executor"] = fdl.build(serializer.deserialize(app["executor"]))
+        try:
+            app["executor"] = fdl.build(serializer.deserialize(app["executor"]))
+        except Exception as e:
+            log.warning(f"Failed to deserialize executor: {e}")
+            continue
 
     return data
```
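The skip-on-failure pattern in this change can be tried outside NeMo-Run with the minimal sketch below. It rests on several assumptions. `serialize`/`deserialize` are hypothetical stand-ins for `ZlibJSONSerializer`, modeled here as JSON, then zlib, then URL-safe base64, which may not match the real encoding. The `fdl.build` step is omitted because it requires Fiddle. `load_job_dirs` is an illustrative name, not part of the codebase.

```python
import base64
import json
import logging
import zlib

log = logging.getLogger(__name__)


def serialize(obj: dict) -> str:
    # Hypothetical stand-in for ZlibJSONSerializer: JSON -> zlib -> URL-safe base64 text.
    return base64.urlsafe_b64encode(zlib.compress(json.dumps(obj).encode())).decode()


def deserialize(blob: str) -> dict:
    # Hypothetical inverse of serialize(); raises on corrupt or incompatible input.
    return json.loads(zlib.decompress(base64.urlsafe_b64decode(blob)))


def load_job_dirs(data: dict[str, dict]) -> dict[str, dict]:
    # Mirrors the loop in _get_job_dirs: decode each executor in place.
    for app in data.values():
        try:
            app["executor"] = deserialize(app["executor"])
        except Exception as e:
            # Same policy as the commit: log a warning and move on to the next entry.
            log.warning(f"Failed to deserialize executor: {e}")
            continue
    return data


if __name__ == "__main__":
    jobs = {
        "job-a": {"executor": serialize({"kind": "DGXCloudExecutor"})},
        "job-b": {"executor": "corrupted"},  # not valid base64, so decoding raises
    }
    result = load_job_dirs(jobs)
    print(result["job-a"]["executor"])
    print(result["job-b"]["executor"])
```

Note the design choice. A failed entry is not dropped from the returned dictionary. It stays in place with its executor still in the raw serialized form, and that matches what the diff shows, since the `except` branch only logs and continues. Callers that iterate over the result may therefore need to check the executor's type before using it.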
