Commit 9143ee8

anshulverma authored and facebook-github-bot committed
warn users when checkpoint save fails (#879)
Summary:
Pull Request resolved: #879

ATTS

Reviewed By: diego-urgell

Differential Revision: D61165902

fbshipit-source-id: 895a2814ff4c12f2afb0f61ee0794a52740aec8e
1 parent f7e53c1 commit 9143ee8

1 file changed: +4 -1 lines changed


torchtnt/framework/callbacks/dcp_saver.py

Lines changed: 4 additions & 1 deletion
@@ -242,7 +242,10 @@ def _save(
                 storage_writer=storage_writer,
                 planner=planner,
             )
-        except AttributeError:
+        except AttributeError as ex:
+            logger.warning(
+                f"Unable to save checkpoint (will retry saving using deprecated API). Error: {ex}"
+            )
             dcp.save_state_dict(
                 state_dict={"app_state": MultiStateful(app_state)},
                 process_group=self._process_group,
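
The fallback pattern in this hunk can be tried in isolation. The following is a minimal sketch, not the torchtnt implementation: `save_with_fallback` and the `SimpleNamespace` backends are hypothetical stand-ins for the `dcp` module, and only the warning message is copied from the diff.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def save_with_fallback(backend, state_dict):
    """Try backend.save; on AttributeError, warn and retry with the deprecated call.

    `backend` is a hypothetical object standing in for a checkpoint module.
    """
    try:
        return backend.save(state_dict=state_dict)
    except AttributeError as ex:
        # Same idea as the commit: surface the failure instead of silently falling back.
        logger.warning(
            f"Unable to save checkpoint (will retry saving using deprecated API). Error: {ex}"
        )
        return backend.save_state_dict(state_dict=state_dict)


# Hypothetical older backend that only exposes the deprecated entry point.
old_backend = SimpleNamespace(
    save_state_dict=lambda state_dict: "saved-with-deprecated-api"
)
result = save_with_fallback(old_backend, {"step": 1})
```

Note that `except AttributeError` also catches an `AttributeError` raised from inside the first save call, not only a missing `save` attribute, which is one reason logging the exception text is useful.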
