- I was thinking if we should extend the client API by doing the signal handling part internally, i.e. the user can then just check something like
cluster_utils.should_exit_for_resume() in their training code, and react accordingly. (This does not need to be part of this PR.)
Originally posted by @mseitzer in #84 (review)