ALF_2023.03.24

@Haichao-Zhang Haichao-Zhang released this 25 Mar 04:00
· 328 commits to pytorch since this release
997084a
Gracious quit of FastParallelEnvironment (#1456)

* Gracious quit of fast_parallel_environment

Previously, when there were a lot of parallel environments, ctrl-C often could not
terminate the training successfully. Although "ctrl-|" can be used to terminate, it
leaves the system in a bad state in which subsequent training often fails.

This PR makes the ProcessEnvironment detect the KeyboardInterrupt and inform the main
process to quit by throwing an exception. This exception is handled in Python code,
which then calls FastParallelEnvironment.close() to gracefully terminate all the
processes.

* Showing progress for closing all processes
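
The pattern described above can be illustrated with a minimal sketch. This is not ALF's actual code: `WorkerInterrupted`, `worker_loop`, `gather`, and `close_all` are hypothetical names, and the sketch assumes each worker talks to the main process over a `multiprocessing` pipe. The worker catches `KeyboardInterrupt` and reports it, the main process turns that report into an exception, and the closing routine prints progress as each process exits.

```python
import multiprocessing as mp


class WorkerInterrupted(Exception):
    """Hypothetical exception: a worker reported a KeyboardInterrupt."""


def worker_loop(conn, step_fn):
    """Serve requests on `conn` until told to close or interrupted."""
    try:
        while True:
            request = conn.recv()
            if request == "close":
                break
            conn.send(("ok", step_fn(request)))
    except KeyboardInterrupt:
        # Inform the main process instead of exiting silently.
        conn.send(("interrupted", None))
    except EOFError:
        pass  # the main process closed its end of the pipe
    finally:
        conn.close()


def gather(conns):
    """Collect one reply per worker; raise if any worker was interrupted."""
    results = []
    for conn in conns:
        status, value = conn.recv()
        if status == "interrupted":
            raise WorkerInterrupted("a worker received ctrl-C")
        results.append(value)
    return results


def close_all(conns, procs, timeout=5.0):
    """Ask every worker to exit, wait for it, and print progress."""
    total = len(procs)
    for i, (conn, proc) in enumerate(zip(conns, procs), start=1):
        try:
            conn.send("close")
        except (BrokenPipeError, OSError):
            pass  # the worker has already closed its end
        proc.join(timeout)
        if proc.is_alive():
            proc.terminate()  # last resort for a worker that did not exit
        print(f"closed {i}/{total} processes")
    return total
```

In a training loop, the main process would wrap its calls to `gather` in a `try` block and call `close_all` when `WorkerInterrupted` (or its own `KeyboardInterrupt`) is raised, so that every worker is joined before the program exits.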