KeyboardInterrupt raises an exception which results in a zero exit code #19916

@amarckal

Bug description

During training, whenever there is a keyboard interrupt, the fit loop raises a SIGTERMException:

    if trainer.received_sigterm:
        raise SIGTERMException

This results in an exit code of 0, so other scripts that rely on the training script's exit code treat the run as if it had exited normally.

The issue comes from here:

    class SIGTERMException(SystemExit):
        """Exception used when a :class:`signal.SIGTERM` is sent to a process.

        This exception is raised by the loops at specific points. It can be used to write custom logic in the
        :meth:`lightning.pytorch.callbacks.callback.Callback.on_exception` method.

        For example, you could use the :class:`lightning.pytorch.callbacks.fault_tolerance.OnExceptionCheckpoint` callback
        that saves a checkpoint for you when this exception is raised.

        """

Raising a SystemExit in Python without specifying an exit code leaves its code set to None, which the interpreter converts to an exit status of 0. The fix would be to have:

    class SIGTERMException(SystemExit):
        """Exception used when a :class:`signal.SIGTERM` is sent to a process.

        This exception is raised by the loops at specific points. It can be used to write custom logic in the
        :meth:`lightning.pytorch.callbacks.callback.Callback.on_exception` method.

        For example, you could use the :class:`lightning.pytorch.callbacks.fault_tolerance.OnExceptionCheckpoint` callback
        that saves a checkpoint for you when this exception is raised.

        """
        code = 128 + 15  # see https://tldp.org/LDP/abs/html/exitcodes.html
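
The difference between the current and proposed class can be checked without installing Lightning. The sketch below is only an illustration: it defines a stand-in `SIGTERMException` in a child interpreter (not the real Lightning class), raises it, and reports the child's exit status. The helper name `child_exit_code` and the `TEMPLATE` string are made up for this example.

```python
import subprocess
import sys

# Source for a child interpreter. The class here is a stand-in for the
# Lightning exception, not the real one; only the class body varies.
TEMPLATE = """
class SIGTERMException(SystemExit):
    {body}

raise SIGTERMException
"""


def child_exit_code(class_body: str) -> int:
    """Run the template in a fresh interpreter and return its exit status."""
    source = TEMPLATE.format(body=class_body)
    return subprocess.run([sys.executable, "-c", source]).returncode


# Current behaviour: no code is given, so SystemExit.code is None.
current = child_exit_code("pass")

# Proposed behaviour: a class-level `code` attribute of 128 + 15.
proposed = child_exit_code("code = 128 + 15")

print("current:", current)
print("proposed:", proposed)
```

The class attribute takes effect because the interpreter reads the exit status from the exception's `code` attribute when the process shuts down, and the subclass attribute is found before the default from `SystemExit`.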

What version are you seeing the problem on?

v2.0, v2.1, v2.2, master

How to reproduce the bug

In a Python console, run:

    import pytorch_lightning as pl
    raise pl.utilities.exceptions.SIGTERMException

Then, back in the shell, check the exit code:

    echo $?

Alternatively, start a training run, send it a keyboard interrupt signal, and then run echo $? to see the exit code.
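
The second reproduction path can also be approximated without a real training run. This is a rough sketch under several assumptions: the signal involved is SIGTERM (as the exception name suggests), the platform is POSIX, and the child script is a hypothetical stand-in that mimics the `received_sigterm` check with a plain flag and loop rather than Lightning's actual fit loop.

```python
import signal
import subprocess
import sys
import textwrap

# Stand-in for a training script: a handler records the signal, and the
# loop raises the exception at its next check, as in the snippet above.
CHILD = textwrap.dedent(
    """
    import signal
    import time

    class SIGTERMException(SystemExit):
        pass  # stand-in for the current class, no exit code set

    received_sigterm = False

    def handler(signum, frame):
        global received_sigterm
        received_sigterm = True

    signal.signal(signal.SIGTERM, handler)
    print("ready", flush=True)  # tell the parent the handler is installed

    while True:
        if received_sigterm:
            raise SIGTERMException
        time.sleep(0.01)
    """
)


def exit_code_after_sigterm() -> int:
    """Start the child, send it SIGTERM, and return its exit status."""
    with subprocess.Popen(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
    ) as proc:
        proc.stdout.readline()  # block until the child prints "ready"
        proc.send_signal(signal.SIGTERM)
        return proc.wait(timeout=10)


print("exit code:", exit_code_after_sigterm())
```

With the stand-in class as written, the parent sees the same status that a clean exit would produce, which is the behaviour this issue describes.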

cc @awaelchli
