
Better error message when the if __name__ == "__main__" guard is needed #16900

@awaelchli


Description & Motivation

When you run with DDP-spawn and don't add the if __name__ == "__main__" guard at the entry point of your script, you get this error:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

The error looks strange, and its instructions for fixing the problem are unclear. Users may also be confused about what "freeze_support" means.

Pitch

In the MultiProcessingLauncher, wrap the call in a try/except and re-raise the error with a better message that gives Lightning-specific instructions for possible solutions (for example, suggesting ddp instead of ddp-spawn).
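A minimal sketch of the proposed wrapping, in plain Python. The names launch_with_better_error and _is_missing_main_guard_error, as well as the hint text, are hypothetical and not part of Lightning's API. Detecting the error by matching "bootstrapping phase" in the message is an assumption based on the CPython text quoted above. Without the guard, each spawned child re-runs the user script and reaches the launch call again, which is where the RuntimeError is raised, so a try/except around that call should be able to catch it.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical wording; the actual Lightning message is up to the maintainers.
_MAIN_GUARD_HINT = (
    "Lightning started worker processes with the 'spawn' start method (e.g. ddp_spawn), "
    "which re-imports your script in every child process. Put the code that creates and "
    'runs the Trainer under an `if __name__ == "__main__":` guard, or use strategy="ddp".'
)


def _is_missing_main_guard_error(err: RuntimeError) -> bool:
    # Assumption: CPython's multiprocessing raises a RuntimeError mentioning the
    # "bootstrapping phase" when a process is started while a spawned child is
    # still importing the main module.
    return "bootstrapping phase" in str(err)


def launch_with_better_error(start: Callable[[], T]) -> T:
    """Call `start` (hypothetically, the function that spawns the workers) and
    translate the missing-guard RuntimeError into a Lightning-specific message."""
    try:
        return start()
    except RuntimeError as err:
        if _is_missing_main_guard_error(err):
            # Chain the original error so its traceback is still shown as the cause.
            raise RuntimeError(_MAIN_GUARD_HINT) from err
        raise  # unrelated RuntimeErrors propagate unchanged
```

Re-raising with "from err" keeps the original CPython message in the traceback, so users who search for it can still find it, while unrelated RuntimeErrors pass through untouched.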

Note: The same error can still occur independently of ddp-spawn if you use DataLoader(num_workers>0) on a platform where worker processes are started with spawn (the default on Windows and macOS).

Alternatives

Keep it as is: the error message and its solutions are already documented.
We also recently switched the default strategy to ddp when devices>1 in Trainer/Fabric, which should mitigate the issue.


cc @Borda @justusschock @awaelchli @carmocca

Labels: distributed (Generic distributed-related topic), feature (Is an improvement or enhancement), strategy: ddp (DistributedDataParallel)
