Sync output dir between DDP processes #6176
Unanswered
rubencart asked this question in DDP / multi-GPU / multi-node
Replies: 1 comment · 4 replies
-

I'm looking for a good way to sync my output dir name (which contains a timestamp etc.) between DDP processes. For now, I'm doing something like this:
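A minimal sketch of the pattern I mean, assuming the default process group is already initialized; `torch.distributed.broadcast_object_list` needs a reasonably recent PyTorch (>= 1.8), and the helper name and directory layout are just illustrative:

```python
import os
from datetime import datetime

import torch.distributed as dist


def sync_output_dir(base: str = "outputs") -> str:
    """Create a timestamped run dir on rank 0 and share its name with all ranks."""
    if dist.get_rank() == 0:
        run_dir = os.path.join(base, datetime.now().strftime("%Y%m%d-%H%M%S"))
        os.makedirs(run_dir, exist_ok=True)
    else:
        run_dir = None  # placeholder; overwritten by the broadcast below
    # Unlike dist.send/dist.recv, broadcast_object_list pickles arbitrary
    # Python objects under the hood, so plain strings work.
    holder = [run_dir]
    dist.broadcast_object_list(holder, src=0)
    return holder[0]
```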
Is this OK, or does someone have a better solution?

I've tried to use `torch.distributed.send` and `torch.distributed.recv`, but these only work for tensors. I'm also using the `WandbLogger`, so I have considered having all processes save output to `wandb_logger.experiment.dir`, but that doesn't work because the logger returns a dummy experiment in all but the main process (link).
-

The accelerators also have a broadcast method, which we use to send around stuff like that by byte-encoding it and then use […]
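A sketch of how that broadcast can be called from a `LightningModule` hook. Where the method lives has moved between Lightning versions (e.g. `trainer.accelerator.broadcast` in the 1.x era, `trainer.strategy.broadcast` in recent releases), and the `"outputs"` path here is illustrative:

```python
import os
from datetime import datetime

import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    def setup(self, stage=None):
        # Rank 0 decides the timestamped directory name; the broadcast then
        # hands the same string to every other rank.
        run_dir = None
        if self.trainer.is_global_zero:
            run_dir = os.path.join("outputs", datetime.now().strftime("%Y%m%d-%H%M%S"))
            os.makedirs(run_dir, exist_ok=True)
        # In recent Lightning this lives on the strategy; 1.x-era versions
        # exposed it as self.trainer.accelerator.broadcast(...).
        self.run_dir = self.trainer.strategy.broadcast(run_dir, src=0)
```

Under the hood this does the byte-encoding described above: the object is serialized to bytes, broadcast as a tensor, and decoded on the receiving ranks, which is why it works for arbitrary picklable objects rather than only tensors.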