Sort out distributed computation #32

@bpiwowar

Description

Distributed computation is not working well; we should switch to DistributedDataParallel (DDP) for better efficiency:

  • Samplers should work on independent data subsets
  • Checkpointing needs to be done properly
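The sampler point above amounts to giving each process a disjoint shard of the dataset, which is what `torch.utils.data.DistributedSampler` does. A minimal pure-Python sketch of that round-robin sharding scheme (the function name and the rank/world-size values are illustrative, not part of the codebase):

```python
import random

def shard_indices(dataset_size, rank, world_size, epoch=0, shuffle=False):
    """Return the subset of sample indices assigned to one process,
    mirroring the padding + round-robin scheme of DistributedSampler."""
    indices = list(range(dataset_size))
    if shuffle:
        # Seed with the epoch so every rank agrees on the permutation
        random.Random(epoch).shuffle(indices)
    # Pad with leading indices so all ranks get the same sample count
    padded = indices + indices[: (-dataset_size) % world_size]
    # Each rank takes every world_size-th index, starting at its rank
    return padded[rank::world_size]

# Example: 10 samples sharded across 3 processes
shards = [shard_indices(10, rank, 3) for rank in range(3)]
# Shards are equally sized and jointly cover the whole dataset
```

Equal shard sizes matter because DDP synchronizes gradients every step: if one rank ran out of batches early, the others would block waiting for its all-reduce.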

Solve several backward-related issues:

  • Backward is called within trainers (using the no_sync context might cause problems if the parameters involved are not the same...)
  • Micro-batching (using the no_sync context)
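For micro-batching, the `no_sync` pattern defers the gradient all-reduce until the last micro-batch of an accumulation window. A minimal sketch of the intended control flow, using a stand-in model class (`FakeDDPModel` and `train_step` are hypothetical names introduced only to illustrate when synchronization fires):

```python
from contextlib import contextmanager

class FakeDDPModel:
    """Stand-in for torch.nn.parallel.DistributedDataParallel that
    records whether each backward pass would have synchronized."""
    def __init__(self):
        self._sync = True
        self.sync_log = []

    @contextmanager
    def no_sync(self):
        # Inside this context, backward only accumulates local grads
        self._sync = False
        try:
            yield
        finally:
            self._sync = True

    def backward(self):
        self.sync_log.append(self._sync)

def train_step(model, micro_batches):
    # Skip the all-reduce on all but the last micro-batch
    for i, _ in enumerate(micro_batches):
        if i < len(micro_batches) - 1:
            with model.no_sync():
                model.backward()
        else:
            # Final backward triggers a single gradient all-reduce
            model.backward()

model = FakeDDPModel()
train_step(model, range(4))
# model.sync_log == [False, False, False, True]
```

This is where the concern in the first bullet shows up: if a trainer calls backward internally, the caller cannot easily wrap only the non-final micro-batches in `no_sync`, and the reduction may fire at the wrong time or over a different parameter set.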

See https://pytorch.org/tutorials/intermediate/ddp_tutorial.html

Depends on experimaestro/experimaestro-python#32 since object duplication does not work with the current config/object layout
