MultiGPU efficient densenets are slow #36

@wandering007

Description


I just wanted to benchmark the new implementation of the efficient DenseNet with the code here. However, it seems that the checkpointed modules are not broadcast to the other GPUs, as I got the following error:

  File "/home/changmao/efficient_densenet_pytorch/models/densenet.py", line 16, in bn_function
    bottleneck_output = conv(relu(norm(concated_features)))
  File "/home/changmao/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/changmao/anaconda3/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 49, in forward
    self.training or not self.track_running_stats, self.momentum, self.eps)
  File "/home/changmao/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1194, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_batch_norm)
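
For reference, here is a minimal sketch (not the repository's actual code; the class name BrokenBottleneck and shapes are made up for illustration) of one way this exact device mismatch can arise: if the checkpointed closure is created once at construction time, every nn.DataParallel replica keeps calling the original device-0 modules.

    import torch
    import torch.nn as nn
    import torch.utils.checkpoint as cp

    class BrokenBottleneck(nn.Module):
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.norm = nn.BatchNorm2d(in_ch)
            self.relu = nn.ReLU(inplace=True)
            self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
            # Bug: this closure captures the modules above right now.
            # nn.DataParallel replicates self.norm/self.conv onto each GPU,
            # but the stored function still points at the device-0 originals.
            self.bn_function = lambda *xs: self.conv(
                self.relu(self.norm(torch.cat(xs, 1))))

        def forward(self, x):
            return cp.checkpoint(self.bn_function, x)

    model = nn.DataParallel(BrokenBottleneck(64, 128).cuda())
    x = torch.randn(8, 64, 32, 32, device="cuda", requires_grad=True)
    model(x)  # with >= 2 GPUs: "... device 1 does not equal 0 ..."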

I think the checkpoint feature has only weak support for nn.DataParallel.
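
If the captured-modules problem above is the cause, the usual remedy (again a sketch under the same assumptions, not necessarily what this repository needs) is to build the checkpointed function inside forward(), so each DataParallel replica closes over its own replicated submodules:

    import torch
    import torch.nn as nn
    import torch.utils.checkpoint as cp

    class Bottleneck(nn.Module):
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.norm = nn.BatchNorm2d(in_ch)
            self.relu = nn.ReLU(inplace=True)
            self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

        def forward(self, *features):
            def bn_function(*xs):
                # Recomputed during backward, so the concat/BN intermediates
                # are never stored -- the point of the efficient DenseNet.
                return self.conv(self.relu(self.norm(torch.cat(xs, 1))))

            return cp.checkpoint(bn_function, *features)

    model = nn.DataParallel(Bottleneck(64, 128).cuda())
    x = torch.randn(8, 64, 32, 32, device="cuda", requires_grad=True)
    out = model(x)        # runs on each GPU without the device mismatch
    out.sum().backward()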
